Top 10 Frequent Words Extractor

Code introduction


This function takes a string, strips punctuation, lowercases the text, splits it into words on whitespace, counts how often each word occurs, and returns the 10 most frequent words, ordered from most to least frequent.


Technology Stack : string, re, collections, heapq

Code Type : Function

Code Difficulty : Intermediate


                
                    
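import re
from collections import Counter
from heapq import nlargest
from string import punctuation


def sorted_words(text):
    # Strip punctuation; re.escape keeps characters such as ']' and '\' literal in the class
    text = re.sub(f'[{re.escape(punctuation)}]', '', text)
    # Lowercase the text and split it into words on whitespace
    words = text.lower().split()
    # Count how often each word occurs
    word_counts = Counter(words)
    # Select the 10 (word, count) pairs with the highest counts
    most_common_words = nlargest(10, word_counts.items(), key=lambda item: item[1])
    # Return only the words, most frequent first
    return [word for word, count in most_common_words]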
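
The steps described in the introduction (strip punctuation, lowercase, count, take the top entries) can also be sketched with `Counter.most_common`, which ranks by count directly. This is a minimal sketch, not the original implementation: the names `top_words`, `n`, and `sample` are illustrative assumptions, and words are assumed to be separated by whitespace.

```python
import re
from collections import Counter
from string import punctuation


def top_words(text, n=10):
    """Return the n most frequent words in text, most frequent first."""
    # Escape punctuation so characters such as ']' and '\' stay literal in the class.
    cleaned = re.sub(f"[{re.escape(punctuation)}]", "", text)
    # Lowercase so "The" and "the" are counted as the same word.
    words = cleaned.lower().split()
    # most_common sorts by descending count; ties keep first-seen order.
    return [word for word, _ in Counter(words).most_common(n)]


# Hypothetical sample input, used only for illustration.
sample = "The cat saw the dog. The dog saw the bird!"
print(top_words(sample, n=3))
```

Making the number of words a parameter keeps the default of 10 while allowing shorter or longer rankings. Note that punctuation is deleted rather than replaced with a space, so a token such as "don't" is counted as "dont".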