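This function takes a string of text and a language parameter. It tokenizes the text, removes the stop words for that language, and applies a lemmatizer to reduce each remaining word to its base form. The lemmatizer is picked at random from a small table keyed by language. However, every entry in that table is NLTK's English-only WordNetLemmatizer, so the choice does not depend on the language parameter. In practice, the language parameter only selects the stop-word list.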
Technology Stack: nltk.tokenize.word_tokenize, nltk.corpus.stopwords, nltk.stem.WordNetLemmatizer
Code Type: Text-processing function
Code Difficulty: Intermediate
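The pipeline steps described above (tokenize, keep alphabetic tokens, drop stop words, lemmatize) can be illustrated without NLTK. This is a minimal sketch under stated assumptions. The regex tokenizer, `TOY_STOP_WORDS`, and `TOY_LEMMAS` are hypothetical stand-ins for `word_tokenize`, the NLTK stop-word list, and WordNet. It only shows the order of the steps and why tokens are lowercased before the stop-word check.

```python
import re

# Hypothetical toy data standing in for NLTK's stop-word list and WordNet lemmas.
TOY_STOP_WORDS = {"the", "are", "on", "a"}
TOY_LEMMAS = {"cats": "cat", "mats": "mat"}


def toy_tokenize(text):
    # Crude stand-in for word_tokenize: runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)


def toy_pipeline(text):
    # Lowercase first so "The" matches the lowercase stop word "the".
    tokens = [t.lower() for t in toy_tokenize(text)]
    # Keep alphabetic tokens that are not stop words.
    kept = [t for t in tokens if t.isalpha() and t not in TOY_STOP_WORDS]
    # Map each token to its toy lemma, leaving unknown words unchanged.
    return " ".join(TOY_LEMMAS.get(t, t) for t in kept)


print(toy_pipeline("The cats are sitting on the mats."))  # cat sitting mat
```

Without the lowercasing step, "The" would survive the stop-word filter, because NLTK's stop-word lists are lowercase.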
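import random

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires the NLTK data packages "punkt_tab" ("punkt" in older releases),
# "stopwords", and "wordnet", e.g. nltk.download("stopwords").


def random_lemmatization(text, language='english'):
    """
    Return a lemmatized version of `text` with stop words removed.

    A lemmatizer is picked at random from a small table. Every entry is a
    WordNetLemmatizer, which covers English only, so the choice does not
    depend on `language`; `language` selects the stop-word list.
    """
    lemmatizers = {
        'english': WordNetLemmatizer(),
        'spanish': WordNetLemmatizer(),
        'french': WordNetLemmatizer(),
    }
    # Pick one entry at random (all entries are equivalent WordNet lemmatizers).
    lemmatizer = random.choice(list(lemmatizers.values()))
    # Lowercase tokens so they match NLTK's lowercase stop-word lists.
    words = [word.lower() for word in word_tokenize(text)]
    stop_words = set(stopwords.words(language))
    # Keep alphabetic, non-stop-word tokens and reduce each one to its lemma.
    lemmatized_words = [
        lemmatizer.lemmatize(word)
        for word in words
        if word.isalpha() and word not in stop_words
    ]
    return ' '.join(lemmatized_words)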
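In the function above, `random.choice` draws from all three table entries whatever `language` is. If the selection should follow the language instead, a keyed lookup is one option. This is a minimal sketch under stated assumptions. `pick_by_language` is a hypothetical helper, and its fallback policy is an assumption: if the language is missing from the table, it picks a random entry. Plain strings stand in for lemmatizer objects so the sketch runs without NLTK.

```python
import random


def pick_by_language(table, language, rng=None):
    """Hypothetical helper: return table[language] when present;
    otherwise fall back to a random entry (an assumed policy)."""
    if language in table:
        return table[language]
    if rng is None:
        rng = random.Random()
    return rng.choice(list(table.values()))


# Strings stand in for lemmatizer objects in this illustration.
table = {"english": "en-lemmatizer", "spanish": "es-lemmatizer"}
print(pick_by_language(table, "spanish"))  # es-lemmatizer
```

Raising a `ValueError` for unsupported languages would be a stricter alternative to the random fallback.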