Generating Random Sentences with Word2Vec or Doc2Vec

Code introduction


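This function trains a Word2Vec or Doc2Vec model from the gensim library on the input text, then joins words from the model's vocabulary into a sentence. The caller chooses the model type ('dbow' or 'doc2vec') and the number of words to generate. Despite its name, the 'dbow' option trains a plain Word2Vec model; in gensim, DBOW is a Doc2Vec training mode selected with dm=0.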


Technology Stack : gensim

Code Type : Text generation

Code Difficulty : Intermediate


                
                    
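def random_sentence_model(text, model_type='dbow', num_words=100):
    import random
    import re
    from gensim.models import Word2Vec
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from gensim.utils import simple_preprocess

    # Split the text into sentences, then tokenize each one
    sentences = [simple_preprocess(s) for s in re.split(r'[.!?]+', text)]
    sentences = [tokens for tokens in sentences if tokens]
    if not sentences:
        raise ValueError("The text contains no usable words.")

    if model_type == 'dbow':
        # Note: despite the option name, this branch trains a plain Word2Vec model
        model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
    elif model_type == 'doc2vec':
        # Doc2Vec needs each sentence wrapped in a TaggedDocument
        documents = [TaggedDocument(tokens, [i]) for i, tokens in enumerate(sentences)]
        model = Doc2Vec(documents, vector_size=100, window=5, min_count=1, epochs=40)
    else:
        raise ValueError("Unsupported model type. Use 'dbow' or 'doc2vec'.")

    # Pick random seed words from the learned vocabulary (avoids KeyError on unseen words)
    vocab = model.wv.index_to_key
    seed_words = random.sample(vocab, k=min(4, len(vocab)))

    if model_type == 'dbow':
        # Rank the remaining vocabulary by similarity to the seed words
        similar = model.wv.most_similar(positive=seed_words, topn=num_words)
    else:
        # Infer a vector for the seed words and find the nearest word vectors
        vector = model.infer_vector(seed_words)
        similar = model.wv.similar_by_vector(vector, topn=num_words)

    # Keep only the words (drop the similarity scores) and join them
    return ' '.join(word for word, _ in similar)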
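
The most_similar call in the function above ranks vocabulary words by cosine similarity to the seed words. As a rough illustration of that idea only (this is not gensim's implementation), the sketch below ranks a few hand-picked 2-D vectors with the standard library. TOY_VECTORS, unit and rank_similar are hypothetical names introduced for this example, and the vector values are made up.

```python
import math

# Hypothetical 2-D word vectors, hand-picked for illustration only
TOY_VECTORS = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.2, 0.9],
}


def unit(v):
    """Scale a vector to length 1 so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank_similar(vectors, seed_words, topn):
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    seeds = [unit(vectors[w]) for w in seed_words]
    # Average the normalized seed vectors component by component
    mean = unit([sum(col) / len(seeds) for col in zip(*seeds)])
    scores = [
        (word, sum(a * b for a, b in zip(mean, unit(vec))))
        for word, vec in vectors.items()
        if word not in seed_words  # seed words are excluded from the result
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


ranked = rank_similar(TOY_VECTORS, ["cat"], topn=2)
sentence = " ".join(word for word, _ in ranked)
print(sentence)  # prints "dog bus"
```

With real models the vectors have many more dimensions (100 in the function above), but the ranking principle is the same: the closer two vectors point in the same direction, the higher the word appears in the result.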
              