Generating and Filtering Random Data with Luigi

Code introduction


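The code defines a Luigi task, GenerateRandomData, which writes a CSV file of random names, ages, and timestamps, and a function named xxx that takes two parameters: an integer and an instance of that task. The function checks the parameter types, runs the task to generate the data, reads the generated CSV, keeps the rows whose Age is strictly greater than the integer, and returns the filtered data. The integer therefore acts as an age threshold rather than a row count; the number of rows is chosen at random (between 10 and 100) inside the task itself.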


Technology Stack : Luigi, pandas, and the Python standard-library modules random and datetime.

import random
import luigi
import pandas as pd
from datetime import datetime

class GenerateRandomData(luigi.Task):
    def output(self):
        return luigi.LocalTarget('random_data.csv')

    def run(self):
        n = random.randint(10, 100)
        data = pd.DataFrame({
            'Name': [f'Name_{i}' for i in range(n)],
            'Age': [random.randint(18, 60) for _ in range(n)],
            'Date': [datetime.now() for _ in range(n)]
        })
        data.to_csv(self.output().path, index=False)

def xxx(arg1, arg2):
    # arg1 is the age threshold (rows with Age > arg1 are kept),
    # arg2 is a GenerateRandomData task instance
    if not isinstance(arg1, int) or not isinstance(arg2, GenerateRandomData):
        raise ValueError("arg1 must be an integer and arg2 must be an instance of GenerateRandomData")

    # Run the task with the local scheduler. Luigi treats the task as already
    # complete, and skips run(), if random_data.csv exists from an earlier run.
    luigi.build([arg2], local_scheduler=True)
    data_path = arg2.output().path

    # Read the generated data
    data = pd.read_csv(data_path)

    # Filter the data based on age
    filtered_data = data[data['Age'] > arg1]

    # Return the filtered data
    return filtered_data
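
The read-and-filter step can be tried on its own, without running Luigi. The sketch below is illustrative rather than part of the original code: the helper name filter_by_age and the sample rows are assumptions, and an in-memory string stands in for random_data.csv. It shows that the comparison is strict, so a row whose Age equals the threshold is dropped.

```python
import io

import pandas as pd

# Hypothetical sample standing in for random_data.csv; the values are made up.
SAMPLE_CSV = """Name,Age,Date
Name_0,18,2024-01-01 12:00:00
Name_1,30,2024-01-01 12:00:00
Name_2,45,2024-01-01 12:00:00
Name_3,60,2024-01-01 12:00:00
"""


def filter_by_age(csv_source, min_age):
    """Return the rows whose Age is strictly greater than min_age.

    filter_by_age is a hypothetical helper used only for this illustration.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(min_age, bool) or not isinstance(min_age, int):
        raise ValueError("min_age must be an integer")
    # Parse the Date column back into timestamps while reading.
    data = pd.read_csv(csv_source, parse_dates=["Date"])
    # Boolean mask: keep only rows above the threshold.
    return data[data["Age"] > min_age]


result = filter_by_age(io.StringIO(SAMPLE_CSV), 30)
print(result["Name"].tolist())  # ['Name_2', 'Name_3']
```

Name_1 has an Age of exactly 30 and is excluded; switching the comparison to >= would include it.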