Calculating Entropy for Dataset Disorder Measurement


Code introduction


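This function calculates the Shannon entropy of a dataset, measured in bits. Entropy quantifies the level of disorder (uncertainty) in the data: it is 0 when every value is identical and largest when all distinct values occur equally often. The measure is commonly used in information theory and machine learning.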


Technology Stack : collections.Counter, scipy.stats.entropy, math.log2

Code Type : Custom function

Code Difficulty : Intermediate


                
                    
import math
from collections import Counter


def calculate_entropy(data):
    """
    Calculate the Shannon entropy (in bits) of the given data.

    :param data: List of hashable data points
    :return: Entropy value in bits (0 for an empty list)
    """
    # Count the occurrences of each unique value in the data
    value_counts = Counter(data)
    total = len(data)
    # Calculate the probability of each unique value
    probabilities = [count / total for count in value_counts.values()]
    # Calculate the entropy using the formula -sum(p * log2(p)).
    # scipy.stats.entropy(list(value_counts.values()), base=2) computes the same quantity.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
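
As a quick sanity check of the formula -sum(p * log2(p)), the sketch below evaluates it on three small lists whose entropy is easy to work out by hand. The helper name `entropy_bits` and the sample lists are assumptions made for illustration; they are not part of the original snippet, and the helper simply restates the same formula using only the standard library.

```python
import math
from collections import Counter


def entropy_bits(data):
    """Shannon entropy in bits (hypothetical stand-in for calculate_entropy)."""
    counts = Counter(data)
    total = len(data)
    # Every count is at least 1, so each probability is strictly positive.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Two equally frequent values: one bit of uncertainty.
coin = ["H", "T", "H", "T"]
# Four equally frequent values: two bits.
four_way = [1, 2, 3, 4]
# A 3:1 split is less disordered than an even split.
skewed = ["a", "a", "a", "b"]

print(entropy_bits(coin))              # 1.0
print(entropy_bits(four_way))          # 2.0
print(round(entropy_bits(skewed), 4))  # 0.8113
```

The even splits reach the maximum for their number of distinct values (log2 of that number), while the skewed list falls below 1 bit because one value dominates.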