
New Plugin Calculates Correlation Matrix 10x Faster Than Pandas!

A new Polars plugin computes the symmetric correlation matrix of a data frame, matching the functionality of pd.DataFrame.corr(). It supports the Pearson, Spearman, and Kendall methods and is written in Rust and Python. The Pearson method is significantly faster than pandas, while the Spearman method lags behind because of its rank calculation.

Users can set the minimum number of periods (observations) required to compute each correlation, with a default of 1. For the Spearman correlation they can also choose the rank method: average, min, max, dense, or ordinal. The plugin accepts both eager and lazy data frames and casts all numeric columns to Float32 before computing.

However, the current Python-side implementation performs redundant calculations when assembling the correlation matrix. The developer is asking the community for feedback on speeding up the Spearman method and eliminating these unnecessary computations.
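The post does not show the plugin's source, so the snippet below is only a minimal pure-Python sketch of the logic it describes, written under our own assumptions. Every function name (rank, pearson, pair_corr, corr_matrix) is hypothetical and is not the plugin's API. Kendall's tau and the Float32 cast are left out, and missing values are handled by pairwise deletion, as pandas does. The sketch shows Spearman's rho computed as Pearson's r on ranks with the five tie-handling methods, a min_periods check per column pair, and a matrix assembly that computes only the upper triangle and mirrors it, which is one simple way to avoid evaluating each symmetric pair twice.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank(values: Sequence[float], method: str = "average") -> List[float]:
    """Assign 1-based ranks, resolving ties with the given method (hypothetical helper)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable: ties keep input order
    ranks = [0.0] * n
    dense = 0
    i = 0
    while i < n:
        j = i
        # Advance j to the end of the run of values tied with sorted position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        dense += 1
        for k in range(i, j + 1):
            if method == "average":
                r = (i + j) / 2 + 1
            elif method == "min":
                r = i + 1
            elif method == "max":
                r = j + 1
            elif method == "dense":
                r = dense
            elif method == "ordinal":
                r = k + 1  # ties broken by order of appearance
            else:
                raise ValueError(f"unknown rank method: {method!r}")
            ranks[order[k]] = float(r)
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson r; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def pair_corr(x, y, method="pearson", min_periods=1, rank_method="average") -> float:
    # Keep only rows where both values are present (pairwise deletion).
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    # At least two observations are needed for a defined correlation.
    if len(pairs) < max(min_periods, 2):
        return math.nan
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    if method == "spearman":
        # Spearman's rho is Pearson's r computed on the ranks.
        xs, ys = rank(xs, rank_method), rank(ys, rank_method)
    elif method != "pearson":
        raise ValueError(f"unsupported method in this sketch: {method!r}")
    return pearson(xs, ys)


def corr_matrix(columns: Dict[str, List[float]], method="pearson", min_periods=1,
                rank_method="average") -> Tuple[List[str], List[List[float]]]:
    names = list(columns)
    k = len(names)
    mat = [[math.nan] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):  # upper triangle (including the diagonal) only
            c = pair_corr(columns[names[i]], columns[names[j]], method, min_periods, rank_method)
            mat[i][j] = mat[j][i] = c  # mirror: corr(a, b) == corr(b, a)
    return names, mat
```

For example, corr_matrix({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.1]}, method="spearman") returns the column names and a 2 x 2 nested list. The Python loops are only for readability; in practice the per-pair work would be pushed into compiled code.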

Source: stackoverflow.com
