A data scientist working with Polars on an EC2 machine with 64 GB RAM and 8 vCPUs hit an unexpected issue when loading Parquet files. Parquet's columnar layout should, in principle, allow selective column loading, yet attempting to load just three columns totaling about 600 MB from a 240 GB dataset consumed the machine's entire 64 GB of RAM. This raises the question of how Polars actually manages memory when reading Parquet: the user expected only the requested columns to be materialized. With no clear documentation of this process, the user is asking for an explanation, or pointers to references, describing the lifecycle of a Parquet read in Polars.
Source: stackoverflow.com
