Python OOM dataframes
adding Daft

Prehistory
Oops, that’s me, prehistoric coughs
So I recently read about a benchmark of FireDucks - a pandas-like, lazy, blazingly fast dataframe library (yeah, all the buzzwords are there).
Initial reviews are pretty good, claiming it’s really indecently fast:
This ran in 1 second (with `._evaluate` added) on the 55 million row dataset. The classic pandas version took 8 minutes!
However, the GIF showing this sparked a lot of controversy, despite Avi Chawla's comprehensive overview.

The folks at FireDucks seem to have also made a comparison with DuckDB and Polars, plus a db-benchmark dashboard.

Dundundun DAFTMAN
I also recently read about another dataframe library, Daft, and decided to add it to the comparison using the provided Colab, given that it's rather simple to do for a single benchmark.
Behold, results from an infinitely bugged Colab:

Or, mean time for DULocation and PULocation, respectively:
| Library | Mean, s |
|---|---|
| FireDucks | 4.099 |
| Daft | 5.424 |
| Polars | 6.155 |

| Library | Mean, s |
|---|---|
| FireDucks | 4.26 |
| Daft | 5.268 |
| Polars | 6.338 |
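
For anyone wanting to reproduce numbers like these, here is a minimal sketch of how a mean wall-clock time could be measured. It is not the Colab's actual harness: `mean_runtime`, `groupby_mean` and the synthetic `rows` are hypothetical stand-ins that use only the standard library, not FireDucks, Daft or Polars.

```python
import statistics
import time
from collections import defaultdict


def mean_runtime(workload, repeats=5):
    """Run `workload` `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Hypothetical stand-in data: (location_id, fare) pairs, not the real dataset.
rows = [(i % 50, float(i % 97)) for i in range(100_000)]


def groupby_mean():
    """Pure-Python group-by mean, standing in for a dataframe library call."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


result = groupby_mean()
mean_seconds = mean_runtime(groupby_mean, repeats=3)
print(f"groups: {len(result)}, mean time: {mean_seconds:.4f} s")
```

With a lazy library, the workload passed to `mean_runtime` has to force execution inside the timed call (that is what the `._evaluate` in the quote above is about), otherwise only the query planning gets timed.
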
So:
- Yeah, FireDucks seems the fastest - but it's also not open source:
By providing the beta version of FireDucks free of charge and enabling data scientists to actually use it, NEC will work to improve its functionality while verifying its effectiveness, with the aim of commercializing it within FY2024.
https://www.nec.com/en/press/202310/global_20231019_01.html
- I'm intrigued by Daft's promise and its focus on providing almost all possible interfaces, data types and engines.
So I'll try doing things with Daft in the near future!