MDS FEST 3.0

Blazing Fast I/O of Data in the Cloud

Kevin Wang, Founding Engineer, Eventual Computing

ChanChan Mao, Developer Relations, Eventual Computing

I/O from remote storage consistently bottlenecks large-scale data processing. Even listing hundreds of thousands of S3 files can take minutes, and reading them can be harder than processing them. Learn strategies to overcome these performance limitations.

Talk overview

I/O from remote storage is a consistent bottleneck for large-scale data processing workloads. When you have hundreds of thousands of files in S3, even listing those files can take several minutes and become a bottleneck! Reading those files can be even more painful than actually processing them.

Daft DataFrames are built for the cloud and include many optimizations that make them extremely efficient at reading and working with data in cloud storage. In this talk, we will showcase and explain some of the optimizations built into Daft's Rust I/O layer, which are exposed to users through a familiar Python DataFrame interface.
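The listing bottleneck described above is driven mostly by request latency, not bandwidth. S3's ListObjectsV2 returns at most 1,000 keys per response, and the pages under a single prefix must be fetched one after another by following a continuation token. One common mitigation is to list independent prefixes concurrently. The minimal Python sketch below simulates that idea with asyncio. It is a generic illustration, not Daft's actual Rust I/O implementation. The in-memory bucket, the 10 ms simulated latency, and all function names are hypothetical.

```python
from __future__ import annotations

import asyncio
import time

PAGE_SIZE = 1_000            # S3 ListObjectsV2 returns at most 1,000 keys per response
SIMULATED_LATENCY_S = 0.01   # hypothetical round-trip time per list request

# Hypothetical in-memory "bucket": 8 prefixes with 5,000 keys each.
FAKE_BUCKET = {
    f"data/part={p}/": [f"data/part={p}/file-{i:05d}.parquet" for i in range(5_000)]
    for p in range(8)
}


async def list_page(prefix: str, start: int) -> tuple[list[str], int | None]:
    """Simulate one paginated list request.

    Returns one page of keys plus the next offset, which stands in for
    S3's continuation token (None when the prefix is exhausted).
    """
    await asyncio.sleep(SIMULATED_LATENCY_S)
    keys = FAKE_BUCKET[prefix]
    page = keys[start:start + PAGE_SIZE]
    next_start = start + PAGE_SIZE if start + PAGE_SIZE < len(keys) else None
    return page, next_start


async def list_prefix(prefix: str) -> list[str]:
    """Pages within one prefix depend on the previous token, so they run in order."""
    out: list[str] = []
    start: int | None = 0
    while start is not None:
        page, start = await list_page(prefix, start)
        out.extend(page)
    return out


async def list_sequential(prefixes: list[str]) -> list[str]:
    """Baseline: list each prefix only after the previous one finishes."""
    out: list[str] = []
    for prefix in prefixes:
        out.extend(await list_prefix(prefix))
    return out


async def list_concurrent(prefixes: list[str]) -> list[str]:
    """List all prefixes at once so their request latencies overlap."""
    results = await asyncio.gather(*(list_prefix(p) for p in prefixes))
    return [key for keys in results for key in keys]


def timed(list_fn, prefixes: list[str]) -> tuple[list[str], float]:
    """Run one listing strategy and return (keys, elapsed seconds)."""
    t0 = time.perf_counter()
    keys = asyncio.run(list_fn(prefixes))
    return keys, time.perf_counter() - t0


if __name__ == "__main__":
    prefixes = sorted(FAKE_BUCKET)
    seq_keys, seq_s = timed(list_sequential, prefixes)
    con_keys, con_s = timed(list_concurrent, prefixes)
    print(f"sequential: {len(seq_keys)} keys in {seq_s:.2f}s")
    print(f"concurrent: {len(con_keys)} keys in {con_s:.2f}s")
```

With these made-up numbers, the sequential version issues 40 list requests back to back. The concurrent version overlaps the 8 prefixes, so its wall-clock time is bounded by the 5 pages of the longest prefix. Real gains depend on how the data is partitioned into prefixes and on the store's request limits.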
