Unveil the Magic Without Hoodini: Transform Your Machine Learning Pipelines with Apache Hudi - Nadine Farah, Onehouse
ML pipelines integrated with data lakes have emerged as a potent combination, enabling organizations to derive actionable insights from vast reservoirs of raw data. However, this integration presents distinct challenges. The dynamic nature of ML requires data to be consistently fresh, accurate, and available in near real time. Traditional data lakes, while scalable, are built on immutable files, which makes it hard to deal with data latency, apply incremental updates, and make data available to ML models on time.
Apache Hudi brings features and services for upserts, incremental processing, and near real-time access to data lakes. Hudi natively supports efficient upserts, record-level updates, and deletes, so ML models always have access to the latest data. Hudi's time-travel queries and incremental pulls also let ML practitioners work with historical versions of the data and detect potential model drift. In this talk, attendees will learn:
Challenges with building ML pipelines on data lakes
How Hudi unlocks analytics on the data lake
How to build efficient ML pipelines with incremental processing on the data lake
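The upsert and incremental-pull ideas described above can be illustrated without Spark. What follows is a toy, in-memory model written only to show the semantics. It is not Hudi's API: the class `ToyTable`, its methods, and the integer commit times are all hypothetical. In Hudi itself, these operations are issued through engines such as Spark or Flink against files on the lake.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical stand-in for a table keyed by record key (not Hudi's API)."""
    rows: dict = field(default_factory=dict)     # record key -> (commit_time, record)
    commits: list = field(default_factory=list)  # ordered commit times

    def upsert(self, commit_time, records, key="id"):
        # Insert new keys and overwrite existing ones; each row remembers
        # the commit that last wrote it.
        self.commits.append(commit_time)
        for rec in records:
            self.rows[rec[key]] = (commit_time, dict(rec))

    def incremental_pull(self, begin_time):
        # Return only records whose latest write is newer than begin_time,
        # i.e. what changed since a downstream job last read the table.
        changed = (rec for t, rec in self.rows.values() if t > begin_time)
        return sorted(changed, key=lambda r: r["id"])

    def snapshot(self):
        # Latest version of every record, regardless of commit time.
        return sorted((rec for _, rec in self.rows.values()), key=lambda r: r["id"])


table = ToyTable()
table.upsert(1, [{"id": 1, "feature": 0.2}, {"id": 2, "feature": 0.5}])
last_read = 1  # a downstream feature job has consumed everything up to commit 1
table.upsert(2, [{"id": 2, "feature": 0.9}, {"id": 3, "feature": 0.1}])

changed = table.incremental_pull(last_read)
print(changed)                # [{'id': 2, 'feature': 0.9}, {'id': 3, 'feature': 0.1}]
print(len(table.snapshot()))  # 3
```

The point of the incremental pull is that the downstream job reprocesses only the two records touched by commit 2 instead of rescanning the whole table, which is the property that makes incremental ML feature pipelines cheaper than full recomputation.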