Getting Data Ready for Data Science with Delta Lake and MLflow

9,680 views

Databricks


Comments: 4
@KoushikPaulliveandletlive 4 years ago
Example: I ran a delete operation with a filter that takes 5 minutes to complete. Right after starting the delete, I ran an update query with the same filter. What will happen when both queries finish? Without Delta Lake, I would have gotten an exception for the second query because the first was not complete. And if I had waited and run the update after the delete completed, the update would have had no effect on the table, because no data would be left in the table for that filter.
@Xiphos76 4 years ago
So for schema changes that involve completely new fields, Delta handles them gracefully with .option("mergeSchema", "true"). However, how do you handle situations where the schema change is more subtle? When processing JSON into a table, I've come across situations where a key is logically the same but differs only in case: MyField vs. myfield. Databricks ingestion fails even when mergeSchema is set to true, with the following error: "org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: myfield"
@dennyglee 4 years ago
While Apache Spark can be used in case-sensitive or case-insensitive (default) mode, Delta Lake is case-preserving but case-insensitive when storing the schema. Parquet is case-sensitive when storing and returning column information. To avoid issues and possible data corruption, Delta Lake cannot have column names that differ only by case. For more info, please refer to databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html. HTH!
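One way to act on this before ingestion is to fold case variants of a key onto a single spelling, so the data never carries two columns that differ only by case. The following is a minimal sketch in plain Python, not a Delta Lake or Spark API: the helper names `find_case_collisions` and `coalesce_case_variants`, the sample records, and the "first non-None value wins" rule are all assumptions made for illustration.

```python
from collections import defaultdict


def find_case_collisions(keys):
    """Group keys that become identical once lower-cased (hypothetical helper)."""
    groups = defaultdict(list)
    for key in keys:
        groups[key.lower()].append(key)
    # Keep only the groups that contain more than one spelling.
    return {folded: names for folded, names in groups.items() if len(names) > 1}


def coalesce_case_variants(record, canonical):
    """Return a copy of `record` with case variants renamed to one spelling.

    `canonical` maps a lower-cased key to the spelling to keep.
    Assumption: if several variants appear in one record, the first
    non-None value wins.
    """
    merged = {}
    for key, value in record.items():
        name = canonical.get(key.lower(), key)
        if merged.get(name) is None:
            merged[name] = value
    return merged


# Sample JSON-like records whose keys differ only by case (made-up data).
records = [
    {"id": 1, "MyField": "a"},
    {"id": 2, "myfield": "b"},
]

all_keys = {key for record in records for key in record}
collisions = find_case_collisions(sorted(all_keys))

# Pick the first spelling in sorted order as the canonical one (an arbitrary choice).
canonical = {folded: names[0] for folded, names in collisions.items()}
cleaned = [coalesce_case_variants(record, canonical) for record in records]
```

After this step every record uses the same spelling for the colliding key, so the cleaned records could be handed to whatever ingestion path is in use. Which spelling to keep, and how to resolve a record that carries both variants with different values, are decisions that depend on the data.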
@GregKramerTenaciousData 4 years ago
Denny, if I work through all your vids, do you award a certificate? :)