Getting Data Ready for Data Science with Delta Lake and MLflow

9,616 views

Databricks


Comments: 4
@GregKramerTenaciousData 4 years ago
Denny, if I work through all your vids, do you award a certificate? :)
@KoushikPaulliveandletlive 4 years ago
Example: I run a delete operation with a filter that takes 5 minutes to complete. Right after issuing the delete, I run an update query with the same filter. What happens when both queries finish? Without Delta Lake, I would have gotten an exception for the second query because the first had not completed. And if I had waited and run the update after the delete completed, the update would have had no effect on the table, because no rows matching that filter would be left.
@Xiphos76 4 years ago
So for schema changes that involve completely new fields, Delta handles them gracefully with .option("mergeSchema", "true"). However, how do you handle situations where the schema change is more subtle? When processing JSON into a table, I've come across situations where the key is logically the same but differs only in case: MyField vs myfield. Databricks ingestion fails even when mergeSchema is set to true, with the following error: "org.apache.spark.sql.AnalysisException: Found duplicate column(s) in the data to save: myfield"
@dennyglee 4 years ago
While Apache Spark can be used in case-sensitive or case-insensitive (default) mode, Delta Lake is case-preserving but case-insensitive when storing the schema. Parquet is case-sensitive when storing and returning column information. To avoid issues and possible data corruption, Delta Lake cannot have column names that differ only by case. For more info, please refer to databricks.com/blog/2019/09/24/diving-into-delta-lake-schema-enforcement-evolution.html. HTH!
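One possible way to work around the MyField vs myfield problem is to normalize the JSON keys before the data reaches the Delta table, so that no two columns differ only by case. The sketch below is plain Python and does not call Spark or Delta Lake at all; the helper names `find_case_collisions` and `normalize_record`, the choice of lowercase as the canonical form, and the "first non-None value wins" rule are all assumptions made for illustration, not something described in the video or the reply above.

```python
from collections import defaultdict


def find_case_collisions(column_names):
    """Return groups of names that differ only by case (hypothetical helper)."""
    groups = defaultdict(list)
    for name in column_names:
        groups[name.lower()].append(name)
    # Keep only groups that contain more than one distinct spelling.
    return {key: names for key, names in groups.items() if len(set(names)) > 1}


def normalize_record(record, canonical=str.lower):
    """Collapse keys that differ only by case into one canonical key.

    Assumption: when two spellings appear in the same record, the first
    non-None value is kept. Adjust this rule to fit the actual data.
    """
    normalized = {}
    for key, value in record.items():
        target = canonical(key)
        if normalized.get(target) is None:
            normalized[target] = value
    return normalized


# Two JSON records whose keys are logically the same but differ in case.
records = [{"MyField": 1, "other": "a"}, {"myfield": 2, "other": "b"}]

all_keys = [key for record in records for key in record]
collisions = find_case_collisions(all_keys)
cleaned = [normalize_record(record) for record in records]

print(collisions)
print(cleaned)
```

Running the collision check first makes it easy to log which keys were affected before rewriting them. The cleaned records could then be turned into a DataFrame and appended as usual.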