Evolving Data Pipelines at Scale

1,598 views


Comments: 4

@jasonkhaihoang781 2 months ago

I have a few questions. This demo and approach work backward from prod to dev. However, what if I want to start something totally new in dev and then promote it to prod, such as adding a newly ingested data source and new transformation models? Also, how do you switch the views if the ingested data is not yet available in prod? I think the Virtual Layer concept cannot apply to ingested data; it will work only with "T" (transformation) models. Finally, what if I have more than two environments, say dev/test/preprod/prod? I like the concept of letting developers work on production data directly to remove the gap between dev and prod. However, there are some concerns that I cannot get my head around, so I do not yet have the confidence to start using SQLMesh.

@1988YUVAL 8 months ago

Very interesting presentation. Looks like a very well thought out solution for managing data transformations. I wonder if it will take off like dbt.

@tratkotratkov126 8 months ago

Great, much-needed, and promising project! However, it is not quite clear what you mean when you talk about data versioning (DV): do you version the data as LakeFS does, or are you just versioning the source code that produces this data? Also, I find the diagrams in the presentation (Virtual/Physical layers) confusing and not easy to grasp at first glance. It would be nice if, in the next iteration, you used some real-world, practical entities such as customer, product, and sales to describe the demo objects instead of just “source”, and wrapped the demo in a quick story like “Meet Alex, the data engineer at TechCorp, a rapidly growing tech company. Alex is responsible for managing the company’s data pipelines, ensuring that data from various sources is clean, consistent, and available for analysis”, etc. You get the idea. Finally, I would suggest you switch the order of, and the time you spend on, the theory and the demo: show the demo of your fantastic open source project first, and how easy it is to implement the 3 concepts in a meaningful story, then after each segment just mention the theoretical part. Don’t allow the theory to consume 75% of your presentation unless you want to be considered one of the many Data Governance “gurus” who present on this channel. Wishing you all good luck with this fantastic project!

@jasonkhaihoang781 2 months ago

Agree. Putting the demo first may avoid some confusion from the beginning.