Data Warehouse Ingestion Patterns with Apache NiFi

1,977 views

Pierre Villard

1 day ago

Comments: 10
@clintonchikwata4049 3 months ago
Thanks! The third option is phenomenal.
@clintonchikwata4049 3 months ago
@Pierre when using option 3, how would you handle a scenario where you want a surrogate key on the destination table?
@LesterMartinATL 3 months ago
Good stuff!
@nasrinidhal4162 1 month ago
Thanks for sharing! Insightful content. I am a beginner and I am wondering whether NiFi is able to handle cross-team collaboration. If so, I would be glad if you could share some useful links. At the same time, I doubt it is really a good choice for heavy ETL/ELT or even CDC (even though it is possible to implement them). I see it as good only as a mediation and routing tool. Am I mistaken? Thank you for your feedback!
@pvillard31 1 month ago
Hi, NiFi is definitely able to handle cross-team collaboration. The concept of a registry client is usually what is recommended to version control flow definitions, to have multiple people working on the same flows, and to build CI/CD pipelines that test and promote flows to upper environments. NiFi should be considered more as an ELT tool than an ETL tool. Any kind of transformation is technically doable at the FlowFile level in NiFi, but if you need to do complex transformations over multiple FlowFiles (joins, aggregations, etc.), then a proper engine like Flink, for example, would likely be better (or delegate this to whatever destination system you're using: data warehouse, etc.). Finally, CDC is definitely something you can do very well with NiFi. Some vendors providing support on NiFi offer NiFi processors based on Debezium for capturing CDC events, as well as processors to push those events into systems (Iceberg, Kudu, etc.). There are some things to keep in mind when designing a flow to make sure event ordering is preserved, but there are many options to do that very well in NiFi. Hope this helps!
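To illustrate why event ordering matters for CDC, here is a minimal Python sketch. It is not NiFi code and does not use any NiFi or Debezium API; the `ChangeEvent` fields (`key`, `seq`, `op`, `row`) and the `apply_events` helper are hypothetical names chosen for the example. It assumes each event carries a per-source sequence number and simply skips any event that arrives after a newer one for the same key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    key: str                    # primary key of the source row (hypothetical field)
    seq: int                    # increasing position in the source log (hypothetical field)
    op: str                     # "upsert" or "delete"
    row: Optional[dict] = None  # new row content for upserts


def apply_events(
    events: Iterable[ChangeEvent],
    table: Optional[Dict[str, dict]] = None,
    last_seq: Optional[Dict[str, int]] = None,
) -> Tuple[Dict[str, dict], Dict[str, int]]:
    """Apply change events to an in-memory table, ignoring stale ones."""
    table = {} if table is None else table
    last_seq = {} if last_seq is None else last_seq
    for ev in events:
        # An event older than (or equal to) the last one applied for this key
        # is stale or a duplicate: applying it would overwrite newer state.
        if ev.seq <= last_seq.get(ev.key, -1):
            continue
        if ev.op == "delete":
            table.pop(ev.key, None)
        else:
            table[ev.key] = ev.row
        last_seq[ev.key] = ev.seq
    return table, last_seq
```

Without the sequence check, a late-arriving update could resurrect a row that was already deleted at the source, which is the kind of issue the ordering remark above is about.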
@nasrinidhal4162 1 month ago
@pvillard31 Hi, so buckets can be considered as separate projects in NiFi where data engineers can work together without disturbing other teams that are on other buckets using the same NiFi instance? And if a team wants to test or deploy a given version, it could be done through scripts that they need to implement and maintain? If so, this would be very interesting! I will try to have a closer look. Thank you and keep posting!
@pvillard31 1 month ago
@nasrinidhal4162 Yeah, buckets can be a separation for different teams or a way to logically group flows serving a similar purpose. You then have flows versioned in that bucket, and multiple people can work on the same flow. I have a video coming soon with some nice features of NiFi 2: branching, filing pull requests, and comparing versions before merging a pull request for a new flow version. I have a series of blog posts and videos coming that focus on CI/CD with NiFi.
@nasrinidhal4162 1 month ago
@pvillard31 Cool! That would be amazing! Thanks for sharing again and keep posting.
@franckroutier 2 months ago
Hi, and thanks for the video. I have a question though... would there be a way to handle transactions in a scenario where I'm upserting into multiple tables, and I'd like the whole process to succeed or fail as a unit? Coming from Talend, I usually have a pre-job that starts a transaction on a DB connection, all "processors" use that transaction, and in the post-job I commit or roll back, depending on whether there was an error or not.
@pvillard31 2 months ago
I guess the closest thing to what you describe is the option in the ExecuteSQL and/or ExecuteSQLRecord processors to set SQL queries in the pre-query and post-query properties. But if you mean a database transaction that would span multiple processors in the flow, then it's not possible today. I could see ways of implementing this with custom processors and controller services, but there is nothing out of the box today. That could be a valid feature request if you'd like to file a JIRA in the Apache NiFi project.
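For readers unfamiliar with the all-or-nothing behaviour being asked about, here is a minimal sketch in plain Python with the standard `sqlite3` module. It is not a NiFi feature or a NiFi workaround; the table names (`customers`, `orders`) and the `upsert_all_or_nothing` helper are hypothetical, and it assumes both tables already exist and that the bundled SQLite supports `ON CONFLICT ... DO UPDATE` (3.24 or later).

```python
import sqlite3
from typing import Iterable, Tuple


def upsert_all_or_nothing(
    conn: sqlite3.Connection,
    customers: Iterable[Tuple[int, str]],
    orders: Iterable[Tuple[int, int, float]],
) -> bool:
    """Upsert into two tables in one transaction; return True if committed."""
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when any statement raises.
        with conn:
            conn.executemany(
                "INSERT INTO customers(id, name) VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
                customers,
            )
            conn.executemany(
                "INSERT INTO orders(id, customer_id, amount) VALUES (?, ?, ?) "
                "ON CONFLICT(id) DO UPDATE SET "
                "customer_id = excluded.customer_id, amount = excluded.amount",
                orders,
            )
        return True
    except sqlite3.Error:
        # The customers upsert has been rolled back along with the orders one.
        return False
```

The point of the sketch is that both upserts share one connection and one transaction, which is exactly what does not carry across separate processors in a flow today.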