@Pierre, when using option 3, how would you handle a scenario where you want a surrogate key on the destination table?
@LesterMartinATL · 3 months ago
Good stuff!
@nasrinidhal4162 · 1 month ago
Thanks for sharing! Insightful content. I am a beginner and I am wondering whether NiFi is able to handle cross-team collaboration. If so, I would be glad if you could share some useful links. At the same time, I doubt whether it is really a good choice for heavy ETL/ELT or even CDC (even though it is possible to implement it). I see it as good only as a mediation and routing tool; am I mistaken? Thank you for your feedback!
@pvillard31 · 1 month ago
Hi, NiFi is definitely able to handle cross-team collaboration. The registry client is usually what is recommended for version-controlling flow definitions, letting multiple people work on the same flows, and building CI/CD pipelines to test and promote flows to upper environments. NiFi should be considered more as an ELT tool than an ETL tool. Any kind of transformation is technically doable at the FlowFile level in NiFi, but if you need complex transformations across multiple FlowFiles (joins, aggregations, etc.), then a proper engine such as Flink would likely be a better fit (or you can delegate this to whatever destination system you're using: data warehouse, etc.). Finally, CDC is definitely something you can do very well with NiFi. Some vendors that provide NiFi support offer processors based on Debezium for capturing CDC events, as well as processors to push those events into other systems (Iceberg, Kudu, etc.). There are some things to keep in mind when designing a flow to make sure event ordering is preserved, but NiFi offers many good options for doing that. Hope this helps!
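To make the ordering point concrete: if change events for the same row are applied out of order, the destination ends up in the wrong state. The sketch below is plain Python, not NiFi configuration, and the `ChangeEvent` shape is a hypothetical simplification (not the Debezium event format); the `seq` field stands in for whatever source-log position the CDC tool actually provides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical, simplified CDC event (NOT the Debezium envelope)."""
    seq: int                      # position of the change in the source log
    op: str                       # "upsert" or "delete"
    key: int                      # primary key of the changed row
    value: Optional[str] = None   # new row value; unused for deletes


def apply_events(events: List[ChangeEvent],
                 preserve_order: bool = True) -> Dict[int, Optional[str]]:
    """Replay change events into an in-memory key -> value table."""
    # Re-sort by source-log position unless we deliberately use arrival order
    ordered = sorted(events, key=lambda e: e.seq) if preserve_order else events
    table: Dict[int, Optional[str]] = {}
    for ev in ordered:
        if ev.op == "upsert":
            table[ev.key] = ev.value
        elif ev.op == "delete":
            table.pop(ev.key, None)   # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {ev.op!r}")
    return table


# Events as they might arrive after being processed concurrently (shuffled).
arrived = [
    ChangeEvent(2, "upsert", 1, "v2"),
    ChangeEvent(4, "delete", 2),
    ChangeEvent(1, "upsert", 1, "v1"),
    ChangeEvent(3, "upsert", 2, "x"),
]

print(apply_events(arrived, preserve_order=False))  # {1: 'v1', 2: 'x'}  (wrong)
print(apply_events(arrived))                        # {1: 'v2'}
```

Sorting a whole batch is only one way to get this guarantee; another common approach in streaming systems is to keep all events for a given key on a single ordered path. Which NiFi processors or settings achieve that in a given flow is not shown here.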
@nasrinidhal4162 · 1 month ago
@@pvillard31 Hi, so buckets can be considered separate projects in NiFi, where data engineers can work together without disturbing other teams working on other buckets in the same NiFi instance? And if a team wants to test or deploy a given version, that could be done through scripts they need to implement and maintain? If so, this would be very interesting! I will try to have a closer look. Thank you and keep posting!
@pvillard31 · 1 month ago
@@nasrinidhal4162 Yeah, buckets can be a separation between different teams, or a way of logically grouping flows that serve a similar purpose; you then have flows versioned in that bucket, and multiple people can work on the same flow. I have a video coming soon about some nice features of NiFi 2: branching, filing pull requests, and comparing versions before merging a pull request for a new flow version. I also have a series of blog posts and videos coming that focus on CI/CD with NiFi.
@nasrinidhal4162 · 1 month ago
@@pvillard31 Cool! That would be amazing! Thanks for sharing again and keep posting.
@franckroutier · 2 months ago
Hi, and thanks for the video. I have a question though... would there be a way to handle transactions in a scenario where I'm upserting into multiple tables and I'd like the whole process to either succeed or fail as a unit? Coming from Talend, I usually have a pre-job that starts a transaction on a DB connection; all the "processors" use that transaction, and in the post-job I commit or roll back, depending on whether there was an error.
@pvillard31 · 2 months ago
I guess the closest thing to what you describe is the option in the ExecuteSQL and/or ExecuteSQLRecord processors to set SQL queries in the pre-query and post-query properties. But if you mean a database transaction that spans multiple processors in the flow, then it's not possible today. I could see ways of implementing this with custom processors and controller services, but there is nothing out of the box today. That could be a valid feature request if you'd like to file a JIRA in the Apache NiFi project.
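For comparison, the Talend-style behaviour asked about here (several upserts that commit or roll back together) is what you get when all statements share one connection and one transaction. This is a plain Python/SQLite sketch, not NiFi code; the `customers`/`orders` tables and the `upsert_order` helper are hypothetical, and the `ON CONFLICT ... DO UPDATE` upsert syntax assumes SQLite 3.24 or newer.

```python
import sqlite3


def upsert_order(conn: sqlite3.Connection, customer: tuple, order: tuple) -> None:
    """Upsert a customer and an order atomically: both succeed or neither does."""
    # `with conn:` commits if the block succeeds and rolls back if it raises
    with conn:
        conn.execute(
            "INSERT INTO customers (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            customer,
        )
        conn.execute(
            "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET customer_id = excluded.customer_id, "
            "amount = excluded.amount",
            order,
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        amount REAL NOT NULL CHECK (amount >= 0)
    );
""")

upsert_order(conn, (1, "Alice"), (10, 1, 25.0))          # both rows written

try:
    upsert_order(conn, (1, "Alice B."), (11, 1, -5.0))   # CHECK fails on orders
except sqlite3.IntegrityError:
    pass  # the customer rename from this call was rolled back as well

print(conn.execute("SELECT name FROM customers WHERE id = 1").fetchone())  # ('Alice',)
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone())              # (1,)
```

The key design point is that the rollback covers every statement issued on that connection inside the `with` block; a NiFi flow where each processor obtains its own connection does not share a transaction in this way.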