Kafka Tutorial - Exactly once processing

Views: 52,978

7 years ago

Spark Programming and Azure Databricks ILT Master Class by Prashant Kumar Pandey - Fill out the Google Form for course inquiries.
forms.gle/Nxk8dQUPq4o4XsA47
-------------------------------------------------------------------
Data engineering is one of the highest-paid jobs today.
It is likely to remain among the top IT skills for the foreseeable future.
Are you in database development, data warehousing, ETL tools, data analysis, SQL, or PL/SQL development?
I have a well-crafted success path for you.
I will help you get prepared for the data engineer and solution architect role depending on your profile and experience.
We created a course that takes you deep into core data engineering technology and helps you master it.
If you are a working professional who is:
1. Aspiring to become a data engineer.
2. Looking to change your career to data engineering.
3. Looking to grow your data engineering career.
4. Preparing for the Databricks Spark certification.
5. Preparing to crack Spark data engineering interviews.
ScholarNest is offering a one-stop integrated Learning Path.
The course is open for registration.
The course delivers an example-driven approach and project-based learning.
You will practice the skills through MCQs, coding exercises, and capstone projects.
The course comes with the following integrated services.
1. Technical support and Doubt Clarification
2. Live Project Discussion
3. Resume Building
4. Interview Preparation
5. Mock Interviews
Course Duration: 6 Months
Course Prerequisite: Programming and SQL Knowledge
Target Audience: Working Professionals
Batch start: Registration Started
Fill out the form below for more details and course inquiries.
forms.gle/Nxk8dQUPq4o4XsA47
--------------------------------------------------------------------------
Learn more at www.scholarnest.com/
Best place to learn Data engineering, Bigdata, Apache Spark, Databricks, Apache Kafka, Confluent Cloud, AWS Cloud Computing, Azure Cloud, Google Cloud - Self-paced, Instructor-led, Certification courses, and practice tests.
========================================================
SPARK COURSES
-----------------------------
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/s...
www.scholarnest.com/courses/d...
KAFKA COURSES
--------------------------------
www.scholarnest.com/courses/a...
www.scholarnest.com/courses/k...
www.scholarnest.com/courses/s...
AWS CLOUD
------------------------
www.scholarnest.com/courses/a...
www.scholarnest.com/courses/a...
PYTHON
------------------
www.scholarnest.com/courses/p...
========================================
We are also available on the Udemy Platform
Check out the link below for our courses on Udemy
www.learningjournal.guru/cour...
=======================================
You can also find us on O'Reilly Learning
www.oreilly.com/library/view/...
www.oreilly.com/videos/apache...
www.oreilly.com/videos/kafka-...
www.oreilly.com/videos/spark-...
www.oreilly.com/videos/spark-...
www.oreilly.com/videos/apache...
www.oreilly.com/videos/real-t...
www.oreilly.com/videos/real-t...
=========================================
Follow us on Social Media
/ scholarnest
/ scholarnesttechnologies
/ scholarnest
/ scholarnest
github.com/ScholarNest
github.com/learningJournal/
========================================

Comments: 56

@ScholarNest 3 years ago

Want to learn more Big Data technology courses? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes. www.learningjournal.guru/courses/

@MaheshSingh-ev8yh 4 years ago

Hi Sir, I have really become a big fan of yours. The way you explain each concept is up to the mark, 5/5. The videos are short and well categorized. They are excellent. I was not expecting this when I got your link. I was looking for Kafka with C# for microservices, but your videos have given me a much clearer idea about it.

@praveenkumar-oy5zt 5 years ago

Your way of teaching is awesome.

@DineshKumar-by4sk 7 years ago

Excellent and crisp explanation.

@nawaz4321 5 years ago

Very nicely explained, big thank you.

@yog2915 4 years ago

Very nice, it cleared up a lot of things.

@VaibhavPatil-rx7pc 3 years ago

Excellently explained!! Thank you!!

@vitinho0610 4 years ago

Hey sir, thank you once again for your excellent tutorials! I have one doubt: if this consumer dies, will Kafka redistribute the TSS partitions to other consumers? If so, how will the other consumers know where the committed offset stands?

@lytung1532 1 year ago

Thanks for this tutorial. I am a fan of yours on Udemy.

@gopinathGopiRebel 7 years ago

How do we know how many partitions to assign to a particular topic? What is the default number of partitions in Kafka?

@akhilanandbenkalvenkanna5057 7 years ago

Do we use a MySQL DB in real-time projects as well? Are there any performance issues with using a relational DB?

@SauravOjha94 4 years ago

Hi Sir. Excellent explanation. Just one doubt: since this is not a case of auto-commit, don't you think we have forgotten to commit the offset to Kafka?

@gauravluthra7959 6 years ago

Great explanation. One doubt: suppose I want exactly-once processing with the same kind of consumer as in this example, where I write the data and the offset to the database in a single commit, but I want to use a group of consumers instead of only one. How will it achieve exactly-once processing then? (My doubt: if we have three consumers, C0 reading from P0, C1 from P1, and C2 from P2, and C0 goes down and never runs again, then the data from P0 will never be read. Can we solve this problem with exactly-once?)

@neil3507 6 years ago

Is this a way to achieve exactly-once semantics in Kafka?

@max9260712 3 years ago

Thank you for your detailed videos. I am new to the channel and hope to come here more often. I have a bit of difficulty understanding the problem statement here. Could you please help with the part at 5:11, where you explain that storing into the DB and adding the offset to the RebalanceListener are not atomic, and that this is a problem? If the consumer crashes, let's say just after storing into the database, then even if the RebalanceListener is triggered, it is unable to commit this particular offset (of the record just stored in the DB) to Kafka, because our .addOffset call did not occur. Is my understanding correct?
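The failure described in this comment can be reproduced with a small stdlib-only simulation. This is a hypothetical sketch, not the video's Java code: an in-memory SQLite table stands in for the external database, a plain dict stands in for offsets committed to Kafka, and the partition label and record values are made up. Because the database write and the offset commit are separate steps, a crash between them makes the restarted consumer process the same record again.

```python
import sqlite3

LOG = ["r0", "r1", "r2"]  # toy single-partition log; index = offset (hypothetical data)


def consume_non_atomic(conn, committed_offsets, crash_after_insert_of=None):
    # Resume from the offset committed "to Kafka" (a plain dict in this sketch).
    start = committed_offsets.get("p0", 0)
    for offset in range(start, len(LOG)):
        conn.execute("INSERT INTO results (value) VALUES (?)", (LOG[offset],))
        conn.commit()  # step 1: the record is now permanent in the database
        if offset == crash_after_insert_of:
            raise RuntimeError("simulated crash before the offset was recorded")
        committed_offsets["p0"] = offset + 1  # step 2: offset recorded separately


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (value TEXT)")
offsets = {}

try:
    consume_non_atomic(conn, offsets, crash_after_insert_of=1)
except RuntimeError:
    pass
consume_non_atomic(conn, offsets)  # restart after the crash

rows = [v for (v,) in conn.execute("SELECT value FROM results ORDER BY rowid")]
print(rows)  # ['r0', 'r1', 'r1', 'r2']
```

Record r1 is stored twice: its insert was committed, but its offset never was, so the restart begins at offset 1 again.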

@Prabhatkumardiwaker 5 years ago

Hi, I have one question. Why did the consumer application consume 10 records in 2 different polls, i.e. 6 records in the 1st poll and 4 records in the 2nd? It could have got all 10 records in one poll, as the messages were already available in the topic. Thanks in advance.

@reachmurugeshanm7750 3 years ago

Hi Sir, I have one doubt. In this video you explained one consumer with multiple custom partitions, but if my requirement is multiple consumers with multiple custom partitions, what would the code snippet be in that case? And if one consumer crashes while processing a message, how are the partitions taken away from consumer 1 and assigned to consumer 2? Do we need to handle any exception when a consumer crashes?

@rbsood 4 years ago

Hi Learning Journal, I have a question. I have a Kafka log retention policy based on size, so if the size reaches 1 GB, Kafka will delete the log. How can I make sure that Kafka does not delete the log if the consumer has not finished reading all messages? In other words, Kafka should delete the log only when the consumer's current offset is the same as the latest offset in the log. Does Kafka do this automatically, or is some manipulation needed?

@cellisisimo 7 years ago

Excellent video!! What if, after updating the first table with data, the consumer fails before updating the table with offsets? In this case, the same data will be processed twice, won't it?

@ScholarNest 7 years ago

No. The data in the table is not permanent until we execute commit. The commit is the last statement, after both the insert and the update.

@KajalSingh-og7fk 3 years ago

Why is setAutoCommit set to false? It should be true, right? Am I missing something?

@theashwin007 7 years ago

Hi, I have one doubt. Consider two different consumer groups, both subscribed to the same topic. How does Kafka store the offsets (committed offset and read offset)? I mean, does it store them per consumer group?

@ScholarNest 7 years ago

Kafka maintains the committed offset per consumer group (for each partition), while the current offset is tracked by each consumer. However, rebalancing happens at the consumer group level.
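A toy in-memory model of offsets keyed by consumer group, to illustrate why two groups reading the same topic do not interfere with each other. It is an illustration only: Kafka itself keeps committed offsets in its internal __consumer_offsets topic. The group names are hypothetical, and the fallback value of 0 assumes auto.offset.reset=earliest on a log that still starts at offset 0.

```python
# Toy model only: committed offsets keyed by (group, topic, partition).
committed = {}


def commit(group_id, topic, partition, next_offset):
    committed[(group_id, topic, partition)] = next_offset


def committed_offset(group_id, topic, partition):
    # No committed offset yet: a real consumer falls back to auto.offset.reset;
    # returning 0 here stands in for "earliest" on a log starting at offset 0.
    return committed.get((group_id, topic, partition), 0)


commit("billing", "TSS", 0, 42)  # hypothetical group "billing"
commit("audit", "TSS", 0, 7)     # hypothetical group "audit", same partition

print(committed_offset("billing", "TSS", 0))    # 42
print(committed_offset("audit", "TSS", 0))      # 7
print(committed_offset("reporting", "TSS", 0))  # 0
```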

@lonelybard19 7 years ago

Hi. In this example you didn't have parallel processing, because one single consumer assigned the 3 partitions to itself. How would I achieve "exactly once" processing in a scenario with multiple consumers? I could give each consumer an ID and have a table in the external database to store which partitions should be assigned to each consumer, but then I would have to perform the rebalance myself, which could be a lot of work :(

@AmitITpartner 6 years ago

The answer to your question "How would I achieve "exactly once" processing in a scenario with multiple consumers?" is to implement multiple consumers within a consumer group. The advantage of this is that each consumer fetches unique data. Hope this helps.
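For the multiple-consumer questions in this thread, one common pattern is to keep using a consumer group and restore each partition's position from the database whenever partitions are assigned. The sketch below is a hypothetical, stdlib-only stand-in for a Java ConsumerRebalanceListener (whose callbacks are onPartitionsRevoked and onPartitionsAssigned); the partition names and stored offsets are made up, and the reassignment step assumes the classic eager rebalance protocol, in which a member gives up all its partitions before receiving a new assignment.

```python
class DbOffsetRebalanceListener:
    """Hypothetical stand-in for a rebalance listener that restores positions
    from the database instead of from Kafka's committed offsets."""

    def __init__(self, db_offsets):
        # partition -> next offset to read, written in the same DB transaction
        # as the processed results
        self.db_offsets = db_offsets
        # stands in for the consumer's in-memory fetch positions
        self.positions = {}

    def on_partitions_revoked(self, partitions):
        # Nothing to flush: every result and its offset were already committed together.
        for p in partitions:
            self.positions.pop(p, None)

    def on_partitions_assigned(self, partitions):
        # In the Java client, this is where consumer.seek(partition, offset) would be called.
        for p in partitions:
            self.positions[p] = self.db_offsets.get(p, 0)


# Hypothetical offsets already stored in the database by C0, C1, and C2.
db_offsets = {"P0": 120, "P1": 80, "P2": 95}

c1 = DbOffsetRebalanceListener(db_offsets)
c1.on_partitions_assigned(["P1"])

# C0 dies; after the eager rebalance, C1 gives up P1 and then receives P0 and P1.
c1.on_partitions_revoked(["P1"])
c1.on_partitions_assigned(["P0", "P1"])
print(c1.positions)  # {'P0': 120, 'P1': 80}
```

In a real consumer the listener would be passed to subscribe(), and the group coordinator, not application code, decides that P0 moves to a surviving member once C0 leaves the group or its session times out. P0 is therefore not abandoned, and C1 continues from offset 120 without reprocessing.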

@kumarvairakkannu360 7 years ago

On poll(), the first time I get 6 records, the second time 5 records, and so on. I am curious how Kafka decides how many records to pull. The default is max.poll.records=2147483647; is it random below the max poll limit?

@ScholarNest 7 years ago

The poll method will try to give you as many records as it can within the various limits you specify. The max.poll.records setting is one of them (default 500). The timeout parameter passed to the poll method is another such limit.

@HollyJollyTolly 7 years ago

Hi sir, what is the difference between a high-level consumer and a low-level consumer?

@ScholarNest 7 years ago

That's an outdated concept. The old Kafka API used to have a high-level consumer, but the new Kafka API doesn't have such a concept. I cover the new API since the old one is no longer supported.

@singhsankar 6 years ago

Where do we commit the processed Kafka message? We only commit the MySQL (DB) connection.

@ScholarNest 6 years ago

The idea is to use a single transaction that commits both the processed message and its offset number.

@JoaoGomes-ff2pz 7 years ago

There is no rebalance listener in this example. What happens if you have more than one consumer, for example via subscribe, and one of them receives 100 records and, after processing and saving 50 of them, a rebalance is initiated? Will the offsets in Kafka be stored as the actual committed offsets, and will the next consumer assigned to that partition receive the data from the beginning?

@ScholarNest 7 years ago

Good question. When we are not using automatic group management (like in this example), there is no rebalance activity. Kafka can't rebalance because there is no group in this case.

@JoaoGomes-ff2pz 7 years ago

Oh cool! I didn't notice that you aren't using any group. Thank you!

@4ukcs2004 6 years ago

Great video. Sir, I need a reply. I have a Kafka topic that contains a jobname field. When I read the topic using a consumer, the jobs with those jobnames should get triggered and start running. It looks like event triggering, or event-driven processing. Any link or snippet would help. How do I take care of this part? Please help.

@robind999 7 years ago

Hi LJ, I struggled with the kafka-mongodb-sink connector setup, github.com/startappdev/kafka-connect-mongodb. It seemed to need curl to convert the MongoDB configuration file (a JSON file) to XML (and add a header too) ... I needed to modify the httpd.config file to open a port, and I still could not upload the file through curl on localhost, etc. In your demo, the process is fully monitored; if I use this Kafka connector, I just don't know how to monitor my process, especially the partition part. So my question to you: instead of using the kafka-mongodb-sink connector, can I use code similar to yours to sink data from Kafka to MongoDB? Please advise. Yours is the most advanced, detailed Kafka demo so far. Thanks, Robin

@ScholarNest 7 years ago

You can always write your own code to sink. However, it may be more convenient to use a connector. Unfortunately, there is no certified connector for MongoDB yet. Check this link: www.confluent.io/product/connectors/ There are 4 MongoDB sinks listed. I never tried any of them, but you can give them a try. One of them should be mature enough.

@robind999 7 years ago

Thank you so much for your quick feedback. I just found Spark code to sink data to MongoDB. Since you told me there is no certified connector for MongoDB yet, I will give this a try: rklicksolutions.wordpress.com/2017/04/04/read-data-from-kafka-stream-and-store-it-in-to-mongodb/ What do you think about this link? Confluent involves installing another tool, and I still haven't found a use case for it; I only found one that pulls data out of MongoDB into Kafka. Thank you so much, my mentor. Robin

@somethingbig8072 7 years ago

How do I send different data to different consumers from a single topic?

@ScholarNest 7 years ago

The answer to your question is in the videos. Watch the full playlist.

@glt123 7 years ago

Can a producer send messages while a rebalance is happening? Or will the Kafka producer get an exception during the rebalancing process?

@ScholarNest 7 years ago

Rebalance is an activity for the consumer group. It has nothing to do with a producer.

@glt123 7 years ago

Okay... When a new partition is added to a topic, how does the producer start sending messages to the new partition?

7 years ago

I don't think you can add a partition in "real-time". You have to specify them when you create the topic.

@madhuthakur2523 5 years ago

This will make consumption super slow.

@ScholarNest 5 years ago

This method is obsolete. Kafka Streams has better options.

@humanGenAI 2 years ago

@ScholarNest Any link?

@hugodeiro 5 years ago

Very good. But it would be nice if you provided the code somewhere like GitHub...

@ScholarNest 5 years ago

It is already there on GitHub: github.com/LearningJournal/ApacheKafkaTutorials

@sujeeshsvalath 6 years ago

"Exactly once" processing has been built in since Kafka version 0.11. The concept is the same as explained in this video. Please refer to www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ to enable "exactly once" processing in Kafka.

@ScholarNest 6 years ago

Thanks for the link.
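As the comment above notes, Kafka 0.11 added built-in exactly-once support through idempotent and transactional producers. Below is a hedged sketch of the client settings involved, written as plain Python dictionaries: the keys are standard Kafka client configuration names, while the broker address, group id, and transactional id are hypothetical placeholders.

```python
# Keys are standard Kafka client configuration names; every value is a
# hypothetical placeholder.
producer_config = {
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,
    # Should be unique per producer instance and stable across its restarts.
    "transactional.id": "orders-processor-1",
}

consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "orders-processor",
    # Offsets are committed through the producer's transaction instead.
    "enable.auto.commit": False,
    # Do not return records from aborted transactions.
    "isolation.level": "read_committed",
}
```

With the Java KafkaProducer, the application calls initTransactions() once, and then each iteration calls beginTransaction(), sends its output records, passes the consumed offsets to sendOffsetsToTransaction(), and finishes with commitTransaction(). This covers pipelines that read from and write back to Kafka; when the output goes to an external database, as in the video, storing the offset in the same database transaction remains the applicable technique.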

@reachmurugeshanm7750 3 years ago

I am a big fan of yours, Sir; your way of explaining is awesome. Could you please share your email ID for communication, so that I can clarify my doubts?

@reachmurugeshanm7750 3 years ago

I will pay for each of my doubts; I don't expect it for free.

@reachmurugeshanm7750 3 years ago

It is very urgent, sir. I am currently working on Kafka consumer challenges at my workplace.

@reachmurugeshanm7750 3 years ago

Sir, please kindly respond to me.