I have been looking for a nice end-to-end streaming data pipeline tutorial forever. Every tutorial was missing something; this one isn't. It has it all: it's clear, complete, and really well explained. Could not recommend it enough.
@oussamafahd4205 · a year ago
Thank you Davis for this beautiful mini project; it has everything I've wanted to learn as a beginner in big data technologies.
@hkrishnan91 · 2 years ago
The world is a better place because of people like you. Very nice presentation of the topic and a clear explanation of the various steps involved. Thank you very much!
@wewatchyourwebsite1 · 3 years ago
Very well done! I like the fact that you show what you installed and how, and explain every step in perfect detail. Outstanding. I'd avoided Scala because it seemed too complicated; after seeing your video, I moved my project from Python to Scala, since the API is almost identical. I loved that you show your steps for creating the build.sbt, creating the jar file, and how to submit it. I also learned how to get rid of the annoying INFO output via the log4j file. You also showed how to add the paths so I can run my project from a different folder. Some of this I already knew, but the fact that you included it all is outstanding. I'm sure I could have found all this without your video, but the fact that you did it all is just amazing. Thank you so much for doing this.
@bigdataprofessional1880 · 2 years ago
Super presentation. Practical and thorough guidance for a Kafka novice.
@jayc4849 · 4 years ago
This is great. Exactly what I needed to see, and well explained. I absolutely recommend making the change to the log4j properties file; otherwise Spark's output will make it really hard to find your data when you first test in the console.
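For anyone following along, a minimal sketch of an alternative with the same effect: lower the log level from the driver instead of editing conf/log4j.properties. The app name below is just a placeholder, not from the video.

    // Minimal sketch (app name is a placeholder): suppress Spark's INFO chatter so
    // the console sink's output stays readable. Equivalent in effect to lowering
    // log4j.rootCategory from INFO to WARN in conf/log4j.properties.
    import org.apache.spark.sql.SparkSession

    object QuietLogsExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-streaming-demo")
          .master("local[*]")
          .getOrCreate()

        // Only WARN and above reach the console from this point on.
        spark.sparkContext.setLogLevel("WARN")
      }
    }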
@marcosoliveira8731 · 2 years ago
Very good example. It gave me a lot of ideas. Thanks a lot.
@taimoorabbasi169 · 2 years ago
Just the video I needed. Brilliant work. Really appreciate it 👏👏
@gasmikaouther6887 · 2 years ago
Hi Mohammed, can we reimplement it together?
@taimoorabbasi169 · 2 years ago
I haven't worked with Scala, but I've implemented it in Python. We can do it together if you want. 💯
@guilhermesilveira2742 · 2 years ago
This is exactly what I was looking for. Thanks!!
@faaizshah5830 · 3 years ago
This is fantastic work!
@ambar752 · 4 years ago
Nicely explained, Davis... Thanks!!
@My_Tahar · 4 years ago
Thank you @Davis for sharing; the tutorial is very clear.
@sankarreddy82 · 3 years ago
Superb. Very much helpful. Thanks a lot.
@savacano2858 · 3 years ago
this is exactly what I am looking for!! Thanks!!!!!
@abdulelahaljeffery6234 · 3 years ago
quality content ... thank you Davis! subscribed
@jagadeeshporalla4041 · 4 months ago
Thanks Davis. If there are multiple writeStream queries running on different datasets derived from the same Kafka readStream, as you did… is each query going to grab the data independently?
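For what it's worth, each started query checkpoints its own Kafka offsets, so yes, every writeStream pulls from the source independently. A rough Scala sketch; the topic name, checkpoint paths, and console sinks are assumptions, not what the video uses, and the spark-sql-kafka package must be on the classpath.

    import org.apache.spark.sql.SparkSession

    object MultiQuerySketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("multi-query").master("local[*]").getOrCreate()
        spark.sparkContext.setLogLevel("WARN")

        // One streaming DataFrame built from Kafka ("events" is a hypothetical topic).
        val raw = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value")

        // Query 1: raw records to the console; keeps its own offsets in its own checkpoint.
        val q1 = raw.writeStream
          .format("console")
          .option("checkpointLocation", "/tmp/chk-console")
          .start()

        // Query 2: a derived aggregate with a separate checkpoint, so it reads Kafka independently.
        val q2 = raw.groupBy("value").count().writeStream
          .outputMode("complete")
          .format("console")
          .option("checkpointLocation", "/tmp/chk-counts")
          .start()

        spark.streams.awaitAnyTermination()
      }
    }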
@eatonleo · 3 years ago
Could you please upload a PySpark version instead of Scala? Really appreciated.
@neutrinos401 · 4 years ago
thank you so much for your great work
@ДмитрийСиницкий-р2щ · 3 years ago
My thanks! Great tutorial!
@devon299 · 4 years ago
thank you! this is helpful to get started.
@sumitkhattar9849 · 3 years ago
Great, and thank you. Keep it up.
@aomo5293 · 3 years ago
Great tutorial. Thank you, bro.
@amitmiglani5256 · 4 years ago
Hi, I am getting this error: Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.package$. Can you help?
@nadiafekih2619 · 4 years ago
Did you find a solution to this problem? I'm facing the same issue!!
@dji5028 · 4 years ago
Same here. Did you try to run the app in cluster mode?
@mouadriali3282 · 3 years ago
Same Here >_
@ryanrickerts5982 · 3 years ago
See 34:10
@srhreza · 3 years ago
Find the proper dependency for your Spark version.
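That ClassNotFoundException usually means the DataStax Spark Cassandra Connector isn't on the classpath at runtime. A hedged build.sbt sketch; the project name and version numbers are placeholders and the connector release must match your Spark and Scala versions.

    // build.sbt sketch -- versions below are examples only, not from the video.
    name := "kafka-spark-cassandra"
    version := "0.1"
    scalaVersion := "2.12.15"

    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-sql"                 % "3.1.2" % "provided",
      "org.apache.spark"   %% "spark-sql-kafka-0-10"      % "3.1.2",
      "com.datastax.spark" %% "spark-cassandra-connector" % "3.1.0"
    )

Alternatively, pass the same coordinates to spark-submit via --packages so they don't have to be bundled into your jar.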
@alisahin1466 · 4 years ago
You're amazing, man... thank you so much.
@mouhammaddiakhate3546 · 3 years ago
How would we do the spark-submit if it were with PySpark?
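It works much the same way, except you hand spark-submit the .py file instead of a jar and pull in the connectors with --packages. A sketch only; the script name and package versions below are assumptions.

    spark-submit \
      --master local[*] \
      --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.1.2,com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 \
      streaming_app.py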
@RajmohanBalachandran · 3 years ago
thanks a lot Davis
@imosolar · 3 years ago
Can I use a UI / Jupyter notebook instead of the command line to run it?
@samuelfernandoperezarpi4251 · 2 years ago
Hi, is this lambda architecture?
@nihedattia9477 · 4 years ago
Which type of machine/VM do you work on?
@TechGeniusMinds · 4 years ago
Nice explanation
@Cal97g · 3 years ago
Bro, have you heard of docker-compose for installing Kafka?
@shrikantbhadane478 · 3 years ago
Good session. Can you teach the same thing with PySpark?
@kunjalsujalshah1992 · 3 years ago
Amazing!
@srhreza · 3 years ago
Awesome.
@Raaj_ML · 3 years ago
Very helpful tutorial! What is that Trigger.ProcessingTime("5 seconds") and when did you add it?
@hkrishnan91 · 2 years ago
He didn't talk about it. It determines the batch interval. From the manual: "The trigger settings of a streaming query define the timing of streaming data processing, whether the query is going to be executed as a micro-batch query with a fixed batch interval or as a continuous processing query."
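For context, the trigger is set on the writeStream builder. A minimal, self-contained sketch using the built-in rate source (not the video's Kafka source), just to show where Trigger.ProcessingTime goes:

    // The rate source generates timestamp/value rows for testing; with this trigger,
    // Spark plans a micro-batch every five seconds instead of as fast as possible.
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    object TriggerSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("trigger-demo").master("local[*]").getOrCreate()
        spark.sparkContext.setLogLevel("WARN")

        val query = spark.readStream
          .format("rate")
          .option("rowsPerSecond", "5")
          .load()
          .writeStream
          .format("console")
          .trigger(Trigger.ProcessingTime("5 seconds"))
          .start()

        query.awaitTermination()
      }
    }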
@suryanshsingh6906 · a year ago
thanks
@enesuguroglu3542 · 3 years ago
It would have been great if all the code were implemented in Python.