This video discusses how to set up MinIO object storage on Kubernetes and integrate it with Apache Spark for running analytics jobs.
VIDEO RESOURCES
Music: Day One (YouTube Audio Library)
Twitter: / bradsheppard5
Comments: 29
@anmfaisal964 • 1 year ago
I was looking for such an in-depth description for so long. Take a bow, many many thanks. Loved it.
@citormussa • 3 years ago
First video I've found that truly goes into the details on how to actually build a Spark cluster (with the MinIO bonus) with Kubernetes, instead of showing everything already built. Thank you very much!
@recs8564 • 2 months ago
EXACTLY WHAT I HAVE BEEN LOOKING FOR
@th9679 • 2 years ago
00:14 Set up the data lake (MinIO) on Kubernetes
03:03 Explore MinIO on Kubernetes and its options
04:20 Explore the MinIO web UI (includes the kubefwd part)
08:02 Define the objective (write an analytics job on top of MinIO)
08:34 Install local Spark
13:59 Start writing the PySpark job
15:05 Install PySpark
16:00 Write the analytics job in Python
20:24 Run the PySpark job
20:56 Missing Spark dependency (expected) error
21:20 Add the missing dependencies
24:00 Rerun the PySpark job
24:53 Containerize the PySpark job
25:28 Set up the spark-operator
27:33 Write the Dockerfile for the Spark job
34:18 Build the image from the Dockerfile
34:49 Push the image to the registry
36:10 Deploy the Spark job
40:27 Check the job outcome
Thank you Brad!
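The "missing dependencies" and "add the missing dependencies" steps in the outline above come down to giving Spark the Hadoop S3A connector (typically the hadoop-aws package) and pointing it at MinIO. As a rough sketch of the extra SparkSession settings involved — the function name, endpoint, and credentials below are placeholders, not the exact values used in the video:

```python
# Sketch of the Spark configuration entries needed to read/write a MinIO
# bucket via s3a:// paths. The endpoint and credentials are placeholders;
# in the video these would match the in-cluster MinIO service and its keys.
def minio_spark_conf(endpoint, access_key, secret_key):
    """Return SparkSession config entries for a MinIO-backed data lake."""
    return {
        # Point the Hadoop S3A connector at the MinIO service, not AWS
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is addressed by host/path, not virtual-hosted bucket names
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Plain HTTP inside the cluster; flip this if MinIO serves TLS
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }

conf = minio_spark_conf("http://minio:9000", "minio-access", "minio-secret")
for key, value in conf.items():
    print(f"{key}={value}")
```

These keys would be applied via `SparkSession.builder.config(...)` or passed as `--conf` flags to spark-submit, alongside `--packages` for the hadoop-aws jar matching your Hadoop version.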
@321andyy • 8 months ago
By far the best tutorial that I have seen on this topic! Thank you!!
@DevelopersHubChannel • 1 year ago
Underrated tutorial and YouTuber!!!! Love this. So practical, an end-to-end demo at a pro level. I work on K3s daily and sometimes K8s.
@marekkucak6581 • 3 years ago
I'm planning to run Spark on a K8s cluster made of a few Raspberry Pis. This was very helpful.
@bradsheppard6650 • 3 years ago
Thanks, much appreciated! Good luck with the Raspberry Pi cluster.
@marekkucak6581 • 3 years ago
@@bradsheppard6650 Thanks
@GamerPCForever • 3 years ago
I figured out how to achieve this on a Mesos cluster, also with MinIO. It was funny to see, because this would have saved me a lot of time, haha. Great content, sir.
@jijosunny8626 • 3 years ago
Hi Brad, this is a great video. There are few videos/articles that describe in detail how to connect from Spark to MinIO. Keep it up!
@nikschuetz4112 • 4 months ago
Nice video. I am trying to publish MinIO events to Kafka and connect that to a Spark Streaming app; this video helps a lot.
@优雅世界 • 2 years ago
The course is explained in a fun way: simple, clear, and with a well-defined theme. Good.
@rahul-qo3fi • 1 year ago
Thank you so much, this was very informative!!
@anshuman9 • 3 months ago
Nice and informative video.
@ylcnky9406 • 3 years ago
This is a great video. Thanks for the great effort; it deserves a subscription. Expanding this mini data-lake platform with more components, such as Kafka streaming and Spark Streaming, would be awesome.
@tratkotratkov126 • 1 year ago
Thank you for the great explanation!
@특이점이온다-l6d • 1 year ago
This is a very helpful video, thank you!
@hi-kp7jg • 1 year ago
you are a gigachad for doing this
@johnmason9788 • 3 years ago
Thanks Brad, this was helpful. What do you think of submitting jobs via a Jupyter notebook vs. the Spark Operator?
@bradsheppard6650 • 3 years ago
Hey John, great question! Personally, I find Jupyter notebooks super helpful for messing around and trying things out, but they are less intended for production code. Once I have code working in a notebook, I'll generally containerize it after the fact and then run it similarly to how I showed in the video. The other great thing about the Spark Operator is its scheduling capabilities (via the ScheduledSparkApplication CRD). So if you have a job that you want to run nightly, for example, you can very easily get that automation from the Operator.
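For reference, the ScheduledSparkApplication CRD mentioned here wraps an ordinary SparkApplication spec in a cron schedule. A minimal sketch, assuming the spark-operator's v1beta2 API; the name, image, file path, and resource sizes below are placeholders, not values from the video:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: ScheduledSparkApplication
metadata:
  name: nightly-analytics            # hypothetical name
  namespace: default
spec:
  schedule: "0 2 * * *"              # cron syntax: every night at 02:00
  concurrencyPolicy: Forbid          # skip a run if the previous one is still going
  template:                          # same shape as a SparkApplication spec
    type: Python
    mode: cluster
    image: "my-registry/pyspark-job:latest"   # placeholder image
    mainApplicationFile: "local:///app/job.py"
    sparkVersion: "3.1.1"
    driver:
      cores: 1
      memory: "512m"
      serviceAccount: spark          # must match the operator's service account setup
    executor:
      instances: 1
      cores: 1
      memory: "512m"
```

Applying this with kubectl is enough for the operator to launch the job on the given schedule; no external scheduler is needed.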
@sahmed0211 • 2 years ago
Really good video; you gave me a lot of cool things to work with.
@fmfvieira • 3 years ago
That was great, man. Very informative!
@kimted3272 • 3 years ago
Thanks bro, saved my day.
@afshinyavari7422 • 2 years ago
Awesome video!
@rahul-qo3fi • 1 year ago
21:04 Spark S3 dependencies
25:13 Spark on K8s
@iamindigamer • 3 years ago
Is there any GitHub URL with the commands to apply?
@amarvakacharla • 2 years ago
This helped a great deal, Brad, thanks a ton. I got an exception while running the spark-pi example because of the latest updates to the repo: the spark-pi driver is forbidden with "error looking up service account default/spark: serviceaccount spark not found". Resolved this with a little change to the helm install command: helm install spark-operator spark-operator/spark-operator --set serviceAccounts.spark.name=spark-user, and then use the same serviceAccount=spark-user in spark-pi.yaml.