Intro To Databricks - What Is Databricks

  219,702 views

Seattle Data Guy


What is Databricks?
How is it different from Snowflake?
And why do people like using Databricks?
This video acts as an intro to Databricks.
We will discuss what a Databricks table (Delta table) is, what a Databricks job is, and more.
Research paper on RDDs - www.usenix.org/system/files/c...
0:00 Intro - What Is Databricks
1:02 Presentation
6:29 Hands-On Demo - What Is Databricks
If your team needs help setting up Databricks, then set up a consultation with me today!
calendly.com/ben-rogojan/cons...
Also! If you enjoyed this video, check out some of my other top videos.
Top Courses To Become A Data Engineer In 2022
• Top Courses To Become ...
What Is The Modern Data Stack - Intro To Data Infrastructure Part 1
• What Is The Modern Dat...
If you would like to learn more about data engineering, then check out Google's GCP certificate
bit.ly/3NQVn7V
If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.
seattledataguy.substack.com/
Or check out my blog
www.theseattledataguy.com/
And if you want to support the channel, then you can become a paid member of my newsletter
seattledataguy.substack.com/s...
Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio
_____________________________________________________________
Subscribe: / @seattledataguy
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.
*I do participate in affiliate programs. If a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.

Comments: 86
@SeattleDataGuy 1 year ago
If you guys want to learn more about data engineering, then sign up for my newsletter here seattledataguy.substack.com/ or join the discord here discord.gg/2yRJq7Eg3k
@anishninan8374 2 years ago
The best part about Databricks is that it unifies batch and streaming workloads. It also provides a single source for structured, semi-structured, and unstructured data.
@SeattleDataGuy 2 years ago
Yes, those are a lot of the reasons why I like Databricks, especially their streaming functionality.
@rachidt2764 2 years ago
Would be cool to see a mini project in Databricks where you could compare and highlight why you don't see it as a data engineering / BI first tool
@SeattleDataGuy 2 years ago
Yeah I will likely do a comparison between this and Foundry
@kerimsever6674 1 year ago
Transitioning into using Databricks and this is a great introduction!
@severtone263 1 year ago
You earned my sub. I am all in. Thank you for this, it was a great help!
@SeattleDataGuy 1 year ago
Glad you found this helpful!
@NroShock 2 years ago
Great video! Would love to see another one on Databricks: moving raw data from blob storage, transforming it, and storing it in Databricks tables
@SeattleDataGuy 2 years ago
That would be a great video!
@keviny9392 1 year ago
Excellent content. Exactly what I needed to get started. Thanks
@tallalmoshrif6643 1 year ago
Great content as always. Thanks for sharing.
@jackgolding4235 1 year ago
Had this on while I was working and it just clicked for me, thank you!
@SeattleDataGuy 11 months ago
Glad to hear that!
@hughesadam87 6 months ago
Super helpful - have been using Databricks in another system without ever really understanding how much of that other system was simply Databricks.
@SeattleDataGuy 4 months ago
Glad it helped!
@akshaybaura 2 years ago
Great starter! It'd be interesting to dive a little deeper into the Delta Lake file format and also compare it with the Iceberg or Hudi formats, i.e. where they are similar, where they differ, and which situations suit one over the other.
@SeattleDataGuy 1 year ago
Sounds like a great video. Let me see if I can get Ryan Blue in the video.
@dahof2789 4 months ago
Wanted to watch but can't filter that goofy background noise.... Everyone does it and it adds negative value to the listener.
@Skandawin78 1 month ago
after reading this i can't listen to his voice
@edwinfokobo5680 6 months ago
I love this and would love to learn more and understand what’s needed as a prerequisite before I can get a job
@devencareer 7 months ago
Your approach in this particular video is simple and precise, not deep, just right enough. Thanks! Devendra
@SeattleDataGuy 7 months ago
Thank you! You're too kind
@artandrock4all 1 year ago
really cool all around video, would love to see more videos on Databricks where a deeper dive would be given on each topic ;)
@SeattleDataGuy 1 year ago
I will add it to the list of future data engineering videos
@DatabricksPro 6 months ago
This channel is great, but you may also check mine. Cheers.
@TheAndrewjoynson 1 year ago
brilliant intro - well done, and surprisingly I understand a lot of this. (I'm not a dev nor a data engineer)😄
@nujanai 2 years ago
Great overview. Thanks!
@SeattleDataGuy 2 years ago
You're welcome!
@horaciosoldman4481 2 years ago
Thanks for this informative video Ben 🙌
@SeattleDataGuy 2 years ago
Glad you found it informative!
@ShashankData 2 years ago
🔥 video! Learning so much from your vids man. I’m not really getting the effective difference between Databricks and Snowflake.
@SeattleDataGuy 2 years ago
Filming that video now :)
@ShashankData 2 years ago
@SeattleDataGuy woohooo looking forward to it!
@aiautoglasscrm 1 year ago
Subscribed. When you say productionize your job, I thought I was going to see API endpoints to hit to get results, for example a regression price prediction model that gives you a number after inputting variables
@codestrap8031 2 years ago
Awesome breakdown Ben. I'm looking forward to a comparison with Foundry. My personal opinion is Databricks has the edge when it comes to their paid version of Spark and Delta. It's hard to do a direct head-to-head though because many of Foundry's overlapping capabilities are not documented. IMO Foundry has much better e2e capabilities with a built-in version control system, online IDE, CI/CD, and Data Apps/ML/AI tools. I'm a huge fan of both companies and really think these are the two players that will be left standing at the end of the Big Data OS wars.
@SeattleDataGuy 2 years ago
Glad you enjoyed it! I think I might finally be able to start filming next week for pltr
@MichaelStephenLau 1 year ago
@SeattleDataGuy Please do help us understand better, from a technical/professional perspective, how Palantir solutions stack up against others in the market (Databricks, Snowflake, AWS, Google, etc.).
@gardnmi 2 years ago
I've used both. Snowflake is toast. Serverless clusters will take away the pain of managing clusters, and they are making really fast improvements on Delta Lake, which will reduce a lot of the common pains of filters, joins, and updates in Spark.
@SeattleDataGuy 2 years ago
Oh boy! I want the competition though. The users win in that world
@Practicalinvestments 1 year ago
@SeattleDataGuy this is all a foreign language to me, but I am an investor who’s been very closely watching this company and it sounds top notch from what you say
@jaserogers997 6 months ago
This didn't age well.
@zonezero3290 1 month ago
Thank you for sharing this!
@SeattleDataGuy 4 days ago
My pleasure!
@mikenashtech 1 year ago
Great vid Ben. I like the way you give tips with commercial thinking behind them. Thanks, Mike
@surfh3r0 9 months ago
I'm new to the subject, well explained!
@SeattleDataGuy 8 months ago
glad you found it helpful!
@raphaeldwain7834 2 years ago
Very useful. Thanks.
@SeattleDataGuy 2 years ago
You're welcome!
@mashagalitskaia8642 4 months ago
a really cool introduction, thanks a lot!
@SeattleDataGuy 4 days ago
Thank you! Glad you liked it!
@BuyNLarge_ 3 months ago
🎯 Key Takeaways for quick navigation:
01:36 🚀 Databricks offers managed Spark services along with other tools like Delta Lake and MLflow, providing options for data processing and model deployment.
03:56 🏠 Databricks and Snowflake both promote the concept of data lake houses, combining data warehouse and data lake functionalities, but with different emphases on use cases.
05:37 🛠️ Key components of Databricks include workspaces, notebooks, tables, clusters, jobs, and libraries, providing an integrated environment for data processing and analysis.
09:33 📊 Databricks simplifies the transition from notebooks to production by allowing users to create jobs directly from their notebooks, enabling seamless integration and scheduling.
11:13 🌟 Databricks facilitates easier productionization of data science workflows compared to alternatives like Snowflake, with integrated features like job creation and version control.
Made with HARPA AI
@SeattleDataGuy 3 months ago
thanks for this!
@wrburggraaf 1 year ago
How would you compare this to SAS Viya? It seems like this is more for building a data lakehouse, whereas SAS is primarily for data analysis and analytics (so it might connect to Databricks to get the data). Could you also do data analysis and analytics well in Databricks?
@manticomar1146 1 year ago
Good work
@venkateshkothapalli 2 years ago
Love Databricks!
@SeattleDataGuy 2 years ago
Seems to be a decent amount of love for it!
@marklambrecht 1 year ago
We need to have you do a video about SAS Viya too!
@AliTwaij 2 years ago
Thank you
@SeattleDataGuy 2 years ago
You're welcome!
@hotpeppermovie 2 years ago
Databricks is awesome. It's so easy to work with
@SeattleDataGuy 2 years ago
It really is!
@femaledeer 1 year ago
The video didn't explain what Databricks does. When was the table shown being built?
@DavidKoleckar 3 months ago
that's high quality vid, thx :]
@SeattleDataGuy 4 days ago
Glad you liked it!
@stelluspereira 7 months ago
Enjoyed your video, thx. Do you know how to install Databricks & PySpark LOCALLY (on a laptop) and code & test locally? A video on that, WITHOUT depending on AWS/Azure, would be appreciated by the community.
@zacharythatcher7328 1 year ago
To me this just looks like a great way to encourage developers to deploy untested code. Are there any testing pipelines built in that prevent job deployment prior to passing tests?
@gautam3305 1 year ago
Testing in the data field is not at all like typical software engineering CI/CD unit testing
@nguyetdang111 3 months ago
Is Databricks different from Azure Databricks?
@ganonymous8448 2 years ago
Great product overall, but sucks that you can’t use Airflow on it.
@SeattleDataGuy 2 years ago
I could have sworn I saw a partnership between them and Astronomer
@fenderbender28 2 years ago
Their Workflows orchestrator got super powerful and is easier than Airflow imo
@VicusBass 1 year ago
Whenever it's Databricks there's a Romanian around :)
@cerberus1321 9 months ago
It's the same as Domino
@SeattleDataGuy 4 months ago
there are some similar features but it is kind of an apples and oranges comparison
@mikipatel2434 9 months ago
Great video ruined by the annoying background music. The background music was really distracting and annoying. But very good information. Thank you
@joshi1q2w3e 2 years ago
I hate Databricks, would rather use Snowflake.
@michaeld9682 2 years ago
Why?
@SeattleDataGuy 2 years ago
I would also love to know why!
@sevegarza 2 years ago
Agreed. Snowflake + Prefect = 😇
@joshi1q2w3e 2 years ago
@SeattleDataGuy a couple reasons:
1. A lot of times when I’m building a data pipeline there are a lot of SQL queries I need to write just to analyze and understand the data before I start creating certain metrics. Because of Databricks' 1000-row limit this is harder to do. If those tables were in an RDBMS or Snowflake I wouldn’t feel as hindered with regards to this very common task. I know it may seem weird to some people, but sometimes just being able to see more of your data and scroll through it helps; maybe this is a Junior Data Engineer thing idk.
2. Idk if this is easier in Snowflake, but passing a parameter from a widget into SparkSQL was a pain in the ass in Databricks; the only reason I figured it out was because a notebook written by someone else did the same thing. We use Azure Synapse notebooks and I like them much more than Databricks as well; it was easier to do some of the same things.
@gautam3305 1 year ago
@joshi1q2w3e I understand data discovery is a key aspect before modeling, but that can be achieved by using group by, limit, distinct, windowing, etc.; you don't need to print a million rows and export to Excel for that.
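As a rough illustration of the last reply's point, the sketch below profiles a small table with GROUP BY, DISTINCT, LIMIT, and a window function instead of scrolling through every row. It is only a sketch under assumptions: Python's built-in sqlite3 stands in for a Spark SQL session so the example runs anywhere, and the `orders` table, its columns, and its rows are hypothetical. The queries use standard SQL constructs that Spark SQL also supports.

```python
import sqlite3

# Hypothetical sample data; a real table would be far larger.
rows = [
    ("a", "web", 10.0),
    ("a", "web", 25.0),
    ("b", "store", 5.0),
    ("b", "web", 40.0),
    ("c", "store", 15.0),
]

# In-memory SQLite database used as a stand-in for a Spark SQL session.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, channel TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)

# GROUP BY: row counts and totals per category.
per_channel = conn.execute(
    "SELECT channel, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY channel ORDER BY channel"
).fetchall()

# DISTINCT: how many unique values a column holds.
distinct_customers = conn.execute(
    "SELECT COUNT(DISTINCT customer) FROM orders"
).fetchone()[0]

# Window function: the largest order for each customer.
top_orders = conn.execute(
    "SELECT customer, amount FROM ("
    "  SELECT customer, amount,"
    "         ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn"
    "  FROM orders"
    ") AS ranked WHERE rn = 1 ORDER BY customer"
).fetchall()

# LIMIT: peek at a small sample rather than the full table.
sample = conn.execute("SELECT * FROM orders LIMIT 3").fetchall()

print(per_channel)
print(distinct_customers)
print(top_orders)
print(sample)
```

Each query returns a handful of summary rows regardless of table size, which is why a display cap on result rows matters less for this kind of exploration.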