Thanks for the series of videos. The best that can be found on KZbin.
@loisf5079 11 months ago
Extremely useful teaching approach and content, thank you so much! I've found lessons 22 and 23 to be especially relevant at this stage, but I listened to all of the preceding videos, which filled in a lot of holes I had in my understanding. Great stuff!
@BryanCafferky 11 months ago
Great! Thanks for the feedback.
@tsri187 3 years ago
Thank you, Bryan, for the series of videos on Databricks and Spark. I like the way you elaborate on and explain the concepts, which makes them easy to understand for beginners like me trying to get into data engineering. Thanks again, and keep up the good work.
@BryanCafferky 3 years ago
YW. Thanks for watching.
@boxiangwang 2 years ago
The section on the Spark DataFrame writer is not clear.
@BryanCafferky 2 years ago
Ok. What was not clear? Can you be specific please?
@hmishra8524 3 years ago
You are the best! We were eagerly waiting for this and are looking forward to more. Thanks 😀
@anandmahadevanFromTrivandrum 6 months ago
Just wondering if I am supposed to know Pandas before embarking on this? I don't recall a prior lesson on Pandas, but Bryan, you make references to Pandas on more than one occasion!
@BryanCafferky 6 months ago
Yes. Pandas is the de facto Python library for data analysis. Most other data wrangling libraries try to follow the pandas API. Learning pandas would be an excellent investment.
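For illustration, a minimal pandas sketch (the file name and column names here are made up, just to show the basic API):

```python
import pandas as pd

# Load a CSV into a DataFrame (hypothetical file).
df = pd.read_csv("sales.csv")

# Peek at the first rows and aggregate by a column.
print(df.head())
print(df.groupby("region")["amount"].sum())
```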
@anandmahadevanFromTrivandrum 6 months ago
@@BryanCafferky Bryan, Bryan, Bryan, are you sending me down another rabbit hole now? JK - I appreciate your reply and insight and have definitely been learning from your videos!
@BryanCafferky 6 months ago
@@anandmahadevanFromTrivandrum Check out this awesome free book by the author of pandas: wesmckinney.com/book/
@rydmerlin 2 years ago
What is the equivalent of the EXISTS and WITH clauses in Spark SQL?
@BryanCafferky 2 years ago
I think it is this: docs.databricks.com/sql/language-manual/functions/exists.html
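Note the linked page covers the exists() higher-order function for arrays; Spark SQL also accepts EXISTS subqueries. A minimal sketch of both (the table and column names are made up):

```python
# 1) exists() higher-order function: test whether any array element matches.
spark.sql("SELECT exists(array(1, 2, 3), x -> x % 2 == 0) AS has_even").show()

# 2) EXISTS subquery, similar to the EXISTS clause in other SQL dialects.
#    Assumes hypothetical customers/orders tables are registered.
spark.sql("""
    SELECT c.customer_id
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""").show()
```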
@rydmerlin 2 years ago
And what about the WITH clause? That is extremely useful for simplifying complex queries.
@BryanCafferky 2 years ago
@@rydmerlin Yes. I think I cover WITH and common table expressions in the Spark SQL topics.
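For reference, Spark SQL supports WITH (common table expressions) directly. A minimal sketch, with made-up table and column names:

```python
# Hypothetical "sales" table; the CTE groups it before filtering.
top_regions = spark.sql("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 1000
    ORDER BY total DESC
""")
top_regions.show()
```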
@Raaj_ML 3 years ago
Bryan, that "Databricks is smart enough ... restate the query for visualization" part is not clear to me. Can you please explain what that means?
@BryanCafferky 3 years ago
If you click on the visual in the notebook, a grabber arrow appears in the lower right corner and you can drag it to resize the visual. For a visual coded with a Python library, there are configuration settings that can reduce the rendered size. See stackabuse.com/change-figure-size-in-matplotlib/
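On the Python side, a minimal matplotlib sketch of the figure-size setting the linked article describes (the data points are made up):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; smaller values shrink the rendered plot.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3, 4], [10, 20, 15, 30])
ax.set_title("Small figure")
plt.show()  # renders the figure inline in most notebook environments
```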
@JasonZhang-se2jo 2 years ago
Hello Bryan, thank you for your video again. Could you advise on the differences between using Spark SQL and using SQL with PySpark? To me, it seems they can both work with a Spark dataframe via a SQL clause. Is the only difference that Spark SQL is the Spark-native runtime, while PySpark interacts with Spark Core via the DataFrame API? It would be much appreciated if you could clarify this.
@BryanCafferky 2 years ago
You're welcome. Actually, when you execute SQL from PySpark you are calling the Spark SQL API, so you get the same performance as calling Spark SQL directly. Also, Spark SQL and the PySpark DataFrame API are closely intertwined, and both go through the Spark query optimizer. Either way, they will generally perform well.
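As a sketch of that point, the two styles below express the same query and both are planned by the same optimizer (this assumes a registered table or temp view named "people"):

```python
from pyspark.sql import functions as F

# SQL string, executed via the Spark SQL API.
sql_df = spark.sql("SELECT name, age FROM people WHERE age > 30")

# Equivalent query using the DataFrame API.
api_df = (spark.table("people")
               .where(F.col("age") > 30)
               .select("name", "age"))

sql_df.explain()   # compare the physical plans; they should be essentially the same
api_df.explain()
```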
@granand 3 years ago
The teacher I need. Thanks, Guru Bryan.
@eugenezhelezniak539 2 years ago
Hi Bryan - thanks for the video. I don't really understand why one would want to use PySpark SQL vs just using Spark SQL. Are there use cases where it makes more sense? It seems like it would be significantly easier to just write and run the Spark SQL code in a very intuitive and familiar way, and then convert the result to a dataframe. Am I missing something?
@BryanCafferky 2 years ago
They are the same. There actually is no separate SQL console; languages like R and Python call Spark SQL via the function sql() or spark.sql(), and the SQL query is passed to the Spark SQL API. It's a great way to get a dataframe back, and Spark SQL persistence of data allows you to share data between different types of notebook cells, like R, Python, and SQL. Make sense?
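A minimal sketch of that pattern (the dataframe contents and view name are made up): register a temp view from Python, and any SQL cell in the same notebook can query it.

```python
# In a Python cell: build a dataframe and expose it to SQL.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    ["name", "age"],
)
df.createOrReplaceTempView("people")

# Still in Python, sql() calls the same Spark SQL API and returns a dataframe.
adults = spark.sql("SELECT name FROM people WHERE age > 40")
adults.show()

# In a %sql cell in the same notebook you could then run:
#   SELECT name FROM people WHERE age > 40
```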