Thanks for the series of videos. The best that can be found on KZbin.
@loisf5079 11 months ago
Extremely useful teaching approach and content, thank you so much! I've found lessons 22 and 23 to be especially relevant at this stage, but I listened to all of the preceding videos, which filled in a lot of holes I had in my understanding. Great stuff!
@BryanCafferky 11 months ago
Great! Thanks for the feedback.
@tsri187 3 years ago
Thank you, Bryan, for the series of videos on Databricks and Spark. I like the way you elaborate on and explain the concepts, which makes them easy to understand for beginners like me trying to get into data engineering. Thanks again, and keep up the good work.
@BryanCafferky 3 years ago
YW. Thanks for watching.
@boxiangwang 2 years ago
The section on the Spark DataFrame writer is not clear.
@BryanCafferky 2 years ago
Ok. What was not clear? Can you be specific please?
@hmishra8524 3 years ago
You are the best! We were eagerly waiting for this and are looking forward to more. Thanks 😀
@anandmahadevanFromTrivandrum 6 months ago
Just wondering if I am supposed to know Pandas before embarking on this? I don't recall a prior lesson on Pandas, but Bryan, you make references to Pandas on more than one occasion!
@BryanCafferky 6 months ago
Yes. Pandas is the de facto Python library for data analysis. Most other data wrangling libraries try to follow the pandas API. Learning pandas would be an excellent investment.
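For illustration, a minimal pandas sketch (the file name and column names here are made up, just to show the basic API):

```python
import pandas as pd

# Load a CSV into a DataFrame (hypothetical file).
df = pd.read_csv("sales.csv")

# Peek at the first rows and aggregate by a column.
print(df.head())
print(df.groupby("region")["amount"].sum())
```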
@anandmahadevanFromTrivandrum 6 months ago
@@BryanCafferky Bryan, Bryan, Bryan, are you sending me down another rabbit hole now? JK - I appreciate your reply and insight and have definitely been learning from your videos!
@BryanCafferky 6 months ago
@@anandmahadevanFromTrivandrum Check out this awesome free book by the author of pandas: wesmckinney.com/book/
@rydmerlin 2 years ago
What is the equivalent of the EXISTS and WITH clauses in Spark SQL?
@BryanCafferky 2 years ago
I think it is this: docs.databricks.com/sql/language-manual/functions/exists.html
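Note the linked page covers the exists() higher-order function for arrays; Spark SQL also accepts EXISTS subqueries. A minimal sketch of both (the table and column names are made up):

```python
# 1) exists() higher-order function: test whether any array element matches.
spark.sql("SELECT exists(array(1, 2, 3), x -> x % 2 == 0) AS has_even").show()

# 2) EXISTS subquery, similar to the EXISTS clause in other SQL dialects.
#    Assumes hypothetical customers/orders tables are registered.
spark.sql("""
    SELECT c.customer_id
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""").show()
```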
@rydmerlin 2 years ago
And what about the WITH clause? That is extremely useful for simplifying complex queries.
@BryanCafferky 2 years ago
@@rydmerlin Yes. I think I cover WITH and common table expressions in the Spark SQL topics.
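For reference, Spark SQL supports WITH (common table expressions) directly. A minimal sketch, with made-up table and column names:

```python
# Hypothetical "sales" table; the CTE groups it before filtering.
top_regions = spark.sql("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    )
    SELECT region, total
    FROM region_totals
    WHERE total > 1000
    ORDER BY total DESC
""")
top_regions.show()
```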
@Raaj_ML 3 years ago
Bryan, that "Databricks is smart enough ... restate the query for visualization" part is not clear to me. Can you please explain what that means?
@BryanCafferky 3 years ago
If you click on the visual in the notebook, a grabber arrow appears in the lower right corner and you can drag it to resize the visual. For a visual coded with a Python library, there are configuration settings that can reduce the rendered size. See stackabuse.com/change-figure-size-in-matplotlib/
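On the Python side, a minimal matplotlib sketch of the figure-size setting the linked article describes (the data points are made up):

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; smaller values shrink the rendered plot.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3, 4], [10, 20, 15, 30])
ax.set_title("Small figure")
plt.show()  # renders the figure inline in most notebook environments
```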
@JasonZhang-se2jo 2 years ago
Hello Bryan, thank you for your video again. Could you advise on the differences between using Spark SQL and using SQL with PySpark? To me, it seems they can both work with a Spark dataframe via a SQL clause. Is the only difference that Spark SQL is the Spark-native runtime, while PySpark interacts with Spark Core via the DataFrame API? It would be much appreciated if you could clarify this.
@BryanCafferky 2 years ago
You're welcome. Actually, when you execute SQL from PySpark you are calling the Spark SQL API, so you get the same performance as calling Spark SQL directly. Also, Spark SQL and the PySpark DataFrame API are closely intertwined, and both go through the Spark query optimizer. Either way, they will generally perform well.
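As a sketch of that point, the two styles below express the same query and both are planned by the same optimizer (this assumes a registered table or temp view named "people"):

```python
from pyspark.sql import functions as F

# SQL string, executed via the Spark SQL API.
sql_df = spark.sql("SELECT name, age FROM people WHERE age > 30")

# Equivalent query using the DataFrame API.
api_df = (spark.table("people")
               .where(F.col("age") > 30)
               .select("name", "age"))

sql_df.explain()   # compare the physical plans; they should be essentially the same
api_df.explain()
```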
@granand 3 years ago
The teacher I need. Thanks, Guru Bryan.
@eugenezhelezniak539 2 years ago
Hi Bryan - thanks for the video. I don't really understand why one would want to use PySpark SQL vs just using Spark SQL. Are there use cases where it makes more sense? It seems like it would be significantly easier to just write and run the Spark SQL code in a very intuitive and familiar way, and then convert the result to a dataframe. Am I missing something?
@BryanCafferky 2 years ago
They are the same. There actually is no separate SQL console; languages like R and Python call Spark SQL via the function sql() or spark.sql(), and the SQL query is passed to the Spark SQL API. It's a great way to get a dataframe back, and Spark SQL persistence of data allows you to share data between different types of notebook cells, like R, Python, and SQL. Make sense?
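A minimal sketch of that pattern (the dataframe contents and view name are made up): register a temp view from Python, and any SQL cell in the same notebook can query it.

```python
# In a Python cell: build a dataframe and expose it to SQL.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45)],
    ["name", "age"],
)
df.createOrReplaceTempView("people")

# Still in Python, sql() calls the same Spark SQL API and returns a dataframe.
adults = spark.sql("SELECT name FROM people WHERE age > 40")
adults.show()

# In a %sql cell in the same notebook you could then run:
#   SELECT name FROM people WHERE age > 40
```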