Hi Anthony, You've been busy this weekend! Good for you. I took the 4th weekend off. :-) Working on the next PySpark video currently. Thanks
@sadeshsewnath6298 3 years ago
Like the explanation!
@ammarahmed5981 7 months ago
Awesome series.
@BryanCafferky 6 months ago
Thank You!
@Pasdpawn 1 year ago
You are the best, Bryan
@Raaj_ML 3 years ago
Bryan, thanks for the series. But I was expecting more explanation of parallelize, partitions, etc., which seem to be the very purpose of using Spark. Many training videos just explain the PySpark code for reading and parsing DataFrames, etc., but how do you really parallelize big data? What are partitions, and how do you partition? Can you please explain these more?
@BryanCafferky 3 years ago
When you use Spark SQL and the DataFrame/Dataset API, Spark parallelizes the work for you automatically. If you want to force partitioning, you can save data in parquet organized by partition. I think this topic needs a series of its own, and I agree it is worth covering. Here are some blogs you may find useful on this. towardsdatascience.com/3-methods-for-parallelization-in-spark-6a1a4333b473 luminousmen.com/post/spark-partitions
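To make that concrete, here is a minimal PySpark sketch of both ideas (the DataFrame, the bucket column, and the output path are just illustrative names):

df = spark.range(1000000)                # Spark distributes these rows across the cluster automatically
print(df.rdd.getNumPartitions())         # inspect how many partitions Spark chose
df = df.repartition(8)                   # force a specific partition count (this triggers a shuffle)
df = df.withColumn("bucket", df.id % 4)  # illustrative column to partition the output by
df.write.mode("overwrite").partitionBy("bucket").parquet("/tmp/demo_parquet")  # one parquet folder per bucket value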
@Raaj_ML 3 years ago
@@BryanCafferky Thanks a lot. You are doing great work. Waiting for more.
@ayeshasarwar615 2 years ago
Great job!
@annukumar7500 3 years ago
Golden content and a grand series! Quick question: what would be the difference between a simple SQL statement and PySpark's spark.sql statement? Both seem to launch Spark jobs when executed in Databricks. Would they both leverage distributed computing?
@BryanCafferky 3 years ago
Spark SQL is an API that can be called from languages like Python, R, and Scala. Databricks notebooks expose SQL directly so you can execute SQL statements without using a different language. When you execute spark.sql('select * from mytable') in a Python cell, it's just running Spark SQL. When you use PySpark methods, they use the same Spark DataFrame classes as SQL, so they really use the same code under the covers. I even suspect the PySpark methods are translated into SQL prior to executing, but I have not confirmed this. All three forms run on the cluster nodes. Make sense?
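For example, these two forms do the same distributed work (mytable and col1 are hypothetical names):

sql_df = spark.sql("SELECT col1, COUNT(*) AS n FROM mytable GROUP BY col1")    # SQL form, run from Python

from pyspark.sql import functions as F
api_df = spark.table("mytable").groupBy("col1").agg(F.count("*").alias("n"))   # equivalent DataFrame form

sql_df.explain()   # compare the physical plans; both go through the same optimizer
api_df.explain()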
@annukumar7500 3 years ago
@@BryanCafferky Makes perfect sense. This exact piece was missing from my lego blocks! Thank you!
@itsshehri 1 year ago
Hey Bryan, thank you so much for this series. I have a question: what's the difference between a Spark session and a Spark context?
@BryanCafferky 1 year ago
I took this from a blog, but there were so many pop-up ads that I'll not give the link: "Since earlier versions of Spark or PySpark, SparkContext (JavaSparkContext for Java) is an entry point to Spark programming with RDDs and to connect to a Spark cluster. Since Spark 2.0, SparkSession has been introduced and became an entry point to start programming with DataFrames and Datasets."
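A short sketch of how the two relate (in Databricks, spark and sc are already created for you, so this is just for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()  # Spark 2.0+ entry point; reuses an existing session
sc = spark.sparkContext                                     # the SparkContext still lives underneath the session

rdd = sc.parallelize([1, 2, 3])                  # old-style RDD API via the context
df = spark.createDataFrame([(1,), (2,)], ["n"])  # DataFrame API via the session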
@itsshehri 1 year ago
@@BryanCafferky Thank you so much for your reply. So we use SparkSession since Spark 2.0 and not SparkContext anymore?
@AnisIMANI-r9y 1 year ago
19:28 so funny 😂
@sudipbala9647 3 years ago
How would I upload a .txt file?
@BryanCafferky 3 years ago
See video 9 in the series. It covers that but with CSV files. Same thing. kzbin.info/www/bejne/g2mcnWeugd94fac
@sudipbala9647 3 years ago
@@BryanCafferky Yes sir, I have been watching your series. I see no option to upload .txt files; there are only CSV, JSON, and Avro file types. I am practicing in Community Edition.
@BryanCafferky 3 years ago
@@sudipbala9647 How about renaming the file to have a .csv extension? When I do the walkthrough via the Databricks Community Edition GUI, I just get a window to upload any file on my system, no filters. Have you watched video 9 again? It shows you how to do this. You go to Data, Create Table, then click on Drag File to Upload or Click to Browse. Note this uploads the file but does not create a table from it automatically.
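Once the file is uploaded, you can also read it straight from DBFS in a notebook; a quick sketch, assuming the upload landed in /FileStore/tables (the filename is hypothetical):

txt_df = spark.read.text("/FileStore/tables/myfile.txt")  # plain text: one row per line, in a column named "value"
csv_df = spark.read.option("header", True).csv("/FileStore/tables/myfile.csv")  # or parse the renamed file as CSV
txt_df.show(5)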