Building our first PySpark Application using Jupyter Notebook!


14,609 views


Comments: 25

@patrickwheeler7107 5 months ago

I have tried several Spark series and never got very far. I have gone through all of yours in a row so far and think you do a really good job. Thanks for putting this together, cheers!

@SidIndian082 1 year ago

Excellent lecture, sir. Truly adorable...

@ampcode 11 months ago

Thank you so much! Subscribe for more content 😊

@albertopedro8632 1 year ago

Wonderful! I'm not a native English speaker, but I've understood all the sessions. Top, top! You are the greatest, thanks for sharing with us. From Angola

@ampcode 11 months ago

Thank you so much! Subscribe for more content 😊

@sriponnirealestates3259 4 months ago

Very useful for a beginner, with a clear-cut explanation.

@avinash7003 1 year ago

RuntimeError: Java gateway process exited before sending its port number. How do I solve this?
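This error generally means the JVM that PySpark tries to launch exited before it could connect back to Python. Common causes include Java not being installed, JAVA_HOME pointing to the wrong directory, or a Java version that the installed Spark release does not support. PySpark's launcher uses JAVA_HOME when it is set and otherwise looks for java on the PATH. A minimal stdlib-only diagnostic sketch follows; the helper name java_diagnostics is hypothetical and not part of PySpark:

```python
import os
import shutil
import subprocess


def java_diagnostics():
    """Report where (if anywhere) Java is visible to this Python process."""
    java_home = os.environ.get("JAVA_HOME")  # None if the variable is not set
    java_on_path = shutil.which("java")      # None if no java executable is on PATH
    version = None
    if java_on_path:
        # `java -version` prints its banner to stderr, not stdout
        result = subprocess.run([java_on_path, "-version"], capture_output=True, text=True)
        banner = result.stderr.strip()
        version = banner.splitlines()[0] if banner else None
    return {"JAVA_HOME": java_home, "java_on_path": java_on_path, "version": version}


info = java_diagnostics()
for key, value in info.items():
    print(f"{key}: {value}")
```

If both JAVA_HOME and java_on_path come back as None, install a JDK supported by your Spark version and set JAVA_HOME before creating the SparkSession.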

@riomorder 1 year ago

Very useful for me. I use Databricks at my job, but I want to practice my queries on my personal laptop, and thanks to you I know how.

@ampcode 11 months ago

Thank you so much! Subscribe for more content 😊

@sachindubey4315 1 year ago

I like the way you explain the code.

@ampcode 11 months ago

Thank you so much! Subscribe for more content 😊

@ashishveer4591 1 year ago

How do I run a Spark application on a cluster?

@jankipatel118 1 year ago

I cannot download the CSV file. Can you please check why, or give us the website link so that we can download it directly from that website?

@ampcode 1 year ago

Sorry for the late response. Sure, I'll check the URL and provide you with an updated one.

@mahendranaidu8758 1 year ago

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[14], line 2
      1 data_2=data.select("industry","value").\
----> 2     filter(Col("value")>1000).\
      3     orderBy(desc("value"))

NameError: name 'Col' is not defined

@xx-pn7it 10 months ago

I'm getting the same error too. How can I fix it?

@pramodgupta5492 10 months ago

Try using data.value, since you have not created or imported 'col' anywhere in the code: data2 = data.select('industry', 'value').filter(data.value > 10000).orderBy('value'). Also, an easier way to create the DataFrame is data = spark.read.csv('operations_management.csv', inferSchema=True, header=True). Not sure why the instructor went with a different approach, which makes the code look more complex.

@varshaamuruganandam 8 months ago

@pramodgupta5492 Thanks, this worked! Also, 'desc' is not working.

@patrickwheeler7107 5 months ago

I had the same error. I used this to get it to work:
#Start SparkSession first
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
data_2 = data.select("industry", "value").\
    filter(col("value") > 10000).\
    orderBy("value", ascending = [False])

@flosrv3194 10 months ago

It's throwing errors at me from everywhere, claiming col and desc are not recognized names. How on earth did you get your app to work without any issues?

@robyp 3 months ago

If you look at the cell numbers, you can see he removed the lines where he imports the symbols :/

@nayanagrawal9878 1 year ago

My Spark is treating all the columns as strings:
root
 |-- description: string (nullable = true)
 |-- industry: string (nullable = true)
 |-- level: string (nullable = true)
 |-- size: string (nullable = true)
 |-- line_code: string (nullable = true)
 |-- value: string (nullable = true)
I have written the same code as in the video:
#Creating DataFrame
# as our dataset already had header, therefore, we provided inferSchema as True and header as true
data = spark.read.format('csv').\
    option('inferScheme', 'true').\
    option('header', 'true').\
    option('path','operations_management.csv').\
    load()
Can anyone please help?