Thanks for watching my video! Want to learn more about Data Science? Check out my 40+ video playlist on Data Science Courses and Projects here: kzbin.info/aero/PLTsu3dft3CWiow7L7WrCd27ohlra_5PGH&si=hHbg8KNsi5cQa0vB
@ZaferCan 11 months ago
I didn't come from Reddit. I was searching for videos about PySpark and YouTube recommended this video
@onurdatascience 11 months ago
Greetings, thank you for watching :)
@onurdatascience 1 year ago
Hello everyone, I created a Data Science Discord server. You can ask questions, contribute to the discussions, and get help from the text channels in this server. Additionally, I'll be sharing my new videos in the server, so you can join and never miss any of the content I publish. Thanks for reading! discord.gg/BaVm6Rt4h8
@asparshraj9016 1 year ago
Came here from Reddit, thanks for the video, good sir. Definitely subbed. ❤
@onurdatascience 1 year ago
Thanks!
@imanitrecruiterineurope4142 9 months ago
Came from Reddit. I found the course quite nice, and your voice, accent, and pace very fitting for the job. Plus, the data presented and the explanations were very nice
@onurdatascience 9 months ago
Thanks a lot! Thank you
@prof.mangabhai 1 year ago
Came from Reddit, loved the video. If possible, we would like a playlist with hands-on coverage of the best PySpark coding practices and more in-depth explanations of the topics covered. Most of the PySpark resources I found on YT or Udemy were trash, but you were great for a 1-hour video. We would like to see some more PySpark content
@onurdatascience 1 year ago
Thanks! I will work on that, thanks for watching my video!
@cbarkinozer 2 months ago
Keep them coming. Great video, thank you.
@onurdatascience 2 months ago
Thanks for watching!
@venky3639 1 year ago
Came here from Reddit, thanks for the video :) Subscribed
@onurdatascience 1 year ago
Awesome, thank you!
@uur127 1 year ago
I came from Reddit. Great work, king 💪💪
@onurdatascience 1 year ago
Thank you very much 🙏
@acidicacid3638 9 months ago
I wish you continued success, bro. Nice guide.
@onurdatascience 9 months ago
Thank you very much
@SAI_LINGESH 18 days ago
I am getting a Py4JJavaError in my local machine setup, can you help me? When I run the same Spark code in Databricks it works fine, with no Py4JJavaError
@onurdatascience 17 days ago
The Py4JJavaError on your local setup likely comes from mismatched versions of PySpark, Java, or Hadoop, or from missing configuration. Make sure your Java version (JDK 8 or 11) and JAVA_HOME are set correctly, and that your PySpark version matches the one in Databricks. Also check that the Hadoop binaries are properly configured if you're using them. If the issue persists, try running your code in a clean environment or share the full error log for more help. Thanks for watching!
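A minimal sketch (assuming PySpark is installed locally and java is on the PATH) of how to check the versions mentioned above against the Databricks cluster:

import os
import subprocess
import pyspark

print("PySpark version:", pyspark.__version__)               # should match the Spark version of the Databricks runtime
print("JAVA_HOME:", os.environ.get("JAVA_HOME", "not set"))  # must point to a JDK 8 or 11 install
subprocess.run(["java", "-version"])                         # prints the Java version actually on the PATH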
@SAI_LINGESH 17 days ago
@onurdatascience Okay, can I share the full error message with you on Discord?
@onurdatascience 17 days ago
@SAI_LINGESH sure, please send it
@SAI_LINGESH 17 days ago
@onurdatascience I have sent it, sir
@koskoskng 8 months ago
Great guide! Thanks!
@onurdatascience 8 months ago
Thanks for watching!
@deepsuchak.09 1 year ago
How do we load large datasets with PySpark when they cannot fit in RAM?
@onurdatascience 1 year ago
Hello, you can begin by setting up a Spark session. Then read the dataset into a Spark DataFrame with configurations that smartly use disk storage. From there, you can perform your operations without needing to load the entire dataset into RAM.
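For illustration, a small sketch of that workflow; the file name and column are hypothetical, and Spark evaluates the transformations lazily, partition by partition, so the full file never has to fit in RAM:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("LargeDatasetExample").getOrCreate()

# Reading is lazy: this records a plan instead of pulling the file into driver memory
df = spark.read.csv("large_dataset.csv", header=True, inferSchema=True)

# The aggregation runs across partitions; only the small result returns to the driver
summary = df.groupBy("category").agg(F.count("*").alias("rows"))
summary.show(10)

spark.stop()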
@glenn1you0 1 year ago
Could you cover the Java and Spark installation and how to scale it beyond your local machine?
@onurdatascience 1 year ago
Of course, will work on it!
@ankit43055 8 months ago
Hi Onur! One query: when I run SparkContext("local", "PySparkIntro"), the statement runs for 7+ minutes. Can you please help me with what to do in this scenario?
@onurdatascience 8 months ago
Hello, it may be due to a few things. Make sure your computer has enough memory and processing power available for Spark to run smoothly. Also, check your network connection, because Spark might be trying to access resources over the network during startup. We can look at other things if these don't fix the problem
@ankit43055 8 months ago
@@onurdatascience Still facing the same issue.
@onurdatascience 8 months ago
Maybe you can reinstall PySpark: run pip uninstall pyspark and then install a different version with pip
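If reinstalling does not help, a hedged sketch for the slow startup above is to create the context with explicit, modest local resources and see whether it comes up faster (the values are only examples):

from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setMaster("local[2]")             # use two local cores instead of auto-detection
    .setAppName("PySparkIntro")
    .set("spark.driver.memory", "2g")  # example value; keep it within the machine's free memory
)
sc = SparkContext(conf=conf)
print("Spark version:", sc.version)
sc.stop()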
@amanahmed6057 1 year ago
Brother, you are very pro at PySpark. Please suggest how to become like you
@onurdatascience 1 year ago
Thanks a lot! You will get better with experience
@Levy957 1 year ago
Amazing
@onurdatascience 1 year ago
Thank you!
@juvaju97 1 year ago
Hello, thanks for the video. I'm getting an error when I try to run collected_data = squared_rdd.collect() at the start. It says Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Do you know why this is happening? Thank you!
@onurdatascience 1 year ago
I'm going to look into this when I get home; I will write back if I can find a solution
@juvaju97 1 year ago
@@onurdatascience Thanks for the reply, I found out how to solve it. You just have to include:
import findspark
findspark.init()
@onurdatascience 1 year ago
Oh, I just got home and was about to look into it. I'm happy you found it; sorry for the late reply. Thanks for watching my video!
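For anyone hitting the same Py4JJavaError, a minimal sketch of the fix described above; it assumes Spark and a compatible JDK are installed locally, and findspark simply points Python at that installation before the context is created:

import findspark
findspark.init()  # locate the local Spark installation first

from pyspark import SparkContext

sc = SparkContext("local", "FindsparkFix")
squared_rdd = sc.parallelize([1, 2, 3, 4]).map(lambda x: x * x)
collected_data = squared_rdd.collect()  # the call that previously raised Py4JJavaError
print(collected_data)                   # [1, 4, 9, 16]
sc.stop()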
@ThePerfectsoul77 9 months ago
Hi, I am having the same error and not able to resolve it. May I please ask how you fixed it? Thanks
@Djgab04100 8 months ago
Hey man, in the intro you forgot to mention installing the Java SDK (JDK)
@onurdatascience 7 months ago
You are right, it was already installed on my computer, so I forgot about it
@ThePerfectsoul77 9 months ago
Hello, thanks for the video. I'm getting an error when I try to run collected_data = squared_rdd.collect() at the start. It says Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Do you know why this is happening? Thank you! I couldn't see how they fixed it, could you please help? Thanks
@onurdatascience 9 months ago
Hello. Can you provide the specific error message? This error usually arises from issues like misconfigured Spark settings, data compatibility problems, or incorrect use of Spark functions. To tackle it, ensure your Spark setup is correct, verify your data and transformations, and check for specific error messages in the Spark logs. I would love to help if you can share the error message
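One way to surface more detail in the Spark logs mentioned above is to raise the driver's log level; a small sketch, with an arbitrary app name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DebugLogging").getOrCreate()
spark.sparkContext.setLogLevel("INFO")  # or "DEBUG" for even more output

# re-run the failing code here and read the extra log lines for the root cause

spark.stop()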
@barmalini 7 months ago
I'm getting the same error, very confusing
@barmalini 7 months ago
@@onurdatascience collect_data = squared_rdd.collect() is the call; this is the error message I'm getting, I hope it will make more sense to you than it does to me:

Py4JJavaError                             Traceback (most recent call last)
Cell In[8], line 1
----> 1 collect_data = squared_rdd.collect()

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\pyspark\rdd.py:1833, in RDD.collect(self)
   1831 with SCCallSiteSync(self.context):
   1832     assert self.ctx._jvm is not None
-> 1833 sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
   1834 return list(_load_from_socket(sock_info, self._jrdd_deserializer))

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\pyspark\errors\exceptions\captured.py:179, in capture_sql_exception..deco(*a, **kw)
    177 def deco(*a: Any, **kw: Any) -> Any:
    178     try:
...
at java.base/java.io.DataInputStream.readFully(DataInputStream.java:210)
at java.base/java.io.DataInputStream.readInt(DataInputStream.java:385)
at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
... 32 more
@onurdatascience 7 months ago
Can you try installing the Java Development Kit (JDK)? Maybe that could be the solution
@barmalini 7 months ago
@@onurdatascience It was my first thought too. I'm on JDK 22, but still getting the Py4JJavaError
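One possible explanation worth checking, though not a confirmed fix: Spark 3.x documents support for Java 8, 11, and 17, so a newer JDK such as 22 can itself trigger a Py4JJavaError. A sketch of pointing PySpark at a supported JDK before initialization; the path below is hypothetical:

import os

# Hypothetical path to a supported JDK; adjust to the real install location
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-17"

import findspark
findspark.init()

from pyspark import SparkContext

sc = SparkContext("local", "JavaVersionCheck")
print(sc.version)
sc.stop()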
@miguelsoldevilla4792 1 year ago
Thanks bro!
@onurdatascience 1 year ago
Thanks for your comment!
@onurdatascience 6 months ago
Thanks for watching! I hope you enjoyed the video. For more data content you can subscribe to my channel, I share new videos every week. For those looking to expand their skills, check out my Data Science courses on Udemy:
Data Science Projects: www.udemy.com/course/data-science-projects-3/?referralCode=AEC736448BA104C3EC3F
Data Analysis Interview: www.udemy.com/course/data-analysis-interview/?referralCode=3270B750A08BE82F7994
Python for Data Analytics: www.udemy.com/course/python-for-data-analytics/?referralCode=0782B89299FFF7561184
Machine Learning with Python: www.udemy.com/course/python-machine-learning-course/?referralCode=09455A4817E14D6B83D8
Python Programming: www.udemy.com/course/python-basics-i/?referralCode=3C8A77721CCEA802B372
Time Series Analysis: www.udemy.com/course/python-for-time-series/?referralCode=439816EE7E65C91F1B55
Natural Language Processing (NLP): www.udemy.com/course/python-natural-language-processing/?referralCode=AFFEAA30617CA01E6819
Happy learning!