PySpark Course: Big Data Handling with Python and Apache Spark

  24,730 views

Data Science with Onur

1 day ago

Comments: 56
@onurdatascience 1 month ago
Thanks for watching my video! Want to learn more about Data Science? Check out my 40+ video playlist on Data Science Courses and Projects here: kzbin.info/aero/PLTsu3dft3CWiow7L7WrCd27ohlra_5PGH&si=hHbg8KNsi5cQa0vB
@ZaferCan 11 months ago
I didn't come from Reddit. I was searching for videos about PySpark and YouTube recommended this video.
@onurdatascience 11 months ago
Greetings, thank you for watching :)
@onurdatascience 1 year ago
Hello everyone, I created a data science Discord server. You can ask questions, contribute to the discussions, and get help in the server's text channels. I'll also be sharing my new videos there, so you can join and never miss any of the content I publish. Thanks for reading! discord.gg/BaVm6Rt4h8
@asparshraj9016 1 year ago
Came here from Reddit. Thanks for the video, good sir. Definitely subbed. ❤
@onurdatascience 1 year ago
Thanks!
@imanitrecruiterineurope4142 9 months ago
Came from Reddit. I found the course quite nice, and your voice, accent, and pace very fitting for the job. The data presented and the explanations were also very nice.
@onurdatascience 9 months ago
Thanks a lot! Thank you
@prof.mangabhai 1 year ago
Came from Reddit, loved the video. If possible, we'd like a playlist with hands-on PySpark coding best practices and more in-depth explanations of the topics covered. Most of the PySpark resources I found on YouTube or Udemy were trash, but you were great for a 1-hour video. We would like to see more PySpark content.
@onurdatascience 1 year ago
Thanks! I will work on that, thanks for watching my video!
@cbarkinozer 2 months ago
Keep them coming. Great video, thank you.
@onurdatascience 2 months ago
Thanks for watching!
@venky3639 1 year ago
Came here from Reddit. Thanks for the video :) Subscribed
@onurdatascience 1 year ago
Awesome, thank you!
@uur127 1 year ago
I came from Reddit. Great work, king 💪💪
@onurdatascience 1 year ago
Thank you very much 🙏
@acidicacid3638 9 months ago
Wishing you continued success, bro. Nice guide.
@onurdatascience 9 months ago
Thank you very much
@SAI_LINGESH 18 days ago
I am getting a Py4JJavaError in my local machine setup. Can you help me? When I run the same Spark code in Databricks, it works fine with no Py4JJavaError.
@onurdatascience 17 days ago
The Py4JJavaError on your local setup likely comes from mismatched versions of PySpark, Java, or Hadoop, or from missing configuration. Make sure your Java version (for example JDK 8 or 11) is one your Spark release supports, that JAVA_HOME is set correctly, and that your PySpark version matches the one in Databricks. Also check that the Hadoop binaries are properly configured if you're using them. If the issue persists, try running your code in a clean environment, or share the full error log for more help. Thanks for watching!
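The checks suggested in the reply above can be gathered in one place before comparing a local machine with Databricks. The following is a minimal sketch, not an official diagnostic: the function name `describe_spark_environment` is hypothetical, and it only reports what it finds (JAVA_HOME, the `java` executable on PATH, its version banner, and the installed PySpark version) without judging whether the versions are compatible.

```python
import os
import shutil
import subprocess
from importlib import metadata


def describe_spark_environment():
    """Collect the local settings that the reply suggests checking.

    Hypothetical helper for illustration; it reports values only.
    """
    info = {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "java_on_path": shutil.which("java"),
        "java_version": None,
        "pyspark_version": None,
    }

    if info["java_on_path"]:
        try:
            # `java -version` writes its banner to stderr, not stdout.
            result = subprocess.run(
                [info["java_on_path"], "-version"],
                capture_output=True,
                text=True,
                check=False,
                timeout=30,
            )
            banner = (result.stderr or result.stdout).strip().splitlines()
            info["java_version"] = banner[0] if banner else None
        except (OSError, subprocess.SubprocessError):
            pass  # java exists on PATH but could not be executed

    try:
        info["pyspark_version"] = metadata.version("pyspark")
    except metadata.PackageNotFoundError:
        pass  # PySpark is not installed in this interpreter

    return info


if __name__ == "__main__":
    for key, value in describe_spark_environment().items():
        print(f"{key}: {value}")
```

Running the same script in both environments and comparing the output line by line is usually quicker than reading the full Java stack trace first.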
@SAI_LINGESH 17 days ago
@onurdatascience Okay, can I share the full error message with you on Discord?
@onurdatascience 17 days ago
@SAI_LINGESH Sure, please send it.
@SAI_LINGESH 17 days ago
@onurdatascience I have sent it, sir.
@koskoskng 8 months ago
Great guide! Thanks!
@onurdatascience 8 months ago
Thanks for watching!
@deepsuchak.09 1 year ago
How do we load large datasets with PySpark (ones too big to fit in our RAM)?
@onurdatascience 1 year ago
Hello, you can begin by setting up a Spark session. Then read the dataset into a Spark DataFrame. Spark reads lazily and processes the data partition by partition, and it can be configured to spill to disk, so you can perform your operations without loading the entire dataset into RAM.
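The steps in the reply above might look like the sketch below. It is an illustration under assumptions, not the code from the video: the function name, the app name, the `4g` driver memory value, and the choice of CSV input and Parquet output are all placeholders. PySpark is imported inside the function so the file can be loaded on a machine where PySpark is not installed.

```python
def summarize_large_csv(csv_path, group_column, output_dir):
    """Count rows per group in a CSV that may not fit in RAM.

    Hypothetical helper: names, paths, and memory size are assumptions.
    """
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("LargeDatasetSketch")
        # Assumed value; only takes effect if no Spark JVM is running yet.
        .config("spark.driver.memory", "4g")
        .getOrCreate()
    )
    try:
        # Lazy: this records how to read the file, it does not load it.
        df = spark.read.csv(csv_path, header=True)

        # Optional, and only useful because df is used twice below:
        # cached partitions that do not fit in memory go to local disk.
        df.persist(StorageLevel.MEMORY_AND_DISK)

        # The work runs partition by partition when an action is called.
        counts = df.groupBy(group_column).count()
        counts.write.mode("overwrite").parquet(output_dir)
        total_rows = df.count()

        df.unpersist()
        return total_rows
    finally:
        spark.stop()
```

Writing the result to disk instead of calling `collect()` keeps the large data out of the Python driver process; only small, aggregated results should be brought back into local memory.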
@glenn1you0 1 year ago
Could you cover the Java and Spark installation, and how to scale beyond your local machine?
@onurdatascience 1 year ago
Of course, will work on it!
@ankit43055 8 months ago
Hi Onur! One question: when I run SparkContext("local","PySparkIntro"), the statement keeps running for 7+ minutes. Can you please tell me what to do in this scenario?
@onurdatascience 8 months ago
Hello, it may be due to a few things. Make sure your computer has enough memory and processing power available for Spark to run smoothly. Also, check your network connection, because Spark might be trying to access resources over the network during startup. We can look at other things if these don't fix the problem.
@ankit43055 8 months ago
@onurdatascience Still facing the same issue.
@onurdatascience 8 months ago
Maybe you can reinstall PySpark: run pip uninstall pyspark, then install it again with a different version.
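The thread does not identify the cause, so the following is only one more thing to try, tied to the network remark above. A commonly reported reason for a slow or hanging local startup is that the driver tries to resolve the machine's hostname; pinning the driver to the loopback address avoids that lookup. `spark.driver.host` and `spark.driver.bindAddress` are real Spark configuration keys, but the helper names are hypothetical and whether this helps in this case is an assumption.

```python
def local_spark_settings(app_name="PySparkIntro"):
    """Settings that keep a local driver on the loopback interface."""
    return {
        "spark.master": "local",
        "spark.app.name": app_name,
        # Assumption: hostname resolution is what makes startup slow.
        "spark.driver.host": "127.0.0.1",
        "spark.driver.bindAddress": "127.0.0.1",
    }


def create_local_context(app_name="PySparkIntro"):
    """Create a SparkContext from the settings above.

    PySpark is imported here so the settings can be inspected without it.
    """
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(list(local_spark_settings(app_name).items()))
    return SparkContext(conf=conf)
```

Calling `sc = create_local_context()` replaces the `SparkContext("local","PySparkIntro")` line from the question; if startup is still slow, the hostname was probably not the problem.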
@amanahmed6057 1 year ago
Brother, you are a real pro in PySpark. Please suggest how to become like you.
@onurdatascience 1 year ago
Thanks a lot! You will get better as you gain experience with it.
@Levy957 1 year ago
Amazing
@onurdatascience 1 year ago
Thank you!
@juvaju97 1 year ago
Hello, thanks for the video. I'm getting an error when I try to run collected_data = squared_rdd.collect() at the start. It says Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Do you know why this is happening? Thank you!
@onurdatascience 1 year ago
I'll look into this when I get home, and I will write back if I can find a solution.
@juvaju97 1 year ago
@onurdatascience Thanks for the reply, I found out how to solve it. You just have to include:
import findspark
findspark.init()
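For readers asking where those two lines go, here is a minimal sketch of the fix reported above. It assumes the separate `findspark` package is installed (`pip install findspark`); the function name and the use of `getOrCreate` are choices made for this illustration, while `squared_rdd` and the `PySparkIntro` app name come from the thread. The key point is that `findspark.init()` runs before anything is imported from `pyspark`.

```python
def square_numbers_with_spark(numbers):
    """Square a list of numbers on a local SparkContext.

    Hypothetical helper; requires the findspark and pyspark packages.
    """
    import findspark

    # Locates the Spark installation and makes pyspark importable.
    # This must happen before the pyspark import below.
    findspark.init()

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local").setAppName("PySparkIntro")
    sc = SparkContext.getOrCreate(conf)
    try:
        squared_rdd = sc.parallelize(numbers).map(lambda x: x * x)
        # The call that raised Py4JJavaError in the comment above.
        return squared_rdd.collect()
    finally:
        sc.stop()
```

In a notebook, the equivalent is to put the `import findspark` and `findspark.init()` lines in the first cell and restart the kernel before running the rest.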
@onurdatascience 1 year ago
Oh, I just got home and was about to look into it. I am happy that you found it, and sorry for the late reply. Thanks for watching my video!
@ThePerfectsoul77 9 months ago
Hi, I am having the same error and am not able to resolve it. May I please ask how you fixed it? Thanks.
@Djgab04100 8 months ago
Hey man, in the intro you forgot to mention the Java SDK for the installation.
@onurdatascience 7 months ago
You are right, it was already installed on my computer so I forgot about it.
@ThePerfectsoul77 9 months ago
Hello, thanks for the video. I'm getting an error when I try to run collected_data = squared_rdd.collect() at the start. It says Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. Do you know why this is happening? Thank you! I couldn't see how they fixed it, could you please help? Thanks.
@onurdatascience 9 months ago
Hello. Can you provide the specific error message? This error usually arises from issues like misconfigured Spark settings, data compatibility problems, or incorrect Spark function usage. To tackle this, ensure your Spark setup is correct, verify your data and transformations, and check for any specific error messages in the Spark logs. I would love to help if you can provide the error message.
@barmalini 7 months ago
I'm getting the same error, very confusing.
@barmalini 7 months ago
@onurdatascience collect_data = squared_rdd.collect()
This is the error message I'm getting. I hope it makes more sense to you than it does to me:

Py4JJavaError                             Traceback (most recent call last)
Cell In[8], line 1
----> 1 collect_data = squared_rdd.collect()

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\pyspark\rdd.py:1833, in RDD.collect(self)
   1831 with SCCallSiteSync(self.context):
   1832     assert self.ctx._jvm is not None
-> 1833     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
   1834 return list(_load_from_socket(sock_info, self._jrdd_deserializer))

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\pyspark\errors\exceptions\captured.py:179, in capture_sql_exception.<locals>.deco(*a, **kw)
    177 def deco(*a: Any, **kw: Any) -> Any:
    178     try:
...
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:210)
    at java.base/java.io.DataInputStream.readInt(DataInputStream.java:385)
    at org.apache.spark.api.python.PythonRunner$$anon$3.read(PythonRunner.scala:774)
    ... 32 more
@onurdatascience 7 months ago
Can you try installing the Java Development Kit? That might be the solution.
@barmalini 7 months ago
@onurdatascience It was my first thought too. I'm on JDK 22, but I'm still getting the Py4JJavaError.
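The thread ends without a confirmed fix, so what follows is a guess to try rather than a diagnosis. The Java frames in the traceback above end in PythonRunner reading from a stream, which suggests the JVM lost contact with the Python worker process rather than failing inside Java itself. One commonly reported cause on Windows is that Spark launches a different `python` than the one running the notebook. The sketch below points Spark at the current interpreter through the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables, which PySpark reads; the helper name is hypothetical. Separately, it may be worth checking which Java versions your Spark release lists as supported, since Spark 3.5, for example, documents Java 8, 11, and 17, and JDK 22 is outside that list.

```python
import os
import sys


def pin_pyspark_python(executable=sys.executable):
    """Make Spark start its Python workers with the given interpreter.

    Hypothetical helper; call it before creating a SparkContext or
    SparkSession, otherwise the setting is read too late.
    """
    os.environ["PYSPARK_PYTHON"] = executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = executable
    return executable


if __name__ == "__main__":
    print("Spark workers will use:", pin_pyspark_python())
```

After calling `pin_pyspark_python()`, restart the kernel's Spark session (stop any existing SparkContext) and rerun `squared_rdd.collect()` to see whether the error changes.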
@miguelsoldevilla4792 1 year ago
Thanks bro!
@onurdatascience 1 year ago
Thanks for your comment!
@onurdatascience 6 months ago
Thanks for watching! I hope you enjoyed the video. For more data content, you can subscribe to my channel; I share new videos every week. For those looking to expand their skills, check out my Data Science courses on Udemy:
Data Science Projects: www.udemy.com/course/data-science-projects-3/?referralCode=AEC736448BA104C3EC3F
Data Analysis Interview: www.udemy.com/course/data-analysis-interview/?referralCode=3270B750A08BE82F7994
Python for Data Analytics: www.udemy.com/course/python-for-data-analytics/?referralCode=0782B89299FFF7561184
Machine Learning with Python: www.udemy.com/course/python-machine-learning-course/?referralCode=09455A4817E14D6B83D8
Python Programming: www.udemy.com/course/python-basics-i/?referralCode=3C8A77721CCEA802B372
Time Series Analysis: www.udemy.com/course/python-for-time-series/?referralCode=439816EE7E65C91F1B55
Natural Language Processing (NLP): www.udemy.com/course/python-natural-language-processing/?referralCode=AFFEAA30617CA01E6819
Happy learning!