Hello from Spain. One question: when you use the Bronze data, the DataFrame is called bronze_df, but when you build the logic it is called delta_bronze. Is this a mistake? Thanks a lot.
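For reference, a minimal sketch of a bronze-layer read/write where the DataFrame keeps a single consistent name; the paths, table name, and file format are assumptions (a Databricks/Delta environment is also assumed), not the video's actual code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bronze-ingest").getOrCreate()

# Read the raw source files into the bronze DataFrame (one consistent name).
bronze_df = spark.read.format("json").load("/mnt/raw/orders")

# Persist the same DataFrame as the bronze Delta table.
bronze_df.write.format("delta").mode("append").saveAsTable("bronze_orders")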
@sureshkondapaturi7403 · 26 days ago
Please provide the repo for this.
@baharfathalizadeh3945 · a month ago
It was helpful, thanks
@DataEngineeringToolbox · 18 days ago
Thanks for your comment.
@baharfathalizadeh3945 · a month ago
It was so helpful
@baharfathalizadeh3945 · a month ago
Thanks
@DataEngineeringToolbox · a month ago
Welcome
@houstonfirefox · a month ago
Good content. Suggestion: Edit out the long pauses and repeated "Uhmmms". I usually have the scripts already written and on another screen ready to copy into the presentation so the learners don't have to wait while I type large sections of code.
@DataEngineeringToolbox · a month ago
Thank you so much for the feedback and kind words! 😊 I really appreciate your suggestion. You're absolutely right: editing out long pauses and "uhms" can make the video more engaging. I'll definitely work on improving that in future videos. Having the script ready to paste is a fantastic idea, too! It would help keep the flow smooth while still demonstrating the process. Thanks for sharing your tip; I'll give it a try!
@lucasrueda3089 · 2 months ago
Hi, I ran this in VS Code and got many errors related to environment variables.
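For readers hitting the same thing, a hedged sketch of the environment-variable setup that often gets local PySpark runs working in VS Code; every path below is a placeholder that depends on your installation:

import os

# Point PySpark at the local Spark and Java installations (placeholder paths).
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.5.0-bin-hadoop3"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk-17"
# Make the driver and the workers use the same Python interpreter.
os.environ["PYSPARK_PYTHON"] = r"C:\Python311\python.exe"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("local-test").getOrCreate()
print(spark.version)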
@ll_ashu_ll · 5 months ago
Please make more videos related to PySpark and Databricks.
@houstonfirefox · 5 months ago
Very good video. I would recommend ensuring all syntax errors are edited out or re-recorded so the viewer doesn't get confused. The channel name (Data Engineering Toolbox) is a bit confusing, as these function comparisons between SQL Server and PySpark fall under the realm of Data Science. A Data Engineer moves, converts and stores data from system to system, whereas a Data Scientist extracts and interprets the data provided by the Data Engineer. A small point, to be sure, but I wanted to be more accurate. On Variance: the avg_rating column returned integer values because the underlying column review_score was also an integer. To get the PySpark equivalent of a floating-point avg_rating, you could change the column type to FLOAT (not really necessary) or use CONVERT(FLOAT, VAR(review_score)) to return the true (more accurate) variance, complete with decimal places. New sub. I'm interested to see even more Data Science-equivalent functions in SQL Server that may be native (e.g., CORR()) and how to write functions that emulate some of the functionality in PySpark 🙂
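On that integer-aggregate point, a minimal PySpark sketch of casting before aggregating; the DataFrame and column values here are illustrative assumptions, not the video's code:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Tiny illustrative dataset with an integer review_score column.
reviews_df = spark.createDataFrame([(4,), (5,), (3,)], ["review_score"])

# Casting to double makes the floating-point intent explicit and mirrors the
# CONVERT(FLOAT, VAR(review_score)) suggestion on the SQL Server side.
stats_df = reviews_df.agg(
    F.avg(F.col("review_score").cast("double")).alias("avg_rating"),
    F.var_samp(F.col("review_score").cast("double")).alias("rating_variance"),
)
stats_df.show()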
@DataEngineeringToolbox · 5 months ago
Thanks for your amazing suggestions.
@yashwanthv5604 · 9 months ago
Can you provide the GitHub link for the code?
@Grover-mb · 9 months ago
Good video, friend, but I'm getting an error that I can't solve:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[10], line 14
      3 spark = sqlContext.sparkSession \
      4     #.appName("Mi_Aplicacion") \
      5     #.getOrCreate()
      6
      7 # Your Spark code here
      8 jdbcDF = spark.read.format('jdbc') \
      9     .option('url', url) \
     10     .option('query', query) \
     11     .option('user', user) \
     12     .option('password', password) \
     13     .option('driver', driver) \
---> 14     .load()
     16 spark.stop()  # Don't forget to stop the Spark session when finished

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\pyspark\sql\readwriter.py:314, in DataFrameReader.load(self, path, format, schema, **options)
    312     return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
    313 else:
--> 314     return self._df(self._jreader.load())

File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\py4j\java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
...
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:1570)
Output is truncated.
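A Py4JJavaError on .load() for a JDBC read is frequently caused by the JDBC driver jar not being on the classpath. Below is a hedged sketch of one way to supply it through spark.jars.packages; the Maven coordinates, connection details, and query are assumptions to adjust for your environment:

from pyspark.sql import SparkSession

# Placeholder connection details -- adjust to your SQL Server instance.
url = "jdbc:sqlserver://localhost:1433;databaseName=mydb;encrypt=false"
query = "SELECT TOP 10 * FROM dbo.some_table"
user = "my_user"
password = "my_password"

spark = (
    SparkSession.builder
    .appName("jdbc-read")
    # Pull the SQL Server JDBC driver from Maven (coordinates are an assumption).
    .config("spark.jars.packages", "com.microsoft.sqlserver:mssql-jdbc:12.4.2.jre11")
    .getOrCreate()
)

jdbcDF = (
    spark.read.format("jdbc")
    .option("url", url)
    .option("query", query)
    .option("user", user)
    .option("password", password)
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .load()
)
jdbcDF.show()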
@venvilhenrydsilva8354 · 10 months ago
"You are trying to pass an insecure Py4j gateway to Spark. This" " is not allowed as it is a security risk." while sc = SparkContext(conf=conf)
@nickoder4374 · 10 months ago
Rubbish.
@平凡-p1v · 11 months ago
The code is not clear to follow.
@DataEngineeringToolbox · 11 months ago
Thanks, I will try to provide better tutorials in the future.
@sravankumar1767 · 11 months ago
Superb explanation 👌 👏 👍
@DataEngineeringToolbox · 11 months ago
Thanks
@lucaslira5 · a year ago
If you use Auto Loader, this isn't necessary.
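For readers unfamiliar with it, a hedged sketch of the Databricks Auto Loader pattern the comment refers to; the paths, file format, and table name are assumptions, and a Databricks notebook (where spark is predefined) is assumed:

# Incrementally ingest new files from cloud storage into a bronze Delta table.
bronze_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/bronze/_schemas/orders")
    .load("/mnt/raw/orders")
)

(
    bronze_stream.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/bronze/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("bronze_orders")
)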
@平凡-p1v · a year ago
The video is not clear even in full-screen mode.
@DataEngineeringToolbox · a year ago
Thanks for the feedback! I apologize for the video quality issue. I'm working on improving it for future videos. Your input is valuable, and I appreciate your understanding.