Hi @Raja, you are executing the action command .count() in cell 12 itself, and in cell 14 you have executed one more action command, .collect(). Please correct me if I am wrong. Also, in cell 14 there is a stage skipped for Job 23; is that because you executed an action command in cell 12?
@rajasdataengineering7585 · 1 day ago
Hi Vinay, yes, that's right.
@vinayveerabhadra7280 · 23 hours ago
@@rajasdataengineering7585 Perfect! I’d like to express my appreciation for your tremendous effort in creating this playlist and offering it for free. Thank you very much! 🙏
@rajasdataengineering7585 · 22 hours ago
Thanks for your comment!
@sorathiyasmit8602 · 1 day ago
Your content is very good. Can you provide a PDF of the PPT?
@quiet8691 · 1 day ago
Kindly reply: where can I find all your notebooks and PPTs? Thanks.
@quiet8691 · 2 days ago
Sir, where can I find the notebooks and PPT notes?
@quiet8691 · 2 days ago
Sir, where can I get the PPT and code for these lectures? It would be very beneficial. Thanks.
@ourgourmetkitchen1774 · 3 days ago
You could use "unpivot":

data = [(2017, 2, 1, 1, 2), (2018, 3, 1, 3, 2), (2019, 3, 1, 1, 3)]
columns = ["year", "wimbledon", "fr_open", "us_open", "au_open"]
df = spark.createDataFrame(data, schema=columns)
# show() returns None, so keep the unpivoted DataFrame and call show() on it
result = df.unpivot("year", ["wimbledon", "fr_open", "us_open", "au_open"], "tournament", "wins")
result.show()
@venkatasai4293 · 4 days ago
Good video, Raja. Could you please make a video on liquid clustering, with an example illustrating the difference from normal partitioning?
@rajasdataengineering7585 · 4 days ago
Hi Venkat, good suggestion. Sure, I will create a video on liquid clustering soon.
@dipanjanpan1 · 4 days ago
Can we create multiple executor nodes on a worker node?
@rajasdataengineering7585 · 4 days ago
Yes, we can. An executor is a logical division of computing resources.
@shivachaitanyachinna9819 · 4 days ago
Thanks for providing in-depth knowledge about these topics. Amazing.
@rajasdataengineering7585 · 4 days ago
Glad you like them! My pleasure!
@kartikjaiswal8923 · 5 days ago
Insightful and precise.
@rajasdataengineering7585 · 5 days ago
Glad it is helpful! Thanks for your comment
@cantcatchme8368 · 5 days ago
How do we trigger this workflow from ADF?
@rajasdataengineering7585 · 5 days ago
From ADF you can trigger only the notebook; Databricks Workflows can be scheduled within Databricks itself. Still, if you need to trigger them from ADF, Databricks provides REST APIs, which can be called from an ADF Web activity.
@cantcatchme8368 · 5 days ago
@@rajasdataengineering7585 I need to trigger a notebook that has the program to run the workflows using the job ID and other parameters. I can trigger the base notebook explained above from ADF by passing the job ID params. Can you please confirm whether this is possible? If so, how?
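For reference, a rough sketch of the REST route mentioned above: the Databricks Jobs API exposes a run-now endpoint that an ADF Web activity (or any HTTP client) can POST to. The workspace URL, job ID and notebook parameters below are placeholders, not real values.

```python
import json

# Hypothetical workspace URL and job id -- replace with real values.
workspace_url = "https://<your-workspace>.azuredatabricks.net"
job_id = 123

# Payload for the Jobs API "run-now" endpoint; notebook_params are
# forwarded to the target notebook's widgets.
payload = {
    "job_id": job_id,
    "notebook_params": {"env": "prod", "run_date": "2024-01-01"},
}

endpoint = f"{workspace_url}/api/2.1/jobs/run-now"
body = json.dumps(payload)
# An ADF Web activity would POST `body` to `endpoint` with the header
# "Authorization: Bearer <PAT token>".
```

The same endpoint works whether the caller is ADF, a base notebook, or a plain script.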
@NikhilaMarripati · 5 days ago
Where can I find the dataset?
@mateen161 · 5 days ago
Very helpful. Thank you!
@rajasdataengineering7585 · 5 days ago
Glad it was helpful! You are welcome
@sravankumar1767 · 7 days ago
Superb explanation, Raja 👌 👏 👍
@rajasdataengineering7585 · 7 days ago
Thank you so much 🙂
@RaviY-o6r · 7 days ago
After applying the OPTIMIZE command, the behaviour is the same without enabling deletion vectors. In what use cases do we use deletion vectors?
@rajasdataengineering7585 · 7 days ago
Yes, that's right. But we usually don't run the OPTIMIZE command frequently, and it's not recommended either. So the use cases are ones where we need to manipulate the data frequently; there, deletion vectors give a big boost to performance and storage.
@venkatasai4293 · 4 days ago
@@rajasdataengineering7585 Why is it not recommended to use the OPTIMIZE command? Any overhead?
@rajasdataengineering7585 · 4 days ago
The OPTIMIZE command rearranges the data files, which is a costly operation, so it's not recommended to run it too frequently. Once we have accumulated a significant amount of data since the previous run, we can run the OPTIMIZE command.
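To make the discussion above concrete, here is a sketch of the Delta SQL involved, built as plain strings (the table name `events` is hypothetical; in a notebook each string would be passed to spark.sql):

```python
# With deletion vectors enabled, DELETE/UPDATE mark rows as removed
# instead of rewriting whole Parquet files; a later, infrequent
# OPTIMIZE run physically compacts the files.
enable_dv = (
    "ALTER TABLE events "
    "SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')"
)
compact = "OPTIMIZE events"
# In a notebook: spark.sql(enable_dv); ...frequent deletes/updates...;
# then spark.sql(compact) once enough changes have accumulated.
```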
@akashghadage5377 · 7 days ago
Thanks!
@rajasdataengineering7585 · 7 days ago
Welcome!
@siddavatamvenugopalreddy9686 · 7 days ago
Are all these tutorials related to Spark only, or do they include Databricks as well? Please confirm.
@rajasdataengineering7585 · 7 days ago
It's more on Databricks, which internally uses Apache Spark.
@zubairmushtaq7912 · 7 days ago
Please make a video on reading from Azure SQL DB and writing back to Azure SQL DB.
@rajasdataengineering7585 · 7 days ago
Sure, I will create a video on this requirement.
@SanjayKumar-rw2gj · 7 days ago
It is similar to the map or starmap function in Python.
@rajasdataengineering7585 · 7 days ago
Yes, that's right. But transform gives better performance compared to map.
@manikanta-zq1yg · 10 days ago
Thank you
@rajasdataengineering7585 · 10 days ago
You're welcome
@manikanta-zq1yg · 10 days ago
Great work!
@rajasdataengineering7585 · 10 days ago
Thank you! Cheers!
@prathapganesh7021 · 10 days ago
Awesome video, thank you so much!
@rajasdataengineering7585 · 10 days ago
Glad you liked it! Thank you
@user-wy5tl2ev2r · 10 days ago
One more good video with a neat explanation.
@rajasdataengineering7585 · 10 days ago
Thank you
@TK-ht9ky · 10 days ago
Hello Raja, I am unable to see the commands clearly... this makes it much harder to follow.
@rajasdataengineering7585 · 10 days ago
Apologies; it's sorted out in later videos.
@auhumanmedium · 11 days ago
Hi Sir, thanks for sharing your knowledge. I have a question. In a previous video you mentioned that data is stored in columnar storage. What then is the purpose of distribution? This seems contradictory: how can data be stored on both a columnar and a distributed basis? Please help me with this...
@rajasdataengineering7585 · 11 days ago
Hi, very good question. Columnar storage applies when storing the data on disk or in data lake storage. While processing, data is transferred from storage into memory within the compute nodes of the data warehousing cluster. How the data is spread across nodes while loading it into memory is what the distribution concept refers to, e.g. hash, round-robin and replicated distribution.
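The answer above can be illustrated with a hypothetical Azure Synapse dedicated SQL pool table (all names made up): a single CREATE TABLE declares both the columnar on-disk format and the distribution strategy side by side, showing the two concepts are independent.

```python
# Hypothetical Synapse dedicated SQL pool DDL: columnar storage
# (CLUSTERED COLUMNSTORE INDEX) and distribution (HASH on customer_id)
# are separate, compatible choices.
ddl = """
CREATE TABLE dbo.FactSales (
    sale_id     BIGINT,
    customer_id INT,
    amount      DECIMAL(18, 2)
)
WITH (
    DISTRIBUTION = HASH(customer_id),
    CLUSTERED COLUMNSTORE INDEX
);
"""
```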
@prajanna9696 · 11 days ago
Hi sir, I am interested in joining your training center. Or else, can you please send the Azure data engineering content videos at cost?
@rajasdataengineering7585 · 11 days ago
Hi Rajanna, thanks for contacting me. I don't have any training program; I will let you know.
@sravankumar1767 · 11 days ago
Superb explanation, Raja 👌 👏 👍
@rajasdataengineering7585 · 11 days ago
Thank you so much 🙂
@sagarvarma3919 · 11 days ago
Sir, can you please make a summary video of all the new things included from Spark 3.0 to 3.4?
@rajasdataengineering7585 · 11 days ago
Hi Sagar, sure, I will create a video on this requirement.
@ChillHeal-tx1ln · 11 days ago
Really helpful
@rajasdataengineering7585 · 11 days ago
Glad you think so! Thanks
@dataanalyst3210 · 12 days ago
Please add one live, production-ready project.
@rajasdataengineering7585 · 11 days ago
Sure, I will create a project for this requirement
@nikhilhake93 · 12 days ago
❤
@baigrais6451 · 12 days ago
Thank you for this video. Can I use ADF rather than Workflows in Databricks? We can use the Databricks activity in ADF, if I am not wrong.
@rajasdataengineering7585 · 12 days ago
Yes, ADF is a good choice for orchestration and scheduling.
@tanushreenagar3116 · 14 days ago
Perfect video, sir.
@rajasdataengineering7585 · 14 days ago
Thank you!
@ranaumershamshad · 14 days ago
I read data from a huge table in Azure SQL DB and wrote it to ADLS. It created one 900 MB file instead of multiple partitions. Is there any parameter we can change to create the partitions?
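This question goes unanswered in the thread, but one common cause is that a plain JDBC read comes back as a single partition, so the write produces a single file. Spark's JDBC source accepts partitioning options that split the read into parallel queries; the connection details and column names below are placeholders for illustration only.

```python
# Hypothetical JDBC options: partitionColumn must be numeric or
# date-like, and lowerBound/upperBound bracket its values so Spark
# can split the read into numPartitions parallel queries (and,
# downstream, numPartitions output files).
jdbc_options = {
    "url": "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>",
    "dbtable": "dbo.big_table",
    "user": "<user>",
    "password": "<password>",
    "partitionColumn": "id",
    "lowerBound": "1",
    "upperBound": "10000000",
    "numPartitions": "16",
}
# df = spark.read.format("jdbc").options(**jdbc_options).load()
# df.write.parquet("abfss://container@account.dfs.core.windows.net/out")
# Alternatively, df.repartition(16) before the write splits an
# already-loaded DataFrame into 16 output files.
```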
@deepanshuaggarwal7042 · 15 days ago
Very useful video
@rajasdataengineering7585 · 15 days ago
Glad you think so! Thank you
@dianadai4616 · 15 days ago
Do you have your code posted somewhere? It is very important for us to follow along.
@dineshpandey5008 · 16 days ago
Very good use case, but I have one doubt and want your suggestion: if there are more than 100 columns, do we still have to mention every column name in the xxhash64() method, or is there another way to do it?
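This question is not answered in the thread, but one way, assuming a DataFrame `df`: pyspark.sql.functions.xxhash64 is variadic, so the column list can be unpacked programmatically instead of typed out. The column names below are a made-up stand-in for `df.columns`.

```python
# With a real DataFrame df:
#   from pyspark.sql import functions as F
#   df = df.withColumn("row_hash", F.xxhash64(*df.columns))
# The same idea as a SQL expression string, demonstrated with a plain
# list standing in for df.columns:
columns = ["id", "name", "amount"]        # stand-in for df.columns
hash_expr = f"xxhash64({', '.join(columns)})"
# hash_expr -> "xxhash64(id, name, amount)"
```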
@deepanshuaggarwal7042 · 16 days ago
Can you please explain in the video why this many jobs and stages are created? Understanding the internal working of Spark is very necessary for optimisation purposes.
@mihirpatel2512 · 17 days ago
Nice videos! Can you share the slides?
@prabakardurairaj3297 · 17 days ago
God bless you. Keep serving the needy. Thanks a lot!
@rajasdataengineering7585 · 17 days ago
So nice of you! Thanks, keep watching
@Jayalakshmi-r9t · 18 days ago
How do I get that IP address? For me it was not visible.
@Jayalakshmi-r9t · 18 days ago
How do I get that IP address? I did not find it while logging in. Please can you say?
@rajasdataengineering7585 · 18 days ago
You can get it from the command prompt using the ipconfig command.
@adityaarbindam · 18 days ago
Excellent explanation, Raja... very insightful.
@rajasdataengineering7585 · 18 days ago
Glad you liked it! Keep watching
@adityaarbindam · 18 days ago
Is it you, Kartik? I am guessing because of the way you use Notepad++ 🙂
@rajasdataengineering7585 · 18 days ago
No, this is Raja
@prathapganesh7021 · 18 days ago
Nice explanation, thank you!
@rajasdataengineering7585 · 18 days ago
Glad you liked it! Keep watching
@sugunanindia · 19 days ago
You are always a lifesaver... thank you!
@rajasdataengineering7585 · 19 days ago
Happy to help! You are most welcome!
@vishnupriyaarora · 19 days ago
Hello Raja, what's the best way to reach you?
@user-dv6if4xd4m · 19 days ago
Thank you, anna. May Lord Shiva bless you with all happiness!
@rajasdataengineering7585 · 19 days ago
Thank you!
@VipinYadav-ii1ow · 19 days ago
Just starting to learn Spark and Databricks. Is this resource enough to crack an entry-level data engineering job?
@rajasdataengineering7585 · 19 days ago
Yes, definitely; these videos are more than enough to crack an entry-level job.
@tobitikare4152 · 20 days ago
Hi Raja! Thanks for this video! You are very good at explaining the way the Delta table works. One comment, though: you didn't mention in this video, or make it clear, that a new data file and a new log file are created when you delete a record.