Comments
@manikanta-zq1yg
@manikanta-zq1yg 9 hours ago
Thank you for all your efforts😊😊
@rajasdataengineering7585
@rajasdataengineering7585 9 hours ago
Glad you like it! Thanks for your comment
@vinayveerabhadra7280
@vinayveerabhadra7280 A day ago
Hi @Raja, you are executing the action command .count() in cell 12 itself, and in cell 14 you have executed one more action command, .collect(). Please correct me if I am wrong. Also, in cell 14 there is a stage skipped for Job 23; is that because you executed an action command in cell 12?
@rajasdataengineering7585
@rajasdataengineering7585 A day ago
Hi Vinay, yes, that's right.
@vinayveerabhadra7280
@vinayveerabhadra7280 23 hours ago
@@rajasdataengineering7585 Perfect! I’d like to express my appreciation for your tremendous effort in creating this playlist and offering it for free. Thank you very much! 🙏
@rajasdataengineering7585
@rajasdataengineering7585 22 hours ago
Thanks for your comment!
@sorathiyasmit8602
@sorathiyasmit8602 A day ago
Your content is very good. Can you provide a PDF of the PPT?
@quiet8691
@quiet8691 A day ago
Kindly reply: where can I find all your notebooks and PPTs? Thanks
@quiet8691
@quiet8691 2 days ago
Sir, where can I find the notebooks and PPT notes?
@quiet8691
@quiet8691 2 days ago
Sir, where can I get the PPT and code for these lectures? It would be very beneficial. Thanks
@ourgourmetkitchen1774
@ourgourmetkitchen1774 3 days ago
You could use "unpivot" (DataFrame.unpivot, available from PySpark 3.4):

data = [(2017, 2, 1, 1, 2), (2018, 3, 1, 3, 2), (2019, 3, 1, 1, 3)]
columns = ["year", "wimbledon", "fr_open", "us_open", "au_open"]
df = spark.createDataFrame(data, schema=columns)
# .show() returns None, so there is no point assigning its result
df.unpivot("year", ["wimbledon", "fr_open", "us_open", "au_open"], "tournament", "wins").show()
@venkatasai4293
@venkatasai4293 4 days ago
Good video, Raja. Could you please make a video on liquid clustering, with an example illustrating the difference from normal partitioning?
@rajasdataengineering7585
@rajasdataengineering7585 4 days ago
Hi Venkat, good suggestion. Sure, I will create a video on liquid clustering soon.
@dipanjanpan1
@dipanjanpan1 4 days ago
Can we create multiple executor nodes on a worker node?
@rajasdataengineering7585
@rajasdataengineering7585 4 days ago
Yes, we can. An executor is a logical division of computing resources.
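For illustration, a minimal sketch for a generic Spark standalone/YARN cluster (the sizing and app name are hypothetical; on Databricks specifically, each worker node typically hosts a single executor): if each worker exposes 16 cores and 64 GB, requesting 4-core / 14 GB executors lets the scheduler place up to four executors on one worker node.

from pyspark.sql import SparkSession

# Hypothetical sizing: each worker offers 16 cores / 64 GB.
# With 4-core, 14 GB executors, up to 4 executors fit on one worker.
spark = (SparkSession.builder
         .appName("multi-executor-demo")
         .config("spark.executor.cores", "4")
         .config("spark.executor.memory", "14g")
         .getOrCreate())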
@shivachaitanyachinna9819
@shivachaitanyachinna9819 4 days ago
Thanks for providing in-depth knowledge about these topics. Amazing.
@rajasdataengineering7585
@rajasdataengineering7585 4 days ago
Glad you like them! My pleasure!
@kartikjaiswal8923
@kartikjaiswal8923 5 days ago
Insightful and precise.
@rajasdataengineering7585
@rajasdataengineering7585 5 days ago
Glad it is helpful! Thanks for your comment
@cantcatchme8368
@cantcatchme8368 5 days ago
How do I trigger this workflow from ADF?
@rajasdataengineering7585
@rajasdataengineering7585 5 days ago
You can trigger only the notebook from ADF; Databricks workflows can be scheduled within Databricks itself. Still, if you need to trigger a workflow from ADF, Databricks provides REST APIs that can be called using an ADF Web activity.
@cantcatchme8368
@cantcatchme8368 5 days ago
@@rajasdataengineering7585 I need to trigger a notebook which has the program to run the workflows using the job ID and other parameters. I can trigger the base notebook explained above from ADF by passing the job ID as a parameter. Can you please confirm whether this is possible, and if so, how?
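A hedged sketch of one way this could work (the workspace URL, secret scope and widget name are hypothetical): ADF triggers the base notebook with job_id as a base parameter, and the notebook starts the workflow through the Jobs run-now endpoint (Jobs API 2.1).

import requests

# dbutils is predefined inside Databricks notebooks.
job_id = dbutils.widgets.get("job_id")                      # parameter passed from ADF
host = "https://<databricks-instance>"                      # hypothetical workspace URL
token = dbutils.secrets.get("my-scope", "databricks-pat")   # hypothetical secret scope/key

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": int(job_id)},
)
resp.raise_for_status()
print(resp.json())  # contains the run_id of the triggered workflow run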
@NikhilaMarripati
@NikhilaMarripati 5 days ago
Where can I find the dataset?
@mateen161
@mateen161 5 days ago
Very helpful. Thank you!
@rajasdataengineering7585
@rajasdataengineering7585 5 days ago
Glad it was helpful! You are welcome
@sravankumar1767
@sravankumar1767 7 days ago
Superb explanation, Raja 👌 👏 👍
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
Thank you so much 🙂
@RaviY-o6r
@RaviY-o6r 7 days ago
After applying the OPTIMIZE command, the behaviour is the same without deletion vectors enabled. For what use cases do we use deletion vectors?
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
Yes, that's right. But we usually don't run the OPTIMIZE command frequently, and it isn't recommended either. So the use cases are those where we need to manipulate the data frequently; there, deletion vectors give a big boost to performance and storage.
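As a minimal sketch (the table name "sales" is hypothetical), deletion vectors are enabled per table through a Delta table property; DELETE, UPDATE and MERGE then mark rows in small deletion-vector files instead of rewriting whole data files.

spark.sql("ALTER TABLE sales SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)")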
@venkatasai4293
@venkatasai4293 4 days ago
@@rajasdataengineering7585 Why is it not recommended to run the OPTIMIZE command frequently? Any overhead?
@rajasdataengineering7585
@rajasdataengineering7585 4 days ago
The OPTIMIZE command rearranges the data files, which is a costly operation, so it's not recommended to run it too frequently. Once we have accumulated a significant amount of data since the previous run, we can run the OPTIMIZE command again.
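A minimal sketch of an occasional compaction run (table and column names are hypothetical):

spark.sql("OPTIMIZE sales")                          # compact accumulated small files
spark.sql("OPTIMIZE sales ZORDER BY (customer_id)")  # optionally co-locate related rows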
@akashghadage5377
@akashghadage5377 7 days ago
Thanks!
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
Welcome!
@siddavatamvenugopalreddy9686
@siddavatamvenugopalreddy9686 7 days ago
Are all these tutorials related to Spark only, or do they include Databricks as well? Please confirm.
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
It's more on Databricks, which internally uses Apache Spark.
@zubairmushtaq7912
@zubairmushtaq7912 7 days ago
Please make a video on reading from Azure SQL DB and writing back to Azure SQL DB.
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
Sure, I will create a video on this requirement.
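In the meantime, a hedged sketch of the round trip with Spark's built-in JDBC source (server, database, credentials and table names are placeholders):

jdbc_url = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"
props = {
    "user": "<user>",
    "password": "<password>",
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Read from Azure SQL DB, transform, and write the result back.
df = spark.read.jdbc(url=jdbc_url, table="dbo.orders", properties=props)
df_clean = df.dropDuplicates()
df_clean.write.jdbc(url=jdbc_url, table="dbo.orders_clean",
                    mode="overwrite", properties=props)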
@SanjayKumar-rw2gj
@SanjayKumar-rw2gj 7 days ago
It is quite similar to Python's map or starmap function.
@rajasdataengineering7585
@rajasdataengineering7585 7 days ago
Yes, that's right. But transform gives better performance compared to map.
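For example, a minimal sketch of the higher-order transform function (PySpark 3.1+): the lambda is compiled into a Catalyst expression and runs inside the JVM, avoiding the Python serialization overhead of an RDD map or a Python UDF.

from pyspark.sql.functions import transform, col

df = spark.createDataFrame([(1, [1, 2, 3])], ["id", "nums"])
# Double every element of the array column without leaving the JVM.
df.select("id", transform(col("nums"), lambda x: x * 2).alias("doubled")).show()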
@manikanta-zq1yg
@manikanta-zq1yg 10 days ago
Thank you
@rajasdataengineering7585
@rajasdataengineering7585 10 days ago
You're welcome
@manikanta-zq1yg
@manikanta-zq1yg 10 days ago
Great work
@rajasdataengineering7585
@rajasdataengineering7585 10 days ago
Thank you! Cheers!
@prathapganesh7021
@prathapganesh7021 10 days ago
Awesome video, thank you so much
@rajasdataengineering7585
@rajasdataengineering7585 10 days ago
Glad you liked it! Thank you
@user-wy5tl2ev2r
@user-wy5tl2ev2r 10 days ago
Another good video with a neat explanation.
@rajasdataengineering7585
@rajasdataengineering7585 10 days ago
Thank you
@TK-ht9ky
@TK-ht9ky 10 days ago
Hello Raja, I am unable to see the commands clearly... it takes so much effort to understand.
@rajasdataengineering7585
@rajasdataengineering7585 10 days ago
Apologies, this is sorted out in later videos.
@auhumanmedium
@auhumanmedium 11 days ago
Hi Sir, thanks for sharing your knowledge. I have a question. In a previous video you mentioned that data is stored in columnar storage. What is the purpose of distribution? This seems contradictory: how can data be stored on both a columnar and a distributed basis? Please help me with this...
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Hi, very good question. Columnar storage applies when storing the data on disk or in data lake storage. While processing the data, it is transferred from storage into the memory of the compute nodes of the data warehousing cluster. How the data is spread across nodes when loading it into memory is what the distribution concept refers to, such as hash, round-robin and replicated distribution.
@prajanna9696
@prajanna9696 11 days ago
Hi sir, I am interested in joining your training center. Otherwise, could you please send the Azure data engineering content videos at cost?
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Hi Rajanna, thanks for contacting me. I don't have any training program; I will let you know.
@sravankumar1767
@sravankumar1767 11 days ago
Superb explanation, Raja 👌 👏 👍
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Thank you so much 🙂
@sagarvarma3919
@sagarvarma3919 11 days ago
Sir, can you please come up with a summary video of all the new features added from Spark 3.0 to 3.4?
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Hi Sagar, sure I will create a video on this requirement.
@ChillHeal-tx1ln
@ChillHeal-tx1ln 11 days ago
Really helpful
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Glad you think so! Thanks
@dataanalyst3210
@dataanalyst3210 12 days ago
Please add one production-ready, live project.
@rajasdataengineering7585
@rajasdataengineering7585 11 days ago
Sure, I will create a project for this requirement
@baigrais6451
@baigrais6451 12 days ago
Thank you for this video. Can I use ADF rather than workflows in Databricks? We can use the Databricks activity in ADF, if I am not wrong.
@rajasdataengineering7585
@rajasdataengineering7585 12 days ago
Yes, ADF is a good choice for orchestration and scheduling.
@tanushreenagar3116
@tanushreenagar3116 14 days ago
Perfect video, sir
@rajasdataengineering7585
@rajasdataengineering7585 14 days ago
Thank you!
@ranaumershamshad
@ranaumershamshad 14 days ago
I read data from a huge table in Azure SQL DB and wrote it to ADLS. It created one 900 MB file instead of partitions. Is there any parameter we can change to create partitions?
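A hedged sketch of two common fixes (connection details, bounds and paths are placeholders): a plain JDBC read yields a single partition, so the write emits a single file; either parallelize the read with the JDBC partitioning options, or repartition before writing.

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)                 # placeholder connection URL
      .option("dbtable", "dbo.big_table")
      .option("partitionColumn", "id")         # numeric column to split on
      .option("lowerBound", "1")
      .option("upperBound", "10000000")
      .option("numPartitions", "16")           # 16 parallel reads
      .load())

# 16 output files instead of one 900 MB file.
df.repartition(16).write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/output/")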
@deepanshuaggarwal7042
@deepanshuaggarwal7042 15 days ago
Very useful video
@rajasdataengineering7585
@rajasdataengineering7585 15 days ago
Glad you think so! Thank you
@dianadai4616
@dianadai4616 15 days ago
Do you have your code posted somewhere? It is very important for us to follow along.
@dineshpandey5008
@dineshpandey5008 16 days ago
Very good use case, but I have one doubt and want your suggestion: if there are more than 100 columns, do we still have to mention every column name in the xxhash64() method, or is there another way to do it?
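One hedged way around listing every column (the excluded column name is hypothetical): xxhash64 is variadic, so the DataFrame's column list can be unpacked programmatically.

from pyspark.sql.functions import xxhash64

# Hash all columns except any you want excluded (e.g. audit columns).
hash_cols = [c for c in df.columns if c not in ("load_ts",)]
df = df.withColumn("row_hash", xxhash64(*hash_cols))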
@deepanshuaggarwal7042
@deepanshuaggarwal7042 16 days ago
Can you please explain in the video why this many jobs and stages are created? Understanding the internal workings of Spark is very important for optimisation purposes.
@mihirpatel2512
@mihirpatel2512 17 days ago
Nice videos! Can you share the slides?
@prabakardurairaj3297
@prabakardurairaj3297 17 days ago
God bless you. Keep serving those in need. Thanks a lot
@rajasdataengineering7585
@rajasdataengineering7585 17 days ago
So nice of you! Thanks, keep watching
@Jayalakshmi-r9t
@Jayalakshmi-r9t 18 days ago
How do I get that IP address? For me it was not visible.
@Jayalakshmi-r9t
@Jayalakshmi-r9t 18 days ago
How do I get that IP address? I could not find it while logging in. Please can you tell me?
@rajasdataengineering7585
@rajasdataengineering7585 18 days ago
You can get it from the command prompt using the ipconfig command.
@adityaarbindam
@adityaarbindam 18 days ago
Excellent explanation, Raja... very insightful
@rajasdataengineering7585
@rajasdataengineering7585 18 days ago
Glad you liked it! Keep watching
@adityaarbindam
@adityaarbindam 18 days ago
Is it you, Kartik? I am guessing because of the way you use Notepad++ 🙂
@rajasdataengineering7585
@rajasdataengineering7585 18 days ago
No, this is Raja
@prathapganesh7021
@prathapganesh7021 18 days ago
Nice explanation, thank you
@rajasdataengineering7585
@rajasdataengineering7585 18 days ago
Glad you liked it! Keep watching
@sugunanindia
@sugunanindia 19 days ago
You are always a life saver... thank you
@rajasdataengineering7585
@rajasdataengineering7585 19 days ago
Happy to help! You are most welcome!
@vishnupriyaarora
@vishnupriyaarora 19 days ago
Hello Raja, what's the best way to reach you?
@user-dv6if4xd4m
@user-dv6if4xd4m 19 days ago
Thank you, anna. May Lord Shiva bless you with all happiness.
@rajasdataengineering7585
@rajasdataengineering7585 19 days ago
Thank you!
@VipinYadav-ii1ow
@VipinYadav-ii1ow 19 days ago
Just starting to learn Spark and Databricks. Is this resource enough to crack an entry-level data engineering job?
@rajasdataengineering7585
@rajasdataengineering7585 19 days ago
Yes, definitely. These videos are more than enough to crack an entry-level job.
@tobitikare4152
@tobitikare4152 20 days ago
Hi Raja! Thanks for this video! You are very good at explaining how delta tables work. One comment though: you didn't mention or make clear in this video that a new data file and a new log file are created when you delete a record.
@rajasdataengineering7585
@rajasdataengineering7585 20 days ago
Thanks for your comment! Keep watching