Comments
@use-lucky 1 day ago
How can I get this PPT?
@manjunathb.n7465 2 days ago
Hi Sir, thanks for all your videos. Can you please make a video on how to read private API data incrementally, with credentials and a bearer token, using Databricks?
@rajasdataengineering7585 1 day ago
Hi Manju, sure, I will create a video on this requirement soon.
@yash.1th 2 days ago
Hi Sir, can you please start the Unity Catalog series?
@rajasdataengineering7585 2 days ago
Hi Yash, sure, I will start soon.
@suryasabulalmathew1331 5 days ago
Hi Sir, in these examples you have shown, can you explain why many jobs are created for each join query you executed? I have understood the stages, the explain plan, and the DAG, but the number of jobs is not clear to me. Can you shed some light on it?
@amiyarout217 6 days ago
nice
@rajasdataengineering7585 6 days ago
Thanks! Keep watching
@HariMuppa 6 days ago
Your explanation is greatly appreciated.
@rajasdataengineering7585 6 days ago
Glad it was helpful! Keep watching
@avirupmukherjee2080 8 days ago
Hi Raja, thanks for explaining. I just wanted to check: the first cluster you created was a job cluster, but later you created two all-purpose clusters. Could you please explain why you used all-purpose clusters instead of job clusters?
@AnujRastogi-m6v 9 days ago
I need more SQL and PySpark problems for practice. Is there any paid course of yours that I can purchase just for practice?
@TejaswiniKulkarni-g9s 9 days ago
Can you please add a video on the Unity Catalog feature in Databricks?
@milind6217 11 days ago
Hello @rajasdataengineering7585, could you please share the CSV files used in the course? The course provides excellent information, and having the files available would be really helpful.
@saifahmed7843 11 days ago
Greetings, is this sufficient for the Databricks Certified Associate Developer exam? I'd appreciate any clarity.
@rajasdataengineering7585 11 days ago
Yes, I have covered almost all the topics. If you understand all the concepts explained on this channel, that's more than sufficient.
@saifahmed7843 11 days ago
@rajasdataengineering7585 Thank you so much for the info.
@rajasdataengineering7585 10 days ago
Welcome
@raviyadav2552 11 days ago
I found the explanation very detailed. Great work, keep it up, sir!
@rajasdataengineering7585 11 days ago
Thank you, Ravi! Keep watching
@ShivamGupta-wn9mo 12 days ago
Hi Raja, can you cover all streaming-related use cases with respect to Spark and Databricks?
@rajasdataengineering7585 11 days ago
Sure, Shivam, I will create videos soon.
@ShivamGupta-wn9mo 12 days ago
Great video, Raja!
@rajasdataengineering7585 12 days ago
Thank you! Keep watching
@saayamprakash8832 12 days ago
Hi sir, can we group by dept, take the average, and then filter using a HAVING clause?
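For context, a minimal PySpark sketch of that idea (the column names dept and salary here are assumptions, not necessarily the ones used in the video): the DataFrame API has no HAVING keyword, so the same effect comes from a filter applied after the aggregation, while Spark SQL supports HAVING directly.

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col

spark = SparkSession.builder.getOrCreate()

# Hypothetical employee data: (name, dept, salary)
df = spark.createDataFrame(
    [("a", "IT", 5000), ("b", "IT", 7000), ("c", "HR", 3000)],
    ["name", "dept", "salary"],
)

# DataFrame API: 'HAVING' is just a filter on the aggregated column
df.groupBy("dept").agg(avg("salary").alias("avg_salary")) \
    .filter(col("avg_salary") > 4000) \
    .show()

# Spark SQL: the HAVING clause works as usual
df.createOrReplaceTempView("emp")
spark.sql("select dept, avg(salary) as avg_salary from emp group by dept having avg(salary) > 4000").show()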
@tusharagarwal3553 13 days ago
Where can we get these notebooks? Please share links to all the notebooks so that we can revise.
@ShivamGupta-wn9mo 14 days ago
great
@rajasdataengineering7585 14 days ago
Thank you! Keep watching
@AnjanaDevi-e4s 15 days ago
Really awesome explanation. No other video taught the difference between the above three with this much clarity, thank you.
@rajasdataengineering7585 14 days ago
Thank you! Keep watching
@ShivamGupta-wn9mo 15 days ago
We need a separate series on Spark Streaming.
@rajasdataengineering7585 14 days ago
Sure, I will create one soon.
@ShivamGupta-wn9mo 16 days ago
Simpler way:

from pyspark.sql.functions import explode

df_flattened = df.select("*", explode("Employee").alias("new_emp")) \
    .drop("Employee") \
    .select("Department", "new_emp.emp_name", "new_emp.salary", "new_emp.yrs_of_service", "new_emp.Age")
df_flattened.show()
@HariprasanthSenthilkumar 16 days ago
In step 2 (Databricks cell no. 7), while executing filterDF, you are concatenating all columns of both source and target and filtering on whether the resulting strings are equal. But with this approach, rows whose columns actually differ between source and target can still produce the same concatenated string; such rows should be updated in the final table, yet the given condition filters them out. Example: dim1=100, dim2=201, dim=300, dim=400 versus Target_dim1=100, Target_dim2=20, Target_dim=1300, Target_dim=400. Both concatenate to 100201300400, so in this case the row is filtered out without being updated.
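To make the point concrete, a small hedged sketch (the column names below are hypothetical, not the ones from the notebook): a plain concat of the two rows above collides on the same string, while concat_ws with a delimiter keeps the column boundaries and so distinguishes them. Comparing the columns individually, or hashing the delimited string, avoids the false match as well.

from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws

spark = SparkSession.builder.getOrCreate()

# One hypothetical row whose source and target columns differ but whose plain concat collides
df = spark.createDataFrame(
    [("100", "201", "300", "400", "100", "20", "1300", "400")],
    ["dim1", "dim2", "dim3", "dim4", "t_dim1", "t_dim2", "t_dim3", "t_dim4"],
)

# Plain concat: "100201300400" == "100201300400", so the row looks unchanged and is filtered out of the update set
plain = df.filter(
    concat("dim1", "dim2", "dim3", "dim4") == concat("t_dim1", "t_dim2", "t_dim3", "t_dim4")
)

# concat_ws keeps boundaries: "100|201|300|400" != "100|20|1300|400", so the row is correctly seen as changed
delimited = df.filter(
    concat_ws("|", "dim1", "dim2", "dim3", "dim4") == concat_ws("|", "t_dim1", "t_dim2", "t_dim3", "t_dim4")
)

print(plain.count(), delimited.count())  # 1 row wrongly matched with plain concat, 0 with the delimiter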
@WB_Tom 16 days ago
I learned that Databricks uses the data lakehouse and the Delta Lake format. In that case, if I create or add files to DBFS, will they be converted into Delta Lake or stay as they are? And is DBFS the delta lakehouse?
@ShivamGupta-wn9mo 17 days ago
Hi Raja, can you also add a detailed Spark Streaming section? Thanks, your content is great!!
@rajasdataengineering7585 17 days ago
Hi Shivam, sure, I will cover advanced concepts in Spark Streaming. I have already covered the basics in one of the previous videos.
@AjithM-h5q 17 days ago
I have chosen my career path by learning from and watching these videos, Raja. Thanks much!
@rajasdataengineering7585 17 days ago
All the best! Keep watching
@AjithM-h5q 17 days ago
Hi, all the sessions are well organized with detailed explanations. Can we have any notes for these topics to summarize or revise for learning purposes? Thanks much, and great work!
@wolfguptaceo 18 days ago
Sir, why did you switch from Databricks Community Edition to Azure in this video?
@rajasdataengineering7585 18 days ago
Hi, not all features are available in Community Edition, so I used Azure Databricks to cover all the important features.
@wolfguptaceo 18 days ago
@rajasdataengineering7585 Hi sir, thanks for clarifying.
@ShivamGupta-wn9mo 19 days ago
from pyspark.sql.functions import col, when

df_ans = df.withColumn(
    "new_id",
    when((col("id") % 2 != 0) & (col("id") != df.count()), col("id") + 1)
    .when(col("id") % 2 == 0, col("id") - 1)
    .otherwise(col("id")),
) \
    .drop("id") \
    .orderBy(col("new_id"))
df_ans.show()

df.createOrReplaceTempView("students")
spark.sql('''
    select *,
           case when id % 2 == 0 then id - 1
                when id % 2 != 0 and id != (select count(*) from students) then id + 1
                else id
           end as new_id
    from students
    order by new_id
''').show()
@ShivamGupta-wn9mo 20 days ago
My sol:

from pyspark.sql.functions import dense_rank, min, max
from pyspark.sql.window import Window

window_base = Window.orderBy("date")

# DataFrame API
df_t = df.withColumn(
    "diff",
    dense_rank().over(window_base) - dense_rank().over(window_base.partitionBy("status")),
) \
    .groupBy("status", "diff") \
    .agg(min("date").alias("start_date"), max("date").alias("end_date")) \
    .orderBy("start_date")
df_t.show()

# Spark SQL 1
df.createOrReplaceTempView("match")
spark.sql('''
    with cte as (
        select *,
               dense_rank() over (order by date)
                 - dense_rank() over (partition by status order by date) as diff
        from match)
    select status, diff, min(date) as start_date, max(date) as end_date
    from cte
    group by status, diff
    order by start_date
''').show()

# Spark SQL 2
df.createOrReplaceTempView("match")
spark.sql('''
    with cte as (
        select *,
               dense_rank() over (order by date) as rn1,
               dense_rank() over (partition by status order by date) as rn2,
               dense_rank() over (order by date)
                 - dense_rank() over (partition by status order by date) as diff
        from match)
    select a.status, max(a.start_date) as start_date, max(a.end_date) as end_date
    from (select date, status, diff,
                 min(date) over (partition by status, diff) as start_date,
                 max(date) over (partition by status, diff) as end_date
          from cte
          order by date) a
    group by a.status, a.diff
    order by start_date asc
''').show()

Enjoy!
@rajasdataengineering7585 19 days ago
Thank you for sharing your approach
@amiyarout217 20 days ago
great explanation
@rajasdataengineering7585 19 days ago
Glad it was helpful! Keep watching
@aiswaryakraj4210 20 days ago
data = [
    ("2020-06-01", "Won"),
    ("2020-06-02", "Won"),
    ("2020-06-03", "Won"),
    ("2020-06-04", "Lost"),
    ("2020-06-05", "Lost"),
    ("2020-06-06", "Lost"),
    ("2020-06-07", "Won")
]
df = spark.createDataFrame(data, ['event_date', 'event_status'])
@devmaharaj4640 21 days ago
Hello, your videos inspire me to choose data engineering as a career. Why did you stop making interview question videos?
@rajasdataengineering7585 21 days ago
Hello, thanks for your comment! I have covered almost all possible questions, which is why I stopped. I will revisit them once again and cover any missing topics.
@lanyofrancis1195 21 days ago
Thanks for the videos and the nice explanation. I have a question: the default partition size is 128 MB, so for a 2 GB RDD file, the number of partitions created will be 16. In your example, are you changing the default partition size to 10 MB instead of the default 128 MB? Please correct me if I am missing anything.
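For reference, a small hedged sketch of the knob that usually controls this (assuming the video tunes spark.sql.files.maxPartitionBytes; the exact setting demonstrated there may differ):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Default input partition size is 128 MB, so a 2 GB file is read into roughly 2048 / 128 = 16 partitions
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))

# Lowering it to 10 MB would split the same 2 GB file into roughly 2048 / 10 ≈ 205 partitions
spark.conf.set("spark.sql.files.maxPartitionBytes", str(10 * 1024 * 1024))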
@ShivamGupta-wn9mo 22 days ago
If, after reading and writing all the CSV files, we upload the same 5 CSV files again, will it process them again or skip them after checking the checkpoint metadata?
@rk-ej9ep 22 days ago
Such great info. Awesome!
@rajasdataengineering7585 22 days ago
Glad it was helpful! Keep watching
@saturdaywedssunday 23 days ago
Hi Anna, nice to see your videos, and you do reply to all our doubts. One thing related to the transformations applied here: we haven't done any kind of partitioning yet, right? We have just read the data, so how are we confirming these as narrow and wide transformations? Will partitioning happen by default once the data is read? Please clarify, Anna.
@MichaelAdebayo-y4n 23 days ago
Would watching these videos be enough to help pass the Databricks Certified Associate Developer for Apache Spark 3.0 - Scala exam?
@rajasdataengineering7585 23 days ago
Yes, most of the concepts are covered on this channel.
@ShivamGupta-wn9mo 24 days ago
great
@rajasdataengineering7585 23 days ago
Thanks
@RajBalaChauhan-b4w 24 days ago
Thank you for such clarity. But I have a query: the Catalyst Optimizer will consider a broadcast join by itself if a table is small enough to fit in memory, even if we haven't applied a broadcast join explicitly. So is it really going to help us with performance optimization, or will the performance remain the same even after applying a broadcast join?
@rajasdataengineering7585 23 days ago
The Catalyst Optimizer won't apply a broadcast join by default. Either we need to apply it manually, or Adaptive Query Execution needs to be enabled (AQE is enabled by default in recent Spark versions).
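For context, a minimal hedged sketch of applying a broadcast join manually (the tables here are hypothetical); Spark also exposes spark.sql.autoBroadcastJoinThreshold, which governs when small tables are broadcast automatically:

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.getOrCreate()

# Hypothetical small dimension table and large fact table
small_df = spark.range(100).withColumnRenamed("id", "dept_id")
large_df = spark.range(1000000).withColumnRenamed("id", "dept_id")

# Manual broadcast hint: the small side is shipped to every executor, avoiding a shuffle of the large side
joined = large_df.join(broadcast(small_df), on="dept_id")
joined.explain()  # the physical plan should show a BroadcastHashJoin

# Threshold for automatic broadcasting of small tables; setting it to -1 disables auto-broadcast
print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))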
@chaitanyanagare757 24 days ago
Thanks, Raja.
@rajasdataengineering7585 24 days ago
Welcome!
@chaitanyanagare757 24 days ago
Great video content. Thank you so much!
@rajasdataengineering7585 24 days ago
Glad you liked it! Keep watching
@PraveenKumar-ev1uv 24 days ago
How can I get the opportunity to work on Databricks with PySpark? What real-time scenarios should I get started with?
@ShivamGupta-wn9mo 25 days ago
great playlist
@rajasdataengineering7585 24 days ago
Thank you
@nitinpandey4857 26 days ago
How does spark.read differ from spark.load?
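For context, a minimal sketch of how the two usually relate (the file paths here are hypothetical): spark.read returns a DataFrameReader, and load() is a method on that reader rather than a separate entry point on the SparkSession, so there is no spark.load by itself.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Format-specific shorthand on the DataFrameReader
df_csv = spark.read.option("header", True).csv("/tmp/example.csv")

# Generic load(): the format is set explicitly...
df_generic = spark.read.format("csv").option("header", True).load("/tmp/example.csv")

# ...or, with no format given, load() falls back to spark.sql.sources.default (parquet by default)
df_parquet = spark.read.load("/tmp/example.parquet")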
@Prakash-r9o2w 28 days ago
Hi sir, if possible, can you provide the CSV files that you have used? Thank you.
@rabink.5115 29 days ago
I believe Auto Loader can now also be applied as batch processing, so it can be triggered whenever files arrive.
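A minimal hedged sketch of that pattern, assuming a Databricks notebook where spark is predefined and all paths are hypothetical: Auto Loader (the cloudFiles source) combined with an availableNow trigger processes whatever files have arrived since the last run and then stops, while the checkpoint records which files were already ingested.

stream_df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("cloudFiles.schemaLocation", "/tmp/autoloader/schema")  # hypothetical path
    .load("/tmp/autoloader/landing")                                # hypothetical input directory
)

# availableNow=True runs like a batch job on each invocation; previously ingested files are skipped
(
    stream_df.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/autoloader/checkpoint")     # hypothetical path
    .trigger(availableNow=True)
    .start("/tmp/autoloader/output")
)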
@rabink.5115 1 month ago
Hi Raja, do you have this code stored on GitHub?
@varalaxmi1742 1 month ago
Hi, in this video you say there is serialization and deserialization overhead with off-heap memory, but in previous videos you said we can avoid serialization and deserialization with off-heap memory. Could you please clarify which one is correct?
@ramvrikshsharma7724 1 month ago
Can you please make a video on DLT?
@rajasdataengineering7585 1 month ago
I have covered the basics of DLT in another playlist.
@ankitachaturvedi1138 1 month ago
This interview series is really helpful. I haven't worked much on Databricks, but these videos give great insight into its internal workings and concepts. I am able to crack interviews. Thanks a lot for such informative videos!!
@rajasdataengineering7585 1 month ago
Glad to hear this! Keep watching
@amiyarout217 1 month ago
Use ChatGPT for sample dataset creation.