Hi Sir, thanks for all your videos. Can you please make a video on how to read private API data incrementally with credentials and a bearer token using Databricks?
@rajasdataengineering7585 · 1 day ago
Hi Manju, sure, I will create a video on this requirement soon.
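A minimal sketch of how such an incremental pull could look in a Databricks notebook, assuming a hypothetical endpoint that accepts an updated_after parameter, a secret scope for the token, and a Delta table for the watermark (every name below is illustrative, and spark/dbutils are the notebook-provided objects):

import requests
from pyspark.sql import functions as F

API_URL = "https://api.example.com/v1/records"          # hypothetical endpoint
TOKEN = dbutils.secrets.get("my-scope", "api-bearer")    # assumes a Databricks secret scope
WATERMARK_TABLE = "bronze.api_watermark"                 # hypothetical Delta table

# 1. Read the last watermark; fall back to a fixed start on the first run
try:
    last_ts = spark.table(WATERMARK_TABLE).agg(F.max("last_loaded")).collect()[0][0]
except Exception:
    last_ts = "1970-01-01T00:00:00Z"

# 2. Call the API with the bearer token, asking only for records changed since the watermark
resp = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"updated_after": last_ts},
    timeout=60,
)
resp.raise_for_status()
records = resp.json()    # assumes the API returns a JSON array of objects with an updated_at field

# 3. Append the increment to a Delta table and advance the watermark
if records:
    spark.createDataFrame(records).write.format("delta").mode("append").saveAsTable("bronze.api_records")
    new_ts = max(r["updated_at"] for r in records)
    spark.createDataFrame([(new_ts,)], ["last_loaded"]) \
         .write.format("delta").mode("overwrite").saveAsTable(WATERMARK_TABLE)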
@yash.1th · 2 days ago
Hi Sir, can you please start the Unity Catalog series?
@rajasdataengineering7585 · 2 days ago
Hi Yash, sure, I will start it soon.
@suryasabulalmathew1331 · 5 days ago
Hi Sir, in the examples you have shown, can you tell us why many jobs are created for each join query you executed? I have understood the stages, the explain plan, and the DAG, but the number of jobs is not clear to me. Can you shed some light on it?
@amiyarout217 · 6 days ago
nice
@rajasdataengineering7585 · 6 days ago
Thanks! Keep watching
@HariMuppa · 6 days ago
Your explanation is greatly appreciated.
@rajasdataengineering7585 · 6 days ago
Glad it was helpful! Keep watching
@avirupmukherjee2080 · 8 days ago
Hi Raja, thanks for explaining. I just wanted to check: you created the first cluster as a job cluster, but later you created two all-purpose clusters. Could you please explain why you used all-purpose clusters instead of job clusters?
@AnujRastogi-m6v · 9 days ago
I need more SQL and PySpark problems for practice. Is there any paid course of yours that I can purchase just for practice?
@TejaswiniKulkarni-g9s · 9 days ago
Can you please add a video on the Unity Catalog feature in Databricks?
@milind6217 · 11 days ago
Hello @rajasdataengineering7585, could you please share the CSV files used in the course? The course provides excellent information, and it would be really helpful if the files were available.
@saifahmed7843 · 11 days ago
Greetings, is this sufficient for the Databricks Certified Associate Developer exam? I'd appreciate any clarity.
@rajasdataengineering7585 · 11 days ago
Yes, I have covered almost all the topics. If you understand all the concepts I have explained on this channel, that's more than sufficient.
@saifahmed7843 · 11 days ago
@rajasdataengineering7585 Thank you so much for the info.
@rajasdataengineering7585 · 10 days ago
Welcome
@raviyadav2552 · 11 days ago
I found the explanation very detailed. Great work, keep it up, sir!
@rajasdataengineering7585 · 11 days ago
Thank you, Ravi! Keep watching
@ShivamGupta-wn9mo · 12 days ago
Hi Raja, can you cover all the streaming-related use cases with respect to Spark and Databricks?
@rajasdataengineering7585 · 11 days ago
Sure Shivam, I will create videos soon.
@ShivamGupta-wn9mo · 12 days ago
Great video, Raja!
@rajasdataengineering7585 · 12 days ago
Thank you! Keep watching
@saayamprakash8832 · 12 days ago
Hi sir, can we use groupBy on dept, take the average, and then filter using a HAVING clause?
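For reference, that pattern works in both APIs; a minimal sketch, assuming a DataFrame df with dept and salary columns (the 50000 threshold is only for illustration):

from pyspark.sql import functions as F

# DataFrame API: the HAVING step is just a filter applied after the aggregation
avg_df = (df.groupBy("dept")
            .agg(F.avg("salary").alias("avg_salary"))
            .filter(F.col("avg_salary") > 50000))
avg_df.show()

# Spark SQL: the same logic with an explicit HAVING clause
df.createOrReplaceTempView("emp")
spark.sql("""
    SELECT dept, AVG(salary) AS avg_salary
    FROM emp
    GROUP BY dept
    HAVING AVG(salary) > 50000
""").show()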
@tusharagarwal3553 · 13 days ago
Where can we get these notebooks? Please share links to all the notebooks so that we can revise.
@ShivamGupta-wn9mo · 14 days ago
great
@rajasdataengineering7585 · 14 days ago
Thank you! Keep watching
@AnjanaDevi-e4s · 15 days ago
Really awesome explanation; no other video taught the difference between the above three with this much clarity, thank you.
In step 2 (Databricks cell no. 7), while executing filterDF, you are concatenating all the columns of both source and target and filtering on whether the resulting strings are equal. But with this approach, columns with different values in source and target can still produce equal strings; that data should be updated in the final table, yet it will be filtered out by the given condition. Example: dim1=100, dim2=201, dim3=300, dim4=400 versus Target_dim1=100, Target_dim2=20, Target_dim3=1300, Target_dim4=400. The concatenated value is 100201300400 in both cases, so the row will be filtered out without being updated.
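One way to avoid the collision described above is to keep a delimiter between the columns (so 100|201|300|400 can never equal 100|20|1300|400), or to compare the columns individually. A small sketch, assuming a joined DataFrame joined_df that carries both the source columns and their Target_-prefixed counterparts (all names are illustrative):

from pyspark.sql import functions as F

dim_cols = ["dim1", "dim2", "dim3", "dim4"]   # illustrative column names

# Option 1: concat with a delimiter so column boundaries are preserved
src_key = F.concat_ws("|", *[F.col(c) for c in dim_cols])
tgt_key = F.concat_ws("|", *[F.col(f"Target_{c}") for c in dim_cols])
changed_concat = joined_df.filter(src_key != tgt_key)

# Option 2: compare column by column, which also handles NULLs explicitly
cond = None
for c in dim_cols:
    diff = ~F.col(c).eqNullSafe(F.col(f"Target_{c}"))
    cond = diff if cond is None else (cond | diff)
changed_colwise = joined_df.filter(cond)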
@WB_Tom · 16 days ago
I learned that Databricks uses the data lakehouse architecture and the Delta Lake format. In that case, if I create or add files to DBFS, will they be converted into Delta Lake format or stay as they are? And is DBFS itself the delta lakehouse?
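For what it's worth, files uploaded to DBFS keep their original format; nothing is converted automatically. DBFS is just the file system layer, and a Delta table only exists once you write data in Delta format, roughly like this (paths are illustrative):

# A CSV uploaded to DBFS stays a plain CSV file until it is rewritten
raw_df = (spark.read
               .option("header", "true")
               .csv("dbfs:/FileStore/uploads/sales.csv"))   # hypothetical upload path

# Writing it out in Delta format is what creates the Delta Lake table
(raw_df.write
       .format("delta")
       .mode("overwrite")
       .save("dbfs:/delta/sales"))                          # hypothetical target path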
@ShivamGupta-wn9mo · 17 days ago
Hi Raja, can you also add a detailed Spark Streaming section? Thanks, your content is great!
@rajasdataengineering7585 · 17 days ago
Hi Shivam, sure, I will cover advanced concepts in Spark Streaming. I have already covered the basics in one of the previous videos.
@AjithM-h5q · 17 days ago
I have chosen my career path by learning from and watching these videos, Raja. Thanks much!
@rajasdataengineering7585 · 17 days ago
All the best! Keep watching
@AjithM-h5q · 17 days ago
Hi, all the sessions are well organized with detailed explanations. Can we have any notes for these topics to summarize or revise them for learning purposes? Thanks much, and great work!
@wolfguptaceo · 18 days ago
Sir, why did you switch from Databricks Community Edition to Azure in this video?
@rajasdataengineering7585 · 18 days ago
Hi, not all features are available in Community Edition, so I used Azure Databricks to cover all the important features.
@wolfguptaceo · 18 days ago
@rajasdataengineering7585 Hi sir, thanks for clarifying.
@ShivamGupta-wn9mo · 19 days ago
My solution:

from pyspark.sql.functions import col, when

# Swap adjacent ids; the last id stays as-is when the total count is odd
n = df.count()
df_ans = (df.withColumn("new_id",
                        when((col("id") % 2 != 0) & (col("id") != n), col("id") + 1)
                        .when(col("id") % 2 == 0, col("id") - 1)
                        .otherwise(col("id")))
            .drop("id")
            .orderBy(col("new_id")))
df_ans.show()

# Same logic in Spark SQL
df.createOrReplaceTempView("students")
spark.sql('''
    select *,
           case when id % 2 = 0 then id - 1
                when id % 2 != 0 and id != (select count(*) from students) then id + 1
                else id
           end as new_id
    from students
    order by new_id
''').show()
@ShivamGupta-wn9mo · 20 days ago
My solution:

from pyspark.sql.window import Window
from pyspark.sql.functions import dense_rank, min, max

window_base = Window.orderBy("date")

# DataFrame API: gaps-and-islands via the difference of two dense_rank() values
df_t = (df.withColumn("diff", dense_rank().over(window_base)
                              - dense_rank().over(window_base.partitionBy("status")))
          .groupBy("status", "diff")
          .agg(min("date").alias("start_date"), max("date").alias("end_date"))
          .orderBy("start_date"))
df_t.show()

# Spark SQL 1
df.createOrReplaceTempView("match")
spark.sql('''
    with cte as (
        select *,
               dense_rank() over (order by date)
               - dense_rank() over (partition by status order by date) as diff
        from match)
    select status, diff, min(date) as start_date, max(date) as end_date
    from cte
    group by status, diff
    order by start_date
''').show()

# Spark SQL 2
spark.sql('''
    with cte as (
        select *,
               dense_rank() over (order by date) as rn1,
               dense_rank() over (partition by status order by date) as rn2,
               dense_rank() over (order by date)
               - dense_rank() over (partition by status order by date) as diff
        from match)
    select a.status, max(a.start_date) as start_date, max(a.end_date) as end_date
    from (select date, status, diff,
                 min(date) over (partition by status, diff) as start_date,
                 max(date) over (partition by status, diff) as end_date
          from cte) a
    group by a.status, a.diff
    order by start_date asc
''').show()

Enjoy!
Hello, your videos inspire me to choose Data Engineering as a career. Why did you stop making interview question videos?
@rajasdataengineering7585 · 21 days ago
Hello, thanks for your comment! I have covered almost all possible questions, which is why I stopped. I will revisit them once again and cover any missing topics.
@lanyofrancis1195 · 21 days ago
Thanks for the videos and the nice explanation. I have a question: the default partition size is 128 MB, so for a 2 GB file, the number of partitions created will be 16. In your example, are you changing the default partition size to 10 MB instead of the default 128 MB? Please correct me if I am missing anything.
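For file-based DataFrame reads, the knob being discussed is spark.sql.files.maxPartitionBytes (128 MB by default); a quick sketch of how to check and override it, where the 10 MB value and the path only mirror the example in the question:

# Default is 134217728 bytes (128 MB); ~2 GB of splittable input then gives roughly 16 partitions
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))

# Lowering it to 10 MB before reading forces many more, smaller input partitions
spark.conf.set("spark.sql.files.maxPartitionBytes", 10 * 1024 * 1024)

df = spark.read.parquet("dbfs:/data/big_table")   # illustrative path
print(df.rdd.getNumPartitions())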
@ShivamGupta-wn9mo · 22 days ago
If, after reading and writing all the CSV files, we upload the same 5 CSV files again, will it process them again or skip them after checking the checkpoint metadata?
@rk-ej9ep · 22 days ago
Such great info. Awesome!
@rajasdataengineering7585 · 22 days ago
Glad it was helpful! Keep watching
@saturdaywedssunday · 23 days ago
Hi Anna, nice to see your videos, and you do reply to all our doubts. One thing about the transformations applied here: we haven't done any explicit partitioning yet, right? We have just read the data, so how are we confirming which of these are narrow and which are wide transformations? Will partitioning happen by default once the data is read? Please clarify, Anna.
@MichaelAdebayo-y4n · 23 days ago
Would watching these videos be enough to help pass the Databricks Certified Associate Developer for Apache Spark 3.0 - Scala exam?
@rajasdataengineering7585 · 23 days ago
Yes, most of the concepts are covered on this channel.
@ShivamGupta-wn9mo · 24 days ago
great
@rajasdataengineering7585 · 23 days ago
Thanks
@RajBalaChauhan-b4w · 24 days ago
Thank you for such clarity. But I have a query: since the Catalyst Optimizer will consider a broadcast join on its own if a table is small enough to fit in memory, even when we haven't requested one, is explicitly applying a broadcast join really going to help with performance optimization? Or will the performance remain the same even after applying it?
@rajasdataengineering7585 · 23 days ago
The Catalyst optimiser applies a broadcast join automatically only when it can estimate that a table is smaller than spark.sql.autoBroadcastJoinThreshold. Otherwise we either need to apply the broadcast hint manually, or adaptive query execution needs to be enabled so it can switch to a broadcast join at runtime (AQE is enabled by default in recent Spark versions).
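A small sketch of both routes mentioned above: an explicit broadcast hint, and the settings that let Spark pick the broadcast join on its own (table names and the 50 MB threshold are purely illustrative):

from pyspark.sql.functions import broadcast

large_df = spark.table("sales")          # illustrative tables
small_df = spark.table("dim_country")

# Option 1: request the broadcast join explicitly with a hint
joined = large_df.join(broadcast(small_df), "country_id")
joined.explain()                          # look for BroadcastHashJoin in the physical plan

# Option 2: let Spark choose it - raise the auto-broadcast threshold and keep AQE on
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 50 * 1024 * 1024)
spark.conf.set("spark.sql.adaptive.enabled", "true")
joined2 = large_df.join(small_df, "country_id")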
@chaitanyanagare757 · 24 days ago
Thanks Raja!
@rajasdataengineering7585 · 24 days ago
Welcome!
@chaitanyanagare757 · 24 days ago
Great video content. Thank you so much!
@rajasdataengineering7585 · 24 days ago
Glad you liked it! Keep watching
@PraveenKumar-ev1uv · 24 days ago
How do I get the opportunity to work on Databricks with PySpark? What real-time scenarios should I start with?
@ShivamGupta-wn9mo · 25 days ago
Great playlist!
@rajasdataengineering7585 · 24 days ago
Thank you
@nitinpandey4857 · 26 days ago
How does spark.read differ from spark.load?
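For context on this question: spark.read returns a DataFrameReader, and load() is the method on that reader that performs the read once a format has been set; shortcuts like csv() wrap the same call. A quick sketch with illustrative paths:

# spark.read gives a DataFrameReader; nothing is read at this point
reader = spark.read.format("csv").option("header", "true")

# load() on the reader performs the actual read for the chosen format
df1 = reader.load("dbfs:/data/customers.csv")

# the csv() shortcut is equivalent to format("csv") followed by load()
df2 = spark.read.option("header", "true").csv("dbfs:/data/customers.csv")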
@Prakash-r9o2w · 28 days ago
Hi sir, if possible, can you provide the CSV files that you have used? Thank you.
@rabink.5115 · 29 days ago
I believe Auto Loader can now also be applied as batch processing, so it can be triggered whenever files arrive.
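That matches Structured Streaming's availableNow trigger: an Auto Loader stream processes whatever files have arrived since the last run and then stops, giving batch-style behaviour. A minimal sketch with illustrative paths and table names:

# Auto Loader picking up new files incrementally from a landing folder
stream_df = (spark.readStream
                  .format("cloudFiles")
                  .option("cloudFiles.format", "csv")
                  .option("cloudFiles.schemaLocation", "dbfs:/schemas/orders")   # illustrative
                  .load("dbfs:/landing/orders"))

# availableNow processes all files discovered so far, then stops - batch-like runs
(stream_df.writeStream
          .format("delta")
          .option("checkpointLocation", "dbfs:/checkpoints/orders")
          .trigger(availableNow=True)
          .toTable("bronze.orders"))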
@rabink.5115 · 1 month ago
Hi Raja, do you have this code stored on GitHub?
@varalaxmi1742 · 1 month ago
Hi, in this video you say there is serialization and deserialization overhead for off-heap memory, but in previous videos you said we can avoid serialization and deserialization with off-heap memory. Could you please clarify which one is correct?
@ramvrikshsharma7724 · 1 month ago
Can you please make a video on DLT?
@rajasdataengineering7585 · 1 month ago
I have covered the basics of DLT in another playlist.
@ankitachaturvedi1138 · 1 month ago
This interview series is really helpful. I haven't worked much on Databricks, but these videos give great insight into the internal workings and concepts. I am able to crack interviews. Thanks a lot for such informative videos!