Optimize read from Relational Databases using Spark

  4,862 views

The Big Data Show

Comments: 22
@DURGESHKUMAR-gd5in 2 years ago
Your way of teaching is awesome, bro 🤞
@Vrishtimohan5222 2 years ago
It's very tough to make these videos after a full day's work. Hats off to your determination. Really inspirational 😇
@TheBigDataShow 9 months ago
Thank you Amrita 🥳🎉🎊
@gadgetswisdom9384 2 years ago
Nice video, keep it up.
@princeyjaiswal45 2 years ago
Great 👍
@DURGESHKUMAR-gd5in 2 years ago
Hi Ankur, Durgesh here 🙂
@shubhambadaya 2 years ago
Thank you.
@shreyakatti5070 9 months ago
Amazing Video.
@TheBigDataShow 9 months ago
Thank you, Shreya, for your kind words.
@mranaljadhav8259 1 year ago
Thanks a lot, sir, for making such an awesome video. Keep uploading; I'm waiting for more videos like this.
@nupoornawathey100 8 months ago
The only video on YT that explains these parameters well, thanks Ankur! I had a query. For example, say we have 10 partitions, lowerBound=0, upperBound=10000, and we set fetchSize to 1000. Will fetchSize act as a LIMIT 1000 here? If one partition's SQL query returns more rows than fetchSize, what happens?
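For readers with the same question: the Spark JDBC documentation describes `lowerBound` and `upperBound` as inputs for deciding the partition stride, not as row filters, and `fetchsize` as the number of rows fetched per round trip, not a LIMIT. A partition with more rows than the fetch size is therefore still read in full, just over several round trips. The following is a minimal sketch in plain Python of roughly how the bounds turn into per-partition WHERE clauses. The function name `partition_predicates` and the column name `id` are hypothetical, the stride rounding is simplified, non-negative bounds are assumed, and real Spark versions differ in details.

```python
from typing import List


def partition_predicates(column: str, lower: int, upper: int,
                         num_partitions: int) -> List[str]:
    """Approximate the WHERE clauses Spark generates for a partitioned JDBC read.

    Simplified illustration only: assumes non-negative integer bounds and
    ignores the adjustments real Spark versions make to stride and
    partition count.
    """
    if num_partitions <= 1 or lower == upper:
        # A single partition reads the whole table with no filter.
        return [""]

    stride = upper // num_partitions - lower // num_partitions
    predicates = []
    current = lower
    for i in range(num_partitions):
        lower_clause = f"{column} >= {current}" if i != 0 else None
        current += stride
        upper_clause = (f"{column} < {current}"
                        if i != num_partitions - 1 else None)
        if upper_clause is None:
            # Last partition is open-ended: rows above upperBound land here.
            predicates.append(lower_clause)
        elif lower_clause is None:
            # First partition is open-ended below and also picks up NULLs.
            predicates.append(f"{upper_clause} OR {column} IS NULL")
        else:
            predicates.append(f"{lower_clause} AND {upper_clause}")
    return predicates


# The scenario from the comment: 10 partitions over [0, 10000].
predicates = partition_predicates("id", 0, 10000, 10)
for predicate in predicates:
    print(predicate)
```

Because the first and last clauses are open-ended, rows outside the bounds are not dropped; they pile up in the edge partitions, which is why badly chosen bounds cause skew rather than data loss.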
@dataenthusiast_ 1 year ago
Great explanation, Ankur. So in a production scenario, ideally we have to calculate the min and max for the bounds at run time, right? We cannot hardcode lowerBound and upperBound.
@TheBigDataShow 1 year ago
Yes, correct. Most developers write code to determine the lower and upper bounds dynamically.
@shivanshudhawan7714 1 year ago
@TheBigDataShow I actually did the same: reading from MySQL (1053 tables, some really big, some medium, and some small) and writing them to the Databricks raw layer. I programmatically fetched the lower and upper bounds for each table and then used them to read the data in parallel, but in that case my total hits to the source DB are doubled. Any advice you can provide on this?
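One way to keep the extra hit small is to fetch both bounds in a single `SELECT MIN(...), MAX(...)` query on an indexed column, then reuse the result to build the read options. The sketch below is a rough illustration, not a method confirmed in the video: it uses an in-memory SQLite table as a stand-in for the source database, and the helper names, the `orders` table, and the JDBC URL are all hypothetical. The option keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `fetchsize`) are the ones listed in the Spark JDBC data source documentation.

```python
import sqlite3
from typing import Dict, Optional, Tuple


def fetch_bounds(conn: sqlite3.Connection, table: str,
                 column: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (min, max) of a column using one query instead of two.

    Identifiers are interpolated directly, so this assumes table and column
    names come from trusted configuration. Both values are None when the
    table is empty.
    """
    row = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    return row[0], row[1]


def jdbc_read_options(url: str, table: str, column: str, lower: int,
                      upper: int, num_partitions: int,
                      fetch_size: int) -> Dict[str, str]:
    """Build the option map for a partitioned Spark JDBC read."""
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


# Stand-in source database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(1, 5001)])

lower, upper = fetch_bounds(conn, "orders", "id")
options = jdbc_read_options("jdbc:mysql://db-host:3306/shop", "orders", "id",
                            lower, upper, num_partitions=10, fetch_size=1000)
print(options)
```

With a Spark session available, the map could be passed as `spark.read.format("jdbc").options(**options).load()`. For small tables it may be simpler to skip partitioning altogether and avoid the bounds query.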
@kamalnayan9157 2 years ago
Great!
@RohanKumar-mh3pt 1 year ago
Hello sir, this is very helpful. Can you please make a video about the kinds of questions asked in the data pipeline design round and the possible ways to handle them?
@TheBigDataShow 9 months ago
Please check the Data Engineering Mock Interview playlist. We have recorded more than 25 Data Engineering mock interviews.
@dpatel9 1 year ago
This is a very useful example. Is there any way to optimize writing/insertion into SQL tables when we have millions of rows in a DataFrame?
@kalpeshswain8207 2 years ago
I have a doubt here. When we deal with tables from databases, we can use lowerBound and upperBound, but when we read flat files like CSV, can we use lowerBound and upperBound?
@TheBigDataShow 2 years ago
If it is a file format, use a columnar file format like Parquet or ORC, or a row-based file format like Avro. Columnar formats in particular help with predicate pushdown and let you fetch the columns you need more quickly. CSV is a very simple row-based format, and storing big data in CSV is not recommended.
@TheBigDataShow 2 years ago
Check my article for understanding Parquet, ORC and Avro: www.linkedin.com/feed/update/urn:li:activity:6972381746185064448?