Optimize read from Relational Databases using Spark

Views: 4,862

The Big Data Show

A day ago

Comments: 22

@DURGESHKUMAR-gd5in 2 years ago

Way of teaching is awesome bro 🤞

@Vrishtimohan5222 2 years ago

It's very tough to make these videos after a full day's work. Hats off to your determination. Really inspirational 😇

@TheBigDataShow 9 months ago

Thank you Amrita 🥳🎉🎊

@gadgetswisdom9384 2 years ago

nice video keep it up

@princeyjaiswal45 2 years ago

Great👍

@DURGESHKUMAR-gd5in 2 years ago

Hi Ankur, Durgesh this side 🙂

@shubhambadaya 2 years ago

thank you

@shreyakatti5070 9 months ago

Amazing Video.

@TheBigDataShow 9 months ago

Thank you Shreya for your kind words

@mranaljadhav8259 1 year ago

Thanks a lot, sir, for making such an awesome video... Keep uploading more videos, waiting for more like this.

@nupoornawathey100 8 months ago

Only video on YT to explain these parameters well, thanks Ankur! I had a query. For example, say we have 10 partitions, lowerBound=0, upperBound=10000, and fetchSize set to 1000. Will fetchSize be used as a LIMIT 1000 here? If one partition's SQL query returns more rows than fetchSize, what happens?
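For readers with the same question, here is a rough sketch in plain Python (no Spark needed) of how the numbers in this comment play out. It assumes Spark's documented behavior: `lowerBound` and `upperBound` only set the partition stride and do not filter rows, and `fetchsize` sets how many rows are fetched per round trip rather than acting as a LIMIT. The helper names `partition_predicates` and `round_trips` are hypothetical, the stride arithmetic is a simplification of what Spark does internally, and some JDBC drivers treat the fetch size only as a hint.

```python
import math

def partition_predicates(column, lower, upper, num_partitions):
    """Approximate the per-partition WHERE clauses of a ranged JDBC read."""
    if num_partitions <= 1:
        return []  # a single partition reads the whole table, no predicate
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # Open below: rows under lowerBound and NULLs are still read.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Open above: rows over upperBound are still read.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates

def round_trips(rows_in_partition, fetch_size):
    """Rough number of fetches needed to drain one partition's result set."""
    return math.ceil(rows_in_partition / fetch_size)

# The numbers from the comment: 10 partitions over the range 0..10000.
preds = partition_predicates("id", 0, 10000, 10)
for p in preds:
    print(p)

# A partition holding 2500 rows with fetchsize=1000 is read in full,
# just over several round trips instead of one.
print(round_trips(2500, 1000))
```

Under these assumptions, a partition with more rows than the fetch size is not truncated. It simply takes more round trips to read.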

@dataenthusiast_ 1 year ago

Great explanation, Ankur. So in a production scenario, ideally we have to calculate the min and max of the bounds at run time, right? We cannot hardcode lowerBound and upperBound.

@TheBigDataShow 1 year ago

Yes, correct. Most developers write code that determines the lower and upper bounds dynamically.
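A minimal sketch of that idea, assuming an in-memory SQLite table stands in for the real source database so the example runs on its own. The helper `jdbc_partition_options` and the `orders` table are hypothetical. The dictionary keys are the Spark JDBC option names discussed in the video.

```python
import sqlite3

def jdbc_partition_options(conn, table, column, num_partitions):
    """Look up MIN/MAX at run time and build Spark JDBC partition options."""
    # Table and column names are interpolated directly, so they must come
    # from trusted configuration, never from user input.
    # A single query returns both bounds in one hit to the source.
    lower, upper = conn.execute(
        f"SELECT MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    if lower is None:
        return {}  # empty table: fall back to an unpartitioned read
    return {
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }

# Stand-in source table with ids 1..500.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 501)],
)

options = jdbc_partition_options(conn, "orders", "id", 4)
print(options)
```

In a real job, a dictionary like this could be passed to `spark.read.format("jdbc").options(**options)` together with the connection `url` and `dbtable`. Those values are not shown here.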

@shivanshudhawan7714 1 year ago

@TheBigDataShow I actually did the same: reading from MySQL (1053 tables, some really big, some medium, and some small) and writing them to the Databricks raw layer. I was programmatically getting the lower and upper bounds for each table and then using them to read the data in parallel. In that case, my total hits to the source DB are doubled. Any advice you can provide on this?

@kamalnayan9157 2 years ago

Great!

@RohanKumar-mh3pt 1 year ago

Hello sir, this is very helpful. Can you please make a video about what kind of questions are asked in the data pipeline design round and the possible ways to handle them?

@TheBigDataShow 9 months ago

Please check the Data Engineering Mock Interview playlist. We have recorded more than 25 Data Engineering mock interviews.

@dpatel9 1 year ago

This is a very useful example. Is there any way to optimize writing/insertion into SQL tables when the DataFrame has millions of rows?

@kalpeshswain8207 2 years ago

I have a doubt here: when we deal with tables from databases, we can use the lower bound and upper bound... but when we read flat files like CSV, can we use them too?

@TheBigDataShow 2 years ago

If it is a file, use a columnar file format like Parquet or ORC, or a row-based file format like Avro. It will help with predicate pushdown and let you fetch your columns more quickly. CSV is a very simple row-based format, and it is not recommended for storing big data.

@TheBigDataShow 2 years ago

Check my article for understanding Parquet, ORC and Avro: www.linkedin.com/feed/update/urn:li:activity:6972381746185064448?