20 Data Caching in Spark | Cache vs Persist | Spark Storage Level with Persist | Partial Data Caching

3,109 views

Ease With Data

Day ago

Comments: 14
@reslleygabriel 11 months ago
Excellent content in this playlist! Thanks for sharing and keep up the good work 🚀
@sureshraina321 11 months ago
Nice job! Can you please provide more details on serialized vs. deserialized storage when dealing with cache/persist in upcoming lectures?
@mohammedshoaib1769 11 months ago
Thanks. Your explanation is too good. Keep making such videos. Also, if possible, make some videos on scenario-based interview questions.
@nishantsoni9330 5 months ago
One of the best in-depth explanations, thanks :) Could you please make a video on an "end-to-end data engineering" project, from requirement gathering to deployment?
@easewithdata 5 months ago
Thanks ❤️ Please make sure to share with your network on LinkedIn 🛜
@ComedyXRoad 5 months ago
Thanks for your efforts, it helps a lot.
@easewithdata 5 months ago
Thanks ❤️ Please make sure to share with your network over LinkedIn 🛜
@at-cv9ky 10 months ago
As already mentioned in a comment, please make a video on serialization/deserialization of objects.
@easewithdata 9 months ago
Will definitely try.
@sayantabarik4252 9 months ago
I have one question: cache() is equivalent to persist(pyspark.StorageLevel.MEMORY_AND_DISK). The only difference in this scenario is that cache() stores deserialized data while persist() stores serialized data. So, if persist is better in terms of data serialization and functionality, what is the use case for cache over persist?
@easewithdata 9 months ago
You already have the answer in your question: with cache the data is already deserialized, so there is no extra step, whereas with persist the data is serialized and needs to be deserialized before processing.
@sayantabarik4252 9 months ago
@easewithdata Got it. Thank you for the explanation! I went through all the videos in this playlist and really loved them!
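The trade-off discussed in this thread can be illustrated with a small plain-Python analogy. This is only a sketch and not how Spark stores blocks internally: the class names `DeserializedCache` and `SerializedCache` are hypothetical, and `pickle` stands in for Spark's serializer. The point it shows is that a serialized store has to decode its bytes on every read, while a deserialized store hands back live objects directly.

```python
import pickle


class DeserializedCache:
    """Hypothetical store that keeps live Python objects; reads need no decoding."""

    def __init__(self):
        self._store = {}

    def put(self, key, rows):
        self._store[key] = rows

    def get(self, key):
        return self._store[key]


class SerializedCache:
    """Hypothetical store that keeps pickled bytes; every read pays a decode step."""

    def __init__(self):
        self._store = {}

    def put(self, key, rows):
        # Serialize once when the data is stored.
        self._store[key] = pickle.dumps(rows)

    def get(self, key):
        # Deserialize on every read before the rows can be processed.
        return pickle.loads(self._store[key])


# Assumed sample rows: (order_id, order_status)
rows = [(1, "CLOSED"), (2, "OPEN"), (3, "CLOSED")]

deser = DeserializedCache()
ser = SerializedCache()
deser.put("orders", rows)
ser.put("orders", rows)

# Both stores return the same data; only the read path differs.
closed_from_deser = [r for r in deser.get("orders") if r[1] == "CLOSED"]
closed_from_ser = [r for r in ser.get("orders") if r[1] == "CLOSED"]
print(closed_from_deser == closed_from_ser)
```

In Spark the serialized form is generally the more compact one and the deserialized form the cheaper one to read, which is the trade-off the reply above points at.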
@VikasChavan-v1c 7 months ago
Consider an orders DataFrame with 25 million records. You apply a projection and a filter and cache the result as shown below:
orders_df.select("order_id","order_status").filter("order_status == 'CLOSED'").cache()
Now you execute the statements below:
1) orders_df.select("order_id","order_status").filter("order_status == 'CLOSED'").count()
2) orders_df.filter("order_status == 'CLOSED'").select("order_id","order_status").count()
3) orders_df.select("order_id").filter("order_status == 'CLOSED'").count()
4) orders_df.select("order_id","order_status").filter("order_status == 'OPEN'").count()
Please answer the following questions:
Question 1) At what point in time is the data cached (partially or completely)?
Question 2) Which queries are served from the cache, and which have to go to disk? Please explain.
@easewithdata 6 months ago
As you have already written the complete query, why not try it out and share the results with us?
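One way to try this out is sketched below. It is a minimal experiment under stated assumptions, not a definitive answer: a three-row in-memory DataFrame stands in for the 25 million orders, Spark runs in local mode, and the helper names `plan_uses_cache` and `explain_to_string` are hypothetical. Looking for an `InMemoryTableScan` node in the printed physical plan is a heuristic for "this query reads from the cache".

```python
import contextlib
import io


def plan_uses_cache(plan_text: str) -> bool:
    """Heuristic: plans that read cached data contain an InMemoryTableScan node."""
    return "InMemoryTableScan" in plan_text


def explain_to_string(df) -> str:
    """Capture the text that DataFrame.explain() prints to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.explain()
    return buffer.getvalue()


def main() -> None:
    # Imported here so the helpers above can be used without a Spark installation.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("cache-check").getOrCreate()

    # Tiny stand-in for the orders DataFrame (assumed sample data).
    orders_df = spark.createDataFrame(
        [(1, "CLOSED"), (2, "OPEN"), (3, "CLOSED")],
        ["order_id", "order_status"],
    )

    # cache() is lazy: it only marks this plan for caching.
    cached_df = (
        orders_df.select("order_id", "order_status")
        .filter("order_status == 'CLOSED'")
        .cache()
    )
    cached_df.count()  # an action is needed before any data is actually cached

    # The four queries from the comment, without the final count().
    queries = {
        "1": orders_df.select("order_id", "order_status").filter("order_status == 'CLOSED'"),
        "2": orders_df.filter("order_status == 'CLOSED'").select("order_id", "order_status"),
        "3": orders_df.select("order_id").filter("order_status == 'CLOSED'"),
        "4": orders_df.select("order_id", "order_status").filter("order_status == 'OPEN'"),
    }
    for label, df in queries.items():
        print(f"query {label} reads from cache:", plan_uses_cache(explain_to_string(df)))

    cached_df.unpersist()
    spark.stop()
```

Calling `main()` in an environment with PySpark installed prints one line per query. The Storage tab of the Spark UI shows what fraction of the cached DataFrame is materialized, which helps with the partial vs. complete caching question.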