Hi Ankush, thanks for the video. I have one question. Suppose I use persist(StorageLevel.DISK_ONLY): how does that improve Spark application performance? If the application needs this data again, it has to read it from disk, which means more disk I/O. As we all know, Spark avoids unnecessary disk I/O, and that is the main reason Spark is faster than MapReduce.
@learnomate 4 years ago
Simple example: you may have one relatively large RDD, rdd1, and one smaller RDD, rdd2, and you want to persist both of them. If you apply persist(MEMORY_AND_DISK) to both and they do not fit in memory together, partitions of both may be spilled to disk, resulting in slower reads. But you can take a different approach: persist rdd1 with DISK_ONLY. That may leave enough memory free to keep rdd2 entirely in memory with cache(), so you will be able to read it faster.
@DS-bo5wu 4 years ago
@@learnomate Thanks for the clarification
@pardeep657 4 years ago
Hi Ankush, how long does cached data survive in memory? Is it automatically removed when the session ends?