Another little detail: Spark can delegate the serving of shuffle data to the Hadoop YARN NodeManager process (the external shuffle service), which can keep serving that data even after the Spark worker process terminates. This allows for more agile Spark clusters within a Hadoop cluster. With the move to Kubernetes container hosting, however, Spark serves the shuffle data itself and assumes the process serving it won't terminate. This is potentially a problem when deploying on spot-priced cloud VMs, where the assumption that "your server is fairly reliable" no longer holds.
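
As a rough illustration of the YARN side of this, here is a minimal sketch (not from the original post) of the Spark settings that hand shuffle-file serving over to the NodeManager. It assumes the cluster's NodeManagers have already been configured with the Spark shuffle auxiliary service; the app name is a placeholder.

```scala
// Minimal sketch: ask Spark to use the external shuffle service on YARN.
// Assumes the NodeManagers run the spark_shuffle auxiliary service.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  // Hypothetical application name, purely illustrative.
  .appName("shuffle-service-example")
  // Delegate serving of shuffle files to the YARN NodeManager,
  // so they stay available after an executor exits.
  .config("spark.shuffle.service.enabled", "true")
  // Dynamic allocation is the usual reason to want this: executors
  // can be released while their shuffle output remains readable.
  .config("spark.dynamicAllocation.enabled", "true")
  .getOrCreate()
```

On Kubernetes there is no equivalent NodeManager process to hand the files to, which is why the executor itself has to stay up to serve them.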