*Introduction to Spark Internals* by UC Berkeley AMPLab member Matei Zaharia, given Dec 18th, 2012 at Yahoo in Sunnyvale, CA. The presentation is 1 hour 14 minutes long.

*Summary*
2:32 Spark Project Goals
4:48 Spark Code Base Size
5:59 Code Base Breakdown by Module
8:45 Components
10:41 Example Job
12:03 RDD Graph
14:43 Data Locality
15:48 In More Detail: Life of a Job
16:15 Scheduling Process
27:11 RDD Abstraction
27:52 RDD Interface
29:34 Example: HadoopRDD
30:28 Example: FilteredRDD
31:32 Example: JoinedRDD
32:47 Discussion of source code
38:25 Dependency Types: Narrow and Wide
39:49 DAG Scheduler
40:43 Discussion of source code
42:05 Scheduler Optimizations
45:39 Task Details
51:07 Worker
52:00 Other Components: BlockManager
52:16 Other Components: CommunicationsManager
52:24 Other Components: MapOutputTracker
52:42 Extending Spark
52:53 Extension Points: RDD, SchedulerBackend, spark.serializer
53:38 What People Have Done
53:39 Possible Future Extensions
54:15 An Exercise to Prepare for Extending Spark
54:50 How to Contribute
54:52 Development Process: Issue tracking, developer list, master on GitHub; follow the code style and add tests

*Spark Documentation*
spark-project.org/documentation/

*Spark Downloads and GitHub URL*
spark-project.org/downloads/
@vikassinghalld · 7 years ago
Very good presentation.
@TechGeniusMinds · 4 years ago
A great presentation.
@jorgemeira4947 · 9 years ago
Nice presentation! Thanks!
@anything413 · 9 years ago
Can you please upload a newer video explaining the changes, if any, that have been incorporated in Spark? Thanks.
@Stoney-g1o · 9 years ago
+jayant singh
1. You can find the latest videos at the YouTube "Apache Spark" home page: kzbin.info New videos from the European Spark Summit 2015 were posted a few days ago.
2. For basic user tutorials that include videos and course materials, you can use the Spark Summit 2014 materials here: spark-summit.org/2014/training
3. If you are trying to contribute to the project, you should start with documentation and bug fixes. Make sure you subscribe to the user and developer mailing lists here: spark.apache.org/community.html
4. A recent video about the Apache Spark test and build system is here: kzbin.info/www/bejne/bafGg6CKe7uaaqc
@anything413 · 9 years ago
+Stoney Vintson Thanks
@techonlyguo9788 · 9 years ago
mark
@mazenezzeddine8319 · 9 years ago
Matei asserts: "Whenever we ship a task, we actually ship the RDD objects too, and then we call the compute method on them. ... The way we ship a task is by sending a Scala closure object, which is essentially like a Java object with pointers to all of these guys." Can someone kindly elaborate on "sending a Scala closure object which is essentially like a Java object with pointers to all of these guys"? Thanks.
@Stoney-g1o · 9 years ago
+Mazen Ezzeddine This video is about three years old.
1. You might watch Aaron Davidson's internals video, which discusses the optimization of the graph into pipeline stages that are deployed to different workers as tasks (Spark 1.1, July 2014): kzbin.info/www/bejne/mp6vYYFppsuGmZo
2. Olivier Girardot's study guide for developer certification also has some good information on this topic (Nov 2015).
@mazenezzeddine8319 · 9 years ago
+Stoney Vintson Though this video is three years old, it is one of the most valuable resources on Spark internals, going into much more detail than the video you mentioned, which I have already watched. I will check the other resource you pointed me to. Thanks.
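The mechanism Mazen asks about above can be sketched outside of Spark. A Scala function literal compiles to an object whose fields hold references to everything the closure captured, so the scheduler can serialize that object (fields and all, including the RDDs it points to) and a worker can deserialize and invoke it. The following is a minimal illustration in plain Java, not Spark's actual code path; the class name `ClosureDemo` and the single captured `threshold` variable are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureDemo {
    public static void main(String[] args) throws Exception {
        int threshold = 10; // a local variable the closure captures

        // The "closure": a function object whose fields hold what it
        // captured (here, a copy of `threshold`). The intersection cast
        // makes the lambda serializable.
        Function<Integer, Boolean> filter =
            (Function<Integer, Boolean> & Serializable) x -> x > threshold;

        // Scheduler side: serialize the closure object into bytes,
        // as if shipping a task.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(filter);
        }

        // Worker side: rebuild the object from the bytes and invoke it.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()))) {
            @SuppressWarnings("unchecked")
            Function<Integer, Boolean> shipped =
                (Function<Integer, Boolean>) in.readObject();
            System.out.println(shipped.apply(42)); // prints: true
        }
    }
}
```

The "pointers to all of these guys" in the quote are exactly these fields: in Spark's case they reference the parent RDD objects, which is why the RDDs get shipped along with the task.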