Check out the Big Data course details here: trendytech.in/?referrer=youtube_bd22
@gebrilyoussef6851 (a month ago)
Sumit, you are the master trainer of Big Data. Thank you so much for all the efforts you made.
@anuragdubey5898 (2 years ago)
Very informative session. I learnt a lot and cleared my doubts as well. The easy, simplified explanation makes this one of the best videos on using AWS for big data. Thanks for the session.
@NaturalPro100 (4 years ago)
This really cleared some basics I needed for starting Spark on AWS. The content and explanation are to the point. Thanks for sharing, Sumit.
@NIHAL960 (4 years ago)
S3: Amazon's storage service.
On-demand instance: available on demand.
Spot instance: available at a steep discount on a temporary basis; can be taken back with a 2-minute warning.
Reserved instance: available at a discount (compared to on-demand) for a long commitment, such as a year.
Types of nodes:
1. Master node: manages the cluster; a single EC2 instance.
2. Core node: each cluster has one or more; hosts data and runs tasks.
3. Task node: can only run tasks, not store data; needed if the application is compute-heavy. Spot instances are a good choice for these.
Clusters:
1. A transient cluster terminates automatically.
2. A long-running cluster requires manual termination.
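The cluster layout summarized in this comment can be sketched as a boto3-style `run_job_flow` request. This is a hedged sketch only: the cluster name, EMR release label, and instance types are placeholders, not details from the video.

```python
# Sketch of a transient EMR cluster with the node types described above.
# All names, release labels, and instance types are placeholders.
request = {
    "Name": "transient-wordcount-cluster",
    "ReleaseLabel": "emr-6.15.0",          # placeholder EMR release
    "Applications": [{"Name": "Spark"}],
    "Instances": {
        "InstanceGroups": [
            # Master node: manages the cluster, a single instance.
            {"InstanceRole": "MASTER", "Market": "ON_DEMAND",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            # Core nodes: host HDFS data and run tasks.
            {"InstanceRole": "CORE", "Market": "ON_DEMAND",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
            # Task nodes: run tasks only, store no HDFS data, so spot
            # instances (cheap, reclaimable with a 2-minute warning) fit well.
            {"InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Transient cluster: terminates automatically when its steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
}

# On a real AWS account this would be submitted with:
# import boto3
# boto3.client("emr").run_job_flow(**request)
```

The actual API call is left commented out so the sketch stays self-contained; setting `KeepJobFlowAliveWhenNoSteps` to `True` instead would give the long-running cluster that needs manual termination.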
@sumitmittal07 (4 years ago)
Nice summarization, thanks much.
@sampaar (3 years ago)
Amazing presentation. Better than many of the udemy courses that I have come across.
@mdabdulmujeebmalik422 (4 years ago)
Excellent video on AWS and how to run a Spark job on AWS. Amazing! Thank you so much for the video, and kudos to the instructor.
@udaynayak4788 (2 years ago)
One of the best informative sessions, thank you so much for sharing.
@datoalavista581 (2 years ago)
Thank you for sharing
@VallabhGhodkeB (2 years ago)
Top stuff this is. Just got started, way to go!
@amitbajpai6209 (4 years ago)
Best video to get an overall understanding of AWS EMR. It was really helpful 😊 Kudos to the instructor! Liked 👍 and subscribed. Hoping for more such videos.
@sumitmittal07 (4 years ago)
Glad it was helpful!
@kashamp9388 (2 years ago)
Best session ever. Concise.
@sumitmittal07 (2 years ago)
Glad you are liking my teaching :)
@divakarluffy3773 (2 years ago)
One video resolved all my doubts, thanks!
@sumitmittal07 (2 years ago)
Happy to hear that your doubts are resolved
@laxmisuresh (3 years ago)
Very meaningful presentation. Explained at the right pace and with proper content.
@swaroupbanikk4444 (2 years ago)
BEST
@gauravrai4398 (4 years ago)
Very lucid and concise explanation. A job well done!
@sumitmittal07 (4 years ago)
Thank you Gaurav
@ririraman7 (2 years ago)
beautiful
@vairammoorthy6665 (4 years ago)
best tutorial for AWS EMR
@sridharreddy9605 (4 years ago)
Very clear explanation, thank you for your time.
@subratakr5353 (4 years ago)
Thanks for the lovely presentation! Had 2 questions though: 1) When you say you are running code on the master, do you mean the namenode of the cluster? Where is the namenode for this cluster? 2) Since the data is stored in S3, does EMR copy it to HDFS so that Spark eventually reads from HDFS? In which HDFS path is the data stored?
@vijeandran (3 years ago)
Answer 1: Here the namenode, driver node, edge node, and master node are all the same. Answer 2: As soon as you create one master and 2 slave nodes, the slave nodes' hard disks behave like HDFS, and Spark fetches files from these disks and runs them in the slave nodes' memory. The 1 master and 2 slaves act as the processing unit inside AWS; S3, also part of AWS, is the storage unit. When you want to process data, you copy the data file from the storage part (S3) to the processing part (HDFS), where HDFS lives on the 2 slave nodes you created. Then you can run the Spark jar file.
@sohailhosseini2266 (2 years ago)
Thanks for the video!
@pankajnakil6173 (3 years ago)
Very useful and good explanation.
@dineshughade6570 (a year ago)
Nice explanation. Can we have a PDF of this video?
@RaviKumar-oy5jq (4 years ago)
Excellent session.
@ramprasadbandari8195 (4 years ago)
Excellent explanation and very useful Info!!
@sumitmittal07 (4 years ago)
Glad you liked it
@puneetnaik8719 (4 years ago)
Great explanation, sir. Thanks for the video.
@shilparathore8849 (4 years ago)
Very well explained, thanks for sharing.
@Dyslexic_Neuron (4 years ago)
Very good explanation. Can you make a video on Spark shuffle and its issues?
@vijeandran (3 years ago)
Neat explanation, and a very informative video.
@AparnaBL (4 years ago)
Moreover, HDFS data is ephemeral, right? If you want the data to persist even after the cluster is terminated, you can use S3.
@sumitmittal07 (4 years ago)
Absolutely. You can see the same thing mentioned around the 24th minute of the session.
@AparnaBL (4 years ago)
@sumitmittal07 yeah, at 22:36
@gaurav1825 (4 years ago)
Sir, please give some guidance on AWS EMR with Apache Flink and Hudi.
@dharmeswaranparamasivam5498 (4 years ago)
Very good session. Thanks for doing this.
@keyursolanki (11 months ago)
Will the EMR cluster have access to S3 by default?
@BinduReddy-n1q (a year ago)
How do I save the wordcount output in HDFS and also in S3?
@sancharighosh8204 (3 years ago)
Can you make some tutorials on Databricks?
@amulmgr (3 years ago)
Thank you very much for the video.
@fzgarcia (4 years ago)
Thank you, nice presentation!
@diptyojha174 (4 years ago)
Very nice explanation
@sumitmittal07 (4 years ago)
Thank you, Dipty.
@piby1802 (4 years ago)
Really nice presentation! Thank you!
@sumitmittal07 (4 years ago)
Glad you liked it!
@Naveen-xi7os (4 years ago)
It was an awesome session.
@anuj3922 (3 years ago)
EMR clusters are billed at an hourly rate. If I don't use the cluster, do I still have to pay for it? Say I build it just for learning purposes and come back to it as my learning progresses?
@techtransform (3 years ago)
Excellent Explanation :)
@fzgarcia (4 years ago)
Do you know if I can run an EMR cluster like this on a free-tier account? Even if I can only run t3.micro in the free tier, can I manually create a cluster with a minimum of 3 t3.micro nodes, or more? Thanks.
@rrjishan (3 years ago)
As we say, on AWS we can shut down the cluster after computation and the data will be saved in S3. So are clusters only responsible for computing data? Isn't data also stored in clusters? Getting a bit confused, please clear it up.
@priyabhatia4107 (4 years ago)
Great content!!
@SpiritOfIndiaaa (4 years ago)
Thanks, but why is HDFS data gone when the cluster shuts down? Since HDFS is persistent while the cluster is up, it would be automatically available, right?
@vijeandran (3 years ago)
When you start the cluster you are creating three instances: one for the master and two for the datanodes. These nodes exist only for that session, because they are virtual; once you terminate the cluster, the 1 master and 2 slave instances are killed, and the data present in HDFS is deleted with them. As Sumit said, if you want the data to stay in HDFS you have to keep your cluster running continuously, and Amazon will bill you more for that continuous usage.
@rajsekhargada9212 (2 years ago)
I think S3 is not distributed storage
@sumitmittal07 (2 years ago)
It's an object store, but in this scenario it's a replacement for distributed storage and serves a similar use case.
@satishj801 (2 years ago)
At 1:14:30 he downloaded the jar from S3, but he didn't copy it to HDFS the way he copied book-data.txt, and he mentioned he is running the jar from HDFS, not from S3. Yet it's the same step as at 1:06:52. I'm a bit confused at that point. If someone has understood, please drop a reply.
@user-co8oc1rm5w (2 years ago)
He kept the jar file in the root path of the cluster, but the file to be processed he kept in HDFS, i.e. the directory he created named '/data'. That's why he mentioned he is running the job against HDFS: the input file, book-data.txt, was downloaded into HDFS instead of being read from S3. He then changed the file location in the Scala code, recreated the jar, placed that jar in S3 first, downloaded it from S3 to the master node, and executed the Spark job, which processed book-data.txt from HDFS, not from S3.