Transient Cluster on AWS from Scratch using boto3 | Trigger Spark job from AWS Lambda

Views: 8,151

Knowledge Amplifier

Comments: 48
@nishantsingh8477 2 years ago
Your transient node was amazing. This would be an underrated video.
@KnowledgeAmplifier1 2 years ago
Thank you Nishant Singh! Happy Learning :-)
@VinayGautam-s8s 3 months ago
How come this video has so few views? This is so well explained. Salute to your effort, brother.
@KnowledgeAmplifier1 3 months ago
Thank you for your kind words @user-ee6hu4ec5v! Happy Learning
@singhsVP 11 months ago
Great explanation and demo, thank you.
@KnowledgeAmplifier1 11 months ago
You are welcome! If you want to implement the same using Airflow, have a look at this video too: kzbin.info/www/bejne/nnyXnIOsf8aqrJosi=FpmG3YjcAp-nB8Hc Happy Learning!
@parasgadhiya5356 2 years ago
What an effort, Brother... Really Appreciate it.
@KnowledgeAmplifier1 2 years ago
Thank you PARAS GADHIYA! Happy Learning
@joseneto6558 1 year ago
Thanks a lot. You Indian guys are the smartest in the world. I just have a question: how can I choose to use Spot instances when launching the EMR cluster from the Lambda trigger code? Is there a parameter for that?
@alexbessette232 2 years ago
Awesome video, very helpful. My only note is that when initializing the client, all you need is boto3.client('emr'); AWS will automatically do the rest.
@KnowledgeAmplifier1 2 years ago
Yes, correct, Alex. Using Lambda roles, we can avoid the access key and secret key part.
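A minimal sketch of the point made in this exchange, assuming the Lambda execution role already has permission to call EMR. The cluster name, release label, instance types, and S3 paths are placeholders chosen for illustration and are not taken from the video; subnet and key pair settings are left out.

```python
def make_emr_client():
    """Create an EMR client with no explicit access key or secret key.

    Inside Lambda, boto3 reads the execution role's temporary credentials
    and the function's region from the environment.
    """
    import boto3  # bundled with the AWS Lambda Python runtimes

    return boto3.client("emr")


def build_transient_cluster_request(script_s3_uri, log_s3_uri):
    """Build keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": "transient-spark-cluster",  # placeholder name
        "ReleaseLabel": "emr-6.15.0",       # assumed release label
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False makes the cluster transient: it shuts down
            # once all of its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "spark-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "--deploy-mode", "cluster",
                             script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def lambda_handler(event, context):
    # Placeholder S3 locations; replace with your own bucket and script.
    request = build_transient_cluster_request(
        "s3://my-bucket/scripts/job.py", "s3://my-bucket/emr-logs/"
    )
    response = make_emr_client().run_job_flow(**request)
    return {"cluster_id": response["JobFlowId"]}
```

Keeping the request in a plain function makes it easy to inspect or unit test without touching AWS; only lambda_handler talks to the service.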
@amarnadhchithari8992 2 years ago
very good explanation.....thank you
@KnowledgeAmplifier1 2 years ago
You are welcome Amarnadh Chithari! Happy Learning :-)
@raghavendrahs7695 2 years ago
Thank you so much. This is very helpful. I have one question regarding the job flow. How is Lambda getting triggered automatically after we place the file into the S3 bucket? Can we modify this to just run the job only once a day and process all the files available in the S3 bucket?
@KnowledgeAmplifier1 2 years ago
Hello Raghavendra H S, here are the answers to your questions.
Ques. 1) How is Lambda getting triggered automatically after we place the file into the S3 bucket?
Ans.) Amazon S3 can send an event to a Lambda function when an object is created or deleted, and the Lambda code is executed in response to that event trigger. For details, you can check this link: docs.aws.amazon.com/lambda/latest/dg/with-s3.html
Ques. 2) Can we modify this to just run the job only once a day and process all the files available in the S3 bucket?
Ans.) Yes, you can do that by scheduling the Lambda using CloudWatch or EventBridge. For details, you can check this video: kzbin.info/www/bejne/aJSvqIaKqKetgLM
Hope this will be helpful! Happy Learning 😊✌
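To illustrate the first answer, here is a small sketch of how a handler might read the bucket and object key out of an S3 event. The function name s3_objects_from_event and the returned dictionary are illustrative, not from the video; the event layout follows the S3 notification format described in the linked AWS documentation.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces sent as '+'.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    # Turn each uploaded object into an s3:// URI the Spark job can read.
    uris = [f"s3://{bucket}/{key}" for bucket, key in s3_objects_from_event(event)]
    return {"input_uris": uris}
```

A scheduled invocation carries no "Records" entry, so this handler would return an empty list; in that case the Spark job would have to be pointed at a whole bucket prefix instead of a single object.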
@raghavendrahs7695 2 years ago
@KnowledgeAmplifier1 Thanks for your quick response. The details are very helpful, and it is very clear to me now.
@KnowledgeAmplifier1 2 years ago
@raghavendrahs7695 Glad to know the resources were helpful to you! Happy Learning :-)
@manojt7012 3 years ago
Hey bro, the content was really useful. Thanks a lot.
@KnowledgeAmplifier1 3 years ago
Glad it helped! Happy Learning :-)
@atulbisht9019 1 year ago
Thank you so much for this video.
@KnowledgeAmplifier1 1 year ago
Most welcome 😊
@thekamaalpashashow2091 2 years ago
Great video. Top notch content. Thank you so much for these!
@KnowledgeAmplifier1 2 years ago
Thank you, The Kamaal Pasha Show, for your inspiring comment! Happy Learning
@duskbbd 1 year ago
How can I run two spark-submit jobs one after the other on the same cluster? I don't want to create the cluster again for the second spark-submit.
@KnowledgeAmplifier1 1 year ago
Hello @duskbbd, you can submit two Spark jobs sequentially on the same EMR cluster using the following approach:
1. Submit the first Spark job using 'spark-submit' on your EMR cluster.
2. Poll its status to check whether it has completed. You can use the AWS CLI, AWS SDKs, or EMR APIs to monitor the status of the EMR step associated with your Spark job.
3. Once you receive confirmation that the first job has completed, submit the second Spark job using 'spark-submit' on the same EMR cluster.
This way, you can run multiple Spark jobs one after the other without creating a new EMR cluster for each job. To implement this, you can use AWS Step Functions or Airflow for orchestration. For details, you can refer to this video: kzbin.info/www/bejne/nnyXnIOsf8aqrJosi=WqUMzesnPzxiZjWi Happy Learning
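One way to sketch the approach above is to submit both jobs as EMR steps on the same cluster. With the default step concurrency of 1, EMR runs steps in the order they were added, so the second spark-submit starts only after the first has finished. The helper names, step names, and S3 paths below are illustrative assumptions; the client is passed in so the functions can be exercised without AWS access.

```python
def spark_step(name, script_s3_uri, *script_args,
               action_on_failure="CANCEL_AND_WAIT"):
    """Describe one spark-submit invocation as an EMR step."""
    return {
        "Name": name,
        # CANCEL_AND_WAIT cancels the remaining steps if this one fails.
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *script_args],
        },
    }


def submit_in_order(emr_client, cluster_id, steps):
    """Add steps to an existing cluster and return their step IDs."""
    response = emr_client.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)
    return response["StepIds"]


def wait_for_step(emr_client, cluster_id, step_id):
    """Block until a step completes, using boto3's built-in EMR waiter."""
    waiter = emr_client.get_waiter("step_complete")
    waiter.wait(ClusterId=cluster_id, StepId=step_id)
```

For a transient cluster, the same step dictionaries can be passed in the Steps list of run_job_flow instead, so that the cluster terminates once both jobs are done. Explicit polling, as in wait_for_step, is mainly needed when the second job's arguments depend on the first job's outcome.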
@ipsitachatterjee2173 2 years ago
Hi Sir, do you give any classes on AWS and its integration with other services?
@KnowledgeAmplifier1 2 years ago
Hello Ipsita, no, I don't hold any private classes, but you can check this link for the Complete Snowflake with AWS Course: doc.clickup.com/37466271/d/h/13qc4z-104/d4346819bd8d510 And for Data Engineering with AWS, you can check this playlist: kzbin.info/aero/PLjfRmoYoxpNopPjdACgS5XTfdjyBcuGku Hope this will be helpful! Happy Learning :-)
@ipsitachatterjee2173 2 years ago
@KnowledgeAmplifier1 Great, I might post some queries if I face any issues in real time.
@aniketjadhav32 1 year ago
Great tutorial
@vishalrana302 2 years ago
Hi, how can I use the BootstrapActions parameter in this Lambda code?
@uvannarayanan1048 2 years ago
"ClientError: An error occurred (ValidationException) when calling the RunJobFlow operation: Invalid InstanceProfile: EMR_EC2_DefaultRole" I am getting this error in the CloudWatch log events. What should I do?
@KnowledgeAmplifier1 2 years ago
You should have EMR_EC2_DefaultRole and EMR_DefaultRole in the Roles section of your AWS account. Please make sure they are there, or else create the roles. As a shortcut, you can launch one EMR cluster manually; AWS will automatically generate the roles. Then terminate the cluster and use those roles when launching the transient cluster :-)
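A quick way to check for this before launching a cluster is sketched below, assuming an IAM client created with boto3.client("iam") and permission to read IAM. The helper name missing_emr_defaults is made up for illustration. The "Invalid InstanceProfile" error refers to the instance profile, so the sketch checks for that as well as the service role.

```python
def missing_emr_defaults(iam_client):
    """Return the default EMR role and instance profile that do not exist."""
    missing = []
    try:
        iam_client.get_role(RoleName="EMR_DefaultRole")
    except iam_client.exceptions.NoSuchEntityException:
        missing.append("role:EMR_DefaultRole")
    try:
        # JobFlowRole in run_job_flow names an instance profile, not just a role.
        iam_client.get_instance_profile(InstanceProfileName="EMR_EC2_DefaultRole")
    except iam_client.exceptions.NoSuchEntityException:
        missing.append("instance-profile:EMR_EC2_DefaultRole")
    return missing
```

If anything is reported missing, running the AWS CLI command `aws emr create-default-roles` is an alternative to launching a throwaway cluster from the console.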
@uvannarayanan1048 2 years ago
@KnowledgeAmplifier1 Thank you for your immediate reply. Now I'm getting another error: "when calling the RunJobFlow operation: User: arn:aws:iam::671876216699:user/caplambda is not authorized to perform: elasticmapreduce:RunJobFlow on resource: arn:aws:elasticmapreduce:us-east-1:671876216699:cluster/* because no identity-based policy allows the elasticmapreduce:RunJobFlow action". Here caplambda is the user that I created to get the access key ID and secret access key. I have given the caplambda user S3 full access, EMR full access, and CloudWatch full access. Please help me, sir.
@uvannarayanan1048 2 years ago
@KnowledgeAmplifier1 Actually, after searching on Google, I found that we also need to add the "AmazonElasticMapReduce" policy. Now my cluster is running, but I'm not getting the output. I tried PySpark commands, for example:
customer_data = spark.read.format("csv").option("inferSchema","true").option("header","true").load(s3_location)
customer_data = customer_data.withColumn("transactionAmount", customer_data.transactionAmount.substr(2,6))
from pyspark.sql.types import FloatType
customer_data = customer_data.withColumn("transactionAmount", customer_data["transactionAmount"].cast(FloatType()))
I gave commands like this and the cluster failed. Is there any other way I can give PySpark commands, or can I upload a Python notebook (.ipynb) file? If you have any suggestions, please let me know.
@KnowledgeAmplifier1 2 years ago
@uvannarayanan1048 You can upload the PySpark code to S3 as I demonstrated in the video. The best way to run the code in a transient cluster is to first do regression testing of the same code in a dummy persistent EMR cluster and make sure it works as expected, and then try running it in the transient cluster.
@joseneto6558 1 year ago
Fact is that the Ec2KeyName must be the name of an EC2 key pair that you created beforehand.
@joseneto6558 1 year ago
Is anyone else seeing clusters take too long to start? My clusters are stuck for more than 15 minutes in the "Master: bootstrapping" state. How can I fix that?
@Gaurav-wy2wm 2 years ago
Hey, I have tried this but am getting the error "Provided region_name 'us_west_2' doesn't match a supported format".
@Gaurav-wy2wm 2 years ago
I have even put in the correct Ec2SubnetId.
@simonzhang8668 2 years ago
nice!
@KnowledgeAmplifier1 2 years ago
Thank you Simon Zhang! Happy Learning :-)
@umangsinghal9320 2 years ago
I followed your code step by step. The cluster is running, but the job fails with this error: Exception in thread "main" org.apache.spark.SparkException: Application application_1666264815714_0001 finished with failed status. Could you help us understand why this is happening?
@jatinkr300 10 months ago
If someone knows the answer, please share, as I am facing the same issue.