Your transient node demo was amazing. This is such an underrated video.
@KnowledgeAmplifier1 · 2 years ago
Thank you Nishant Singh! Happy Learning :-)
@VinayGautam-s8s · 3 months ago
How come this video has so few views? It is so well explained. Salute to your effort, brother.
@KnowledgeAmplifier1 · 3 months ago
Thank you for your kind words @user-ee6hu4ec5v! Happy Learning
@singhsVP · 11 months ago
Great Explanation and demo, thank you
@KnowledgeAmplifier1 · 11 months ago
You are welcome! If you want to implement the same using Airflow, have a watch of this video too -- kzbin.info/www/bejne/nnyXnIOsf8aqrJosi=FpmG3YjcAp-nB8Hc Happy Learning!
@parasgadhiya5356 · 2 years ago
What an effort, brother... really appreciate it.
@KnowledgeAmplifier1 · 2 years ago
Thank you PARAS GADHIYA! Happy Learning
@joseneto6558 · 1 year ago
Thanks a lot. You Indian guys are among the smartest in the world. I just have a question: how can I choose to use Spot instances when launching the EMR cluster from the Lambda trigger code? Is there a parameter for that?
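For reference, a minimal sketch of how Spot capacity can be requested in the boto3 run_job_flow call: the Market field (and the optional BidPrice) on each instance group controls the purchasing option. Instance types, counts, and the bid price below are hypothetical examples, not values from the video.

```python
# Sketch of the Instances section for run_job_flow, requesting Spot capacity.
# Instance types, counts, and the bid price are hypothetical examples.
instance_groups = [
    {
        "Name": "Master",
        "Market": "ON_DEMAND",      # keep the master on-demand for stability
        "InstanceRole": "MASTER",
        "InstanceType": "m5.xlarge",
        "InstanceCount": 1,
    },
    {
        "Name": "Core",
        "Market": "SPOT",           # request Spot capacity for core nodes
        "BidPrice": "0.10",         # optional; omit to bid up to on-demand price
        "InstanceRole": "CORE",
        "InstanceType": "m5.xlarge",
        "InstanceCount": 2,
    },
]

# Passed as part of the cluster launch call, e.g.:
# emr_client.run_job_flow(..., Instances={"InstanceGroups": instance_groups, ...})
```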
@alexbessette232 · 2 years ago
Awesome video, very helpful. Only note: when initializing the client, all you need is boto3.client('emr') -- AWS will automatically do the rest.
@KnowledgeAmplifier1 · 2 years ago
Yes, correct Alex -- using Lambda roles we can avoid the access key / secret key part.
@amarnadhchithari8992 · 2 years ago
Very good explanation... thank you!
@KnowledgeAmplifier1 · 2 years ago
You are welcome Amarnadh Chithari! Happy Learning :-)
@raghavendrahs7695 · 2 years ago
Thank you so much. This is very helpful. I have one question regarding the job flow. How is Lambda triggered automatically after we place the file into the S3 bucket? Can we modify this to run the job only once a day and process all the files available in the S3 bucket?
@KnowledgeAmplifier1 · 2 years ago
Hello Raghavendra H S,

Ques. 1) How is Lambda getting triggered automatically after we place the file into the S3 bucket?
Ans.) Amazon S3 can send an event to a Lambda function when an object is created or deleted, and based on that event trigger the Lambda code is executed. For details you can check this link: docs.aws.amazon.com/lambda/latest/dg/with-s3.html

Ques. 2) Can we modify this to just run the job only once a day and process all the files available in the S3 bucket?
Ans.) Yes, you can do that; you can schedule Lambda using CloudWatch or EventBridge. For details, you can check this video -- kzbin.info/www/bejne/aJSvqIaKqKetgLM

Hope this will be helpful! Happy Learning 😊✌
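As a small illustration of the event-trigger mechanics described above: an S3 object-created event delivers the bucket name and object key inside event['Records'], and the handler can read them directly. The bucket and key names in the sample event are hypothetical.

```python
def lambda_handler(event, context):
    # S3 object-created events carry the bucket name and object key
    # inside event["Records"]; one record per triggering object.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    print(f"New object: s3://{bucket}/{key}")
    return {"bucket": bucket, "key": key}

# Trimmed sample of the event shape S3 sends (names are hypothetical):
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-input-bucket"},
                "object": {"key": "incoming/data.csv"}}}
    ]
}
```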
@raghavendrahs7695 · 2 years ago
@@KnowledgeAmplifier1 Thanks for your quick response. These details are very helpful and it is all clear to me now.
@KnowledgeAmplifier1 · 2 years ago
@@raghavendrahs7695 Glad to know the resources were helpful to you ! Happy Learning :-)
@manojt7012 · 3 years ago
Hey bro, the content was really useful. Thanks a lot!
@KnowledgeAmplifier1 · 3 years ago
Glad it helped! Happy Learning :-)
@atulbisht9019 · 1 year ago
Thank you so much for this video
@KnowledgeAmplifier1 · 1 year ago
Most welcome 😊
@thekamaalpashashow2091 · 2 years ago
Great video. Top notch content. Thank you so much for these!
@KnowledgeAmplifier1 · 2 years ago
Thank you, The Kamaal Pasha Show, for your inspiring comment! Happy Learning
@duskbbd · 1 year ago
How do I pass 2 spark-submits one after another on the same single cluster? I don't want to create the cluster again for the 2nd spark-submit.
@KnowledgeAmplifier1 · 1 year ago
Hello @duskbbd, you can submit two Spark jobs sequentially on the same EMR cluster with the following approach:

1. Submit the first Spark job using spark-submit on your EMR cluster.
2. Poll its status to check whether it has completed -- you can use the AWS CLI, AWS SDKs, or EMR APIs to monitor the status of the EMR step associated with your Spark job.
3. Once the first job has completed, submit the second Spark job using spark-submit on the same cluster.

This way you can run multiple Spark jobs one after the other without creating a new EMR cluster for each job. To implement this, you can use AWS Step Functions or Airflow for orchestration; for details you can refer to this video -- kzbin.info/www/bejne/nnyXnIOsf8aqrJosi=WqUMzesnPzxiZjWi Happy Learning
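A simpler variant, for reference: EMR runs the steps of a cluster in order, so both spark-submit jobs can be added as two entries in the Steps list of the same cluster and they will execute one after the other without any polling. A sketch of such a Steps list, with hypothetical script locations:

```python
# Two spark-submit steps for one EMR cluster; EMR executes steps in order,
# so the second job starts only after the first finishes.
# The S3 script paths are hypothetical.
steps = [
    {
        "Name": "First Spark job",
        "ActionOnFailure": "TERMINATE_CLUSTER",   # or CONTINUE / CANCEL_AND_WAIT
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/scripts/job1.py"],
        },
    },
    {
        "Name": "Second Spark job",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/scripts/job2.py"],
        },
    },
]

# Passed as Steps=steps in run_job_flow; with KeepJobFlowAliveWhenNoSteps=False
# the transient cluster terminates itself after the last step.
```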
@ipsitachatterjee2173 · 2 years ago
Hi Sir, do you give any classes on AWS and its integration with other services?
@KnowledgeAmplifier1 · 2 years ago
Hello Ipsita, no, I don't run any private classes, but you can check this link for the Complete Snowflake with AWS Course -- doc.clickup.com/37466271/d/h/13qc4z-104/d4346819bd8d510 And for Data Engineering with AWS, you can check this playlist -- kzbin.info/aero/PLjfRmoYoxpNopPjdACgS5XTfdjyBcuGku Hope this will be helpful! Happy Learning :-)
@ipsitachatterjee2173 · 2 years ago
@@KnowledgeAmplifier1 Great, I might post some queries if I face any issues in real time.
@aniketjadhav32 · 1 year ago
Great tutorial
@vishalrana302 · 2 years ago
Hi, how can I use the BootstrapActions parameter in this Lambda code?
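For reference, bootstrap actions are passed to run_job_flow via the BootstrapActions parameter; each entry points at a script stored in S3 that runs on every node while the cluster is provisioning. A sketch, with a hypothetical script path and arguments:

```python
# BootstrapActions entry for run_job_flow. The script must live in S3 and
# runs on each node at provisioning time. Path and args are hypothetical.
bootstrap_actions = [
    {
        "Name": "Install Python dependencies",
        "ScriptBootstrapAction": {
            "Path": "s3://my-bucket/bootstrap/install_deps.sh",
            "Args": ["boto3", "pandas"],    # forwarded to the script
        },
    }
]

# Passed as: emr_client.run_job_flow(..., BootstrapActions=bootstrap_actions)
```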
@uvannarayanan1048 · 2 years ago
"ClientError: An error occurred (ValidationException) when calling the RunJobFlow operation: Invalid InstanceProfile: EMR_EC2_DefaultRole" I am getting this error in the CloudWatch log events. What should I do?
@KnowledgeAmplifier1 · 2 years ago
You should have EMR_EC2_DefaultRole & EMR_DefaultRole in the Roles section of your AWS account. Please make sure they exist, or else create the roles (running aws emr create-default-roles from the AWS CLI will create them for you). As a shortcut, you can also launch one EMR cluster manually -- AWS will automatically generate the roles -- then terminate the cluster and use those roles when launching the transient cluster :-)
@uvannarayanan1048 · 2 years ago
@@KnowledgeAmplifier1 Thank you for your immediate reply. Now I'm getting another error: "when calling the RunJobFlow operation: User: arn:aws:iam::671876216699:user/caplambda is not authorized to perform: elasticmapreduce:RunJobFlow on resource: arn:aws:elasticmapreduce:us-east-1:671876216699:cluster/* because no identity-based policy allows the elasticmapreduce:RunJobFlow action". Here caplambda is the user that I created to get the access key ID and secret access key; I have given it S3 full access, EMR full access and CloudWatch full access (for the caplambda user). Help me sir
@uvannarayanan1048 · 2 years ago
@@KnowledgeAmplifier1 Actually, after searching on Google, I got to know that we need to add the "AmazonElasticMapReduce" policy as well. Now my cluster is running but I'm not getting the output. I tried with PySpark commands, for example:

customer_data = spark.read.format("csv").option("inferSchema","true").option("header","true").load(s3_location)
customer_data = customer_data.withColumn("transactionAmount", customer_data.transactionAmount.substr(2,6))
from pyspark.sql.types import FloatType
customer_data = customer_data.withColumn("transactionAmount", customer_data["transactionAmount"].cast(FloatType()))

I gave commands like this and the cluster failed. Is there any other way that I can give PySpark commands, or can I upload a Python notebook (.ipynb) file? If you have any suggestions, please let me know.
@KnowledgeAmplifier1 · 2 years ago
@@uvannarayanan1048 You can upload the PySpark code to S3 as I demonstrated in the video. The best way to run code on a transient cluster is to first do regression testing of the same code on a dummy persistent EMR cluster, make sure it works as expected, and then run it on the transient cluster.
@joseneto6558 · 1 year ago
One thing to note: the Ec2KeyName must be the name of an EC2 key pair that you created beforehand.
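To illustrate, Ec2KeyName sits inside the Instances argument of run_job_flow and must name a key pair that already exists in the target region. The key pair name below is hypothetical:

```python
# Fragment of the Instances argument for run_job_flow. The key pair named by
# Ec2KeyName must already exist in the target region; the name is hypothetical.
instances = {
    "Ec2KeyName": "my-existing-keypair",   # create it in EC2 beforehand
    "KeepJobFlowAliveWhenNoSteps": False,  # transient: terminate after the steps
    "TerminationProtected": False,
}
```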
@joseneto6558 · 1 year ago
Is anyone else seeing clusters take too long to start? My clusters are stuck for more than 15 minutes in "Master: bootstrapping". How can I fix that?
@Gaurav-wy2wm · 2 years ago
Hey, I have tried this but I'm getting the error "Provided region_name 'us_west_2' doesn't match a supported format".
@Gaurav-wy2wm · 2 years ago
I have also put the correct Ec2SubnetId.
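A likely cause, for reference: AWS region identifiers are hyphen-separated, so region_name should be 'us-west-2' rather than 'us_west_2'. A quick sanity check (the regex is an illustrative approximation of the region-name shape, not an official AWS pattern):

```python
import re

# AWS region identifiers look like "us-west-2" or "ap-southeast-1";
# underscores ("us_west_2") do not match the expected format and are rejected.
def looks_like_region(name: str) -> bool:
    return re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", name) is not None

# e.g. boto3.client("emr", region_name="us-west-2")
```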
@simonzhang8668 · 2 years ago
nice!
@KnowledgeAmplifier1 · 2 years ago
Thank you Simon Zhang! Happy Learning :-)
@umangsinghal9320 · 2 years ago
I followed your code step by step; the cluster is also running, but the job is failing with this error: Exception in thread "main" org.apache.spark.SparkException: Application application_1666264815714_0001 finished with failed status. Could you help us understand why this is happening?
@jatinkr300 · 10 months ago
If someone knows the answer, please share, as I am also facing the same issue.