Amazing. Could you do a tutorial about using Step Functions with EMR Serverless? Thanks.
@dacort 1 year ago
EMR Serverless is not natively supported with Step Functions today, but there is a way to do it using Lambda functions. We have a blog post about it here, if it's helpful! aws.amazon.com/blogs/big-data/run-a-data-processing-job-on-amazon-emr-serverless-with-aws-step-functions/
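As a rough illustration of the Lambda-based approach, here is a minimal sketch of a Lambda handler that a Step Functions task state could invoke to start a job run through the boto3 `emr-serverless` client. The event field names (`application_id`, `execution_role_arn`, `entry_point`, `arguments`) and the helper `build_job_run_request` are hypothetical and not taken from the blog post; the blog post may structure this differently.

```python
def build_job_run_request(application_id, execution_role_arn, entry_point, args=None):
    """Build keyword arguments for the EMR Serverless StartJobRun call."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(args or []),
            }
        },
    }


def lambda_handler(event, context):
    """Entry point for a Lambda task invoked from a Step Functions state."""
    import boto3  # imported lazily; bundled with the Lambda Python runtime

    client = boto3.client("emr-serverless")
    request = build_job_run_request(
        event["application_id"],        # hypothetical event field
        event["execution_role_arn"],    # hypothetical event field
        event["entry_point"],           # hypothetical event field
        event.get("arguments"),         # hypothetical event field
    )
    response = client.start_job_run(**request)
    # Return the identifiers so a later state can poll get_job_run for completion.
    return {
        "applicationId": response["applicationId"],
        "jobRunId": response["jobRunId"],
    }
```

A second Lambda (or a wait-and-check loop in the state machine) would typically call `get_job_run` with the returned IDs until the job reaches a terminal state.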
@disrupcao4674 1 year ago
great video
@ManishBhandari-df2xf 1 year ago
Hi, great video! Could you please also show the steps to install external libraries on EMR, i.e. a replacement for bootstrap scripts?
@dacort 1 year ago
Assuming you're talking about EMR Serverless, there are a couple of different options. You can use custom images (docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-custom-image.html) to install OS-level dependencies. If you're just talking about PySpark dependencies, you can also bundle a virtual environment (docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-python-libraries.html).
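For the virtual environment option, a minimal sketch of the Spark settings involved might look like the following. It assumes the environment has already been packed into a `tar.gz` archive (for example with `venv-pack`, built on an OS compatible with the EMR Serverless image) and uploaded to S3; the helper name `venv_spark_submit_parameters` and the S3 path in the usage note are hypothetical. The linked user guide remains the authoritative source for the exact settings.

```python
def venv_spark_submit_parameters(archive_s3_uri, alias="environment"):
    """Return a sparkSubmitParameters string that points PySpark at a packed venv.

    archive_s3_uri: S3 location of the packed virtual environment (tar.gz).
    alias: directory name the archive is unpacked under on each node.
    """
    python_path = f"./{alias}/bin/python"
    confs = {
        # Ship the archive to the driver and executors and unpack it as ./<alias>
        "spark.archives": f"{archive_s3_uri}#{alias}",
        # Use the Python interpreter from the unpacked environment
        "spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON": python_path,
        "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())
```

The resulting string would be passed as `sparkSubmitParameters` when starting the job run, e.g. `venv_spark_submit_parameters("s3://example-bucket/artifacts/pyspark_venv.tar.gz")`.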
@srirajvasireddy2615 9 months ago
For PySpark dependencies like pandas or Kafka, how do I bundle a virtual environment? I'm new to Python, so any help or suggestions are greatly appreciated.
@viewermm1588 2 months ago
Does anyone here know if it is possible to use Spark to read multiple Parquet files from an S3 bucket (all in the "ABC" folder) and combine them into one Parquet file in a "DEF" folder in the same location? If so, what is the code? Thanks.
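One possible approach, sketched under the assumption that the combined data fits comfortably in a single Spark partition: read the whole source prefix and write it back out with `coalesce(1)`. The function names and the command-line argument convention are illustrative. Note that Spark writes a directory containing one `part-*.parquet` file rather than a file with a name you choose.

```python
import sys


def combine_parquet(spark, source_path, target_path):
    """Read every Parquet file under source_path and write one part file to target_path."""
    df = spark.read.parquet(source_path)
    # coalesce(1) collapses the data into a single partition, so one file is written
    df.coalesce(1).write.mode("overwrite").parquet(target_path)


def main(source_path, target_path):
    from pyspark.sql import SparkSession  # imported lazily; provided by the Spark runtime

    spark = SparkSession.builder.appName("combine-parquet").getOrCreate()
    try:
        combine_parquet(spark, source_path, target_path)
    finally:
        spark.stop()


# Runs only when submitted as a script with exactly two arguments: source and target
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

With a hypothetical bucket, the script would be submitted with arguments such as `s3://example-bucket/ABC/` and `s3://example-bucket/DEF/`. For large datasets a single output file can become a bottleneck, so a small number of partitions may be a better trade-off.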
@bariowd 2 years ago
Amazing video! Do you know if there is any way to send parameters from an Airflow DAG to the notebook it calls? For example, the DAG receives a random date and number, and when you trigger the DAG it sends those parameters to the notebook. Thank you! :)
@dacort 2 years ago
I didn't use notebooks in this video, but the EMR StartNotebookExecution API allows you to pass parameters to notebook runs. We have a blog post about that here: aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-on-amazon-emr-notebooks-using-amazon-mwaa/
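As a hedged sketch of what passing parameters could look like with boto3: `NotebookParams` is sent as a JSON string, so the values from the DAG run are serialized first. The helper names and parameter keys (`run_date`, `number`) are hypothetical; in Airflow the dictionary would presumably come from something like the DAG run configuration.

```python
import json


def build_notebook_execution_request(editor_id, relative_path, cluster_id, service_role, params):
    """Build keyword arguments for the EMR StartNotebookExecution call."""
    return {
        "EditorId": editor_id,
        "RelativePath": relative_path,
        # The API expects the parameters as a JSON-encoded string
        "NotebookParams": json.dumps(params),
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
    }


def start_notebook(request):
    """Start the notebook execution and return its ID."""
    import boto3  # imported lazily so the request builder can be used without AWS access

    response = boto3.client("emr").start_notebook_execution(**request)
    return response["NotebookExecutionId"]
```

Inside the notebook, the passed values override the defaults defined in the cell tagged as the parameters cell.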
@JayanthNaidu-w5e 1 year ago
Is there a way to install custom Java versions without creating custom images?
@dacort 11 months ago
We now support Java 17 (docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-java-runtime.html). Unfortunately, there isn't another way to use custom Java versions without custom images.
@subhomoysikdar 1 year ago
Is there a way to run EMR Serverless with GPUs? I want to run PySpark jobs with NVIDIA RAPIDS.
@dacort 1 year ago
Not as of today. For that you'll still need EMR on EC2 or EMR on EKS.
@subhomoysikdar 1 year ago
@dacort OK, thank you.
@julsgranados6861 1 year ago
Great video! Is there any way to run a dbt project using EMR Serverless? I have seen that there is a Thrift option to connect to EMR on EC2, but I am not sure if it is possible to connect it to EMR Serverless :(