Running Spark jobs on Amazon EMR Serverless

10,251 views

dacort - Data Analytics

1 day ago

Comments: 17
@AnGELsPearhead 2 years ago
Amazing Demo!!!
@kingsleywen3889 1 year ago
Amazing. Could you do a tutorial about using Step Functions with EMR Serverless? Thanks.
@dacort 1 year ago
EMR Serverless is not natively supported with Step Functions today, but there is a way to do it using Lambda functions. We have a blog post about it here, if it's helpful! aws.amazon.com/blogs/big-data/run-a-data-processing-job-on-amazon-emr-serverless-with-aws-step-functions/
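As a rough illustration of the Lambda approach mentioned above, here is a minimal sketch of a function that a Step Functions task could invoke to start a job. It assumes boto3's `emr-serverless` client and its `start_job_run` call; the handler name, the shape of the `event` payload, and the `build_job_run_request` helper are hypothetical and not taken from the blog post.

```python
def build_job_run_request(application_id, execution_role_arn, entry_point, entry_point_args=None):
    """Build keyword arguments for the EMR Serverless StartJobRun API."""
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": entry_point,
                "entryPointArguments": list(entry_point_args or []),
            }
        },
    }


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; the event keys below are assumptions."""
    # Imported lazily so the request builder above can be used without AWS access.
    import boto3

    client = boto3.client("emr-serverless")
    request = build_job_run_request(
        event["applicationId"],
        event["executionRoleArn"],
        event["entryPoint"],
        event.get("args"),
    )
    response = client.start_job_run(**request)
    # Return the job run ID so a later state can poll for completion.
    return {"applicationId": event["applicationId"], "jobRunId": response["jobRunId"]}
```

A second Lambda function (or a wait-and-poll loop in the state machine) would typically check the job status with `get_job_run` before moving on.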
@disrupcao4674 1 year ago
great video
@ManishBhandari-df2xf 1 year ago
Hi, great video. Can you please also show the steps to install external libraries on EMR, as a replacement for bootstrap scripts?
@dacort 1 year ago
Assuming you're talking about EMR Serverless, there are a couple of different options. You can use custom images ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/application-custom-image.html ) to install OS-level dependencies. If you're just talking about PySpark dependencies, you can also bundle a virtual environment ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-python-libraries.html ).
@srirajvasireddy2615 9 months ago
For PySpark dependencies like pandas or Kafka, how do I bundle a virtual environment? I'm new to Python, so any help or suggestions are greatly appreciated.
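One possible shape of the virtual environment approach, as a hedged sketch: package a virtual environment into a tarball (for example with the `venv-pack` tool, built on an OS compatible with the EMR Serverless runtime), upload it to S3, and point Spark at it through job configuration. The configuration keys below are the ones I understand the EMR Serverless Python libraries guide to use; check them against the linked docs. The helper name, the `environment` alias, and the bucket path are made up for illustration.

```python
def venv_spark_submit_parameters(archive_s3_uri, alias="environment"):
    """Build a sparkSubmitParameters string that uses a packed virtual environment.

    archive_s3_uri: S3 location of the packed venv tarball (hypothetical path).
    alias: directory name the archive is unpacked into on the driver and executors.
    """
    python_path = f"./{alias}/bin/python"
    confs = {
        # Ship the archive with the job and unpack it under the alias.
        "spark.archives": f"{archive_s3_uri}#{alias}",
        # Point the driver and the executors at the Python inside the archive.
        "spark.emr-serverless.driverEnv.PYSPARK_DRIVER_PYTHON": python_path,
        "spark.emr-serverless.driverEnv.PYSPARK_PYTHON": python_path,
        "spark.executorEnv.PYSPARK_PYTHON": python_path,
    }
    return " ".join(f"--conf {key}={value}" for key, value in confs.items())


# Example with a hypothetical bucket:
# venv_spark_submit_parameters("s3://my-bucket/artifacts/pyspark_venv.tar.gz")
```

The resulting string would go into the `sparkSubmitParameters` field of the job driver when the job is submitted.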
@viewermm1588 2 months ago
Does anyone here know if it is possible to use Spark to read multiple Parquet files from an S3 bucket (all in the "ABC" folder) and combine them into one Parquet file under "DEF" in the same location? If so, what is the code? Thanks.
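A minimal PySpark sketch of one way to do this, assuming the files share a schema and the combined data is small enough for a single task. The function name and the bucket paths are hypothetical. Note that Spark writes a directory, so "DEF" would contain one `part-*.parquet` file rather than being a single file itself.

```python
def combine_parquet(spark, source_path, target_path):
    """Read every Parquet file under source_path and write it back as one part file.

    spark: an existing SparkSession.
    source_path / target_path: folder URIs, e.g. s3://my-bucket/ABC/ (hypothetical).
    """
    df = spark.read.parquet(source_path)
    # coalesce(1) funnels all rows into one partition, so exactly one part file
    # is written; this can be slow or run out of memory for very large inputs.
    df.coalesce(1).write.mode("overwrite").parquet(target_path)


# Usage inside a Spark job:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("combine-parquet").getOrCreate()
# combine_parquet(spark, "s3://my-bucket/ABC/", "s3://my-bucket/DEF/")
```

`mode("overwrite")` replaces anything already under the target folder; switch to another save mode if that is not what you want.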
@bariowd 2 years ago
Amazing video! Do you know if there is any way to send parameters from an Airflow DAG to the called notebook? For example, the DAG receives a random date and number, and when you trigger the DAG it sends those parameters to the notebook. Thank you! :)
@dacort 2 years ago
I didn't use notebooks in this video, but the EMR StartNotebookExecution API allows you to pass parameters to notebook runs. We have a blog post about that here: aws.amazon.com/blogs/big-data/orchestrating-analytics-jobs-on-amazon-emr-notebooks-using-amazon-mwaa/
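To make the parameter passing concrete, here is a small sketch that builds a StartNotebookExecution request with boto3-style argument names. `NotebookParams` is sent as a JSON string. The helper name and all IDs are placeholders; in an Airflow task the `params` dictionary could, for instance, come from the values the DAG run was triggered with.

```python
import json


def build_notebook_execution_request(editor_id, relative_path, cluster_id, service_role, params):
    """Build keyword arguments for the EMR StartNotebookExecution API."""
    return {
        "EditorId": editor_id,            # ID of the EMR notebook (placeholder)
        "RelativePath": relative_path,    # notebook file to run
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        # Parameters are passed to the notebook as a JSON-encoded string.
        "NotebookParams": json.dumps(params),
    }


# Usage (requires AWS credentials; all values are placeholders):
# import boto3
# request = build_notebook_execution_request(
#     "e-EXAMPLE", "my_notebook.ipynb", "j-EXAMPLE", "EMR_Notebooks_DefaultRole",
#     {"run_date": "2022-01-01", "number": 42},
# )
# boto3.client("emr").start_notebook_execution(**request)
```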
@JayanthNaidu-w5e 1 year ago
Is there a way to install custom Java versions without creating custom images?
@dacort 11 months ago
We now support Java 17 ( docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/using-java-runtime.html ). Unfortunately, there isn't another way to use custom Java versions without custom images.
@subhomoysikdar 1 year ago
Is there a way to run EMR Serverless with GPUs? I want to run PySpark jobs with NVIDIA RAPIDS.
@dacort 1 year ago
Not as of today. For that you'll still need EMR on EC2 or EMR on EKS.
@subhomoysikdar 1 year ago
@dacort OK. Thank you.
@julsgranados6861 1 year ago
Great video! Is there any way to run a dbt project using EMR Serverless? I have seen that they have the Thrift option to connect to EMR on EC2, but I am not sure if it is possible to connect it to EMR Serverless :(
@dacort 1 year ago
Unfortunately not as of today. :(