Mastering AWS Glue Unit Testing for PySpark Jobs with Pytest

6,581 views

DataEng Uncomplicated

Days ago

Comments
@thepravinbtech 1 year ago
Hi DataEng, your knowledge of AWS and your way of teaching are excellent. Could you please share videos on a CI/CD pipeline for deploying Glue jobs to production?
@DataEngUncomplicated 1 year ago
Thanks for the kind words! Yes, this was actually going to be one of my next videos: how to deploy a Glue job to AWS with Terraform.
@0777deep 8 months ago
Thanks!
@Angleito 7 months ago
How do you add third-party Python libraries?
@DataEngUncomplicated 7 months ago
I don't know an elegant way to do this, but you can go into the Docker container and install the Python libraries you need directly.
@harshadk4264 8 months ago
Do you use the Factory Design pattern?
@renyang2320 7 months ago
Your function-based job is quite straightforward. Would you consider organizing your Glue job in a Python class?
@DataEngUncomplicated 7 months ago
I made the script just for this YouTube video. Sure, things could be organized into classes if that makes sense.
@kckc1289 7 months ago
For scripts with multiple files, how would you recommend handling local development and organization, and then uploading to AWS?
@kckc1289 7 months ago
Do you have a GitHub repo for this pytest example?
@DataEngUncomplicated 7 months ago
Hey, check out my videos on local development for AWS Glue. I covered topics like using interactive sessions, and using PyCharm and VS Code with a Docker container for AWS Glue. To upload the scripts, I recommend managing them as IaC with Terraform or CDK.
@joseluisvega3237 1 year ago
I've been looking to develop some unit tests with pytest, but I would like to mock everything related to the Glue environment. I've been trying to do it through MonkeyPatch, but the problem is that when I convert the DynamicFrame to a DataFrame, the test also expects a full mock of the DataFrame and its functions. Any experience with that?
@DataEngUncomplicated 1 year ago
Hi, can you explain how your approach differs from how I created the unit test in the video? If you design each function to do one particular thing, it becomes much easier to write unit tests for it.
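To illustrate the point about single-purpose functions, here is a minimal sketch. It is not the script from the video: `normalize_column_name` is a hypothetical helper, chosen because it does one thing and needs neither Spark nor Glue to test.

```python
import re


def normalize_column_name(name: str) -> str:
    """Lower-case a column name and replace runs of non-alphanumerics with '_'.

    Hypothetical helper for illustration; not taken from the video's job.
    """
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def test_normalize_column_name():
    # One function, one responsibility: the test needs no fixtures at all.
    assert normalize_column_name(" Order ID ") == "order_id"
    assert normalize_column_name("Total-Amount($)") == "total_amount"
```

In a job, a helper like this could be applied to a DataFrame with `df.toDF(*[normalize_column_name(c) for c in df.columns])`, keeping the Spark-dependent part down to a single line.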
@joseluisvega3237 1 year ago
The approach is to be able to run the unit tests without a Glue environment: no Docker image, pure local development on my laptop, mocking GlueContext and DynamicFrame. The tests would use mocks of these instances, so there's no interaction with AWS Glue at all.
@DataEngUncomplicated 1 year ago
Yeah, I don't know how you can achieve this. The environment where you run the Glue jobs needs to have the Python libraries installed so you can execute the code. The way I set it up, I do 100% local development, but Glue runs in a Docker container. If you can't use Docker, you need to install and set up Spark directly on your local machine. I tried to do this following the documentation, but it was messy and I couldn't get it to work in the end.
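For the mock-based approach raised in this thread, one possible sketch uses `unittest.mock.MagicMock` from the standard library to stand in for the GlueContext and DynamicFrame. The function `read_table` and the names `sales_db` and `orders` are assumptions made for illustration. The sketch only checks the wiring at the Glue boundary; any real DataFrame transformations would still need a Spark session to test meaningfully.

```python
from unittest.mock import MagicMock


def read_table(glue_context, database, table_name):
    """Read a Data Catalog table and return it as a Spark DataFrame.

    Hypothetical function for illustration; the GlueContext is passed in
    so that a test can substitute a mock.
    """
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    return dynamic_frame.toDF()


def test_read_table_uses_catalog():
    # Stand-ins for GlueContext, DynamicFrame and DataFrame: no Glue needed.
    glue_context = MagicMock(name="GlueContext")
    fake_dynamic_frame = MagicMock(name="DynamicFrame")
    fake_df = MagicMock(name="DataFrame")
    glue_context.create_dynamic_frame.from_catalog.return_value = fake_dynamic_frame
    fake_dynamic_frame.toDF.return_value = fake_df

    result = read_table(glue_context, "sales_db", "orders")

    # Verify the call made at the Glue boundary and the returned object.
    glue_context.create_dynamic_frame.from_catalog.assert_called_once_with(
        database="sales_db", table_name="orders"
    )
    assert result is fake_df
```

Passing the context in as a parameter keeps the mocked surface small, which avoids having to mock every DataFrame method the job calls.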