Mastering AWS Glue Unit Testing for PySpark Jobs with Pytest

  7,235 views

DataEng Uncomplicated


Comments: 15
@thepravinbtech 1 year ago
Hi DataEng, your knowledge of AWS and your way of teaching are excellent. Could you please share videos on a CI/CD pipeline to deploy Glue jobs to production?
@DataEngUncomplicated 1 year ago
Thanks for the kind words! Yes, this was actually going to be one of my next videos: how to deploy a Glue job with Terraform.
@Angleito 9 months ago
How do you add third-party Python libraries?
@DataEngUncomplicated 9 months ago
I don't know an elegant way to do this, but you can go into the Docker container and install the Python libraries you need directly.
@kckc1289 9 months ago
How would you recommend organizing local development, and then uploading to AWS, for scripts with multiple files?
@kckc1289 9 months ago
Do you have a GitHub repo for this pytest example?
@DataEngUncomplicated 9 months ago
Hey, check out my videos on local development for AWS Glue. I covered topics like using interactive sessions, and using PyCharm and VS Code with an AWS Glue Docker container. To upload the scripts, I recommend managing them as IaC with Terraform or the CDK.
@0777deep 10 months ago
Thanks!
@harshadk4264 10 months ago
Do you use the Factory design pattern?
@renyang2320 9 months ago
Your function-based job is quite straightforward. Would you consider organizing your Glue job in a Python class?
@DataEngUncomplicated 9 months ago
I made the script just for this YouTube video. Sure, things could be organized into classes if that makes sense for your job.
@joseluisvega3237 1 year ago
I've been looking to develop some unit tests with pytest, but I would like to mock everything related to the Glue environment. I've been trying to do it with MonkeyPatch, but the problem is that when I convert the DynamicFrame to a DataFrame, the test also expects a full mock of the DataFrame and its functions. Any experience with that?
@DataEngUncomplicated 1 year ago
Hi, can you explain how your approach differs from how I created the unit test in the video? If you design each function to do one particular thing, it becomes much easier to write unit tests for it.
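A minimal sketch of the "one function, one job" advice above. The names `to_snake_case` and `snake_case_columns` are hypothetical and do not come from the video. The assumption is that a job needs to normalize column names: keeping that logic in plain Python lets pytest exercise it without a Spark session, and the job can apply the resulting mapping to the DataFrame in a separate step.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a column name such as 'OrderID' or 'order date' to snake_case."""
    # Insert an underscore between a lowercase letter or digit and an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse runs of whitespace and hyphens into a single underscore.
    name = re.sub(r"[\s\-]+", "_", name)
    return name.lower()


def snake_case_columns(columns: list[str]) -> dict[str, str]:
    """Map each original column name to its snake_case form (hypothetical helper)."""
    return {col: to_snake_case(col) for col in columns}


def test_snake_case_columns():
    # Runs under pytest with no Glue or Spark dependency.
    result = snake_case_columns(["OrderID", "order date", "customer-Name"])
    assert result == {
        "OrderID": "order_id",
        "order date": "order_date",
        "customer-Name": "customer_name",
    }
```

Because the function takes and returns plain Python values, a failing test points at the renaming logic itself and not at the Glue environment.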
@joseluisvega3237 1 year ago
The approach is to be able to run the unit tests without a Glue environment: no Docker image, pure local development (my laptop), mocking GlueContext and DynamicFrame. The tests would use mocks of these instances, so there's no interaction with AWS Glue at all.
@DataEngUncomplicated 1 year ago
Yeah, I don't know how you can achieve this. The environment where you run the Glue jobs needs to have the Python libraries installed so you can execute the code. The way I set it up, I do 100% local development, but Glue runs in a Docker container. If you can't use Docker, you need to install and set up Spark directly on your local machine. I tried to do this following the documentation, but it was messy and I couldn't get it to work in the end.
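For readers curious what the mock-only approach from the question might look like, here is a rough sketch using the standard library's `unittest.mock`. It is not from the video. The function `read_row_count` and the names `sales_db` and `orders` are hypothetical, and the sketch assumes the job receives its GlueContext as an argument so a stand-in can be passed in.

```python
from unittest.mock import MagicMock


def read_row_count(glue_context, database: str, table_name: str) -> int:
    """Read a catalog table as a DynamicFrame, convert it to a DataFrame, count rows."""
    dynamic_frame = glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
    return dynamic_frame.toDF().count()


def test_read_row_count_with_mocked_glue():
    # Stand-in for GlueContext: neither awsglue nor pyspark is imported.
    glue_context = MagicMock(name="GlueContext")
    # Configure the chain from_catalog(...) -> DynamicFrame -> toDF() -> count().
    fake_dynamic_frame = glue_context.create_dynamic_frame.from_catalog.return_value
    fake_dynamic_frame.toDF.return_value.count.return_value = 3

    assert read_row_count(glue_context, "sales_db", "orders") == 3
    # Verify the job asked the catalog for the expected table.
    glue_context.create_dynamic_frame.from_catalog.assert_called_once_with(
        database="sales_db", table_name="orders"
    )
```

A test like this only checks how the code calls Glue; it does not run any Spark logic. Every DataFrame method the job uses has to be configured on the mock by hand, which is the difficulty described in the question, so real transformations are probably still better tested against a real Spark session such as the one in the Docker container.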