End To End Data Engineering Project With Snowflake | Parquet, JSON & CSV Data Files

14,784 views

Data Engineering Simplified

A day ago

In this exciting hands-on tutorial, we will deep dive into the world of End-to-End Data Engineering Projects using Snowflake Snowpark (Python). We will explore Snowpark's File API and DataFrame API capabilities to tackle an Amazon sales order data project for mobile products. With step-by-step demonstrations and insightful explanations, this content will guide you through the entire journey, from data ingestion and transformation to advanced analytics and visualisation. Whether you're a data enthusiast, an aspiring data engineer, or simply curious about the world of data engineering, this video is a must-watch. Get ready to unlock the full potential of Snowflake Snowpark and revolutionise your data projects.
Once you complete this end-to-end, Snowpark Python-based data engineering (ETL) project, you will be able to answer the following questions:
1. How to load a large data set from a local machine into internal stages?
2. How to load a delta data set from a local machine into internal stages?
3. How to use the COPY command with the Snowpark File API?
4. How to use the Snowpark DataFrame API for complex transformations?
5. How to build a simple dashboard with filters using Snowsight dashboards?
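For questions 1 and 2, loading from a local machine comes down to two things: walking a local folder and issuing one PUT per file, while keeping the sub-folder layout in the stage path. The sketch below covers only the planning part. It returns (local file, stage location) pairs and, for a delta run, leaves out files already recorded as uploaded. The folder layout, the stage name, and the `already_uploaded` bookkeeping are hypothetical assumptions, not the video's code. In snowflake-snowpark-python, each pair would then be passed to `session.file.put(...)`.

```python
import os


def build_upload_plan(root_dir, stage_name, already_uploaded=frozenset()):
    """Return sorted (local_file, stage_location) pairs for files under root_dir.

    The stage location mirrors each file's sub-folder relative to root_dir,
    e.g. <root>/sales/2020/a.csv -> @<stage_name>/sales/2020.
    Relative paths (using "/") listed in already_uploaded are skipped, which is
    one hypothetical way to handle a delta load.
    """
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        rel_dir = os.path.relpath(dirpath, root_dir)
        # os.walk uses OS-specific separators; stage paths always use "/".
        parts = [] if rel_dir == "." else rel_dir.split(os.sep)
        for name in filenames:
            rel_file = "/".join(parts + [name])
            if rel_file in already_uploaded:
                continue  # delta run: this file was loaded earlier
            stage_location = "/".join([f"@{stage_name}"] + parts)
            plan.append((os.path.join(dirpath, name), stage_location))
    return sorted(plan)


# Hypothetical Snowpark usage (needs snowflake-snowpark-python and a live session):
# for local_file, stage_location in build_upload_plan("/data/amazon_sales", "my_internal_stg"):
#     session.file.put(local_file, stage_location, auto_compress=False, overwrite=False)
```

Mirroring the folder structure in the stage keeps later COPY statements simple, because each data set can be addressed by its stage sub-path.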
🚀🚀 Social Links 🚀🚀
-----------------------------------------------------------
✏ Instagram: / learn_dataengineering
✏ Twitter: / de_simplified
✏ Facebook: learndataengineering
✏ GitHub : github.com/TopperTips
✏ Website: toppertips.com
🚀🚀 Data Engineering Simplified Snowflake Cheat Sheets 🚀🚀
------------------------------------------------------------------------------------------------------
➥ Complete Snowflake Tutorial Cheat Sheet - rebrand.ly/maj4l6x
➥ Complete Snowflake JSON Guide Cheat Sheet - rebrand.ly/d52cef
🚀🚀 Snowpark Medium Article For Source Code 🚀🚀
---------------------------------------------------------------------------------------
➥ / 744c1d5a8d50
➥ gitlab.com/toppertips/snowfla...
🚀🚀 Snowpark End 2 End Use Case Video Sections 🚀🚀
----------------------------------------------------------------------------
➥ 00:00:00 Introduction
➥ 00:02:04 Welcome To DE Simplified
➥ 00:02:37 Sample Code - Git Location
➥ 00:04:06 Logical Data Flow
➥ 00:06:11 Layered Design Approach Using Schema
➥ 00:08:34 Snowflake Logical Data Model
➥ 00:09:49 Amazon Retail Sales - Mobile Data Set
➥ 00:11:30 Step-01 - Create User & VWH
➥ 00:14:30 Step-02 - Create DB & Schemas
➥ 00:14:30 Step-03 - Create Internal Stage
➥ 00:17:47 Git Location (Data Set)
➥ 00:19:27 Data Processing Rule
➥ 00:14:30 Step-04 - Load Data To Stage Location
➥ 00:24:27 Step-05 - File Format Creation
➥ 00:30:07 Step-06 - Loading Forex Data
➥ 00:31:39 Step-07 - Internal Stage To Source Schema
➥ 00:41:53 Step-08 - Source To Curated
➥ 00:53:20 Step-09 - Curated To Consumption (Dim Modelling)
➥ 01:08:59 Step-10 - Snowsight dashboard
➥ 01:10:47 Step-11 - Delta Processing
➥ 01:17:47 Step-12 - Additional Dashboard
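Step-07 moves staged files into the source schema with COPY INTO. The delta run in Step-11 relies on a property of that command: Snowflake keeps load metadata per table (for 64 days) and, by default, skips staged files it has already loaded unless `FORCE = TRUE` is set. The helper below is a minimal sketch that only assembles such a statement as text, which a Snowpark session could then run with `session.sql(stmt).collect()`. The table, stage, and file format names are hypothetical, not the ones used in the video.

```python
def build_copy_into(table, stage_path, file_format, pattern=None,
                    on_error="ABORT_STATEMENT", force=False):
    """Assemble a COPY INTO <table> statement for files in an internal stage.

    Names are interpolated directly, so only pass trusted, hard-coded values.
    """
    lines = [
        f"COPY INTO {table}",
        f"FROM {stage_path}",
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')",
    ]
    if pattern:
        # PATTERN is a regular expression matched against the staged file paths.
        lines.append(f"PATTERN = '{pattern}'")
    # ON_ERROR controls what happens on bad rows, e.g. CONTINUE or SKIP_FILE.
    lines.append(f"ON_ERROR = {on_error}")
    if force:
        # FORCE = TRUE reloads files even if load metadata says they were loaded.
        lines.append("FORCE = TRUE")
    return "\n".join(lines)


# Hypothetical usage with a Snowpark session:
# session.sql(build_copy_into("source.sales", "@my_internal_stg/sales", "csv_ff")).collect()
```

Leaving `force` off is what makes re-running the same statement behave like a delta load: only files not yet recorded in the load metadata are ingested.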
🚀 🚀 Everything About Snowpark Playlist 🚀 🚀
--------------------------------------------------------------------------------
➥ What Is Snowpark, Introduction & Architecture? (01/15 • #01 | What is Snowpark... )
➥ What is NOT Snowpark? (02/15 • #02 | What Is NOT Snow... )
➥ Who Should Learn Snowpark? (03/15 • #03 | Who Should Lear... )
➥ Where To Learn Snowpark From? (04/15 coming soon...)
➥ Can A SQL Developer Learn Snowpark? (05/15 coming soon...)
➥ Can we write Snowpark in Python, Scala or Java programming languages? (06/15 coming soon...)
➥ Can I run Snowpark in Databricks notebooks? (07/15 coming soon...)
➥ Can Snowpark run inside the Snowflake Cloud Data Warehouse? (08/15 coming soon...)
➥ Will Apache Spark Survive? (09/15 coming soon...)
➥ Snowpark and the Future of ADF, Databricks & Azure Synapse Analytics? (10/15 coming soon...)
➥ Apache Spark to Snowflake Snowpark Migration Roadmap? (11/15 coming soon...)
➥ Snowpark Python vs. Snowpark Scala vs. Snowpark Java? (12/15 coming soon...)
🚀 🚀 Other Playlist By Data Engineering Simplified 🚀 🚀
----------------------------------------------------------------------------------------------
➥ Complete Snowflake Master Class 🌐 bit.ly/snowflake-tutorial
➥ Data Loading In Snowflake Master Class 🌐 bit.ly/load-data-into-snowflake
➥ SnowPro Certification Mock Test Papers 🌐 bit.ly/snowpro-mock-test
➥ SnowPro Certification Guide 🌐 bit.ly/snowpro-certification-v1
#snowpark
#snowflake
#snowflaketutorial
#snowflakedatawrehouse
#snowflakecomputing
#clouddatawarehouse
#snowparktutorial
Disclaimer: All snowflake-related learning materials and tutorial videos published in this channel are the personal opinions of the data engineering simplified team and they're neither authorised by nor associated with Snowflake, Inc.

Comments: 67
@harshshah3546 · 3 months ago
Appreciate the time and effort you've put in to create this tutorial.
@gyt7504 · a month ago
Great tutorial. Thanks!
@rakeshkumarsharma195 · 11 days ago
Awesome, just awesome 👍
@anilkumark3573 · a year ago
Sir, we appreciate your efforts and knowledge sharing.
@DataEngineering · a year ago
Glad you liked the content. Thank you so much for your note, Anil!
@uravakondakhadhar5458 · 11 months ago
Appreciate your work
@DataEngineering · 11 months ago
Thank you so much 😀
@hritiksharma7154 · a year ago
Really great project 👌
@DataEngineering · a year ago
Glad you liked it.
@mariumbegum7325 · a year ago
Interesting video and great explanation.
@DataEngineering · a year ago
Glad you liked the content.
@user-uv5xf1mc6b · a year ago
Thanks for Everything...
@DataEngineering · a year ago
Always welcome
@ganeshlakshman2506 · a year ago
Very comprehensive, thank you :) I don't see 2 years of data in the GitLab link, just one month (Jan 2020). Am I looking in the wrong location?
@parikshitchavan2211 · 8 months ago
Wow, superb! You covered almost everything required to survive as a data engineer in the industry 🙂🤐
@DataEngineering · 8 months ago
Thanks for your note. If you want to manage Snowflake more programmatically, you can watch my paid content; many folks don't know the power of Snowpark. These two courses (one for JSON and one for CSV) will help broaden your knowledge and are available at a discounted price for a limited time. They show how to automatically create DDL and DML, run the COPY command, and make all SQL statements available for CI/CD. 1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/?couponCode=SPECIAL50 2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/?couponCode=SPECIAL35
@uttapa22 · 6 months ago
Great video. It would have been wonderful if it also covered: 1. how to do end-to-end CI/CD; 2. how to set up a pipeline dependency between the data ingestion tool and a Snowflake task (assuming we can bundle all the loading steps covered in this video into a Snowflake task). Apologies if you have already covered these elsewhere; if so, please point me to them. Many thanks.
@DataEngineering · 6 months ago
Glad you liked the content, and your request for CI/CD is noted. CI/CD is not yet covered in my videos.
@dilshadsayed7202 · 2 months ago
Great video, thanks. Would it be possible to share the data used in this project?
@hansrony5684 · 9 months ago
Hi Bro, while going through the course, I found that not all of the data is provided in the GitLab link, and the same goes for exchange_rates.csv at 50:00. The exchange rate column is null for all rows after moving the file into the curated stage. Could you update the link with all the files mentioned in the course? Thanks
@user-tv4cl1wm6f · 9 months ago
Thanks for everything. You helped a lot ❤! May I ask if you can make videos on exception handling and error logging? E.g., one of the CSV files has an additional column. Another example: while loading data into the internal stage, the Wi-Fi connection fails; how do you resume the job? Thanks bro! :)
@DataEngineering · 9 months ago
If some files were not loaded and you run the load again, the files that were already loaded will be ignored.
@rodrigoschammass5205 · 9 months ago
Thank you very much, but Step 4.2 (Loading Data To Internal Stage Using Snowpark File API) is not working for me. I run the code but no data shows up.
@srinivasp6579 · 10 months ago
Thank you for sharing such good content. I should say you are a rockstar in the Snowflake world. I have a question: in this case, since a lot of DataFrames are created in the Snowpark Python scripts and the code runs from a local machine, does it consume local storage/compute, or does it push everything to Snowflake storage/compute? Thank you in advance!
@DataEngineering · 10 months ago
Thanks for your note. When you perform an operation using a DataFrame in Snowflake, it uses Snowflake's compute power. When you pull data to your local machine, it uses your local compute.
@srinivasp6579 · 10 months ago
@DataEngineering Thank you for your quick response. If I would like to push everything to Snowflake storage and compute, how should we do it? How should we register the Snowpark Python programs in the Snowflake database and run/debug them (other than via the stored procedure route)? Is that really possible? Maybe a separate video would help.
@DataEngineering · 10 months ago
Watch ch-08 from this Snowpark playlist and you will understand how to deploy it (playlist link: kzbin.info/aero/PLba2xJ7yxHB4yPg3pUrobdzeMxk4mP24S).
@srinivasp6579 · 10 months ago
@DataEngineering Thank you. I already watched it. Does that mean we should test it locally first and then deploy to the SF sandbox? I am looking for options to develop, test, debug, and deploy directly in the SF sandbox itself. Is that possible? Any insight?
@satishbalaji1832 · a year ago
Thank you for the informative session. Can't we achieve the same solution through Snowflake SQL-based queries/stored procs?
@DataEngineering · a year ago
Yes, you can do it. In its current version, Snowpark is essentially a SQL generator. You may want to watch these videos on what it is and what it is not: 1. kzbin.info/www/bejne/Y5LahIOIjJ50hbs (What is Snowpark) 2. kzbin.info/www/bejne/baW3oHWamb-Sn9U (What is NOT Snowpark)
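The "SQL generator" point, and the earlier answer about where compute happens, can be illustrated with a toy model. In Snowpark, a DataFrame chain such as `session.table(...).filter(...).select(...)` only builds a query. Nothing runs until an action like `collect()` or `to_pandas()` sends the generated SQL to Snowflake. The class below is a hypothetical, heavily simplified stand-in that imitates only that lazy behaviour; it is not Snowpark's real query compiler.

```python
class ToyFrame:
    """Hypothetical, immutable stand-in for a lazily evaluated DataFrame."""

    def __init__(self, table, columns=("*",), predicates=()):
        self.table = table
        self.columns = tuple(columns)
        self.predicates = tuple(predicates)

    def select(self, *columns):
        # Returns a new frame; no query is executed here.
        return ToyFrame(self.table, columns, self.predicates)

    def filter(self, predicate):
        # Also lazy: the predicate is only recorded.
        return ToyFrame(self.table, self.columns, self.predicates + (predicate,))

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql

    def collect(self, run_query):
        # Only the action sends SQL to the engine; run_query stands in for Snowflake.
        return run_query(self.to_sql())


df = ToyFrame("sales").filter("country = 'IN'").select("order_id", "amount")
print(df.to_sql())  # SELECT order_id, amount FROM sales WHERE (country = 'IN')
```

Because the heavy lifting happens inside whatever executes the SQL, local memory is only used for the rows an action brings back.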
@Vidhvamsam-Villain · a month ago
The curated-to-consumption code is not working properly.
@balajikarthik8366 · 9 months ago
Thanks for your effort. A question: how would you productionise the entire flow? Should your Python code be converted into a stored procedure?
@DataEngineering · 9 months ago
Yes, exactly. Otherwise, it needs a runtime environment outside of Snowflake and has to be scheduled with some kind of scheduler.
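A Snowpark Python stored procedure handler receives a `Session` as its first argument. The same pipeline logic can therefore run inside Snowflake rather than on a local machine, and a Snowflake task can schedule it. The sketch below assumes the pipeline can be expressed as a list of SQL statements. `PIPELINE_STEPS`, `run_pipeline`, and all object names are hypothetical. The registration and scheduling shown in comments use `session.sproc.register` and `CREATE TASK`, but check the parameters against your Snowpark version.

```python
# Hypothetical pipeline steps; real steps would be the video's COPY/transform SQL.
PIPELINE_STEPS = [
    "COPY INTO source.sales FROM @my_internal_stg/sales "
    "FILE_FORMAT = (FORMAT_NAME = 'csv_ff')",
    "INSERT INTO curated.sales SELECT * FROM source.sales",
]


def run_pipeline(session) -> str:
    """Stored-procedure handler: Snowflake passes the Session as the first argument."""
    for statement in PIPELINE_STEPS:
        # collect() forces the statement to execute before moving to the next step.
        session.sql(statement).collect()
    return f"{len(PIPELINE_STEPS)} steps completed"


# Hypothetical registration from a local Snowpark session:
# session.sproc.register(func=run_pipeline, name="run_sales_pipeline",
#                        packages=["snowflake-snowpark-python"],
#                        is_permanent=True, stage_location="@my_internal_stg",
#                        replace=True)
# Hypothetical nightly schedule:
# CREATE TASK run_sales_pipeline_task WAREHOUSE = my_wh
#   SCHEDULE = 'USING CRON 0 2 * * * UTC' AS CALL run_sales_pipeline();
```

Keeping the handler free of connection setup is what lets the same function be tested locally with a regular session and then registered unchanged.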
@user-tv4cl1wm6f · 9 months ago
@DataEngineering Can we have a video on that as well? Like how a .py file first becomes a stored procedure 😅😅😅
@prajeetkatari3742 · 11 months ago
Hi, I was trying to get the CSV etc. files onto the internal stage. I even get the output of the directory, but I'm not able to see the data in the result! Please help; I have been trying to fix this for hours but have no clue. Thanks
@DataEngineering · 11 months ago
Not sure which step you are talking about. If you can give me a timestamp, that will help, or you can share a screenshot via my Instagram account (instagram.com/learn_dataengineering/).
@user-le8cf9ck4v · 10 months ago
Sir, can you please do one end-to-end project in SnowSQL as well? That would be very beneficial for us.
@DataEngineering · 10 months ago
SnowSQL is just a CLI tool. Do you mean Snowflake SQL? If so, watch ch-19 from my Snowflake tutorial; the end-to-end flow is covered using SQL.
@affanamin105 · 6 months ago
Hi, thanks for this end-to-end project. Where can I find the complete dataset you used in this video?
@DataEngineering · 6 months ago
The complete data set is too big; the description has a link with limited data. And yes, I know many of us are not fully aware of the Snowpark Python API. If you want to manage Snowflake more programmatically, you can watch my paid content (data + code available); many folks don't know the power of Snowpark. These two courses on Udemy (one for JSON and one for CSV) will help broaden your knowledge. They show how to automatically create DDL and DML and run the COPY command. 1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/ 2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/
@bharathkandati3911 · a year ago
Hi, thank you for sharing the project. Where can I find the Python code? GitLab has the data files. Please advise.
@DataEngineering · a year ago
In the description.
@jaelinjordan1104 · a year ago
Quick question: I am on Part 4. I downloaded the data to my computer, but for some reason it does not show up when I try to run it through Snowflake. Is there a reason for that?
@DataEngineering · a year ago
Could you please provide additional detail? I'm not able to understand the issue. Please attach a timestamp or share a screenshot via my Instagram account.
@sabarisri4515 · a year ago
There is no primary key in Snowflake. Then why do we use primary and foreign keys here? Can you please explain?
@DataEngineering · a year ago
When you connect to a BI tool like Power BI, it needs these relationships to build the model for slicing and dicing. And if you have to draw an ER diagram to understand the relationships, these constraints are important as well.
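Snowflake does let you declare PRIMARY KEY and FOREIGN KEY constraints. On standard tables, however, it enforces only NOT NULL; the key constraints are stored as metadata that BI and ER-diagram tools can read, which is the point of the reply above. The sketch generates DDL for one dimension and one fact table in a star schema like the Step-09 model. The table and column names and the helper functions are hypothetical, not the video's DDL.

```python
def _create_table(table, column_defs):
    body = ",\n  ".join(column_defs)
    return f"CREATE OR REPLACE TABLE {table} (\n  {body}\n)"


def dim_ddl(table, key_col, attributes):
    """DDL for a dimension table with a declared primary key."""
    cols = [f"{key_col} NUMBER NOT NULL"]  # NOT NULL is actually enforced
    cols += [f"{attr} VARCHAR" for attr in attributes]
    # Declared but not enforced on standard tables; tools read it as metadata.
    cols.append(f"CONSTRAINT pk_{table} PRIMARY KEY ({key_col})")
    return _create_table(table, cols)


def fact_ddl(table, measures, dims):
    """DDL for a fact table; dims maps each dimension table to its key column."""
    cols = [f"{key} NUMBER" for key in dims.values()]
    cols += [f"{m} NUMBER(18,2)" for m in measures]
    for dim_table, key in dims.items():
        # Informational foreign key pointing at the dimension's primary key.
        cols.append(
            f"CONSTRAINT fk_{table}_{dim_table} FOREIGN KEY ({key}) "
            f"REFERENCES {dim_table} ({key})"
        )
    return _create_table(table, cols)


print(fact_ddl("sales_fact", ["order_amount"], {"date_dim": "date_id"}))
```

Because the constraints are not enforced, the load logic itself still has to guarantee unique keys; the declarations only document the relationships.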
@vaibhavverma1340 · 8 months ago
Can you please tell me how to update a row in snowflake_sample_data.tpch_sf100.orders? I'm getting the error "Object 'ORDERS' does not exist or not authorized."
@DataEngineering · 8 months ago
That is a shared object; you cannot update it.
@uravakondakhadhar5458 · a year ago
Hi bro
@DataEngineering · a year ago
Please share your query.
@praveenkumar-sk8nx · 8 months ago
How can we do reverse engineering without a third-party tool?
@DataEngineering · 8 months ago
Then you have to write a program for it. Snowpark can do it, or you can write plain Python, unless Snowsight comes up with some kind of UI for that. And yes, if you want to manage Snowflake more programmatically, you can watch my paid content; many folks don't know the power of Snowpark. These two courses (one for JSON and one for CSV) will help broaden your knowledge and are available at a discounted price for a limited time. They show how to automatically create DDL and DML, run the COPY command, and make all SQL statements available for CI/CD. 1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/?couponCode=SPECIAL50 2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/?couponCode=SPECIAL35
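One way to write such a program without a third-party tool is Snowflake's built-in `GET_DDL` function. It returns the DDL of an object; for a schema or database, the output also includes the objects inside it. The sketch below wraps it for a Snowpark-style session (anything with `sql(...).collect()`). The allowed object types are deliberately a small subset of what `GET_DDL` supports. The function name and the example object name are hypothetical.

```python
# A deliberate subset of GET_DDL's supported object types.
ALLOWED_TYPES = {"DATABASE", "SCHEMA", "TABLE", "VIEW"}


def fetch_ddl(session, object_type, object_name):
    """Return the DDL text Snowflake generates for one object via GET_DDL."""
    object_type = object_type.upper()
    if object_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported object type: {object_type}")
    # Escape single quotes because the name is embedded in a string literal.
    safe_name = object_name.replace("'", "''")
    rows = session.sql(f"SELECT GET_DDL('{object_type}', '{safe_name}')").collect()
    return rows[0][0]  # single row, single column holding the DDL


# Hypothetical usage with a live Snowpark session:
# ddl = fetch_ddl(session, "schema", "MY_DB.CONSUMPTION")
# with open("consumption_schema.sql", "w") as fh:
#     fh.write(ddl)
```

Writing the returned text to `.sql` files is a simple way to keep the reverse-engineered DDL under version control.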
@ramakrishnatirumala428 · 9 months ago
Sir, where can I get all this code?
@DataEngineering · 9 months ago
Check the description.
@chittaranjanpradhan5290 · 8 months ago
Hello, How can I get the source code of this project?
@DataEngineering · 7 months ago
It is in the description. And yes, I know many of us are not fully aware of the Snowpark Python API. If you want to manage Snowflake more programmatically, you can watch my paid content (data + code available); many folks don't know the power of Snowpark. These two courses (one for JSON and one for CSV) will help broaden your knowledge and are available at a discounted price for a limited time. They show how to automatically create DDL and DML and run the COPY command. 1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/?couponCode=DIWALI50 2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/?couponCode=DIPAWALI35
@srinigoud7393 · 6 months ago
Do you have any Udemy courses? Can you please send me the GitLab repo or the Udemy course details?
@DataEngineering · 6 months ago
These contents are available on Udemy (one for JSON and one for CSV). They show how to automatically create DDL and DML and run the COPY command. 1. www.udemy.com/course/snowpark-python-ingest-json-data-automatically-in-snowflake/ 2. www.udemy.com/course/automatic-data-ingestion-using-snowflake-snowpark-python-api/
@antonybest2599 · 8 months ago
Hi, I keep getting this error: File "C:\Users\anbest\OneDrive - Capgemini\Documents\Git\Snowpark_project\LoadData.py", line 57, in main put_result(file_element," => ",put_result[0].status) TypeError: 'list' object is not callable. I tested the traverse func on its own, and it is picking up my file names, locations, etc. It seems to be the put_result causing issues.
@DataEngineering · 8 months ago
Not sure what kind of error you are getting. Your result is not what the program expects, so you need to check the object's type (e.g. with type()) and whether it is a list or not.
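In the error above, `put_result` holds the list returned by `session.file.put` and is then called like a function in `put_result(file_element, " => ", ...)`. That call is what raises `TypeError: 'list' object is not callable`; the line was presumably meant to be `print(...)`. Below is a minimal corrected loop, assuming snowflake-snowpark-python's `session.file.put`, which returns a list of `PutResult` objects with a `status` field (for example UPLOADED, or SKIPPED when a file with the same name already exists and `overwrite=False`). The function name `upload_files` is hypothetical.

```python
def upload_files(session, files, stage_location):
    """PUT each local file to the stage and return {file: status}."""
    statuses = {}
    for file_element in files:
        put_result = session.file.put(
            file_element, stage_location, auto_compress=False, overwrite=False
        )
        # put_result is a list of PutResult objects, not a function: print it.
        print(file_element, " => ", put_result[0].status)
        statuses[file_element] = put_result[0].status
    return statuses
```

With `overwrite=False`, re-running the loop after an interrupted upload should report files that already exist in the stage as skipped rather than uploading them again.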
@s.satishkumar8089 · a year ago
How can I contact you, brother?
@DataEngineering · a year ago
instagram.com/learn_dataengineering/
@s.satishkumar8089 · a year ago
@DataEngineering I already sent you a message but got no reply.
@SaurabhYadav-ep6cu · 10 months ago
The GitLab link has just one month of data, Jan 2020. Can you send us the proper link containing the whole data file? @DataEngineering
@DataEngineering · 10 months ago
Yes, it is hard to put that much data on any platform, which is why only one month of data is provided.