This is a great data engineering project, by the way, and well explained, especially for beginners who are starting to learn. I have learnt a lot, thanks Darshil. For those who are stuck at creating the AWS Glue ETL job: it seems AWS updated the UI and settings, so it's not done the same way as he showed in the video, but there is another way you can do it. He attached the PySpark code for the ETL job in the description. Then go to AWS Glue --> ETL jobs --> choose the option to create a job using a script, and simply copy the script in there. That's all; save the job and try running it. It should work. Note: make sure the data source details in the script are modified to match your database name and table name, otherwise the job may fail with an error. Thanks!!
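For anyone following that route, this is a minimal sketch of what the top of such a Glue script typically looks like, with the two values you need to change marked. The database and table names here are placeholders, not necessarily the ones from the video; use the script from the description as the source of truth:

    import sys
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    # Change these two values to match your own Glue Data Catalog:
    datasource0 = glueContext.create_dynamic_frame.from_catalog(
        database="db_youtube_raw",    # your database name
        table_name="raw_statistics",  # your table name
        transformation_ctx="datasource0",
    )

    # ... transforms and the write to the target bucket go here ...
    job.commit()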
@KatyayaniDatti · 1 month ago
@aman_nv I too have completed the project over this weekend. Cheers to us 🥳 Coming to the ETL job creation part: copy-pasting the script is fine, but as a beginner I find it difficult to write such a script. The old UI was better, where it generated the script for us based on our actions. Do we have something similar in the new UI?
@iamayuv · 6 months ago
00:03 Creating a crawler to understand and analyze data stored in AWS S3 buckets
05:48 Query execution and data type casting
11:43 Preprocessing and efficiency for querying
17:16 Writing data into the target bucket and creating partitions
22:05 Create a Glue crawler to clean and catalog data
27:25 Data processing pipeline created using AWS Glue Studio
32:23 Created an analytical pipeline using AWS Glue to transform and store data
37:18 Building a reporting version of the data makes it easier for data scientists to analyze and query it
42:01 Create a dashboard to visualize data from YouTube
@mrbcan7215 · 6 months ago
Hi, can you explain how the "raw_statistics" table was created automatically after he created the first crawler? When I tried the same process, it didn't work for me.
@robertmoncriefglockrock8957 · 1 year ago
This is a simple error I ran into; gonna post it here in case others have the same. When trying to run the job at 21:25 I was getting "NameError: name 'gluecontext' is not defined". When adding the line "df_final_output = DynamicFrame.fromDF(datasink1, gluecontext, "df_final_output")" I accidentally forgot to capitalize glueContext; instead I put gluecontext. Thank you for this walkthrough, I start my new data engineering job tomorrow and the company uses AWS, so this has helped me tremendously. You are doing magic, my friend.
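In other words (a sketch of just that line; datasink1 and glueContext are names defined earlier in the job script), the corrected version is:

    from awsglue.dynamicframe import DynamicFrame

    # Python names are case-sensitive: it must be glueContext, not gluecontext
    df_final_output = DynamicFrame.fromDF(datasink1, glueContext, "df_final_output")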
@vishnuvardhan9082 · 10 months ago
Hi Robert, hope you are doing well. It's been over a year since you posted and joined your new company. Just wanted to check if this new job was your first data engineering job or if you were already experienced in DE? And how are things at your new workplace?
@LaxmikantMaheshjiKabra · 6 months ago
Great work, Darshil! I have only one suggestion after finishing the whole project along with the video, which took me a total of around 6-8 hours, excluding the dashboard. My suggestion is that you could take an extra minute and explain the code properly, so that we viewers can understand what transform actions the ETL is taking; that would make the overall video make more sense, and why you chose the steps before and after the ETL step would become clearer. Still, thanks for this wonderful project, and I am probably moving on to the Azure analytics project after this one.
@Ujwalarao · 1 year ago
Thanks for bringing me close to the real use case scenario of Data Engineering.
@freelychanu2086 · 16 days ago
Well explained, everything from 0 to 100%. Thank you so much! ❤👍
@lâmtrịnh-r4m · 4 months ago
12:08 If you still get the hive_bad_data issue after following it step by step: you need to convert id to bigint in the Lambda function. Add this at line 21: "# Extract required columns: df_step_1 = pd.json_normalize(df_raw['items']); df_step_1['id'] = df_step_1['id'].astype(int)"
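Spelled out, a sketch of just that fragment (df_raw is the DataFrame read from the raw JSON earlier in the handler; 'int64' pins the 64-bit width that bigint expects):

    import pandas as pd

    # Flatten the "items" array of the raw JSON into columns
    df_step_1 = pd.json_normalize(df_raw['items'])
    # Cast id to a 64-bit integer so it matches the bigint column Athena expects
    df_step_1['id'] = df_step_1['id'].astype('int64')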
@VishalThota-p1r · 4 months ago
Appreciate your efforts, Darshil.
@rahulgyani2965 · 8 months ago
Hi Darshil, thank you for this video. I have a question for you: when you created the cleaned version converting CSV to parquet, why didn't we use a Lambda function instead of a Glue job?
@youraverageguide · 2 years ago
Darshil is a great teacher! Great project.
@PriyankaGopi-j6m · 10 months ago
Finally completed this project. Thank you so much for this! You're a gem :)
@SatishSharma-rh4su · 10 months ago
Heyy, I am facing an error while joining cleaned and raw data. Can you please help?
@sreyassawant2205 · 10 months ago
Hey, I am facing the same issue. Can you help out?
@PranavPranav-w9i · 4 months ago
How did you create the job in Glue? I am not able to create one; there is no option for it.
@assieneolivier5560 · 1 year ago
Finally got this project done. Great project to learn data engineering!!
@chinmaymaganur7133 · 11 months ago
Hi, how did you set up the ETL job?
@xandao30 · 5 months ago
@chinmaymaganur7133 Good question.
@ajtam05 · 2 years ago
Another great video. Only thing is... AWS has updated the Glue console along with other consoles. I believe I updated my steps accordingly, except for the schema datatypes (which it looks like I can change after the job is run). But the script does look entirely different. Could you assist with an updated video on using the new Glue console?
@jenithmehta9603 · 1 year ago
I am facing the same issue.
@ajtam05 · 1 year ago
@Jenith Mehta If you scroll to the bottom of the navigation pane there is a "Legacy" section. I realized that after I posted this, but that's what I used. Hope that helps. 😀
@rohanchoudhary672 · 1 year ago
@ajtam05 Can't find that.
@SCREENERA · 1 year ago
Finally, after 100 disappointments... I did it! Great effort, and it's my very first project in the data engineering field. Thanks! Errors are challenging, but those who have a real interest in data engineering will definitely get through them and complete this project.
@N12SR48SLC · 1 year ago
I'm getting this in my job at 23:00: "An error occurred while calling o88.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: []"
@SCREENERA · 1 year ago
@N12SR48SLC Sorry bro.
@SCREENERA · 1 year ago
@N12SR48SLC Yes bro.
@SCREENERA · 1 year ago
Don't forget to shut down the activated AWS services after the project is done.
@SCREENERA · 1 year ago
@N12SR48SLC What's the error?
@dsilvera1578 · 1 year ago
Darshil, I learned a lot. I believe this is helping many people. Thanks for all the effort you put into this.
@srinivasn4510 · 1 year ago
Best project from scratch. Thanks bro ☺☺
@jsingh7810 · 10 months ago
Thanks Darshil!! Finally made this cool project after overcoming all those errors. Really good explanation.
@GustavoFringe-dv2yg · 9 months ago
@aishwaryapatel7045 Facing the same issue.
@lifefacts7368 · 9 months ago
I get a parquet file error when I convert string to bigint. I also deleted the file from the S3 bucket, but it's still not working. Can anyone please help me?
@blanguss · 1 month ago
I am stuck at 15:00. After adding the node for the data source path, I added a transform node, then the node for the target path. However, I cannot seem to find a way to edit the data types like he did (from long to bigint), and I cannot find the interface to compare the schemas. The UI has changed a lot and I am struggling to figure it out. Someone help!
@nitishkaushik2076 · 2 years ago
Simple and to-the-point explanation. Great work bro 👍🏻
@adib4361 · 2 years ago
How do we showcase this project on our LinkedIn profile or in our resume?
@piyushpaikroy3579 · 2 years ago
Hey Darshil... I hope the project is complete!!
@rushikeshdarge6115 · 5 months ago
Thank you for the awesome tutorial!!!
@ajitagalawe8028 · 2 years ago
Too good. Learned a lot. Thank you.
@umerimran3833 · 1 year ago
That was an awesome project, thank you!
@saitarun3246 · 1 year ago
Great video Darshil, thank you so much!!
@harshitjoshi2250 · 1 year ago
To the folks struggling with the Glue script to filter regions out: try deleting the region files manually from S3 (make sure to enable bucket versioning so that objects are not permanently deleted). By doing this you can check whether the rest of your code is good, and even go on with the rest of the video if it's working.
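If you prefer to script that cleanup, a small boto3 sketch (the bucket name and prefix here are hypothetical; adjust them to your own layout):

    import boto3

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("your-raw-bucket")  # hypothetical bucket name
    # Delete everything under one region partition, e.g. the Russian files;
    # with versioning enabled these become recoverable delete markers
    bucket.objects.filter(Prefix="youtube/raw_statistics/region=ru/").delete()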
@iamgdsclead4208 · 1 year ago
I am getting a timeout error.
@chayanshrangraj4298 · 1 year ago
I think it's better if you just move the folders somewhere else, so you won't have to upload them again in the future.
@lguerrero17 · 1 year ago
Which part of the video?
@snehakadam16 · 1 year ago
Hey hi, can you please share the PySpark code?
@akshayrajput9049 · 2 years ago
Great work Darshil bro... can you send the PPT if possible?
@neha4024 · 2 years ago
Can you provide the ETL script shown in the video? I am getting an error even after adding predicate_pushdown.
@tanny_tales · 5 months ago
Did you resolve it? I am facing a few issues with it; kindly help.
@shutzzzzzz · 2 years ago
Do I have to pay anything to complete this project? Or is it completely free?
@imenbenhassine9710 · 1 year ago
@darshil Thanks for the effort, great job!! I just finished the project and am so proud of myself; it's my very first project switching from DA to DE. Thanks a lot.
@ybalasaireddy1248 · 1 year ago
Hey hi, did you get a Lambda timeout error by any chance?
@allenclement5672 · 1 year ago
Hey, I am seeing the new AWS Glue UI. How did you create the job there? I am facing a lot of confusion about what to select and how to navigate it; the video's UI is different.
@vighneshbuddhivant8353 · 1 year ago
@allenclement5672 Hey, did you solve this issue?
@uditkapadia7104 · 1 year ago
@ybalasaireddy1248 If you are getting a timeout error... please increase the timeout.
@KomilMustaev · 1 year ago
@allenclement5672 Same problems... Did you solve them? Can you help me?
@kuldeep_garg · 2 years ago
You are doing such great work; people should learn from you how to teach with this learning-by-doing method... Please do some more projects like this using real-time data and big data as well, so that we can learn that too. And thanks again, this tutorial is helping a lot 🎉❤
@sharafmomen2460 · 1 year ago
Really great project! Just wanted to ask: when more data ends up in the landing area, will the rest of the processes automatically go through the pipeline you created? Because it seemed like some parts you had to do manually, like using AWS Lambda.
@mkdTech369 · 2 months ago
Great
@TanmayMeda · 7 months ago
Thank you so much for the amazing video.
@ashutoshdixit2049 · 10 months ago
Good work, it's a great project; it helped me learn many things.
@sanikaapatil7279 · 8 months ago
Can you help me? I don't understand the new interface of AWS Glue.
@vishalkamlapure3344 · 2 years ago
Thank you Darshil for this wonderful project. I have been looking for such a project for a long time.
@ishan358 · 1 year ago
How did you solve the runtime error?
@chayanshrangraj4298 · 1 year ago
@ishan358 What kind of error are you getting?
@lguerrero17 · 1 year ago
@chayanshrangraj4298 Can you help me with an error in the AWS Glue step, when doing the join of the tables?
@chayanshrangraj4298 · 1 year ago
@lguerrero17 Sure! What is the error that you are facing?
@lguerrero17 · 1 year ago
@chayanshrangraj4298 When I try to create the ETL to generate the analytics table, it creates the table but doesn't generate columns and rows.
@ajtam05 · 1 year ago
Has anyone used ProjectPro before? I'm considering investing in it, but just wanted to see if anyone has experience with it yet. Looks promising.
@nikhilrunku8877 · 5 months ago
Hi Darshil, I have been trying to implement this project. At 13:28 you created a job, but I am not able to see that option in the current version. All I can see is the option to create the ETL job visually. Can you please help me with this?
@rushikeshdarge6115 · 5 months ago
Did you get a solution?
@tanny_tales · 5 months ago
@rushikeshdarge6115 Did you get a solution?
@kushpatel699 · 1 month ago
Hey, did it get resolved?
@kushpatel699 · 1 month ago
@rushikeshdarge6115 Hey, did it get resolved?
@rafdeo · 12 days ago
@kushpatel699 Proceed with creating it visually via drag and drop: S3 as source > transform schema > S3 as target. The only thing is, for me the partitioned region column doesn't pull in, so I am not able to do the next step. Hopefully you have better luck.
@atharvasankhe1153 · 7 months ago
How do I get to Jobs (13:30)? Apparently the Glue console has changed, so I'm not sure how to go ahead.
@vineetchotaliya3978 · 7 months ago
Got a solution to this? 💀
@GauravKhanna-cl7gd · 7 months ago
The Athena query works the first time on the parquet file, and then I have to delete the Unsaved folder in the cleansed bucket. Has anyone dealt with this? I am still at the 5-minute mark of this video. Really frustrating!!
@prasannakusugal4333 · 2 years ago
Thanks for the great video Darshil!!! Learnt a lot of new things :)
@subashpandey518 · 11 months ago
Someone please help: the UI for creating the job has completely changed. I am not able to create new jobs.
@marvinarismendiz2814 · 10 months ago
Same here, @Darshil Parmar.
@kopalsoni4780 · 2 years ago
Hey all, I am stuck at 40:35. I don't see the Database option for 'New Athena data source'. Not sure if QuickSight had an update since this video was created. Any suggestions?
@kopalsoni4780 · 2 years ago
Answering my own question: I had to change the region, which was a default selection.
@shutzzzzzz · 2 years ago
Thanks. Can you please tell me whether you need to pay anything to complete this project?
@dimensionalcookie · 2 years ago
Thank you so much
@jolaoduwole4523 · 1 year ago
I'm stuck at 12:55; I'm unable to get past the id type error... I deleted the parquet several times but it's still not working.
@divyakhiani1116 · 1 year ago
@kopalsoni4780 I did everything in the us-west-1 (California) region, but this region is not available in QuickSight. Can you help please?
@BobbyCambos · 1 year ago
It seems that for me, at 28:26, the parquet files didn't get transformed. I checked the trigger and the region but still haven't found a solution. Does anyone have any idea?
@이신우-x3m · 1 year ago
Did you remove the extra white space in the prefix and retry it? I solved the same problem that way.
@BobbyCambos · 1 year ago
@이신우-x3m Yes, and I had the same result: a blank table with only the column names and no parquet file uploaded.
@ahmedopeyemi2980 · 1 year ago
Thank you so much @이신우-x3m. After days of asking, I finally found a solution.
@Trendy-Bazar · 1 year ago
Hi, I am getting an error while creating the Glue ETL job at 17:00. The UI is completely different and I cannot proceed further; any help?
@ChandanaandSanthosh · 1 year ago
Same here... stuck there.
@srihariraman9409 · 1 year ago
@ChandanaandSanthosh I've set up the job pipeline using the new UI, but the script editing is mismatched.
@saiganesh5702 · 1 year ago
Hey Parakh, did your issue with the ETL get resolved? If yes, can you please help me with it?
@yashiyengar7366 · 1 year ago
Same for me, stuck at the ETL job creation section.
@dhruvingandhi1114 · 6 months ago
@srihariraman9409 How did you set up the new UI pipeline? Please mention a few steps.
@madmonk0 · 10 months ago
Is there an updated version of this? The legacy Glue UI cannot be accessed now.
@florenceofori7930 · 8 months ago
I was also searching for it. I'm wondering what to use now.
@kushpatel699 · 1 month ago
Hey, did it get resolved?
@kushpatel699 · 1 month ago
@florenceofori7930 Hey, did it get resolved?
@rajrockzz3797 · 2 years ago
Great video bro..
@meenachir1167 · 2 years ago
Hey, how do I convert from CSV to parquet for the other regions, like Russia, Korea, etc.?
@pawandeore1656 · 11 months ago
Informative
@kushpatel699 · 1 month ago
Hello @Darshil, thank you for this informative project. I am following your project meticulously; however, for "Add Job", AWS says "This page has been removed from AWS Glue.". How can I get the job added? Thank you.
@krishanwannigama3450 · 1 year ago
Anyone who is struggling with the trigger: create the trigger from the S3 bucket side. That will work perfectly.
@akashpandey7034 · 24 days ago
Will AWS charge us for using the services shown in the video?
@shivakrishna1743 · 1 year ago
Thanks for this!
@observerXIII · 11 months ago
Why did you re-create the crawler at the start of the video?
@PranavPranav-w9i · 4 months ago
How do I create a new job? This option is not there in my Glue.
@dushii_life · 3 months ago
The interface has changed completely; I am still figuring out what the problem is.
@Lapookie · 2 years ago
INTERESTING POINT HERE: at 4:56, how do you know which key to choose for the INNER JOIN? Before watching, I tried a.video_id = b.id, because it seemed logical that each row is unique, so video_id should be compared to the id of the other table, which is also a unique video row. Am I wrong? Does anyone have an idea? Thanks a lot.
@avni_103 · 2 years ago
Go and read the data column descriptions for that. When doing this type of join, they will give you a proper understanding of the data.
@prafulbs7216 · 2 years ago
The S3 trigger is not working for me; I tried many times. The data is not being written into the S3 cleansed bucket (JSON files).
@eduhilfe1886 · 2 years ago
Please check whether there is any space when defining the prefix: youtube/raw_statistics_reference_data/. If you are copying from S3, there may be a space after youtube/.
@harshalshende69 · 2 years ago
Hello brother, have you found the solution? I'm getting the same error.
@prafulbs7216 · 2 years ago
@harshalshende69 I actually did it manually, by uploading the files.
@robertmoncriefglockrock8957 · 1 year ago
@eduhilfe1886 This fixed it for me, thank you!!
@robertmoncriefglockrock8957 · 1 year ago
@harshalshende69 Check your S3 trigger; make sure youtube/ doesn't have a space after it.
@satishmajji481 · 2 years ago
@Darshil Parmar - The "region=us/" folder is not created for me; only the ca and gb folders are created upon running the ETL job. PS: I added predicate_pushdown = "region in ('ca','gb','us')" as well, but the folder is missing for the "us" region. Can you please take a look at this?
@pdubocho · 2 years ago
Same thing happened to me. The error occurred when initially using the AWS CLI to load data into the S3 buckets: after executing the command to upload the csv files, I did not hit enter after the upload was "complete", I just exited the cmd box. To fix it I manually uploaded the data and re-ran the processes from both videos. Edit: this is only valid if you go into your raw data S3 folder and don't find the folder "region=us".
@jarlezio2463 · 2 years ago
Because us is not present in the initial dataset.
@Soulfulreader786 · 1 year ago
Use the AWS CLI to create the folders using the cp command.
@tanny_tales · 5 months ago
After adding the predicate pushdown I am still getting an error: UNCLASSIFIED_ERROR; An error occurred while calling o116.schema. Unable to parse file: RUvideos.csv
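For reference, the predicate discussed above goes into the Glue catalog read. A minimal sketch (the database/table names are assumed from the project; push_down_predicate is the Glue API's own parameter name):

    predicate_pushdown = "region in ('ca','gb','us')"
    datasource0 = glueContext.create_dynamic_frame.from_catalog(
        database="db_youtube_raw",
        table_name="raw_statistics",
        push_down_predicate=predicate_pushdown,
        transformation_ctx="datasource0",
    )

Note that the predicate can only prune partitions the crawler has already cataloged; if the region=us/ folder never made it into S3 (as in the replies above), upload it and re-run the crawler first.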
@zaheeruddinbaber6762 · 2 years ago
Hi, where can I get the PPT you are using?
@surya-z9e · 1 year ago
Hey bro, your videos are very understandable. Could you make a more in-depth video about QuickSight?
@YuyuZhang-v3q · 1 year ago
Hey, can someone help me? The UI for ETL Jobs has changed a lot and I cannot add a job successfully.
@chinmaymaganur7133 · 11 months ago
Were you able to figure this out?
@gabilinguas · 11 months ago
Hello! I faced this issue and figured it out. Go to ETL Jobs, click the "Create job from a blank graph" button, and go to "Job details", the third item on the menu.
@gabilinguas · 11 months ago
For the second part, when he clicks next, you have to go to Visual (the first item on the same tab where you clicked Job details) and add nodes: first you choose the S3 bucket as the source, then you add a new node from the schema transforms, and then a third node from the targets tab.
@RonitSagar · 11 months ago
Hey @gabilinguas, I am not able to get what you said in the last comment. Can you please explain a little more? 😊
@gabilinguas · 11 months ago
Hey @RonitSagar! You can basically follow the same steps described in the "Build ETL Pipeline" section at 30:33 in the video. The process is almost the same; you just have to pay attention to the details that differ.
@N12SR48SLC · 1 year ago
Not able to see the region column in my schema; also, all columns are showing string as the datatype.
@priyadarshinibal9304 · 5 months ago
Same. Did you find any solution?
@vallabhajoshulakrishnachai6475 · 2 months ago
@priyadarshinibal9304 Same here; did you find the solution?
@SCREENERA · 1 year ago
Thanks a lot, Darshil and ProjectPro.
@N12SR48SLC · 1 year ago
Not able to see the region column in my schema; also, all columns are showing string as the datatype. 16:07
@AryanDurge1 · 5 months ago
Is the Athena service free, or do we need to pay for the usage?
@AryanDurge1 · 5 months ago
Do we need to pay for the usage of the AWS Athena service?
@dushii_life · 3 months ago
No, it's free for new users for one year.
@jatin7089 · 9 months ago
I am stuck on creating the Glue job as the UI is different. Please, anyone help here: where do I change the data types? I am able to add the source and target.
@shubhamnikam4759 · 9 months ago
Stuck on the same: I added the data target and source, but am not able to figure out how to change the data type.
@rohitmalviya8607 · 7 months ago
@shubhamnikam4759 Use Google Gemini.
@ChaosChuckler · 5 months ago
How do I build this project through an IaC approach, so that I can delete it to avoid AWS charges and rebuild it when necessary?
@pankajchandel1000 · 1 year ago
Is there a project where you used Python notebooks or EMR for processing data instead of Lambda functions?
@prafulbs7216 · 2 years ago
Adding the trigger to the Lambda function is not working for me. Tried many times; please suggest.
@divyakhiani1116 · 1 year ago
Facing the same issue. Did you find a solution?
@prafulbs7216 · 1 year ago
@divyakhiani1116 I redid the same steps once again, I guess. I don't remember, though!
@ShaunDePonte · 1 year ago
You didn't answer the initial question as posed in video 1: how to categorise videos based on their comments and stats, and what factors affect how popular a YouTube video will be.
@yaryna_ch · 3 months ago
So what should we do with data like the Japanese files?
@Soulfulreader786 · 1 year ago
Before the trigger, did you change the Lambda to take all records? Initially it was ["Records"][0].
@jenithmehta9603 · 1 year ago
The job creation UI has completely changed. I am stuck at that step.
@russophile9874 · 1 year ago
Go to the Script tab and click Edit. Paste the Spark code from the GitHub repo; it will work.
@subashpandey518 · 11 months ago
@russophile9874 Could you please explain precisely which Script tab and which Edit? I am stuck on this step. Thanks.
@manigowdas7781 · 1 year ago
Just completed this project. Thanks for the content; understanding AWS services and using them for our use case is a really crazy thing! @DarshilParmar ❤ #AWS CLI #S3 #Lambda #Glue #Crawler #Glue Studio #Glue ETL #Athena #Database #Quicksight
@saiganesh5702 · 1 year ago
Hey Manoj, can you please help me with the new ETL job visual editor scripts? I am having trouble understanding them.
@yashiyengar7366 · 1 year ago
@saiganesh5702 Even I am facing issues in the ETL job creation section due to the new UI.
@chinmaymaganur7133 · 11 months ago
How did you set up the Glue ETL job?
@chinmaymaganur7133 · 11 months ago
@saiganesh5702 Were you able to figure this out?
@danielofuokwu6595 · 2 months ago
I can't visualize the data in QuickSight; it says I don't have permission.
@vasanthkumar8120 · 8 months ago
Hey, thanks for this great video. I want to know: how much does it cost to complete this entire project on AWS?
@AngelYaelRodriguezGarcia · 4 months ago
Did anyone else have the same problem with the job interface change?
@dushii_life · 3 months ago
Me too
@herdata_eo4492 · 2 years ago
@projectpro Please consider a monthly subscription instead; being billed 6 months/yearly is too much.
@ProjectProDataScienceProjects · 2 years ago
Hey, we have some discounts going on that are valid only for a few days. Please share your email ID and our team will get in touch with you. Thanks.
@ueeabhishekkrsahu · 1 year ago
Where is the Discord link?
@kaushiksarmah999 · 2 years ago
Hello Sir, I am not able to convert the id field type to bigint. I tried the steps exactly as in the video multiple times, and even looked online for the procedure, but got nothing. Can you help me, sir?
@deepakpradhan9743 · 1 year ago
Has your issue converting the id column to bigint been resolved?
@yeezco2535 · 11 months ago
No
@nguyentiensu4088 · 1 year ago
When my Lambda function is triggered by an S3 event, the cleaned_statistics_reference_data table is created. But when I check with the SQL command "SELECT * FROM cleaned_statistics_reference_data", the result is an empty table. I tested the Lambda function with a test event, and everything is OK (there is data in the cleaned_statistics_reference_data table). Please help me with a solution! Thank you!
@drishtihingar2160 · 1 year ago
Have you found the solution? I am facing the same issue. Please help me.
@nandinisingh9217 · 1 year ago
@drishtihingar2160 Facing the same problem; have you found any solution?
@drishtihingar2160 · 1 year ago
Not yet @nandinisingh9217
@rohitmalviya8607 · 7 months ago
You have to upload the JSON files through the CLI AFTER creating the trigger; the Lambda won't process already-existing JSON files.
@SohamKatkar-xw5uf · 4 months ago
@rohitmalviya8607 Hi, can you tell me exactly which JSON files need to be uploaded and where to upload them?
@shantanuumrani9163 · 1 year ago
At 16:54, I'm not able to see the region source key in my output schema. What should I do?
@-aadfi-1710 · 3 months ago
What is the solution to this?
@SCREENERA · 1 year ago
Don't forget to close the activated AWS services.
@sivasahoo6980 · 1 year ago
Can you tell me how to close them? Do we have to delete the bucket and the ETL job, or something else?
@SCREENERA · 1 year ago
@sivasahoo6980 Delete all the services.
@TheAINoobxoxo · 9 months ago
What to do about creating jobs, @darshil? Should I use the script that is given, now that AWS has moved to Visual ETL and simple job creation has become complex for someone who doesn't know how to work with Visual ETL?
@banarasi91 · 1 year ago
Why is my trigger not invoked when a file is uploaded to S3? My test works properly in the Lambda function, and it is not showing any error either. I am not able to understand the issue.
@sivasahoo6980 · 1 year ago
Did you get any solution?
@banarasi91 · 1 year ago
@sivasahoo6980 It's been a while since I posted, but if I remember correctly it was something with naming or syntax: there was an extra space which I was not able to find. After rewatching everything I got it; I don't remember exactly where, but it may be in some path.
@banarasi91 · 1 year ago
Hello guys, you might be getting an error at the final testing step because the DB name has not been changed in the environment variable. Please take care: he forgot to change the DB name. If you notice, in Athena the database name is db_youtube_cleaned but it should be de_youtube_cleaned, which gives the "Entity not found" error in the final Lambda test.
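In code terms, the handler reads that name from the function's environment, so the console value must match what Athena/Glue actually shows. A sketch (the variable names here are assumptions, not confirmed from the repo):

    import os

    # Set under Configuration > Environment variables on the Lambda;
    # the values must match the Glue catalog database/table exactly
    os_input_db_name = os.environ['glue_catalog_db_name']        # e.g. de_youtube_cleaned
    os_input_table_name = os.environ['glue_catalog_table_name']  # e.g. cleaned_statistics_reference_data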
@sivasahoo6980 · 1 year ago
@banarasi91 Thanks a lot; yeah, there is an extra space in the path.
@N12SR48SLC · 1 year ago
@banarasi91 Not able to see the region column in my schema; also, all columns are showing string as the datatype (16:07). My ETL job is also failing.
@vijayarana8208 · 2 years ago
Hello Darshil, I am kind of stuck at 22:52 in the video. My job runs successfully, but the raw_statistics folder is not created. I have specified the region correctly in the code; any suggestion would be helpful.
@anupammathur918 · 2 years ago
Check the S3 trigger; remove the space after youtube/.
@ajitagalawe8028 · 2 years ago
Caught by the same issue. In my case, files were created directly in raw_statistics; there are no "region=" sub-folders. Could you please help me? Thanks.
@ueeabhishekkrsahu · 1 year ago
Can you please share your script? I have created my job but it is not executing. Please share; it would be a great help.
@officialPatel-e3q · 1 year ago
Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform, and no option for adding a job manually. If anyone could please help me with that...
@ganeshb.v.s1679 · 7 months ago
Hi, I am at the last step of building the ETL pipeline. I successfully created the Glue job named 'de-on-youtube-parquet-analytics-version'. The contents of the de-on-youtube-analytics bucket are getting added, but the 'final_analytics' table is not being created. Please help me resolve the issue. Thanks in advance.
@086_AASTHASHUKLA · 6 months ago
Hi, I created the Glue job but it isn't creating the same files under raw_statistics as shown in the video. How did you do it?
@NikitaRamakantChaudhari · 4 months ago
Hi, did you get a solution for your issue?
@ebubeonuegbu3467 · 6 months ago
I added this code: predicate_pushdown = "region in ('ca','gb','us')" and got this error: "Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o103.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: []". The source S3 data in my setup is partitioned by the "region" column. Please, how do I resolve this?
@iamayuv · 6 months ago
Bro, have you found the solution?
@FRUXT · 1 year ago
Thanks! Why parquet files? Isn't it simpler to keep everything in JSON or CSV?
@marvinarismendiz2814 · 10 months ago
I think it is because of the larger volumes of data; you can use Spark on it.
@isaacodeh · 2 years ago
I did ask you a question on your channel about the wrangler, which didn't seem to be working for me. I don't know if it has to do with location?
@DarshilParmar · 2 years ago
Yes, it is only available in some locations.
@isaacodeh · 2 years ago
@DarshilParmar Oh I see! Thanks for the work you do!! You have been very helpful!!!
@Lapookie · 2 years ago
@DarshilParmar Why is that?
@DeepUpadhyay-gs1ky · 6 months ago
Where is the link for Discord?
@vishwajithsubhash6269 · 2 years ago
I understand why we need to convert JSON to parquet, but why do we convert CSV to parquet? It's already clean, right?
@vineetsrivastava4906 · 1 year ago
The parquet file format is more optimized and faster; read more about it on the internet.
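(Parquet is columnar, so Athena scans only the columns a query touches, even for data that is already clean.) The project does the conversion with the awswrangler layer inside the Lambda; a rough sketch of that write step (the path is hypothetical, the database/table names are taken from other comments in this thread):

    import awswrangler as wr

    # Write the cleaned DataFrame to S3 as parquet and register/update
    # the table in the Glue catalog in one call
    wr.s3.to_parquet(
        df=df_step_1,
        path="s3://your-cleansed-bucket/youtube/",  # hypothetical path
        dataset=True,
        database="de_youtube_cleaned",
        table="cleaned_statistics_reference_data",
        mode="overwrite",
    )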
@sanikaapatil7279 · 8 months ago
@vineetsrivastava4906 Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform, and no option for adding a job manually. If anyone could please help me with that...
@jayitankar7919 · 2 years ago
Hi, my Lambda trigger for the JSON files is not getting fired; I don't know what's wrong.
@aneeqbokhari4611 · 2 years ago
Yeah, same. Have you figured it out?
@robertmoncriefglockrock8957 · 1 year ago
@aneeqbokhari4611 Same here
@bukunmiadebanjo9684 · 1 year ago
Had to stop here too. After deleting all files and re-uploading, the trigger does nothing.
@shantanuumrani9163 · 1 year ago
@bukunmiadebanjo9684 The same thing happened to me. Has anyone figured out how to solve it?
@bukunmiadebanjo9684 · 1 year ago
@shantanuumrani9163 Didn't find a solution. The whole UI also looks different, as AWS has already made changes, so I decided to move to a different course and abandoned this.
@merkarii · 1 year ago
You go so fast.
@kratos_gow610 · 3 months ago
13:39
@venkatsaiphanindraanagam · 11 months ago
We created the ETL job to join the data so that when new data gets added to the bucket it will be automatically joined, instead of running a SQL query. But shouldn't we trigger this ETL job on the data-addition event in S3? Can anyone answer this?
@bhumikalalchandani321 · 11 months ago
No, I think the one-time Lambda trigger from S3 happens only for the .json-to-parquet step --> then the cleansed S3 bucket is filled --> from there the analytics data is picked up. Please confirm this.
@merkarii · 1 year ago
But good work.
@Sdsatya · 2 years ago
Excellent!!
@Lapookie · 2 years ago
Thank you a lot for this project! It helps me understand what tools we generally use as data engineers to build data pipelines, etc. But I don't feel like I have learned how to do it myself. I mean, I followed along and understood what we made, but I need more explanation of how you process the data, e.g. how you get your bucket in AWS Lambda (the code is not obvious at first: bucket = event['Records'][0]['s3']['bucket']['name']; key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')). I need exercises myself.
@mananyadav6401 · 2 years ago
You can go through the test event that we generated. There is a JSON in the test event that we use to test the function. Try to navigate it and you will understand how the bucket name is captured, etc. Hope it helps.
@Lapookie · 2 years ago
@mananyadav6401 Oh yeah, I'll do that, good idea. Thanks for your answer :)!
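For anyone else puzzling over those two lines, this is roughly what the S3 "put" test event looks like and how the handler digs the bucket and key out of it (the bucket and key values here are made-up examples):

    import urllib.parse

    # A trimmed-down S3 put event, like the one the Lambda console generates
    event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "your-raw-bucket"},
                    "object": {"key": "youtube/raw/US_category_id.json"},
                }
            }
        ]
    }

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded (spaces become '+'), hence the unquote_plus
    key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"], encoding="utf-8")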
@RizwanAnsari-lt3nf · 6 months ago
Is anyone else facing an issue with the Lambda function? I have added the trigger, but no new file is created once I upload the JSON file to the bucket.