This is a great data engineering project, by the way, and well explained, especially for beginners who are starting to learn. I have learnt a lot, thanks Darshil. For those who are stuck at creating the AWS Glue ETL job: it seems AWS updated the UI and settings, so it's not done the same way as he showed in the video, but there is another way you can do it. He attached the PySpark code for the ETL job in the description. Then go to AWS Glue --> ETL jobs --> choose the option to create a job using a script, and simply copy the script in there. That's all; save the job and try running it. It should work. Note: make sure the data source details in the script are modified to match your database name and table name, otherwise the job may fail with an error. Thanks!!
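For anyone following that route, this is a minimal sketch of what the top of such a Glue script typically looks like, with the two values you need to change marked. The database and table names here are placeholders, not necessarily the ones from the video; use the script from the description as the source of truth:

    import sys
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job

    args = getResolvedOptions(sys.argv, ['JOB_NAME'])
    sc = SparkContext()
    glueContext = GlueContext(sc)
    spark = glueContext.spark_session
    job = Job(glueContext)
    job.init(args['JOB_NAME'], args)

    # Change these two values to match your own Glue Data Catalog:
    datasource0 = glueContext.create_dynamic_frame.from_catalog(
        database="db_youtube_raw",    # your database name
        table_name="raw_statistics",  # your table name
        transformation_ctx="datasource0",
    )

    # ... transforms and the write to the target bucket go here ...
    job.commit()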
@KatyayaniDatti · 1 month ago
@aman_nv I too have completed the project over this weekend. Cheers to us 🥳 Coming to the ETL job creation part: copy-pasting the script is fine, but as a beginner I find it difficult to write such a script. The old UI was better, where it generated the script for us based on our actions. Do we have something similar in the new UI?
@iamayuv · 6 months ago
00:03 Creating a crawler to understand and analyze data stored in AWS S3 buckets
05:48 Query execution and data type casting
11:43 Preprocessing and efficiency for querying
17:16 Writing data into the target bucket and creating partitions
22:05 Create a Glue crawler to clean and catalog data
27:25 Data processing pipeline created using AWS Glue Studio
32:23 Created an analytical pipeline using AWS Glue to transform and store data
37:18 Building a reporting version of the data makes it easier for data scientists to analyze and query it
42:01 Create a dashboard to visualize data from YouTube
@mrbcan7215 · 6 months ago
Hi, can you explain how the "raw_statistics" table was created automatically after he created the first crawler? When I tried the same process, it didn't work for me.
@robertmoncriefglockrock8957 · 1 year ago
This is a simple error I ran into; gonna post it here in case others have the same. When trying to run the job at 21:25 I was getting "NameError: name 'gluecontext' is not defined". When adding the line "df_final_output = DynamicFrame.fromDF(datasink1, gluecontext, "df_final_output")" I accidentally forgot to capitalize glueContext; instead I put gluecontext. Thank you for this walkthrough, I start my new data engineering job tomorrow and the company uses AWS, so this has helped me tremendously. You are doing magic, my friend.
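In other words (a sketch of just that line; datasink1 and glueContext are names defined earlier in the job script), the corrected version is:

    from awsglue.dynamicframe import DynamicFrame

    # Python names are case-sensitive: it must be glueContext, not gluecontext
    df_final_output = DynamicFrame.fromDF(datasink1, glueContext, "df_final_output")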
@vishnuvardhan9082 · 10 months ago
Hi Robert, hope you are doing well. It's been over a year since you posted and joined your new company. Just wanted to check if this new job was your first data engineering job or if you were already experienced in DE? And how are things at your new workplace?
@LaxmikantMaheshjiKabra · 6 months ago
Great work, Darshil! I have only one suggestion after finishing the whole project along with the video, which took me a total of around 6-8 hours, excluding the dashboard. My suggestion is that you could take an extra minute and explain the code properly, so that we viewers can understand what transform actions the ETL is taking; that would make the overall video make more sense, and why you chose the steps before and after the ETL step would become clearer. Still, thanks for this wonderful project, and I am probably moving on to the Azure analytics project after this one.
@Ujwalarao · 1 year ago
Thanks for bringing me close to the real use case scenario of Data Engineering.
@freelychanu2086 · 16 days ago
Well explained, everything from 0 to 100%. Thank you so much! ❤👍
@lâmtrịnh-r4m · 4 months ago
12:08 If you still get the hive_bad_data issue after following it step by step: you need to convert id to bigint in the Lambda function. Add this at line 21: "# Extract required columns: df_step_1 = pd.json_normalize(df_raw['items']); df_step_1['id'] = df_step_1['id'].astype(int)"
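Spelled out, a sketch of just that fragment (df_raw is the DataFrame read from the raw JSON earlier in the handler; 'int64' pins the 64-bit width that bigint expects):

    import pandas as pd

    # Flatten the "items" array of the raw JSON into columns
    df_step_1 = pd.json_normalize(df_raw['items'])
    # Cast id to a 64-bit integer so it matches the bigint column Athena expects
    df_step_1['id'] = df_step_1['id'].astype('int64')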
@VishalThota-p1r · 4 months ago
Appreciate your efforts, Darshil.
@rahulgyani2965 · 8 months ago
Hi Darshil, thank you for this video. I have a question for you: when you created the cleaned version converting CSV to parquet, why didn't we use a Lambda function instead of a Glue job?
@youraverageguide · 2 years ago
Darshil is a great teacher! Great project.
@PriyankaGopi-j6m · 10 months ago
Finally completed this project. Thank you so much for this! You're a gem :)
@SatishSharma-rh4su · 10 months ago
Heyy, I am facing an error while joining cleaned and raw data. Can you please help?
@sreyassawant2205 · 10 months ago
Hey, I am facing the same issue. Can you help out?
@PranavPranav-w9i · 4 months ago
How did you create the job in Glue? I am not able to create one; there is no option for it.
@assieneolivier5560 · 1 year ago
Finally got this project done. Great project to learn data engineering!!
@chinmaymaganur7133 · 11 months ago
Hi, how did you set up the ETL job?
@xandao30 · 5 months ago
@chinmaymaganur7133 Good question.
@ajtam05 · 2 years ago
Another great video. Only thing is... AWS has updated the Glue console along with other consoles. I believe I updated my steps accordingly, except for the schema datatypes (which it looks like I can change after the job is run). But the script does look entirely different. Could you assist with an updated video on using the new Glue console?
@jenithmehta9603 · 1 year ago
I am facing the same issue.
@ajtam05 · 1 year ago
@Jenith Mehta If you scroll to the bottom of the navigation pane there is a "Legacy" section. I realized that after I posted this, but that's what I used. Hope that helps. 😀
@rohanchoudhary672 · 1 year ago
@ajtam05 Can't find that.
@SCREENERA · 1 year ago
Finally, after 100 disappointments... I did it! Great effort, and it's my very first project in the data engineering field. Thanks! Errors are challenging, but those who have a real interest in data engineering will definitely get through them and complete this project.
@N12SR48SLC · 1 year ago
I'm getting this in my job at 23:00: "An error occurred while calling o88.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: []"
@SCREENERA · 1 year ago
@N12SR48SLC Sorry bro.
@SCREENERA · 1 year ago
@N12SR48SLC Yes bro.
@SCREENERA · 1 year ago
Don't forget to shut down the activated AWS services after the project is done.
@SCREENERA · 1 year ago
@N12SR48SLC What's the error?
@dsilvera1578 · 1 year ago
Darshil, I learned a lot. I believe this is helping many people. Thanks for all the effort you put into this.
@srinivasn4510 · 1 year ago
Best project from scratch. Thanks bro ☺☺
@jsingh7810 · 10 months ago
Thanks Darshil!! Finally made this cool project after overcoming all those errors. Really good explanation.
@GustavoFringe-dv2yg · 9 months ago
@aishwaryapatel7045 Facing the same issue.
@lifefacts7368 · 9 months ago
I get a parquet file error when I convert string to bigint. I also deleted the file from the S3 bucket, but it's still not working. Can anyone please help me?
@blanguss · 1 month ago
I am stuck at 15:00. After adding the node for the data source path, I added a transform node, then the node for the target path. However, I cannot seem to find a way to edit the data types like he did (from long to bigint), and I cannot find the interface to compare the schemas. The UI has changed a lot and I am struggling to figure it out. Someone help!
@nitishkaushik2076 · 2 years ago
Simple and to-the-point explanation. Great work bro 👍🏻
@adib4361 · 2 years ago
How do we showcase this project on our LinkedIn profile or in our resume?
@piyushpaikroy3579 · 2 years ago
Hey Darshil... I hope the project is complete!!
@rushikeshdarge6115 · 5 months ago
Thank you for the awesome tutorial!!!
@ajitagalawe8028 · 2 years ago
Too good. Learned a lot. Thank you.
@umerimran3833 · 1 year ago
That was an awesome project, thank you!
@saitarun3246 · 1 year ago
Great video Darshil, thank you so much!!
@harshitjoshi2250 · 1 year ago
To the folks struggling with the Glue script to filter regions out: try deleting the region files manually from S3 (make sure to enable bucket versioning so that objects are not permanently deleted). By doing this you can check whether the rest of your code is good, and even go on with the rest of the video if it's working.
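If you prefer to script that cleanup, a small boto3 sketch (the bucket name and prefix here are hypothetical; adjust them to your own layout):

    import boto3

    s3 = boto3.resource("s3")
    bucket = s3.Bucket("your-raw-bucket")  # hypothetical bucket name
    # Delete everything under one region partition, e.g. the Russian files;
    # with versioning enabled these become recoverable delete markers
    bucket.objects.filter(Prefix="youtube/raw_statistics/region=ru/").delete()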
@iamgdsclead4208 · 1 year ago
I am getting a timeout error.
@chayanshrangraj4298 · 1 year ago
I think it's better if you just move the folders somewhere else, so you won't have to upload them again in the future.
@lguerrero17 · 1 year ago
Which part of the video?
@snehakadam16 · 1 year ago
Hey hi, can you please share the PySpark code?
@akshayrajput9049 · 2 years ago
Great work Darshil bro... can you send the PPT if possible?
@neha4024 · 2 years ago
Can you provide the ETL script shown in the video? I am getting an error even after adding predicate_pushdown.
@tanny_tales · 5 months ago
Did you resolve it? I am facing a few issues with it; kindly help.
@shutzzzzzz · 2 years ago
Do I have to pay anything to complete this project? Or is it completely free?
@imenbenhassine9710 · 1 year ago
@darshil Thanks for the effort, great job!! I just finished the project and am so proud of myself; it's my very first project switching from DA to DE. Thanks a lot.
@ybalasaireddy1248 · 1 year ago
Hey hi, did you get a Lambda timeout error by any chance?
@allenclement5672 · 1 year ago
Hey, I am seeing the new AWS Glue UI. How did you create the job there? I am facing a lot of confusion about what to select and how to navigate it; the video's UI is different.
@vighneshbuddhivant8353 · 1 year ago
@allenclement5672 Hey, did you solve this issue?
@uditkapadia7104 · 1 year ago
@ybalasaireddy1248 If you are getting a timeout error... please increase the timeout.
@KomilMustaev · 1 year ago
@allenclement5672 Same problems... Did you solve them? Can you help me?
@kuldeep_garg · 2 years ago
You are doing such great work; people should learn from you how to teach with this learning-by-doing method... Please do some more projects like this using real-time data and big data as well, so that we can learn that too. And thanks again, this tutorial is helping a lot 🎉❤
@sharafmomen2460 · 1 year ago
Really great project! Just wanted to ask: when more data ends up in the landing area, will the rest of the processes automatically go through the pipeline you created? Because it seemed like some parts you had to do manually, like using AWS Lambda.
@mkdTech369 · 2 months ago
Great
@TanmayMeda · 7 months ago
Thank you so much for the amazing video.
@ashutoshdixit2049 · 10 months ago
Good work, it's a great project; it helped me learn many things.
@sanikaapatil7279 · 8 months ago
Can you help me? I don't understand the new interface of AWS Glue.
@vishalkamlapure3344 · 2 years ago
Thank you Darshil for this wonderful project. I have been looking for such a project for a long time.
@ishan358 · 1 year ago
How did you solve the runtime error?
@chayanshrangraj4298 · 1 year ago
@ishan358 What kind of error are you getting?
@lguerrero17 · 1 year ago
@chayanshrangraj4298 Can you help me with an error in the AWS Glue step, when doing the join of the tables?
@chayanshrangraj4298 · 1 year ago
@lguerrero17 Sure! What is the error that you are facing?
@lguerrero17 · 1 year ago
@chayanshrangraj4298 When I try to create the ETL to generate the analytics table, it creates the table but doesn't generate columns and rows.
@ajtam05 · 1 year ago
Has anyone used ProjectPro before? I'm considering investing in it, but just wanted to see if anyone has experience with it yet. Looks promising.
@nikhilrunku8877 · 5 months ago
Hi Darshil, I have been trying to implement this project. At 13:28 you created a job, but I am not able to see that option in the current version. All I can see is the option to create the ETL job visually. Can you please help me with this?
@rushikeshdarge6115 · 5 months ago
Did you get a solution?
@tanny_tales · 5 months ago
@rushikeshdarge6115 Did you get a solution?
@kushpatel699 · 1 month ago
Hey, did it get resolved?
@kushpatel699 · 1 month ago
@rushikeshdarge6115 Hey, did it get resolved?
@rafdeo · 12 days ago
@kushpatel699 Proceed with creating it visually via drag and drop: S3 as source > transform schema > S3 as target. The only thing is, for me the partitioned region column doesn't pull in, so I am not able to do the next step. Hopefully you have better luck.
@atharvasankhe1153 · 7 months ago
How do I get to Jobs (13:30)? Apparently the Glue console has changed, so I'm not sure how to go ahead.
@vineetchotaliya3978 · 7 months ago
Got a solution to this? 💀
@GauravKhanna-cl7gd · 7 months ago
The Athena query works the first time on the parquet file, and then I have to delete the Unsaved folder in the cleansed bucket. Has anyone dealt with this? I am still at the 5-minute mark of this video. Really frustrating!!
@prasannakusugal4333 · 2 years ago
Thanks for the great video Darshil!!! Learnt a lot of new things :)
@subashpandey518 · 11 months ago
Someone please help: the UI for creating the job has completely changed. I am not able to create new jobs.
@marvinarismendiz2814 · 10 months ago
Same here, @Darshil Parmar.
@kopalsoni4780 · 2 years ago
Hey all, I am stuck at 40:35. I don't see the Database option for 'New Athena data source'. Not sure if QuickSight had an update since this video was created. Any suggestions?
@kopalsoni4780 · 2 years ago
Answering my own question: I had to change the region, which was a default selection.
@shutzzzzzz · 2 years ago
Thanks. Can you please tell me whether you need to pay anything to complete this project?
@dimensionalcookie · 2 years ago
Thank you so much
@jolaoduwole4523 · 1 year ago
I'm stuck at 12:55; I'm unable to get past the id type error... I deleted the parquet several times but it's still not working.
@divyakhiani1116 · 1 year ago
@kopalsoni4780 I did everything in the us-west-1 (California) region, but this region is not available in QuickSight. Can you help please?
@BobbyCambos · 1 year ago
It seems that for me, at 28:26, the parquet files didn't get transformed. I checked the trigger and the region but still haven't found a solution. Does anyone have any idea?
@이신우-x3m · 1 year ago
Did you remove the extra white space in the prefix and retry it? I solved the same problem that way.
@BobbyCambos · 1 year ago
@이신우-x3m Yes, and I had the same result: a blank table with only the column names and no parquet file uploaded.
@ahmedopeyemi2980 · 1 year ago
Thank you so much @이신우-x3m. After days of asking, I finally found a solution.
@Trendy-Bazar · 1 year ago
Hi, I am getting an error while creating the Glue ETL job at 17:00. The UI is completely different and I cannot proceed further; any help?
@ChandanaandSanthosh · 1 year ago
Same here... stuck there.
@srihariraman9409 · 1 year ago
@ChandanaandSanthosh I've set up the job pipeline using the new UI, but the script editing is mismatched.
@saiganesh5702 · 1 year ago
Hey Parakh, did your issue with the ETL get resolved? If yes, can you please help me with it?
@yashiyengar7366 · 1 year ago
Same for me, stuck at the ETL job creation section.
@dhruvingandhi1114 · 6 months ago
@srihariraman9409 How did you set up the new UI pipeline? Please mention a few steps.
@madmonk0 · 10 months ago
Is there an updated version of this? The legacy Glue UI cannot be accessed now.
@florenceofori7930 · 8 months ago
I was also searching for it. I'm wondering what to use now.
@kushpatel699 · 1 month ago
Hey, did it get resolved?
@kushpatel699 · 1 month ago
@florenceofori7930 Hey, did it get resolved?
@rajrockzz3797 · 2 years ago
Great video bro..
@meenachir1167 · 2 years ago
Hey, how do I convert from CSV to parquet for the other regions, like Russia, Korea, etc.?
@pawandeore1656 · 11 months ago
Informative
@kushpatel699 · 1 month ago
Hello @Darshil, thank you for this informative project. I am following your project meticulously; however, for "Add Job", AWS says "This page has been removed from AWS Glue.". How can I get the job added? Thank you.
@krishanwannigama3450 · 1 year ago
Anyone who is struggling with the trigger: create the trigger from the S3 bucket side. That will work perfectly.
@akashpandey7034 · 24 days ago
Will AWS charge us for using the services shown in the video?
@shivakrishna1743 · 1 year ago
Thanks for this!
@observerXIII · 11 months ago
Why did you re-create the crawler at the start of the video?
@PranavPranav-w9i · 4 months ago
How do I create a new job? This option is not there in my Glue.
@dushii_life · 3 months ago
The interface has changed completely; I am still figuring out what the problem is.
@Lapookie · 2 years ago
INTERESTING POINT HERE: at 4:56, how do you know which key to choose for the INNER JOIN? Before watching, I tried a.video_id = b.id, because it seemed logical that each row is unique, so video_id should be compared to the id of the other table, which is also a unique video row. Am I wrong? Does anyone have an idea? Thanks a lot.
@avni_103 · 2 years ago
Go and read the data column descriptions for that. When doing this type of join, they will give you a proper understanding of the data.
@prafulbs7216 · 2 years ago
The S3 trigger is not working for me; I tried many times. The data is not being written into the S3 cleansed bucket (JSON files).
@eduhilfe1886 · 2 years ago
Please check whether there is any space when defining the prefix: youtube/raw_statistics_reference_data/. If you are copying from S3, there may be a space after youtube/.
@harshalshende69 · 2 years ago
Hello brother, have you found the solution? I'm getting the same error.
@prafulbs7216 · 2 years ago
@harshalshende69 I actually did it manually, by uploading the files.
@robertmoncriefglockrock8957 · 1 year ago
@eduhilfe1886 This fixed it for me, thank you!!
@robertmoncriefglockrock8957 · 1 year ago
@harshalshende69 Check your S3 trigger; make sure youtube/ doesn't have a space after it.
@satishmajji481 · 2 years ago
@Darshil Parmar - The "region=us/" folder is not created for me; only the ca and gb folders are created upon running the ETL job. PS: I added predicate_pushdown = "region in ('ca','gb','us')" as well, but the folder is missing for the "us" region. Can you please take a look at this?
@pdubocho · 2 years ago
Same thing happened to me. The error occurred when initially using the AWS CLI to load data into the S3 buckets: after executing the command to upload the csv files, I did not hit enter after the upload was "complete", I just exited the cmd box. To fix it I manually uploaded the data and re-ran the processes from both videos. Edit: this is only valid if you go into your raw data S3 folder and don't find the folder "region=us".
@jarlezio2463 · 2 years ago
Because us is not present in the initial dataset.
@Soulfulreader786 · 1 year ago
Use the AWS CLI to create the folders using the cp command.
@tanny_tales · 5 months ago
After adding the predicate pushdown I am still getting an error: UNCLASSIFIED_ERROR; An error occurred while calling o116.schema. Unable to parse file: RUvideos.csv
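For reference, the predicate discussed above goes into the Glue catalog read. A minimal sketch (the database/table names are assumed from the project; push_down_predicate is the Glue API's own parameter name):

    predicate_pushdown = "region in ('ca','gb','us')"
    datasource0 = glueContext.create_dynamic_frame.from_catalog(
        database="db_youtube_raw",
        table_name="raw_statistics",
        push_down_predicate=predicate_pushdown,
        transformation_ctx="datasource0",
    )

Note that the predicate can only prune partitions the crawler has already cataloged; if the region=us/ folder never made it into S3 (as in the replies above), upload it and re-run the crawler first.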
@zaheeruddinbaber6762 · 2 years ago
Hi, where can I get the PPT you are using?
@surya-z9e · 1 year ago
Hey bro, your videos are very understandable. Could you make a more in-depth video about QuickSight?
@YuyuZhang-v3q · 1 year ago
Hey, can someone help me? The UI for ETL Jobs has changed a lot and I cannot add a job successfully.
@chinmaymaganur7133 · 11 months ago
Were you able to figure this out?
@gabilinguas · 11 months ago
Hello! I faced this issue and figured it out. Go to ETL Jobs, click the "Create job from a blank graph" button, and go to "Job details", the third item on the menu.
@gabilinguas · 11 months ago
For the second part, when he clicks next, you have to go to Visual (the first item on the same tab where you clicked Job details) and add nodes: first you choose the S3 bucket as the source, then you add a new node from the schema transforms, and then a third node from the targets tab.
@RonitSagar · 11 months ago
Hey @gabilinguas, I am not able to get what you said in the last comment. Can you please explain a little more? 😊
@gabilinguas · 11 months ago
Hey @RonitSagar! You can basically follow the same steps described in the "Build ETL Pipeline" section at 30:33 in the video. The process is almost the same; you just have to pay attention to the details that differ.
@N12SR48SLC · 1 year ago
Not able to see the region column in my schema; also, all columns are showing string as the datatype.
@priyadarshinibal9304 · 5 months ago
Same. Did you find any solution?
@vallabhajoshulakrishnachai6475 · 2 months ago
@priyadarshinibal9304 Same here; did you find the solution?
@SCREENERA · 1 year ago
Thanks a lot, Darshil and ProjectPro.
@N12SR48SLC · 1 year ago
Not able to see the region column in my schema; also, all columns are showing string as the datatype. 16:07
@AryanDurge1 · 5 months ago
Is the Athena service free, or do we need to pay for the usage?
@AryanDurge1 · 5 months ago
Do we need to pay for the usage of the AWS Athena service?
@dushii_life · 3 months ago
No, it's free for new users for one year.
@jatin7089 · 9 months ago
I am stuck on creating the Glue job as the UI is different. Please, anyone help here: where do I change the data types? I am able to add the source and target.
@shubhamnikam4759 · 9 months ago
Stuck on the same: I added the data target and source, but am not able to figure out how to change the data type.
@rohitmalviya8607 · 7 months ago
@shubhamnikam4759 Use Google Gemini.
@ChaosChuckler · 5 months ago
How do I build this project through an IaC approach, so that I can delete it to avoid AWS charges and rebuild it when necessary?
@pankajchandel1000 · 1 year ago
Is there a project where you used Python notebooks or EMR for processing data instead of Lambda functions?
@prafulbs7216 · 2 years ago
Adding the trigger to the Lambda function is not working for me. Tried many times; please suggest.
@divyakhiani1116 · 1 year ago
Facing the same issue. Did you find a solution?
@prafulbs7216 · 1 year ago
@divyakhiani1116 I redid the same steps once again, I guess. I don't remember, though!
@ShaunDePonte · 1 year ago
You didn't answer the initial question as posed in video 1: how to categorise videos based on their comments and stats, and what factors affect how popular a YouTube video will be.
@yaryna_ch · 3 months ago
So what should we do with data like the Japanese files?
@Soulfulreader786 · 1 year ago
Before the trigger, did you change the Lambda to take all records? Initially it was ["Records"][0].
@jenithmehta9603 · 1 year ago
The job creation UI has completely changed. I am stuck at that step.
@russophile9874 · 1 year ago
Go to the Script tab and click Edit. Paste the Spark code from the GitHub repo; it will work.
@subashpandey518 · 11 months ago
@russophile9874 Could you please explain precisely which Script tab and which Edit? I am stuck on this step. Thanks.
@manigowdas7781 · 1 year ago
Just completed this project. Thanks for the content; understanding AWS services and using them for our use case is a really crazy thing! @DarshilParmar ❤ #AWS CLI #S3 #Lambda #Glue #Crawler #Glue Studio #Glue ETL #Athena #Database #Quicksight
@saiganesh5702 · 1 year ago
Hey Manoj, can you please help me with the new ETL job visual editor scripts? I am having trouble understanding them.
@yashiyengar7366 · 1 year ago
@saiganesh5702 Even I am facing issues in the ETL job creation section due to the new UI.
@chinmaymaganur7133 · 11 months ago
How did you set up the Glue ETL job?
@chinmaymaganur7133 · 11 months ago
@saiganesh5702 Were you able to figure this out?
@danielofuokwu6595 · 2 months ago
I can't visualize the data in QuickSight; it says I don't have permission.
@vasanthkumar8120 · 8 months ago
Hey, thanks for this great video. I want to know: how much does it cost to complete this entire project on AWS?
@AngelYaelRodriguezGarcia · 4 months ago
Did anyone else have the same problem with the job interface change?
@dushii_life · 3 months ago
Me too
@herdata_eo4492 · 2 years ago
@projectpro Please consider a monthly subscription instead; being billed 6 months/yearly is too much.
@ProjectProDataScienceProjects · 2 years ago
Hey, we have some discounts going on that are valid only for a few days. Please share your email ID and our team will get in touch with you. Thanks.
@ueeabhishekkrsahu · 1 year ago
Where is the Discord link?
@kaushiksarmah999 · 2 years ago
Hello Sir, I am not able to convert the id field type to bigint. I tried the steps exactly as in the video multiple times, and even looked online for the procedure, but got nothing. Can you help me, sir?
@deepakpradhan9743 · 1 year ago
Has your issue converting the id column to bigint been resolved?
@yeezco2535 · 11 months ago
No
@nguyentiensu4088 · 1 year ago
When my Lambda function is triggered by an S3 event, the cleaned_statistics_reference_data table is created. But when I check with the SQL command "SELECT * FROM cleaned_statistics_reference_data", the result is an empty table. I tested the Lambda function with a test event, and everything is OK (there is data in the cleaned_statistics_reference_data table). Please help me with a solution! Thank you!
@drishtihingar2160 · 1 year ago
Have you found the solution? I am facing the same issue. Please help me.
@nandinisingh9217 · 1 year ago
@drishtihingar2160 Facing the same problem; have you found any solution?
@drishtihingar2160 · 1 year ago
Not yet @nandinisingh9217
@rohitmalviya8607 · 7 months ago
You have to upload the JSON files through the CLI AFTER creating the trigger; the Lambda won't process already-existing JSON files.
@SohamKatkar-xw5uf · 4 months ago
@rohitmalviya8607 Hi, can you tell me exactly which JSON files need to be uploaded and where to upload them?
@shantanuumrani9163 · 1 year ago
At 16:54, I'm not able to see the region source key in my output schema. What should I do?
@-aadfi-1710 · 3 months ago
What is the solution to this?
@SCREENERA · 1 year ago
Don't forget to close the activated AWS services.
@sivasahoo6980 · 1 year ago
Can you tell me how to close them? Do we have to delete the bucket and the ETL job, or something else?
@SCREENERA · 1 year ago
@sivasahoo6980 Delete all the services.
@TheAINoobxoxo · 9 months ago
What to do about creating jobs, @darshil? Should I use the script that is given, now that AWS has moved to Visual ETL and simple job creation has become complex for someone who doesn't know how to work with Visual ETL?
@banarasi91 · 1 year ago
Why is my trigger not invoked when a file is uploaded to S3? My test works properly in the Lambda function, and it is not showing any error either. I am not able to understand the issue.
@sivasahoo6980 · 1 year ago
Did you get any solution?
@banarasi91 · 1 year ago
@sivasahoo6980 It's been a while since I posted, but if I remember correctly it was something with naming or syntax: there was an extra space which I was not able to find. After rewatching everything I got it; I don't remember exactly where, but it may be in some path.
@banarasi91 · 1 year ago
Hello guys, you might be getting an error at the final testing step because the DB name has not been changed in the environment variable. Please take care: he forgot to change the DB name. If you notice, in Athena the database name is db_youtube_cleaned but it should be de_youtube_cleaned, which gives the "Entity not found" error in the final Lambda test.
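In code terms, the handler reads that name from the function's environment, so the console value must match what Athena/Glue actually shows. A sketch (the variable names here are assumptions, not confirmed from the repo):

    import os

    # Set under Configuration > Environment variables on the Lambda;
    # the values must match the Glue catalog database/table exactly
    os_input_db_name = os.environ['glue_catalog_db_name']        # e.g. de_youtube_cleaned
    os_input_table_name = os.environ['glue_catalog_table_name']  # e.g. cleaned_statistics_reference_data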
@sivasahoo6980 · 1 year ago
@banarasi91 Thanks a lot; yeah, there is an extra space in the path.
@N12SR48SLC · 1 year ago
@banarasi91 Not able to see the region column in my schema; also, all columns are showing string as the datatype (16:07). My ETL job is also failing.
@vijayarana8208 · 2 years ago
Hello Darshil, I am kind of stuck at 22:52 in the video. My job runs successfully, but the raw_statistics folder is not created. I have specified the region correctly in the code; any suggestion would be helpful.
@anupammathur918 · 2 years ago
Check the S3 trigger; remove the space after youtube/.
@ajitagalawe8028 · 2 years ago
Caught by the same issue. In my case, files were created directly in raw_statistics; there are no "region=" sub-folders. Could you please help me? Thanks.
@ueeabhishekkrsahu · 1 year ago
Can you please share your script? I have created my job but it is not executing. Please share; it would be a great help.
@officialPatel-e3q · 1 year ago
Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform, and no option for adding a job manually. If anyone could please help me with that...
@ganeshb.v.s1679 · 7 months ago
Hi, I am at the last step of building the ETL pipeline. I successfully created the Glue job named 'de-on-youtube-parquet-analytics-version'. The contents of the de-on-youtube-analytics bucket are getting added, but the 'final_analytics' table is not being created. Please help me resolve the issue. Thanks in advance.
@086_AASTHASHUKLA · 6 months ago
Hi, I created the Glue job but it isn't creating the same files under raw_statistics as shown in the video. How did you do it?
@NikitaRamakantChaudhari · 4 months ago
Hi, did you get a solution for your issue?
@ebubeonuegbu3467 · 6 months ago
I added this code: predicate_pushdown = "region in ('ca','gb','us')" and got this error: "Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o103.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: []". The source S3 data in my setup is partitioned by the "region" column. Please, how do I resolve this?
@iamayuv · 6 months ago
Bro, have you found the solution?
@FRUXT · 1 year ago
Thanks! Why parquet files? Isn't it simpler to keep everything in JSON or CSV?
@marvinarismendiz2814 · 10 months ago
I think it is because of the larger volumes of data; you can use Spark on it.
@isaacodeh · 2 years ago
I did ask you a question on your channel about the wrangler, which didn't seem to be working for me. I don't know if it has to do with location?
@DarshilParmar · 2 years ago
Yes, it is only available in some locations.
@isaacodeh · 2 years ago
@DarshilParmar Oh I see! Thanks for the work you do!! You have been very helpful!!!
@Lapookie · 2 years ago
@DarshilParmar Why is that?
@DeepUpadhyay-gs1ky · 6 months ago
Where is the link for Discord?
@vishwajithsubhash6269 · 2 years ago
I understand why we need to convert JSON to parquet, but why do we convert CSV to parquet? It's already clean, right?
@vineetsrivastava4906 · 1 year ago
The parquet file format is more optimized and faster; read more about it on the internet.
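(Parquet is columnar, so Athena scans only the columns a query touches, even for data that is already clean.) The project does the conversion with the awswrangler layer inside the Lambda; a rough sketch of that write step (the path is hypothetical, the database/table names are taken from other comments in this thread):

    import awswrangler as wr

    # Write the cleaned DataFrame to S3 as parquet and register/update
    # the table in the Glue catalog in one call
    wr.s3.to_parquet(
        df=df_step_1,
        path="s3://your-cleansed-bucket/youtube/",  # hypothetical path
        dataset=True,
        database="de_youtube_cleaned",
        table="cleaned_statistics_reference_data",
        mode="overwrite",
    )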
@sanikaapatil7279 · 8 months ago
@vineetsrivastava4906 Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform, and no option for adding a job manually. If anyone could please help me with that...
@jayitankar7919 · 2 years ago
Hi, my Lambda trigger for the JSON files is not getting fired; I don't know what's wrong.
@aneeqbokhari4611 · 2 years ago
Yeah, same. Have you figured it out?
@robertmoncriefglockrock8957 · 1 year ago
@aneeqbokhari4611 Same here
@bukunmiadebanjo9684 · 1 year ago
Had to stop here too. After deleting all files and re-uploading, the trigger does nothing.
@shantanuumrani9163 · 1 year ago
@bukunmiadebanjo9684 The same thing happened to me. Has anyone figured out how to solve it?
@bukunmiadebanjo9684 · 1 year ago
@shantanuumrani9163 Didn't find a solution. The whole UI also looks different, as AWS has already made changes, so I decided to move to a different course and abandoned this.
@merkarii · 1 year ago
You go so fast.
@kratos_gow610 · 3 months ago
13:39
@venkatsaiphanindraanagam · 11 months ago
We created the ETL job to join the data so that when new data gets added to the bucket it will be automatically joined, instead of running a SQL query. But shouldn't we trigger this ETL job on the data-addition event in S3? Can anyone answer this?
@bhumikalalchandani321 · 11 months ago
No, I think the one-time Lambda trigger from S3 happens only for the .json-to-parquet step --> then the cleansed S3 bucket is filled --> from there the analytics data is picked up. Please confirm this.
@merkarii · 1 year ago
But good work.
@Sdsatya · 2 years ago
Excellent!!
@Lapookie · 2 years ago
Thank you a lot for this project! It helps me understand what tools we generally use as data engineers to build data pipelines, etc. But I don't feel like I have learned how to do it myself. I mean, I followed along and understood what we made, but I need more explanation of how you process the data, e.g. how you get your bucket in AWS Lambda (the code is not obvious at first: bucket = event['Records'][0]['s3']['bucket']['name']; key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')). I need exercises myself.
@mananyadav6401 · 2 years ago
You can go through the test event that we generated. There is a JSON in the test event that we use to test the function. Try to navigate it and you will understand how the bucket name is captured, etc. Hope it helps.
@Lapookie · 2 years ago
@mananyadav6401 Oh yeah, I'll do that, good idea. Thanks for your answer :)!
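For anyone else puzzling over those two lines, this is roughly what the S3 "put" test event looks like and how the handler digs the bucket and key out of it (the bucket and key values here are made-up examples):

    import urllib.parse

    # A trimmed-down S3 put event, like the one the Lambda console generates
    event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "your-raw-bucket"},
                    "object": {"key": "youtube/raw/US_category_id.json"},
                }
            }
        ]
    }

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    # Keys arrive URL-encoded (spaces become '+'), hence the unquote_plus
    key = urllib.parse.unquote_plus(event["Records"][0]["s3"]["object"]["key"], encoding="utf-8")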
@RizwanAnsari-lt3nf · 6 months ago
Is anyone else facing an issue with the Lambda function? I have added the trigger, but no new file is created once I upload the JSON file to the bucket.