Do you have a link to the project too? Is it on GitHub or somewhere else where you've committed it?
@vinaypattiwar10 ай бұрын
Is there any GitHub link?
@Ta_3-k8n3 ай бұрын
Do you mind if I ask a question? I stumbled onto these project tutorials of yours and they are an absolute gem for learners and students, so thank you for that. But do I need an AWS subscription in order to do these projects? I don't have money to buy one. Thanks in advance.
@abdulazeezabebefe28633 ай бұрын
May I ask what the offerings of your membership tiers are, like what perks are available for the Sergeant, Recruit, or Corporal tiers?
@ibrahimsalaudeen11607 ай бұрын
This is the best data engineering content I have seen on YouTube so far. Thanks for this.
@CodeWithYu7 ай бұрын
Thank you! Don't forget to spread the word! ❤️
@ibrahimsalaudeen11607 ай бұрын
@@CodeWithYu Please can you help with a roadmap for data engineering? I'm a BI analyst wanting to transition.
@easypeasy552311 ай бұрын
You know you are such a gem, amazing paid-level quality work for free. I will buy you a coffee once I get a job, brother. Keep this work up!
@CodeWithYu11 ай бұрын
Thanks for the kind words! Looking forward to the coffee! 😉
@easypeasy552311 ай бұрын
@@CodeWithYu sure will have one
@RafaVeraDataEng11 ай бұрын
Don't wait for a job, it will come for sure. Let's buy a good coffee now! It is only 5 bucks for such great content! I've not found such content quality, clear explanations, and code consistency anywhere else, and the code is perfectly reproducible. Thanks Yusuf!
@michaelokorie674011 ай бұрын
Now beginning my data engineering journey, and this tutorial is an absolute gem! I was able to reproduce everything from A to Z and get it all running! The only glitch is that the broker service, for some unknown reason, always exits at some point, so the vehicle never gets to its destination 😅. However, I do still get the data in S3. Thanks again for this! I hope I can add this project to my portfolio. Looking forward to the visualisation part!
@subhamoypaul70298 ай бұрын
I'm getting a Task not Serializable error while streaming data into S3. The checkpoint and data folders are being created in S3, but the data from Kafka is not getting pushed. Any idea why?
@thequang92347 ай бұрын
"Only glitch is the Broker service for some unknown reason always exits at some point" Hey if it helps, removing the KAFKA_METRIC_REPORTERS and all the CONFLUENT variables helped me not letting Kafka exits : )
@lineomatasane25234 ай бұрын
Thank you so much Yusuf! After some challenges here and there, I've been able to complete the project. As a newbie in data engineering, I've learned so much from this exercise and gained more confidence. On to the next one, which is Spark unstructured streaming.
@CodeWithYu4 ай бұрын
Fantastic!
@SaiPhaniRam11 ай бұрын
Thank you so much !! You are a good teacher.
@CodeWithYu11 ай бұрын
You are welcome!
@ML_Enthusiast11 ай бұрын
Another amazing pick
@Jerrel.A6 ай бұрын
Subbed! Thanks a lot for your kindness in sharing this amazing wisdom and knowledge!
@CodeWithYu5 ай бұрын
You’re welcome!
@pankajjaiswal314911 ай бұрын
Great job Yu. Thanks for helping humanity :)
@CodeWithYu11 ай бұрын
My pleasure!
@hoduytruong61511 ай бұрын
Thank you very much, have a nice day!
@orlandobboy4226 ай бұрын
Your tutorials are just amazing. They make all of this stuff make sense. I would love to see one of those projects where you also use infrastructure as code, with Terraform for example. I know that's more on the DevOps side, but I had to do that at my first job as well as data engineering and was kinda lost for a while.
@AnhNguyen-hj7pd11 ай бұрын
Always inspiring with helpful content, keep up the good work.
@mauricecolon43598 ай бұрын
I love your content Yusuf, but when you're doing your project videos, try not to jump around so much. There are a lot of grey areas where the code isn't explained, or the video doesn't go along with the code you posted. Please consider that; I want your channel to be one of the best. P.S. I work at an edtech company, and I like to send my students to your channel.
@shujahtali5 ай бұрын
Such a great project for free, hats off to you man 🥰
@assieneolivier556010 ай бұрын
You're amazing!! Keep going!!
@pratikmahajan608211 ай бұрын
I can make basic changes, like using AWS EMR in place of AWS Glue, and put this project on my resume and LinkedIn.
@CodeWithYu11 ай бұрын
That’s another interesting angle to it! 🔥
@RodrigoBlaudt17 күн бұрын
Very nice!
@CodeWithYu17 күн бұрын
Thank you! Cheers!
@mrcrblr85011 ай бұрын
That was an amazing tutorial! You are a badass! These end-to-end projects are very much needed! And yes, can you show how to connect to Power BI please? Thanks!
@CodeWithYu11 ай бұрын
We'll see if there are more requests
@RafaVeraDataEng11 ай бұрын
If I'm not wrong, PBI has a connector available for redshift
@ukaszdugozima81611 ай бұрын
Great job ! 👏👏 Inspired !!!
@CodeWithYu11 ай бұрын
I’m glad the project got you inspired!
@DevajTheExplorer9 ай бұрын
Great content. Thank you!!!
@wiss199811 ай бұрын
Nice work..waiting for dbt and snowflake 🎉🎉😊
@CodeWithYu11 ай бұрын
Incoming… watch out! 😀
@idiyelisunday75059 ай бұрын
Wow...! Such great content!
@ataimebenson8 ай бұрын
Thank you very much for this video, I learnt a lot from it.
@CodeWithYu8 ай бұрын
Thank you for watching and learning from the video, it means a lot!
@nikitabogatyrev70919 ай бұрын
Thank you very much for all your projects! Could you please make an end-to-end project with Delta Live Tables in Databricks?
@CodeWithYu8 ай бұрын
Sure thing! Don't forget to suggest this in the community section!
@fernandoa79024 ай бұрын
excellent video!
@CodeWithYu4 ай бұрын
Glad you liked it!
@aseessarkaria23235 ай бұрын
I am getting an error at about 1:50:00 in the video: ImportError: Pandas >= 1.0.5 must be installed; however, it was not found. It turns out my spark-master container is missing some packages, including pandas and pyarrow. I tried pip installing all of them, and then the error changed to something else that doesn't make sense. Can anyone help point out what may have gone wrong?
@nabro74414 күн бұрын
Yup, I had the same error. I fixed it by installing pandas inside the Docker container. Run this command: docker exec -it smartcity-spark-master-1 /bin/bash. Once you're in, run pip install pandas. Finally, exit the container using the exit command.
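For reference, the same fix as one-liners (the master container name is taken from the comment above; the worker container name is an assumption, so check yours with docker ps first):

    # install the missing Python packages inside the Spark master container
    docker exec -it smartcity-spark-master-1 pip install pandas pyarrow
    # if pandas UDFs also run on the workers, repeat there (container name assumed)
    docker exec -it smartcity-spark-worker-1 pip install pandas pyarrow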
2 ай бұрын
For setting up Spark with Docker, can you use the env variables SPARK_MODE=master / SPARK_MODE=worker instead of the command line to create the master and worker containers?
@CodeWithYu2 ай бұрын
I suppose that could work, as long as the containers are not using the same KEY-VALUE pairs in the env.
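For anyone curious, the SPARK_MODE variables are the convention of the bitnami/spark image; a rough compose sketch under that assumption (image tag and ports are illustrative, not the exact setup from the video):

    spark-master:
      image: bitnami/spark:3.5
      environment:
        - SPARK_MODE=master
      ports:
        - "9090:8080"   # master web UI
        - "7077:7077"   # master RPC port
    spark-worker:
      image: bitnami/spark:3.5
      environment:
        - SPARK_MODE=worker
        - SPARK_MASTER_URL=spark://spark-master:7077
      depends_on:
        - spark-master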
@RafaVeraDataEng11 ай бұрын
Great Yusuf! Thanks a lot for another terrific contribution! This is very helpful for me, as I want to implement a similar architecture for a project for driving schools here in Málaga. Just wondering, how could we simulate a non-straight route between 2 points? Maybe I could get a route record (lat/long) and pass it to Kafka one record per timestamp? I will replace the emergency topic with "pain points" where the students usually get failed...
@CodeWithYu11 ай бұрын
That could work… another option would be an algorithm that simulates curves and bends every now and then, so you could get before and after values in that case.
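A minimal sketch of that idea in Python (purely illustrative; the function name, step count and wobble size are made up): interpolate between the start and destination coordinates and add a small random offset at each step, so the path bends instead of being a straight line.

    import random

    def simulate_route(start, end, steps=100, max_wobble=0.0005):
        """Yield (lat, lon) points from start to end with small random bends."""
        lat1, lon1 = start
        lat2, lon2 = end
        for i in range(steps + 1):
            t = i / steps
            # straight-line interpolation between the two endpoints
            lat = lat1 + (lat2 - lat1) * t
            lon = lon1 + (lon2 - lon1) * t
            # small random offset so the route is not perfectly straight
            lat += random.uniform(-max_wobble, max_wobble)
            lon += random.uniform(-max_wobble, max_wobble)
            yield lat, lon

    # each yielded point could then be produced to Kafka with its own timestamp
    for point in simulate_route((36.7213, -4.4214), (36.7600, -4.4300)):
        print(point)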
@HummingLaught8 ай бұрын
Hopefully I can complete this project. Plus, I'm trying to develop it using PDM and nix-shell :)
@TMk-r5e8 ай бұрын
Please tell me which platform this diagram was made with.
@morshedsarwer11 ай бұрын
I like your T-Shirt Yusuf 😀😀😀
@CodeWithYu11 ай бұрын
Haha 😀 thanks Morshed!
@abdul20ize11 ай бұрын
It would be really nice if you could share a link to the Kafka configuration docs so we can refer to it for an explanation of the configuration.
@CodeWithYu11 ай бұрын
Apache Kafka official website is in the description
@____prajwal____11 ай бұрын
Thank you.
@CodeWithYu11 ай бұрын
My pleasure
@Arjun-b9z2 ай бұрын
Hi, Do I need to know every single tool to start this project? I am currently learning the tools as part of my course but I would really like to get a project done and came across this.
@CodeWithYuАй бұрын
You don't necessarily have to know it all; that's the whole point! You need the exposure and to know how to pick it up from there going forward. So keep learning!
@FAyt-ov5uo10 ай бұрын
Thank you for the great video. Can somebody help me find where to copy the Docker env variables shown at 13:50?
@025_h_mohitkumardora57 ай бұрын
Sir, I have a doubt: can't we push our transformed data directly into the warehouse, without passing through the AWS Glue architecture?
@jeanmarieabengzoa984319 күн бұрын
Thank you for this work. Please can you explain how to create a bucket policy?
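While waiting for a reply: a bucket policy is just a JSON document attached under the bucket's Permissions tab. A minimal sketch (the account ID, role name and bucket name are placeholders, and the exact actions you need may differ) that lets one IAM role read and write objects could look like:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "AllowStreamingRole",
          "Effect": "Allow",
          "Principal": { "AWS": "arn:aws:iam::123456789012:role/your-streaming-role" },
          "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
          "Resource": [
            "arn:aws:s3:::your-smartcity-bucket",
            "arn:aws:s3:::your-smartcity-bucket/*"
          ]
        }
      ]
    }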
@LongNguyen-oy3qh9 ай бұрын
Where did you get the data in this project
@wreckergta547011 ай бұрын
Thank you so much
@CodeWithYu11 ай бұрын
You're most welcome
@BeyondNoise910 ай бұрын
Can you share some good resources for learning PySpark? Or where did you learn it?
@Mehtre10811 ай бұрын
For transformation, did you use PySpark?
@Mehtre10811 ай бұрын
Domain name pls
@CodeWithYu11 ай бұрын
Yes, pyspark was used
@PrathyushaReddyPingili3 ай бұрын
How can we change this to Tableau visualizations?
@CodeWithYu3 ай бұрын
You’ll need to find a way to connect Tableau to the workload. However, I’m not sure Tableau and Power BI are designed for real-time streaming, so you might want to consider tools better suited for real-time visualization.
@OnkarPatole-eo5fx9 ай бұрын
Hello Yusuf, I am not able to connect my Redshift cluster with DBeaver. Could you please tell me what the issue might be?
@CodeWithYu8 ай бұрын
Most likely your VPC/firewall permissions.
@OnkarPatole-eo5fx8 ай бұрын
I was running the project again and I can see the data in the S3 bucket. But I am not able to crawl the data; I am getting an error like account **** denied access.
@ataimebenson8 ай бұрын
@@OnkarPatole-eo5fx Create an IAM role for your Glue crawler. The IAM role should grant Glue access to S3 (S3 Full Access). That would give the Glue crawler access to S3.
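To spell that out, the role is created in IAM with Glue as the trusted service and S3 permissions attached (for example the AWSGlueServiceRole managed policy plus S3 access to your bucket). A sketch of the trust policy part only:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": { "Service": "glue.amazonaws.com" },
          "Action": "sts:AssumeRole"
        }
      ]
    }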
@njourawebdev11 ай бұрын
Thanks for the content. Is there any free alternative to DBeaver?
@CodeWithYu11 ай бұрын
You can try SQL Workbench
@dhjgj141211 ай бұрын
Do you work with something like this in your current data engineer position?
@Achilles58511 ай бұрын
Amazing project! Could you do one for Azure also? That would be awesome, thanks a lot :)
@CodeWithYu11 ай бұрын
Yes, soon!
@faizahmed00711 ай бұрын
@@CodeWithYu one for GCP too...😅
@anshulnegi18222 ай бұрын
There are no entry-level fresher jobs for DE; should a fresher target data analyst roles instead?
@BeyondNoise911 ай бұрын
I have a question: can I include this project in my resume, and can it help? I want to move into the data engineering domain from QA (my current role)…
@JubersonOriol11 ай бұрын
Hello, I'm glad to follow your content from Latin America. My question is: what route do you recommend I study to become a software architect oriented toward smart cities or physical and digital integration systems? Greetings from Venezuela.
@CodeWithYu11 ай бұрын
Hi, you should choose the route that aligns with your passion
@JubersonOriol11 ай бұрын
@@CodeWithYu Thank you very much for the answer, I appreciate it very much. Please, I would like a more technical answer, since my passion is architecture and programming.
@richardilemon37159 ай бұрын
Hello Mr Yu. I'm following your tutorial but I'm running into issues around 1:12:23, when I try to run the whole code. There's no way to share my code on here, but I followed you completely, so I don't know if the error is from my system. The problem is that I can't seem to access Kafka; I'm always getting an error.
@eemayo58899 ай бұрын
Hi, if you have completed around the 1-hour mark, can you please connect with me? I am running into issues at the very beginning (docker-compose.yaml). I'm a beginner and would love some assistance from a senior.
@taz217711 ай бұрын
Hi, can you do a video on how these crypto exchanges show their real-time data?
@yangwang76565 ай бұрын
Hi Yu, thanks for creating this amazing content. Some questions about the Redshift part: is the data physically loaded into Redshift, or is it actually stored in the Glue catalog / S3 bucket? Are we just using Redshift to read the data, and maybe creating another semantic layer on top of it in the next phase?
@abiodunadebisi8149 ай бұрын
Producing IoT data to Kafka is where I got stuck. It kept telling me: failed to resolve broker:29092: No such host is known. I made sure the hostname is configured correctly for the broker, but the issue is still not resolved. Please, I need help. Thank you.
@CodeWithYu8 ай бұрын
You need to run it inside Docker for the broker hostname to resolve. Otherwise you'll need to change broker to localhost.
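In other words, the bootstrap address depends on where the producer process runs: inside the compose network the broker is reachable by its service name, while from the host machine it is exposed on localhost. A tiny sketch of that switch (the variable names are just illustrative, not from the video's code):

    import os

    # inside the docker network: use the service name; from the host: use the mapped port
    RUNNING_IN_DOCKER = os.environ.get("RUNNING_IN_DOCKER", "false") == "true"
    KAFKA_BOOTSTRAP_SERVERS = "broker:29092" if RUNNING_IN_DOCKER else "localhost:9092"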
@4L3J11 ай бұрын
I had a question: which architecture are you using on your Mac? AMD or ARM?
@CodeWithYu11 ай бұрын
aarch64 or arm64
@sunilKumar-ci9vn11 ай бұрын
Hey Yu!! I am a college student and I am interested in the data engineering field as a fresher. How much knowledge is enough? Like which tools, and to what level?
@biswajitsingh87909 ай бұрын
Hi, can this course be taken by someone who is a complete beginner in data engineering?
@CodeWithYu9 ай бұрын
Yes, but this is still a little high level for absolute beginner. You may want to check out a more suitable version of this on datamasterylab.com
@DivineSam-w6m9 ай бұрын
Hi Yusuf, I am a beginner; can I work with the Community version of PyCharm for these projects?
@CodeWithYu9 ай бұрын
Yes, but this is still a little high level for absolute beginner. You may want to check out a more suitable version of this on datamasterylab.com
@ashokkumarkrishna51869 ай бұрын
@@CodeWithYu does this describe the same project in detail?... in a way a beginner can understand?...
@RafaVeraDataEng11 ай бұрын
Hi everyone! I'm stuck right at the end... timeout trying to connect to Redshift... has anyone set up the VPC, security groups and permissions properly? I'm getting a timeout in DBeaver. My inbound rules are set to custom TCP protocol, port 5439, any IPv4. I set publicly accessible to enabled... what am I missing? Please help!
@RafaVeraDataEng11 ай бұрын
I have created the cluster again. Before: a VPC with 1 availability zone; a security group on this VPC with an inbound rule for port 5439 from my IP plus an all-traffic rule; an outbound all-traffic rule; a cluster subnet group; and a Redshift cluster with public accessibility... and I still get a timeout from DBeaver :( Power BI as well... any suggestions please?
@CodeWithYu11 ай бұрын
What this usually means is that your cluster is still not accepting connections. Have you tried using the default configurations to test? Most times, it may be your configuration that's faulty.
@CodeWithYu11 ай бұрын
Also, open your inbound and outbound ports and associate the right VPC with your cluster.
@RafaVeraDataEng11 ай бұрын
@@CodeWithYu Oh GOD!! More than 4 hours back and forth! I just deleted all the VPCs in my account, created a new VPC with 2 availability zones, a default security group and a cluster subnet group, clicking next-next xD. Finally created my 3rd cluster and assigned the VPC. It didn't work... but! I went to the security group again and added a new inbound rule: custom TCP - port 5439 - My IP... and... TA DA!!! DBeaver successfully connected! Thanks Yusuf!
@CodeWithYu11 ай бұрын
You're welcome!
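For anyone who prefers the CLI, the inbound rule that finally made the connection work is roughly equivalent to this (the security group ID and IP are placeholders; use your own):

    # allow inbound Redshift traffic (port 5439) from your own IP only
    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp \
        --port 5439 \
        --cidr 203.0.113.25/32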
@Mehtre10811 ай бұрын
Can you please share documentation for this project so that we can put it on our resumes?
@CodeWithYu11 ай бұрын
You can get that in the source code. The details are in the video description.
@Trendy-Bazar10 ай бұрын
Hi is the full code available in the video?
@CodeWithYu10 ай бұрын
In the description
@aadhilimam825311 ай бұрын
Could I do this with a free tier account?
@RafaVeraDataEng11 ай бұрын
Yes. I paid €0.40 because I made this project with some modifications and more data volume; otherwise it's essentially free thanks to the AWS free tier. Cool, isn't it?
@DarrenColeman-d5h11 ай бұрын
Has anybody else run into an issue when mounting volumes in Docker?
@longhoinh399711 ай бұрын
Need the visualization please 😊
@CodeWithYu11 ай бұрын
Hahaha… I guess we’ll see if more people request it.
@viethoangnguyen126411 ай бұрын
Can you post the dataset here, please?
@CodeWithYu11 ай бұрын
The good thing is you don’t need a dataset, just run the code and the data gets created automatically.
@viethoangnguyen126411 ай бұрын
@@CodeWithYu By the way, I don't get the Confluent part. Do we have to pay a lot of money to use confluent-kafka, sir? Or did you just use the 1-month trial?
@CodeWithYu11 ай бұрын
@@viethoangnguyen1264 The Docker image is free, same as confluent-kafka. But if you use Confluent Cloud, you get free credits for about a month, then you start paying afterwards if you want to continue using their services.
@viethoangnguyen126411 ай бұрын
@@CodeWithYu But I read somewhere that confluent-kafka is not as well supported as the Java client. Is that right? And one more question: confluent-kafka is just a library like numpy and pandas, right?
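For what it's worth, confluent-kafka is indeed just a pip-installable library (pip install confluent-kafka); it wraps the librdkafka C client rather than being a pure-Python port. A minimal producer sketch, assuming a broker listening on localhost:9092 and an illustrative topic name:

    from confluent_kafka import Producer

    # connect to a locally running broker (address assumed for this sketch)
    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # send one message and block until it is delivered
    producer.produce("vehicle_data", key="vehicle-1", value=b'{"speed": 42}')
    producer.flush()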
@thanhngohuunhat31027 ай бұрын
Hi, has anybody else had the error "Shutdown hook called"??? :(
@ND-De-tn7ud9 ай бұрын
Thank you Yu for this amazing content. I am facing an issue while submitting a Spark job and getting this error. Any help would be appreciated.
    0 artifacts copied, 12 already retrieved (0kB/16ms)
    24/04/23 22:47:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Exception in thread "main" java.lang.NullPointerException: Cannot invoke "String.lastIndexOf(String)" because "path" is null
@aliel-azzaouy700711 ай бұрын
Thank you for all the videos. Please can you share the source code with us?
@CodeWithYu11 ай бұрын
Link to the source code is in the description
@errrbrrr382111 ай бұрын
Bro is putting a 5-dollar paywall on the source code :3
@satwikkumar-eq6fm11 ай бұрын
You're doing a great job of providing free tutorials, and charging €5 for the source code would not be ideal. Think from a long-term perspective: freeCodeCamp earns a lot with only YouTube pay, so if you allow everyone to support you in this initial phase, you never know what you could earn in the long term. Restricting access by not providing the source code would definitely affect your channel's growth. This is my perspective; it's up to you. And thanks for the videos @@CodeWithYu
@tintin167811 ай бұрын
Thanks, Yu. I watched the full video and liked it. Can you make a video on an ETL pipeline using an open-source modern data tech stack involving DuckDB, Polars, etc.?
@CodeWithYu11 ай бұрын
Yeah sure… thanks for the suggestions
@nadiiar7511 ай бұрын
🤗
@truongaoquang1893Ай бұрын
🥰
@DivineSam-w6m8 ай бұрын
Hi Yusuf, Could you share the Architecture Diagram as well?
@MASponge983 ай бұрын
This is really in-depth content, but if I may offer a piece of advice: no one who already knows data engineering well is watching these videos; you have to tailor your videos to people who don't know much. So skipping the explanation of why you are writing certain code is not good.
@4L3J11 ай бұрын
Just to let you know that when I ran "python jobs/main.py" for the first time, it returned the following error: "Disconnected while requesting ApiVersion: might be caused by incorrect security.protocol configuration (connecting to a SSL listener?) or broker version is < 0.10 (see api.version.request) (after 31ms in state APIVERSION_QUERY)". I solved it by setting the security protocol to "PLAINTEXT" in the producer_config dict:
    producer_config = {
        "bootstrap.servers": KAFKA_BOOTSTRAP_SERVERS,
        "error_cb": lambda err: print(f"Kafka error: {err}"),
        "security.protocol": "PLAINTEXT",
    }
@LinhPham-hc4ok26 күн бұрын
I kept getting this error: Kafka error: KafkaError{code=_TRANSPORT,val=-195,str="localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 1ms in state CONNECT)"}
@guavacodelab8807Ай бұрын
Please, how do I fix this error without having to switch to the confluent-kafka package?
    Broken DAG: [/opt/airflow/dags/kafka-stream.py] Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.12/site-packages/kafka/record/legacy_records.py", line 50, in <module>
        from kafka.codec import (
      File "/home/airflow/.local/lib/python3.12/site-packages/kafka/codec.py", line 9, in <module>
        from kafka.vendor.six.moves import range
    ModuleNotFoundError: No module named 'kafka.vendor.six.moves'
@CodeWithYuАй бұрын
Python 3.12 is a little problematic at this time. Can you try using Python 3.9 or 3.10? It should fix your errors
@guavacodelab8807Ай бұрын
@@CodeWithYu I was initially running Python 3.9; I switched to 3.12 because of the error.