Data Ingestion From APIs to Warehouses - Adrian Brudaru

20,523 views

DataTalksClub ⬛

Comments: 22
@fahadshoaib8735 · 1 year ago
Starts at 10:44
@manonleroux5861 · 11 months ago
Really well-made class. I enjoyed having a complete .md file with all the instructions written out. It's really convenient! Thanks!
@dltHub · 7 months ago
Thank you! The intention was to deliver the class here in a human format, so the student can take their time to deep dive on their own.
@easypeasy5523 · 11 months ago
Amazing content and discussion. As a data engineer myself, I'm looking forward to contributing to the project.
@dltHub · 7 months ago
Thank you!
@OskarLindberg-v5h · 1 year ago
Seems like a great tool! Combined with Mage, which already has great integrations with dbt, it seems you can build a powerful and easy-to-set-up data pipeline :-)
@dltHub · 7 months ago
Yep, we are orchestrator-agnostic, so you can run it on Mage or on whatever you want. We offer a dbt runner too, so you don't have to set up credentials and config twice.
@tobiasfsdfsd · 11 months ago
Might be a stupid question, but at 33:25 you say that if we have a 4 GB file, we have 8 GB in memory. Why is that?
@MAHDY51 · 11 months ago
because it runs twice
@DylanTan · 11 months ago
You keep the contents of the 4 GB file in memory twice, in both the "data" and "parsed_data" variables, thus 8 GB of memory is consumed.
@dltHub · 7 months ago
Besides the data being kept twice, we assume efficient storage. In reality, if you load 4 GB into a DataFrame, you might see much more RAM usage.
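A minimal sketch of the pattern discussed in this thread. The variable names `data` and `parsed_data` come from the comment above; the tiny in-memory "file" is a stand-in, not the actual workshop dataset:

```python
import json

# Stand-in for a large JSONL file read fully into memory.
raw_lines = ['{"id": 1}', '{"id": 2}', '{"id": 3}']

# The pattern being discussed: two full copies live in memory at once,
# so peak usage is roughly 2x the file size (and more for a DataFrame,
# which adds per-object overhead on top of the raw bytes).
data = "\n".join(raw_lines)                        # copy 1: the raw text
parsed_data = [json.loads(l) for l in raw_lines]   # copy 2: the parsed objects

# Generator alternative: parse one row at a time, so only the current
# row (plus the open file handle) needs to stay in memory.
def stream_rows(lines):
    for line in lines:
        yield json.loads(line)
```

With the generator version, the raw text of a row can be garbage-collected as soon as the next row is requested, which is why the workshop leans on generators for large files.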
@fabmeyer_ch · 11 months ago
If I let both examples run in Google Colab for all rows, not just the first 5 in example 3, I get about a 10x speed-up with example 3 vs. example 2. How is this possible?
@dltHub · 7 months ago
Because a full download at once is faster than a full download as a stream. What the timing demonstrates is that one does a full download while the other fetches only 5 rows.
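A small illustration of that distinction, assuming (as in the workshop notebook) the streamed example is a Python generator consumed lazily. The `fake_paginated_api` function is a hypothetical stand-in for the real HTTP source:

```python
from itertools import islice

def fake_paginated_api(n_rows):
    """Simulates a streamed/paginated source: rows are produced lazily."""
    for i in range(n_rows):
        yield {"row": i}

# "Full download": every row is materialized before the timer stops.
all_rows = list(fake_paginated_api(100_000))

# Streamed example cut off at 5 rows: only 5 rows are ever produced,
# which is why its timing looks dramatically faster in the demo even
# though streaming a *full* dataset would be slower than a bulk download.
first_five = list(islice(fake_paginated_api(100_000), 5))
```

The apparent 10x speed-up comes from comparing unequal amounts of work, not from streaming being inherently faster.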
@ИванАвито-и6э · 1 year ago
Lego 75192 Millennium Falcon?
@DataTalksClub · 1 year ago
Yes =)
@dawei7 · 11 months ago
Quite hard to watch and follow. A workshop should be about writing code step by step, not showing ready-made code.
@MegaTarino · 11 months ago
The topic is too broad to write the code live; he had about 1 hour to demonstrate 3 dlt features, make an introduction, and show generators. It would have taken a few hours to write it all. Moreover, in the case of dlt the code itself is 5 lines, and one has the notebook... I agree that it was sometimes hard to follow, but better that than watching a video several hours long (IMHO).
@tobiasfsdfsd · 11 months ago
Yeah, well, most of the time it was like reading aloud what is written in the file. There was not much free speech, and only a few examples were explained better than in the text. It seems sufficient to just read the file. But I was happy when Alex asked some questions or explained some things differently. Apart from that, I think there is no need to watch the video.
@dltHub · 7 months ago
@tobiasfsdfsd This was a live workshop that had to be pre-prepared to fit in the time and cover everything effectively. There would be little benefit in going off topic. A class is different, and you are welcome to suggest it to us; if there is enough demand, we will do it!
@dltHub · 7 months ago
@MegaTarino That was exactly what we were aiming for: conveying the info effectively in a short amount of time. As with any class, there's an expectation that the learner dedicates 8x the time themselves to practice and comprehend, and for this we provided the notebook and homework assignment. We barely got started covering the topic, since it's so broad. We are working on a more comprehensive pipeline-building course, but that will take 6 hours and be similarly packed, with the expectation that the student invests at least 24 hours on their own. Covering an entire domain in a few hours is quite efficient, I would say; there's a ton of possible complexity around ELT.
@blablabla-c5o · 24 days ago
I feel that for every 1 hour of content, self-study should be at least 4x. Gotta self-start.