Do you Need a Data Warehouse?

4,915 views

nullQueries

Comments: 9
@alessandroceccarelli6889 2 years ago
Thank you for your high-quality videos! In our use case, we ingest a daily .zip file containing 3 .csv files related to sales, inventory and orders from different shops (20-30) and CRMs (4-5; each one with its own naming convention, dtypes, …). How would you improve the following pipeline?
- Raw zip files are uploaded to a GCP bucket.
- The upload triggers a Python GCP Cloud Function that transforms the data to create single naming/dtype conventions and a few new columns (e.g. a timestamp merging date + time).
- Transformed data is uploaded to MongoDB (3 separate collections for sales, inventory and orders), and the raw .csv files go to a separate GCP bucket as parquet files (1 folder for each CRM, with PoS as subfolders).
- A PubSub message posted by the function triggers a GCP Function that loads processed data from MongoDB, applies ML models and stores results in separate collections (1 for each analysis type; e.g. forecast, anomaly detection, …).
- A Python web app directly reads ML output data from MongoDB.
Thank you so much and love your videos! 🤗
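The transformation step in the pipeline above (one naming/dtype convention per source, plus a merged timestamp column) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the source names (`crm_a`, `crm_b`), the column names, and the date/time format are all assumptions, not details from the comment.

```python
import csv
import io
from datetime import datetime

# Hypothetical per-source mappings: source column name -> canonical name.
COLUMN_MAPS = {
    "crm_a": {"Date": "date", "Time": "time", "Shop": "shop_id", "Amount": "amount"},
    "crm_b": {"sale_date": "date", "sale_time": "time", "store": "shop_id", "total": "amount"},
}


def normalize_rows(csv_text, source):
    """Rename columns to canonical names, cast types, and add a merged timestamp."""
    mapping = COLUMN_MAPS[source]
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        # Keep only mapped columns, renamed to the canonical convention.
        row = {mapping[name]: value.strip() for name, value in raw.items() if name in mapping}
        # Enforce one dtype per canonical column (assumed: amount is numeric).
        row["amount"] = float(row["amount"])
        # Merge date + time into one timestamp (assumed format: YYYY-MM-DD HH:MM:SS).
        merged = f"{row.pop('date')} {row.pop('time')}"
        row["timestamp"] = datetime.strptime(merged, "%Y-%m-%d %H:%M:%S")
        row["source"] = source
        rows.append(row)
    return rows
```

Keeping the mappings in one table means a new CRM only needs a new entry rather than new transformation code, which is one way to keep the function small as sources are added.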
@n.l.875 2 years ago
Been a subscriber for a while, and I can't thank you enough for the quality of your channel. I have a request. This is an excellent video and highlights a key challenge of communicating to budgetary stakeholders that a solution may 'get something done' but will incur a considerable amount of 'technical debt'. You've treated other topics very well, and I was wondering if you could do a video on technical debt. It is one of the least understood topics, but arguably the most important for getting people on board with any technical change decision. I use a technical debt register in which I give rough estimates, in single-developer full-time-equivalent days, for each omitted task. The aim isn't accuracy; it is to get the conversation started when project managers or others are faced with huge bills because of their choices, rather than esoteric concepts like SCD maintenance.
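A technical debt register of the kind described above can be as simple as a list of omitted tasks with a rough estimate each. The sketch below is one hypothetical shape for it; the class name, field names, and helper functions are assumptions for illustration, not the commenter's actual register.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    task: str         # the omitted task
    fte_days: float   # rough single-developer full-time-equivalent days


def total_fte_days(register):
    """Sum the rough estimates across the whole register."""
    return sum(item.fte_days for item in register)


def largest_items(register, n=3):
    """Return the n most expensive omitted tasks, largest first."""
    return sorted(register, key=lambda item: item.fte_days, reverse=True)[:n]
```

The total gives stakeholders a single number to react to, and the largest items give the conversation a starting point.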
@nullQueries 2 years ago
I love talking about tech debt; I'll add it to the list of topics.
@antoruby 2 years ago
@nullQueries you have weird tastes 😆 Just kidding, it's such an important and overlooked topic that teams go back and forth in rewrites without ever understanding what's really happening.
@yogoson8371 1 year ago
Absolutely agree. Your points are spot on!
@chasedoe2594 2 years ago
I'm just wondering whether Athena is good at joining tables. If they work on the OLTP, I guess they have to join heavily in order to get the expected results. Or do they just retrieve the data and do the joins in Tableau, which shouldn't be that fast, should it?
@nullQueries 2 years ago
Athena is mostly for querying files. So if the OLTP has a lot of joins, like most do, I wouldn't expect great performance compared to a relational database or another solution designed for complex data models. Athena is mostly used for ad-hoc querying of large file stores (e.g. searching for data in log files).
@fishsauce7497 8 months ago
What many fail to realise is that a bad data warehouse is not just bad table structure, but also sparse documentation, redundant calculations, and unnecessarily complicated ETL (mostly tech debt), all of which make the warehouse unusable and difficult to maintain. I also see the wrong approach taken when creating a warehouse, e.g. just looking at existing reports to create a data model, with no data profiling, no business study, no articulation of data loading rules, and heavy reliance on undocumented assumptions. Eventually the shiny new warehouse built by modellers is also discarded by analysts as not fit for purpose, because the same mistake is repeated again and again.
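Data profiling, mentioned above as a step that often gets skipped, does not need heavy tooling to get started. The following is a minimal sketch using only the Python standard library; the function name and the choice of metrics (row count, empty count, distinct count per column) are assumptions for illustration, not a prescribed method.

```python
import csv
import io


def profile_csv(csv_text):
    """Return per-column row count, empty-value count, and distinct-value count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    stats = {name: {"rows": 0, "empty": 0, "values": set()} for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            # Missing trailing fields come back as None; treat them as empty.
            value = (row[name] or "").strip()
            stats[name]["rows"] += 1
            if value == "":
                stats[name]["empty"] += 1
            else:
                stats[name]["values"].add(value)
    return {
        name: {"rows": s["rows"], "empty": s["empty"], "distinct": len(s["values"])}
        for name, s in stats.items()
    }
```

Even these three numbers per column tend to surface questions for the business (unexpected blanks, keys that are not unique) before a data model is committed to.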
@poizentv 1 year ago
Hello. I hope you are well. Is it possible to become a Data Warehouse developer without learning any programming language?