Add raw "sources" to your dbt project

79,438 views

Kahan Data Solutions

Comments: 38
@KahanDataSolutions • 3 years ago
Looking for help with your team's data strategy? → www.kahandatasolutions.com
Looking to improve your data engineering skillset? → bit.ly/more-kds
@HansHelmutKohls • 3 years ago
Really well structured and explained, this really is what dbt's own tutorials should be like.
@KahanDataSolutions • 3 years ago
Thank you!
@billal27nulnul • 1 year ago
Thanks for the video. I took over a dbt project, and it seems that whoever I took over from saw this video too. Lucky for me!
@krishnachinta8937 • 3 years ago
I got the error below - could you please advise what it was? "depends on a source named 'snowflake_sample_data.store_sales' which was not found"
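That error usually means no YAML file in the project declares a source named snowflake_sample_data containing a store_sales table. A minimal declaration that would satisfy the reference might look like the sketch below (the schema name is an assumption - use whichever sample schema you are actually querying):

```yaml
version: 2

sources:
  - name: snowflake_sample_data     # must match the first argument to {{ source() }}
    database: snowflake_sample_data
    schema: tpcds_sf10tcl           # assumption: one of the TPC-DS sample schemas
    tables:
      - name: store_sales           # must match the second argument to {{ source() }}
```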
@MrChalak • 4 years ago
Great, I was searching for a dbt tutorial and came across your video. It's really informative 👍👍
@KahanDataSolutions • 4 years ago
Thank you! Glad you found it helpful.
@_saaskey • 4 years ago
This is great!! Keep doing more dbt with Snowflake. Ty!!
@KahanDataSolutions • 4 years ago
You are very welcome! I plan on it!
@adegbiteogunbode3567 • 1 year ago
Hello David, after following your process bit by bit, this error popped up after dbt run:

11:04:27 Running with dbt=1.5.0
11:04:28 Unable to do partial parsing because saved manifest not found. Starting full parse.
11:04:28 Encountered an error:
Parsing Error
  Error reading adetola: staging/source.yml - Runtime Error
    Syntax error near line 4
    ------------------------------
    1 | version: 2
    2 | sources:
    3 | -name: northwind
    4 | database: analyticproject1
    5 | schema: stg_northwind
    6 | tables:
    7 | -name: customer

    Raw Error:
    ------------------------------
    mapping values are not allowed in this context
      in "<unicode string>", line 4, column 12

Please kindly advise on how to solve this problem.
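The pasted file hints at the cause: YAML list items need a space after the dash, so `-name:` must be `- name:`, with the item's other keys indented beneath it. A corrected version of that file, keeping the names from the comment above, would be:

```yaml
version: 2

sources:
  - name: northwind
    database: analyticproject1
    schema: stg_northwind
    tables:
      - name: customer
```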
@practicalhead • 1 year ago
Thanks for the clean explanations. How can we use dbt-spark to read from and write to a PostgreSQL database schema instead of a Hive warehouse? Is it possible to leverage Spark's distributed processing instead of native PostgreSQL compute? The reason being that not all transformations may be feasible within the Postgres servers, and they will need to scale.
@sunny25984 • 2 years ago
Hi... when I run dbt after creating snowflake_sample_data_store_sales, I get the error below:

10:26:17 Running with dbt=1.1.0
10:26:18 Found 3 models, 4 tests, 0 snapshots, 0 analyses, 181 macros, 0 operations, 0 seed files, 1 source, 0 exposures, 0 metrics
10:26:18
10:26:22 Encountered an error:
Runtime Error
  Database error while listing schemas in database "DEMO_DB"
  Database Error
    002043 (02000): SQL compilation error:
    Object does not exist, or operation cannot be performed.

Please help
@KahanDataSolutions • 2 years ago
Hi Sailesh. Based on the error, my guess is the role you're using for dbt doesn't have the right permissions for that DB. Check your profiles.yml file to see which role you are using by default. Then go to Snowflake and make sure that role has permissions to perform actions on DEMO_DB and any underlying schemas you are trying to build in. Hope that helps!
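For reference, the Snowflake grants that resolve this typically look roughly like the following sketch (TRANSFORMER is a placeholder for whatever role your profiles.yml points at):

```sql
-- Run as a role that can grant privileges (e.g. ACCOUNTADMIN).
-- TRANSFORMER is a placeholder for the role named in profiles.yml.
grant usage on database DEMO_DB to role TRANSFORMER;
grant usage on all schemas in database DEMO_DB to role TRANSFORMER;
grant create table, create view on all schemas in database DEMO_DB to role TRANSFORMER;
```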
@pranavbharadwaj3350 • 1 year ago
Great work, man. Thanks!!
@KahanDataSolutions • 1 year ago
No problem!
@emrea6799 • 3 years ago
Thanks for the video, it's really helpful! Maybe one little comment: it makes no sense to configure the source database again in the sources YAML file, since we can only use one DB in a project and it is already defined in the profiles YAML file.
@KahanDataSolutions • 3 years ago
Hi Emre, glad you found it helpful! Technically that is true if you always plan to source and deploy your projects within the same DB. However, oftentimes there will be a separation. For example, you may have one database (or several) where your raw data is stored - think of it like a data lake. The pointers to these databases/schemas are what get configured as dbt sources (i.e. what is in sources.yml). But then you actually deploy your transformed models to a different database entirely. That deployment database is what would be set in your profiles.yml at the project level. Note that this can also be changed dynamically via the +database model configuration. Here is a good article about this design w/ dbt - blog.getdbt.com/how-we-configure-snowflake/
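To make the separation concrete, here is a minimal sketch with hypothetical names - sources point at the raw database, while profiles.yml (or a +database override) controls where models are built:

```yaml
# models/staging/sources.yml -- where the raw data lives
version: 2
sources:
  - name: raw_erp          # hypothetical source name
    database: raw_db       # the "data lake" database
    schema: erp
    tables:
      - name: orders
```

```yaml
# dbt_project.yml -- optionally deploy a folder of models to a different database
models:
  my_project:
    marts:
      +database: analytics_db   # overrides the default set in profiles.yml
```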
@emrea6799 • 3 years ago
@@KahanDataSolutions Hi, it would be really nice to have what you describe. Then we would have an ELT tool, and that would be great! But the reality is: dbt is only a "T" (Transform) tool. It simply does not support multiple databases in a project, so it is sufficient to specify only the database and its credentials in the profiles file. You can still override that in the project YAML files, but after compiling you should still have only one host and one DB referenced across your project. Bottom line: if the source data for your project is not in your target DB, you need to load it there (with a tool like Stitch). (By the way, I tried to connect to 2 DBs and got a "cross-database references are not implemented" error in Postgres :)
@KahanDataSolutions • 3 years ago
@@emrea6799 Ah, I see... I didn't realize you were using dbt directly on Postgres. Other DB platforms do support cross-DB calls (such as Snowflake). Sounds like you may only need one DB reference in your case if you are unable to do that.
@emrea6799 • 3 years ago
@@KahanDataSolutions OK, I see - you mean the Snowflake platform allows you to query the DBs underneath it, as in the video. But other technologies do not allow that, not only Postgres. Besides, normally you have a lot of use cases with cross-platform calls, and for those even Snowflake is not an option - you need an ETL tool.
@abhinavjain1056 • 3 years ago
Hi, is there any way to reference user-defined functions, just like we add tables in the source files?
@KahanDataSolutions • 3 years ago
I think the closest thing in dbt to a user-defined function is the concept of a "macro". Here is some documentation - docs.getdbt.com/docs/building-a-dbt-project/jinja-macros/ With macros you can create custom logic, take inputs, and reference it in any of your models as needed. Hope this helps!
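As a quick illustration, a minimal macro sketch (this cents_to_dollars example mirrors the one in the dbt docs):

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

You then call it from any model like a function:

```sql
-- models/stg_payments.sql (hypothetical model and source)
select
    id as payment_id,
    {{ cents_to_dollars('amount_cents') }} as amount_usd
from {{ source('raw_erp', 'payments') }}
```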
@megabyte4710 • 3 years ago
This video series is very helpful - thanks. I have a question: using templates, can we loop over multiple tables under the same schema? If yes, please suggest how we can do that. Thank you.
@KahanDataSolutions • 3 years ago
Glad you found it helpful! For the looping question, what are you trying to accomplish with each loop? Depending on your goals, you could create a macro, leverage the information schema to set a variable that holds all the tables in a particular schema, and then use a for loop to iterate through each table. Within that loop you could execute whatever logic you want on the particular table.
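A sketch of that pattern, assuming a Snowflake-style information_schema and dbt's built-in run_query() macro; the per-table step here is just a log call as a placeholder:

```sql
-- macros/for_each_table.sql (illustrative sketch)
{% macro for_each_table(schema_name) %}
  {% set tables_sql %}
    select table_name
    from information_schema.tables
    where table_schema = upper('{{ schema_name }}')
  {% endset %}
  {% if execute %}  {# only query the warehouse at execution time, not parse time #}
    {% set results = run_query(tables_sql) %}
    {% for row in results.rows %}
      {{ log("found table: " ~ row[0], info=true) }}  {# replace with your per-table logic #}
    {% endfor %}
  {% endif %}
{% endmacro %}
```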
@TubeDirektor • 4 years ago
Unfortunately I could not get through this one. I followed the instructions as-is, but got an error on execution of the SQL file:

Object 'DEMO_DB.PUBLIC.STORESALESDATA' does not exist or not authorized.
compiled SQL at target\run\dbt_demo\models\snowflake_sample_data\snowflake_sample_data__StoreSales.sql

Any pointers on what could be going on?
@KahanDataSolutions • 4 years ago
When you look at your profiles.yml file, are you set up to run as a user who has permission to create views? In the previous video, I used my default user, which is assigned the ACCOUNTADMIN role and has universal permissions.
@TubeDirektor • 4 years ago
@@KahanDataSolutions Yes, the sample models were created successfully - both the table and the view. That is why I was confused as to why this failed.
@KahanDataSolutions • 4 years ago
If you go to the file in the "target/" folder that it shows in the error, you can find the actual SQL being executed. Try running that in Snowflake directly and see if you can troubleshoot from there. Perhaps there is a typo somewhere, or try running as a different user.
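For anyone hunting for that file: dbt writes the rendered SQL under target/, along these lines:

```
target/compiled/<project_name>/models/<model_path>.sql   # rendered by dbt compile
target/run/<project_name>/models/<model_path>.sql        # the exact SQL executed by dbt run
```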
@TubeDirektor • 4 years ago
@@KahanDataSolutions That worked - it was indeed a typo :-) but the error msg was misleading. Thanks for the guidance, very much appreciated.

19:10:44 | 2 of 3 OK created view model PUBLIC.snowflake_sample_data__StoreSales [SUCCESS 1 in 3.00s]
@KahanDataSolutions • 4 years ago
You bet! Glad it all worked out.
@prudhvik5603 • 1 year ago
Hey, great explanation! Will the new model inherit all the constraints from the source data?
@imohammd2003 • 2 years ago
Thanks for the great video. One quick question: for the target database (dev or prod), you have specified the connection info in profiles.yml. But for the source database, where do we specify the connection information? Thanks!
@KahanDataSolutions • 2 years ago
Thanks for watching! You can set that info in the YAML file where you declare your sources. You can set "database" and "schema" as properties, which will then fully qualify the name of the source table when you use the {{ source() }} function in queries. Here is the documentation on source properties: docs.getdbt.com/reference/source-properties
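In other words, dbt does not open a second connection for sources - it just compiles {{ source() }} into a fully qualified name using those properties, and everything runs over the single connection from profiles.yml. For example:

```sql
-- any model file
select *
from {{ source('snowflake_sample_data', 'store_sales') }}
-- compiles to roughly:
--   select * from snowflake_sample_data.<schema>.store_sales
-- using the database/schema properties declared in the sources yml
```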
@surjeetkarmakar3188 • 2 years ago
Hi, don't we need to put the model details along with the source in the schema.yml file?
@leiaduva • 2 years ago
From previous videos, I had renamed the models\example folder to \staging. I then created a new folder under models like you did in this video. However, when I run dbt run, it says I have 3 models, but it only runs "my_first_dbt_model" and "my_second_dbt_model" - it doesn't run my new one under snowflake_sample_data. I have the correct permissions. I thought it might be because I had been trying out the materialized settings from your previous videos in dbt_project.yml, but I've tried every combination I can and even commented it out:

models:
  Course_Instance:
    # Config indicated by + and applies to all files under models/staging/
    staging:
      +materialized: table
      +schema: staging
    snowflake_sample_data:
      +materialized: view
      +schema: staging
@KahanDataSolutions • 2 years ago
Can you share what the models/ directory looks like in your project?
@leiaduva • 2 years ago
I can't figure out how to put a screenshot here, but I started the entire project over and am trying anew. In File Explorer, it's Documents\SnowflakeDemo\models with a subfolder for example and a subfolder for snowflake_sample_data. It's just like yours. I didn't touch the dbt_project.yml file, so it still has:

models:
  SnowflakeDemo:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view

I tried changing it to this:

models:
  SnowflakeDemo:
    # Config indicated by + and applies to all files under models/example/
    example:
      +materialized: view
    snowflake_sample_data:
      +materialized: view

That didn't help. When I do dbt run, it says it found 3 models, but it only builds 2.
@leiaduva • 2 years ago
@@KahanDataSolutions I finally got it to deploy, but I had to delete the snowflake_sample_data subfolder completely. I took the source schema out and pasted it into the schema file with the sample models, and I made snowflake_sample_data__store_sales.sql reside at the same level as the samples. That's the only way it would build. Not sure what is different.
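One common gotcha in situations like this: the keys under models: in dbt_project.yml must match the project's name and the directory names under models/ exactly, otherwise the configs silently don't apply. Based on the names in this thread, a config sketch would be:

```yaml
# dbt_project.yml
models:
  SnowflakeDemo:               # must match the project's `name:`
    example:                   # must match models/example/
      +materialized: view
    snowflake_sample_data:     # must match models/snowflake_sample_data/
      +materialized: view
```

Note, though, that these keys only affect configuration - a model in an unlisted folder should still build on dbt run, so a model that parses but never runs may instead point to a file-location or selection issue.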