Comments
@jedodinho 1 day ago
I have a question. In my project folder I created a venv with a very specific version of Python and pandas, and I need to run this code inside that venv. I couldn't figure out how to make JupyterLab detect the venv and run the code... With the Windows Task Scheduler for .py scripts, for example, I set the path to the Python interpreter I want to use, so I get the correct environment. Is it possible to do that in JupyterLab for my .ipynb files?
@BiInsightsInc 1 day ago
Yes, you can schedule and run your notebooks (.ipynb files) with the JupyterLab scheduler. You cannot run an .ipynb via the Windows Task Scheduler; use Jupyter Scheduler for notebook (.ipynb) files instead. jupyter-scheduler.readthedocs.io/en/latest/
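If the goal is to have JupyterLab use a specific venv, one common approach (a minimal sketch, not from the video; the venv path and kernel name are placeholders) is to register the venv as a named kernel:

```python
# Hypothetical sketch: register an existing venv as a named Jupyter kernel so
# JupyterLab can run notebooks with that interpreter. Paths/names are placeholders.
import subprocess

venv_python = r"C:\projects\myproject\venv\Scripts\python.exe"

# ipykernel must be installed inside the venv itself.
subprocess.run([venv_python, "-m", "pip", "install", "ipykernel"], check=True)

# Expose the venv as a kernel; then pick it from the notebook's kernel menu.
subprocess.run(
    [venv_python, "-m", "ipykernel", "install", "--user",
     "--name", "myproject-venv", "--display-name", "Python (myproject venv)"],
    check=True,
)
```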
@honzajazz 3 days ago
You should also uncomment (remove the #) the line with the wal_level key for the setting to take effect.
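As a quick sanity check after restarting Postgres, you can confirm the value took effect (a minimal sketch; connection details are illustrative placeholders):

```python
# Minimal sketch: confirm wal_level took effect after editing postgresql.conf
# and restarting Postgres. Connection details are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", dbname="postgres",
                        user="postgres", password="postgres")
with conn.cursor() as cur:
    cur.execute("SHOW wal_level;")
    print(cur.fetchone()[0])  # expect 'logical' for CDC/Debezium-style setups
conn.close()
```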
@afk4dyz 7 days ago
Password as an environment variable is an absolute game changer.
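For anyone looking for the pattern, a minimal sketch (the variable name DB_PASSWORD and the connection string are illustrative):

```python
# Minimal sketch: read a database password from an environment variable
# instead of hard-coding it. The variable name and connection string are placeholders.
import os

db_password = os.environ.get("DB_PASSWORD")
if db_password is None:
    raise RuntimeError("DB_PASSWORD is not set")

conn_str = f"postgresql://etl:{db_password}@localhost:5432/AdventureWorks"
```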
@11folders 7 days ago
Excellent Demo! Is there a way to specify more granular details about the source? For instance, I may want to specify the page or section heading for a source, especially if I want to cite it as a reference.
@BiInsightsInc 7 days ago
Thanks. I think you can print the page number if it's part of the metadata stored in the vector DB along with the chunk identifier. You'd need to extract/store that information while preparing your source data.
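A minimal sketch of that idea, assuming a LangChain-style loader (PyPDFLoader keeps the page index in each document's metadata; the file name is a placeholder):

```python
# Hedged sketch: keep the page number in each chunk's metadata so it can be
# cited alongside the answer. Assumes a LangChain-style pipeline.
from langchain_community.document_loaders import PyPDFLoader

docs = PyPDFLoader("report.pdf").load()   # one Document per page
# ...split, embed, and store `docs` in your vector DB; the metadata travels along...

for doc in docs[:3]:
    print(doc.metadata.get("source"), "page", doc.metadata.get("page"))
```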
@mdanowarhossain3802 7 days ago
Where are the .pkl files?
@BiInsightsInc 7 days ago
You save your trained model as a pickle file. Here is the link to the notebook where we save the model to disk as a pickle file. github.com/hnawaz007/pythondataanalysis/blob/main/ML/Email%20Spam%20Classifier.ipynb
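The core of it looks roughly like this (a sketch; the toy training data and file name are illustrative, not from the notebook):

```python
# Sketch: train a trivial model, save it as a .pkl file, and load it back.
# The toy data and file name are placeholders for the notebook's real model.
import pickle
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB().fit([[0, 1], [1, 0]], [0, 1])   # stand-in training step

with open("spam_classifier.pkl", "wb") as f:
    pickle.dump(model, f)

with open("spam_classifier.pkl", "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict([[0, 1]]))
```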
@ashwinkumar5223 8 days ago
I mean, can we use a website instead of a PDF file?
@BiInsightsInc 8 days ago
Yes, you can use a website as a source. You'd need a web scraper component to scrape data from the web and format it for your RAG app. Here is an article on this subject: medium.com/@iankelk/rag-detective-retrieval-augmented-generation-with-website-data-5a748b063040
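A minimal sketch of such a scraper component (requests + BeautifulSoup; the URL is a placeholder, and real sites may need robots.txt checks and better HTML cleanup):

```python
# Hedged sketch: pull a page's text so it can be chunked and embedded for RAG.
import requests
from bs4 import BeautifulSoup

resp = requests.get("https://example.com/docs/page", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
text = " ".join(p.get_text(strip=True) for p in soup.find_all("p"))
print(text[:500])   # feed `text` into your chunking/embedding step
```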
@ashwinkumar5223 8 days ago
Can we do RAG with a single website as the source to interact with?
@BiInsightsInc 8 days ago
Yes, we can use a website as a data source.
@Croat955 9 days ago
Great video, it helped me a lot.
@huyvu4741 9 days ago
Is it free?
@BiInsightsInc 9 days ago
Yes, there is an open source version!
@ivanl7786 13 days ago
Hello! That's a great explanation, thanks! Please tell me how the data transfer is carried out? Are we using the RAM of the server where Python is installed or are we using the RAM of the server where PostgreSQL is installed? I want to understand if this scenario is suitable if there is a table with 30 million rows on the SQL Server side?
@BiInsightsInc 12 days ago
Hey there, in this use case we utilized pandas, and it loads the data into the memory of the server where Python is installed. So you would need to make sure the data fits in that server's memory, load it in batches, or use a chunking strategy to load your data. Hope this helps.
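A minimal sketch of the chunked approach (connection string, table name, and chunk size are illustrative):

```python
# Hedged sketch: stream a very large table in chunks instead of pulling all
# rows into the Python server's memory at once.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://etl:password@localhost:5432/AdventureWorks")

for chunk in pd.read_sql_query("SELECT * FROM sales.orders", engine, chunksize=100_000):
    # process/load each batch here instead of holding the whole table in RAM
    chunk.to_csv("orders.csv", mode="a", header=False, index=False)
```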
@thecarlostheory 15 days ago
Thank you for the help! Very useful! Leave private models!
@user-ke8lb9fu5h 16 days ago
Thank you sir, you helped me understand Airflow. I did the same thing, following the same process but from MySQL (extract-load -> transformation -> load) with the free employees database, and I shared it on my GitHub and LinkedIn, tagging this video.
@larsh8560 17 days ago
Quite cool. Is this just "magic" of DBT or is this normal in other tools also? (asking as someone who remembers SCDs as painful in SSIS when I worked with it years ago)
@BiInsightsInc 17 days ago
This is the dbt "magic". In other tools you need to go through multiple steps and sometimes need custom code to achieve this.
@mahraneabid 17 days ago
Hi sir, the edited model can't be seen by Ollama. When I call ollama list in CMD it displays only ollama3.1. Why?
@BiInsightsInc 12 days ago
If you do not see the custom model in your Ollama ecosystem, then check the model file to make sure it's correct. Here is an example of a custom model file from OpenWebUI. openwebui.com/m/darkstorm2150/Data-Scientist:latest
@mahraneabid 17 days ago
When it said "would you like me to break down the sales by product" and you responded with yes, will it perform the action it mentioned or not?
@BiInsightsInc 12 days ago
It may work if the SQL model is able to generate SQL for the question. You can try it and let us know if this extended option works.
@rafaelg8238 18 days ago
Great video, thanks.
@BiInsightsInc 18 days ago
Glad you liked it!
@hashimraza422 19 days ago
Hi Boss - Thanks for sharing this. Can you help regarding this? We are looking for exactly this for our Prod and Dev environments.
@BiInsightsInc 1 day ago
Hi there, you can shoot me an email here: [email protected]. We can discuss what your requirements are and take it from there.
@HashimRaza-k4k 19 days ago
Hi Boss - Thanks for your time and effort. I need to learn this process; can you help in this regard?
@abhisheksaini5563 21 days ago
I have to connect to an on-prem SQL Server using Cloud Composer; how do I install the driver on Cloud Composer?
@BiInsightsInc 18 days ago
Hi there, the GCP Composer worker's pod image runs on Ubuntu, so you can install the drivers on the Composer image. I haven't tried it personally, but here is a link on installing the SQL Server ODBC drivers: stackoverflow.com/questions/60346440/google-composer-how-do-i-install-microsoft-sql-server-odbc-drivers-on-environme
@laophan4591 24 days ago
I'd like to create a custom folder to keep the CSV on the host and link it with the Airflow image. I found that this practice is called mounting a volume, done by adding: - ${AIRFLOW_PROJ_DIR:-.}/custom_folder:/opt/airflow/custom_folder => but after that, when I check by running ls -a, the terminal does not show the folder. Could you please help?
@BiInsightsInc 23 days ago
You can mount a directory into the Airflow image under the volumes section. Make sure the directory exists in the location where your docker-compose file is located. In the following example we mount the dags folder from the local directory into the Airflow image: volumes: - ./dags:/opt/airflow/dags
@penishilton6940 25 days ago
ty king
@adilsaju 26 days ago
AMAZING
@ambernaz6793 27 days ago
Hi Nawaz, could you please guide me if I want to load data into Power BI? How would the code be different? I am new to this field and I am learning ETL and data pipelining. Thank you.
@BiInsightsInc 27 days ago
Hi there, you load the data into a storage layer, i.e. flat files, a database, or a data lake. So you can use any of the pipelines to load data into a storage layer; Power BI then reads the data from that storage layer. I have a Power BI series. Here is the link: kzbin.info/aero/PLaz3Ms051BAnnlZfFxXs3ezSVM54OlYBr
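For instance, a minimal sketch of landing a dataframe in a database table that Power BI can then connect to (the file, table, and connection string are illustrative):

```python
# Hedged sketch: load transformed data into a storage layer (a Postgres table
# here); Power BI then uses that table as its data source.
import pandas as pd
from sqlalchemy import create_engine

df = pd.read_csv("sales.csv")
engine = create_engine("postgresql://etl:password@localhost:5432/warehouse")
df.to_sql("sales", engine, schema="public", if_exists="replace", index=False)
```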
@ambernaz6793 27 days ago
@BiInsightsInc Thank you
@michaelaustin1638 27 days ago
Awesome video! How did you get the various categories when creating a model?
@BiInsightsInc 27 days ago
Thanks. Those are defaults in OpenWebUI. You can select relevant categories for a custom model.
@ankit7918 28 days ago
What laptop specification is best to run ML models? Please give some laptop recommendations.
@BiInsightsInc 27 days ago
You can pick up a Lenovo laptop that has at least 12 GB of VRAM, four RAM slots upgradable to 128 GB of RAM, and 3 NVMe slots. Here is one that fits the description: www.lenovo.com/us/en/p/laptops/thinkpad/thinkpadp/thinkpad-p16-gen-2-(16-inch-intel)/21fa0027us
@user-kk8xf1jc8h 28 days ago
I got this error: Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etlnew'. (18456) (SQLDriverConnect); [28000] [Microsoft][ODBC Driver Manager] Invalid connection string attribute (0); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etlnew'. (18456)") even though I have changed Server Authentication too.
@BiInsightsInc 28 days ago
Check your connection string to make sure it is valid. Also, make sure your credentials are valid and that you can connect to the database via SQL Server Management Studio or DBeaver. Enable Mixed Mode authentication if you haven't already: in SQL Server Management Studio, go to Properties > Security > Server Authentication and check the option "SQL Server and Windows Authentication mode".
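A quick way to test the login and connection string in isolation (a sketch; the driver name, server, and password are placeholders to match your setup):

```python
# Hedged sketch: test the SQL Server login/connection string outside the pipeline.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AdventureWorks;"
    "UID=etlnew;PWD=your_password;"
)
with pyodbc.connect(conn_str, timeout=5) as conn:
    print(conn.cursor().execute("SELECT @@VERSION").fetchone()[0])
```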
@laophan4591 28 days ago
Appreciate your video, I wish I had known about it earlier!
@jeanchindeko5477 29 days ago
3:18 Apache Iceberg has ACID transactions out of the box; it's not Nessie that brings ACID transactions to Iceberg. In the Iceberg specification, the catalog only has knowledge of the list of snapshots; the catalog doesn't track the list of individual files that are part of a commit or snapshot.
@Pattypatpat7122 1 month ago
This was great, much easier on my Windows machine than my Linux machine for a change. Just a question: your table definitions in the video for AdventureWorks don't appear to be the same as the ones available on the Microsoft site for versions 2019 or 2022. I created some dummy tables based on the table definitions in your GitHub, but obviously my dummy data doesn't relate, so I can't properly test whether this model is generating the correct SQL. Do you have a link to the database you were using?
@BiInsightsInc 28 days ago
Hi, the database used in the video is developed on top of AdventureWorks 2019. Here are the links to the repo and the series: github.com/hnawaz007/dbt-dw hnawaz007.github.io/mds.html
@aniketrele7688 1 month ago
Hi, are the connector name and topic name always the same? Can you name your topic something else? It would be helpful to have multiple topics for one connector. Thanks in advance.
@BiInsightsInc 1 month ago
Hi there, no, your connector name can be different from your topic name. You can have multiple connectors read from the same topic.
@junaidmalik660 1 month ago
Thanks a lot for the detailed video. I want to ask about the accuracy of the results: is it accurate or not for big datasets?
@BiInsightsInc 1 month ago
The results are good on various data sizes. However, you should be careful with the data size: PandasAI uses a generative AI model to understand and interpret natural language queries. The model has a token limit, and if your data exceeds that limit then it won’t be able to process your request.
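One rough way to gauge this up front (a sketch, assuming an OpenAI-style tokenizer via tiktoken; the file name is a placeholder):

```python
# Rough, hedged check: estimate how many tokens the serialized dataframe would
# consume, to compare against the model's context limit.
import pandas as pd
import tiktoken

df = pd.read_csv("sales.csv")
enc = tiktoken.get_encoding("cl100k_base")
n_tokens = len(enc.encode(df.to_csv(index=False)))
print(f"~{n_tokens} tokens if the whole dataframe were sent to the model")
```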
@tiagovianez8482 1 month ago
Teacher, where is the source of this data? I would like to insert it into my database. In my case I will insert it into PostgreSQL, run the ETL, and write it to S3. Could you provide me with the source?
@BiInsightsInc 1 month ago
Hi, the data source is an MS SQL Server sample database called AdventureWorks. You can download and restore it. I have a tutorial on how to install SQL Server and restore this database. Here is the link: kzbin.info/www/bejne/m2bQp6KBqrtmrtU
@krishnarajuyoutube 1 month ago
Can we run Llama 3 locally on any simple VPS server, or do we need GPUs?
@BiInsightsInc 1 month ago
Hi, you'd need a GPU to run an LLM. By the way, VPS servers can have GPUs.
@diwaspoudel7 1 month ago
Hi there, do you have a Docker Compose YAML file containing the MSSQL connection?
@BiInsightsInc 1 month ago
Yes, I have done a video on it where I install additional SQL Server providers and connect to SQL Server. Here is the link: kzbin.info/www/bejne/qmXLZampirGqfKc&lc=UgxQFElBNgK2dwKo5kV4AaABAg
@mohdmuqtadar8538 1 month ago
Great video. What if the response from the database exhausts the context window of the model?
@BiInsightsInc 1 month ago
Thanks. If you are hitting the model's maximum context length, then you can try the following. 1. Choose a different LLM that supports a larger context window. 2. Brute force: chunk the document and extract content from each chunk. 3. RAG: chunk the document and only extract content from the subset of chunks that look “relevant”. A minimal chunking sketch is shown below. Here is an example of these approaches from LangChain: js.langchain.com/v0.1/docs/use_cases/extraction/how_to/handle_long_text/
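A minimal sketch of the chunking step for options 2 and 3 (LangChain's text splitter; the chunk sizes and input file are illustrative):

```python
# Hedged sketch: split an oversized response into chunks that each fit the
# model's context window. `long_text` stands in for the large database response.
from langchain_text_splitters import RecursiveCharacterTextSplitter

long_text = open("db_response.txt").read()   # placeholder for the big response

splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
chunks = splitter.split_text(long_text)

for chunk in chunks:
    pass  # send each chunk to the LLM (brute force) or only the "relevant" ones (RAG)
```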
@GordonShamway1984 1 month ago
Wonderful as always and just in time. Was going to build a similar use case that auto generates database docs for business users next week. This comes in handy🎉 Thank you again and again
@BiInsightsInc 1 month ago
Glad it was helpful! Happy coding.
@KevinHa-wg8qv 1 month ago
Hi. I encountered this error when trying to add the Debezium connector via an API call. Would you please help? Thanks. Failed testing connection for jdbc:postgresql://localhost:5432/AdventureWorks with user 'etl': Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections. [io.debezium.connector.postgresql.PostgresConnector]
@ryanschraeder8681 1 month ago
What happens if you kill the Airflow webserver or localhost? Will the DAG still run on the schedule you specified?
@BiInsightsInc 1 month ago
If the services are down, then the DAG won't run. The webserver is just the UI; it is the scheduler (and its workers) that trigger and execute DAG runs, so make sure those services stay up for the DAG to execute on schedule.
@gustavoleo 1 month ago
Namaste Haq!!! Thank you so much for making this video and also for sharing your repo. I'm a bit confused about how you build the connection string; would you mind sharing it? I had checked your Connect to SQL Server with Python notebook as well, but didn't realize what is incorrect in my ConnectionStringCredentials()!
@BiInsightsInc 1 month ago
Thanks. The connection strings are defined in the secrets.toml file. I have covered it in the initial videos. You can watch them here: kzbin.info/www/bejne/r2rSoHyAbNacmLc kzbin.info/www/bejne/pJrQnWR7qLKsb9E
@dltHub 1 month ago
❤ Thank you for this amazing video!
@rafaelg8238 1 month ago
Great video 👏🏻
@cvarak3 1 month ago
Hi, would you suggest this method to extract data from an active Postgres table that has ~5 billion rows? If not, do you have any videos on what method you would suggest to extract from Postgres to S3? Thanks! (Tried with Airbyte but it keeps failing.)
@BiInsightsInc 1 month ago
Hi, if you have a Kafka cluster running, then you can stream data from Postgres to Kafka. A cluster can handle a large dataset. You can stand up your own or utilize Confluent Cloud. Once this setup is in place, configure an S3 sink connector. I have covered that in the following video: kzbin.info/www/bejne/oJDHdoimi56medE
@danielvoss2483 2 months ago
Great job, keep going 👍
@coolkillakhan 2 months ago
I love you!