Comments
@sainadhvenkata
@sainadhvenkata 7 days ago
@dataspark Could you please provide those data links again? The links have expired.
@tejathunder
@tejathunder 19 days ago
Sir, please upload the continuation of this project.
@samar8136
@samar8136 1 month ago
It is now possible to save a Delta table with column names containing spaces: see "Rename and drop columns with Delta Lake column mapping".
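For context, the Delta Lake feature this comment refers to is column mapping. A hedged sketch of the table properties involved (the table name my_table and column names are made up; the exact minimum reader/writer protocol versions depend on your Delta Lake release):

```sql
-- Enable column mapping by name so column names may contain spaces
-- and can be renamed or dropped without rewriting the data files.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);

-- After that, renames are metadata-only operations:
ALTER TABLE my_table RENAME COLUMN `old name` TO new_name;
```

Note this is a one-way door on older Delta versions: once column mapping is enabled, older readers may no longer be able to read the table.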
@DataSpark45
@DataSpark45 1 month ago
I renamed the columns (removed the spaces) and then created the table.
@shaasif
@shaasif 1 month ago
Thank you so much for your real-time project explanation across 5 parts; it's really awesome. Can you please upload the remaining video on the multiple-files and file-name concept?
@DataSpark45
@DataSpark45 1 month ago
Hi, that concept is actually covered in the Data Validation playlist, by creating metadata files. Thanks.
@shaasif
@shaasif 1 month ago
@@DataSpark45 Can you share your email ID? I want to communicate with you.
@amandoshi5803
@amandoshi5803 1 month ago
Source code?
@DataSpark45
@DataSpark45 1 month ago
from pyspark.sql.functions import col

def SchemaComparision(controldf, spsession, refdf):
    try:
        # Iterate the control df to get each file name and path
        for row in controldf.collect():
            filename = row['filename']
            filepath = row['filepath']

            # Build a dataframe from the file path
            print("Creating dataframe for {} ({})".format(filename, filepath))
            dfs = spsession.read.format('csv') \
                .option('header', True) \
                .option('inferSchema', True) \
                .load(filepath)
            print("Dataframe created for {} ({})".format(filename, filepath))

            # Look up the expected schema for this file in the reference df
            ref_filter = refdf.filter(col('SrcFileName') == filename)
            for ref in ref_filter.collect():
                columnNamesList = [c.strip().lower() for c in ref['SrcColumns'].split(",")]
                refTypesList = [t.strip().lower() for t in ref['SrcColumnType'].split(",")]

                # Inferred type per column, e.g. StringType() -> 'string',
                # IntegerType() -> 'int' (assumes the CSV headers are already lower-case)
                dfsTypesList = [dfs.schema[c].dataType.simpleString().strip().lower()
                                for c in columnNamesList]

                # Compare expected vs. inferred type position by position
                missmatchedcolumns = [(col_name, df_type, ref_type)
                                      for (col_name, df_type, ref_type)
                                      in zip(columnNamesList, dfsTypesList, refTypesList)
                                      if df_type != ref_type]
                if missmatchedcolumns:
                    print("Schema comparison failed (mismatch) for {}".format(filename))
                    for col_name, df_type, ref_type in missmatchedcolumns:
                        print(f"columnName: {col_name}, dataFrameType: {df_type}, referenceType: {ref_type}")
                else:
                    print("Schema comparison is done and successful for {}".format(filename))
    except Exception as e:
        print("An error occurred:", str(e))
        return False
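The heart of that function is the position-wise zip of expected vs. inferred types; note the check must be element-level (per column), not a comparison of the whole lists. The same logic in plain Python, with made-up column names and types for illustration:

```python
def find_mismatches(column_names, df_types, ref_types):
    """Return (column, inferred, expected) triples where the inferred
    type differs from the reference type at the same position."""
    return [(name, df_t, ref_t)
            for name, df_t, ref_t in zip(column_names, df_types, ref_types)
            if df_t != ref_t]

# Hypothetical schema data: 'row_id' was inferred as string, reference says int.
columns  = ["row_id", "order_date", "sales"]
inferred = ["string", "date", "double"]
expected = ["int", "date", "double"]

print(find_mismatches(columns, inferred, expected))
# → [('row_id', 'string', 'int')]
```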
@maheswariramadasu1301
@maheswariramadasu1301 1 month ago
Highly underrated channel; we need more videos.
@ArabindaMohapatra
@ArabindaMohapatra 1 month ago
I just started watching this playlist. I'm hoping to learn how to deal with schema-related issues in real time. Thanks.
@DataSpark45
@DataSpark45 1 month ago
Thanks a million bro
@gregt7725
@gregt7725 2 months ago
That is great, but how do you handle deletions from the source? I don't understand why, after successful changes/inserts, deleting a source row (e.g. row number 2) creates duplicated rows of previously changed records. (last_updated_date does it, but why?)
@DataSpark45
@DataSpark45 2 months ago
Hi, can you please share the details or a screenshot of where you have the doubt?
@erwinfrerick3891
@erwinfrerick3891 2 months ago
Great explanation, very clear; this video was very helpful for me.
@DataSpark45
@DataSpark45 2 months ago
Glad to hear that!
@ChetanSharma-oy4ge
@ChetanSharma-oy4ge 2 months ago
How can I find this code? Is there a repo where you have uploaded it?
@DataSpark45
@DataSpark45 2 months ago
Sorry to say this, bro; unfortunately we lost those files.
@waseemMohammad-qx7ix
@waseemMohammad-qx7ix 2 months ago
Thank you for making this project; it has helped me a lot.
@maheswariramadasu1301
@maheswariramadasu1301 2 months ago
This video really helped me because tomorrow I have to explain these topics, and I was searching YouTube for the best explanation. This video helped me learn it from scratch.
@mohitupadhayay1439
@mohitupadhayay1439 2 months ago
Amazing content. Keep a playlist of real-time industry scenarios.
@mohitupadhayay1439
@mohitupadhayay1439 2 months ago
Very underrated channel!
@DataSpark45
@DataSpark45 2 months ago
Thanks a million bro
@ajaykiranchundi9979
@ajaykiranchundi9979 2 months ago
Very helpful! Thank you
@shahnawazahmed7474
@shahnawazahmed7474 2 months ago
I'm looking for ADF training; will you provide that? How can I contact you? Thanks.
@DataSpark45
@DataSpark45 2 months ago
Hi, you can contact me through LinkedIn: Lokeswar Reddy Valluru.
@MuzicForSoul
@MuzicForSoul 2 months ago
Sir, can you please also show us a failing run? You are only showing the passing case. When I tested by swapping the columns in the dataframe it still did not fail, because the set still has them in the same order.
@DataSpark45
@DataSpark45 2 months ago
The set values come from the reference df, so they are always constant.
@pranaykumar581
@pranaykumar581 2 months ago
Can you provide the source data file?
@DataSpark45
@DataSpark45 2 months ago
Hi, I provided the link in the description, bro.
@MuzicForSoul
@MuzicForSoul 2 months ago
Why do we have to do ColumnPositionComparision? Shouldn't the column-name comparison you did earlier catch this?
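For context, this is the distinction the two checks cover, sketched in plain Python with made-up column lists: a set-based name comparison ignores order, so only a positional comparison catches swapped columns.

```python
ref_cols = ["id", "name", "city"]
df_cols  = ["name", "id", "city"]   # same columns, first two swapped

names_match     = set(df_cols) == set(ref_cols)  # True: same names are present
positions_match = df_cols == ref_cols            # False: the order differs

print(names_match, positions_match)
# → True False
```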
@irfanzain8086
@irfanzain8086 3 months ago
Bro, thanks a lot! Great explanation 👍 Can you share part 2?
@vamshimerugu6184
@vamshimerugu6184 3 months ago
Sir, can you make a video on how to connect ADLS to Databricks using a service principal?
@DataSpark45
@DataSpark45 3 months ago
Thanks for asking; will do that one for sure.
@rohilarohi
@rohilarohi 3 months ago
This video helped me a lot. I hope we can expect more real-time scenarios like this.
@SuprajaGSLV
@SuprajaGSLV 3 months ago
This really helped me understand the topic better. Great content!
@DataSpark45
@DataSpark45 3 months ago
Glad to hear it!
@SuprajaGSLV
@SuprajaGSLV 3 months ago
Could you please upload a video on the differences between Data Lake vs Data Warehouse vs Delta Tables?
@DataSpark45
@DataSpark45 3 months ago
Thanks a million. Will do for sure.
@Lucky-eo8cl
@Lucky-eo8cl 3 months ago
Good explanation bro 👏🏻. It's really helpful.
@DataSpark45
@DataSpark45 3 months ago
Glad to hear that
@vamshimerugu6184
@vamshimerugu6184 3 months ago
I think schema comparison is an important topic in PySpark. Great explanation sir ❤
@DataSpark45
@DataSpark45 3 months ago
Thank you bro
@vamshimerugu6184
@vamshimerugu6184 3 months ago
Great explanation ❤. Keep uploading more content on PySpark.
@DataSpark45
@DataSpark45 3 months ago
Thank you, I will
@saibhargavreddy5992
@saibhargavreddy5992 3 months ago
I found this very useful, as I had a similar issue with data validations. It helped a lot while completing my project.
@DataSpark45
@DataSpark45 3 months ago
Glad it helped!
@maheswariramadasu1301
@maheswariramadasu1301 3 months ago
This video helped me understand the multiple event triggers in ADF.
@DataSpark45
@DataSpark45 3 months ago
Glad to hear that
@maheswariramadasu1301
@maheswariramadasu1301 3 months ago
It helped me a lot in learning PySpark easily.
@0adarsh101
@0adarsh101 3 months ago
Can I use the Databricks Community Edition?
@DataSpark45
@DataSpark45 3 months ago
Hi, you can use Databricks; you then have to play around with the dbutils.fs methods to get the file list / file paths, as we did in the get_env.py file. Thank you.
@VaanisToonWorld-rp5xy
@VaanisToonWorld-rp5xy 3 months ago
Please share the files for the FBS project.
@DataSpark45
@DataSpark45 3 months ago
Unfortunately, we lost those files and the account.
@SaadAhmed-js5ew
@SaadAhmed-js5ew 4 months ago
Where is your parquet file located?
@DataSpark45
@DataSpark45 3 months ago
Hi, are you talking about the source parquet file? It's under the source folder.
@OmkarGurme
@OmkarGurme 4 months ago
While working with Databricks we don't need to start a Spark session, right?
@DataSpark45
@DataSpark45 4 months ago
No need, brother; we can continue without defining a Spark session. I just kept it in for practice.
@listentoyourheart45
@listentoyourheart45 4 months ago
Nice explanation sir
@kaushikvarma2571
@kaushikvarma2571 4 months ago
Is this a continuation of part 2? In part 2 we never discussed test.py and udfs.py.
@DataSpark45
@DataSpark45 4 months ago
Yes. test.py here is used just to run the functions; for udfs.py, please watch from 15:00 onwards.
@kaushikvarma2571
@kaushikvarma2571 4 months ago
To solve the header error, replace the csv branch with:

elif file_format == 'csv':
    df = spark.read.format(file_format).option("header", True).option("inferSchema", True).load(file_dir)
@memesmacha61
@memesmacha61 1 month ago
Thank you bro
@charangowdamn8661
@charangowdamn8661 4 months ago
Hi sir, how can I reach you?
@charangowdamn8661
@charangowdamn8661 4 months ago
Hi sir, how can I reach you? Can you please share your mail ID, or tell me how I can connect with you?
@DataSpark45
@DataSpark45 4 months ago
You can reach me on LinkedIn: Valluru Lokeswar Reddy.
@aiviet5497
@aiviet5497 5 months ago
I can't download the dataset 😭.
@DataSpark45
@DataSpark45 5 months ago
Take a look at this : drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
@sauravkumar9454
@sauravkumar9454 5 months ago
Sir, you are the best; I love how you have taught and mentioned even the smallest of things. Looking forward to more videos like this.
@DataSpark45
@DataSpark45 5 months ago
Sure I will
@World_Exploror
@World_Exploror 5 months ago
How did you define reference_df and control_df?
@DataSpark45
@DataSpark45 5 months ago
We would define them as tables in a database. For now I used them as CSVs.
@mrunalshahare4841
@mrunalshahare4841 5 months ago
Can you share part 2?
@vishavsi
@vishavsi 5 months ago
I am getting an error with logging:
    File "Python\Python39\lib\configparser.py", line 1254, in __getitem__
        raise KeyError(key)
    KeyError: 'keys'
Can you share the code written in the video?
@DataSpark45
@DataSpark45 5 months ago
Sure, here is the link: drive.google.com/drive/folders/1QD8635pBSzDtxI-ykTx8yquop2i4Xghn?usp=sharing
@vishavsi
@vishavsi 5 months ago
Thanks @@DataSpark45
@subhankarmodumudi9033
@subhankarmodumudi9033 5 months ago
Did your problem get resolved? @@vishavsi
@jitrana6813
@jitrana6813 5 months ago
How can we use spark.sql instead of PySpark dataframe select commands? Can you advise how we can do that?
@DataSpark45
@DataSpark45 5 months ago
Hi, when you write a df to Hive you generally use df.write.saveAsTable(), so the table gets created in the Hive environment; then you can use spark.sql("select * from table"). If you don't want to use Hive, you can use df.createOrReplaceTempView("TableName") instead (registerTempTable is the older, deprecated name for the same thing).
@ritesh_ojha
@ritesh_ojha 5 ай бұрын
<Error> <Code>AuthenticationFailed</Code> <Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:ea8e17b4-701e-004d-1db1-573f6a000000 Time:2024-02-04T21:31:20.0816196Z</Message> <AuthenticationErrorDetail>Signature not valid in the specified time frame: Start [Tue, 22 Nov 2022 07:36:34 GMT] - Expiry [Wed, 22 Nov 2023 15:36:34 GMT] - Current [Sun, 04 Feb 2024 21:31:20 GMT]</AuthenticationErrorDetail> </Error>
@DataSpark45
@DataSpark45 5 months ago
Where did you get this error, bro?
@ritesh_ojha
@ritesh_ojha 5 months ago
@@DataSpark45 While downloading the data. But I got the data from part 2.
@user-fz1rj6gz2g
@user-fz1rj6gz2g 5 months ago
Thank you for the amazing project, sir. Can you please provide the GitHub link for this project, or the project files?
@user-fz1rj6gz2g
@user-fz1rj6gz2g 6 months ago
Thanks for the amazing content; please upload more videos like this.
@DataSpark45
@DataSpark45 5 months ago
Thank you, I will
@ranjithrampally7982
@ranjithrampally7982 6 months ago
Do you provide training?
@DataSpark45
@DataSpark45 6 months ago
As of now I'm not providing training, bro, but you can reach out to me at any time for any sort of doubts. Thank you.
@vinothkannaramsingh8224
@vinothkannaramsingh8224 6 months ago
Would it be sufficient to sort both the ref/df column names alphabetically and then compare them?
@DataSpark45
@DataSpark45 6 months ago
Whatever order is specified in reference_df is the correct order we expect. If we sort the df's column names alphabetically, there is a chance of failure, since positional mismatches could slip through. Thank you.
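A tiny plain-Python illustration of that reply (column names made up): sorting both sides alphabetically makes the comparison pass even when the actual column order disagrees with the reference order, so positional errors would go undetected.

```python
ref_order = ["id", "name", "city"]   # order mandated by reference_df
df_order  = ["city", "id", "name"]   # same columns, wrong positions

print(sorted(df_order) == sorted(ref_order))  # → True  (sorting hides the problem)
print(df_order == ref_order)                  # → False (positional check catches it)
```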