Comments
@sainadhvenkata
@sainadhvenkata 7 days ago
@dataspark Could you please provide those data links again? The links have expired.
@tejathunder
@tejathunder 19 days ago
Sir, please upload the continuation of this project.
@samar8136
@samar8136 1 month ago
It is now possible to save a Delta table with column names containing spaces: see "Rename and drop columns with Delta Lake column mapping".
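For context, the Delta Lake feature this comment refers to is column mapping. A hedged sketch of the table properties involved (the table name my_table and column names are made up; the exact minimum reader/writer protocol versions depend on your Delta Lake release):

```sql
-- Enable column mapping by name so column names may contain spaces
-- and can be renamed or dropped without rewriting the data files.
ALTER TABLE my_table SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
);

-- After that, renames are metadata-only operations:
ALTER TABLE my_table RENAME COLUMN `old name` TO new_name;
```

Note this is a one-way door on older Delta versions: once column mapping is enabled, older readers may no longer be able to read the table.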
@DataSpark45
@DataSpark45 1 month ago
I renamed the columns (removed the spaces) and then created the table.
@shaasif
@shaasif 1 month ago
Thank you so much for your real-time project explanation across 5 parts; it's really awesome. Can you please upload the remaining video on the multiple-files and file-name concept?
@DataSpark45
@DataSpark45 1 month ago
Hi, that concept is actually covered in the Data Validation playlist, by creating metadata files. Thanks.
@shaasif
@shaasif 1 month ago
@@DataSpark45 Can you share your email ID? I want to communicate with you.
@amandoshi5803
@amandoshi5803 1 month ago
Source code?
@DataSpark45
@DataSpark45 1 month ago
from pyspark.sql.functions import col

def SchemaComparision(controldf, spsession, refdf):
    try:
        # Iterate the control df to get each file name and path
        for row in controldf.collect():
            filename = row['filename']
            filepath = row['filepath']

            # Build a dataframe from the file path
            print("Creating dataframe for {} ({})".format(filename, filepath))
            dfs = spsession.read.format('csv') \
                .option('header', True) \
                .option('inferSchema', True) \
                .load(filepath)
            print("Dataframe created for {} ({})".format(filename, filepath))

            # Look up the expected schema for this file in the reference df
            ref_filter = refdf.filter(col('SrcFileName') == filename)
            for ref in ref_filter.collect():
                columnNamesList = [c.strip().lower() for c in ref['SrcColumns'].split(",")]
                refTypesList = [t.strip().lower() for t in ref['SrcColumnType'].split(",")]

                # Inferred type per column, e.g. StringType() -> 'string',
                # IntegerType() -> 'int' (assumes the CSV headers are already lower-case)
                dfsTypesList = [dfs.schema[c].dataType.simpleString().strip().lower()
                                for c in columnNamesList]

                # Compare expected vs. inferred type position by position
                missmatchedcolumns = [(col_name, df_type, ref_type)
                                      for (col_name, df_type, ref_type)
                                      in zip(columnNamesList, dfsTypesList, refTypesList)
                                      if df_type != ref_type]
                if missmatchedcolumns:
                    print("Schema comparison failed (mismatch) for {}".format(filename))
                    for col_name, df_type, ref_type in missmatchedcolumns:
                        print(f"columnName: {col_name}, dataFrameType: {df_type}, referenceType: {ref_type}")
                else:
                    print("Schema comparison is done and successful for {}".format(filename))
    except Exception as e:
        print("An error occurred:", str(e))
        return False
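The heart of that function is the position-wise zip of expected vs. inferred types; note the check must be element-level (per column), not a comparison of the whole lists. The same logic in plain Python, with made-up column names and types for illustration:

```python
def find_mismatches(column_names, df_types, ref_types):
    """Return (column, inferred, expected) triples where the inferred
    type differs from the reference type at the same position."""
    return [(name, df_t, ref_t)
            for name, df_t, ref_t in zip(column_names, df_types, ref_types)
            if df_t != ref_t]

# Hypothetical schema data: 'row_id' was inferred as string, reference says int.
columns  = ["row_id", "order_date", "sales"]
inferred = ["string", "date", "double"]
expected = ["int", "date", "double"]

print(find_mismatches(columns, inferred, expected))
# → [('row_id', 'string', 'int')]
```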
@maheswariramadasu1301
@maheswariramadasu1301 1 month ago
Highly underrated channel; we need more videos.
@ArabindaMohapatra
@ArabindaMohapatra 1 month ago
I just started watching this playlist. I'm hoping to learn how to deal with schema-related issues in real time. Thanks.
@DataSpark45
@DataSpark45 1 month ago
Thanks a million bro
@gregt7725
@gregt7725 2 months ago
That is great, but how do you handle deletions from the source? I don't understand why, after successful changes/inserts, deleting a source row (e.g. row number 2) creates duplicated rows of previously changed records. (last_updated_date does it, but why?)
@DataSpark45
@DataSpark45 2 months ago
Hi, can you please share the details or a screenshot of where you have the doubt?
@erwinfrerick3891
@erwinfrerick3891 2 months ago
Great explanation, very clear; this video was very helpful for me.
@DataSpark45
@DataSpark45 2 months ago
Glad to hear that!
@ChetanSharma-oy4ge
@ChetanSharma-oy4ge 2 months ago
How can I find this code? Is there a repo where you have uploaded it?
@DataSpark45
@DataSpark45 2 months ago
Sorry to say this, bro; unfortunately we lost those files.
@waseemMohammad-qx7ix
@waseemMohammad-qx7ix 2 months ago
Thank you for making this project; it has helped me a lot.
@maheswariramadasu1301
@maheswariramadasu1301 2 months ago
This video really helped me because tomorrow I have to explain these topics, and I was searching YouTube for the best explanation. This video helped me learn it from scratch.
@mohitupadhayay1439
@mohitupadhayay1439 2 months ago
Amazing content. Keep a playlist of real-time industry scenarios.
@mohitupadhayay1439
@mohitupadhayay1439 2 months ago
Very underrated channel!
@DataSpark45
@DataSpark45 2 months ago
Thanks a million bro
@ajaykiranchundi9979
@ajaykiranchundi9979 2 months ago
Very helpful! Thank you
@shahnawazahmed7474
@shahnawazahmed7474 2 months ago
I'm looking for ADF training; will you provide that? How can I contact you? Thanks.
@DataSpark45
@DataSpark45 2 months ago
Hi, you can contact me through LinkedIn: Lokeswar Reddy Valluru.
@MuzicForSoul
@MuzicForSoul 2 months ago
Sir, can you please also show us a failing run? You are only showing the passing case. When I tested by swapping the columns in the dataframe it still did not fail, because the set still has them in the same order.
@DataSpark45
@DataSpark45 2 months ago
The set values come from the reference df, so they are always constant.
@pranaykumar581
@pranaykumar581 2 months ago
Can you provide the source data file?
@DataSpark45
@DataSpark45 2 months ago
Hi, I provided the link in the description, bro.
@MuzicForSoul
@MuzicForSoul 2 months ago
Why do we have to do ColumnPositionComparision? Shouldn't the column-name comparison you did earlier catch this?
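For context, this is the distinction the two checks cover, sketched in plain Python with made-up column lists: a set-based name comparison ignores order, so only a positional comparison catches swapped columns.

```python
ref_cols = ["id", "name", "city"]
df_cols  = ["name", "id", "city"]   # same columns, first two swapped

names_match     = set(df_cols) == set(ref_cols)  # True: same names are present
positions_match = df_cols == ref_cols            # False: the order differs

print(names_match, positions_match)
# → True False
```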
@irfanzain8086
@irfanzain8086 3 months ago
Bro, thanks a lot! Great explanation 👍 Can you share part 2?
@vamshimerugu6184
@vamshimerugu6184 3 months ago
Sir, can you make a video on how to connect ADLS to Databricks using a service principal?
@DataSpark45
@DataSpark45 3 months ago
Thanks for asking; will do that one for sure.
@rohilarohi
@rohilarohi 3 months ago
This video helped me a lot. I hope we can expect more real-time scenarios like this.
@SuprajaGSLV
@SuprajaGSLV 3 months ago
This really helped me understand the topic better. Great content!
@DataSpark45
@DataSpark45 3 months ago
Glad to hear it!
@SuprajaGSLV
@SuprajaGSLV 3 months ago
Could you please upload a video on the differences between Data Lake vs Data Warehouse vs Delta Tables?
@DataSpark45
@DataSpark45 3 months ago
Thanks a million. Will do for sure.
@Lucky-eo8cl
@Lucky-eo8cl 3 months ago
Good explanation bro 👏🏻. It's really helpful.
@DataSpark45
@DataSpark45 3 months ago
Glad to hear that
@vamshimerugu6184
@vamshimerugu6184 3 months ago
I think schema comparison is an important topic in PySpark. Great explanation sir ❤
@DataSpark45
@DataSpark45 3 months ago
Thank you bro
@vamshimerugu6184
@vamshimerugu6184 3 months ago
Great explanation ❤. Keep uploading more content on PySpark.
@DataSpark45
@DataSpark45 3 months ago
Thank you, I will
@saibhargavreddy5992
@saibhargavreddy5992 3 months ago
I found this very useful, as I had a similar issue with data validations. It helped a lot while completing my project.
@DataSpark45
@DataSpark45 3 months ago
Glad it helped!
@maheswariramadasu1301
@maheswariramadasu1301 3 months ago
This video helped me understand the multiple event triggers in ADF.
@DataSpark45
@DataSpark45 3 months ago
Glad to hear that
@maheswariramadasu1301
@maheswariramadasu1301 3 months ago
It helped me a lot in learning PySpark easily.
@0adarsh101
@0adarsh101 3 months ago
Can I use the Databricks Community Edition?
@DataSpark45
@DataSpark45 3 months ago
Hi, you can use Databricks; you then have to play around with the dbutils.fs methods to get the file list / file paths, as we did in the get_env.py file. Thank you.
@VaanisToonWorld-rp5xy
@VaanisToonWorld-rp5xy 3 months ago
Please share the files for the FBS project.
@DataSpark45
@DataSpark45 3 months ago
Unfortunately, we lost those files and the account.
@SaadAhmed-js5ew
@SaadAhmed-js5ew 4 months ago
Where is your parquet file located?
@DataSpark45
@DataSpark45 3 months ago
Hi, are you talking about the source parquet file? It's under the source folder.
@OmkarGurme
@OmkarGurme 4 months ago
While working with Databricks we don't need to start a Spark session, right?
@DataSpark45
@DataSpark45 4 months ago
No need, brother; we can continue without defining a Spark session. I just kept it in for practice.
@listentoyourheart45
@listentoyourheart45 4 months ago
Nice explanation sir
@kaushikvarma2571
@kaushikvarma2571 4 months ago
Is this a continuation of part 2? In part 2 we never discussed test.py and udfs.py.
@DataSpark45
@DataSpark45 4 months ago
Yes. test.py here is used just to run the functions; for udfs.py, please watch from 15:00 onwards.
@kaushikvarma2571
@kaushikvarma2571 4 months ago
To solve the header error, replace the csv branch with:

elif file_format == 'csv':
    df = spark.read.format(file_format).option("header", True).option("inferSchema", True).load(file_dir)
@memesmacha61
@memesmacha61 1 month ago
Thank you bro
@charangowdamn8661
@charangowdamn8661 4 months ago
Hi sir, how can I reach you?
@charangowdamn8661
@charangowdamn8661 4 months ago
Hi sir, how can I reach you? Can you please share your mail ID, or tell me how I can connect with you?
@DataSpark45
@DataSpark45 4 months ago
You can reach me on LinkedIn: Valluru Lokeswar Reddy.
@aiviet5497
@aiviet5497 5 months ago
I can't download the dataset 😭.
@DataSpark45
@DataSpark45 5 months ago
Take a look at this : drive.google.com/drive/folders/1XMthOh9IVAScA8Lk-wfbBnKCEtmZ6UKF?usp=sharing
@sauravkumar9454
@sauravkumar9454 5 months ago
Sir, you are the best; I love how you have taught and mentioned even the smallest of things. Looking forward to more videos like this.
@DataSpark45
@DataSpark45 5 months ago
Sure I will
@World_Exploror
@World_Exploror 5 months ago
How did you define reference_df and control_df?
@DataSpark45
@DataSpark45 5 months ago
We would define them as tables in a database. For now I used them as CSVs.
@mrunalshahare4841
@mrunalshahare4841 5 months ago
Can you share part 2?
@vishavsi
@vishavsi 5 months ago
I am getting an error with logging:
    File "Python\Python39\lib\configparser.py", line 1254, in __getitem__
        raise KeyError(key)
    KeyError: 'keys'
Can you share the code written in the video?
@DataSpark45
@DataSpark45 5 months ago
Sure, here is the link: drive.google.com/drive/folders/1QD8635pBSzDtxI-ykTx8yquop2i4Xghn?usp=sharing
@vishavsi
@vishavsi 5 months ago
Thanks @@DataSpark45
@subhankarmodumudi9033
@subhankarmodumudi9033 5 months ago
Did your problem get resolved? @@vishavsi
@jitrana6813
@jitrana6813 5 months ago
How can we use spark.sql instead of PySpark dataframe select commands? Can you advise how we can do that?
@DataSpark45
@DataSpark45 5 months ago
Hi, when you write a df to Hive you generally use df.write.saveAsTable(), so the table gets created in the Hive environment; then you can use spark.sql("select * from table"). If you don't want to use Hive, you can use df.createOrReplaceTempView("TableName") instead (registerTempTable is the older, deprecated name for the same thing).
@ritesh_ojha
@ritesh_ojha 5 ай бұрын
<Error> <Code>AuthenticationFailed</Code> <Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:ea8e17b4-701e-004d-1db1-573f6a000000 Time:2024-02-04T21:31:20.0816196Z</Message> <AuthenticationErrorDetail>Signature not valid in the specified time frame: Start [Tue, 22 Nov 2022 07:36:34 GMT] - Expiry [Wed, 22 Nov 2023 15:36:34 GMT] - Current [Sun, 04 Feb 2024 21:31:20 GMT]</AuthenticationErrorDetail> </Error>
@DataSpark45
@DataSpark45 5 months ago
Where did you get this error, bro?
@ritesh_ojha
@ritesh_ojha 5 months ago
@@DataSpark45 While downloading the data. But I got the data from part 2.
@user-fz1rj6gz2g
@user-fz1rj6gz2g 5 months ago
Thank you for the amazing project, sir. Can you please provide the GitHub link for this project, or the project files?
@user-fz1rj6gz2g
@user-fz1rj6gz2g 6 months ago
Thanks for the amazing content; please upload more videos like this.
@DataSpark45
@DataSpark45 5 months ago
Thank you, I will
@ranjithrampally7982
@ranjithrampally7982 6 months ago
Do you provide training?
@DataSpark45
@DataSpark45 6 months ago
As of now I'm not providing training, bro, but you can reach out to me at any time for any sort of doubts. Thank you.
@vinothkannaramsingh8224
@vinothkannaramsingh8224 6 months ago
Would it be sufficient to sort both the ref/df column names alphabetically and then compare them?
@DataSpark45
@DataSpark45 6 months ago
Whatever order is specified in reference_df is the correct order we expect. If we sort the df's column names alphabetically, there is a chance of failure, since positional mismatches could slip through. Thank you.
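A tiny plain-Python illustration of that reply (column names made up): sorting both sides alphabetically makes the comparison pass even when the actual column order disagrees with the reference order, so positional errors would go undetected.

```python
ref_order = ["id", "name", "city"]   # order mandated by reference_df
df_order  = ["city", "id", "name"]   # same columns, wrong positions

print(sorted(df_order) == sorted(ref_order))  # → True  (sorting hides the problem)
print(df_order == ref_order)                  # → False (positional check catches it)
```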