Compare Two excel or csv files using python Pandas library.

  Рет қаралды 31,239

Gulshan Kumar

Gulshan Kumar

Күн бұрын

Пікірлер: 65
@srikishoreka2376
@srikishoreka2376 2 жыл бұрын
Hi gulshan this content is very much helpful
@MelvinBagby
@MelvinBagby 10 ай бұрын
Very well explained, and not too long. Thank you!
@khushitayal1470
@khushitayal1470 2 жыл бұрын
Awesome content and helpful too😁👍. Please keeping posting such good content. Waiting for your posts.😁👍
@Sece1
@Sece1 3 жыл бұрын
this was a great content man, you should continue to do so. Thank you
@gulshankumar17
@gulshankumar17 3 жыл бұрын
thank you
@ajay13nt
@ajay13nt 2 жыл бұрын
Great tutorial.. thank you so much brother
@vijyalaxmi8925
@vijyalaxmi8925 3 жыл бұрын
Hi Gulshan, the content was really good , keep up doing the good work, just if u can explain each line of code with all the operators functionalities, then it would better for newbies like me . Thank you for your efforts 👍😌
@joxxen
@joxxen 2 жыл бұрын
thank you! finally what I'm looking for :):)
@noerhakeem
@noerhakeem Жыл бұрын
where do we put the file?
@kamchor6747
@kamchor6747 2 жыл бұрын
Thank you so much
@kids-poem687
@kids-poem687 Жыл бұрын
Hello Gulshan just i have to check if two excel sheets are matching or not.can you help me out
@angelreyes3414
@angelreyes3414 3 жыл бұрын
You are awesome! Thank you!
@gulshankumar17
@gulshankumar17 3 жыл бұрын
Thank you so much.
@boxersony4225
@boxersony4225 3 жыл бұрын
The columns are in not same the order ,it will works ?
@uppalaadarsh3261
@uppalaadarsh3261 3 жыл бұрын
Hi , in spark scala , I want to compare CSV file using test cases ,but I don't know how to compare those , is it based on number of columns
@gulshankumar17
@gulshankumar17 3 жыл бұрын
Hi there, actually I haven't worked on spark scala, but few weeks ago I started working on PySpark, so I have added code for the Pyspark version in the GitHub link github.com/gulshan17/Compare-files-using-Pandas . The script name is compare_files_pyspark.py. you can take reference from this script and implement it in spark. Basically what it's doing is performing a anti-join(link for reference spark.apache.org/docs/latest/sql-ref-syntax-qry-select-join.html). You can use the same and compare those csv files. Let me know incase of any concern, I'll help you out.
@uppalaadarsh3261
@uppalaadarsh3261 3 жыл бұрын
@@gulshankumar17 thank you
@Mathias-cq3zo
@Mathias-cq3zo 3 жыл бұрын
I'm trying the same code but for some reason when I execute the first "df_join" it is not creating a new .xlsx file with both files joined. Any guess??
@gulshankumar17
@gulshankumar17 3 жыл бұрын
this is because I haven't written those DataFrames to excel in my code, they are just present as pandas DataFrame in memory. To save the files as xlsx files, just use this df.to_excel('filename.xlsx', index = False) df.to_csv('filename.csv', index = False) if you want to write it in csv.
@srikishoreka2376
@srikishoreka2376 2 жыл бұрын
And i have a doubt if a sheet contain a file path with name and sheet contain only name then how we should remove that file path and check both the sheet ?
@shrinivasulunandyala9269
@shrinivasulunandyala9269 3 жыл бұрын
How can find the matched records from both files, with specified multiple columns
@sampathkumar7234
@sampathkumar7234 3 жыл бұрын
I need to simply join the two extracted DB values and get the result in a report with all the values and the mistmatched values should be highlighted. Can u make a video for it
@ujjwalwadera6858
@ujjwalwadera6858 2 жыл бұрын
I want to append data from different CSV(total 12 ) files into one file .. the coloums are same in all the files .. But the data is different .. how do I do that? It's just a 1 year data and I want to add in the same sheet
@gulshankumar17
@gulshankumar17 2 жыл бұрын
save all the files path in a list, let's say the list name is 'file_paths' then you can write the following code df = pd.DataFrame() for path in file_paths: df = df.append(pd.read_csv(path))
@thatgirl6926
@thatgirl6926 2 жыл бұрын
hey thanks alot, I have no errors; yet i cant find the df_join file where i can find it? shall i create it? to see the merged files on it ...
@gulshankumar17
@gulshankumar17 2 жыл бұрын
use this code # %% import pandas as pd # %% df1 = pd.read_excel('file1.xlsx') df2 = pd.read_excel('file2.xlsx') # %% df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer') # %% df1.rename(columns = lambda x : x + '_file1', inplace = True) df2.rename(columns = lambda x : x + '_file2', inplace = True) # %% df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer') # %% records_present_in_df1_not_in_df2 = df_join.loc[df_join[df2.columns.to_list()].isnull().all(axis = 1), df1.columns.to_list()] # %% records_present_in_df2_not_in_df1 = df_join.loc[df_join[df1.columns.to_list()].isnull().all(axis = 1), df2.columns.to_list()] # %% to save the variable in an excel file, use the following code records_present_in_df1_not_in_df2.to_excel('records_present_in_df1_not_in_df2.xlsx', index = False) records_present_in_df2_not_in_df1.to_excel('records_present_in_df2_not_in_df1.xlsx', index = False) df_join.to_excel('df_join.xlsx', index = False)
@aditishekhar22
@aditishekhar22 2 жыл бұрын
Hi, im getting AttributeError: 'numpy.ndarray' object has no attribute 'merge' when i run the code you have shared. Im using python 3.10. Please help how to run the same in python 3.10
@gulshankumar17
@gulshankumar17 2 жыл бұрын
from the error I can see you have read your file into a numpy array instead of pandas DataFrame. convert your array into a data frame using pd.DataFrame or read using excel or civ directly using pd.read_excel or pd.read_csv function. You can download the code from the GitHub link present in the video description.
@hiraleo31
@hiraleo31 4 жыл бұрын
Hi. I have two excel lists and i want the records in separate list which exist in excel1 but not in excel 2, both have same header. I hope this python script works well for my task.
@gulshankumar17
@gulshankumar17 4 жыл бұрын
this will work
@manuprakashmp5958
@manuprakashmp5958 Жыл бұрын
Hi thanks for this video, i have one scenario like two different XL files and having different values and some are same so i want both compared result in one new result file and both files having different rows and columns can you please make a video
@piotrjanuszek2053
@piotrjanuszek2053 Жыл бұрын
Hello, this is awesome. Works like a charm 😮 Is there any way so we can get the same elements from two files? Not the difference but elements/values that are the same. I tried modifying on my own but I can’t make it 😅 Again, great video!
@gulshankumar17
@gulshankumar17 Жыл бұрын
just add the following line of code to existing script. records_present_in_df1_and_df2 = df_join.loc[(df_join[df1.columns.to_list()].notnull().any(axis = 1)) && (df_join[df2.columns.to_list()].notnull().any(axis = 1)), :]
@bucket_farmer
@bucket_farmer Жыл бұрын
This code says Invalid Syntax can you please help
@gulshankumar17
@gulshankumar17 Жыл бұрын
@@bucket_farmer please contact me on linkedin. heres my linkedin profile- www.linkedin.com/in/gulshankumar17/
@akshaybhat9040
@akshaybhat9040 2 жыл бұрын
what if both dataframes columns are not matching but still have to match values in each rows please help
@gulshankumar17
@gulshankumar17 2 жыл бұрын
you mean the columns are in different order, in that case you can pass the columns name in the merge function as df1.merge(df2, left_on = [col1_file1, col2_file], right_on = [col2_file2, col1_file2], how = 'outer') in the above case col1_file1 will be compared with col2_file2 and col2_file1 will be compared with col1_file2.
@akshaybhat9040
@akshaybhat9040 2 жыл бұрын
@@gulshankumar17 not about order columns names from both datarames does not match at all but only information in the column from two dataframes matches in some of that I need to find which are matching
@gulshankumar17
@gulshankumar17 2 жыл бұрын
@@akshaybhat9040 even if the column names are different, just pass the column names in left_on and right_on parameters
@akshaybhat9040
@akshaybhat9040 2 жыл бұрын
@@gulshankumar17 hello gulshan, I am still facing issues can you please share your mail id so that I can connect with you and get my solution?
@gulshankumar17
@gulshankumar17 2 жыл бұрын
@@akshaybhat9040 www.linkedin.com/in/gulshankumar17/ let's connect on Linkedin
@SheloMystics
@SheloMystics 2 жыл бұрын
How to add the unmatched rows to the matched value column without merging the entire dataset.
@joshuajadedelacruz6267
@joshuajadedelacruz6267 2 жыл бұрын
Hi im getting error on imorting the data, it saysNo such file or directory.
@gulshankumar17
@gulshankumar17 2 жыл бұрын
Hi, make sure the excel files are present in the same folder where you have your python script. Also the name of file should be correct, in my case the file names are file1.xlsx and file2.xlsx.
@joshuajadedelacruz6267
@joshuajadedelacruz6267 2 жыл бұрын
@@gulshankumar17 I got it already, how about the df_join, my merge file should be 2 columns hut it only reflect 1 column
@gulshankumar17
@gulshankumar17 2 жыл бұрын
@@joshuajadedelacruz6267 make sure you are renaming the columns like I have renamed by adding '_file1' and '_file2'
@poojakunder
@poojakunder 2 жыл бұрын
Hi Gulshan , Thanks a lot for sharing this 🙂!! I have a quick question we can use this script to compare excel with 3000 rows as well right ?
@gulshankumar17
@gulshankumar17 2 жыл бұрын
yeah
@poojakunder
@poojakunder 2 жыл бұрын
@@gulshankumar17 Thanks a lot 😊
@MJAnshu
@MJAnshu Жыл бұрын
Hi , Can I execute this script in Databricks..?
@gulshankumar17
@gulshankumar17 Жыл бұрын
yes, databricks has support for python and pandas. You can even use the pyspark version of the script which is present on my Github, link mentioned in the description.
@tusharsharma744
@tusharsharma744 Жыл бұрын
Sir compare values deprecation Warning showing
@vibhutisalvi
@vibhutisalvi 3 жыл бұрын
hey....I need to compare 2 files having 10 columns each (join on 3 columns forming the unique key) and generate the output, with one additional column "comment" 1. comment = "Additional in File 1" if records does not exists in file 2 2. comment = "Additional in File 2" id records does not exists in file 1 3. comment = "data mismatch" ...if there is a match on 3 columns, but any of the other 7 columns data is a mismatch. 4. highlight the mismatch cell in the output excel, and excel should have both the values.
@chandanchan2770
@chandanchan2770 2 жыл бұрын
hi Gulshan can we connect in mail have some doubts need to clarify
@zackszigeti3713
@zackszigeti3713 Жыл бұрын
Jeez I cannot thank you enough. I am going to see if I can figure out my problem on my own, but if you can assist I would be so grateful! I have this as the dataset for two different CSV when I do your DF_Join command. Thank you so much - does exactly as intended and matches data from one day on same line as data from next day. any indescrepencies show that data did not appear in one of two days. Symbol, Type, Strike Price, Exp Date /// Symbol, Type, Strike Price, Exp Date MS, Call, 85, 4/21/2023 /// MS, Call, 85, 4/21/2023 HOWEVER, Mr. Kumar, I had to edit both CSV to remove a certain column. This column is Volume. How can I make it so itll show the volume but not try and match that specific colum? If I include volume (because this is the only column that always differs no matter the row (1,2,3,4) or column (A,B,C,D)), then it just separates everything and is not helpful. Again, I am forever grateful for this video and I will try to figure it out on my own. I believe i have to adjust this lambda x:x but i am not sure.
@gulshankumar17
@gulshankumar17 Жыл бұрын
Hey Zack, I have updated the code to exclude the columns while matching the data. heres the updated code github.com/gulshan17/Compare-files-using-Pandas/blob/master/compare_files.py#enroll-beta in line 19 and 20, just mention the column names you want to exclude while matching/joining the data. like if you want to exclude the Volume column, just add the Volume column in the square braces as below: df1_columns_to_exclude_join = ['Volume'] df2_columns_to_exclude_join = ['Volume'] The code in the script looks like this. # columns to exclude while joining, mention the columns to exclude while matching the data in comma separated like ['column1', 'column2'] df1_columns_to_exclude_join = [] df2_columns_to_exclude_join = [] here's my linkedin ID - www.linkedin.com/in/gulshankumar17/ if you need further help.
@zackszigeti3713
@zackszigeti3713 Жыл бұрын
@@gulshankumar17 Sir you are the best. I will definitely reach out if I need additional help. Wishing you nothing but happiness! much love from new york
@BreatheOutAnxiety
@BreatheOutAnxiety 3 жыл бұрын
Hi, great content!! Can we compare two csv files with different column names but common values?
@gulshankumar17
@gulshankumar17 3 жыл бұрын
yes, as long as the columns are in the same order, this will work. Let me know if you find any issue, I'll help out.
@BreatheOutAnxiety
@BreatheOutAnxiety 3 жыл бұрын
@@gulshankumar17thank you for the response :) Does this imply that the numbers of columns in both tables should be equal too?
@gulshankumar17
@gulshankumar17 3 жыл бұрын
@@BreatheOutAnxiety yes, if you want to compare two files then number of columns should be same, isn't it? if you have some extra columns in your file which you don't want to compare, you can drop those columns and then compare.
@shaktisingh-fz7cs
@shaktisingh-fz7cs 2 жыл бұрын
explain this in pycharm or some IDE,, where it is easy to understand.. as a beginer i understood nothing
@innigaadu
@innigaadu Жыл бұрын
No Proper explanation of what the commands mean, Or what is the logic that you are trying to implement. Just a bunch of commands . Not very helpful
@kushalgedam1400
@kushalgedam1400 3 жыл бұрын
hi Gulshan, above video really help to compare file thank you for video.. i am try to load below code it is running but dataframe showing as NULL value.. code===values_present = Gics_join.loc[Gics_join[Gicspresent.columns.to_list()].isnull().all(axis = 1),Gicspast.columns.to_list()]
@gulshankumar17
@gulshankumar17 3 жыл бұрын
Hi Kushal, your code seems correct, may be there is no difference between the two files. Try checking the variable names and file names you are reading as DataFrame.
Compare excel files using Python with Pandas
11:23
Stalin Jayapal
Рет қаралды 51 М.
How To Compare Excel Files and Find Differences
10:47
Kenji Explains
Рет қаралды 110 М.
GIANT Gummy Worm #shorts
0:42
Mr DegrEE
Рет қаралды 152 МЛН
Automate Excel Work with Python and Pandas
21:29
John Watson Rooney
Рет қаралды 101 М.
Comparing Value Difference Between 2 CSV Files using pandas
6:13
Column-wise comparison of two Dataframes | PySpark | Realtime Scenario
12:44
Introducing Python in Excel
19:01
Leila Gharani
Рет қаралды 1,7 МЛН
Python Excel - Reading Excel files with Pandas read_excel
19:33
Very Academy
Рет қаралды 84 М.
GIANT Gummy Worm #shorts
0:42
Mr DegrEE
Рет қаралды 152 МЛН