How do I merge DataFrames in pandas?

158,714 views

Data School

Comments: 295
@NoName-qx9zc 4 years ago
I'd like to thank the author. You really do a great job. Everything is structured, decomposed and coherent. Some guys just jump in complex coding without really explaining what's going on there.
@gardnmi 4 years ago
The best new feature with merge is the validate option to make sure your join is 1:1, 1:M, etc. This is very useful for machine learning projects or end user reports that rely on upstream data that is updated regularly. It's saved me headaches a few times.
@dataschool 4 years ago
The "validate" option is great, I agree! I also like "indicator", which I explained here: twitter.com/justmarkham/status/1153653794829418496
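For readers who have not used these two options, here is a minimal sketch of both. The toy `movies` and `ratings` frames are made up for illustration and are not the files from the video; `validate` and `indicator` are existing `pd.merge` parameters.

```python
import pandas as pd

# Hypothetical toy data, not the MovieLens files used in the video.
movies = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["A", "B", "C"]})
ratings = pd.DataFrame({"movie_id": [1, 1, 2, 4], "rating": [5, 3, 4, 2]})

# validate="one_to_many" raises pandas.errors.MergeError if movie_id
# is duplicated in the left DataFrame (movies).
# indicator=True adds a "_merge" column showing where each row came from.
merged = pd.merge(movies, ratings, how="outer",
                  validate="one_to_many", indicator=True)
counts = merged["_merge"].value_counts()

# A check that fails: ratings has repeated movie_id values,
# so it cannot be the left side of a one-to-one merge.
validation_error = None
try:
    pd.merge(ratings, movies, validate="one_to_one")
except pd.errors.MergeError as err:
    validation_error = str(err)
```

The `_merge` column labels each row as `left_only`, `right_only`, or `both`, which makes unmatched keys easy to spot after an outer join.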
@milindbebarta2226 2 years ago
You are really good at explaining things. One of the better teachers on YouTube. Thanks a ton for this video and I hope there's more coming.
@dataschool 2 years ago
Thank you!
@jonass7456 3 years ago
Dude! Let me tell you, you saved me a lot of time and work! Thank you so much!
@dataschool 3 years ago
Great to hear!
@themustknowfacts510 3 years ago
I'm not able to read the file "u.item". I copied the same code from GitHub, but pandas wasn't able to read it. It showed me a Unicode error. How do I solve that issue?
@ChrisMao_708 3 years ago
Pass encoding='latin-1' when reading the file and you will be fine.
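A minimal sketch of that fix, assuming a pipe-delimited file with Latin-1 encoded titles. The sample file written here is a hypothetical stand-in for "u.item", and the column names are chosen for illustration only.

```python
import os
import tempfile

import pandas as pd

# Write a tiny pipe-delimited sample in Latin-1 (hypothetical stand-in for u.item).
path = os.path.join(tempfile.mkdtemp(), "sample.item")
with open(path, "w", encoding="latin-1") as f:
    f.write("1|Toy Story (1995)\n2|Les Misérables (1995)\n")

cols = ["movie_id", "title"]

# With the default UTF-8 decoding, the accented character is expected to fail.
try:
    pd.read_csv(path, sep="|", header=None, names=cols)
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Telling pandas the real encoding lets the file load.
movies = pd.read_csv(path, sep="|", header=None, names=cols, encoding="latin-1")
```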
@citizen_deb 4 years ago
Thank you so much Kevin, your neat explanation along with the file you share makes it so clear, was really needing it!
@NiireNolweva 3 years ago
Very clear and informative. Thank you very much.
@dataschool 3 years ago
You're very welcome!
@jaysoni7812 4 years ago
Where is the link to the dataset used in this video? I want to practice with your dataset. Can you please send me the link?
@BC-gc7bv 4 years ago
You are an excellent teacher!!! I'm a fan. TY.
@tommonks2490 4 years ago
Excellently explained as always. Keep up the great work!!
@dataschool 4 years ago
Thank you!
@SamSam-mh5jt 3 years ago
Thank you so much for the clear and concise explanation
@dataschool 3 years ago
You're welcome!
@joseluisbeltramone599 3 years ago
Thank you very much for the precise explanation, just what I needed to know!
@dataschool 3 years ago
You're very welcome! 🙏
@nowyouknow2249 4 years ago
Thanks a lot Kevin. We have missed you.
@dataschool 4 years ago
Thank you! 😊
@SR-lf3ic 2 years ago
Hi, when I used pd.concat([df1,df2]), I got a tuple object instead of a DataFrame object. I am using a Python 3.9 environment. What should I do to get a DataFrame object rather than a tuple object?
@zapy422 4 years ago
Thank you for this video. I have been struggling with merge and concat today :)
@dataschool 4 years ago
You're very welcome! Glad it's helpful to you!
@fschmidkonz 3 years ago
You're a great teacher! I see that despite having a large 100K row file, the number of rows does not get expanded after the merge. They beautifully stay the same and just add the movie titles to the reviews. Can you comment on why this is not always the case? I have tried, and my output file gets expanded by a few rows (17 out of 1000), and I have not been able to figure out why. I have checked multiple videos; some offer absurd, impractical solutions (like requiring the files to be the same size) or arbitrarily eliminate any dups (even though some may be valid rows), but none explain the reason or how to identify the rows that could be dups. Your comments are appreciated.
@AsMa-eg 3 years ago
Thank you so much. Very clear and to the point.
@dataschool 3 years ago
You're welcome!
@Isabel-ec2sq 4 years ago
Thank you!! I finally got the dataframe I wanted!
@zezodiaa1025 2 years ago
Great video. My question is: when I'm working on a project, when exactly do I have to combine?
@Moc2Talk 3 years ago
Your slow pace of speaking is very helpful to me. I have 2 questions. First: what if I want to merge only one column (rating) from the ratings DataFrame into the movies DataFrame? Second: what if I want to sum the ratings for each Movie_Id? Thank you so much, and I look forward to your answer.
@ayodejiakinfenwa 1 year ago
Please, I am trying to merge two datasets as you explained, but it gives an error saying that I should check for duplicates.
@christleiroezi8878 4 years ago
I have a DataFrame, a list, and a tuple, and I want to merge all three together. I am aware that merge can only handle two tables at a time, but do you have any helpful hints on how to go about merging the table, list, and df? I want to make the result a new DataFrame.
@mohammadj.shamim9342 4 years ago
Dear Kevin, I have some difficulty fine-tuning PLSRegression (sklearn.cross_decomposition.PLSRegression). Could you please cover this one day?
@dataschool 4 years ago
Thanks for your suggestion!
@michael3226 2 years ago
The resulting dataset I got has a value of null. What do I do?
@saikiranhr 2 years ago
Thanks for the amazing video. One simple question. How to join tables on multiple indices (like 4 or 5)?
@tirtha9 3 years ago
Let's say a pandas df and a MySQL table have columns A, B, C and the same schema, and column A in SQL is the primary key. How do I upsert a pandas df into the MySQL table? When the primary key conflicts, update the remaining columns; when it doesn't conflict or exist, do an INSERT INTO. What's the most efficient way to do this?
@AnoNymous-dh2sv 2 years ago
What's the concat video? You say there is one, but I can't find it with search.
@dataschool 2 years ago
It's at the end of this video: kzbin.info/www/bejne/Z2bUXpypbbWSfpY Hope that helps!
@bommubhavana8794 3 years ago
I am a beginner in Python, and I am not sure which join is best to use in different scenarios. Can you help me through it? I genuinely learnt a lot from your videos. I would really appreciate your help. Thank you in advance.
@pradeepkapoor355 3 years ago
Thanks for putting up some amazing content on pandas data manipulation and analysis. Can you please make a video on how to get the unmatched rows after performing a join/merge? In real-world scenarios, there are often unmatched records from 2 DataFrames which need to be checked for a match in a 3rd DataFrame. So please help in explaining this piece as well.
@dataschool 3 years ago
Thanks so much for your suggestion! I'll consider it for the future.
@pradeepkapoor355 3 years ago
@@dataschool Looking forward to it.
@lualmeidasouza 4 years ago
How do I merge df1 and df2 on two columns (fields) in the on clause? For example: dfUltStatus = pd.merge(dfUltStatus, dfDescStatus, on=['CODIGO_STATUS','SUB_CODIGO_STATUS'], how = 'left') The goal is to merge the two DataFrames on these two fields to bring in the description field.
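Passing a list to `on`, as in the call quoted in the question, is the usual way to merge on two columns. A minimal sketch with made-up data: the `PEDIDO` and `DESCRICAO` columns and all values are hypothetical, and only the two key column names come from the question.

```python
import pandas as pd

# Hypothetical status rows; only the two key column names come from the question.
dfUltStatus = pd.DataFrame({
    "CODIGO_STATUS": [1, 1, 2],
    "SUB_CODIGO_STATUS": [10, 20, 10],
    "PEDIDO": ["p1", "p2", "p3"],
})
dfDescStatus = pd.DataFrame({
    "CODIGO_STATUS": [1, 1],
    "SUB_CODIGO_STATUS": [10, 20],
    "DESCRICAO": ["open", "closed"],
})

# Rows match only when BOTH key columns are equal; how="left" keeps
# every row of dfUltStatus, filling DESCRICAO with NaN when no pair matches.
merged = pd.merge(dfUltStatus, dfDescStatus,
                  on=["CODIGO_STATUS", "SUB_CODIGO_STATUS"], how="left")
```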
@job2k6 3 years ago
Very helpful, thank you.
@dataschool 3 years ago
You're welcome!
@WaqasAhmed-om8ph 4 years ago
Sir, I hope that you and your family are well and healthy. I have two questions, if you have time to answer. (1) pandas has a lot of functions, and each function has many parameters. How do I remember all the functions and their parameters? (2) Although I practice, every new exercise has a lot of bugs waiting for me. Thank you!
@dataschool 4 years ago
Practice! That's all there is to it.
@WaqasAhmed-om8ph 4 years ago
@@dataschool Thanks!
@ДмитрийИгнатьев-з5т 4 years ago
Hello, many thanks for your tutorial. It's great!!! But I'm stuck: is there any technique to join two DataFrames if one of them is stacked and the other is not?
@joshuabarragan8414 1 year ago
I need help
@codewithluq 3 years ago
Hi Kevin, I have a troublesome question. I am analyzing a dataset which is entirely textual. I want to assign a grade to certain text in each column by appending a new Grading column for each existing column. I have achieved it using a for loop, but I can't save the DataFrame it creates because each iteration of the loop overwrites it. I need help. Code of the for loop:
for (ColumnName,ColumnData) in b_questions.iteritems():
    b_questions['Grading'] = b_questions[ColumnName].map({'Consistently Good':4,'Outstanding':5,'Satisfactory':3})
    data = b_questions.loc[:,[ColumnName,'Grading']]
    print(data)
@dataschool 3 years ago
If I'm understanding your question, I think you just need to run this one line of code: b_questions['Grading'] = b_questions['Insert column name here'].map({'Consistently Good':4,'Outstanding':5,'Satisfactory':3}) Hope that helps!
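If the goal in the original question was one graded column per question column, a possible variation is to give each new column its own name so that nothing is overwritten. This is only a sketch: the `Q1` and `Q2` columns and the `_Grading` suffix are hypothetical.

```python
import pandas as pd

# Hypothetical survey answers; the real column names are not given in the thread.
b_questions = pd.DataFrame({
    "Q1": ["Outstanding", "Satisfactory"],
    "Q2": ["Consistently Good", "Outstanding"],
})
grades = {"Consistently Good": 4, "Outstanding": 5, "Satisfactory": 3}

# Loop over a fixed list of the original columns and write each mapping
# to a separately named column instead of reusing a single 'Grading' column.
for col in ["Q1", "Q2"]:
    b_questions[col + "_Grading"] = b_questions[col].map(grades)
```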
@shaheenalhirmizy9648 4 years ago
Hi Kevin, how are you doing? Is there any way, using pandas or another library, to do conditional merging, if I want to choose from two datasets? Thank you very much
@dataschool 4 years ago
Could you describe in more detail what you mean by "conditional merging"? Thanks!
@shaheenalhirmizy9648 4 years ago
I mean: if we have two different tables with the same number of columns and we want to merge them, but not all the data, only the rows we select using conditional formulas.
@dataschool 4 years ago
You should perform the operation in two steps: first do the filter, and then do the merge.
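The two-step approach could look like the sketch below. The data, the column names, and the `score > 50` condition are all made up for illustration.

```python
import pandas as pd

# Hypothetical tables sharing an "id" column.
left = pd.DataFrame({"id": [1, 2, 3, 4], "score": [10, 55, 70, 20]})
right = pd.DataFrame({"id": [1, 2, 3, 4], "label": ["a", "b", "c", "d"]})

# Step 1: filter with a boolean condition.
left_filtered = left[left["score"] > 50]

# Step 2: merge only the rows that survived the filter.
result = pd.merge(left_filtered, right, on="id")
```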
@SM-ie7ge 4 years ago
Thanks for another great video. How do we join on multi-index?
@cvishnuteja597 4 years ago
Hi, can you please post a video on reading a large real-time CSV file with millions of rows using chunks or Modin, and how to merge those chunks after importing them in pandas?
@dataschool 4 years ago
Thanks for your suggestion! FYI, if your computer does not have enough RAM to load a large DataFrame into memory, reading the DataFrame in chunks will not solve that problem. It will be just as large once you merge the chunks back together (which you can do using the "concat" function.)
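As a rough illustration, here is a sketch of reading in chunks and gluing them back together with `concat`. The in-memory CSV and the filter condition are made up; the filter is included because chunking only helps with memory if each chunk is reduced before it is kept.

```python
import io

import pandas as pd

# Hypothetical small CSV standing in for a very large file.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

# chunksize makes read_csv return an iterator of DataFrames (4 rows each here).
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Reduce every chunk (here: keep rows with value >= 10) before storing it.
kept = [chunk[chunk["value"] >= 10] for chunk in chunks]

# Combine the reduced pieces into a single DataFrame.
df = pd.concat(kept, ignore_index=True)
```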
@yassaryelurkar3631 2 years ago
where to add column names?
@wilsonmupfururirwa6523 4 years ago
Hi, I wanted to ask how you check for data consistency in columns, like checking for integers in a string column or trying to find values like 2A in a column with double-letter values, e.g. AA, BB, etc.
@dataschool 4 years ago
Great question, though there's no "one way" to catch all of these issues! Here are some tricks that might be helpful, though: kzbin.info/www/bejne/iJ2smombnsxmnsU
@veddev8493 4 years ago
Can you upload a tutorial on Dask DataFrames, since Dask is needed to work with large datasets, or a tutorial on PySpark?
@dataschool 4 years ago
Thanks so much for your suggestion!
@diegorosa2292 3 years ago
This is the video I was looking for, thank you so much, very well explained. Just one question: when you join the 2 keys with different names, I noticed that, unlike the first example you made (where the ID's name was the same), both of the IDs I joined show up in the result. In my case I have "Subj ID" and "ID", which are the same except for the name. When I use pd.merge(db1, db2, left_on="Subj ID", right_on="ID"), both keys show up in 2 different columns (and this happens in your tutorial as well). Is there a way to remove one? At the moment I have an extra column that shows the same key. Thank you very much!
@Octaphea 2 years ago
Hey have you figured this out?
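Two possible ways to end up with a single key column, sketched with made-up data. The `age` and `score` columns are hypothetical; the key names are the ones from the question above.

```python
import pandas as pd

# Hypothetical frames whose key columns hold the same IDs under different names.
db1 = pd.DataFrame({"Subj ID": [1, 2], "age": [30, 40]})
db2 = pd.DataFrame({"ID": [1, 2], "score": [7, 9]})

# Option 1: merge with left_on/right_on, then drop the redundant key column.
merged = pd.merge(db1, db2, left_on="Subj ID", right_on="ID").drop(columns="ID")

# Option 2: rename the key first so that a plain on= merge keeps only one copy.
renamed = pd.merge(db1, db2.rename(columns={"ID": "Subj ID"}), on="Subj ID")
```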
@ranjithphd516 3 years ago
In Python, how do I turn row data into columns?
@tanmaysinghi1868 2 years ago
Thanks for the content. I'd appreciate it even more if you taught at a quicker pace; playing the video at 1.25x makes it better.
@jochenbrosien9556 4 years ago
Kevin - I like how easy you make it look. But here's my question - after watching, I tried to apply the knowledge. I have a df1 with 3 columns and dtype='object', and a df2 with 7 columns and dtype='object'. When applying pd.merge(df1,df2) and then .shape, I only get the column headers, no rows. What am I doing wrong?
@calluma8472 3 years ago
This means you don't have any matching data between the two dataframes. Sounds like you are looking for pd.concat , which just blindly glues together.
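A small sketch of that situation, using made-up frames whose shared `key` column has no values in common. An outer merge with `indicator=True` is one way to inspect which keys failed to match before deciding whether `pd.concat` is really what is wanted.

```python
import pandas as pd

# Hypothetical frames: the shared column "key" has no overlapping values.
df1 = pd.DataFrame({"key": ["a", "b"], "x": [1, 2]})
df2 = pd.DataFrame({"key": ["c", "d"], "y": [3, 4]})

# Default merge is an inner join on the common column: headers only, no rows.
inner = pd.merge(df1, df2)

# An outer join keeps everything and flags each row's origin.
outer = pd.merge(df1, df2, how="outer", indicator=True)

# concat stacks the rows without trying to match keys at all.
stacked = pd.concat([df1, df2], ignore_index=True)
```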
@smstoaj 4 years ago
I need help solving a problem. Assume two DataFrames:
df1 = pd.DataFrame({'Text': ['Some text 1', 'Some text 2','The monkey eats a banana','Some text 4']})
df2 = pd.DataFrame({'Keyword': ['apple', 'banana', 'chicken'], 'Type': ['fruit', 'fruit', 'meat']})
df1
   Text
0  Some text 1
1  Some text 2
2  The monkey eats a banana
3  Some text 4
df2
   Keyword  Type
0  apple    fruit
1  banana   fruit
2  chicken  meat
Thus, the preferable outcome would be:
   Text                      Type
0  Some text 1               -
1  Some text 2               -
2  The monkey eats a banana  fruit
3  Some text 4               -
The problem, however, is that banana is in a sentence, not a standalone value. Thanks in advance
@yufeizheng5149 4 years ago
May I ask how to use "on"? thank you!
@karakol86 4 years ago
Can you do a video about group by and agg?
@IntotheLloyd 4 years ago
I very rarely join on indexes as most data I analyze already has a unique identifier in the core table that I keep left joining too. On a high-level, I understand how indexes work and what an index is, but I was just wondering if anybody has a practical reason as to why you would join on an index?
@andreacazzaniga8488 4 years ago
The groupby gives you an indexed df where the index is the field of the groupby. If you are crazy enough you can skip resetting the index and keep working with the indexed df. Save a line of code, fuck up an entire codebase.
@dataschool 4 years ago
Some people like to put a unique (and meaningful) identifier in the index, whereas other people prefer it as a column. If you prefer the former, then it's most natural to join on an index. Does that make sense?
@IntotheLloyd 4 years ago
Andrea Cazzaniga I always just add .reset_index() at the end of my groupby and merge it back to my dataframe.
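Both habits discussed in this thread can be sketched side by side. The data is made up: the groupby result is either joined through its index, or turned back into a column with `reset_index()` and merged as usual.

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
ratings = pd.DataFrame({"movie_id": [1, 1, 2], "rating": [4, 2, 5]})

# groupby puts the grouping field (movie_id) in the index of the result.
mean_rating = ratings.groupby("movie_id")[["rating"]].mean()

# Option 1: join the movie_id column of movies to the index of mean_rating.
via_index = pd.merge(movies, mean_rating, left_on="movie_id", right_index=True)

# Option 2: move the index back into a column, then merge on that column.
via_column = pd.merge(movies, mean_rating.reset_index(), on="movie_id")
```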
@saragordon6902 3 years ago
Could you do a video on how to compare in Pandas two columns in each excel file to see if they match and if they do add a column called matches to the first excel file with results of true or false?
@barefootalex 3 years ago
df['match column'] = (df['col1'] == df['col2'])
df['match column'] = df['match column'].apply(lambda x: "Match" if x else "No match")
@maamounhajnajeeb209 2 years ago
Thank you very much
@dataschool 2 years ago
You're welcome!
@sreecharandyaga7577 4 years ago
Legend @16:26
@fanwang6279 5 months ago
Good stuff
@dataschool 5 months ago
Thanks!
@_rsk_ 4 years ago
Thanks a lot for the video Kevin. Helped me understand pandas' merge better. The pandas documentation doesn't mention pd.merge(df1,df2) and suggests usage as df1.merge(df2). What's the difference between the two? Also, there are two references for merge in the pandas documentation: 1. pandas.merge (pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html#pandas.merge) and 2. pandas.DataFrame.merge (pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge). While the contents and examples in both are the same, it creates confusion :-(
@dataschool 4 years ago
I agree that it's confusing! The first page that you linked to, pandas.merge, is pd.merge. That's because pd is just an alias for pandas. The second page you linked to, pandas.DataFrame.merge, is the DataFrame merge method. The first is a top-level function, and the second is a DataFrame method. The pandas documentation doesn't recommend one or the other, but I definitely recommend the first. Hope that helps!
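A quick sketch, with made-up data, showing that the top-level function and the DataFrame method give the same result. The only difference is that the left DataFrame is passed as the first argument in one case and is the calling object in the other.

```python
import pandas as pd

# Hypothetical data for illustration.
movies = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"]})
ratings = pd.DataFrame({"movie_id": [1, 1, 2], "rating": [4, 2, 5]})

# Top-level function: both DataFrames are arguments.
a = pd.merge(movies, ratings, on="movie_id")

# DataFrame method: movies is the left side, ratings the right.
b = movies.merge(ratings, on="movie_id")

same = a.equals(b)
```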
@_rsk_ 4 years ago
@@dataschool Thanks Kevin. That helps.
@islamalmsarrhad2152 2 years ago
thank you
@dataschool 2 years ago
You’re welcome!
@JoshDan12 4 years ago
... K!
@jeongbaekim3610 4 years ago
Love your videos, very helpful to me with learning pandas and thank you for your consistent updates~ After writing this I found that your link(CODE FROM THIS VIDEO: nbviewer.jupyter.org/github/j... ) above is broken, could you pls fix it?
@dataschool 4 years ago
Thanks for your kind words! I see that the nbviewer service is currently having problems rendering new notebooks, so you can use this link instead to access the notebook: github.com/justmarkham/pandas-videos/blob/master/pandas_merge.ipynb
@summerzhang9484 4 years ago
Thanks for the videos Kevin! I love your teaching style and how you make each concept so crystal clear. Please keep making these videos! Just signed up to become a patron of yours and am taking your course on Data Camp (I wish you taught more courses on there!) Once I master Pandas will try out your machine learning course too :) ps your son is so adorable
@dataschool 4 years ago
You are too kind, Summer! Thank you SO much for your kind words AND for becoming a patron! 🙌
@rawanfouda2291 4 years ago
That was honestly really good! thank you so much for your work
@lolkids7833 4 years ago
Thanks, Kevin.. this is the clearest explanation of the merge I have seen.
@dataschool 4 years ago
Thank you so much!
@vitoroliveira6363 4 years ago
Wonderful, loved your slow-paced English, that helped me a lot
@dataschool 4 years ago
Glad it helped!
@mschuer100 4 years ago
This, by far, is the best explanation of these concepts. Thanks for sharing.
@dataschool 4 years ago
Wow, thank you so much for your kind words! 🙏
@dannylockett9445 2 years ago
I really enjoy your tutorials, thanks so much! I have 5 csv files that come out daily each containing a date column. i want to merge them all using the date as the merge field. i tried a basic merge with 2 of the csv files and date was used as the merge-on field by default - so it worked. ultimately i just need one date column in my masterfile with all the other column data merged. should I continue to do this or is it better to set the date column as the index, or something else?
@autonish 4 years ago
Brilliant Stuff, All videos are awesome. Clearly explained all fundamentals...Thanks for making this stuff easy. On a different line, you remind me of "Sheldon" from the TV series The Big bang theory and this is a compliment. :)
@dataschool 4 years ago
Ha! So many people have said that 😄
@alndr4u 1 year ago
How to merge two dataframes based on 4 common columns with repetitive elements?
@LuisRivera-ce9lm 2 years ago
I just wanted to thank you for such a great explanation of joins. I did not have it explained to me and struggled for the longest time to understand them. It takes a good teacher and someone who can understand it simply for one to understand it. Seriously, you are amazing!!
@dataschool 2 years ago
Thank you so much! 🙏
@da_ta 4 years ago
Thanks Kevin I have been looking for this for long time!
@dataschool 4 years ago
Awesome! I'm so glad to hear this is the video you needed! 🙌
@JustJoelTV 2 years ago
Great video, informative and clear. Thanks
@dataschool 2 years ago
You're welcome!
@Octaphea 2 years ago
Great video. However I have a little issue. I have 3 data frames that I am trying to merge together. The first is a pretty long database with columns (cust_id, gained_on gained_from_supplier, lost_to_supplier, sales_channel_id) the second is the supplier data frame (supplier_name, supplier_id) what I am trying to do is merge the supplier id and name from the second data frame, to the database frame which has the ID so supplier id to the number using the lefton/right on but instead it returns both columns - the supplier ID and name of both dataframes. Then the same with the channel data frame (sales_channel_name, sales_channel_id) and merge this with the sales_channel_id in the database dataframe and show the name instead. Any help would be appreciated, thank you!
@akinsikuelizabeth5780 4 years ago
Superb!!! I got every explanation, thanks
@dataschool 4 years ago
You're welcome!
@amrita301157 2 years ago
This is one of the best ever videos on pandas functions that I have watched. Well done Data School. I will look forward to more such videos.
@dataschool 2 years ago
Thank you so much! 🙏
@sanjay123644 3 years ago
Excellent way of teaching. Thanks Kevin
@dataschool 3 years ago
Glad it was helpful! 🙌
@omidadib5052 3 years ago
Awesome tutorial, Thank you very much man!
@dataschool 3 years ago
You're welcome!
@anahata1710 4 years ago
*Python over Skype. I'll teach you to think outside the box. We solve problems and build utilities and games. Data Science and everything related to it. My Telegram is in my contacts. Message me*
@osmanhussein3893 3 years ago
This is very helpful. Thank you so much.
@dataschool 3 years ago
You're very welcome!
@bilalahmad9177 3 months ago
You are a great instructor. I have learned a lot from you regarding pandas. The video with title "How do I merge DataFrames in pandas?" has left some queries in my mind. I would be thankful to you if you clear those too. What type of join is used here movie_ratings = pd.merge(movies , ratings)? if it is inner join it should result in 1682 rows in total in movie_ratings dataframe, as movies dataframe has 1682 rows. But in video i have observed that movie_ratings results in 100,000 rows of data.
@vipinamar8323 3 years ago
Nice teaching method. precision over pace.
@dataschool 3 years ago
Glad it was helpful!
@svengunther7653 4 years ago
You are doing a really great job with this. Thank you so much! :)
@dataschool 4 years ago
Thanks!
@bharatapar3937 4 years ago
Hi, if there is a duplicate record in the second DataFrame (like 'Houston' in the case below), I want to print only one Houston in the final output after doing an outer join (as only one Houston is present in the first DataFrame). But that is not happening: in the final output after the outer join, Houston is also repeated on the left DataFrame side, as shown below. Please see the test data below. Please help.
DataFrame -1
=============
ID1  City        Population
1    CHICAGO     3000
5    HOUSTON     14000
7    NEW JERSEY  18000
7    NEW JERSEY  20000
DataFrame -2
=============
ID2  City        POPULATION
4    ARIZONA     2000
5    HOUSTON     3000
5    HOUSTON     4000
5    HOUSTON     5000
7    NEW JERSEY  3000
8    MICHIGAN    4000
det = pd.merge(df1,df2,left_on=['ID1'],right_on=['ID2'],how='outer', indicator='indicator',suffixes=('_A','_B'))
Actual Output:
=============
   ID1  City_A      Population  ID2  City_B      POPULATION  indicator
0  1.0  CHICAGO     3000.0      NaN  NaN         NaN         left_only
1  5.0  HOUSTON     14000.0     5.0  HOUSTON     3000.0      both
1  5.0  HOUSTON     14000.0     5.0  HOUSTON     4000.0      both
1  5.0  HOUSTON     14000.0     5.0  HOUSTON     5000.0      both
4  7.0  NEW JERSEY  18000.0     7.0  NEW JERSEY  3000.0      both
5  7.0  NEW JERSEY  20000.0     7.0  NEW JERSEY  3000.0      both
6  NaN  NaN         NaN         4.0  ARIZONA     2000.0      right_only
7  NaN  NaN         NaN         8.0  MICHIGAN    4000.0      right_only
Expected Output:
============
   ID1  City_A      Population  ID2  City_B      POPULATION  indicator
0  1.0  CHICAGO     3000.0      NaN  NaN         NaN         left_only
1  5.0  HOUSTON     14000.0     5.0  HOUSTON     3000.0      both
2  5.0  NaN         NaN         NaN  HOUSTON     4000.0      both
3  5.0  NaN         NaN         NaN  HOUSTON     5000.0      both
4  7.0  NEW JERSEY  18000.0     7.0  NEW JERSEY  3000.0      both
5  7.0  NEW JERSEY  20000.0     NaN  NaN         NaN         both
6  NaN  NaN         NaN         4.0  ARIZONA     2000.0      right_only
7  NaN  NaN         NaN         8.0  MICHIGAN    4000.0      right_only
@vijayreddy1730 4 years ago
Hi Kevin, first of all thanks for the wonderful lecture. I am facing a problem merging the two DataFrames shown below.
Data frame 1:
BackupServer | BackupDay | StartDate | ClientName | BackupStatus | Backup re-run(Y/N) | Incident | Reason for the Backup Failures | Backup Final Outcome
RGSIBAK004 | 01-05-2020 | 2020-04-30 06:40:29 | RGBPLNM110 | Completed | NaN | NaN | NaN | NaN
RGSIBAK004 | 01-05-2020 | 2020-04-30 06:53:07 | RGPIAPP037 | Completed | NaN | NaN | NaN | NaN
RGSIBAK004 | 01-05-2020 | 2020-04-30 15:32:38 | RGPIISD001 | Failed | Yes | IN893523 | VM disconnected | Failed
RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:08 | RGPPFTP005 | Completed | NaN | NaN | NaN | NaN
RGSIBAK004 | 01-05-2020 | 2020-04-30 18:00:02 | RGPQWEB069 | Completed | NaN | NaN | NaN | NaN
Data frame 2:
BackupServer | BackupDay | StartDate | Client Name | Backup Status | Backup Rerun (Y/N) | Incident | Failures | Backup Final Result
RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpqbda112.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc051.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc050.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpppcc011.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
RGPIAUN003.FDNET.COM | 2020-05-01 | Thu Apr 30 21:00:03 EDT 2020 | rgpdbda105.fdnet.com | Activity completed successfully. | NaN | NaN | NaN | NaN
Although the two DataFrames share the three column names "BackupServer", "BackupDay" and "StartDate", the content in the columns is different, and I am not able to merge these two DataFrames into one. Can you help me with this?
@sch0ll1 3 years ago
Thanks man! You saved my weekend :*
@dataschool 3 years ago
Glad I could help!
@shashi_kamal_chakraborty 2 years ago
Thanks! very nicely explained. Now, I can perform joins using Pandas, quite effortlessly.
@dataschool 2 years ago
Glad it helped!
@shashi_kamal_chakraborty 2 years ago
@@dataschool Yeah! beside books, I follow you, especially for Pandas. Great help. Thanx...
@dataschool 1 year ago
You're welcome!
@JunaidInHenan 4 years ago
The above logic is beautifully explained. Hi Kevin, I have a question, if you could please reply. I have three CSV files, csv1 (20000 rows), csv2 (20000 rows), and csv3 (20000 rows), and I want to merge these files into a single DataFrame without losing a single record. That is, I want to read these files into one DataFrame that should ideally have 60000 rows. P.S.: All the files have the same columns (PostID, time, tweetURL, Content, RetweetNum , LikeNum, CommentsNum, Verified, Following, Follower). In the resulting DataFrame I want all these columns as headings and all 60000 rows. Is it possible? Kevin, I will wait for your reply. I know this post is old, but maybe you will read my question. THANK YOU
@jqts6490 3 years ago
Thanks for the video. I was able to successfully merge and find some errors from IDs I did not find using VBA VLOOKUPs. I was curious: is there a way to highlight differences between columns in this merged DataFrame? Example: Number of Vehicles_SS: 7 vs Number of Vehicles_SA: 2, and it would highlight the row, or even just those values, based on the ID it was merged on. I am having a hard time finding this. I am trying to get rid of VBA, which I have doing this, but it is SUPER slow with the data I have to process.
@vinayakchikkorde8151 3 years ago
I have a source file and a target file, and I have to compare 140 columns and show whether each matches or not. For example, there is a column named Country1 in the source and Country2 in the target. To compare them I would use if (source['country1'] == target['country2']) return True else return False. Comparing 140+ columns this way will take time, and the columns in the two files are not in the same order. How can I solve this?
@gregf9160 4 years ago
Thank you so much for the concise clear explanation. Much appreciated.
@jeevakumara5599 2 years ago
Hi bro, I am currently working on a project. My mentor says to use foreign keys and primary keys in pandas and create tables with those keys. So my question is: is it possible to use foreign and primary keys in pandas? If not, what should I do to merge two tables that contain the same column, as we do in MySQL? Thank you.
@mochammadirfanbaihaqi279 3 years ago
Love the way you explain it, thanks for your vids. Keep it up (thumbs)
@cgpmth6449 2 years ago
How do I merge multiple large DataFrames quickly? I joined them with the usual merge(), but it seems too slow. I found a hint about using pandas.Index() with the merge method, but I don't know how to use it.
@hardikvegad3508 4 years ago
Sir, what if we have hundreds of columns without names? How can we name them using pandas and a for loop or lambda function? Naming them with names=[] would be very time-consuming. The column names can be col1, col2, col3, etc.
@mehnazjabeen 2 years ago
After merging two DataFrames, how do I verify that all the columns are included in the merged DataFrame using a simple comparison operator in Python?
@hectoralvarorojas1918 4 years ago
Great work as always. Very useful. Thanks for sharing it! By the way, any chance you could do a video about PySpark? It would be very useful to cover it from the beginning, with examples based on a local connection (one computer) first and then a couple of examples emulating a cluster connection.
@dataschool 4 years ago
Thanks for your kind words as always, Hector! Sorry, I don't have any videos about PySpark, but I appreciate the suggestion! 👍
@hectoralvarorojas1918 4 years ago
@@dataschool I would love for you to do that. I am positive that you will get a lot of interested people, me among them of course. My best regards!
@BHARATHEEYUDU. 4 years ago
I am looking for Python data analyst jobs. What prerequisite tools and technologies should I learn? Is Django a must for pandas data analysis? Please advise me.
@АлексейДуховный-ф1г 3 years ago
The only English speaker you can understand without knowing English
@ruthliganad8274 3 years ago
What if it's not a specific file? For example, all .csv or all .tsv files? How do I concatenate a header to those files? Thanks
@eliasaudi2877 2 years ago
What would we use to show ONLY the values that do not match? i.e. anything other than an inner join
@CristianBittel 4 years ago
Great as teacher, calm, taking your time to clearly explain fundamentals!
@dataschool 4 years ago
Thanks so much for your kind words, I truly appreciate it!
@shivamsaway6803 4 years ago
Can it happen that, when merging two DataFrames, only the headers get merged and no data gets merged into the new DataFrame?
@ramachalprajapati1176 3 years ago
How do I get the common mobile numbers from two different CSV files that have different column names?
@vighneshmane2080 4 years ago
Can you explain how to print the row header and column header if I have a particular condition on the row? EX:
          day1  day2  day3  day4  day5
place 1   2     5     6     7     8
place 2   1     1     1     6     8
place 3   2     3     5     10    11
The condition here is the place having more than 5 units on a particular day. The OUTPUT I need is:
place1 day2
place2 day4
place 3 day3
Help me with this.