Your pandas questions answered!

69,483 views

Data School


In this video, I'm answering a few of the pandas questions I've received in the YouTube comments:
0:18 When reading from a file, how do I read in only a subset of the columns or rows?
2:53 How do I iterate through a Series or a DataFrame?
4:24 How do I drop all non-numeric columns from a DataFrame?
6:03 How do I know whether I should pass an argument as a string or a list?
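The four questions above can be sketched in a few lines. This is a minimal sketch and not the code from the video: the inline CSV text, the `mixed` DataFrame, and all column names are made-up stand-ins, and the only pandas calls used are the ones linked under RESOURCES (`read_csv`, `iterrows`, `select_dtypes`, `describe`).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for a CSV file on disk.
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Willingboro,,OTHER,NJ
Holyoke,,OVAL,CO
Abilene,,DISK,KS
"""

# 1. Read only some columns (by name or by position) or only the first rows.
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["City", "State"])
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 3])
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=3)

# 2. Iterate through a Series (value by value) or a DataFrame (row by row).
for city in by_name.City:
    print(city)
for index, row in by_name.iterrows():
    print(index, row.City, row.State)

# 3. Keep only the numeric columns of a DataFrame.
mixed = pd.DataFrame({"name": ["a", "b"], "visits": [1, 2], "ratio": [0.5, 1.5]})
numeric_only = mixed.select_dtypes(include=[np.number])

# 4. String or list? The documentation says which one a parameter accepts.
#    For describe(), include takes the string "all" or a list of dtypes.
summary_all = mixed.describe(include="all")
summary_some = mixed.describe(include=["float64"])
```

When in doubt about question 4, the function's documentation (or its docstring in the notebook) states the accepted types for each parameter.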
SUBSCRIBE to learn data science with Python:
www.youtube.co...
JOIN the "Data School Insiders" community and receive exclusive rewards:
/ dataschool
== RESOURCES ==
GitHub repository for the series: github.com/jus...
"read_csv" documentation: pandas.pydata.o...
"iterrows" documentation: pandas.pydata.o...
"select_dtypes" documentation: pandas.pydata.o...
"describe" documentation: pandas.pydata.o...
== LET'S CONNECT! ==
Newsletter: www.dataschool...
Twitter: / justmarkham
Facebook: / datascienceschool
LinkedIn: / justmarkham

Comments: 120
@sinjini3189 4 years ago
I have watched several data science tutorial videos, but none of them explained the concepts as clearly as you did. Thank you so much!
@scottlucas3710 7 years ago
You always come up with solid, well-explained answers before I even ask the question.
@dataschool 7 years ago
Thanks! I enjoy answering these questions.
@hemantsah8567 4 years ago
I would blame the YouTube algorithms for not suggesting your channel when I needed it the most. Now I can understand these things better. Thanks, Kevin!
@dataschool 4 years ago
You're very welcome!
@kissnicky7001 5 years ago
The best teacher in data science.
@dataschool 5 years ago
Thank you!
@dineshsingh9309 7 years ago
Hello, nice work by Data School. I have a few questions on working with files. This time, let us consider a CSV file. 1. How can I read a CSV file ranging in size from 300 MB to 10 GB? 2. How many different ways are available to work with very large files? 3. Is using a chunk size necessary? I would request you to make a detailed video on working with very large files and tag me in it so I can gain an in-depth understanding. Hats off to Data School. Thank you again.
@dataschool 7 years ago
Great questions! I will certainly consider them for a future video. I will tag you in the comments if and when I release it.
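Until such a video exists, one common way to handle a CSV that does not fit in memory is the `chunksize` parameter of `read_csv`. The following is a minimal sketch under stated assumptions: the inline data and the column names `id` and `amount` are hypothetical, and a real file would be passed by path instead of `io.StringIO`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV that is too large to load at once.
csv_text = "id,amount\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

total = 0
rows_seen = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# (here at most 4 rows each) instead of one large DataFrame.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk["amount"].sum()
    rows_seen += len(chunk)

print(rows_seen, total)
```

Chunking is not strictly necessary. Reading fewer columns with `usecols` or fewer rows with `nrows` may already be enough, depending on the file and the available memory.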
@dwikisetyawan1848 5 years ago
I never skip the ad, just for you, my man.
@dataschool 5 years ago
Ha! 😄
@balajirajaram9512 3 years ago
Excellent video as always ❤❤ For the question of how to drop every non-numeric column in pandas: we can also use dataframe._get_numeric_data() to select only the numeric columns from the dataset.
@awaraamin6850 8 years ago
You are great, man! Thank you.
@dataschool 8 years ago
Thanks!
@NicatBehbudov 6 years ago
You are amazing, sir! One question: how can I keep practicing everything you taught? Do you have 'challenges' or 'small projects' where you ask the question and we try to solve the problem? Any suggestions? I keep doing what you teach after watching the video, but it would still be great to have a practice ground for pandas. Thank you!
@dataschool 6 years ago
That is an excellent question! I am working on a pandas course now that includes additional exercises... stay tuned! One idea for practice in the meantime is to pick a dataset on Kaggle Datasets, and either download it and use it for practice, or create a Kaggle Kernel to practice it online. Hope that helps!
@lucassantana7511 6 years ago
First I click on the "Like" button, then I watch the video. You are great, man. Thanks a lot for these videos!!
@dataschool 6 years ago
Ha! Thanks so much :)
@nikhilj.206 4 years ago
That's exactly what I was about to type.
@MrPaglynn 7 years ago
Excellent channel. Your work is very good. I used some of your tips in my code and am getting a deeper understanding of it. Thank you.
@dataschool 7 years ago
Great to hear! Thanks for your kind comments!
@pgupta24 3 years ago
Excellent! Good explanations.
@dataschool 3 years ago
Many thanks!
@surajkhanna1129 4 years ago
Hey! Awesome tutorial. One doubt, though: if I have a column that is supposed to be numeric and, by mistake, there is a string value in one of its rows, pandas treats the column as non-numeric, so when using select_dtypes(include=[np.number]) there might be data loss. Why does Python consider a column that has mostly numeric values and very few non-numeric values to be non-numeric?
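On the question above: a pandas column has a single dtype, so one stray string is enough to keep the whole column from being stored as a number. A minimal sketch of one way to recover such a column with `pd.to_numeric` follows. The DataFrame and its column names are hypothetical, and coercing bad values to NaN is only one possible policy.

```python
import pandas as pd

# Hypothetical column that should be numeric but contains one stray string.
df = pd.DataFrame({"price": ["10", "12", "oops", "15"], "qty": [1, 2, 3, 4]})

# Before conversion, "price" is not a numeric column and is left out.
print(df.select_dtypes(include="number").columns.tolist())

# errors="coerce" turns anything unparseable into NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# After conversion, both columns are numeric; the bad cell is NaN.
numeric = df.select_dtypes(include="number")
print(numeric)
```

Only the unparseable cell is lost this way, rather than the whole column.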
@lonewolf2547 6 years ago
Really great video... so informative... Dude, you are really good.
@dataschool 6 years ago
Thanks very much for your kind words!
@shankarkr1603 6 months ago
The command to fetch only the required columns is not working for me.
@mazkaibil9108 5 years ago
Hello, I love your videos! I was wondering whether there is a series of videos on plotting. Also, how do I display the data value for each point on a line plot? Thank you! 😃
@dataschool 5 years ago
Thanks for your suggestion! I don't have a series on plotting right now.
@MrDavisv 6 years ago
I love your tutorials! Can you make a video on joins (inner, left, full) in pandas? Also, could you provide an example of lookup DataFrames as an alternative to joins?
@dataschool 6 years ago
Thanks for your suggestion! And thanks for your kind words :)
@rexster_v5624 4 years ago
Hi! I just have one small question: how do you check your pandas version? I have tried a lot but couldn't find an answer.
@ashishgavit1661 3 years ago
1. Find the version of pandas running on any system.
# importing pandas as pd
import pandas as pd
# Check the version
print(pd.__version__)
2. Find the versions of the dependencies for the given version of pandas running on any system.
# importing pandas as pd
import pandas as pd
# Check the version of the dependencies
pd.show_versions()
@akhilarayapati5292 4 years ago
Hello... thank you for sharing your knowledge. I had a question: I filtered my Excel data down to only two columns (say, salary and country), and now I want to iterate through salary with different conditions, such as a value in a range around 2000, for particular countries such as India and the US, and get a count for each country. Can you please help me with this problem? How do I get this output?
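A task like the one described above usually does not need explicit iteration. The following is a minimal sketch, assuming hypothetical data, a made-up salary range of 2000 to 3000, and the column names `country` and `salary`. It filters with boolean conditions and then counts rows per country.

```python
import pandas as pd

# Hypothetical data standing in for the two columns read from Excel.
df = pd.DataFrame({
    "country": ["India", "US", "India", "UK", "US", "India"],
    "salary": [1500, 2500, 3000, 1800, 1900, 2200],
})

# between() is inclusive on both ends by default.
in_range = df["salary"].between(2000, 3000)
wanted = df["country"].isin(["India", "US"])

# Keep rows that satisfy both conditions, then count rows per country.
counts = df[in_range & wanted].groupby("country").size()
print(counts)
```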
@suren6885 4 years ago
Questions: 1. How do I choose between pandas and Spark? 2. Why doesn't Spark have these cool features of pandas? Request: Can you please do similar tutorial videos for PySpark?
@namratapatil5181 4 years ago
Hi... I am here to learn pandas. You teach amazingly well. I am watching your videos, but now I am confused because you have made two playlists on pandas. Could you please tell me which one I should follow to learn pandas fully?
@yuliagorodetskaya3855 7 years ago
Hello, first, thank you very much for your great classes!! I would like to ask some questions about sorting and filtering date data (like 10/07/2006). How do I sort all columns by a date column? And how do I apply a filter, for example, by month 02? Thank you, Julia
@dataschool 7 years ago
You're very welcome! To answer your question, the first step would be to convert the date column to datetime type which I explain in this video: kzbin.info/www/bejne/r3TKe3qpnJWLl5Y Once that step is done, you can sort the DataFrame using the 'sort_values' method, which I explain here: kzbin.info/www/bejne/sIqXlaJ8a92Grrs And, you can filter using the dt.month attribute of the date column. Filtering is explained here: kzbin.info/www/bejne/aHKpeIOag9NnfK8 Hope that helps!
@yuliagorodetskaya3855 7 years ago
Thank you very much!!! Yes, it helped!!! :))
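The three steps from the reply above (convert to datetime, sort with `sort_values`, filter with `dt.month`) can be sketched as follows. The data is hypothetical, and the month/day/year reading of strings such as 10/07/2006 is an assumption. For day-first dates the `format` string would need to change.

```python
import pandas as pd

# Hypothetical data with dates stored as strings.
df = pd.DataFrame({
    "date": ["10/07/2006", "02/15/2007", "02/01/2005"],
    "value": [1, 2, 3],
})

# Step 1: convert the strings to datetime (assumed month/day/year).
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# Step 2: sort all columns by the date column.
sorted_df = df.sort_values("date")

# Step 3: keep only the rows whose month is February.
february = df[df["date"].dt.month == 2]

print(sorted_df)
print(february)
```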
@deep6858 2 years ago
Hi, I am new to Python and am deploying Python code to AWS Lambda for the first time. I am zipping the dependencies on a Windows 10 machine and uploading them for pyodbc and pandas, but the code does not recognize either module on AWS, even though it works on my local Windows machine. Must installing and zipping the dependencies be done on Linux? Thanks
@dataschool 2 years ago
I'm not sure, sorry!
@gabrieladias799 5 years ago
Hi teacher Kevin, I would like to know why my output for the code is not in a tuple like yours.
Code -->
In [1]:
for index, row in ufo.iterrows():
    print(index, row.City, row.State)
Out [1]:
0 Ithaca NY
1 Willingboro NJ
2 Holyoke CO
3 Abilene KS
@dataschool 5 years ago
It's a difference in Python 2 vs 3.
@KhalilYasser 3 years ago
You can make it a tuple using `print((index, row.City, row.State))`
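The two replies above can be seen side by side in a short sketch. In Python 2, `print` was a statement, so `print(index, row.City, row.State)` printed the parenthesized values as one tuple. In Python 3, `print` is a function and writes its arguments separated by spaces. The two-row `ufo` DataFrame below is a hypothetical stand-in for the dataset used in the video.

```python
import pandas as pd

# Hypothetical two-row stand-in for the ufo DataFrame.
ufo = pd.DataFrame({"City": ["Ithaca", "Willingboro"], "State": ["NY", "NJ"]})

for index, row in ufo.iterrows():
    # Python 3: three separate arguments, printed space-separated.
    print(index, row.City, row.State)
    # An extra pair of parentheses passes a single tuple to print().
    print((index, row.City, row.State))
```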
@michaelangellotti5741 5 years ago
Excellent explanations. Simple and short.
@dataschool 5 years ago
Thanks!
@truthseekerbeast4188 5 years ago
For anyone who wants to practice pandas, here is everything you need: github.com/guipsamora/pandas_exercises
@dataschool 5 years ago
Thanks for sharing!
@bhavanishanker5884 7 years ago
At the outset, I would like to offer my encomiums for your illustrative lectures. I have a question: how do I match merge two DataFrames in Python (not inner, outer, left, or right)? The merge should resemble a match merge in SAS. Please explain.
@dataschool 7 years ago
I'm not familiar with "match merge", sorry!
@navalnaware5098 5 years ago
How can we write a script to create a single master Excel workbook (.xlsx file) if we are receiving data from different sources in different formats? It would be great if you could help me with this. If you want, I can share the files to give you an idea.
@dataschool 5 years ago
I'm sorry, I won't be able to help you with this - good luck!
@autotestweb7724 7 years ago
Hey... great work and effort put into this tutorial. Well done, keep it up. I would like to know whether data analysis can be done on a .log file using read_csv in pandas, or whether pandas only works with .txt and .csv files. Most of the files I have to analyse are printer .log files, and I think pandas would make my life a lot easier.
@dataschool 7 years ago
pandas works with tabular data files. If your log files are tabular (meaning that each row includes the same set of columns), then you can work with them in pandas. How exactly you read them into pandas will depend on their formatting. Hope that helps!
@burcakotlu7858 5 years ago
Thank you very much for these well-prepared videos. I have a question: what if we have a varying number of columns and column names that we need to consider in the criteria? For example, in run1 we have only three columns to consider, named sig1, sig2, and sig3:
df = df[(df['sig1']==1) | (df['sig2']==1) | (df['sig3']==1)]
In run2 we have four columns to consider, named sig1, sig2, sig3, and sig4:
df = df[(df['sig1']==1) | (df['sig2']==1) | (df['sig3']==1) | (df['sig4']==1)]
The column names and the number of columns are resolved at runtime, and the condition is always the same (==1). Is there a way to handle this case? Thanks in advance.
@dataschool 5 years ago
I think what you need is the "isin" method, shown in this video: kzbin.info/www/bejne/j4GspZmHbZykoK8 Just update the list that you pass to "isin" as needed. Does that help?
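One way to apply the "isin" suggestion to a list of columns that is only known at runtime is to call `isin` on the selected columns and combine the result with `any(axis=1)`. This is a minimal sketch, not necessarily what the linked video shows: the data is hypothetical, and picking columns by the `sig` prefix is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical data; "other" is a column that should not take part in the test.
df = pd.DataFrame({
    "sig1": [1, 0, 0, 0],
    "sig2": [0, 0, 1, 0],
    "sig3": [0, 0, 0, 0],
    "other": [9, 9, 9, 9],
})

# Column names resolved at runtime (assumed rule: names starting with "sig").
sig_cols = [col for col in df.columns if col.startswith("sig")]

# True for a row if any of the selected columns equals 1.
mask = df[sig_cols].isin([1]).any(axis=1)
filtered = df[mask]
print(filtered)
```

Because `sig_cols` is an ordinary list, the same two lines work whether a run has three signal columns or four.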
@Apollo77392 2 years ago
Hi, are there any ML teaching videos on your blog?
@dataschool 2 years ago
I have a free ML course! courses.dataschool.io/introduction-to-machine-learning-with-scikit-learn
@hitsviralonly2215 5 years ago
Hi friend, thanks for your video first... I want to print only the first 10 columns, but this gives me an error:
for index, row in ufo.iterrows():
    print(index, row.genre, row.duration).head(10)
@dataschool 5 years ago
I'm not sure if it's possible with iterrows, I'm sorry!
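One likely cause of the error above is that `.head(10)` is chained onto `print(...)`, which returns `None`. If the goal is to print only the first 10 rows, `head` can be applied to the DataFrame before iterating. A minimal sketch, assuming that reading of the question and a hypothetical `movies` DataFrame with `genre` and `duration` columns:

```python
import pandas as pd

# Hypothetical 12-row DataFrame with the two columns from the question.
movies = pd.DataFrame({
    "genre": ["Drama", "Crime", "Action"] * 4,
    "duration": list(range(90, 102)),
})

seen = []
# Limit the DataFrame first, then iterate over the remaining rows.
for index, row in movies.head(10).iterrows():
    print(index, row.genre, row.duration)
    seen.append(index)
```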
@kashishjain78 4 years ago
Can you talk about iterators and generators?
@dataschool 4 years ago
That's a bit outside the scope of my teaching, sorry!
@nikolaygerashchenko4873 8 years ago
Thank you for finally making it all clear for me. I have to say that your explanation is really easy to understand compared to the machine learning courses I have seen and passed before. It would be great if you had some time to answer my question: is there a way to import an HTML file into a pandas DataFrame when it has different separators (one space between the 1st and 2nd columns, two spaces between the 2nd and 3rd, and so on)? Thank you in advance.
@dataschool 8 years ago
+Nikolay Gerashchenko Glad you like the series! Regarding your question, HTML is not a tabular format, and so it's not suitable for directly reading into pandas. But if you did have tabular data in which the separator character varied, you could potentially use regular expressions in the "sep" argument to the "read_table" function to match a variable pattern.
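The regular-expression separator mentioned in the reply above can be sketched as follows. The text and the column names are hypothetical, and `\s+` (one or more whitespace characters) is just one pattern that fits the "one space, then two spaces" description.

```python
import io

import pandas as pd

# Hypothetical text in which the number of spaces between columns varies.
text = "a 1  x   10\nb 2  y   20\n"

# sep is treated as a regular expression: \s+ matches any run of whitespace.
df = pd.read_table(
    io.StringIO(text),
    sep=r"\s+",
    header=None,
    names=["name", "num", "code", "score"],
)
print(df)
```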
@elilavi7514 8 years ago
Thank you for answering questions! Very helpful. I have a question about iterating over a pandas DataFrame: I have a feeling that the .apply() method with a lambda function in it is faster than iterating over a pandas Series or DataFrame with `for`. I think there are some vectorization methods for working with pandas. Can you please expand on this? Thanks again!
@dataschool 8 years ago
+Eli Lavi I would agree with you, that using the vectorized operations that are built into pandas (such as apply) are faster than using 'for' loops. However, there are cases in which 'for' loops are the only option, which is why it's worth knowing about. Thanks for the question!
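To make the comparison above concrete, here is a minimal sketch that computes the same column three ways. The data and column names are hypothetical, and no timings are claimed. As a general rule, column arithmetic runs in compiled code, while both `apply` with `axis=1` and a `for` loop call Python code once per row.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# 1. Vectorized column arithmetic: one operation over whole columns.
df["total_vec"] = df["price"] * df["qty"]

# 2. apply with axis=1: the lambda is called once per row.
df["total_apply"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# 3. Explicit for loop over iterrows.
totals = []
for index, row in df.iterrows():
    totals.append(row["price"] * row["qty"])
df["total_loop"] = totals

print(df)
```

All three columns hold the same values, so the choice is about speed and readability rather than results.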
@abhaygupta7931 3 years ago
Sir, amazing content...
@sanjeev4u15 7 years ago
Awesome explanation... (y)
@dataschool 7 years ago
Thanks!
@pebre79 3 years ago
Thanks!
@dataschool 3 years ago
You are so very welcome, and thank you yet again!! 🙌
@SpokenStuff 6 years ago
How do I read tabular data from an API service call into pandas?
@dataschool 6 years ago
I would make the API call, save the result, and then read that result into pandas. If the result is JSON, for example, I think pandas has a read_json function.
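pandas does have a `read_json` function. The following is a minimal sketch of the last step described above, assuming the API response has already been saved as a JSON string. The payload is hypothetical and no network call is made.

```python
import io

import pandas as pd

# Hypothetical JSON body, as an API might return it (a list of records).
payload = '[{"city": "Ithaca", "state": "NY"}, {"city": "Holyoke", "state": "CO"}]'

# Wrapping the string in StringIO lets read_json treat it like a file.
df = pd.read_json(io.StringIO(payload))
print(df)
```

Nested JSON usually needs extra flattening before it becomes a useful table. This sketch covers only the flat list-of-records case.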
@awaisahmad5301 4 years ago
I love your tutorials, sir. I want to ask: in data science, what skills do we need for data cleaning in Python?
@dataschool 4 years ago
That's a great question, though a bit broad... my best suggestion is just to find case studies of data cleaning and examine them for insights. Hope that helps!
@harshinde 8 years ago
Hi Kevin, great video series. In relation to the first question, I was wondering if there is a preferred or neat way to sample a huge file into a DataFrame. The first nrows you choose might not be representative of the entire file's contents, so how could one sample a file into a DataFrame to get a sense of the type of data?
@dataschool 8 years ago
Great question! I'll try to cover that in detail in a future video :)
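One possible approach to the sampling question above, offered as a sketch rather than the answer the video series would give, is to pass a callable to the `skiprows` parameter of `read_csv` so that each data row is kept with a fixed probability. The inline data, the 5% fraction, and the seed are all hypothetical choices.

```python
import io
import random

import pandas as pd

# Hypothetical stand-in for a huge CSV file with 1000 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * i}" for i in range(1000))

random.seed(0)  # only so that the sketch is repeatable
keep_fraction = 0.05

# skiprows accepts a callable that receives each row number and returns
# True to skip that row. Row 0 is the header, so it is always kept.
sample = pd.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda i: i > 0 and random.random() > keep_fraction,
)
print(len(sample))
```

The sample size varies from run to run unless the seed is fixed, and the whole file is still scanned once, but only the kept rows end up in memory.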
@nikhilj.206 4 years ago
Great work!
@dataschool 4 years ago
Thanks!
@spurthishetty6834 5 years ago
Awesome!
@dataschool 5 years ago
Thanks!
@townheadbluesboy 7 years ago
Your videos are great, clear, and easy to understand. How can you select random rows from a DataFrame?
@dataschool 7 years ago
Glad you like the videos! There is a DataFrame sample method for this purpose: pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sample.html
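A minimal sketch of the `sample` method mentioned above, using a hypothetical ten-row DataFrame. The `random_state` value is arbitrary and only makes the selection repeatable.

```python
import pandas as pd

# Hypothetical DataFrame with ten rows.
df = pd.DataFrame({"x": range(10)})

# Pick a fixed number of random rows.
three_rows = df.sample(n=3, random_state=1)

# Or pick a fraction of the rows instead.
half = df.sample(frac=0.5, random_state=1)

print(three_rows)
print(half)
```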
@ahowl7mx 8 years ago
Learned Python pandas with your tutorials over the weekend. Thanks so much! The lesson about and/or logic with booleans was quite helpful, but I'm looking to spin it the following way: how do I "print the filename" of certain files in a directory if they contain a string? By that I mean, I want to run code that only impacts some of the CSV files in my directory :).
import glob
import pandas as pd
path = r'C:\Users\Desktop\Experiment\'
allFiles = glob.glob(path + "/*.csv") & (search filename 'AMX_error' = true) #
@dataschool 8 years ago
Glad the videos have been helpful to you! I think I understand what you are trying to accomplish. It seems like all you need to do is open each file within your for loop, and then perform whatever check/operation you desire within the for loop. Does that help? Or maybe I'm misunderstanding what exactly you are struggling with?
@ahowl7mx 8 years ago
Specifically, for each filename in the folder I have to search for a couple of strings. The filename will have "AMX" or "AR" in it, and it will also have a date (..Daily_AMX_2016-08-07_...). It's tricky because I have to convert that string into a date format. Then, for every file, I need to add a column with "AMX" or "AR" depending on the filename, plus a column with the converted date. It would be a great trick to know :) I think it would use the re.search function, but I'm not sure.
@dataschool 8 years ago
Yes, regular expressions would be useful for that task!
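The regular-expression idea confirmed above can be sketched as follows. The filenames are hypothetical completions of the pattern quoted in the thread, the regular expression is an assumption about how the tag and date are delimited, and no files are actually read.

```python
import re

import pandas as pd

# Hypothetical filenames; in practice these could come from glob.glob().
filenames = [
    "Daily_AMX_2016-08-07_report.csv",
    "Daily_AR_2016-08-08_report.csv",
    "notes.csv",
]

# Assumed layout: _<AMX or AR>_<YYYY-MM-DD>_ somewhere in the name.
pattern = re.compile(r"_(AMX|AR)_(\d{4}-\d{2}-\d{2})_")

records = []
for name in filenames:
    match = pattern.search(name)
    if match is None:
        continue  # skip files without the expected tag and date
    records.append({
        "filename": name,
        "source": match.group(1),
        "date": pd.to_datetime(match.group(2)),
    })

matched = pd.DataFrame(records)
print(matched)
```

In a real run, each matching file could be read with `read_csv` inside the loop and given `source` and `date` columns from the same match groups.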
@PradeepKumar6 8 years ago
Great!!! Awesome. Thanks for this. One question, though: is it possible to select (or drop) the numeric (or character) columns within a range of columns? For example, say I have a DataFrame with columns a, b, c, d, e, f, g, h, i in that order, and assume a, d, e, and i are numeric. Now I want to drop only the numeric columns that exist between c and g (drop d and e). One way is to use the usecols approach and type the names of the variables I want to keep, but if this list is large, it becomes cumbersome and a lot to type. Is there an efficient way to do it in pandas? Thanks
@dataschool 8 years ago
+Pradeep Kumar You're very welcome! To answer your question, you can select a range of columns by name using the "loc" method. For example: df.loc[:, 'c':'f'] You can select a range of columns by position using the "iloc" method. For example: df.iloc[:, 2:6] For any given scenario in which you want to drop (or keep) certain columns, you can chain together a combination of loc, iloc, drop, select_dtypes, and perhaps other methods to achieve the desired result. The optimal method depends on the exact scenario. I'll cover loc and iloc in an upcoming video :) Hope that helps!
@PradeepKumar6 8 years ago
+Data School Thank You !!!
@dataschool 8 years ago
FYI, I just posted a video about loc and iloc, if you want to learn more: kzbin.info/www/bejne/rqfTf3Rtl6hrmdU Enjoy! :)
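The chaining described in the reply above (`loc`, then `select_dtypes`, then `drop`) can be sketched for the exact scenario in the question. The one-row DataFrame is hypothetical and only reproduces the stated layout: columns a through i, with a, d, e, and i numeric.

```python
import pandas as pd

# Hypothetical DataFrame matching the layout described in the question.
df = pd.DataFrame({
    "a": [1], "b": ["x"], "c": ["y"], "d": [2.5], "e": [3],
    "f": ["z"], "g": ["w"], "h": ["v"], "i": [4],
})

# loc slices columns by label and includes both endpoints (c through g).
window = df.loc[:, "c":"g"]

# Names of the numeric columns inside that window only.
numeric_in_window = window.select_dtypes(include="number").columns

# Drop just those columns from the full DataFrame.
result = df.drop(columns=numeric_in_window)
print(result.columns.tolist())
```

The numeric columns outside the window, a and i, are untouched because `select_dtypes` is applied to the slice and not to the whole DataFrame.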
@NishantKumar-cg6tw 7 years ago
How do I compare a common Series in two DataFrames in pandas?
@dataschool 7 years ago
If you want to check for equality, you can use the Series method 'equals': pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.equals.html
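A minimal sketch of the `equals` method mentioned above, using two hypothetical DataFrames that share the columns `id` and `score`. One detail worth knowing is that `equals` treats missing values in the same position as equal, whereas an element-wise `==` comparison does not.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns.
left = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, None, 0.9]})
right = pd.DataFrame({"id": [1, 2, 3], "score": [0.5, None, 0.8]})

# equals returns a single True/False for the whole Series.
same_id = left["id"].equals(right["id"])
same_score = left["score"].equals(right["score"])

# NaN in the same position counts as equal for equals(),
# but NaN == NaN is False element-wise.
nan_safe = left["score"].equals(left["score"].copy())
elementwise_all = bool((left["score"] == left["score"]).all())

print(same_id, same_score, nan_safe, elementwise_all)
```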
@naveenbandi9193 6 years ago
Sir, can you please upload videos on matplotlib?
@dataschool 6 years ago
Thanks for your suggestion!
@pacrii 8 years ago
Have you shown the different ways to make a DataFrame? Also, what are Panels? I don't know of any reason to use them, so I'd appreciate your input.
@dataschool 8 years ago
+pacrii I haven't yet covered different ways to create DataFrames, but that sounds like a great idea. Thanks! I have heard of Panels but have never used them... I'll check into it...
@pacrii 8 years ago
Many thanks.
@dataschool 8 years ago
I know you asked this question about creating DataFrames months ago, but I finally made a video to answer it: kzbin.info/www/bejne/Y4DZYoFnlKuVhpo Hope that helps! :)
@pacrii 8 years ago
Thanks for responding. Brilliant :)
@vishenmaharaj251 7 years ago
Great videos!! Thank you
@dataschool 7 years ago
You're welcome!
@husseinsaad7969 6 years ago
You are awesome, man!
@dataschool 6 years ago
Thanks!
@scottlucas3710 8 years ago
GREAT JOB !!
@dataschool 8 years ago
Thanks!
@hadesishykaru 7 years ago
Thanks a lot!
@dataschool 7 years ago
You're welcome!
@zjzhuang2981 7 years ago
Hi! How can I add Series to an empty DataFrame as rows?
@dataschool 7 years ago
I'm sorry, I don't understand your question. Could you clarify what you are trying to accomplish? Thanks!
@zjzhuang2981 7 years ago
I have solved my problem, thanks though!
@pacrii 8 years ago
Really sleek :)
@dataschool 8 years ago
+pacrii Thanks!
@gcm4312 8 years ago
Great format!
@dataschool 8 years ago
+Gian Carlo Martinelli Glad you like it! I'll plan to do more like this :)
@peacekeepermoe 2 years ago
1:52 I think you meant usecols=[0, 3]. I thought I did something wrong, haha. Great tutorials, very clear and concise. I am very grateful to have come across your channel and videos. God bless you!
@dataschool 2 years ago
Thanks for your kind words!
@im4485 3 years ago
Lol. Podliza must be Russian or Russian-speaking... funny word.