Albandy... Love it. Thank you for NOT editing out the mistakes. It makes it more real-world and useful.
@miguelcastillo1742 · 5 years ago
I started researching Machine Learning a couple of months ago, and it was so confusing to figure out where or how to start. After hours of research I found an answer suggesting that the best way to start with ML is to understand Data Analysis first, and this is one of the best and most complete series I found on YT. I really appreciate it, man. :D
@marisagonzalez679 · 4 years ago
Just started your Data Analysis Tutorials and I am loving them so far! You are a really talented instructor and have this charismatic energy that makes the learning process more enjoyable. Thank you very much for sharing your knowledge with us!
@atrumluminarium · 5 years ago
"pandas at the end of the day, Can be a multidimensional array, And how might that happen, by the way" Damn... Sentdex dropping those lit rhymes
@dompatrick8114 · 4 years ago
BARS
@atrumluminarium · 3 years ago
@Kingsley Ronnie thx but nobody cares
@extrememike · 4 years ago
Why would someone dislike this tutorial? I replayed it at least twice and rewound a couple of times, and ended up learning. That's what matters, right? Thanks!
@siddhantmittal1157 · 5 years ago
Every time I decide to learn something new, Sentdex is there. Keep up the good work, sentiiii...
@hfbvmARF · 5 years ago
sentdex is on a roll with these tutorials. First Django, now data analysis.
@Skull3r1121990 · 5 years ago
I just wanted to thank you for your tutorials, especially the "older" finance and visualisation ones. They helped me a lot to create my own little stock analysis program for my exam. I also recommended your channel to my professor :D I'd be happy if you could do a new finance and stock analysis tutorial, maybe without machine learning, more like the Tkinter tutorial.
@sentdex · 5 years ago
Thank you for the recommendation and kind words! Not sure about doing more finance stuff, but maybe. I just really enjoy applying ML at the end of the day to finance data.
@eugeniosp3 · 4 years ago
Agreed, you're a badass, señor.
@shaoboji437 · 5 years ago
First, thanks for the video. I think the join consumes lots of RAM because for each date there are two rows (because of the type field), so each join doubles the rows per date.
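A minimal sketch of the row doubling described in this comment, using two made-up regions (the frames and prices below are hypothetical, not the Kaggle avocado data). Because every date appears twice in each frame, `join()` pairs each left row with each right row that shares its date:

```python
import pandas as pd

# Hypothetical data: each date appears twice (one row per avocado type),
# so the Date index is NOT unique in either frame.
dates = pd.to_datetime(["2015-01-04", "2015-01-04", "2015-01-11", "2015-01-11"])
albany = pd.DataFrame({"Albany_price25ma": [1.0, 1.5, 1.1, 1.6]}, index=dates)
atlanta = pd.DataFrame({"Atlanta_price25ma": [0.9, 1.4, 1.0, 1.5]}, index=dates)

# join() aligns on index labels: for each date, 2 left rows x 2 right rows
# = 4 output rows, so the total doubles from 4 to 8.
joined = albany.join(atlanta)
print(len(albany), len(joined))  # 4 8
print(joined.index.value_counts().tolist())  # [4, 4]
```

Joining a third region with the same structure would double the rows per date again.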
@vinayakgosale8470 · 5 years ago
Quick tip: if you put a semicolon at the end of your .plot(..) functions, you will no longer see the matplotlib object printed in the output area of Jupyter just above the figures. Hope this is useful.
@sentdex · 5 years ago
Thank you for the info!
@ahmedhany5037 · 5 years ago
His videos are just GREAT! Definitely the best YouTuber out there!
@sentdex · 5 years ago
Thanks!
@SuiGio · 5 years ago
Sentdex. Been watching you since you had 1000 subs. I've loved your content ever since!
@justinsiegel2357 · 5 years ago
I love when you stop to take a sip from your always interesting coffee mugs! Also your videos are really helpful. Thank you!
@ihorchernin417 · 4 years ago
Your cups are so funny :) I thought you used the panda cup because these are tutorials about pandas, but you changed it :D Great tutorials!
@ppp1364ppp · 5 years ago
pls never stop posting videos. Great job. I learnt a lot from your channel
@sentdex · 5 years ago
No plans to stop any time soon :D
@bluejimmy168 · 4 years ago
At 18:14, is graph_df.join(region_df[f'{region}_price25ma']) a left join? Does it mean that data on the right with dates different from the left will get dropped? Does it imply that all regions' data have the same dates? Couldn't one region have more data (e.g. one region from 2000-2019 and another from 1996-2017)? For example, say California has no data on 9-15-2020 but Oregon does; then the join will drop Oregon's data for that date because it's a left join. Is pandas' default join a left join? Thanks.
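On the join question: `DataFrame.join` defaults to `how="left"`, so rows whose date exists only in the right-hand frame are dropped, and dates missing on the right become NaN. A small sketch with hypothetical California/Oregon values; passing `how="outer"` keeps the union of dates instead:

```python
import pandas as pd

# Hypothetical prices: 2020-09-15 exists only for Oregon,
# 2020-09-22 exists only for California.
california = pd.DataFrame(
    {"California": [1.00, 1.10]},
    index=pd.to_datetime(["2020-09-08", "2020-09-22"]),
)
oregon = pd.DataFrame(
    {"Oregon": [2.00, 2.10]},
    index=pd.to_datetime(["2020-09-08", "2020-09-15"]),
)

left = california.join(oregon)                # default how="left"
outer = california.join(oregon, how="outer")  # union of both indexes

print(left)   # 2020-09-15 is gone; Oregon is NaN on 2020-09-22
print(outer)  # all three dates, NaN where a region has no row
```

Note that `pd.merge` defaults to `how="inner"`, so the two functions behave differently out of the box.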
@nito__moreno · 2 years ago
What is funny about this is that since many people are using your tutorials, Copilot is filling the code automatically with your exercises!
@richardbennett4365 · 4 years ago
Good tutorial. Easy to follow. Humorous. A little strange with the English, though. "I don't know why . . . RAM is exploding, but I know why. It's because of the date." I had to go back and listen to that sentence a few times to figure out what the instructor meant exactly.
@masmoudi5595 · 5 years ago
I didn't understand what the problem with the type column is at 21:00.
@siddharthdhakane5341 · 5 years ago
Another one so fast!! I'm loving this.
@gt9538 · 5 years ago
hey Sentdex, I forgot to put in the limit [:16], and I think it blows the memory because you are basically loading a full outer join into memory. Due to the multiple dates, you keep computing combinations of rows with the same dates, and their number keeps increasing, thus *BUUUM*, I think.
@v.d.1331 · 4 years ago
guys, the reason the graph is 'noisy' is that it has 2 rows per date, one for type = organic and the other for type = conventional. If you filter by type like this, you should get a smooth graph:
import pandas as pd
df = pd.read_csv("C:/Users/windows 7/Downloads/avocado-prices/avocado.csv")
df['Date'] = pd.to_datetime(df["Date"])
albany_df = df[df['region'] == 'Albany'].copy()
albany_df.set_index('Date', inplace=True)
albany_df.sort_index(inplace=True)
albany_df[albany_df['type'] == 'organic']["AveragePrice"].plot()
@livecoderepeat7036 · 5 years ago
Thanks so much for the tutorial. Looking forward to getting up to video 6!
@firdausizaharagandes5449 · 3 years ago
what a great tutorial, many thanksss sentdex!!
@damianshaw8456 · 5 years ago
When you use set_index, you can pass verify_integrity=True; this checks that the index is unique, which is generally what you want from an index. It also means that when you use join, you don't get many-to-many joins, which increase your DataFrame size exponentially (and explode RAM).
@sentdex · 5 years ago
Cool, thanks for the info!
@fb64578 · 5 years ago
"verify_integrity = True" will not work, as there are already duplicate records (Dates), and it will not output the record sets. Apply "verify_integrity = False" and it will populate the records and speed up.
@damianshaw8456 · 5 years ago
@fb64578 that's the point, it will throw an error and let you know of duplicates. This prevents you from going any further until you understand why.
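A short sketch of the check discussed in this thread, on hypothetical rows: with `verify_integrity=True`, `set_index` raises a `ValueError` as soon as the new index has duplicate labels, while a key that really is unique (here the pair Date + type) passes:

```python
import pandas as pd

# Hypothetical rows: 2015-01-04 appears twice (conventional and organic).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-04", "2015-01-11"]),
    "type": ["conventional", "organic", "conventional"],
    "AveragePrice": [1.22, 1.79, 1.24],
})

raised = False
try:
    df.set_index("Date", verify_integrity=True)
except ValueError as err:
    raised = True
    print(err)  # reports the duplicated date(s)

# Date + type identifies each row, so this index is unique and no error occurs.
keyed = df.set_index(["Date", "type"], verify_integrity=True)
print(keyed.index.is_unique)  # True
```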
@TXfoxie · 4 years ago
Wow, this tutorial is a brand-new experience for me in relearning Pandas. I never noticed that some simple code could make RAM explode, and I had my first RAM crash.
@mahagonx · 5 years ago
To fix the RAM explosion, you removed all non-organic avocados. Is there a way to avoid that huge loss by e.g. creating a new index column "date_organic" which combines the Date and organic information but has all unique entries? (Or another solution?) Thanks!
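One possible answer, sketched on hypothetical rows for a single region. Instead of a combined "date_organic" key, this swaps in a different technique: `pivot` moves `type` into columns, so each date appears once and both organic and conventional prices are kept. The `Albany_` column prefix is just an illustrative naming choice:

```python
import pandas as pd

# Hypothetical long-format rows for one region: two rows per date.
region_df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-04",
                            "2015-01-11", "2015-01-11"]),
    "type": ["conventional", "organic", "conventional", "organic"],
    "AveragePrice": [1.22, 1.79, 1.24, 1.77],
})

# One row per date, one column per type. pivot() raises a ValueError if a
# (Date, type) pair occurs more than once, which doubles as a sanity check.
wide = region_df.pivot(index="Date", columns="type", values="AveragePrice")
wide.columns = [f"Albany_{t}" for t in wide.columns]  # hypothetical naming
print(wide)
```

Because the Date index is unique after the pivot, joining several regions built this way is one-to-one.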
@smahesh777 · 5 years ago
It is absolutely not clear why 'type' was causing the RAM consumption issue, and in the end we are plotting the data for 'organic' only, not the data for the combined sales. No one has raised a concern about this, so am I missing something here? Should I go through a different playlist before this 'Data Analysis' playlist?
@crowdquest1203 · 5 years ago
great part 2! masterful work as usual.
@pipi_delina · 4 years ago
@sentdex when I try to plot graph_df just as you did at 25:36, I get an error that goes like TypeError: no numeric data to plot...
@marx427 · 4 years ago
this is priceless for my project, love you dude
@fb64578 · 5 years ago
Could you define what 'PLU' is? I don't understand how it relates to the column header 'Type' and why it takes up more RAM. Please advise. Many thanks.
@chicolucio88 · 5 years ago
I would like to know too
@aaroncapps4057 · 4 years ago
This is the PLU (price look-up) number. It is the number grocery stores use to categorize produce.
@trexholland5389 · 4 years ago
I dropped the null values and I still get a graph with gaps, only this time it has smaller gaps at both ends rather than a single huge one at the beginning. Why is that?
@zacogames · 5 years ago
Your coffee mugs kill me!
@sohailchoudhry1139 · 4 years ago
I am new to Pandas and Python, and your tutorials are great! I suppose adding an index on Date and type would have fixed that "memory" issue: region_df.set_index(["Date", "type"], inplace=True). That would solve the problem with the speed of the JOIN, but would it return the correct data? I am not sure; I will explore it more.
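A sketch of the idea in this comment on a tiny hypothetical dataset (two regions, two types, two dates; the rolling step is left out to keep it short). With a unique (Date, type) index on both sides, each row matches exactly one row, so the row count stays at dates × types instead of multiplying with every join:

```python
import pandas as pd

# Hypothetical rows: 2 dates x 2 regions x 2 types.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04"] * 4 + ["2015-01-11"] * 4),
    "type": ["conventional", "organic"] * 4,
    "region": (["Albany"] * 2 + ["Atlanta"] * 2) * 2,
    "AveragePrice": [1.22, 1.79, 1.00, 1.76, 1.24, 1.77, 1.05, 1.70],
})

graph_df = pd.DataFrame()
for region in ["Albany", "Atlanta"]:
    # (Date, type) is unique within one region, so joins are one-to-one.
    region_df = df[df["region"] == region].set_index(["Date", "type"])
    col = region_df[["AveragePrice"]].rename(columns={"AveragePrice": region})
    graph_df = col if graph_df.empty else graph_df.join(col)

print(graph_df)  # 4 rows: one per (Date, type), one column per region
```

If a rolling mean is added later, computing it per type (for example with `groupby("type")`) keeps organic and conventional prices from being averaged together.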
@sharmakartikeya · 3 years ago
I am having trouble understanding from the point where you start graphing (15:26).
@nadyamoscow2461 · 3 years ago
Fantastic tutorial. Thanks a lot
@malavtreasurer · 5 years ago
How did the date column get into the graph_df DataFrame? You only included the region prices. Please explain. Hoping for a quick reply.
@vigneshsivamani6466 · 5 years ago
The join works by comparing index values (by default; this can be changed), so the index is kept, and the additional columns we specify from the second DataFrame are joined onto the first by matching the indexes of the two DataFrames.
@1991kgokul · 4 years ago
Is there a way to create the albany_df selection from both region and type (i.e. something like albany_df = (df[df['region'] == 'Albany' and df['type'] == 'organic']))?
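Yes; a hedged sketch on hypothetical rows. Python's `and` cannot combine two boolean Series, so the comparison has to use the element-wise `&` operator, with each condition in parentheses:

```python
import pandas as pd

# Hypothetical rows standing in for the avocado data.
df = pd.DataFrame({
    "region": ["Albany", "Albany", "Atlanta"],
    "type": ["organic", "conventional", "organic"],
    "AveragePrice": [1.79, 1.22, 1.76],
})

# `and` asks Python for a single True/False from a whole Series, which raises
# "ValueError: The truth value of a Series is ambiguous ...".
raised = False
try:
    df[df["region"] == "Albany" and df["type"] == "organic"]
except ValueError:
    raised = True

# Element-wise AND is `&`; the parentheses are required because `&`
# binds more tightly than `==`.
albany_df = df[(df["region"] == "Albany") & (df["type"] == "organic")]
print(albany_df)
```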
@stailor45 · 5 years ago
How many topics/videos do you think this playlist will cover?
@SkysimAir · 4 years ago
I liked the first part, but this one leaves me quite puzzled. If I understood correctly, you kind of dropped the "type" column, which leaves you with just a subset of the original data. But what if you wanted to have both "organic" and "conventional" avocados in your data set?
@artiomcopushiu2612 · 3 years ago
When you were fixing the RAM problem, why did you use only the organic type? It turns out you got rid of the conventional type, and that means a big part of the data.
@kevinpeeples726 · 5 years ago
At 25:00, when we have it spit out only the "organic" types, my browser still spits out the regions at about a second per region, but when I check the processes my RAM is not exploding, and my system is running 16 GB at 3000 MHz. The code is identical; any ideas as to why it is still outputting so slowly?
@kevinpeeples726 · 5 years ago
The rest of the graphing process all runs quickly, at the same pace as in the video; just spitting out the names takes a while.
@franckjh · 5 years ago
THANK YOU! I was going to take the course on your website, and it turns out you refreshed it >D
@milanlora · 5 years ago
I didn't really understand the "organic" part. What would've happened if you chose the other object? Are there multiple rows with the same date now?
@sentdex · 5 years ago
Yes, there are multiple rows with the same date. Organic vs regular. There are two prices, since organic avocados cost more than regular ones.
@siekphried · 5 years ago
@sentdex So in the end you only plotted "organic" prices? Thanks!
@cuyunal8752 · 5 years ago
Does anyone know how to make our plot bigger and clearer? If I just assign the size, my plot becomes as big as the size I assigned; however, it becomes blurry.
@ayanavadasgupta3297 · 5 years ago
When plotting the graph with graph_df.plot(), I am getting only a single curve, not the crowded one with so many legend entries shown in the tutorial. Why?
@asweqrsa · 4 years ago
I have the same issue. Could you figure out why that happened?
@avishekdas8724 · 5 years ago
Without any doubt you are the most appropriate one to teach data science.
338
For Atlanta, total rows ==> 676
For BaltimoreWashington ==> 1352
For Boise, total rows ==> 2704
Every time it's multiplied by two. What is the calculation behind this? Eagerly waiting for your answer...
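One way to read those numbers, assuming each region contributes 338 rows, i.e. D = 169 dates × 2 types (organic and conventional, as sentdex explains elsewhere in this thread), and that every region covers the same dates. Each join pairs every existing row for a date with both of the new region's rows for that date, so the rows per date double. With k regions joined:

```latex
\begin{aligned}
N_k &= D \cdot 2^{k}, \qquad D = \tfrac{338}{2} = 169 \\
N_1 &= 338, \quad N_2 = 676, \quad N_3 = 1352, \quad N_4 = 2704 \\
N_{16} &= 169 \cdot 2^{16} = 11\,075\,584
\end{aligned}
```

N_16 equals the array shape (11075584, 1) in the MemoryError quoted in later comments, and 11,075,584 float64 values × 8 bytes is exactly 84.5 MiB, the size reported in that message.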
@SS-xt5ul · 5 years ago
What is rolling? Can someone explain?
@ismaelRR · 4 years ago
Why did it go more quickly with organic?
@EmanueleCagliero · 5 years ago
For some reason, only the last Series is shown at the end, i.e. NewYork_25ma. I cannot figure out why the join does not work. Pandas version 0.24.2, Python 3.7.3
@Lucas-if8wt · 4 years ago
I do not understand why we can just join the region_df on the index. Not all dates would overlap, right? I mean, for each date the region can be different (on some dates it's Albany, on others it is Atlanta, etc.)
@uwuslayer0211 · 4 years ago
I get a gap in the chart even after dropna(). Why is that happening?
@rahimhashimov472 · 4 years ago
As you said, RAM is exploding))). At 19:40, it gives an error like: MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64
@basirucamara1648 · 4 years ago
I have the same problem
@cenkkol8759 · 5 years ago
First of all, thanks for the tutorial. I like your way of teaching very much and I appreciate the effort you put into making these videos. I have a question. As far as I know from SQL, isn't it better to make the index something unique? If we had added a custom index instead of making the date the index, wouldn't that solve the exploding RAM problem? Then we wouldn't have to discard the "type == conventional" rows.
@alikasim935 · 4 years ago
I keep getting this error: MemoryError: Unable to allocate 84.5 MiB for an array with shape (11075584, 1) and data type float64
@Derek.C · 5 years ago
At 25:12 I'm getting this error: TypeError: Empty 'DataFrame': no numeric data to plot. graph_df.tail() works fine.
@pawanbhatt314 · 4 years ago
Either your index is not set to the date, or you're trying to plot string values; just take a column with numeric values to plot.
@epmusiccover · 4 years ago
Check that the empty part is written as .empty, not .empty()
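A hedged sketch of one common cause of that error: the columns are non-numeric (for example, prices stored as text), so `.plot()` finds nothing to draw even though `tail()` shows data. The frame below is hypothetical; checking `dtypes` and converting with `pd.to_numeric` is one way to confirm and fix it:

```python
import pandas as pd

# Hypothetical frame whose prices were stored as strings (dtype object).
graph_df = pd.DataFrame(
    {"Albany_price25ma": ["1.22", "1.24", None]},
    index=pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"]),
)
print(graph_df.dtypes)  # object: nothing numeric for .plot() to draw

# Convert every column to numbers; values that cannot be parsed become NaN.
graph_df = graph_df.apply(pd.to_numeric, errors="coerce")
print(graph_df.select_dtypes(include="number").columns.tolist())
```

Separately, `.empty` is an attribute (`if graph_df.empty:`); calling it as `graph_df.empty()` fails with a `TypeError` because a bool is not callable.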
@ashwinrajesh8162 · 4 years ago
The rolling graph without sorting the indices gives me a pretty nice curve on Google Colab. Maybe the dataset changed?
@eyebiofeedback · 5 years ago
Great stuff sentdex. Pandas is amazing
@jeremymast911 · 4 years ago
So what's the difference between this approach and using pandas.pivot_table?
@adityachakraborty5863 · 5 years ago
Sir, after running the code snippet at 1:42, I am getting the following error. Please help; I am using Jupyter Notebook.
@chaitanyaj3265 · 5 years ago
Run it again... it will work.
@EduardoWurch · 5 years ago
Isn't it possible to create the "graph_df" DataFrame using the pivot_table method?
@ollie6845 · 4 years ago
MAN I LOVE YOU, YOU ROXXX
@swL1941 · 4 years ago
Why did we use if statements after the for loop?
@vanessanarayanassamy8704 · 5 years ago
Can you tell me how to install statsmodels in Python? I tried, but it does not work. Any suggestions? It works in Anaconda, though.
@kushagrgoyal · 5 years ago
What is a PLU? I got very confused as to how 'type' is causing issues with the date...
@amkarkare96 · 4 years ago
I think it is Price Look-Up; it probably references quality and other attributes.
@mikelmendibeabarrategi1102 · 2 years ago
This is great! Thanks :)
@Adhithya2003 · 4 years ago
Legend has it he still misspelled Albany and edited it.
@misharial-essa7407 · 5 years ago
First of all, thanks for such a great effort and tutorials! I am totally new to data science and Python, but I felt that the "join" function is causing the delay and RAM explosion due to redundancy (when I displayed the dataframe I found a lot of redundant dates and values). I used "concat" instead: graph_df = pd.concat([graph_df,region_df[f'{region}_price25ma']],axis=1) and it is running a lot faster and the redundancies are gone. I still might be wrong, so your feedback (or anyone's feedback for that matter) would be really appreciated.
@gbagba81 · 2 years ago
Thanks a lot, my friend, it worked amazingly. Love
@sajidalam1989 · 4 years ago
Is this all we have to learn to become a machine learning engineer?
@slavoie · 5 years ago
Thank you! This was awesome as always ;).
@fatimak6440 · 4 years ago
13:22 that coffee sip 'sounded' satisfying. screw python vizzes, now i need coffee. great vid. thank you.
@MatRIVERAGALVEZERNESTO · 3 years ago
Great content! Very well explained, with full insight into what to look for and how to fix it! Just a question: even though we choose the organic or conventional type, we still have duplicates in our Dates. Moreover, with the full data we only have 164 unique values out of 18,000+ values. Isn't the problem still there?
@devyanianan145 · 4 years ago
I got values like "AxesSubplot(0.125,0.2;0.775x0.68)" under price25ma. What's wrong? @sentdex
@LazoCerrado · 5 years ago
Sentdex, what do you think about Power BI? Is it better to use pandas?
@vikrantmahto9049 · 5 years ago
Help, this error comes up while using to_datetime: AttributeError: 'DataFrame' object has no attribute 'to_datetime'
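`to_datetime` is a top-level pandas function, not a DataFrame method, so it is called as `pd.to_datetime(...)` on the column you want to convert. A minimal sketch with hypothetical dates:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2015-12-27", "2015-12-20"]})

# df.to_datetime(...) raises AttributeError; call the pandas function instead.
df["Date"] = pd.to_datetime(df["Date"])
print(df.dtypes)  # Date is now a datetime64 column
```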
@manojperumarath8217 · 4 years ago
For me albandy_df.plot() didn't work, so I added import matplotlib.pyplot as plt and then called plt.show().
@yaqiongli4954 · 5 years ago
Thank you for doing this!! I have a question here, though. Your moving/rolling average is the average of 25 consecutive rows, so how you sort your data actually matters. If the previous records/rows belong to different regions, then plotting the rolling mean for different regions doesn't make sense to me. Can someone explain?
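That concern is right in general: `rolling()` works on row order, not on dates or groups. A sketch on hypothetical rows for one region (window of 2 instead of 25 so the toy data is long enough) that sorts by date first and then computes the rolling mean separately per type; the `price2ma` column name is made up for this example:

```python
import pandas as pd

# Hypothetical rows for one region, deliberately out of date order,
# with the two avocado types mixed together.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-18", "2015-01-04", "2015-01-11",
                            "2015-01-18", "2015-01-04", "2015-01-11"]),
    "type": ["organic", "organic", "organic",
             "conventional", "conventional", "conventional"],
    "AveragePrice": [1.9, 1.7, 1.8, 1.3, 1.1, 1.2],
})

# rolling() uses row order, so sort by date first ...
df = df.sort_values("Date")
# ... then roll within each type, so organic and conventional prices are
# never averaged together. Window 2 here; the video uses 25.
df["price2ma"] = df.groupby("type")["AveragePrice"].transform(
    lambda s: s.rolling(2).mean()
)
print(df.sort_values(["type", "Date"]))
```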
@SS-xt5ul · 5 years ago
Hello dear sentdex. I cannot work with Jupyter; instead I use PyCharm. How do I plot my DataFrame? It doesn't show any plot :(
@dopplesoddner2899 · 5 years ago
Me too.
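A minimal sketch for running outside Jupyter (PyCharm or a plain `.py` script), with hypothetical prices. In a notebook the figure is displayed automatically; in a script nothing appears until `plt.show()` is called. `figsize` sets the size in inches, and saving with a higher `dpi` is one way to get a sharper image (the file name is just an example):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical weekly prices standing in for albany_df["AveragePrice"].
prices = pd.Series(
    [1.22, 1.24, 1.17, 1.06],
    index=pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18", "2015-01-25"]),
)

ax = prices.plot(figsize=(10, 5), title="AveragePrice (hypothetical data)")

plt.savefig("average_price.png", dpi=150)  # optional: write a sharper image file
plt.show()  # outside Jupyter this opens the window (and blocks until it is closed)
```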
@jacksnack4365 · 5 years ago
This block makes me so confused. Can you explain it in simple language?
if graph_df.empty:
    graph_df = region_df[[f"{region}_price25ma"]]  # note the double square brackets! (so df rather than series)
else:
    graph_df = graph_df.join(region_df[f"{region}_price25ma"])
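A plain-language sketch of that block, on hypothetical two-row region frames. Single brackets return a Series and double brackets (a list of column names) return a one-column DataFrame. `graph_df` starts empty, and a left join onto an empty frame has no rows to keep, so the first region seeds `graph_df` directly and every later region is joined onto it by date:

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-04", "2015-01-11"])
graph_df = pd.DataFrame()  # starts empty, as in the video

# Hypothetical per-region frames; in the video these come from a loop over regions.
for region, values in [("Albany", [1.20, 1.21]), ("Atlanta", [1.00, 1.05])]:
    region_df = pd.DataFrame({f"{region}_price25ma": values}, index=idx)
    col = f"{region}_price25ma"

    if graph_df.empty:
        # First region: region_df[[col]] is a one-column DataFrame (double
        # brackets), which becomes the starting graph_df.
        graph_df = region_df[[col]]
    else:
        # Later regions: region_df[col] is a Series (single brackets); join()
        # adds it as a new column, matching rows on the Date index.
        graph_df = graph_df.join(region_df[col])

print(type(region_df[[col]]).__name__, type(region_df[col]).__name__)  # DataFrame Series
print(graph_df)
```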
@siddhantbhoite4667 · 5 years ago
I am not able to import a CSV file into JupyterLab.
@mystisification · 5 years ago
To re-create the df along categories, why don't you use the pivot_table() method instead? Cool video!
@julianurrea · 5 years ago
It could also have been done with a groupby, but this is just the 2nd video of the series; there is a lot more to cover about pandas...
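As these comments suggest, `pivot_table` can build the same kind of wide frame in one call. A hedged sketch with hypothetical organic prices for two regions and a 2-row window (the video uses 25); `rolling()` on the resulting DataFrame runs column by column, i.e. per region:

```python
import pandas as pd

# Hypothetical long-format data: one row per (Date, region, type).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-04", "2015-01-11", "2015-01-18"] * 2),
    "region": ["Albany"] * 3 + ["Atlanta"] * 3,
    "type": ["organic"] * 6,
    "AveragePrice": [1.70, 1.80, 1.90, 1.50, 1.60, 1.70],
})

organic = df[df["type"] == "organic"]

# One row per date, one column per region. If a (Date, region) pair occurred
# more than once, aggfunc="mean" would average it instead of multiplying rows.
graph_df = organic.pivot_table(index="Date", columns="region",
                               values="AveragePrice", aggfunc="mean")

# rolling() on a DataFrame runs column by column, i.e. per region.
# Window 2 for the toy data; the video uses 25.
graph_df = graph_df.sort_index().rolling(2).mean()
print(graph_df)
```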
@user-nh8ud2cj5s · 4 years ago
I get a KeyError in pandas when using the Albany region. Please tell me how to set it.
@alex_8704 · 5 years ago
Thank you for this new wonderful video
@sentdex · 5 years ago
Happy to share!
@raghavarora5077 · 4 years ago
Hello sentdex, I want to know how to get a job as an entry-level data analyst. I have done the Python and DSA course and the IBM Data Science Professional Certificate, which includes data analysis, data visualization, ML and more. How should I approach companies, and what projects should I work on to get an entry-level role?
@shyamskcetlib1409 · 5 years ago
Hi, I logged in to the sentdex Discord but I am unable to chat. What is the reason?
@ThePugEngineer · 5 years ago
I was just looking for a tutorial to visualize the reproduction of a Pug with the 349 known breeds. Seems that I have to wait for the continuation of this series, hopefully heading towards GANs soon!
@vamsivaddadi222 · 4 years ago
I really got lost at graph_df and the code that follows, so can someone please explain it?
@leefriar6991 · 5 years ago
The reason the date index was out of order in the plot was that, while the month and day were descending, the year was ascending.
@szecek · 5 years ago
At 1:20, the command is %matplotlib inline
@dwx8248 · 5 years ago
I think it is better to use %matplotlib notebook instead of %matplotlib inline
@szecek · 5 years ago
@dwx8248 Doesn't work for me. I can't see any plots when using notebook instead of inline.
@dwx8248 · 5 years ago
@szecek hmm... strange. Could it be a version issue? But hey, if inline works for you, then go with what works. I've just always liked the zoomability and interactivity that come with notebook.
@TXfoxie · 4 years ago
graph_df = region_df[[f"{region}_price25ma"]]: I understand the double brackets, but I don't understand f"{region}. What is this? A regular expression? Sentdex didn't explain it at all in his vid. I tried to check regular expressions, but I don't understand what the purpose of f"{region} is. Can someone who knows explain it?
@sentdex · 4 years ago
It's called an f string. It's a string where you pass vars inside the brackets. More info at the bottom of this page from the basics series: pythonprogramming.net/horizontal-winner-learn-python-3-tutorials/
@TXfoxie · 4 years ago
@sentdex Thank you so much Sensei Sentdex. I could not believe how quickly I forgot all about this f-string formatting. I finished that vid a month ago. I will read more about how to use f-string formatting. I really appreciate your quick response.
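A small sketch of the f-string behaviour discussed above (f-strings need Python 3.6 or newer). The prefix `f` goes before the opening quote, the expression inside `{}` is evaluated, and the quotes on both ends must match:

```python
region = "Albany"

col = f"{region}_price25ma"      # braces are replaced by the value of region
print(col)                       # Albany_price25ma

literal = "{region}_price25ma"   # no f prefix: braces stay as literal text
print(literal)                   # {region}_price25ma

# Mismatched quotes such as f'{region}_price25ma" never close the string,
# so Python stops with a SyntaxError before running anything.
```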
@kzr_567 · 4 years ago
Can you please explain the rolling mean thing? I am confused.
@angrymurloc7626 · 3 years ago
Hey, this is late, but I just saw your lonely comment and I want to help. A rolling mean (or moving average) just means you replace the value at each index with the mean value over the last n rows (25 in this video). It's called a rolling average because you are 'rolling' the 25-row averaging window over the entire dataset.
@RedDetuning · 4 years ago
Hello, @Sentdex... I am a beginner in Python and your videos are awesome. I am trying to run this code but have encountered an error and can't figure out the problem. graph_df.join (region_df['f{region}_price25ma'] seems to ask for a suffix... how can I deal with this? I have included the error as well. Anyway, thank you for your videos.
def _join_compat(self, other, on=None, how='left', lsuffix='', rsuffix='',
Error: columns overlap but no suffix specified: Index([u'{region}_price25ma'], dtype='object')
@Singh_Sahdev · 1 year ago
I think you are not assigning this value to anything. You should write graph_df = ...rest of your line of code, and then check.
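A hedged reading of that error: the reported column name is the literal text `{region}_price25ma`, which suggests the `f` prefix did not take effect (the `u'...'` in the message also typically indicates Python 2, which has no f-strings). Every region then produces the same column name, and `join()` refuses to combine overlapping columns without a suffix. A sketch with hypothetical values:

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-04", "2015-01-11"])

# Without a working f-string, every region gets the same literal column name.
a = pd.DataFrame({"{region}_price25ma": [1.20, 1.21]}, index=idx)
b = pd.DataFrame({"{region}_price25ma": [1.00, 1.05]}, index=idx)

raised = False
try:
    a.join(b)
except ValueError as err:
    raised = True
    print(err)  # columns overlap but no suffix specified: ...

# With a real f-string (Python 3.6+), each region gets its own column name.
frames = {
    region: pd.DataFrame({f"{region}_price25ma": values}, index=idx)
    for region, values in [("Albany", [1.20, 1.21]), ("Atlanta", [1.00, 1.05])]
}
joined = frames["Albany"].join(frames["Atlanta"])
print(joined.columns.tolist())  # ['Albany_price25ma', 'Atlanta_price25ma']
```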
@elahelashgari8513 · 4 years ago
Man, you teach it great, thanks
@prudhvibuditi9963 · 5 years ago
Are you going to cover machine learning again in this series?
@sentdex · 5 years ago
At the end, yes, one example to guide into it, then you're expected to follow one of the more ML-dedicated video series.
@MohitSharma-gl6mo · 5 years ago
Great series ...
@johnny_kim441 · 4 years ago
I really don't understand "rolling". Can anyone explain?
@berkayislek16 · 4 years ago
As I understand it, let's say you pass n=25 to rolling, as in the example. You pick the 25th item and take the average of that item and the previous 24 items (25 items in total). Then you take the 26th item and average it with the previous 24 items (items 2-26, 25 in total). Doing this makes your graph smoother. This is why, if you add a column named 'price25ma' to the DataFrame, you see NaN values for the first 24 items: you passed n=25 to calculate the mean. I hope I explained it well; I am also new to this stuff. Good luck! :)
@amkarkare96 · 4 years ago
It's what is used to represent the running set of the previous n values, so we could have a moving sum of the previous 25 values, or a moving average.
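A tiny numeric example of the explanations above, using `rolling(3)` on made-up values instead of 25 so the effect is visible: each result is the mean of the current value and the two before it, and the first two positions are NaN because their window is not full yet:

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Each result is the mean of the current value and the 2 before it.
ma3 = prices.rolling(3).mean()
print(ma3.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
```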
@ranjeetjha1945 · 5 years ago
Great to see that as of now there are no dislikes
@sentdex · 5 years ago
There will be some.
@Reality11075 · 1 year ago
Hello, can anybody tell me how to read a folder and all the CSV files in it?
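One common pattern, sketched under assumptions: `glob` lists the `.csv` files in a folder, each file is read with `pd.read_csv`, and `pd.concat` stacks them into one DataFrame. The function name `read_folder`, the `source_file` column, and the folder name in the usage line are hypothetical:

```python
import glob
import os

import pandas as pd


def read_folder(folder):
    """Read every .csv file directly inside `folder` into one DataFrame."""
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    if not paths:
        raise FileNotFoundError(f"no .csv files found in {folder!r}")
    # Tag each row with the file it came from, then stack all files vertically.
    frames = [pd.read_csv(p).assign(source_file=os.path.basename(p)) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Usage (hypothetical folder name):
# df = read_folder("avocado_data")
```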
@ianjj123456 · 4 years ago
File "", line 11
    graph_df = region_df[[f'{region}_price25ma"]]
    ^
SyntaxError: EOL while scanning string literal
Does anyone know what is happening? I typed exactly what the video shows.