Combining S&P 500 into one DataFrame - Python Programming for Finance p. 7

  86,113 views

sentdex


Comments: 236
@leeli2318 · 7 years ago
A small change to the original code, and then it works perfectly in the for loop:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker.replace('.', '-')))
df.set_index('Date', inplace=True)
@heratyian · 8 years ago
Dude. You have the best python videos. Thank you for your hard work.
@BrandonJacobson · 5 years ago
I just ran this code as-is and it worked perfectly. If you haven't made it through video 5 or video 6 in this series, then you may be having problems with the tickers having a newline after them, or running into issues with stock symbols having a "." instead of a "-" (like BRK-A) and Yahoo! not being able to pull the data. Look through the comments on those videos for tips on getting rid of those errors.
@vijaynyaya6603 · 3 years ago
One of the reasons I love computer science is that the programming community is so supportive and shares knowledge.
@hill2750 · 4 years ago
Over 3 years later and these videos still blow my new python mind. THANK YOU FOR BEING YOU :)
@FranVarVar · 5 years ago
Running this in December 2019, I was having problems getting data from these tickers: BKR, BRK.B, BF.B, CTVA, DOW, FOXA, FOX, NLOK. Apparently DataReader complains that there is no 'Date' value. I just wrap the call in a try/except and skip those tickers:
try:
    df = web.DataReader(ticker, 'yahoo', start, end)
    df.to_csv(f'stock_dfs/{ticker}.csv')
except:
    print(f'Problems found when retrieving data for {ticker}. Skipping!')
@xilin1063 · 5 years ago
Thank you for sharing the knowledge, it's really priceless to me
@julianurrea · 8 years ago
The axis refers to whether you're applying the pandas function across rows or columns: axis=0 means rows, axis=1 means columns. The error it throws is because you don't have any row with the index "Close", for example; but once you have one, the entire row will be dropped if axis=0.
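The axis distinction above can be sketched with a toy frame (the column names and values below are invented for illustration):

```python
import pandas as pd

# Two rows of made-up price data; the row labels stand in for the Date index.
df = pd.DataFrame({'Open': [1.0, 2.0], 'Close': [1.5, 2.5]},
                  index=['2000-01-03', '2000-01-04'])

# axis=1 drops a COLUMN label: this removes the 'Open' column.
by_col = df.drop('Open', axis=1)

# axis=0 (the default) drops a ROW label from the index.
by_row = df.drop('2000-01-03', axis=0)

# df.drop('Close', axis=0) would raise KeyError: no ROW is labeled 'Close'.
```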
@andreasj3018 · 8 years ago
Thanks for the tutorial. I think it would be great if you'd also explain how to do a daily refresh for the new data... :) At least I'm very curious about that.
@oriol-borismonjofarre6114 · 4 years ago
You majestic beast! You are a brilliant mind! I loved all the videos in your "Python Programming for Finance" playlist.
@dougp4503 · 7 years ago
I did something similar to what Jan Balke did in the compile_data function. I kept getting an exception because there is no Z.csv. Here is the code I used to get around that; I hope this helps someone, because it worked for me:
for count, ticker in enumerate(tickers):
    try:
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df.set_index('Date', inplace=True)
        df.rename(columns={'Adj Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
    except:
        print('stock_dfs/{}.csv'.format(ticker) + ' not found')
    if count % 10 == 0:
        print(count)
@X7evenA · 7 years ago
Thanks! Very helpful considering I didn't download the whole data set
@dougp4503 · 7 years ago
Glad it helped.
@dahwood2522 · 7 years ago
Thank you!
@dougp4503 · 7 years ago
Np. Glad I could help.
@dahwood2522 · 7 years ago
Yeah, but for some reason the joined closes seem to only be giving me dates, not any other information. Did you run into that problem?
@matthewgjevre9483 · 1 year ago
New to Python, following this series after going through your basic of Python series. GPT is a cheat code for this lol
@pmunin · 8 years ago
Is there any way to download other timeframes besides 1 day (1 min, 5 min, 15 min, 1 hr) from Yahoo or other sources? Sorry if I missed it in another video.
@danielbuhler7024 · 8 years ago
Just as a side note: whenever you are iterating over something time-consuming, instead of the counter method you could use tqdm. It gives you a nice progress bar with duration estimates. Just use it like this: "for ticker in tqdm.tqdm(tickers): [...]" and it works.
@sentdex · 8 years ago
Awesome, thanks for sharing, grabbed it and have already made use of it :D
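The tqdm suggestion above can be sketched as follows; tqdm is a third-party package (pip install tqdm), so this sketch falls back to a plain loop when it isn't installed, and the short ticker list is hypothetical:

```python
# Progress bar for a slow loop, with a graceful fallback if tqdm is absent.
try:
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable, **kwargs):
        return iterable  # no progress bar, but the loop still runs

tickers = ['AAPL', 'MSFT', 'GOOG']  # hypothetical short ticker list
downloaded = []
for ticker in tqdm(tickers):        # wraps the iterable with a progress bar
    downloaded.append(ticker)       # stand-in for the slow per-ticker download
```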
@adamdavis9718 · 7 years ago
If you downloaded the list recently and were getting a KeyError: 'Date', there are two B-class tickers with no data from Yahoo Finance (BRK.B and BF.B) that will kill your program. Without knowing why there wasn't any data in them, I simply put the column labels into each .csv file, and it was able to run without skewing any of my data. Hope that helps. Put this at the top of the affected .csv files in a text editor:
Date,Open,High,Low,Close,Adj Close,Volume
@yichengzhao3412 · 7 years ago
Thanks for your help. In addition to the missing BRK.B and BF.B data, I still cannot get the data for DXC. Did the same happen to you?
@toluwafayemi3123 · 7 years ago
Wow! Yeah that file is whack...that would have taken me forever to find, how'd you figure that out?!
@adamdavis9718 · 7 years ago
Toluwa Fayemi what I did was put a print statement that would output the name of the stock when finished (goes in alphabetical order). Then I went in and found which stock the program failed on, checked the file and found it had no data and couldn't pull anything from it, making an error. Took me a little while to figure out but was a pretty simple fix
@JingweiZhong · 7 years ago
Do the same for DWDP, WLTW, WYN, WYNN, XEL, XL, XLNX, XRX, XYL, YUM, ZBH, ZION, and ZTS, if you are using the most up-to-date stock data (2017 Sep. 2). Hope it helps :-)
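The manual header fix described in this thread could be scripted along these lines; `ensure_header` is a hypothetical helper for illustration, not part of the tutorial code:

```python
from pathlib import Path

# The standard Yahoo-style column header the thread above pastes in by hand.
HEADER = "Date,Open,High,Low,Close,Adj Close,Volume"

def ensure_header(path):
    """Prepend the header line to a downloaded CSV that lacks one."""
    p = Path(path)
    text = p.read_text()
    if not text.startswith("Date,"):
        p.write_text(HEADER + "\n" + text)
```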
@asivolobov · 7 years ago
Yahoo changed its API, so there are some runtime errors in the current version of the code. To repair:
1. All the information is here: pypi.python.org/pypi/fix-yahoo-finance
2. Install fix_yahoo_finance using pip:
   $ pip install fix_yahoo_finance --upgrade --no-cache-dir
3. Import fix_yahoo_finance in your code (add this at the top of your file after "import pandas_datareader.data as web"):
   import fix_yahoo_finance
4. Change the line with the 'yahoo' string to:
   df = web.get_data_yahoo(ticker, start, end)
@saadahmed9239 · 7 years ago
Thanks, but how do I install it in the Spyder IDE in Anaconda?
@arturthesimplehuman · 5 years ago
No ads, wow. Thank you :)
@vish647 · 6 years ago
@sentdex, Morningstar doesn't provide an 'Adj Close' column. It only stores the Date, Open, High, Close, Low, and Volume in the csv file.
@Locke19901 · 7 years ago
Great video, as always. What would be considered best practice here? We have all the individual CSVs - and it's quick to create the combined dataframe. We then output and save it to a new csv. Would it be more efficient to just pickle the df (using pandas of course) and save that and not mess with csv? Or is there a reason we may want the combined CSV instead? Or is this completely trivial and don't worry about it?
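On the pickle-vs-CSV question above, the trade-off can be sketched with a tiny stand-in frame (tickers and file names below are invented):

```python
import pandas as pd

# A tiny stand-in for the combined closes frame.
df = pd.DataFrame({'AAPL': [1.0, 2.0], 'MSFT': [3.0, 4.0]},
                  index=pd.to_datetime(['2000-01-03', '2000-01-04']))

# Pickle round-trips dtypes and the DatetimeIndex exactly, no parsing step.
df.to_pickle('closes.pickle')
restored = pd.read_pickle('closes.pickle')

# CSV is plain text (portable, diffable, Excel-friendly), but you must
# re-declare the index column and re-parse dates when reading it back.
df.to_csv('closes.csv')
from_csv = pd.read_csv('closes.csv', index_col=0, parse_dates=True)
```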
@Martin-ms7nb · 6 years ago
For some reason, some of the Morningstar pulls have a column labeled 'Symbol' and some don't; furthermore, Morningstar just can't pull certain tickers. For the first issue, where sentdex drops Open, High, Low, Volume, Close, *you* should drop Open, High, Low, and Volume, then use a try/except where you drop 'Symbol'. For the second issue, put the logic inside his for loop under a try, then except and pass/print your non-working tickers.
def compile_data():
    with open('sp500tickers.pickle', 'rb') as f:
        tickers = pickle.load(f)
    print(len(tickers))
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        try:
            df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
            if not df.empty:
                df.set_index('Date', inplace=True)
                df.rename(columns={'Close': ticker}, inplace=True)
                df.drop(['Open', 'High', 'Low', 'Volume'], 1, inplace=True)
                try:
                    df.drop(['Symbol'], 1, inplace=True)
                except:
                    pass
                if main_df.empty:
                    main_df = df
                else:
                    main_df = main_df.join(df)
            if count % 10 == 0:
                print(count)
        except:
            print('Cannot obtain {}'.format(ticker))
    print(main_df.head())
    main_df.to_csv('sp500_joined_close.csv')
@Martin-ms7nb · 6 years ago
Another possible solution is to drop pickles entirely and generate your list from the filenames in the directory:
csvlist = os.listdir('stock_dfs')  # list of all files in the folder, but each string includes '.csv'
tickers = []  # blank list where we will store our parsed tickers
for i in csvlist:  # iterate through the list, drop '.csv', append to the empty list
    tickers += [i[0:-4]]  # you could also use os.path.splitext, but this uses fewer characters and is probably more understandable
@marcelocanetta1892 · 5 years ago
Hi sentdex, thanks for the videos. I think there is an error in putting the series together, since the Adj Close values returned in "sp500_joined_closes" differ from the real ones in the individual dfs. In my code they came out right, but in the video they are different. Regards.
@hosseinzakariaee5465 · 3 years ago
Very good, but when I run it, it gives me this error:
pandas_datareader._utils.RemoteDataError: No data fetched for symbol MMM using YahooDailyReader
What should I do?
@juliusuotila5930 · 4 years ago
For those with "ValueError: columns overlap but no suffix specified: ..." who can't get around it with the other advice here, add this to your loop:
if count % 10 == 0:
    print(count)
# added
if count == 500:
    print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')
    return False
You can then remove
print(main_df.head())
main_df.to_csv('sp500_joined_closes.csv')
from the end of the code. This gets the file saved and you can continue with the tutorial. Something is wrong with the .join command; I couldn't get around it otherwise.
@declanmullen5326 · 3 years ago
Wish I could like this 10 times
@kosnowman · 6 years ago
Hey guys, I tried to run the code, from apatel32 as well as sentdex's official one, and whenever it reaches BRK.B it throws an error. Can I do anything about it? Could I skip it, or just compile whatever I have up to this point? Thanks!
@fbmemar · 6 years ago
Instead of using
df = web.DataReader(ticker, 'yahoo', start, end)
replace ticker with ticker.replace('.', '-'), meaning:
df = web.DataReader(ticker.replace('.', '-'), 'yahoo', start, end)
Now it can save the tickers with "." as well as those with "-".
@derbicalderon2101 · 6 years ago
@@fbmemar Thank you. Thank you. Thank you!! this should be pinned.
@mschuer100 · 5 years ago
just noticed you had this error..I just posted above on the same issue...
@tkmks8536 · 5 years ago
@@fbmemar This seemed to solve the "KeyError: 'Date'" I got in the previous video. Thanks a lot.
@rajancutting6925 · 6 years ago
Anybody else getting "ValueError: columns overlap but no suffix specified: Index(['Symbol'], dtype='object')"? Anybody know what to do?
@MatthewBarcus · 6 years ago
If you are using Morningstar like I was, then there is a column called 'Symbol'. Drop that column with the others:
df.drop(['Open', 'High', 'Symbol', 'Low', 'Volume'], 1, inplace=True)
I also differed from the code in the video by using the standard 'Close' column instead of 'Adj Close' for the line:
df.rename(columns={'Close': ticker}, inplace=True)
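An alternative to dropping the overlapping columns is to let join() rename them; a minimal sketch with invented values:

```python
import pandas as pd

# Two frames that still share column names (e.g. Morningstar's 'Symbol').
a = pd.DataFrame({'Symbol': ['MMM'], 'Close': [100.0]})
b = pd.DataFrame({'Symbol': ['MMM'], 'Close': [101.0]})

# a.join(b) would raise "ValueError: columns overlap but no suffix specified".
# Supplying suffixes disambiguates the shared names instead of dropping them:
joined = a.join(b, lsuffix='_left', rsuffix='_right')
```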
@abhishekdoke6102 · 3 years ago
Anyone getting "ValueError: columns overlap but no suffix specified: Index(['MMM'], dtype='object')"?
@mhj2724 · 6 years ago
If you get something like "ValueError: labels ['Open'] not contained in axis" (I can't remember the exact message), try this code. (I'm using the Morningstar API; I'm a Python newbie, and I'm not good at English.) This code will ignore bad data and just merge the rest into one file. The Morningstar API doesn't give you Adj Close, so I merge the data using 'Close'. Also, sometimes Morningstar can't download some of the tickers in sp500tickers.pickle; try the fix_yahoo_finance lib for those.
[ CODE ]
def compile_data():
    with open("sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        try:
            df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
            if not df.empty:
                df.set_index('Date', inplace=True)
                df.rename(columns={'Close': ticker}, inplace=True)
                df.drop(['High', 'Low', 'Open', 'Volume'], 1, inplace=True)
                if main_df.empty:
                    main_df = df
                else:
                    main_df = main_df.join(df)
                print(main_df.head())
        except:
            print('Cannot obtain data for {}'.format(ticker))
    main_df.to_csv('sp500_joined_closes.csv')
@adityaagarwal4262 · 6 years ago
Thanks dude. It worked :D
@quantumly57 · 6 years ago
Thanks it worked
@dashawnlyons2791 · 4 years ago
If not df.empty?
@dashawnlyons2791 · 4 years ago
This code not running for me
@rajivkumar8160 · 6 years ago
Hi sentdex, great work. There's a slight problem here, as Morningstar does not provide the adjusted close data, and the Google and Yahoo Finance APIs have been deprecated. I would be grateful if you or anyone could help me out with how to get the adjusted close price of the tickers.
@azazeljaxshark69 · 7 years ago
I'm losing data on the .join() function. compile_data() iterates through the pickle file fine, grabbing 505 ticker values. For some reason, when merging the dataframes (pushing "df" into "main_df" through "main_df = main_df.join(df, how='outer')") I end up with a final dataframe that has only ~40-47 columns instead of the expected 505. These aren't even sequential columns: in my case the ticker columns are accurate until "AKAM", skip to "LMT", and then skip a few more times. I verified that every other function seems to work fine, and I matched the pickle ticker values to their relevant .csv files in the stock_dfs folder, etc. I can't find anything on Stack Overflow about column loss in dataframes. Any ideas?
@azazeljaxshark69 · 7 years ago
Never mind. I realized that during the course of my troubleshooting I got data from other APIs into my .csv's that don't include a column named "Adj Close".
@ImGooblie · 8 years ago
Hey Sent, Do you cover any methods for finding outliers in any of your episodes?
@jamesburns9933 · 4 years ago
I didn't want to drop columns like 'Volume', 'Open', 'High', 'Low', and 'Close', so instead of the code in this video I used the code below to get one large df:
import glob, os
files = glob.glob('stock_dfs/*.csv')
df = pd.concat([pd.read_csv(fp).assign(Ticker=os.path.basename(fp)) for fp in files])
df['Ticker'] = df['Ticker'].str.replace('.csv', '')
Thanks for the videos, sentdex!
@TheZ10Z · 4 years ago
I made small adjustments:
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker.rstrip().replace('.', '-')))
instead of
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
I did it because the Yahoo API changed, and now the Wikipedia list doesn't work unless you change the "." to "-". But in order to combine the files, you need to change it back to ".".
@skythianz · 7 years ago
I am getting the remote data error. Weirdly, it actually pulls data from Yahoo in chunks before throwing the error. I ran it some 20 times, got 71 ticker files, and it seems unable to fetch any more. Any workaround, guys?
@ritujha7900 · 6 years ago
Hi, I am getting an error on the line:
main_df = main_df.join(df, how='outer')
ValueError: columns overlap but no suffix specified: Index([u'Ex-Dividend', u'Split Ratio', u'Adj. Open', u'Adj. High', u'Adj. Low', u'Adj. Volume'], dtype='object')
Does anyone have any idea how to correct this? I downloaded the data from Google, as Yahoo didn't work.
@engineerhealthyself · 6 years ago
try using pd.merge() instead of join(). something like: main_df = main_df.merge(df) should work
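One caveat with the merge() suggestion: a bare main_df.merge(df) joins on shared columns, and after the rename step these frames share none, so you need to point merge at the indexes explicitly. A sketch with invented prices:

```python
import pandas as pd

# After renaming, each frame has only its ticker column over a Date index.
mmm = pd.DataFrame({'MMM': [31.1, 29.9]}, index=['2000-01-03', '2000-01-04'])
abt = pd.DataFrame({'ABT': [9.5, 9.2]}, index=['2000-01-04', '2000-01-05'])

# merge() defaults to joining on shared COLUMNS; for index-aligned frames
# use the index flags to reproduce main_df.join(df, how='outer'):
merged = mmm.merge(abt, left_index=True, right_index=True, how='outer')
```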
@snhok01 · 6 years ago
I had to pull data from Morningstar, but it created two different formats depending on whether the stock had had a split, so I had to build if/else logic to drop the correct columns. Once all the files consist of the same columns, the join works. I added the block of code below:
if 'AdjClose' in df.columns:
    df.rename(columns={'AdjClose': ticker}, inplace=True)
    df.drop(['Open', 'High', 'Low', 'Close', 'Volume', 'ExDividend', 'SplitRatio', 'AdjOpen', 'AdjHigh', 'AdjLow', 'AdjVolume'], 1, inplace=True)
else:
    df.rename(columns={'Close': ticker}, inplace=True)
    df.drop(['Symbol', 'Open', 'High', 'Low', 'Volume'], 1, inplace=True)
@dahwood2522 · 7 years ago
For some reason, when I run this code, my CSV file only has a column of dates. Is there a fix for this? Thanks in advance.
@TheGbelcher · 6 years ago
I am getting this same error. Did you find the solution?
@TheGbelcher · 6 years ago
Solution: replace
df.drop(['open', 'high', 'low', 'close', 'volume'], 1, inplace=True)
with
df.drop(['open', 'high', 'low', 'volume'], 1, inplace=True)
If you are getting an empty DataFrame, it is because 'close' is getting dropped by the drop function before its name can be changed to the ticker. This makes no sense but seems to be what's going on. If anyone can explain it better, please do. (I am using IEX instead of Google or Yahoo.)
@ultimatefifagaming5719 · 5 years ago
I have this same problem; I am only getting the last ticker. Did you find the answer? Greg Belcher, I tried your fix but it did not work.
@tinatipton3291 · 3 years ago
What would you do if you wanted to keep other columns, not just adjusted close? It throws an exception that the data overlaps. I attempted to rename each of the columns; however, it doesn't quite work the same with multiple columns, and it gives me positional/keyword argument errors. I could just be doing it wrong, though; I'm not very familiar with doing data plots, so any advice from anyone would be appreciated.
@PCAN411 · 8 years ago
Do you have a video that covers importing custom-built functions from other files into a main file?
@janbalke5900 · 8 years ago
I had an error a few videos earlier, where it throws an exception when I try to get Yahoo data for a ticker that contains a dot, like BRK.B. For getting the data from Yahoo I just did the following:
try:
    df = web.DataReader(ticker, 'yahoo', start, end)
    df.to_csv('stock_dfs/{}.csv'.format(ticker))
except:
    print("DataReader Error!")
Now, in this case, I additionally put the code that gets executed in the for loop inside an if statement, so it's only executed if the ticker does not contain a dot:
if not '.' in ticker:
But still... really great videos, keep up the good work!!
@angedupont9564 · 8 years ago
Did you try df = web.DataReader(ticker.replace('.', '-'), "yahoo", start, end)? Wikipedia uses a '.' but Yahoo uses a '-'. I'm not sure my code is optimal; can someone confirm?
@janbalke5900 · 8 years ago
Oh, never thought of that... works perfectly fine, thank you!!!
@johndoucette3687 · 8 years ago
Okay. But where did you put this code?
@iNotSoTall · 6 years ago
I'm still getting a "stock_dfs/BRK.B.csv does not exist" error. Here's my enumerate(tickers) for loop:
for count, ticker in enumerate(tickers):
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
    df.set_index('Date', inplace=True)
    df.rename(columns={'Adj Close': ticker})
    df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
When I run the compile_data() function, it prints the numbers 0, 10, 20... until 70, then it gives me that error message and cancels the entire program.
@landonrobin8910 · 3 years ago
Where do I find the folder with all of the .csv files? I ran the code and everything looks correct in the console with no errors, but I am struggling to find the files. I am on a mac as well, if that makes a difference
@privateeye242 · 5 years ago
Everything works except the compile_data function. I have the sp500tickers.pickle file, I have the stock_dfs directory, and the csv files in that directory have the data. This is the code:
def compile_data():
    with open("d:sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('d:stock_dfs/{}.csv'.format(ticker))
        df.set_index('Date', inplace=True)
        df.rename(columns={'Adj Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
        if count % 10 == 0:
            print(count)
    print(main_df.head())
    main_df.to_csv('d:sp500_joined_closes.csv')
compile_data()
It does not print main_df.head() and does not make an sp500_joined_closes.csv file. When it runs, it prints the counts up to 500 and then shows this error message:
ValueError: columns overlap but no suffix specified: Index(['MMM'], dtype='object')
If I put those commands inside the "if count % 10 == 0:" block, at the same indentation as print(count), the data is printed and the sp500_joined_closes.csv file is produced, but it takes a long time, and even then I get error messages. Anyone have an idea? It has something to do with:
main_df = main_df.join(df, how='outer')
@markd964 · 6 years ago
Minor correction to this code: use a ':' instead of a ',', as shown here (from pythonprogramming.net/combining-stock-prices-into-one-dataframe-python-programming-for-finance/):
df.rename(columns={'Adj Close': ticker}, inplace=True)
@markd964 · 6 years ago
Oh, yes... sentdex fixed this in the video. Should have watched to the end! My bad...
@yuehu5315 · 6 years ago
What if companies were added to the index after 2000, or were removed at some point after 2000? What will the csv look like, and how do you deal with that?
@diegomedina1898 · 4 years ago
When I run the code I get this issue: TypeError: 'set' object is not callable. Can anybody help?
@buildwithahmet · 4 years ago
You should change "," to ":" in df.rename(columns = {'Adj Close': ticker}, inplace=True)
@Hahmzuh · 4 years ago
@@buildwithahmet Thank you Ahmet, you're awesome
@hansel7203 · 4 years ago
In the previous video I selected only the first 10 companies, [:10]. Then, when I ran this new script, I got an error that it could not find the file for #11. Where in this code would I tell it to use just the 10 files I downloaded?
@brunorafael6497 · 4 years ago
"sp500_joined_closes.csv: tokenization, wrapping and folding have been turned off for this large file in order to reduce memory usage and avoid freezing or crashing." Should i forcefully enable features? (I am using vs code)
@adampayne5619 · 2 years ago
You should consider registering your Sublime
@sentdex · 2 years ago
I've had a sublime license for years. Sometimes I forget to activate it on new machines/VMs. Thank you for your concern though.
@kmillanr · 7 years ago
Hi, I'm not getting all the tickers in my compiled file. Has anyone run into this?
@ahmedb8613 · 7 years ago
Got the same issue as well.
@ultimatefifagaming5719 · 5 years ago
Did you figure out how to fix this?
@jasonrbodie · 6 years ago
Good tutorial so far. If you're getting a ValueError: columns overlap, try the following code.
##### COMPILE DATA INTO ONE FILE #####
# 1. Uses a Python try/except because I don't have all 500 tickers
# 2. Opens each ticker file and strips unwanted columns, leaving the 'Close' value under the ticker name
# 3. Joins the stripped data into one file
import pickle
import pandas
from time import sleep

def compile_data():
    try:
        # 'sp500tickers.pickle' populated with 'morningstar' data,
        # not Yahoo or Google. I wasn't able to download all 500,
        # but did get 44 tickers.
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
        mainDataSet = pandas.DataFrame()
        for count, ticker in enumerate(tickers):
            sleep(1)  # 1 sec sleep so I can watch the printed progress
            fileDataSet = pandas.read_csv('stock_dfs/{}.csv'.format(ticker))
            fileDataSet.set_index('Date', inplace=True)
            # Used column 'Close' since Morningstar has no 'Adj Close'
            fileDataSet.rename(columns={'Close': ticker}, inplace=True)
            fileDataSet.drop(['Symbol', 'Open', 'High', 'Low', 'Volume'], 1, inplace=True)
            if mainDataSet.empty:
                mainDataSet = fileDataSet
            else:
                # Just join(fileDataSet) without how='outer' was required
                # for it to work; I kept getting "ValueError: columns overlap"
                mainDataSet = mainDataSet.join(fileDataSet)
            # Prints progress in the terminal instead of counting;
            # good if you don't have 500 tickers
            print(mainDataSet.head())
        mainDataSet.to_csv('sp500_joined_closes.csv')  # saves after the loop (if you have all 500 tickers)
    # The except triggers when a ticker file is missing
    # and saves the final version of the csv file
    except FileNotFoundError:
        mainDataSet.to_csv('sp500_joined_closes.csv')  # saves after the last ticker you have
@seanbatir4115 · 7 years ago
So... what did you guys do to fix the issue with Yahoo's URL changing? Did everyone just switch from df = web.DataReader(ticker, 'yahoo', start, end) to df = web.DataReader(ticker, 'google', start, end)? I tried this and my script runs, but for every possible stock I get the message "Cannot obtain data for".
@Aerozine50 · 7 years ago
I constantly get the date error no matter what I do to try and correct it...
@hans6973 · 7 years ago
Same
@daitavan297 · 7 years ago
Hello, did you get all 500 tickers and compile them successfully? HELP
@sarelg21 · 6 years ago
In case anybody sees this: I encountered an error where the date isn't recognized as a key. I think this can happen if your csv is empty for some reason, i.e. the data for that company wasn't downloaded. You can bypass these companies by adding a condition that the data frame you're loading isn't empty:
for count, ticker in enumerate(tickers):
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
    if not df.empty:
        df.set_index('Date', inplace=True)
        df.rename(columns={'Adj Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
    else:
        print('{} data is missing'.format(ticker))
    if count % 10 == 0:
        print(count)
main_df.to_csv('sp500_joined_closes.csv')
@nenadnikolic2728 · 7 years ago
When I display main_df.head(), I only get the column of the first company; the join doesn't work. Has anybody had issues with that?
@ultimatefifagaming5719 · 5 years ago
Did you figure out how to fix this?
@walkops · 8 years ago
I keep getting an error that the file stock_dfs/MMM does not exist, and it is clearly there. Any ideas?
@yifeiliu450 · 8 years ago
I got that message as well. If your tickers list is the complete 500-company list, you won't get that message again.
@KimmoHintikka · 8 years ago
So how did you solve this? I reloaded the data and MMM clearly is present both at stock data and tickers list.
@KimmoHintikka · 8 years ago
OK, figured this out. In my case it had nothing to do with incomplete data: I was missing the '.csv' inside the for loop using enumerate, basically naming the files without the extension. The error made no sense, but since I fixed that, it runs fine.
@walkops · 8 years ago
You're right! That did the trick!
@angedupont9564 · 8 years ago
Hey! If you decided not to retrieve the whole 500-company list but (for example) only 50 companies, you should modify this:
for count, ticker in enumerate(tickers[:50]):
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
    df.set_index('Date', inplace=True)
@gabrielberbig9454 · 4 years ago
Anyone else notice that setting the index with inplace=True causes the 'Date' header to appear one row below all the other headers?
@yudikubota293 · 4 years ago
In my case I was getting a "date is not a column" error; for some reason the csv file was missing the first column (date). I solved this by resetting the index and renaming:
df = web.DataReader(ticker, 'whatever', start, end)
df.reset_index(level=0, inplace=True)
df.rename(columns={'index': 'date'}, inplace=True)
df.to_csv(csvPath)
@ryanshrott9622 · 8 years ago
great tut!
@aprilmeng74 · 5 years ago
Ryan Shrott same here
@prempatil3263 · 4 years ago
I am facing a memory error; please help me out.
@naveenv3097 · 8 years ago
Hi, is there a way to stack data on top of other data in one dataframe? I have the same stock's data split across 3 or 4 files in 2-year periods. Thanks :)
@julianurrea · 8 years ago
Naveen V something like df.append(df2) ?
@DAcasado · 7 years ago
Hi sentdex, thanks for your videos. I have a question: how would we modify the program so that for each stock and date we get both the adj close and the volume, all together? I used:
df.rename(columns={'Adj Close': ticker, 'Volume': ticker}, inplace=True)
which works, but I see output in this format:
                MMM        MMM       ABT       ABT
Date
2000-01-03  2173400  31.131128  10635000  9.459574
2000-01-04  2713800  29.894130  10734600  9.189300
2000-01-05  3699400  30.760029  11722500  9.172408
2000-01-06  5975800  33.234026  17479500  9.493358
2000-01-07  4101200  33.893758  15755900  9.594710
With this format, since MMM labels both the volume and the price, I don't know how to work with either one. Any ideas, anyone? Thanks in advance.
@DOKtheDJ · 7 years ago
You probably don't need it anymore, but I would make a separate data frame for volume. You can then pick out the info that you need from those dfs based on the ticker name.
@adarshsharma681 · 2 years ago
Running compile_data() shows "None of ['Date'] are in the columns", maybe because many companies' data was delisted from yfinance and wasn't shared, so many companies' files are empty. Can anyone solve this?
@andreipoehlmann913 · 8 years ago
Yahoo API requests are limited to 2k/hour per IP via public access; with an OAuth API key it's 20k/hour. I've been trying to figure out how to set up OAuth in combination with python/pandas but couldn't find any solution. I would really appreciate any help on this, maybe as a side note on your website (or as a comment here ;) ).
@Rygorius · 6 years ago
When I run the code, the counter starts to slow down and then grinds to a halt at about 410; I left it running and my computer restarted. Since I don't get an error, it's hard to troubleshoot. Has anyone else run into this issue? Thanks!
@douglasholman6300 · 6 years ago
I am having the same problem, have you found any fixes to this issue?
@Rygorius · 6 years ago
@@douglasholman6300 I just ran through the series from scratch and now it's working... I'm not entirely sure what the problem was. I think I might have only partially downloaded the sp500 dataset.
@douglasholman6300 · 6 years ago
@@Rygorius Hey Ryg -- everything looks to be working on my end as well. Cheers
@6konstis926 · 5 years ago
Hey, I am having the same problem. Do you have any advice? I already tried reinstalling the dataset Thank you
@Madmartigan6 · 7 years ago
I keep getting the error "RemoteDataError: Unable to read URL", every time I run it the code can get 1 or 2 stocks then I get the error and have to run it again. Any idea why this is happening?
@rohitupadhyay4665 · 6 years ago
Trying the code for the Indian Nifty 50 stocks. I'm facing a memory error in concatenate_join_units, at concat_values = concat_values.copy(): MemoryError. Any solution?
@sarelg21 · 6 years ago
In case anybody sees this, I encountered an error where the date isn't recognized as a key. I think this can happen if your csv is empty for some reason, i.e. the data for that company wasn't downloaded. I think this can lead to other function-breaking errors as well. You can try bypassing these companies by adding a condition that the data frame you're loading isn't empty:

for count, ticker in enumerate(tickers):
    df = pd.read_csv('stocks_dfs/{}.csv'.format(ticker))
    if not df.empty:
        df.set_index('Date', inplace=True)
        df.rename(columns={'Adj Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
    else:
        print('{} data is missing'.format(ticker))
    if count % 10 == 0:
        print(count)
main_df.to_csv('sp500_joined_closes.csv')
@jessejohnchiasson4254 · 5 years ago
Nice one! This worked great
@darwinchan5573 · 5 years ago
I just copied the whole code and ran it in VS Code, but got an error in the 2nd iteration of the count/ticker enumerate for loop: ValueError: columns overlap but no suffix specified: Index(['Unnamed: 0'], dtype='object'). Seems to be a problem with the dataframes. I just wonder why Sentdex's code doesn't get such an error.
@darwinchan5573 · 5 years ago
I just added index_col when reading the csv file, and now the results are fine.
df = pd.read_csv('stock_dfs/{}.csv'.format(ticker), index_col=0)
@natevannortwick4554 · 6 years ago
In the part 6 video I capped the S&P 500 query at 10 tickers (for ticker in tickers[:10]:); I used IEX, though, because Yahoo is broken now. Now when I try to execute the compile_data() function it doesn't work because I don't have all of the .csv files that I need. It appears to work for the first ten, but when it finds a ticker in the S&P 500 pickle file that doesn't have an associated .csv file, I get an error. Any ideas on how to fix this? Sorry if this is a stupid question, I'm a noob. Thanks in advance.
@connorbarrick3975 · 6 years ago
@sentdex I'm having the same issue. I think Morningstar throttles at 44 tickers and can't pull after that. I tried inserting time sleeps to overcome it but that didn't work; found the issue in this section too. Then in the enumerate function I put a [:25] on it and it only pulled the dates for those... I'm a rookie too, any help you can provide?
@connorbarrick3975 · 6 years ago
I guess the Morningstar API gets stuck on a few others too. It's better to insert a piece of code to skip the ones that don't return a response and continue with the ones that do. I ended up with 497 CSV files in total:

for ticker in tickers:
    try:
        print(ticker)
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'morningstar', start, end)
            df.reset_index(inplace=True)
            df.set_index('Date', inplace=True)
            df = df.drop('Symbol', axis=1)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))
    except:
        print('Cannot obtain data for ' + ticker)

When any ticker gets stuck, a keyboard interrupt (Ctrl + C) helped me move on to the next one. Try this out.
@amirvahid7143 · 6 years ago
Thanks. It appears to me that pip install fix_yahoo_finance solves all of the problems, since Quandl is not quite a good fit for this tutorial because it only downloads ~55 of the S&P 500.
@FireTeamSix. · 3 years ago
I have been trying to do this tutorial, but keep running out of memory after running compile_data(). It keeps saying "Unable to get 121 MB of Memory", so I think my RAM might be too small (16GB DDR4). Does anyone have a workaround to this? Would I need to only pull 200 companies as opposed to all 500? Thank you so much.
@IonicCascade · 8 years ago
since not all the companies started back in 2000, how do you replace all the empty cells cleanly?
@igors1131 · 8 years ago
Depending on how you store your data, say in my_data, you can use a for loop: for i, j in enumerate(my_data). Here i is the index, j is the my_data value, and you can test each value with numpy.isnan(j).
@ElLenzo · 8 years ago
There are different ways to handle missing values. For example, you could use a forward fill method in combination with a backward fill method ( pandas.pydata.org/pandas-docs/stable/missing_data.html ). This is probably the easiest but least precise method. You can also impute the missing values, for example through mean or median values. You could also perform a regression analysis to compute the missing values. It's up to you which method to choose. This is what I currently use:

# fill missing values using forward
# and backward fill method
def fill_missing_values(df_data):
    """Fill missing values in data frame, inplace."""
    df_data.fillna(method="ffill", inplace=True)
    df_data.fillna(method="bfill", inplace=True)
    return df_data

Greetings
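As a quick illustration of what the forward-then-backward fill accomplishes, here is a minimal sketch with made-up prices (not real data):

```python
import numpy as np
import pandas as pd

# a toy price column with a leading gap and a mid-series gap
df = pd.DataFrame({"AAPL": [np.nan, 100.0, np.nan, 102.0]})

# forward fill propagates the last known price downward;
# backward fill then covers the leading NaN that ffill can't reach
df["AAPL"] = df["AAPL"].ffill().bfill()

print(df["AAPL"].tolist())  # [100.0, 100.0, 100.0, 102.0]
```

The order matters: running bfill first would pull future prices backwards over every gap, which is usually the less defensible choice for financial data.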
@danyalpanjwani14 · 4 years ago
I'm getting the following error:

  File "/Users/danpanjwani/opt/anaconda3/envs/ECON341/lib/python3.7/site-packages/pandas_datareader/yahoo/daily.py", line 160, in _read_one_data
    raise RemoteDataError(msg.format(symbol, self.__class__.__name__))
RemoteDataError: No data fetched for symbol MMM using YahooDailyReader

What should I do?
@TheZ10Z · 4 years ago
This link might help you: stackoverflow.com/questions/54854276/no-data-fetched-web-datareader-panda
@aligh8803 · 6 years ago
Just for the info: right now the Yahoo, Google, and Morningstar do not work. you could use "Quandl" instead.
@ragavaarajesh · 6 years ago
can you please share the code ?
@aligh8803 · 6 years ago
I would refer you to this link: www.quandl.com/tools/python
But basically it would be something like:

import quandl
mydata = quandl.get("FRED/GDP")

Some databases need free registration, after which you get a token and do:

quandl.ApiConfig.api_key = "YOUR_KEY_HERE"

But anyway, it lacks a lot of databases. The dirty work would be to get the daily data directly from Yahoo or Google.
@clinthastings353 · 6 years ago
Yahoo data seems to work again, today being 11/11/18
@aprilmeng74 · 5 years ago
@Clint Hastings: still cannot download all the data, only part of it. Confused, 11/11/2019
@vk1094 · 4 years ago
Generated frames are not getting saved. The folder is created but the files are not saved. Please help, anyone?
@ReactsRebirth · 8 years ago
Hey Sentdex, I tried to do this with a specific sector and I'm getting an error. Do you think you can point me to the reason? The AAL file is located inside a folder named airline_sector, so I'm not sure.

def compile_data():
    main_df = pd.DataFrame()
    for ticker in enumerate(stockToPull):
        df = pd.read_csv('airline_sector/{}.txt'.format(ticker))

IOError: File airline_sector/(0, 'AAL').txt does not exist
@TheSBraun58 · 8 years ago
enumerate(stockToPull) returns a tuple. If you write 'for ticker in stockToPull:' your code will work. Alternatively if you need the count provided by enumerate() you can write 'for count, ticker in enumerate(stockToPull):'.
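To illustrate the point about enumerate() yielding tuples, here is a small standalone sketch (the folder path is just the one from the question, with a stand-in ticker list):

```python
stocks = ["AAL", "DAL", "UAL"]  # hypothetical ticker list

# without unpacking, each loop item is the whole (index, value) tuple,
# so the filename becomes "(0, 'AAL').txt" instead of "AAL.txt"
for ticker in enumerate(stocks):
    print('airline_sector/{}.txt'.format(ticker))
    break  # airline_sector/(0, 'AAL').txt

# unpacking the tuple gives the plain ticker string
for count, ticker in enumerate(stocks):
    print('airline_sector/{}.txt'.format(ticker))
    break  # airline_sector/AAL.txt
```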
@MrRaznos · 4 years ago
In order to solve 'ValueError: columns overlap but no suffix specified', change:

main_df = main_df.join(df, how='outer')

to:

main_df = main_df.merge(df, how='outer', on='Date')
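For context, a tiny standalone sketch of the merge-on-Date variant, using made-up dates and prices:

```python
import pandas as pd

# two tiny per-ticker frames with made-up closes
a = pd.DataFrame({"Date": ["2020-01-01", "2020-01-02"], "MMM": [170.0, 171.0]})
b = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "ABT": [86.0, 87.0]})

# outer merge keeps the union of dates; unmatched cells become NaN
main_df = a.merge(b, how="outer", on="Date")

print(main_df.shape)            # (3, 3)
print(main_df["Date"].tolist())  # ['2020-01-01', '2020-01-02', '2020-01-03']
```

Note that merge keeps Date as a regular column rather than the index, which is why no join suffix conflict can arise on it.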
@utsavkadam1675 · 4 years ago
Lovely, cheers mate :)
@dimitrijb · 4 years ago
Amazing ! Thank you very much !
@abhishekdoke6102 · 3 years ago
I'm getting this error: "ValueError: columns overlap but no suffix specified: Index(['MMM'], dtype='object')"
@nikifoxy69 · 7 years ago
Can someone help with the error below? ValueError: stat: path too long for Windows (Python)
@abluntdaily · 8 years ago
If I rerun the function, will it create duplicate files or continue where it left off? It's taking a really long time, so I think I might be getting throttled. On my last attempt I was able to get 60 files, but then I deleted the folder and it started right up again when I ran the function. I have 256 files now, so I don't want to delete the folder, but I also don't want to rerun the function if it will create duplicate files. It's been about an hour and I'm stuck at 256 files.
@sentdex · 8 years ago
Windows will not prompt you when overwriting files with Python. The reason we have the following code:

for ticker in tickers:
    if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
        df = web.DataReader(ticker, "yahoo", start, end)
        df.to_csv('stock_dfs/{}.csv'.format(ticker))

is precisely for this reason, however. If the ticker data is already there, we won't pull it again.
@abluntdaily · 8 years ago
sentdex, thanks for answering. Yeah, I shortly realized that after. I kept rerunning the function, but now I'm stuck at 316 files, and every time I run it I immediately get an error because it can't get past one of the tickers.
@abluntdaily · 8 years ago
I already applied the fix I found in the comment section to get past Berkshire Hathaway:
df = web.DataReader(ticker.replace('.', '-'), 'yahoo', start, end)
So I have 316 files now, but I don't understand what it's stuck on and why. From the link in the error message it looks like it's Morgan Stanley?
RemoteDataError: Unable to read URL: ichart.finance.yahoo.com/table.csv?s=MS&a=0&b=1&c=2000&d=11&e=31&f=2016&g=d&ignore=.csv
@abluntdaily · 8 years ago
Never mind, it looks like there was just something wrong with that link before. Now when I click it, it downloads the file, so it must have been an error Yahoo fixed. The function is working again.
@Theminecrafter2598 · 6 years ago
When I run this code I get the error "File b'stock_dfs/ALL.csv' does not exist". Can you help me figure out how to fix this?
@onlinetrades1016 · 6 years ago
Copy-paste this in the for loop, like below:

for count, ticker in enumerate(tickers):
    mapping = str.maketrans(".", "-")
    ticker = ticker.translate(mapping)
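What the maketrans/translate pair does, sketched on a couple of the dotted tickers (Wikipedia writes them with '.', Yahoo expects '-'):

```python
# build a one-character translation table: '.' -> '-'
mapping = str.maketrans(".", "-")

for raw in ["BRK.B", "BF.B", "MMM"]:
    print(raw.translate(mapping))
# BRK-B
# BF-B
# MMM
```

For a single-character swap, ticker.replace('.', '-') does the same job; maketrans is handy when several characters need remapping at once.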
@aprilmeng74 · 5 years ago
@Online Trades 101: the error is not caused by the '.' but because the data she downloaded is incomplete.
@solutioncomedy9681 · 5 years ago
Does anyone else experience problems when the ticker reaches 'BRK.B'? I get the "KeyError: 'Date'" warning. Unfortunately I am not able to find the solution by myself...
@vinayaksidharth5088 · 5 years ago
It's because you haven't fully saved all the S&P 500 companies. Look at the folder and check whether 500 csv files are there. Suppose there are only 60 files:

def compile_data():
    with open("sp500tickers.pickle", "r+b") as f:
        tickers = pickle.load(f)[:60]

Run this.
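The trick is just list slicing on the unpickled list; a self-contained sketch with a throwaway pickle file and stand-in tickers:

```python
import os
import pickle
import tempfile

tickers = ["MMM", "ABT", "ABBV", "ACN", "ATVI"]  # stand-in ticker list

# write a throwaway pickle, mimicking sp500tickers.pickle
path = os.path.join(tempfile.mkdtemp(), "sp500tickers.pickle")
with open(path, "wb") as f:
    pickle.dump(tickers, f)

# load it back and keep only the first 3 entries, like pickle.load(f)[:60]
with open(path, "rb") as f:
    subset = pickle.load(f)[:3]

print(subset)  # ['MMM', 'ABT', 'ABBV']
```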
@solutioncomedy9681 · 5 years ago
@@vinayaksidharth5088 Thanks for your quick answer. The problem does not occur at the compile_data() function but at the get_data_from_yahoo() function... If I choose to only use a few companies of the sp500, the whole function works like a charm.
@skmn07 · 8 years ago
awesome
@Anthony-db7ou · 4 years ago
Hey, any idea what's going on here? I checked the stock_dfs folder and it didn't store anything. Trying to figure out what went wrong. Best, Carmine

OUTPUT (key frames):

KeyError Traceback (most recent call last)
  File pandas_datareader/yahoo/daily.py, line 157, in _read_one_data
    data = j["context"]["dispatcher"]["stores"]["HistoricalPriceStore"]
KeyError: 'HistoricalPriceStore'

During handling of the above exception, another exception occurred:

  File pandas_datareader/yahoo/daily.py, line 160, in _read_one_data
    raise RemoteDataError(msg.format(symbol, self.__class__.__name__))
RemoteDataError: No data fetched for symbol MMM using YahooDailyReader
@loveeven90 · 4 years ago
You need to remove the trailing newline from each ticker:

ticker = ticker.replace('.', '-')
tickers.append(ticker.strip())

Use the lines above to get rid of the newline character.
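To show what strip() is fixing here, a tiny sketch with a hypothetical scraped cell value:

```python
raw = "BRK.B\n"  # what a scraped Wikipedia table cell can look like

ticker = raw.replace(".", "-")  # switch to the Yahoo-style symbol
ticker = ticker.strip()         # drop the trailing newline
print(repr(ticker))  # 'BRK-B'
```

Without the strip(), the newline ends up inside the filename and the URL sent to Yahoo, so the request fails.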
@shanghaifoodie · 8 years ago
In the code main_df = main_df.join(df, how = 'outer' ) ## Is how = 'inner' also ok?
@henningsperr8063 · 8 years ago
I think inner will only keep all rows that appear in both data frames, so you would not have the NaN columns and throw away the data (could cost you years of data depending on when some of the companies launched)
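A small sketch of the difference, with made-up dates and prices: outer keeps the union of dates (with NaNs), inner keeps only the intersection:

```python
import pandas as pd

# one ticker has prices from an earlier date than the other
old = pd.DataFrame({"MSFT": [50.0, 51.0, 52.0]},
                   index=["2000-01-03", "2000-01-04", "2000-01-05"])
new = pd.DataFrame({"GOOG": [85.0, 86.0]},
                   index=["2000-01-04", "2000-01-05"])

outer = old.join(new, how="outer")  # union of dates; GOOG gets NaN early on
inner = old.join(new, how="inner")  # intersection only; early MSFT rows vanish

print(len(outer), len(inner))  # 3 2
```

With 500 companies listed at different times, an inner join would shrink the table to the history of the youngest constituent, which is why the outer join is the safer default here.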
@varadjams · 3 years ago
Compiling this in April 2021: I end up getting a NameError: name 'main_df' is not defined. Don't understand why; anybody got any solutions?
@thevaibhavgaur4953 · 5 years ago
how can we change the currency?
@beansgoya · 7 years ago
I'm getting a ton of "cannot obtain data" messages, and a bunch of the files are not even stock data, just text. Anyone else running into this?
8 years ago
Nice :P
@hans6973 · 7 years ago
KeyError: 'Date' - how do I solve this?
@hans6973 · 7 years ago
import datetime as dt
import os
import pandas as pd
from pandas_datareader import data as pdr
import fix_yahoo_finance
import pickle

def save_sp500_tickers():
    df = pd.read_html("en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]
    df.columns = df.ix[0]
    df.drop(df.index[0], inplace=True)
    tickers = df['Ticker symbol'].tolist()
    with open('sp500tickers.pickle', 'wb') as f:
        pickle.dump(tickers, f)
    return tickers

def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open('sp500tickers.pickle', 'rb') as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2017, 6, 29)
    for ticker in tickers:
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df_yahoo = pdr.get_data_yahoo(ticker, start, end)
            df_yahoo.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

def compile_data():
    with open('sp500tickers.pickle', 'rb') as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df.set_index(['Date'], inplace=True)
        df.rename(columns={"Adj Close": ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Close', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
        print(ticker)
        print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')

compile_data()

## what's wrong with my code
@sentdex · 7 years ago
keyerror means you likely attempted to reference a column in your dataframe that doesn't exist.
@mushfiqurmashuk6948 · 7 years ago
Running into the same problem, in this line: df.set_index('Date', inplace=True). Any progress?
@olatomiwaakinlaja4978 · 5 years ago
Fix: df = web.DataReader(ticker.replace('.','-'), 'yahoo', start, end) - but the error pops up again after a while.
@aprilmeng74 · 5 years ago
Tommy Akins, same here. Did you fix it? Thanks.
@看細看係 · 5 years ago
Not sure if anyone else is facing a strange issue: there are 2 companies, BRK-B and BF-B, for which I cannot read the history data from Yahoo, but when I search manually they really exist and have all-time data. Also, I guess the S&P 500 index from Wikipedia uses the format BRK.B, while in Yahoo the company data is stored as BRK-B; the same thing happens with BF.B / BF-B. If I run the code, errors like this come out one by one:
FileNotFoundError: [Errno 2] File b'stock_dfs/BF.B.csv' does not exist: b'stock_dfs/BF.B.csv'
Is anyone else facing that problem? I am using VS Code to write the code. Thanks.
@tonihuhtiniemi1222 · 5 years ago
See previous video, someone also asked and replied :) "Wikipedia uses "." instead of "-" in their list. Had to translate "." to "-" so it would get past Berkshire Hathaway. Just for anyone running into this." ---->
@iNotSoTall · 6 years ago
It's still giving me the "kzbin.info/www/bejne/oGHdiJKBjd6EgJY" error when trying to enumerate. What sort of code can move past that error so it can just skip enumerating it if the file isn't there?
@simonromano · 6 years ago
Anyone else getting a KeyError: 'Date' when running the function??
@SamNaSam · 5 years ago
Check if the tickers start with the same date - probably not
@aprilmeng74 · 5 years ago
Yes, I got the same 'Date' error, totally confused.
@DiptiranjanHarichandan · 8 years ago
It's showing this error and I am unable to figure out what exactly it means. Can you please help me out :(

Traceback (most recent call last):
  File "combining-sp500-p7.py", line 72, in <module>
    compile_data()
  File "combining-sp500-p7.py", line 55, in compile_data
    df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
  File "/home/diptiranjan/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/diptiranjan/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/diptiranjan/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "/home/diptiranjan/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/diptiranjan/.local/lib/python3.5/site-packages/pandas/io/parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
  File "pandas/parser.pyx", line 667, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:8449)
FileNotFoundError: File b'stock_dfs/MMM.csv' does not exist

But I have the stock_dfs/MMM.csv file stored.
@scottik187 · 8 years ago
Did you find a solution to this? I got the same error.
@DiptiranjanHarichandan · 8 years ago
Yeah... you are basically not saving the files in .csv format. Check the code where it stores the stock values in CSV format.
@scottik187 · 8 years ago
Thanks - I tried but I can't see an error. It looks exactly the same.

for ticker in tickers:
    print(ticker)
    if not os.path.exists('stocks_dfs/{}.csv'.format(ticker)):
        df = web.DataReader(ticker, 'yahoo', start, end)
        df.to_csv('stocks_dfs/{}.csv'.format(ticker))
@jayc578 · 6 years ago
Regarding the axis, here is a source for you to refer to. stackoverflow.com/questions/25773245/ambiguity-in-pandas-dataframe-numpy-array-axis-definition
@yuxuanliu4661 · 4 years ago
After combining the data by 'Date', the dataframe only contains 'MMM' (only one company). Has anyone had the same problem? Thanks!
@asdfasdfwae · 4 years ago
ya me too. did you get any soln.?
@dashawnlyons2791 · 4 years ago
Ticker = row.findAll('td')[0].text.strip()
@Xtremefiresnake · 5 years ago
if you are having problems with "Date" , try "date"
@aprilmeng74 · 5 years ago
Xtremefiresnake, hi. Can you elaborate how? Really confused about the 'Date' error. Many thanks.
@samg7247 · 6 years ago
Can someone maybe explain to me why I am having issues with pandas and datareader? I was able to recreate the code from part 5 with no issues, but from this video it is giving me issues with lines (in order of traceback) 5, 2, 14, and 1. At the bottom of the error it says ImportError: cannot import name 'is_list_like'. I am trying to use Morningstar data, as I have noticed comments saying there are issues with the Google and Yahoo APIs.
@patelal · 7 years ago
Here is the code if you are using the Google stock API, since Yahoo no longer seems to work.

import bs4 as bs
import datetime as dt
import os
import pickle
import requests
##import matplotlib.pyplot as plt
##from matplotlib import style
##from matplotlib.finance import candlestick_ohlc
##import matplotlib.dates as mdates
import pandas as pd
import pandas_datareader.data as web

##style.use('ggplot')
##start = dt.datetime(2000,1,1)
##end = dt.datetime(2016,12,31)
##df = pd.read_csv('tsla.csv', parse_dates=True, index_col=0)

def save_sp500_tickers():
    resp = requests.get('en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text)
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = []
    for row in table.findAll('tr')[1:]:
        ticker = row.findAll('td')[0].text
        tickers.append(ticker)
    with open("sp500tickers.pickle", "wb") as f:
        pickle.dump(tickers, f)
    print(tickers)
    return tickers

#save_sp500_tickers()

def get_data_goog(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open("sp500tickers.pickle", "rb") as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2016, 12, 31)
    for ticker in tickers:
        print(ticker)
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):
            df = web.DataReader(ticker, 'google', start, end)
            df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))

#get_data_goog()

def compile_data():
    with open("sp500tickers.pickle", "rb") as f:
        tickers = pickle.load(f)
    main_df = pd.DataFrame()
    for count, ticker in enumerate(tickers):
        df = pd.read_csv('stock_dfs/{}.csv'.format(ticker))
        df.set_index('Date', inplace=True)
        df.rename(columns={'Close': ticker}, inplace=True)
        df.drop(['Open', 'High', 'Low', 'Volume'], 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            main_df = main_df.join(df, how='outer')
        if count % 10 == 0:
            print(count)
    print(main_df.head())
    main_df.to_csv('sp500_joined_closes.csv')

compile_data()
@LimePhD · 7 years ago
I encountered an issue with the Google API on certain tickers (e.g. LMT), which needed their exchange specified in the API call. In the below, I added a new variable to concatenate 'NYSE:' and the ticker, and a try block to attempt both the vanilla and 'NYSE' version of each ticker. The code is confirmed working.

def get_data_from_google(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open('sp500.pickle', 'rb') as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):  # check if directory exists
        os.makedirs('stock_dfs')  # if not, create new directory
    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2017, 11, 1)
    for ticker in tickers:  # loop through ticker list
        query_name = 'NYSE:{}'.format(ticker)  # specify exchange; can be extended for non-NYSE stocks
        if not os.path.exists('stock_dfs/{}.csv'.format(ticker)):  # if file doesn't exist
            try:
                df = web.DataReader(query_name, 'google', start, end)  # query google for historical data
                df.to_csv('stock_dfs/{}.csv'.format(ticker))  # save csv of data
            except:
                df = web.DataReader(ticker, 'google', start, end)  # fall back to the plain ticker
                df.to_csv('stock_dfs/{}.csv'.format(ticker))
        else:
            print('Already have {}'.format(ticker))
@strawtak3265 · 5 years ago
cool man
@strawtak3265 · 5 years ago
It throws an error, see below:

Traceback (most recent call last):
  File "C:\Users\straw\Desktop\Stock\finance.py", line 156, in <module>
    get_data_goog()
  File "C:\Users\straw\Desktop\Stock\finance.py", line 150, in get_data_goog
    df = web.DataReader(ticker, 'google', start, end)
UnboundLocalError: local variable 'start' referenced before assignment
@KevinJohnson01 · 5 years ago
I had two issues by the end of this video, both caused by me...
1) I only wanted to analyze 5 stocks instead of the entire S&P 500, because my internet is super slow.
2) I didn't see a point to the count variable, so I decided to remove it, whoops.

How to fix problem 1) I stuck with Sentdex's style and wrote a function to reduce the S&P 500 list we acquired in an earlier lesson, then pointed to the new .pickle appropriately.
Step 1) Add the following definition to your code and run the call to get your new .pickle:

def sp500_pickle_reducer():
    tickers = pickle.load(open("sp500tickers.pickle", "rb"))
    print(tickers[0:5])  # prints list for verification.
    with open("sp5tickers.pickle", "wb") as f:  # name the file accordingly, this example has 5 tickers.
        pickle.dump(tickers[0:5], f)  # replace 5 as needed; this controls the number of tickers put into your new pickle.

# sp500_pickle_reducer()  # you'll need to un-comment this in order to call the function.

Step 2) Update the two references in def compile_data():

with open("sp500tickers.pickle", "rb") as f:
main_df.to_csv('sp500_joined_closes.csv')

changed to:

with open("sp5tickers.pickle", "rb") as f:
main_df.to_csv('sp5_joined_closes.csv')

Note: it isn't necessary to change to 'sp5_joined_closes.csv' and it'll probably make things more interesting in future lessons, but it looks nicer to me.

Problem 2) You must have "count, ticker" in the line "for count, ticker in enumerate(tickers):"
Step 1) If you get the error "FileNotFoundError: [Errno 2] File b"stock_dfs/(0, 'MMM').csv" does not exist", then you probably removed "count, " from "for count, ticker in enumerate(tickers):" and just need to put it back to get everything right again.
@liangyumin9405 · 6 years ago
This may work (2018-08-24):

pip install fix_yahoo_finance

import fix_yahoo_finance as yf
yf.pdr_override()
df = web.get_data_yahoo(ticker, start=start, end=end)
@amacodes7347 · 2 years ago
For anyone watching this video as of Dec 2022: this version works better and uses only the yfinance package to get data (the pandas-datareader package doesn't work anymore).

import bs4 as bs
import datetime as dt
import os
import pickle
import pandas as pd
import requests
import yfinance as yf

def get_sp500_tickers():
    resp = requests.get('en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'html.parser')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = [row.find('td').text.strip().replace('.', '-') for row in table.find_all('tr')[1:]]
    return tickers

def get_stock_data(ticker):
    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2020, 11, 26)
    df = yf.download(ticker, start=start, end=end)
    df['Ticker'] = ticker
    return df

def create_stock_dataframe():
    tickers = get_sp500_tickers()
    df_list = [get_stock_data(ticker) for ticker in tickers]
    df = pd.concat(df_list)
    return df

df = create_stock_dataframe()

Or this version, closer to the tutorial:

def save_sp500_tickers():
    resp = requests.get('en.wikipedia.org/wiki/List_of_S%26P_500_companies')
    soup = bs.BeautifulSoup(resp.text, 'html.parser')
    table = soup.find('table', {'class': 'wikitable sortable'})
    tickers = [row.find('td').text.strip().replace('.', '-') for row in table.find_all('tr')[1:]]
    with open('sp500tickers.pickle', 'wb') as f:
        pickle.dump(tickers, f)
    return tickers

def get_data_from_yahoo(reload_sp500=False):
    if reload_sp500:
        tickers = save_sp500_tickers()
    else:
        with open('sp500tickers.pickle', 'rb') as f:
            tickers = pickle.load(f)
    if not os.path.exists('stock_dfs'):
        os.makedirs('stock_dfs')
    start = dt.datetime(2000, 1, 1)
    end = dt.datetime(2020, 11, 26)
    for ticker in tickers:
        if not os.path.exists(f'stock_dfs/{ticker}.csv'):
            df = yf.download(ticker, start, end)
            df.reset_index(inplace=True)
            df.set_index('Date', inplace=True)
            df.to_csv(f'stock_dfs/{ticker}.csv')
        else:
            print(f'Already have {ticker}')

get_data_from_yahoo()
@tanmayebhatia · 4 years ago
Anyone else not seeing the exported sp500_joined_closes CSV anywhere? I keep getting deprecation warnings but the script runs. Please help!