
The Super Easy Way to Scrape FBREF for Free Soccer Data

  13,362 views

McKay Johns

1 day ago

Comments: 58
@JuniorFilmz7 · 1 month ago
Man, your videos are so useful and easy to follow; you should have way more views and subscribers! Keep up the great work, man, you're the best!
@MuhammedShameel-qd3oi · 15 days ago
Man, this means a lot. I have a project coming up, but I have no knowledge in this field. Since I'm interested in football, I chose a football-related topic, and I had no clue how I would get the data.
@oscarmelendezcodina6225 · 10 days ago
I'm doing the same. I want to build a site like SofaScore, but I don't really know how to do it, so if you have knowledge in this, please contact me.
@YTykh · 4 months ago
Thanks a lot for this, man! I've been trying to do a little personal project on data analytics and this was really useful. Subbed! :)
@robertc2121 · 8 months ago
Outstanding, your content is really good and incredibly useful. I didn't know you could pass the attrs within a read_html call... liked and subscribed :)
@brunosaintclair2911 · 8 months ago
You have the best content on YouTube, for sure.
@Ndhming · 5 days ago
When I set the id for the squad & player stats category, the player shooting table id says it cannot find the table. How can I solve this problem?
@DreamTim · 10 months ago
Hey, great video! read_html is a great solution, but I think it might run into problems with edge cases like duplicate names (when different people have the same name) and could treat them as one. I've added the href element to the table as a unique identifier, but I'm not sure if that can be done through read_html.
@Indiancitizen577 · 10 months ago
Hi, I really love the tutorial. I have one question: when trying to read the HTML into a dataframe, I'm hitting HTTP Error 403: Forbidden, which I have seen many times when trying to scrape data from FBRef. Would really appreciate any workarounds for this, thanks.
@McKayJohns · 10 months ago
Can you try something like this?
headers = {"User-Agent": "pandas"}
df = pd.read_html("https://fbref.com/", storage_options=headers)
@McKayJohns · 10 months ago
Or pass in an actual User-Agent.
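For reference, a fuller sketch of that User-Agent workaround. The header string, URL, and sample HTML below are illustrative, not taken from the video; the parsing step is shown offline so the pattern is clear even when the site blocks you:

```python
from io import StringIO

import pandas as pd
import requests

# fbref returns 403 Forbidden for the default urllib User-Agent,
# so send a browser-like header instead (this string is illustrative).
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def read_tables(html: str) -> list:
    """Parse every <table> in an HTML string into DataFrames."""
    # Wrapping in StringIO avoids pandas' deprecation warning for literal HTML.
    return pd.read_html(StringIO(html))

# Live usage (network call; hypothetical URL -- substitute the page you want):
# html = requests.get("https://fbref.com/en/comps/9/Premier-League-Stats",
#                     headers=HEADERS).text
# tables = read_tables(html)

# Offline demo of the same parsing step:
sample = "<table><tr><th>Player</th></tr><tr><td>Saka</td></tr></table>"
tables = read_tables(sample)
print(tables[0].iloc[0, 0])  # → Saka
```

Fetching with requests and parsing the response text separately (rather than giving read_html the URL) also makes it easier to inspect the raw HTML when a table mysteriously fails to appear.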
@ArmanTheAnalyst · 7 months ago
@@McKayJohns I have tried many times, even using a User-Agent, but it's not working. Can you make a detailed video on it?
@babababa201 · 9 months ago
Hey, I'm curious about your course: how are there reviews if it's only available for pre-order?
@TheMizgan · 2 months ago
Thanks, mate. But what do we do if we want to scrape the table after pressing the per-90 toggle on the FBRef website? Generally, the non-toggled version is what gets scraped.
@Eduardo17Oliveira · 9 months ago
Excellent content!!
@sandipanibasu9240 · 1 month ago
Hey McKay, I was reading through the Use of Data page on fbref. I'm in two minds about whether we can still scrape the data, use it for visualization, and share it on social media, or whether we have to request the data officially?
@Iamoscar08 · 5 months ago
Your videos are so easy to follow.
@McKayJohns · 5 months ago
Thank you!
@Parrahimovic · 8 months ago
Hey man, I've been following you on Twitter for a couple of years now. I'm a big fan; you inspired me to start my first data analytics course a couple of months ago. My knowledge is quite limited right now. What prior knowledge do I need in order to successfully complete the course?
@McKayJohns · 8 months ago
You should be fine with no prior experience! There's a Discord community where you can get help from me and others as well if you get stuck :)
@blyatcyker99 · 8 months ago
@@McKayJohns Hey man, can I have the invite for the Discord?
@FootballObserver-n2x · 7 months ago
Great video. I noticed that only the first 6 rows get pulled; how do you pull the entire table?
@McKayJohns · 6 months ago
Hm, which page are you looking at?
@veeckhout9410 · 8 months ago
Hi man, amazing video! I tried it myself and it worked for the table you used and other team-related tables. But when I tried, for example, the passing table of Bundesliga 2 (all the players and their stats), it did not work because it says it can't find any tables, while I'm quite sure I'm using the right id. I tried it for different leagues, etc., but it didn't work. Do you know what could have gone wrong and how to solve it?
@McKayJohns · 8 months ago
With some tables fbref is kinda weird... Maybe try this code to see if it works:

from io import StringIO

import requests
import bs4
import pandas as pd

response = requests.get('https://fbref.com/en/comps/32/stats/Primeira-Liga-Stats')
soup = bs4.BeautifulSoup(response.content, 'html.parser')

# Some fbref tables are shipped inside HTML comments, so parse those out.
comments = soup.find_all(string=lambda text: isinstance(text, bs4.Comment))
commented_out_tables = [bs4.BeautifulSoup(cmt, 'html.parser').find_all('table') for cmt in comments]
commented_out_tables = [tab[0] for tab in commented_out_tables if len(tab) == 1]

df = pd.read_html(StringIO(str(commented_out_tables[0])))[0]
@veeckhout9410 · 8 months ago
@@McKayJohns Thanks, man! I appreciate the help and your videos!
@pramodhsairam9670 · 5 months ago
@@McKayJohns I was facing the same issue and couldn't match the table id either. You saved me, man. Thanks a ton for your work. Keep rocking 🔥
@stargalextr5400 · 9 months ago
Hey, thanks for the tutorial, but I'm facing urllib.error.HTTPError: HTTP Error 403: Forbidden. How did you get around that?
@YTykh · 4 months ago
I had the same issue. Please let me know if you've found a solution. ;_;
@Tonitaco · 8 months ago
How could you export that data into an Excel or CSV from there? And what is your opinion on VS Code?
@McKayJohns · 8 months ago
If you turn it into a pandas dataframe, you can export it that way to either CSV or Excel. And I prefer PyCharm over VS Code for Python work, but Jupyter Notebooks is where I usually tell everyone to get started.
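A minimal sketch of that export step; the stand-in DataFrame and filenames below are hypothetical:

```python
import pandas as pd

# Stand-in for a scraped fbref table.
df = pd.DataFrame({"Player": ["Saka", "Haaland"], "Goals": [14, 27]})

# One call each to write CSV or Excel; index=False drops the row-index column.
df.to_csv("fbref_stats.csv", index=False)
# df.to_excel("fbref_stats.xlsx", index=False)  # requires openpyxl to be installed

# Round-trip check: the CSV reads back with the same shape.
print(pd.read_csv("fbref_stats.csv").shape)  # → (2, 2)
```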
@kevinpaul5541 · 9 months ago
Hey, would it be okay if I use this data for my dissertation? I'm trying to use machine learning with this data.
@McKayJohns · 9 months ago
Yeah, that should be fine.
@shourya5420 · 4 months ago
Hey, fbref have now implemented Cloudflare, which prevents scraping. Is there any workaround?
@Virgilplaydirty · 2 months ago
The Bright Data site, but it's paid.
@RahulPathakoti-ig2py · 7 months ago
Hi McKay Johns, I really appreciate the easy way to scrape tables from websites. I need help scraping data from 2007 to 2024 (Premier League) for all teams. Could you please help me with this? Thanks, Rahul
@McKayJohns · 7 months ago
The easiest way to do it is to get a list of all the URLs you want to scrape, then use a for loop over each one and scrape the table you need off each page.
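That loop can be sketched as below. The fetcher is injected so the table-stacking logic can be demonstrated without hitting the network; the fbref URL pattern in the comment is an assumption and should be verified against the site before use:

```python
from io import StringIO
from typing import Callable, List

import pandas as pd

def scrape_first_tables(urls: List[str], fetch: Callable[[str], str]) -> pd.DataFrame:
    """Loop over URLs, take the first table on each page, and stack the results."""
    frames = [pd.read_html(StringIO(fetch(url)))[0] for url in urls]
    return pd.concat(frames, ignore_index=True)

# Live usage would pass a real fetcher (browser User-Agent plus a pause,
# since fbref rate-limits aggressively), e.g.:
#   import requests, time
#   def fetch(url):
#       time.sleep(6)
#       return requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
# with a hypothetical season URL list such as:
#   urls = [f"https://fbref.com/en/comps/9/{y}-{y+1}/{y}-{y+1}-Premier-League-Stats"
#           for y in range(2007, 2024)]

# Offline demo with a stub fetcher standing in for the network call:
def stub(url: str) -> str:
    return "<table><tr><th>Squad</th></tr><tr><td>Arsenal</td></tr></table>"

combined = scrape_first_tables(["page-1", "page-2"], stub)
print(len(combined))  # → 2
```

Injecting the fetcher also makes it easy to cache pages to disk on the first run, so re-running the loop doesn't re-download seventeen seasons of HTML.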
@RahulPathakoti-ig2py · 7 months ago
@@McKayJohns Thanks a lot, I'll try to scrape those URLs!!
@wiktoria8437 · 7 months ago
@@RahulPathakoti-ig2py Let me know if it worked :D
@JuanGonzalez-hs4lf · 9 months ago
Does anyone know a video that does this in R?
@reasetilo · 6 months ago
Just learn Python, bro 😭 I was on here asking the same question two years ago, gave up, and learned Python.
@satyajeetpatil8177 · 10 months ago
Great stuff
@tomkmb4120 · 4 months ago
Did my comment get deleted from this video!? wtf
@samali323 · 9 months ago
74 for a course is kinda wild, but great video.
@Chris_0303 · 9 months ago
awesome!!
@SymonsKerwin-d1m · 4 months ago
Auer Corners
@andrewchen2349 · 8 months ago
Thank you for your video! May I know if you encountered this problem with read_html?
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
Thank you
@andrewchen2349 · 8 months ago
Oh, never mind, I figured it out: add a 'User-Agent' in the headers. Thank you!
@McKayJohns · 8 months ago
Nice, haha. Sorry for the late reply.
@jose99martin · 6 months ago
@@andrewchen2349 How did you add the User-Agent? I have tried it and I still get the same error:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
response = requests.get(url, headers=headers)
df = pd.read_html(response.text, attrs={"id": ID_TABLE_STANDARD})[0]
@jerzyjaneczek9328 · 6 months ago
@@andrewchen2349 Hi there, what did you change your 'User-Agent' value to in the headers?