Easy Web Scraping in Python using Pandas for Data Science

  79,096 views

Data Professor

Comments: 123
@KenJee_ds 4 years ago
I didn't know about this pandas functionality! Great video!
@DataProfessor 4 years ago
Wow, it's Ken Jee! Thanks for the comment and kind words! I also subscribe to your channel, great content by the way, especially the 6-part DS project from scratch series.
@KenJee_ds 4 years ago
@DataProfessor Thanks! I am loving your stuff as well. I need to start using colab more. Keep up the good work, the tutorials are very helpful!
@karthiavenger4577 4 years ago
You great bro Down to earth
@HVjugo 3 years ago
I used this before, but I didn't know that you can select the table using the brackets, awesome! Thanks for the video!
@DataProfessor 3 years ago
Glad it's helpful, thanks for watching!
@monicadesai7928 4 years ago
Great Explanation of each step....right from opening file to end....because sometimes as a newbie we find difficult to which file to use from github also.....Thank you ....Great Video!
@DataProfessor 4 years ago
Wow thanks for the encouraging words, glad you’ve found the video helpful 😊
@givansot4581 3 years ago
thanks a lot. I am doing a machine learning project and do web scraping in the same code...thanks this is better
@TcRiverrat18 2 years ago
Excellent work breaking this down. I have only used R, but this seemed incredibly intuitive. Thank you!
@soufianelamsiah4337 3 years ago
what would be best for comparing prices between competitors?
@usmanafridi9668 3 years ago
Amazing! I am totally new to web scraping. I tried to scrape the website using beautiful soup library for 4 days now, but I can't get past the basics. You have extremely simplified it for me. For instance, I just scraped data from Wikipedia about the list of countries and their population and got the whole table in the first attempt. Thank you so much! I wonder if this can be used for other pages like LinkedIn, Glassdoor data collection? Because there are no tables there. Professor, thank you so much once again!
@DataProfessor 3 years ago
Glad to hear that the video was helpful! For non-tabular pages you may have to use beautifulsoup and/or selenium
@mj7146 4 years ago
Great content ! Any idea on how I can scrape data for example from linkedin Jobs Postings. I found Octoparse for this, any ideas?
@DataProfessor 4 years ago
Thanks Mert for the kind comment. pandas works only for tabular data from webpages. For linkedin posts, we'll probably have to use beautiful soup for that. I might make a future video about that, will put it into the to-do list.
@mj7146 4 years ago
Data Professor thank you 🙏
@DataProfessor 4 years ago
@mj7146 A pleasure!
@oguguaonyinyechi4980 4 years ago
@DataProfessor Hi Data Professor, we are still expecting this 😁
@shwetaredkar734 4 years ago
Informative.
@DataProfessor 4 years ago
Thanks Shweta for the kind comment!
@badraboufirasse433 4 years ago
Very helpful thank you!
@DataProfessor 4 years ago
Thanks Badr for the kind words!
@jojushaji3010 4 years ago
Ure awesome sr
@DataProfessor 4 years ago
Thanks for the kind words
@Troglodyte2021 4 years ago
A great tutorial!
@DataProfessor 4 years ago
Thank you!
@cllim80 4 years ago
Thank you for the clear explanation !
@DataProfessor 4 years ago
A pleasure! Thanks for watching 😃
@kalyanprasad4069 4 years ago
How do we deal when we encounter the error "HTTP Error 403: Forbidden" while reading url with Pandas? How should we proceed in this case? Kindly advise.
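A common cause of this error is that the site rejects requests that do not look like they come from a browser. One workaround people sometimes try, sketched here under that assumption, is to download the page yourself with a browser-like User-Agent header and then hand the HTML text to pandas. The URL, the header value, and the helper names build_request and fetch_html are illustrative placeholders, not something shown in the video. Some sites block scraping regardless of headers, so it is worth checking the site's terms first.

```python
import urllib.request

# Hypothetical placeholder URL; substitute the page you want to read.
URL = "https://example.com/stats.html"


def build_request(url):
    """Create a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch_html(url):
    """Download the page and return its HTML as text (needs network access)."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read().decode("utf-8", errors="replace")


request = build_request(URL)

# With the HTML in hand, pandas can parse it without fetching the URL itself:
# import io, pandas as pd
# tables = pd.read_html(io.StringIO(fetch_html(URL)))
```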
@blankmedia01 4 years ago
Hey, I tried using the code to scrape tables on Wikipedia. When a page has loads of other data and I just want to pull one table, is there a method for that? With the current code I'm pulling the whole page, and I just want the playoff stats... I think I'm supposed to create a dictionary and then assign it to a dataframe, but I don't know how when it comes to URLs and websites.
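For picking out a single table, two options may help: read_html returns a list of DataFrames, so brackets select one by position, and its match parameter keeps only tables whose text matches a string or regex. The sketch below uses a made-up HTML snippet (the column names are invented for illustration) and assumes an HTML parser such as lxml is installed.

```python
import io

import pandas as pd

# Made-up page: some prose plus two tables.
html = """
<p>Some text that is not part of any table.</p>
<table>
  <tr><th>Player</th><th>Season PTS</th></tr>
  <tr><td>A</td><td>20</td></tr>
</table>
<table>
  <tr><th>Player</th><th>Playoff PTS</th></tr>
  <tr><td>A</td><td>25</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds.
all_tables = pd.read_html(io.StringIO(html))
second_table = all_tables[1]  # pick a table by position with brackets

# match= keeps only the tables whose text matches the given string or regex.
playoff_tables = pd.read_html(io.StringIO(html), match="Playoff")
playoffs = playoff_tables[0]
```

On a real page the position of a table can change when the site is edited, so matching on text that only appears in the table you want tends to be the sturdier choice.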
@aniwahidaabdulrahim2538 4 years ago
Hello Professor, I would like to suggest you to publish a video about RSelenium which use with Selenium Webdriver for automation system testing :D Hope it may benefits others. This is just my humble suggestion.
@DataProfessor 4 years ago
Great suggestion! I have played around with Selenium for Python and have found it pretty powerful. What I made so far was a short script that can take screenshots of my youtube channel's page (or any webpage).
@randyluong6275 4 years ago
this tutorial gets my subscription. Thank you Professor. :)
@DataProfessor 4 years ago
Wow, glad to hear that, welcome aboard 😃
@rogerwprice 4 years ago
Fabulous - it's soooo easy when you know how!
@DataProfessor 4 years ago
Thanks for watching Roger, absolutely agreed with that 😃
@da_ta 4 years ago
Great well explained clear and excellent quality of sound. Thanks for doing this keep it up!
@DataProfessor 4 years ago
Thanks for the encouragement 😃
@nourarifi2642 4 years ago
thank you for your video my question if there are many tables in so many pages (20000 page) what should I do ???
@DataProfessor 4 years ago
The pandas read_html function is suitable for a simple webpage with relatively few tables. For more complex and large volume of pages I would recommend to look into beautifulsoup and selenium.
@luciferkhusrao 4 years ago
Awesome work by the hero! Keep teaching like this
@DataProfessor 4 years ago
Thanks for the encouragement 😃
@pauloreis8868 4 years ago
Hi, Professor! Thank you for the contents you brings to us, it really helps! \o/ Lately, I've been asking myself: How important is web scraping for a data scientist? How often do you web scrape? I just started learning it, I'll keep going and I wanted to know your thoughts about its relevance.
@DataProfessor 4 years ago
Hi Paulo, webscraping comes in handy when you want to create your own dataset from available data on the internet. For example, you want to analyze the salary of data scientists from glassdoor database then you can do that with webscraping. Hope this helps 😃
@kennykern6292 4 years ago
This helped thanks!
@DataProfessor 4 years ago
Glad it helped!
@muhammadjamalahmed8664 4 years ago
Please don't stop making videos. These videos really helps alot.
@DataProfessor 4 years ago
Thank you, glad it was helpful!
@vyacheslavgorkunov3790 4 years ago
Thx for the video, was really helpful. I wish u more subscribers, man ;)
@DataProfessor 4 years ago
Thanks for the support! 😃
@salikmalik7631 4 years ago
Really awesome.. Data Professor
@DataProfessor 4 years ago
Salik, Thanks!
@melshae8630 6 months ago
Wow your video is the best , it took me forever to run this .This video helped me in 5 min. Thank you !!!
@nickolaisimmons4638 2 years ago
Wow this is a great video! Very well organised!
@prashant381 2 years ago
A query, in row 12 , why are we using .index along with df.drop ? why wouldn't df.drop work without it ?
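A likely reason for the .index: drop() removes rows by their index labels, whereas a comparison such as df2019['Age'] == 'Age' produces a True/False mask rather than labels. Taking .index of the filtered rows converts the mask into the labels drop() expects. The sketch below uses a small made-up stand-in for the scraped table (the Player column and all values are invented) and also shows boolean filtering, which gives the same result without drop().

```python
import pandas as pd

# Made-up stand-in for the scraped table: the header row repeats inside the data.
df2019 = pd.DataFrame({
    "Player": ["A", "Player", "B"],
    "Age": ["25", "Age", "31"],
})

mask = df2019["Age"] == "Age"   # boolean Series: True on repeated header rows
labels = df2019[mask].index     # index labels of those rows

dropped = df2019.drop(labels)   # drop() removes rows by label
filtered = df2019[~mask]        # equivalent: keep rows where the mask is False
```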
@kwanpakshing 3 years ago
The video is great. But the screen text is way too small to read. I suggest enlarging the font or reducing the white space on the screen to make the video more readable.
@DataProfessor 3 years ago
Thanks for the suggestion, greatly appreciate it, yes in recent videos I have increased the font size.
@engr.inigo.silva2000 2 years ago
Bravo Data Professor, nice lecture!
@shankaricharan510 8 months ago
Thanks a lot - this helped a lot.
@fazlaynur4509 3 years ago
Thanks bro, for your nice tutorials
@DataProfessor 3 years ago
It's my pleasure
@manishabheemanpelly3580 3 years ago
Thank you so much for this concept it was really time saving one!
@DataProfessor 3 years ago
Glad it was helpful!
@amoahs7779 3 years ago
Hi professor I truly enjoy your videos and have learnt a lot may God keep you successful in life. A question that's been on my mind is what laptop do you use as I really like the keyboard sound when you type unless you are using a external keyboard. Is it possible for you to show us a set-up of your desk ? Kind regards
@DataProfessor 3 years ago
Hi, I'm using a MacBook Pro (2016) and yes the keyboard feel is good on this laptop although being a bit flat which is a good thing as it allows minimal effort in moving from one button to the next.
@danniliu2544 3 years ago
Hi Data Professor, thanks for this video. It's very helpful. I'm a newbie starting out in data science and web scraping. Just wondering can you use pandas functionality for scraping data that are not laid out in table? and how would you do that? could you perhaps create a video on scraping non tabular data if you haven't already?
@DataProfessor 3 years ago
Great question, to web scrape non-tabular data you can look into using beautiful soup and also selenium libraries for Python
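As a rough illustration of the Beautiful Soup route for pages without tables, the sketch below pulls fields out of repeated div blocks. The HTML, the class names (job, title, company), and the values are all made up for the example; a real page needs its own selectors, found by inspecting its source.

```python
from bs4 import BeautifulSoup

# Made-up HTML with no <table>: each listing is a <div>.
html = """
<div class="job"><h2 class="title">Data Analyst</h2><span class="company">Acme</span></div>
<div class="job"><h2 class="title">Data Scientist</h2><span class="company">Globex</span></div>
"""

soup = BeautifulSoup(html, "html.parser")  # parser that ships with Python

jobs = []
for card in soup.find_all("div", class_="job"):
    jobs.append({
        "title": card.find("h2", class_="title").get_text(strip=True),
        "company": card.find("span", class_="company").get_text(strip=True),
    })
```

A list of dictionaries like jobs can then be passed to pd.DataFrame(jobs) to get back to a table.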
@danniliu2544 3 years ago
@DataProfessor thank you for the pointer, much appreciated!
@Panucci75 3 years ago
Exactly the question I was gonna ask. Thanks.
@sangpark7656 3 years ago
Hi Professor does the original data need to be a html file to start with? Does the original data always need to have a table to extract data?
@DataProfessor 3 years ago
Yes to both questions, that’s the limitation of this approach. Other than that selenium + beautifulsoup is a good combo to look into.
@sangpark7656 3 years ago
@DataProfessor I see. Thank you very much for the guidance!!
@Moonlight-jx2sj 4 years ago
Amazing! your video helped me with my 1st homework in Data Mining. And also thinking to jump into data science, so Thank you so much! Like and Subscription!
@DataProfessor 4 years ago
Glad I could help! And welcome to Data science!
@wisjnujudho3152 2 years ago
this is exciting. i love pandas
@legacylifey182 3 years ago
Thank you so much for this concept it was really helpful respect !
@spacebird9430 4 years ago
Hey professor, thank you for the content. But I was wondering: when we are scraping by just passing the link, how does it know to only read data from the table and not any other information?
@DataProfessor 4 years ago
Hi, the function will detect HTML syntax. Tables in HTML are marked up with the <table> tag, and the read_html() function looks for these tags to figure out which parts of the page are tables and extracts their data.
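To make the idea concrete, here is a rough illustration using only Python's standard library. It is not how pandas implements read_html; it merely shows that scanning for table, tr, td, and th tags is enough to separate table content from the rest of a page. The class name TableExtractor and the sample HTML are made up, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a table cell (e.g. in <p>) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


html = (
    "<p>Intro text</p>"
    "<table><tr><th>Player</th><th>Age</th></tr>"
    "<tr><td>A</td><td>25</td></tr></table>"
)
parser = TableExtractor()
parser.feed(html)
tables = parser.tables
```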
@lolsucks3599 2 years ago
Is there an api for sports results? or you have to do it via web scraping?
@lucianodomingues2290 4 years ago
Great video Professor!
@DataProfessor 4 years ago
Glad you liked it!
@RyanLoh 2 years ago
Can you also use df2019(df2019['Age'] == 'Age') to find the ages containing the word 'Age'?
@narongtumsri-ubol1737 4 years ago
thank for knowledge
@DataProfessor 4 years ago
A pleasure, thanks for watching
@raphaellutz2693 2 years ago
Very nice video
@DataProfessor 2 years ago
Thanks :)
@markslima1557 2 years ago
very cool thanks!
@sanjj_1 3 years ago
f strings are more readable compared to the .format() method
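For comparison, here are three ways to build the same URL from a year. The URL pattern is a hypothetical placeholder, not the one from the video. All three produce the same string; f-strings need Python 3.6 or later.

```python
# Hypothetical URL pattern; the year is the only part that changes.
year = 2019

url_format = "https://example.com/stats/{}.html".format(year)    # str.format()
url_fstring = f"https://example.com/stats/{year}.html"           # f-string
url_concat = "https://example.com/stats/" + str(year) + ".html"  # concatenation
```

Concatenation needs the explicit str(year) conversion, which is one reason the other two forms are often preferred when numbers are involved.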
@tannyamishra9291 2 years ago
Can you please explain how to read all the retrieved urls
@vaasudhfp2874 3 years ago
not working for other sites i did it for tripadvisor nothing came
@priyalshah8869 2 years ago
How do I keep the URL that the column tm has in my dataframe?
@nowdevoted1649 4 years ago
Superb, let me bring you some more guys to your channel
@DataProfessor 4 years ago
Awesome, welcome to the channel!
@nowdevoted1649 4 years ago
@DataProfessor 🙏
@ekoatm1914 3 years ago
Thank you very much, brother....
@sameermehdi3143 2 years ago
Thankyou so much sir
@AmitKumar-hm4gx 3 years ago
Do you know if we can use this to scrape sites built with dynamic JS, and how do we do this if we have to login ?
@piyushyadav7162 1 year ago
Hi! ken jee, I tried your web scraping code on Kaggle but I'm getting RLError: error. I tried to solve it but cannot resolve it... please give me your suggestions
@DataProfessor 1 year ago
Hi Piyush, The pandas library allows scraping webpages that have tabular data such as from Wikipedia. It is really limited to those with a predefined table format. To scrape webpages I'd recommend looking into selenium and beautifulsoup
@argiepoul7457 3 years ago
What are the prerequisites to watch this tutorial? I know some python, is this ok?
@DataProfessor 3 years ago
Yes, beginner’s level of Python is sufficient to follow along.
@XoreLP 3 years ago
Why did you use string.format instead of string concatenation?
@Papiii_benz 4 years ago
Thanks !
@DataProfessor 4 years ago
Thanks for watching!
@harshitsharma8131 2 years ago
what if there is no table on a web page ??
@moatasimashraf6818 3 years ago
(ImportError: lxml not found, please install it) I got this error. what is the solution?
@DataProfessor 3 years ago
Hi, you can install lxml via pip install lxml
@moatasimashraf6818 3 years ago
@DataProfessor Done it, thank U
@lyhuutai3339 3 years ago
how to save df to excel ? please
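One way this might be done is with the DataFrame's to_excel method, which needs an Excel writer package such as openpyxl installed. The sketch below wraps it in a hypothetical helper, save_table, that falls back to CSV for other file extensions; the data and file names are made up, and the demonstration writes a CSV so it runs without extra packages.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Player": ["A", "B"], "Age": [25, 31]})  # made-up data


def save_table(frame, path):
    """Write a DataFrame to .xlsx (via to_excel) or to CSV otherwise."""
    if path.endswith(".xlsx"):
        # to_excel needs an Excel writer package such as openpyxl installed.
        frame.to_excel(path, index=False)
    else:
        frame.to_csv(path, index=False)


# Demonstrate with CSV so the sketch runs without extra packages.
csv_path = os.path.join(tempfile.mkdtemp(), "players.csv")
save_table(df, csv_path)
restored = pd.read_csv(csv_path)
```

Calling save_table(df, "players.xlsx") would produce an Excel file instead; index=False leaves out the row numbers.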
@saulo_foot 3 years ago
Every link turns into a df. How can I concatenate all the dfs?
@DataProfessor 3 years ago
Hi, dfs can be concatenated using the pd.concat() function, you can play around with axis=0 or axis=1 depending on how you want to combine the dfs (side by side or stacked on top of the other)
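A minimal sketch of both axis settings, using two small made-up DataFrames that stand in for tables scraped from separate links:

```python
import pandas as pd

# Two made-up tables with the same columns, e.g. one per scraped page.
df_a = pd.DataFrame({"Player": ["A", "B"], "Age": [25, 31]})
df_b = pd.DataFrame({"Player": ["C"], "Age": [28]})

# axis=0 stacks the rows; ignore_index=True renumbers them from 0.
stacked = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligning on the row index.
side_by_side = pd.concat([df_a, df_b], axis=1)
```

When looping over many links, collecting each DataFrame in a list and calling pd.concat once at the end is usually simpler than concatenating inside the loop.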
@mootaz3944 3 years ago
i try it on ur channel ( just for testing lol )
@qi8983 2 years ago
Awesome
@tareqmahmud3902 3 years ago
You look like jomatech's big brother :O
@DataProfessor 3 years ago
Haha, I get that a lot. Joma and I should do a collab video 😆
@tareqmahmud3902 3 years ago
@DataProfessor But Sir I learned a week's lesson from one of your 10 minute video. I can't be more grateful to you. Thank you.
@DataProfessor 3 years ago
@tareqmahmud3902 Thanks, glad to hear that they’re helpful! 😊
@alexwatson6370 4 years ago
Don't name your variables str or you will shadow the string builtin
@DataProfessor 4 years ago
You're right, many thanks for pointing that out, why did I do that. I've changed it to url_link now.
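To show what shadowing does in practice, the sketch below assigns a URL template to the name str inside one function and to url_link in another. The URL and function names are made up; only the variable name url_link comes from the reply above.

```python
def build_with_shadowing(year):
    str = "https://example.com/{}.html"  # local name hides the builtin str
    try:
        return str(year)                 # calls the string, not the builtin
    except TypeError as exc:
        return "TypeError: " + exc.args[0]


def build_with_clear_name(year):
    url_link = "https://example.com/{}.html"  # the name the author switched to
    return url_link.format(str(year))         # builtin str is still available


broken = build_with_shadowing(2019)
fixed = build_with_clear_name(2019)
```

In a notebook the effect lasts longer than in a function: once a cell assigns to str, every later cell that calls str(...) fails until the name is deleted or the kernel is restarted.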
@ishpandey7886 4 years ago
Is this useful for every situation? I am trying to fetch data from glassdoor but this method is not working Link: "www.glassdoor.co.in/Job/bengaluru-data-analyst-jobs-SRCH_IL.0,9_IC2940587_KO10,22.htm"