I didn't know about this pandas functionality! Great video!
@DataProfessor 4 years ago
Wow, it's Ken Jee! Thanks for the comment and kind words! I also subscribe to your channel, great content by the way, especially the 6-part DS project from scratch series.
@KenJee_ds 4 years ago
@DataProfessor Thanks! I am loving your stuff as well. I need to start using Colab more. Keep up the good work, the tutorials are very helpful!
@karthiavenger4577 4 years ago
You're great, bro. Down to earth.
@HVjugo 3 years ago
I used this before, but I didn't know that you can select the table using the brackets. Awesome! Thanks for the video!
@DataProfessor 3 years ago
Glad it's helpful, thanks for watching!
@monicadesai7928 4 years ago
Great explanation of each step, right from opening the file to the end, because sometimes as newbies we find it difficult to know which file to use from GitHub. Thank you. Great video!
@DataProfessor 4 years ago
Wow thanks for the encouraging words, glad you’ve found the video helpful 😊
@givansot4581 3 years ago
Thanks a lot. I am doing a machine learning project and doing the web scraping in the same code. Thanks, this is better.
@TcRiverrat18 2 years ago
Excellent work breaking this down. I have only used R, but this seemed incredibly intuitive. Thank you!
@soufianelamsiah4337 3 years ago
What would be best for comparing prices between competitors?
@usmanafridi9668 3 years ago
Amazing! I am totally new to web scraping. I tried to scrape the website using beautiful soup library for 4 days now, but I can't get past the basics. You have extremely simplified it for me. For instance, I just scraped data from Wikipedia about the list of countries and their population and got the whole table in the first attempt. Thank you so much! I wonder if this can be used for other pages like LinkedIn, Glassdoor data collection? Because there are no tables there. Professor, thank you so much once again!
@DataProfessor 3 years ago
Glad to hear that the video was helpful! For non-tabular pages you may have to use beautifulsoup and/or selenium
@mj7146 4 years ago
Great content! Any idea how I can scrape data, for example from LinkedIn job postings? I found Octoparse for this. Any ideas?
@DataProfessor 4 years ago
Thanks Mert for the kind comment. pandas works only for tabular data from webpages. For linkedin posts, we'll probably have to use beautiful soup for that. I might make a future video about that, will put it into the to-do list.
@mj7146 4 years ago
Data Professor thank you 🙏
@DataProfessor 4 years ago
@mj7146 A pleasure!
@oguguaonyinyechi4980 4 years ago
@DataProfessor Hi Data Professor, we are still expecting this 😁
@shwetaredkar734 4 years ago
Informative.
@DataProfessor 4 years ago
Thanks Shweta for the kind comment!
@badraboufirasse433 4 years ago
Very helpful thank you!
@DataProfessor 4 years ago
Thanks Badr for the kind words!
@jojushaji3010 4 years ago
You're awesome, sir
@DataProfessor 4 years ago
Thanks for the kind words
@Troglodyte2021 4 years ago
A great tutorial!
@DataProfessor 4 years ago
Thank you!
@cllim80 4 years ago
Thank you for the clear explanation!
@DataProfessor 4 years ago
A pleasure! Thanks for watching 😃
@kalyanprasad4069 4 years ago
How do we deal with the error "HTTP Error 403: Forbidden" when reading a URL with pandas? How should we proceed in this case? Kindly advise.
@blankmedia01 4 years ago
Hey, I tried using the code to scrape tables on Wikipedia. When scraping a page with loads of other data, is there a method to pull just the table alone? With the current code I'm pulling the whole page, and I just want the playoff stats... I think I'm supposed to create a dictionary and then assign it to a dataframe, but I don't know how when it comes to URLs and websites.
@aniwahidaabdulrahim2538 4 years ago
Hello Professor, I would like to suggest that you publish a video about RSelenium, which is used with Selenium WebDriver for automated system testing :D I hope it may benefit others. This is just my humble suggestion.
@DataProfessor 4 years ago
Great suggestion! I have played around with Selenium for Python and have found it pretty powerful. What I made so far was a short script that can take screenshots of my youtube channel's page (or any webpage).
@randyluong6275 4 years ago
This tutorial gets my subscription. Thank you Professor. :)
@DataProfessor 4 years ago
Wow, glad to hear that, welcome aboard 😃
@rogerwprice 4 years ago
Fabulous - it's soooo easy when you know how!
@DataProfessor 4 years ago
Thanks for watching Roger, absolutely agreed with that 😃
@da_ta 4 years ago
Great, well explained, clear, and excellent sound quality. Thanks for doing this, keep it up!
@DataProfessor 4 years ago
Thanks for the encouragement 😃
@nourarifi2642 4 years ago
Thank you for your video. My question: if there are many tables across very many pages (20,000 pages), what should I do?
@DataProfessor 4 years ago
The pandas read_html function is suitable for a simple webpage with relatively few tables. For more complex sites or a large volume of pages, I would recommend looking into beautifulsoup and selenium.
@luciferkhusrao 4 years ago
Awesome work by the hero! Keep teaching like this
@DataProfessor 4 years ago
Thanks for the encouragement 😃
@pauloreis8868 4 years ago
Hi, Professor! Thank you for the content you bring to us, it really helps! \o/ Lately, I've been asking myself: How important is web scraping for a data scientist? How often do you web scrape? I just started learning it, I'll keep going and I wanted to know your thoughts about its relevance.
@DataProfessor 4 years ago
Hi Paulo, web scraping comes in handy when you want to create your own dataset from data available on the internet. For example, if you want to analyze the salaries of data scientists from the Glassdoor database, you can do that with web scraping. Hope this helps 😃
@kennykern6292 4 years ago
This helped thanks!
@DataProfessor 4 years ago
Glad it helped!
@muhammadjamalahmed8664 4 years ago
Please don't stop making videos. These videos really help a lot.
@DataProfessor 4 years ago
Thank you, glad it was helpful!
@vyacheslavgorkunov3790 4 years ago
Thx for the video, was really helpful. I wish u more subscribers, man ;)
@DataProfessor 4 years ago
Thanks for the support! 😃
@salikmalik7631 4 years ago
Really awesome.. Data Professor
@DataProfessor 4 years ago
Salik, Thanks!
@melshae8630 6 months ago
Wow, your video is the best. It took me forever to run this. This video helped me in 5 min. Thank you!!!
@nickolaisimmons4638 2 years ago
Wow this is a great video! Very well organised!
@prashant381 2 years ago
A query: in row 12, why are we using .index along with df.drop? Why wouldn't df.drop work without it?
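One possible reading of this question, sketched with made-up data: DataFrame.drop removes rows by their index labels, so a filtered DataFrame has to be turned into labels with .index before it is passed in. The df2019 contents below and the assumption that row 12 of the notebook filters on the 'Age' column are illustrative guesses, not taken from the video.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: a header row is repeated
# in the middle of the data, as often happens with long HTML tables.
df2019 = pd.DataFrame(
    {"Player": ["A", "Player", "B"], "Age": ["25", "Age", "31"]}
)

# Boolean filtering returns the matching rows (a DataFrame), not labels.
header_rows = df2019[df2019["Age"] == "Age"]

# drop() expects index labels, which is what .index supplies.
df = df2019.drop(header_rows.index)

print(list(header_rows.index))
print(df)
```

An equivalent filter without drop would be df2019[df2019["Age"] != "Age"], which keeps the non-matching rows directly.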
@kwanpakshing 3 years ago
The video is great, but the screen text is way too small to read. I suggest you enlarge the font or reduce the white space on the screen to make the video more readable.
@DataProfessor 3 years ago
Thanks for the suggestion, greatly appreciate it, yes in recent videos I have increased the font size.
@engr.inigo.silva2000 2 years ago
Bravo Data Professor, nice lecture!
@shankaricharan510 8 months ago
Thanks a lot - this helped a lot.
@fazlaynur4509 3 years ago
Thanks bro, for your nice tutorials
@DataProfessor 3 years ago
It's my pleasure
@manishabheemanpelly3580 3 years ago
Thank you so much for this concept it was really time saving one!
@DataProfessor 3 years ago
Glad it was helpful!
@amoahs7779 3 years ago
Hi Professor, I truly enjoy your videos and have learnt a lot. May God keep you successful in life. A question that's been on my mind: what laptop do you use? I really like the keyboard sound when you type, unless you are using an external keyboard. Is it possible for you to show us your desk setup? Kind regards
@DataProfessor 3 years ago
Hi, I'm using a MacBook Pro (2016) and yes the keyboard feel is good on this laptop although being a bit flat which is a good thing as it allows minimal effort in moving from one button to the next.
@danniliu2544 3 years ago
Hi Data Professor, thanks for this video. It's very helpful. I'm a newbie starting out in data science and web scraping. Just wondering, can you use pandas functionality for scraping data that is not laid out in a table? And how would you do that? Could you perhaps create a video on scraping non-tabular data if you haven't already?
@DataProfessor 3 years ago
Great question, to web scrape non-tabular data you can look into using beautiful soup and also selenium libraries for Python
@danniliu2544 3 years ago
@DataProfessor thank you for the pointer, much appreciated!
@Panucci75 3 years ago
Exactly the question I was gonna ask. Thanks.
@sangpark7656 3 years ago
Hi Professor does the original data need to be a html file to start with? Does the original data always need to have a table to extract data?
@DataProfessor 3 years ago
Yes to both questions, that’s the limitation of this approach. Other than that selenium + beautifulsoup is a good combo to look into.
@sangpark7656 3 years ago
@DataProfessor I see. Thank you very much for the guidance!!
@Moonlight-jx2sj 4 years ago
Amazing! Your video helped me with my 1st homework in Data Mining. I'm also thinking of jumping into data science, so thank you so much! Liked and subscribed!
@DataProfessor 4 years ago
Glad I could help! And welcome to Data science!
@wisjnujudho3152 2 years ago
this is exciting. i love pandas
@legacylifey182 3 years ago
Thank you so much for this concept, it was really helpful. Respect!
@spacebird9430 4 years ago
Hey Professor, thank you for the content. I was wondering: when we scrape by just passing the link, how does it know to read only the data from the table and not any other information?
@DataProfessor 4 years ago
Hi, the function will detect HTML syntax. The syntax for tables in HTML is the <table> tag, and the read_html() function finds these to figure out that they are tables and extracts the data.
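As a rough illustration of the reply above, the sketch below finds <table> elements in an HTML string using only Python's built-in html.parser. It is not how pandas implements read_html (pandas hands parsing to lxml or to BeautifulSoup with html5lib); the TableFinder class and the sample HTML are made up for the example, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableFinder(HTMLParser):
    """Collect the cell text of every <table> in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of rows per <table>
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._in_cell = True
            self.tables[-1][-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Text is kept only when it sits inside a table cell.
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()


sample_html = """
<p>Text outside any table is ignored.</p>
<table>
  <tr><th>Player</th><th>Age</th></tr>
  <tr><td>A. Example</td><td>25</td></tr>
</table>
"""

finder = TableFinder()
finder.feed(sample_html)
print(len(finder.tables))  # number of <table> elements found
print(finder.tables[0])    # rows of the first table
```

The paragraph text never reaches the result because nothing is recorded unless a <table> has been opened, which mirrors why read_html returns only tabular data.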
@lolsucks3599 2 years ago
Is there an api for sports results? or you have to do it via web scraping?
@lucianodomingues2290 4 years ago
Great video Professor!
@DataProfessor 4 years ago
Glad you liked it!
@RyanLoh 2 years ago
Can you also use df2019[df2019['Age'] == 'Age'] to find the ages containing the word 'Age'?
@narongtumsri-ubol1737 4 years ago
Thanks for the knowledge
@DataProfessor 4 years ago
A pleasure, thanks for watching
@raphaellutz2693 2 years ago
Very nice video
@DataProfessor 2 years ago
Thanks :)
@markslima1557 2 years ago
very cool thanks!
@sanjj_1 3 years ago
f-strings are more readable compared to the .format() method
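To make the comparison concrete, here is a small sketch that builds the same URL three ways. The example.com address and the year value are placeholders chosen for illustration, not the URL used in the video.

```python
year = 2019  # hypothetical value for illustration

# str.format(): the placeholder is filled in by position
url_format = "https://example.com/stats/{}.html".format(year)

# f-string (Python 3.6+): the variable is named inside the string
url_fstring = f"https://example.com/stats/{year}.html"

# Concatenation: the number must be converted to str explicitly
url_concat = "https://example.com/stats/" + str(year) + ".html"

print(url_format == url_fstring == url_concat)
```

All three produce the same string; the difference is only in how easy the template is to read and edit.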
@tannyamishra9291 2 years ago
Can you please explain how to read all the retrieved URLs?
@vaasudhfp2874 3 years ago
Not working for other sites. I tried it on TripAdvisor and nothing came back.
@priyalshah8869 2 years ago
How do I keep the URL that the column tm has in my dataframe?
@nowdevoted1649 4 years ago
Superb, let me bring you some more guys to your channel
@DataProfessor 4 years ago
Awesome, welcome to the channel!
@nowdevoted1649 4 years ago
@DataProfessor 🙏
@ekoatm1914 3 years ago
Thank you very much, brother....
@sameermehdi3143 2 years ago
Thank you so much, sir
@AmitKumar-hm4gx 3 years ago
Do you know if we can use this to scrape sites built with dynamic JS, and how do we do this if we have to log in?
@piyushyadav7162 1 year ago
Hi Ken Jee! I tried your web scraping code on Kaggle but I'm getting an "RLError:" error. I tried to fix it but could not resolve it. Please give me your suggestions.
@DataProfessor 1 year ago
Hi Piyush, The pandas library allows scraping webpages that have tabular data such as from Wikipedia. It is really limited to those with a predefined table format. To scrape webpages I'd recommend looking into selenium and beautifulsoup
@argiepoul7457 3 years ago
What are the prerequisites to watch this tutorial? I know some python, is this ok?
@DataProfessor 3 years ago
Yes, beginner’s level of Python is sufficient to follow along.
@XoreLP 3 years ago
Why did you use string.format instead of string concatenation?
@Papiii_benz 4 years ago
Thanks!
@DataProfessor 4 years ago
Thanks for watching!
@harshitsharma8131 2 years ago
What if there is no table on a web page?
@moatasimashraf6818 3 years ago
I got this error: "ImportError: lxml not found, please install it". What is the solution?
@DataProfessor 3 years ago
Hi, you can install lxml via pip install lxml
@moatasimashraf6818 3 years ago
@DataProfessor Done, thank you
@lyhuutai3339 3 years ago
How do I save the df to Excel, please?
@saulo_foot 3 years ago
Every link turns into a df. How can I concatenate all the dfs?
@DataProfessor 3 years ago
Hi, dfs can be concatenated using the pd.concat() function, you can play around with axis=0 or axis=1 depending on how you want to combine the dfs (side by side or stacked on top of the other)
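A minimal sketch of the two axis options mentioned in this reply. The two small DataFrames stand in for tables scraped from different links; their names and contents are invented for the example.

```python
import pandas as pd

# Hypothetical per-link results; in practice each would come from read_html.
df_a = pd.DataFrame({"Player": ["A", "B"], "PTS": [10, 12]})
df_b = pd.DataFrame({"Player": ["C", "D"], "PTS": [8, 15]})

# axis=0 stacks rows; ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([df_a, df_b], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligning on the row index
side_by_side = pd.concat([df_a, df_b], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

When every link yields one DataFrame, a common pattern is to append each to a list inside the loop and call pd.concat once on the list at the end.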
@mootaz3944 3 years ago
I tried it on your channel (just for testing lol)
@qi8983 2 years ago
Awesome
@tareqmahmud3902 3 years ago
You look like jomatech's big brother :O
@DataProfessor 3 years ago
Haha, I get that a lot. Joma and I should do a collab video 😆
@tareqmahmud3902 3 years ago
@DataProfessor But Sir, I learned a week's lesson from one of your 10-minute videos. I can't be more grateful to you. Thank you.
@DataProfessor 3 years ago
@tareqmahmud3902 Thanks, glad to hear that they’re helpful! 😊
@alexwatson6370 4 years ago
Don't name your variables str or you will shadow the string builtin
@DataProfessor 4 years ago
You're right, many thanks for pointing that out, why did I do that. I've changed it to url_link now.
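A small sketch of what the shadowing problem looks like and how the rename avoids it. The function names and the example.com URL are invented for the demonstration; only the variable name url_link comes from the reply above.

```python
def shadowing_demo():
    str = "https://example.com/page.html"  # rebinds the name str in this scope
    try:
        return str(2019)  # the built-in is no longer reachable by this name
    except TypeError as exc:
        return f"TypeError: {exc}"


def fixed_demo():
    url_link = "https://example.com/page.html"  # descriptive, no shadowing
    return url_link + "?year=" + str(2019)


print(shadowing_demo())
print(fixed_demo())
```

The first function fails because str now refers to a string object rather than the built-in type, while the second can still call str() as usual.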
@ishpandey7886 4 years ago
Is this useful for every situation? I am trying to fetch data from glassdoor but this method is not working Link: "www.glassdoor.co.in/Job/bengaluru-data-analyst-jobs-SRCH_IL.0,9_IC2940587_KO10,22.htm"