Python and Requests-HTML - Web Scraping Dynamic Content from JavaScript applications

15,191 views

BugBytes

Days ago

In this video, we'll learn how to scrape content that is NOT present in initial page loads, but instead is loaded dynamically by JavaScript.
This is a common problem when scraping the modern web: the initial response contains minimal HTML plus a JavaScript single-page application (SPA) built with React, Vue, Angular, etc. The data we want to scrape is therefore not present in that response. It is rendered later, after the SPA fetches it through API calls.
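To make the problem concrete: the raw HTML of a typical React app contains little more than an empty mount element and some script tags. The sketch below checks for that pattern. It uses only the standard library's `html.parser`, standing in for BeautifulSoup, so it runs without extra installs. The mount id `root` is an assumption (it is React's common default, and other frameworks often use `app`). The helper names and sample HTML strings are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class MountPointInspector(HTMLParser):
    """Counts the elements nested inside the element whose id is `mount_id`."""

    def __init__(self, mount_id: str = "root") -> None:
        super().__init__()
        self.mount_id = mount_id
        self.depth = 0           # nesting depth inside the mount element (0 = outside)
        self.child_elements = 0  # start tags seen inside the mount element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.child_elements += 1
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif dict(attrs).get("id") == self.mount_id:
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> count as children but never open a level.
        if self.depth:
            self.child_elements += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1


def mount_point_is_empty(html: str, mount_id: str = "root") -> bool:
    """Return True if the SPA mount element has no child elements in `html`."""
    inspector = MountPointInspector(mount_id)
    inspector.feed(html)
    return inspector.child_elements == 0


# Hypothetical server response from a React app: the data is not there yet.
shell = '<html><body><div id="root"></div><script src="/static/js/main.js"></script></body></html>'
print(mount_point_is_empty(shell))  # True
```

In practice you would pass it the body of a plain `requests.get(url).text` call. A `True` result suggests the content is rendered client-side and needs a tool that executes JavaScript, such as requests-html.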
We will look at how to use requests-html to solve this problem in Python when scraping such sites, and how to combine it with BeautifulSoup to find data on the page.
This video makes use of the following sample website (a React application):
react-amazon-bestsellers-book...
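Here is a minimal sketch of the approach described above: requests-html renders the page, and BeautifulSoup then searches the rendered HTML. The `article.book` and `h2` selectors are borrowed from a commenter's Colab snippet further down and are an assumption about the sample site's markup. `sleep=1` and `timeout=20` are illustrative values, not taken from the video. The function names are hypothetical. Parsing lives in its own function so it can be tested against a saved HTML string without launching Chromium.

```python
from bs4 import BeautifulSoup


def extract_book_titles(html: str) -> list[str]:
    """Return the text of the <h2> inside each <article class="book"> element."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for book in soup.find_all("article", class_="book"):
        heading = book.find("h2")
        if heading is not None:  # skip cards that have no title
            titles.append(heading.get_text(strip=True))
    return titles


def scrape_rendered_titles(url: str) -> list[str]:
    """Fetch `url`, let requests-html execute its JavaScript, then parse the result."""
    # Imported here so extract_book_titles() works even without requests-html installed.
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        response.raise_for_status()
        # render() loads the page in headless Chromium (downloaded on first use)
        # and updates response.html with the JavaScript-rendered document.
        response.html.render(sleep=1, timeout=20)
        return extract_book_titles(response.html.html)
    finally:
        session.close()  # also shuts down the Chromium instance, if one was started
```

Calling `extract_book_titles` on the unrendered HTML from a plain `requests.get` would return an empty list, because the `article` elements only exist after the React app has run.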
📌 𝗖𝗵𝗮𝗽𝘁𝗲𝗿𝘀:
00:00 Intro
02:15 Sending GET request using Python requests library
04:00 Finding objects with BeautifulSoup
05:15 Installing requests-html
06:38 Executing JavaScript on page using requests-html
☕️ 𝗕𝘂𝘆 𝗺𝗲 𝗮 𝗰𝗼𝗳𝗳𝗲𝗲:
To support the channel and encourage new videos, please consider buying me a coffee here:
ko-fi.com/bugbytes
𝗦𝗼𝗰𝗶𝗮𝗹 𝗠𝗲𝗱𝗶𝗮:
📖 Blog: bugbytes.io/posts/
👾 Github: github.com/bugbytes-io/
🐦 Twitter: / bugbytesio
📚 𝗙𝘂𝗿𝘁𝗵𝗲𝗿 𝗿𝗲𝗮𝗱𝗶𝗻𝗴 𝗮𝗻𝗱 𝗶𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻:
requests-html: pypi.org/project/requests-html/
BeautifulSoup: beautiful-soup-4.readthedocs....
requests: pypi.org/project/requests/
Sample website: react-amazon-bestsellers-book...
#python #webscraping #datascience

Comments: 47
@serdarfidan3451 9 months ago
Hi,
When I execute the response.html.render() function, I can see in the terminal that Chromium has been downloaded, but it also throws the error below:
    File "C:\Users\sfida\OneDrive\Masaüstü\python\pythonn\venv\Lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
      raise BrowserError('Browser closed unexpectedly: ')
    pyppeteer.errors.BrowserError: Browser closed unexpectedly:
Can you please help me out? There are no valid solutions on the web, and I am using Windows at the moment.
Thanks,
@ahassan7270 7 months ago
Thank you SO MUCH for your WONDERFUL explanations. You are really GREAT at communicating ideas in a very clear and simple way.
@bugbytes3923 7 months ago
Thanks very much, that's great to hear!
@GuilhermeBreda 26 days ago
Thank you! Your tutorial really helped me in a web scraping project.
@bugbytes3923 23 days ago
Glad to hear it helped! Thanks for commenting.
@fernandtape9363 1 year ago
Great content as always. Thanks.
@bugbytes3923 1 year ago
Thank you, as always much appreciated!
@user-sl9fg8rl4j 6 months ago
Wow, your explanations are absolutely fantastic! I really appreciate how you make complex things so easy to understand. Your content is always top-notch and has helped me out a lot. This video actually solved a problem I was stuck on. Thank you so much for all your hard work!
@bugbytes3923 6 months ago
Thanks a lot - amazing to hear that! Cheers.
@RichIrl 1 year ago
This was great! Thanks!
@bugbytes3923 1 year ago
Thanks for watching!
@valentino7057 1 year ago
I would love to watch a series on this! Speaking of SPAs, how can we build an SPA with Django? A series on Django with React or any other SPA framework would be great. 😃
@bugbytes3923 1 year ago
Thank you! I'll be doing stuff with React/Vue etc in future, for sure.
@willysnowman 4 months ago
Thank you! This worked in Colab:
    # Top-level await works in Colab/Jupyter notebooks, which already run an event loop.
    from bs4 import BeautifulSoup
    from requests_html import AsyncHTMLSession

    url = 'URL'

    asession = AsyncHTMLSession()
    response = await asession.get(url)
    await response.html.arender()  # execute the page's JavaScript in headless Chromium
    print(response.status_code)
    # print(response.html.html)
    # print(response.html.find('article.book'))

    soup = BeautifulSoup(response.html.html, 'html.parser')
    books = soup.find_all('article', class_='book')
    for book in books:
        print(book.find('h2').text)
@bugbytes3923 4 months ago
Awesome!
@abderrazzakaljalil1375 10 months ago
Good job, keep going 👍
@bugbytes3923 10 months ago
Thanks a lot!
@ResilientFighter 7 months ago
This solved my problem. Thank you.
@bugbytes3923 7 months ago
Glad to hear it, thanks for the comment!
@gerhardspitzlsperger1567 1 year ago
Very helpful
@bugbytes3923 1 year ago
Thank you Gerhard!
@SOHAILKHAN-iu8fu 1 year ago
Great content
@bugbytes3923 1 year ago
Thanks!
@dobcs3236 1 year ago
I don't know how to thank you. If I could give a million likes, I would... Thank you, thank you!
@bugbytes3923 1 year ago
You are welcome! Thanks a lot for watching!
@selvas5043 3 months ago
Wow... Thanks a lot.
@bugbytes3923 3 months ago
Thanks for watching!
@ffgaming-fe3cx 6 months ago
Tons of thanks
@bugbytes3923 5 months ago
Thanks for watching!
@sasanandeh3653 7 months ago
How can I handle a button like "Add to cart"?
@Anu_was_here 1 month ago
Not sure if you still respond on this old video, but I have a question: what if I have this "root", but it's a "tooltip-root" that doesn't get filled with HTML unless I hover over the component? Note that it works without internet: if the page has loaded and I then disconnect Wi-Fi, I can still hover and see all the contents. Would this library help me? (I went the Selenium route, but it's too cumbersome and slow, with issues over time.)
@user-rj6hv4zf6w 1 year ago
Thank you for the good content. Is there a possibility of a series on how to use tasks in Django (scheduling tasks to execute in the background), like running a check every midnight...
@andreaszweili8593 1 year ago
Great video. Selenium would certainly be interesting, but for my use cases I reckon this would most likely be enough already. BTW, how do you deal with packages that haven't seen updates in a few years? The latest commit to requests-html is three years old at this point.
@bugbytes3923 1 year ago
Thanks! I agree, usually requests-html does the job unless we need to perform actions such as clicking elements to load content. Good point on the lack of updates. At the moment, I think the library continues to do its basic job well, so I'm happy enough to keep using it until a newer and better-maintained alternative comes along. There might already be one out there - if anyone is aware of one, please let me know!
@frameff9073 1 year ago
Thanks
@bugbytes3923 1 year ago
You're welcome!
@SugengWahyudi 6 months ago
Please add a website that uses a login page, JavaScript, and a CSRF token.
@Wesley9xd 8 months ago
Not working in Angular applications.
@rsmehta207 11 months ago
Is it possible for you to help me?
@brianaragon1641 2 months ago
Having so many problems with the libraries; I think many of them have changed in some way by now...
@bugbytes3923 2 months ago
Thanks for the comment. Maybe the video needs an update.
@nnaemekacephas5931 1 month ago
Did you make a video on how to use selenium?
@bugbytes3923 1 month ago
Not yet - would you like to see something with Selenium?
@nnaemekacephas5931 1 month ago
@bugbytes3923 Yes, that would be great.