Scraping Dynamic JavaScript Websites

Scraping Dynamic JavaScript Websites - Beautiful Soup Python

60,128 views

Building your own scraper to scrape dynamic websites? Watch this video tutorial to learn useful techniques. Or, forget scraping problems with Oxylabs Scraper APIs FREE trial 👉oxy.yt/2iM
Gathering data from static websites is usually simple, but scraping data from dynamic websites can be challenging. Python is a popular choice for this task due to its many helpful libraries and extensive documentation.
This tutorial will guide you through the process of scraping dynamic websites. You'll learn how to use a browser to check if a website is dynamically rendered with JavaScript and locate AJAX calls that load additional data.
We recommend using a Chromium-based browser to spot dynamically rendered content. To scrape the data, use Selenium to render the page in a real browser (or Python’s Requests library to make plain HTTP requests), and BeautifulSoup to parse the resulting HTML. Once your web scraping script is ready, run the browser in headless mode to speed up the process.
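A minimal sketch of that workflow, assuming Selenium 4, Chrome, and the `beautifulsoup4` package are installed. The function names and the CSS class names `quote`, `text`, and `author` are illustrative assumptions about the target page's markup, not something stated in this description.

```python
from bs4 import BeautifulSoup


def parse_quotes(html):
    """Extract (text, author) pairs from already-rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    for block in soup.find_all("div", class_="quote"):
        text = block.find("span", class_="text")
        author = block.find("small", class_="author")
        if text and author:
            quotes.append((text.get_text(strip=True), author.get_text(strip=True)))
    return quotes


def fetch_rendered_html(url):
    """Load a page in headless Chrome and return the DOM after JavaScript has run."""
    # Imported here so parse_quotes() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one quote element has been injected by JavaScript.
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "quote"))
        )
        return driver.page_source
    finally:
        driver.quit()


# Offline demonstration: a hand-written snippet stands in for driver.page_source.
sample_html = """
<div class="quote">
  <span class="text">An example quote.</span>
  <small class="author">Example Author</small>
</div>
"""
print(parse_quotes(sample_html))
```

Against a live page, the two functions would be chained as `parse_quotes(fetch_rendered_html(url))`. Keeping the parsing step separate from the browser step makes it possible to test the selectors on saved HTML without launching Chrome.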
📚 OTHER RESOURCES
Best Python Libraries for Web Scraping:
oxy.yt/Kt6L
🔧 OUR SCRAPING SOLUTIONS
Residential Proxies:
👉 oxy.yt/3pJ
Shared Datacenter Proxies:
👉 oxy.yt/oa5
Dedicated Datacenter Proxies:
👉 oxy.yt/7s3
SOCKS5 Proxies:
👉 oxy.yt/PdB
✅ Grow Your Business with Top-Tier Web Data Collection Infrastructure: oxy.yt/Qoi
🤝 LET'S CONNECT
/ discord
⏳ TIMESTAMPS
0:00 Introduction
0:45 How to See if the Website is Dynamic
1:35 Can BeautifulSoup Render JavaScript?
2:16 How to Scrape Data From a Dynamic Website
3:35 Finding Elements by Using Selenium
5:16 Finding Elements by Using BeautifulSoup
6:33 Python Scraping With a Headless Browser
7:05 Locating AJAX Calls
9:40 Data Embedding in Other Pages
11:11 Conclusion
🎥 RELATED VIDEOS
Learn how to extract data to Excel:
• How To Extract Scraped... ...
Find out how to scrape multiple URLs:
• How To Scrape Multiple...
For more topics on all things web scraping:
• Learn More About Proxi...
Subscribe for more: kzbin.info?sub...
© 2022 Oxylabs. All rights reserved.
#Oxylabs #WebScraping #BeautifulSoup

Comments: 77

@oxylabs 2 years ago

Thank you for watching! We hope you find this video helpful! Please leave a comment if you have any questions. If you are interested in web scraping and tutorial videos, subscribe to our KZbin channel: kzbin.info

@abhijitboda 2 years ago

This is a gold guideline. It literally covered most of the cases.

@oxylabs 2 years ago

We're very delighted to read this!

@roystonfurtado 5 months ago

Wow. Great Video! I was looking for a video that highlights realistic and efficient web scraping and this is it. Thanks.

@DigitalAlligator 1 year ago

This is right on the spot; most other videos don't even come close to mentioning it all.

@mehdiahmed7836 2 years ago

Short, detailed, very informative. That's how a good tutorial is made. Thanks

@oxylabs 2 years ago

Thank you so much!

@toshirv341 1 year ago

Thank you for the informative tutorial! I will probably try web scraping over the next month, so I'll comment here again if I have any problem!

@oxylabs 1 year ago

Thanks for watching, and definitely reach out if you need any help!

@masterbe591 10 months ago

Thanks for this video. I never thought of using F12 and the Network tab to find the source of a website's data. Greetings

@angellaz3869 2 years ago

😭😭😭 Idk how to say thank you.. I've been searching for help with this AJAX stuff. This is the one I can say made my day

@oxylabs 2 years ago

That's so sweet to hear! Glad you enjoyed it!

@mhrasoulian 2 years ago

It was very useful. Thank you!

@oxylabs 2 years ago

We're very glad you liked it!

@anjifeldspar8804 1 year ago

Thanks for making this

@Johnbrown-op5xt 2 years ago

Thank you so much. Great useful info.

@oxylabs 2 years ago

Glad you liked it!

@ivandechiara3415 1 year ago

THANKS! You saved my life! :)

@oxylabs 1 year ago

Happy we could help!

@KhalilYasser 2 years ago

Awesome. Thank you very much.

@oxylabs 2 years ago

Thank YOU for the support!

@programmingpriest4862 1 year ago

Thank you so much. Much needed.

@oxylabs 1 year ago

We're glad it helped!

@dakooki 1 year ago

Is this possible with a website that requires input from the user, for example adding a quantity or selecting a shipping service?

@tarunbhardwaj121 2 years ago

This video helped me a lot.

@oxylabs 2 years ago

Thank you!

@jjaemc7389 2 years ago

i'm soooooo appreciative of you😄

@oxylabs 2 years ago

Thanks for your feedback. It's much appreciated!

@lorsco4107 2 years ago

Awesome!

@oxylabs 2 years ago

Glad to hear!

@muneebrehman7842 2 years ago

You saved my day

@oxylabs 2 years ago

We're very happy to hear!

@nwizugbesamson6718 2 years ago

This is so perfect

@oxylabs 2 years ago

Thank you!

@maikwurl1484 1 month ago

Thank You

@vladimirantonov4506 1 year ago

Hello! And why do all parsers analyze the same site? It's interesting to see the different approaches... Thanks for the interesting example!

@kamaleshpramanik7645 2 years ago

Thank you very much Madam ...

@oxylabs 2 years ago

You're welcome!

@ismaelperezmesa524 1 year ago

Thanks a lot for all the content you constantly share. I would like to ask you something: would this tutorial example work if I deployed it on the web as an API to consume later? Thank you so much

@oxylabs 1 year ago

Hey, we're glad you like our content! As for the tutorial, it's focused on showing how to build your own scraper. In case you want an easy way out, try Oxylabs Scraper APIs for free: oxy.yt/2iM

@user-xy4by2th9h 2 years ago

Thanks for the video. Could you please explain where you got the value 'h3 > a' for select at the end of the video?

@oxylabs 2 years ago

Hello. We’re glad you enjoyed it! The h3 > a syntax just tells Beautiful Soup to get a tags that are directly beneath h3. You can try going to the link displayed in the video (librivox.org/search/?q=time%20machine&search_form=advanced), right-click on any book title and select “Inspect”. This should open the exact place where you can see that a is directly below h3. Have fun!
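To illustrate the difference the `>` makes, here is a small sketch on hand-written HTML. The markup below is an assumption made up for the example and is not copied from the LibriVox page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link sits directly under <h3>, one is nested deeper.
sample_html = """
<ul>
  <li class="book">
    <h3><a href="/the-time-machine">The Time Machine</a></h3>
    <p>By <a href="/author/1">H. G. Wells</a></p>
  </li>
  <li class="book">
    <h3><span><a href="/nested">Nested link</a></span></h3>
  </li>
</ul>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "h3 > a" matches only <a> tags that are direct children of an <h3>.
direct = [a.get_text() for a in soup.select("h3 > a")]

# "h3 a" (descendant combinator) also matches links nested deeper inside <h3>.
nested = [a.get_text() for a in soup.select("h3 a")]

print(direct)
print(nested)
```

Neither selector picks up the author link, because that `<a>` lives inside a `<p>` rather than an `<h3>`.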

@vbernard1 2 years ago

Hi, it was very good. Thanks. But I'm facing a problem: at line 13, I get "NameError: name 'data' is not defined". Any idea how to fix it?

@oxylabs 2 years ago

Hello! It seems that you’re trying to access a variable called data, but it doesn't exist. Please double-check the names of the variables you have defined. There are also a couple more scenarios in which this error might be triggered; this article summarizes them quite well: www.geeksforgeeks.org/handling-nameerror-exception-in-python/

@Ned478 2 years ago

6:14 line 10. Is that the path to the folder with chromedriver or to chromedriver.exe itself? Either way, mine won't work

@oxylabs 2 years ago

Hey. That's a path to the chromedriver executable. Your question is hard to answer since we don't know what kind of error you are getting. If it's a File Not Found error, you need to make sure that the path leading to the chromedriver is correct. Try using a full path instead of a relative one if it is failing for you. Also, if you're running Windows, the chromedriver file name should end in .exe. Hope this helps!
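The advice above could be sketched roughly as follows, assuming Selenium 4, where the driver location is passed through a `Service` object. The helper names `resolve_driver_path` and `start_chrome` and the `drivers/chromedriver` location are hypothetical, chosen only for illustration.

```python
import sys
from pathlib import Path


def resolve_driver_path(path, platform=sys.platform):
    """Return an absolute chromedriver path, adding .exe on Windows if missing."""
    driver_path = Path(path).expanduser()
    if platform.startswith("win") and driver_path.suffix.lower() != ".exe":
        driver_path = driver_path.with_name(driver_path.name + ".exe")
    return driver_path.resolve()


def start_chrome(driver_path):
    """Start Chrome with an explicit chromedriver executable (Selenium 4 style)."""
    # Imported lazily so the path helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    resolved = resolve_driver_path(driver_path)
    if not resolved.is_file():
        # Fail early with the full path so a wrong location is easy to spot.
        raise FileNotFoundError(f"chromedriver not found at {resolved}")
    return webdriver.Chrome(service=Service(executable_path=str(resolved)))


# The path helper can be checked without a browser:
print(resolve_driver_path("drivers/chromedriver", platform="win32").name)
```

Recent Selenium 4 releases can also locate a matching driver on their own when no path is given, so calling `webdriver.Chrome()` without a `Service` may be enough on an up-to-date installation.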

@carlostoledoFLA 3 months ago

Hi, thanks for this video! For me, on dynamic sites, using Selenium to get the page source doesn't work! It still responds with JavaScript tags. Is the path of the server request and response: browser request -> server response -> javascript response -> api response -> browser? Thanks

@oxylabs 3 months ago

Thank you for the feedback. It’s difficult to say without seeing the code, but have you tried using different web drivers? The situation might change if you add some actions to the page. Also, it might be handy for you to check the code in textual form: oxylabs.io/blog/dynamic-web-scraping-python.

@drewgatch4488 1 year ago

Excellent video. Quick question - when I click Ctrl+U on the website, my source page looks different. I don't have anywhere and I have separating each section. Does this matter or was "script" just used to locate the data needed?

@oxylabs 1 year ago

Hello! If you're unable to locate <script> in the source page of the website, make sure that you visited "quotes.toscrape.com/js" (with `/js` added at the end of the URL). This is important for following the script parsing portion of the guide, where we extract information found in the <script> tag. You can see the differences by comparing the source code of "quotes.toscrape.com/js/" and "quotes.toscrape.com".
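A rough way to check which kind of source you are looking at is sketched below. The two sample strings are simplified stand-ins for the two pages, and the `quote` class name and the `var data` marker are assumptions about their markup.

```python
from bs4 import BeautifulSoup


def describe_source(html):
    """Report whether quote data is in the markup or only inside inline scripts."""
    soup = BeautifulSoup(html, "html.parser")
    quotes_in_markup = len(soup.find_all("div", class_="quote"))
    # Inline scripts are <script> tags without a src attribute.
    inline_scripts = [s for s in soup.find_all("script") if not s.get("src")]
    data_in_script = any("var data" in (s.string or "") for s in inline_scripts)
    return {"quotes_in_markup": quotes_in_markup, "data_in_inline_script": data_in_script}


# Simplified stand-ins for the static and the JavaScript-rendered page source.
static_page = '<div class="quote"><span class="text">Hello</span></div>'
dynamic_page = '<div id="quotes"></div><script>var data = [{"text": "Hello"}];\n</script>'

print(describe_source(static_page))
print(describe_source(dynamic_page))
```

If the raw source has no quote elements but does contain the data inside an inline script, the page is being assembled in the browser, and the script parsing approach applies.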

@gianni_ari 2 months ago

Hi! I was not able to install the chrome driver. Do you have any suggestions?

@oxylabs 1 month ago

Hi! Could you specify why you couldn't install the driver? Was there an error message of any sort?

@shreyasoni1302 1 year ago

How do I get the data when the script tag's src is not None and a file is referenced instead?

@oxylabs 1 year ago

Hello! Thank you for asking :) If there is a src= attribute, you need to get the file content by making an additional request to the URL defined in the src= attribute:
source = soup.find("script")
link_to_file = source["src"]
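Here is a slightly fuller sketch of that idea, assuming the `requests` package for the extra download. The function names and the example.com URLs are hypothetical; `urljoin` is used because a src value is often relative to the page URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def external_script_urls(html, page_url):
    """Return absolute URLs for every <script> tag that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only the tags where the attribute is present.
    return [urljoin(page_url, tag["src"]) for tag in soup.find_all("script", src=True)]


def download_script(url):
    """Fetch the JavaScript file so its contents can be searched like inline code."""
    import requests  # imported lazily; only needed for the network call

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


# Offline demonstration on made-up markup.
sample_html = """
<script src="/static/data.js"></script>
<script>var inline = 1;</script>
<script src="https://cdn.example.com/lib.js"></script>
"""
print(external_script_urls(sample_html, "https://example.com/shop/page.html"))
```

Each returned URL can then be passed to `download_script`, and the resulting text searched in the same way as an inline script.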

@benasvalancius 1 year ago

Thank you, useful!

@python689 1 year ago

Hello, help me please, how do I get the text "Wilson Tour Premier All Court 4B" out?
soup = BeautifulSoup(html, 'lxml')
title = soup.find('h1', class_='product--title')
Tennis balls Wilson Tour Premier All Court 4B

@oxylabs 1 year ago

Hey, thanks for asking! There are a couple of ways (without modifying your find function):
1. Retrieve a list of the tag's children and select the last one on the list. Strip the whitespace afterwards: title.contents[-1].strip()
2. Retrieve the whole text of the title, split it by a double space and select the last string on the list: title.text.split("  ")[-1]
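Both ideas can be tried on a small example. The HTML in the question was not preserved, so the markup below (including the `product--category` class) is a guess at its shape. The second option is also a variant: it splits on line breaks instead of a double space, since the exact whitespace in the real page is unknown.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the original HTML was not preserved in the question.
html = """
<h1 class="product--title">
  <span class="product--category">Tennis balls</span>
  Wilson Tour Premier All Court 4B
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", class_="product--title")

# Option 1: the last child of <h1> is the bare text node after the <span>.
name_from_contents = title.contents[-1].strip()

# Option 2 (variant): split the full text into lines and keep the last non-empty one.
lines = [line.strip() for line in title.get_text().splitlines() if line.strip()]
name_from_text = lines[-1]

print(name_from_contents)
print(name_from_text)
```

Option 1 depends on the product name being the last child of the tag, so it is worth printing `title.contents` on the real page first to confirm the order of the children.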

@legit_nyel 1 year ago

This method is impossible if the script has a src, especially with reCaptcha, right?

@oxylabs 1 year ago

The presented method works on any and all types of web pages, both static and dynamic. As a page containing reCaptcha is a type of dynamic page, you can read, extract and manipulate data that is present on the page. :) Hope this helps you!

@capunzel5859 1 year ago

Love the explanation, but also loved the music. Can you share the track id?

@oxylabs 1 year ago

Hey! So happy you enjoyed it :) The track is this one: Purple Planet Music - Corporate Planning

@ureshkayastha 1 year ago

Can you make a new video on dynamic web scraping with Selenium 4?

@oxylabs 1 year ago

We'll keep that in mind for our future videos :)

@mantasda1904 7 months ago

Why are you not using the requests-html library? It seems to achieve the same in a simpler way

@oxylabs 7 months ago

Good point, thanks for the feedback!

@voqz6667 4 months ago

it's dead

@filippodiconno9923 2 years ago

9:15, what does line 10 do?

@oxylabs 2 years ago

Hey! It defines a regular expression to match certain combinations of characters within a document. This one specifically looks for whatever text is between var data = and a new line.
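A self-contained sketch of that kind of pattern, using only the standard library. The script text and the exact pattern are assumptions for illustration: here the match ends at the `;` that closes the statement, just before the line break.

```python
import json
import re

# Stand-in for the text of the inline <script> tag.
script_text = """
    var data = [{"text": "An example quote.", "author": {"name": "Example Author"}}];
    for (var i in data) { /* render each quote */ }
"""

# Capture everything between "var data =" and the ";" that ends that line.
# re.S lets "." span newlines in case the array is printed over several lines.
pattern = r"var data =(.+?);\n"
match = re.search(pattern, script_text, re.S)

# The captured text is a JSON array, so it can be loaded into Python objects.
quotes = json.loads(match.group(1)) if match else []
print(quotes[0]["author"]["name"])
```

Because the captured text is valid JSON, `json.loads` turns it into ordinary lists and dictionaries, which is usually easier than parsing the rendered HTML.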