Requests-HTML - Checking out a new HTML parsing library for Python

Views: 29,595

sentdex

Comments: 54
@thehungman 6 years ago
I like this type of video. You should do something like a monthly video on a new module so people can be aware of it. This would be very useful to people who are learning Python.
@sentdex 6 years ago
Yeah, I was thinking of doing it a bit more often, testing it here. There are many modules I am curious about, but I'm not sure I'd ever do a serious series on them; I just want to poke around with them like here.
@xXMockapapellaXx 6 years ago
"Module of the Month" playlist?
@rumidom 6 years ago
man, i'm mixing that HTML parsing sauce with my beautiful soup right now
@yokoono42 6 years ago
Regarding the next() thing, it's dead simple. Have a look at: github.com/kennethreitz/requests-html/blob/master/requests_html.py#L431-L470 So basically they just have a list of symbols like 'more', 'next' or 'older' and look for their hrefs. So on HN page 2, the title of the CNBC story has the word 'more' in it. Haha. However, there are statistical methods for finding out how a page uses pagination, but I guess that's a bigger nut to crack for such a young library :)
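To illustrate the heuristic described in this comment, here is a minimal standard-library sketch. It is not the code from requests_html.py: the helper name guess_next_links, the symbol tuple, and the sample markup (including the headline) are assumptions made up for illustration. It only shows why matching link text against words like 'more' can pick up an unrelated headline.

```python
from html.parser import HTMLParser

# Assumed symbol list, based on the labels named in the comment above.
NEXT_SYMBOLS = ("next", "more", "older")


class _AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def guess_next_links(html, symbols=NEXT_SYMBOLS):
    """Return hrefs whose link text contains any of the pagination symbols."""
    parser = _AnchorCollector()
    parser.feed(html)
    return [
        href
        for href, text in parser.anchors
        if any(symbol in text.lower() for symbol in symbols)
    ]


# Hypothetical page: one story headline and one real pagination link.
page = """
<a href="https://example.com/story">Tech firms want more regulation</a>
<a href="news?p=3">More</a>
"""
print(guess_next_links(page))
```

Both links match here, because the made-up headline also contains "more". A stricter variant could require the whole link text to equal one of the symbols.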
@mshirazab 6 years ago
Multiple classes in HTML are separated by spaces. In a CSS selector, they are joined with a '.' (dot) instead. For example, <div class="foo bar"> will be referenced as about.find("div.foo.bar"). Also, '(' and ')' are invalid CSS selector characters, so you have to escape them. For more info: developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
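A small sketch of that rule, assuming a hypothetical helper called class_selector and made-up class names such as "Fz(s)": it joins the classes with dots and backslash-escapes characters like '(' and ')'. It does not handle every CSS edge case (for example, class names that start with a digit).

```python
import re


def class_selector(tag, class_attr):
    """Build a CSS selector matching `tag` elements that carry every class in `class_attr`."""

    def escape(name):
        # Backslash-escape anything other than ASCII letters, digits, '_' and '-'.
        return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)

    return tag + "".join("." + escape(name) for name in class_attr.split())


print(class_selector("div", "foo bar"))
print(class_selector("td", "Fz(s) Fw(500)"))  # hypothetical class names
```

The resulting string is what you would pass to a call like about.find(...) from the comment above.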
@Hans-jc1ju 6 years ago
18:52 I ‘think’ replacing the spaces with periods should fix the period error, no idea about the parentheses, though. Backslash or HTML escape?
@cooperlimond 6 years ago
I think the // for retry in range(100): // part is what allows the script to continue after raising the error. From their doc: "The simplest use case is retrying a flaky function whenever an Exception occurs until a value is returned." So this would allow the exception to be printed, yet let the script continue, I believe. Great content man, thanks for all of the awesome videos :).
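As a generic illustration of the retry pattern this comment describes (not the library's actual code), a loop like the following prints each exception and keeps going until a value is returned. The names call_with_retries and flaky are hypothetical.

```python
def call_with_retries(func, attempts=100):
    """Call `func` until it returns a value, printing each exception along the way."""
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as error:
            # The error is reported, but the loop carries on with the next attempt.
            print(f"attempt {attempt + 1} failed: {error}")
            last_error = error
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


calls = {"count": 0}


def flaky():
    """Hypothetical function that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("not ready yet")
    return "ok"


result = call_with_retries(flaky)
print(result)
```

This would explain seeing a printed error while the script still finishes: the exception is caught inside the loop and never propagates.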
@MohamedMagdyHammad 6 years ago
The function couldn't clean up the user data because those files were locked by the Chromium process.
@joshuavillwo 6 years ago
Mohamed Hammad I'll bet if you closed the request before the script ends, it won't error.
@rohnchatterjee7736 6 years ago
Best way to remove error -> comment out raise statement. 😁😂
@sentdex 6 years ago
Pesky errors just getting in the way!
@rohnchatterjee7736 6 years ago
sentdex Linux seems to be unaffected. I have used render extensively on Linux, even without sudo, with no problem.
@Lucas-wl8py 6 years ago
This is cool! And it gave me the idea of a series of videos about how to create a Python package.
@anyad111 5 years ago
Hello! Recently I have been trying to parse some webpages with Requests-HTML asynchronously. Theoretically this can be done with AsyncHTMLSession. However, I am unable to get results with it most of the time (I also use arender; the attempts to parse the webpages fail for different reasons, most probably timeouts). Maybe it's just the poor internet connection, but I'd be really grateful if you uploaded a video or helped me with this.
@SimOn-bz4xy 6 years ago
Sentdex! Can you show scraping from a page with a "show more" button that loads more of the page with JavaScript?
@shazkingdom1702 4 years ago
Thank you so much! I got the answer based on your guide. *nice helmet ⛑ you got back there*
@LolLol-wy5fp 6 years ago
Thank you sentdex, you are leading me to the real world from Africa
@SerenoMendes 6 years ago
Great tool to build crawlers!
@WhiterockFTP 6 years ago
To find tds or other elements that belong to multiple classes, you would just have had to put dots in between the class names. Read up on CSS selectors; jQuery also uses them. They're pretty standard nowadays and less of a headache than XPath ;)
@SkySesshomaru 6 years ago
Make another video building a crawler using it. Nice video!
@SimonEliasen123 6 years ago
Please make a video on building a web crawler, it would be very insightful!
@rohnchatterjee7736 6 years ago
I think for paging one can use threading with an except statement, and hopefully it will work.
@chengyaozheng8536 5 years ago
for {basically anything you want to name it} in r.html: you get URLs
@Hegelian10 4 years ago
Strange, I have installed requests_html but when I import it in a Python script in Python 2.x or 3.7, I get: ModuleNotFoundError: No module named 'requests_html'
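One common cause of this error is that the package was installed for a different interpreter than the one running the script. A minimal diagnostic sketch, assuming a hypothetical helper called describe_module, is shown below; it only reports which interpreter is running and whether that interpreter can see the module. As far as I know, requests-html targets Python 3.6 and later, so a Python 2.x interpreter would not be able to import it in any case.

```python
import importlib.util
import sys


def describe_module(name):
    """Report the running interpreter and whether it can locate module `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": sys.version_info[:3],
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


info = describe_module("requests_html")
print(info)
if not info["found"]:
    # Installing through the same interpreter avoids a pip/python mismatch.
    print(f"Try: {info['interpreter']} -m pip install requests-html")
```

Running this with the same command used to start the failing script shows whether the two environments differ.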
@developerarchitect7523 6 years ago
What is the extension that prints the result at the bottom?
@kylek29 6 years ago
Thanks for posting this. I've used BS4 and another module to do the JavaScript (render the page) on many projects, it's nice to have it in a concise package. Btw, I think the pagination on HackerNews failed because it looks for one of three (by default) "next" labels. "next", "more", "older" (DEFAULT_NEXT_SYMBOLS). The CNBC link has "more" in it.
@sentdex 6 years ago
Yep, that's what someone else was saying too. I started an issue with a suggestion here: github.com/kennethreitz/requests-html/issues/154 If I get some time, I'll try to do a PR for it.
@im4485 3 years ago
Hi, I can't find a way to select an element by its attribute. Can you help, please?
@pyxelr 6 years ago
Can I get some help installing the "requests-html" package so that it runs globally, for example through Sublime Text? I am using Conda on Windows 10. I have been trying to do that, but as I understand it so far, it runs only in a virtual environment that cannot be used by Sublime. Correct me if I am wrong.
@muhammedeltabakh92 6 years ago
I love you man more than Mo Salah
@shmuel-k 6 years ago
You could have used a CSS selector when parsing the Yahoo Finance page.
@rajshah9031 6 years ago
Is it useful for scraping websites that use AJAX?
@TheJohnny9506 6 years ago
Great tool for mixing with bs4 to build a robust crawler
@dave597 6 years ago
Wow, when did you start using Sublime Text? I haven't seen your videos in a while, but back then they were all done in Notepad or IDLE! :)
@sentdex 6 years ago
In the last few months. I still use IDLE, but I've mainly been using Sublime lately.
@ddbonpc 6 years ago
Awesome video as always. Quick question though: why don't you use Linux?
@sentdex 6 years ago
Linux is great for dev, not so much for a user. I do use Linux every day for dev, but I rarely record in Linux because it's often problematic.
@ddbonpc 6 years ago
Yeah, I can normally get by with OBS, but then I have to go back to Windows for Premiere Pro. Anyway, awesome videos! I loved your series on GTA.
@Christian-mn8dh 2 years ago
Why use requests-html when I can use requests?
@fhashim 4 years ago
Please make a tutorial on how to make asynchronous requests using grequests.
@sak8485 6 years ago
Can you make a video about GANs and some real-time applications of them?
@Slacquerr 6 years ago
How well does it work on snippets or badly formed HTML?
@sentdex 6 years ago
I'd have to imagine it would vary on a case-by-case basis, depending on what you're attempting to do.
@CodingTrades 6 years ago
You forgot to say what it is good for.
@xiaokunxu7593 6 years ago
I was wondering why you didn't use r.html.find_all('td', {'class':re.compile(r'class name regex')}). It turns out that's a BeautifulSoup function. But yeah, having render() to run JS is nice!
@DominikPiekarczyk 5 years ago
I wanted to give this Yahoo Finance example a try. Here's an example solution showing how to collect that data: github.com/kornislaw/scraping_recipes/blob/master/finance-yahoo_by_requests-html/run.py
@zurabkavtar9128 5 years ago
Hello! I am new to Python. Unfortunately, your code doesn't work for me. Maybe something has changed on the site? (finance.yahoo.com/quote/AMZN/key-statistics?p=AMZN)
@ayush0477 6 years ago
Should I change my Windows to the 64-bit version?
@sentdex 6 years ago
If possible, you definitely want 64-bit over 32-bit. 32-bit limits apps to 2 GB of memory, which is unfortunate.
@kemalonat802 6 years ago
From Turkey👋👋👋
@hamrozjumaev 6 years ago
Great work! Thank you. I would be very grateful if you checked my previous comments. I need your help, please.
@yoeriyoeri4264 6 years ago
1en