I like this type of video. You should do a monthly video on a new module so people can be aware of them. This would be very useful for people learning Python.
@sentdex 6 years ago
Yeah, I was thinking of doing it a bit more often, and I'm testing it here. There are many modules I am curious about. I'm not sure I'd ever do a serious series on them; I just want to poke around with them like here.
@xXMockapapellaXx 6 years ago
"Module of the Month" playlist?
@rumidom 6 years ago
man, i'm mixing that HTML parsing sauce with my beautiful soup right now
@yokoono42 6 years ago
Regarding the next() thing, it's dead simple. Have a look at: github.com/kennethreitz/requests-html/blob/master/requests_html.py#L431-L470 Basically, they just have a list of symbols like 'more', 'next' or 'older' and look for their hrefs. On HN page 2, the title of the CNBC story has the word 'more' in it. Haha. There are statistical methods for working out how a page does its pagination, but I guess that's a bigger nut to crack for such a young library :)
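The keyword heuristic described above can be sketched with the standard library. This is a simplified, hypothetical re-implementation of the idea, not the code in requests_html.py: `NEXT_SYMBOLS`, `naive_next` and `strict_next` are illustrative names, and the page markup is made up. It shows how a story title that merely contains "more" can win over the real "More" link.

```python
from html.parser import HTMLParser

# Hypothetical keyword list modelled on the one described in the comment.
NEXT_SYMBOLS = ("next", "more", "older")


class AnchorCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_anchors(html):
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors


def naive_next(html, symbols=NEXT_SYMBOLS):
    """First link whose text merely *contains* a keyword (prone to false hits)."""
    for href, text in collect_anchors(html):
        if any(sym in text.lower() for sym in symbols):
            return href
    return None


def strict_next(html, symbols=NEXT_SYMBOLS):
    """Prefer a link whose whole text equals a keyword; fall back to naive_next."""
    for href, text in collect_anchors(html):
        if text.lower() in symbols:
            return href
    return naive_next(html, symbols)


if __name__ == "__main__":
    page = (
        '<a href="https://example.com/story">Investors want more from tech</a>'
        '<a href="news?p=3">More</a>'
    )
    print(naive_next(page))   # https://example.com/story
    print(strict_next(page))  # news?p=3
```

Matching the whole link text first is only one possible mitigation for the false positive, not necessarily what the library does.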
@mshirazab 6 years ago
Multiple classes in HTML are separated by spaces, so in CSS selectors they are chained with a '.' (dot). For example, <div class="foo bar"> would be referenced as about.find("div.foo.bar"). Also, '(' and ')' are invalid characters in CSS selectors, so you have to escape them. For more info, see developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors
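To turn a space-separated class attribute into a compound selector with the parentheses escaped, a small helper like the one below could be used. `class_selector` is a hypothetical name, and the class names in the example are made up in the style of atomic-CSS classes. It is a simplified sketch: it backslash-escapes every character outside `[A-Za-z0-9_-]` and does not handle class names that begin with a digit.

```python
import re


def class_selector(tag, class_attr):
    """Build e.g. 'td.foo.bar' from tag='td', class_attr='foo bar'.

    Every character outside [A-Za-z0-9_-], such as '(' or ')', is escaped
    with a backslash. Simplified: class names starting with a digit (which
    need a hex escape in CSS) are not handled.
    """
    escaped = [re.sub(r"([^A-Za-z0-9_-])", r"\\\1", cls) for cls in class_attr.split()]
    return tag + "".join("." + cls for cls in escaped)


if __name__ == "__main__":
    print(class_selector("div", "foo bar"))       # div.foo.bar
    print(class_selector("td", "Fz(s) Fw(500)"))  # td.Fz\(s\).Fw\(500\)
```

The resulting string would then be passed to a selector-based method such as r.html.find(...), assuming its selector engine follows the CSS escaping rules.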
@Hans-jc1ju 6 years ago
18:52 I ‘think’ replacing the spaces with periods should fix the error; no idea about the parentheses, though. Backslash or HTML escape?
@cooperlimond 6 years ago
I think the "for retry in range(100):" part is what allows the script to continue after raising the error. From their docs: "The simplest use case is retrying a flaky function whenever an Exception occurs until a value is returned." So I believe this lets the exception be printed while the script keeps going. Great content, man, thanks for all of the awesome videos :)
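The retry-loop pattern the comment describes can be sketched like this. `call_with_retries` and the `flaky` demo are hypothetical, not the code from the library or the video: each failed attempt's exception is printed and swallowed, so the script keeps going, and the last exception is re-raised only if every attempt fails.

```python
def call_with_retries(func, retries=100):
    """Call func() until it returns without raising, up to `retries` attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as exc:  # print and keep going, like the loop in the comment
            print(f"attempt {attempt} failed: {exc!r}")
            last_exc = exc
    raise last_exc  # every attempt failed


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Hypothetical function that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("temporary failure")
        return "ok"

    print(call_with_retries(flaky))
```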
@MohamedMagdyHammad 6 years ago
The function couldn't clean up the user data because those files were locked by the Chromium process.
@joshuavillwo 6 years ago
Mohamed Hammad, I'll bet that if you close the request before the script ends, it won't error.
@rohnchatterjee7736 6 years ago
Best way to remove error -> comment out raise statement. 😁😂
@sentdex 6 years ago
Pesky errors just getting in the way!
@rohnchatterjee7736 6 years ago
sentdex, Linux seems to be unaffected. I have used render() extensively on Linux, even without sudo, with no problems.
@Lucas-wl8py 6 years ago
This is cool! And it gave me the idea for a series of videos about how to create a Python package.
@anyad111 5 years ago
Hello! I have recently been trying to parse some webpages asynchronously with requests-html. In theory this can be done with AsyncHTMLSession. However, most of the time I can't get any results (I also use arender; the attempts to parse the pages fail for various reasons, most likely timeouts). Maybe it's just my poor internet connection, but I'd be really grateful if you uploaded a video or helped me with this.
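One way to see which pages fail, and why, is to give each request its own timeout and collect exceptions instead of letting them abort the whole batch. The sketch below uses only `asyncio`; `fake_fetch` is a hypothetical stand-in for the real download-and-arender step, so it says nothing about requests-html's own behaviour.

```python
import asyncio


async def fetch_with_timeout(fetch, url, timeout):
    """Run one fetch coroutine, giving up after `timeout` seconds."""
    return await asyncio.wait_for(fetch(url), timeout=timeout)


async def fetch_all(fetch, urls, timeout=10.0):
    """Fetch all URLs concurrently; failures are returned as exception objects."""
    results = await asyncio.gather(
        *(fetch_with_timeout(fetch, url, timeout) for url in urls),
        return_exceptions=True,
    )
    return dict(zip(urls, results))


async def fake_fetch(url):
    """Hypothetical stand-in for downloading and rendering a page."""
    await asyncio.sleep(0.5 if "slow" in url else 0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = ["fast-1", "slow-1", "fast-2"]
    results = asyncio.run(fetch_all(fake_fetch, urls, timeout=0.1))
    for url, result in results.items():
        if isinstance(result, Exception):
            print(url, "failed:", type(result).__name__)
        else:
            print(url, "ok")
```

Printing the exception type for each failure makes it easy to tell timeouts apart from other errors; raising the timeout is then a one-argument change.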
@SimOn-bz4xy 6 years ago
Sentdex! Can you show scraping a page with a "show more" button that loads more of the page with JavaScript?
@shazkingdom1702 4 years ago
Thank you so much! I got the answer based on your guide. *nice helmet ⛑ you got back there*
@LolLol-wy5fp 6 years ago
Thank you, sentdex, you are leading me to the real world from Africa.
@SerenoMendes 6 years ago
Great tool to build crawlers!
@WhiterockFTP 6 years ago
To find td's or other elements that have multiple classes, you just had to put dots between the class names. Read up on CSS selectors; jQuery also uses them. They're pretty standard nowadays and less of a headache than XPaths ;)
@SkySesshomaru 6 years ago
Make another video building a crawler using it. Nice video!
@SimonEliasen123 6 years ago
Please make a video building a webcrawler, would be very insightful!
@rohnchatterjee7736 6 years ago
I think for paging one could use threading with an except statement, and hopefully it will work.
@chengyaozheng8536 5 years ago
for {basically any name you want} in r.html: you get the successive pages, each printed with its URL
@Hegelian10 4 years ago
Strange, I have installed requests_html but when I import it in a Python script in Python 2.x or 3.7, I get: ModuleNotFoundError: No module named 'requests_html'
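A ModuleNotFoundError right after a successful install often means pip installed the package for a different interpreter than the one running the script (requests-html also targets Python 3 only, so a Python 2 interpreter will not find it). A quick check using only the standard library is to print which interpreter is running and whether it can see the module; `module_visible` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_visible(name):
    """Return True if the *current* interpreter can import top-level module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    print("requests_html importable:", module_visible("requests_html"))
```

If it prints False, running `python -m pip install requests-html` with the exact interpreter path shown should put the package where the script can find it.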
@developerarchitect7523 6 years ago
What's the extension that prints the results at the bottom?
@kylek29 6 years ago
Thanks for posting this. I've used BS4 plus another module for the JavaScript (rendering the page) on many projects, so it's nice to have it all in one concise package. Btw, I think the pagination on Hacker News failed because it looks for one of three default "next" labels: "next", "more" and "older" (DEFAULT_NEXT_SYMBOLS). The CNBC link has "more" in it.
@sentdex 6 years ago
Yep that's what someone else was saying too. I started an issue with a suggestion here github.com/kennethreitz/requests-html/issues/154 If I get some time, I'll try to do a PR for it.
@im4485 3 years ago
Hi, I can't find a way to select an element by its attribute. Can you help, please?
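Since requests-html's find() takes CSS selectors, an attribute selector such as td[data-test="OPEN-value"] should work there. As a self-contained illustration of matching on an attribute, the sketch below swaps in the standard library's `xml.etree.ElementTree`, whose limited XPath supports `[@attr='value']` predicates. It only works on well-formed markup, and `texts_by_attribute` and the `data-test` values are made up.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for part of a real page.
SNIPPET = """
<table>
  <tr>
    <td data-test="PREV_CLOSE-value">100.5</td>
    <td data-test="OPEN-value">101.2</td>
  </tr>
</table>
"""


def texts_by_attribute(markup, tag, attr, value):
    """Return the text of every <tag> whose `attr` equals `value`."""
    root = ET.fromstring(markup.strip())
    return [el.text for el in root.iterfind(f".//{tag}[@{attr}='{value}']")]


if __name__ == "__main__":
    print(texts_by_attribute(SNIPPET, "td", "data-test", "OPEN-value"))  # ['101.2']
```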
@pyxelr 6 years ago
Can I get some help with installing the "requests-html" package so it can be run globally, for example through Sublime Text? I am using Conda on Windows 10. I have been trying, but as far as I understand, it only runs in a virtual environment that Sublime can't use? Correct me if I am wrong.
@muhammedeltabakh92 6 years ago
I love you man more than Mo Salah
@shmuel-k 6 years ago
You could have used a CSS selector when parsing the Yahoo Finance page.
@rajshah9031 6 years ago
Is it useful for scraping websites that use AJAX?
@TheJohnny9506 6 years ago
Great tool for mixing with bs4 to build a robust crawler
@dave597 6 years ago
Wow, when did you start using Sublime Text? I haven't seen your videos in a while, but back then they were all done in Notepad or IDLE! :)
@sentdex 6 years ago
In the last few months. I still use IDLE, but I've mainly been using Sublime lately.
@ddbonpc 6 years ago
Awesome video as always. Quick question, though: why don't you use Linux?
@sentdex 6 years ago
Linux is great for dev, not so much for a user. I do use linux every day for dev, but I rarely record in linux because it's often problematic.
@ddbonpc 6 years ago
Yeah, I can normally get by with OBS, but then I have to go back to Windows for Premiere Pro. Anyway, awesome videos! I loved your series on GTA.
@Christian-mn8dh 2 years ago
Why use requests-html when I can use requests?
@fhashim 4 years ago
Please make a tutorial on how to make asynchronous requests using grequests.
@sak8485 6 years ago
Can you make a video about GANs and some real-time applications of them?
@Slacquerr 6 years ago
How well does it work on snippets or badly formed HTML?
@sentdex 6 years ago
I'd have to imagine it would vary on a case by case basis and what you're attempting to do.
@CodingTrades 6 years ago
you forgot to say what it is good for
@xiaokunxu7593 6 years ago
I was wondering why not use r.html.find_all('td', {'class': re.compile(r'class name regex')}). Turns out that's a BeautifulSoup function. But yeah, having render() to run JS is nice!
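As the comment notes, find_all is a BeautifulSoup method. The same idea, keeping only elements that have a class matching a regex, can be sketched with the standard library's `html.parser`. `RegexClassFinder` and the sample row are hypothetical, and the sketch assumes the target tags are explicitly closed and not nested inside each other.

```python
import re
from html.parser import HTMLParser


class RegexClassFinder(HTMLParser):
    """Collect the text of <tag> elements that have a class matching `pattern`.

    Assumes the target tags are explicitly closed and not nested.
    """

    def __init__(self, tag, pattern):
        super().__init__()
        self.tag = tag
        self.pattern = re.compile(pattern)
        self.matches = []
        self._inside = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if any(self.pattern.search(cls) for cls in classes):
            self._inside = True
            self._buf = []

    def handle_data(self, data):
        if self._inside:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._inside:
            self.matches.append("".join(self._buf).strip())
            self._inside = False


if __name__ == "__main__":
    row = (
        '<tr><td class="Fw(500) Ta(end)">1.23</td>'
        '<td class="Ta(start)">label</td>'
        '<td class="Fz(s) Fw(600)">4.56</td></tr>'
    )
    finder = RegexClassFinder("td", r"^Fw\(\d+\)$")
    finder.feed(row)
    print(finder.matches)  # ['1.23', '4.56']
```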
@DominikPiekarczyk 5 years ago
I wanted to give this Yahoo Finance example a try. Here's an example solution showing how to collect that data: github.com/kornislaw/scraping_recipes/blob/master/finance-yahoo_by_requests-html/run.py
@zurabkavtar9128 5 years ago
Hello! I am new to Python. Unfortunately, your code doesn’t work for me. Maybe something has changed on the site? (finance.yahoo.com/quote/AMZN/key-statistics?p=AMZN)
@ayush0477 6 years ago
Should I switch my Windows to the 64-bit version?
@sentdex 6 years ago
If possible, you definitely want 64-bit over 32-bit. 32-bit limits apps to 2 GB of memory, which is unfortunate.
@kemalonat802 6 years ago
From Turkey👋👋👋
@hamrozjumaev 6 years ago
Great work! Thank you... I would be very grateful if you could check my previous comments. I need your help, please.