I will say it once again: this is the best Scrapy tutorial on YouTube. Thank you for all the great stuff you've taught me. Comprehensible explanations.
@buildwithpython 2 years ago
Wow, thank you!
@SandwichMitGurke 4 years ago
This was way easier than I expected. This tutorial series has helped me so much, thank you.
@khurramjaved2834 4 years ago
Thanks man, I was trying to scrape a site for, I don't know, maybe 2 days after building a scraper for it. Then I spent another 3 days trying to get their data, but the site kept sending me to the user login; my data extraction rate was like 10% of all attempts. Watching your 23rd video just made my day. Bam! Now I am scraping that site :D .. Love for you
@didovecigor5524 5 years ago
If you look at the logs, it wasn't actually scraped "properly"; it was scraped with the host IP, exactly the same issue that happens to me. Nonetheless, great video series, and I'd appreciate a continuation with more in-depth material. I think there is a big lack of content in this field, so think about it twice. Thanks for the videos and good luck.
@davyroger3773 4 years ago
This is great! I know this is old, but a video dealing with browser fingerprinting, cookies and other methods of bot detection would really complete the tutorial. Thanks again man, learned a lot. Btw, if anyone is getting a Response.text error, try uninstalling and then reinstalling scrapy-proxy-pool==0.1.7, as the versions after that validate response.text.
@joedavenport5891 4 years ago
Thanks a lot, version 0.1.7 worked.
@umerimran3833 2 years ago
Brother, where do I find version 0.1.7?
@raghuflute 4 years ago
You are amazing bro, God bless you. May you reach new heights. You always cover everything I need.
@jeremyalbright2392 4 years ago
These are great! Really clear instruction.
@Im4u143 3 years ago
Thanks for sharing your knowledge with us. You are a good teacher. Can you please add a video to this list about how we can connect to a SQL server and add the scraped data to a database? It would be much appreciated if you do this.
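In the meantime, a rough sketch of what such a pipeline could look like, using sqlite3 to keep it self-contained; the table and field names ('title', 'price') are made up, and for an actual SQL Server you would swap in pyodbc (or another driver) and its connection string. It would be enabled in settings.py via ITEM_PIPELINES = {'myproject.pipelines.SQLStorePipeline': 300}, where the project name is a placeholder:

import sqlite3

class SQLStorePipeline:
    def open_spider(self, spider):
        # open the database once when the spider starts
        self.conn = sqlite3.connect('scraped.db')
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS items (title TEXT, price TEXT)')

    def process_item(self, item, spider):
        # write each scraped item as one row
        self.conn.execute('INSERT INTO items VALUES (?, ?)',
                          (item.get('title'), item.get('price')))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.conn.close()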
@viragshah8383 4 years ago
Excellent!!! Fantastic way of teaching!!
@safi2297 4 years ago
Hi Attreya, thanks for the amazing video. Despite trying both bypassing techniques in your playlist, I was unable to access the pages I was trying to crawl. Do you have other suggestions or techniques? Thanks, keep up the good work.
@sfrgvh 5 years ago
Hi, thanks for the video! Is there any restriction on the number of items crawled? Scrapy creates a CSV that ends at 200 rows, but it should be 3700 rows.
@atultanna 2 years ago
Can we bypass Google with proxies? In short, to use it with Scrapebox. Hope the code can be made available to make it easier to follow.
@invisi6l339 5 years ago
thank you sooooooooooooooooooooooooo much!!!
@twittertrendings816 4 years ago
How do I scrape data when a website only provides access within its own country and not to other countries? For example, I want to scrape an e-commerce website's data, but being a foreigner I cannot scrape it, since the site doesn't show me the HTML code.
@hrishabhgupta7568 a year ago
I have a question: I am trying to scrape data using Scrapy. The data is there on the website, but in my scraped response I am only getting \xa0. Any idea how to fix this issue?
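A couple of things usually help here; a sketch, assuming the value comes from a CSS selector (the selector below is just a placeholder). \xa0 is a non-breaking space, so if that is all you get back, the visible text is often loaded by JavaScript or sits in a different node; otherwise it can simply be stripped:

import unicodedata

raw = response.css('span.price::text').get()   # hypothetical selector
clean = raw.replace('\xa0', ' ').strip() if raw else None
# or normalize every odd whitespace character in one go:
clean = unicodedata.normalize('NFKC', raw).strip() if raw else None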
@NishithSavla 4 years ago
Great content sir, but this proxy method isn't working somehow. It is raising an AttributeError: Response content isn't text. Can you help me with this?
@shaikhanuman8012 5 years ago
Hi sir, if we add the proxy configuration in the settings.py file, do we comment out the Chrome user agents or not?
@amberchatriwala5225 3 years ago
Yes you can, comment it out.
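For reference, a sketch of how settings.py might look with scrapy-proxy-pool switched on and the user-agent rotation from the earlier video commented out; the middleware paths and priority numbers follow the scrapy-proxy-pool and scrapy-user-agents documentation rather than anything shown in this video:

# settings.py (sketch)
PROXY_POOL_ENABLED = True

DOWNLOADER_MIDDLEWARES = {
    # rotating proxies
    'scrapy_proxy_pool.middlewares.ProxyPoolMiddleware': 610,
    'scrapy_proxy_pool.middlewares.BanDetectionMiddleware': 620,
    # user-agent rotation from the previous video, commented out for now:
    # 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # 'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}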
@Funnyanimalstalkk a year ago
Bro, will you make a video on how to solve reCAPTCHA?
@obeliskphaeton 2 years ago
This method doesn't seem to work for me; I only get errors. The previous method of user-agent rotation worked.
@ilyasstaybi3583 3 years ago
Hey, I have a problem and can't find where the issue is. Whenever I run a Scrapy project it returns "Crawled 0 pages". Any help would be useful.
@abukaium2106 4 years ago
These proxies don't work anymore. What's the solution?
@amberchatriwala5225 3 years ago
Hi, just wondering, have you found a solution? Same here, it doesn't work for me.
@amriteshmadhur 4 years ago
@buildwithpython could you please add how to use both proxies and user agents together? That would be great. Maybe just a snippet of the settings.py file in the description. Content is good. Thanks in advance.
@amriteshmadhur 4 years ago
Specifically, it would help if you could just add this part: 'set the priority of RandomUserAgentMiddleware to be greater than scrapy-proxies, so that the proxy is set before the UA is handled'.
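For anyone looking for that snippet, a sketch of a combined settings.py, assuming the scrapy-user-agents and scrapy-proxies packages; downloader middlewares with lower numbers touch the request first, so the proxy (100/110) is set before the random user agent (400), which is what the quoted advice asks for:

DOWNLOADER_MIDDLEWARES = {
    # proxy rotation (scrapy-proxies)
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # user-agent rotation (scrapy-user-agents), applied after the proxy is set
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy_user_agents.middlewares.RandomUserAgentMiddleware': 400,
}
PROXY_LIST = 'proxies.txt'   # path to your own proxy list
PROXY_MODE = 0               # 0 = pick a random proxy for every request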
@learningreactnative1384 4 years ago
Thank You So Much.
@mayurbarbhaya7752 4 years ago
Hi there! Thanks for the wonderful tutorial! Can I run those user-agent middlewares from a single Python script using Scrapy?
@beautifulnature924 4 years ago
When I use the proxy pool, it shows reconnects several times and finally fails. What is happening? Please help me solve my problem. Note: I am using a conda environment as the interpreter. Thanks in advance.
@abhirajsinghrajpurohit2464 2 years ago
Same
@abhirajsinghrajpurohit2464 2 years ago
I am getting an error.
@amsab2365 5 years ago
Thanks for the tutorial. I am new to the Python world and I need to ask this in order to understand the purpose: you are sending a request to Amazon several times from several IP addresses? What is the benefit? Can I use this method to increase traffic on my own website?
@thebuilderr 5 years ago
Most websites have bot-detectors and will block an IP address that's making too many calls to the site. Cycling through proxy IP addresses makes it look like the requests are coming from different users, thus allowing access.
@The_One_0_0 4 years ago
Or for a brute-force attack.
@rutvijrajdeo2262 5 years ago
Great tutorial... Will it work to bypass Google reCAPTCHA?
@buildwithpython 5 years ago
Don't think so
@literallydeadpool 4 years ago
If reCAPTCHAs are the things with checkboxes that you have to click, then it's pretty easy to bypass them. You need *autopy* and *random.randint* for this. Ctypes is optional.

import autopy
import random

x, y = 123, 123  # pretending 123, 123 are the coords of the checkbox
for i in range(0, 3):
    # wander the cursor around randomly so the movement looks human
    autopy.mouse.smooth_move(random.randint(1, 1444), random.randint(1, 720))
autopy.mouse.smooth_move(x, y)  # move onto the checkbox
autopy.mouse.click()
@puneethc9056 5 years ago
Can we scrape as many images as we want? Using this technique, I have observed that we can only scrape a few.
@16876 4 years ago
excellent!
@emmanueledettorre2087 5 years ago
Where do we put a custom list of proxies?
@buildwithpython 5 years ago
In settings
@john_michaelz1823 5 years ago
Not sure if it works with proxy pool... but you can define one in the request meta: support.scrapinghub.com/support/solutions/articles/22000219743-using-a-custom-proxy-in-a-scrapy-spider
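For anyone with private or paid proxies, a minimal sketch of that per-request approach, relying on Scrapy's built-in HttpProxyMiddleware; the spider name, URL and credentials below are placeholders:

import scrapy

class ProxiedSpider(scrapy.Spider):
    name = 'proxied_example'

    def start_requests(self):
        # credentials embedded in the proxy URL are handled by HttpProxyMiddleware
        yield scrapy.Request(
            'https://quotes.toscrape.com/',
            meta={'proxy': 'http://user:password@myproxy.example.com:8000'},
        )

    def parse(self, response):
        self.log(f'got status {response.status} through the proxy')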
@salah56477 4 years ago
I have some private paid proxies, how can I use them?
@shivamsingh-xr8ns 4 years ago
You can print them out and then send them to the Amazon office. Once Jeff Bezos reads them, he will surely send you back the data.
@highwayofdimensions8189 4 years ago
@shivamsingh-xr8ns lol
@Эксперимент.Изучениенемецкогоя 3 years ago
I've got an error, how do I solve it? "Try to download with host ip."
@Vincxzse 7 months ago
Same.
@Vincxzse 7 months ago
Have you already solved it?
@donrak2997 5 years ago
Hey! Thank you very much, this is exactly what I needed :) I just have one problem: the scraping only works about every third time, even after I used the user agent and proxy_pool. Any idea?
@Grace-ql7pv 4 years ago
Thank youuu!
@iampriyanshu 5 years ago
I am trying to scrape MakeMyTrip, but neither the Googlebot user agent nor the proxy is working. Can you please help me out?
@chaitanyajadhav5941 5 years ago
How do I pause Scrapy when the internet connection is lost and resume it when the connection is back?
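Scrapy doesn't watch the connection for you, but its persistent job queue gets close: give the crawl a job directory and you can stop it (one Ctrl+C) and resume later from where it left off. A sketch with an arbitrary directory name; raising RETRY_TIMES also helps requests that failed while the connection was down:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com/']
    # persist the scheduler state so the crawl can be stopped and resumed;
    # same effect as running: scrapy crawl quotes -s JOBDIR=crawls/quotes-1
    custom_settings = {
        'JOBDIR': 'crawls/quotes-1',   # arbitrary directory name
        'RETRY_TIMES': 5,              # retry requests that failed while offline
    }

    def parse(self, response):
        yield {'title': response.css('title::text').get()}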
@sambasivam 4 years ago
using proxy did not work for me.

    ban = is_ban(request, response)
  File "/Users/sambasivamkrishnamurthy/PycharmProjects/scrapAmazon/venv/lib/python2.7/site-packages/scrapy_proxy_pool/policy.py", line 15, in response_is_ban
    if self.BANNED_PATTERN.search(response.text):
  File "/Users/sambasivamkrishnamurthy/PycharmProjects/scrapAmazon/venv/lib/python2.7/site-packages/scrapy/http/response/__init__.py", line 93, in text
    raise AttributeError("Response content isn't text")
@shivendrasrivastava1721 5 years ago
I am scraping Amazon and have used scrapy-proxy-pool and user agents in rotation, and I even made my own custom proxy middleware (fetching fresh proxies from a paid proxy service), which worked for a few days... I'm stuck again now; Amazon is not giving a response after all this. Any help on this?
@buildwithpython 5 years ago
That shouldn't happen. Maybe use fresh proxies and new user agents.
@lokilok121 4 years ago
Thank you!
@chaunguyen231 3 years ago
I followed along, but I can't get past the proxy step.
@fadhlif6510 2 years ago
Hi, I got this error: WARNING: No proxies available.
@umair5807 a year ago
Same issue.
@umair5807 a year ago
What's the solution?
@Vincxzse 7 months ago
Have you guys already come up with a solution?
@inifin8 4 years ago
Dude, you don't mention where to put the proxy list.
@guneshshanbhag6208 6 years ago
Nice tutorial man... can you point me in the right direction for automating the JioNet hotspot connection and bypassing OTP verification, maybe using a service... otherwise you could make a video 😁
@buildwithpython 6 years ago
I have no idea what that is.
@guneshshanbhag6208 6 years ago
@buildwithpython hahaha
@aksontv 5 years ago
Proxy_Pool is not working for me.
@lexh7714 5 years ago
How can I pass some headers manually to Scrapy? I'm trying to access a website, but I get Referer: None. Oh, and very nice content ))
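Headers can be attached per request via scrapy.Request(url, headers={...}), or set globally in settings.py. A short sketch of the global variant; the Referer value below is just a placeholder:

# settings.py (sketch)
DEFAULT_REQUEST_HEADERS = {
    'Accept-Language': 'en',
    'Referer': 'https://www.google.com/',   # placeholder value
}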
@abhirajsinghrajpurohit2464 2 years ago
Mine is showing an error.
@rajapaladugu7473 5 years ago
Bro, I am getting an empty response even after trying the bypass technique.
@buildwithpython 5 years ago
Maybe your scraper isn't working
@brunomartins4919 5 years ago
Did you disable the user agent before the proxy, or can you run both?
@adrianhelerea420 5 years ago
You can combine both, or more (e.g. the scrapy-splash settings); just add them to the SPIDER_MIDDLEWARES dictionary. If you need some settings only for one spider (and not as general settings), you can add them under the scrapy.Spider class in a dictionary:

custom_settings = {
    'USER_AGENT': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:66.0) Gecko/20100101 Firefox/66.0',
    'SPIDER_MIDDLEWARES': {
        'my_scrapy.middlewares.MyScrapySpiderMiddleware': 543,
    },
}
@myangasp 4 years ago
Please help me with how to pass proxy credentials so I can avoid the 407 status.
@9778908921 4 years ago
Unable to scrape "www.centrepointstores.com/ae/en/". I have tried proxies and everything, but I am not getting a 200 response. If anyone can try and help me, it would be great.