Python Scrapy Tutorial - 10 - Extracting data w/ XPATH

80,674 views

Comments

@popi3789 2 years ago

This series is completely awesome, I've been dying to find a video that explained in thorough detail what everything means when you are actually using scrapy. Thanks a lot!

@kipronoelijahkoech4630 4 years ago

I am totally loving this. I can't get enough. I guess I will go through everything in two days.

@buildwithpython 4 years ago

Do it.

@hritikthapa1 4 years ago

Why is the code written in the UI the same? If we can just use scrapy shell "https..", why is the code necessary? And why isn't the code changing according to the websites we are scraping?

@ravitanwar9537 5 years ago

Keep coming up with more Python projects like this, bro. Awesome!!

@2137xd 4 years ago

Rias gremory tryna learn python XD

@CipherX-07 3 months ago

Excellent explanation... Thank you, bro.

@alisiraydemir 2 years ago

I cannot go back to the previous command inside the scrapy shell by pressing 'up'. Any idea why?

@zeki7540 3 years ago

I can't get any response with a different website. I only get "..." as the scraping result. Where is the problem? Please help!!

@Nin3_Six 3 years ago

How do I display the previous command in the shell, as you do in PyCharm?

@yuliu1105 5 years ago

Thank you for this Scrapy video series. I did exactly as you showed, response.xpath("//title").extract(), and I got an empty list. I also tried response.css("title").extract() and got the same empty list. Please advise.

@buildwithpython 5 years ago

Very difficult to debug when you don't know the website. The code looks alright.

@thatolebethe3238 3 years ago

Thank you for the video series

@akromajones3385 5 years ago

I have a question: how do we crawl a website when two li's exist that both have hrefs but no class, and on the following page they change order? Can we get the text and use: if href = mytext : request.follow??

@buildwithpython 5 years ago

Give it a shot.

@sankofax8082 5 years ago

Do I still need to check the source code (inspect element) to find the selectors (CSS or XPath)? Can't I just use the Chrome extension you provided? Is there any added benefit to knowing XPath syntax? Thanks for the tuts, by the way. Super awesome.

@SchonGetestet 5 years ago

It is important to get the class of the element that contains the URL: not the class of the href tag itself, but that of the next outer element.

@ameygirdhari8703 4 years ago

Awesome explanation

@buildwithpython 4 years ago

thanks :)

@harigopaladamus9481 1 year ago

very good stuff

@robertgancarczyk8710 4 years ago

I like your stuff :)

@winstonloke2860 5 years ago

In the video you used response.css("li.next ......") to specify the class for the CSS selector, and response.xpath("//span[@class='text']......") for XPath. Can I say that this is the standard format for the two selectors when looking for a class?

@SchonGetestet 5 years ago

I used this for a website with the following code: response.css("a.pagination-next").xpath("@href").extract(). So the classes can be different.

@angejoelziade6920 5 years ago

Good morning. I don't understand why I get an answer like [u'Quotes to Scrape'] from the command response.xpath("//title/text()").extract... Why the *"u'"*?

@buildwithpython 5 years ago

I have mentioned it in the videos. The u is just Python telling you that this text is a Unicode string. You have to print the text itself to see it without this prefix.

@ashutoshtiwari3785 4 years ago

The response is in bytes. Just add .encode("utf-8") after the statement you are using.
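
A small sketch of what is going on with the u prefix, using only the standard library and Python 3. The title string is a hypothetical stand-in for an extract() result. In Python 2, the shell displays a Unicode string as u'...' and shows non-ASCII characters as \u escapes; that is only how the value is displayed, not part of the text. Python 3's `ascii()` reproduces the escaped form.

```python
# Hypothetical extract() result: a title wrapped in curly quotes.
titles = ["\u201cQuotes to Scrape\u201d"]

text = titles[0]

# ascii() shows the escaped form, similar to what a Python 2 shell displays
# (Python 2 would additionally put a u in front of the opening quote).
escaped = ascii(text)

# Printing the string itself shows the real characters, with no prefix
# and no escapes.
print(escaped)
print(text)

# .encode("utf-8") converts the string to bytes; decoding gets it back.
encoded = text.encode("utf-8")
print(type(encoded).__name__)
print(encoded.decode("utf-8") == text)
```

Printing each extracted string, rather than the whole list, is usually enough to see the text without the display artifacts.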

@mountainrunner1955 4 years ago

Is it necessary to mix CSS and XPath? E.g. you use response.css("a").xpath("@href").extract(). Doesn't it make more sense to stay with CSS: response.css("a::attr(href)").extract()? That seems easier to read to me. Or did you use it only for demonstration purposes?

@fabiof.deaquino4731 4 years ago

It was used for demonstration only, just to let us know that we can use it if we want (or need) to :)

@yehiafouad6360 5 years ago

Thanks for your exciting course, it is really helpful!! But I have a problem extracting data from Amazon. When I select with CSS using | response.css(".a-text-normal::text").extract() | the output is just [], with no data inside it!! Can you help me with this?

@sankofax8082 5 years ago

Use the Chrome extension he provided (in the previous episode) to narrow down the elements yourself. Your Amazon page could be formatted differently from the one shown in the video.

@ameyabrahmankar2220 5 years ago

I am facing the same problem

@MaryemChannel 3 years ago

Same problem. I think this is due to the spaces in the class name. There is probably a special method for classes with spaces; I tried putting a . instead of the space, but it doesn't work.

@abdulhannan1996 5 years ago

response.xpath("//span[@class='text']/text()").extract() This line, which I copied from you, gives me a lot of unwanted /u characters in each sentence. Why do I get them and not you? By the way, I have Python 3.6.3.

@HeroicHeaster 5 years ago

send the original xpath with '/u'

@whayAl 3 years ago

Could you enable subtitles for this video, please? Many thanks in advance for all your teaching.

@Bihari_Chaman 2 years ago

Use the default inspect element of Chromium

@mmanuel6874 5 years ago

You only used it in the terminal. What about the code?

@buildwithpython 5 years ago

This is a video series. Watch the next videos

@halbodb 3 years ago

Love your videos but unfortunately your screen casting is too small for watching! 📺😢

@dipankardey1044 2 years ago

One observation: if I use response.css('a').xpath('@href').extract(), I get a limited number (441) of URLs, but when I use response.css('a').xpath('//@href').extract(), the extraction goes on and on. Does the second expression scrape all the pages too? Note the "//" in the XPath.