How I Use Data Pipelines in my Web Scrapers

2,472 views

John Watson Rooney

Comments: 12
@personofnote1571 3 months ago
Great point about separation of concerns. As you stated, the scraper should only be concerned with getting data and saving data. I am curious what other use cases would be compatible with Scrapy's pipelines. Would pipelines be a good place for things like "save to this OTHER database", "upload to S3", or "ping this API"? I will be diving into this myself soon, but I'm curious about your thoughts here.
@JohnWatsonRooney 3 months ago
Yes, absolutely. You could use an item field to decide whether to upload to X DB or Y DB, and uploading to S3 would certainly belong here too. By pinging an API, do you mean notifying another system? I think that would be a great use case for pipelines (I hadn't thought of that before).
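For anyone wondering what "use an item field to decide" could look like, here is a minimal sketch, not code from the video. It follows Scrapy's pipeline method names (`open_spider`, `process_item`, `close_spider`) but uses only the standard library, so it runs on its own. The `target` field, the two table names, and the SQLite backend are all assumptions made for illustration; an S3 upload or an API ping would sit in `process_item` in the same way.

```python
import sqlite3


class RoutingPipeline:
    """Scrapy-style item pipeline that picks a destination table per item.

    Hypothetical names: the "target" item field and the tables
    "primary_items" and "archive_items" are illustrative only.
    """

    TABLES = ("primary_items", "archive_items")

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection
        # and make sure both destination tables exist.
        self.conn = sqlite3.connect(self.db_path)
        for table in self.TABLES:
            self.conn.execute(
                f"CREATE TABLE IF NOT EXISTS {table} (name TEXT, price REAL)"
            )

    def process_item(self, item, spider):
        # Route on an item field; anything not marked "archive"
        # goes to the primary table.
        if item.get("target") == "archive":
            table = "archive_items"
        else:
            table = "primary_items"
        self.conn.execute(
            f"INSERT INTO {table} (name, price) VALUES (?, ?)",
            (item["name"], item["price"]),
        )
        self.conn.commit()
        # Return the item so any later pipeline still receives it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

In a real Scrapy project a class like this would be enabled through the `ITEM_PIPELINES` setting; the spider itself stays unaware of where the data ends up.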
@alexdin1565 3 months ago
Hi John, I have a question: can we use Scrapy with Django? I mean, to make the web scraper available as an online tool.
@RicardoPorteladaSilva 3 months ago
I think you could create a script that scrapes separately and loads the results into Django's database. The scraping and the web processing happen at separate times. I hope you understand my English; I'm from Brazil and still learning. If you need more specifics, please feel free to get in touch. It's a great pleasure to help you.
@JohnWatsonRooney 3 months ago
This is pretty much it!
@HitAndMissLab 3 months ago
@RicardoPorteladaSilva What is the advantage of using Django's DB?
@piercenorton1544 3 months ago
What if we want to take a full page so we can give it to an LLM to parse? For example, what if we were parsing financial filings or contracts? We want chunks or pages to pass to an LLM to produce structured outputs. I think splitting the text on a tag and then joining the items together would be best, but maybe there is a better way.
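One way the "split on a tag, then join" idea could be sketched, using only the standard library. This is an illustration under stated assumptions rather than a recommended method: the `<p>` split tag, the `max_chars` budget, and the names `BlockTextParser` and `chunk_html` are all hypothetical, and a character count is only a rough stand-in for an LLM token limit.

```python
from html.parser import HTMLParser


class BlockTextParser(HTMLParser):
    """Collects page text as blocks, starting a new block at each split tag."""

    def __init__(self, split_tag="p"):
        super().__init__()
        self.split_tag = split_tag
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Collapse whitespace and keep the block only if it has text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == self.split_tag:
            self._flush()

    def handle_endtag(self, tag):
        if tag == self.split_tag:
            self._flush()

    def handle_data(self, data):
        # Note: text inside <script> or <style> is collected too;
        # a real scraper would probably filter those out.
        self._buf.append(data)

    def close(self):
        super().close()
        self._flush()


def chunk_html(html, split_tag="p", max_chars=2000):
    """Split on split_tag, then join blocks into chunks of about max_chars."""
    parser = BlockTextParser(split_tag)
    parser.feed(html)
    parser.close()

    chunks, current = [], ""
    for block in parser.blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would exceed the budget: start a new chunk.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A single block longer than `max_chars` is kept whole rather than cut mid-sentence, so chunks can exceed the budget in that case. Each returned chunk could then be passed to the LLM as one request.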
@HitAndMissLab 3 months ago
Do you have any videos on how to use proxies in Python?
@JohnWatsonRooney 3 months ago
I don't specifically, but that's a good idea. I will create a video on proxies, including how to use them.
@jjeffery129 3 months ago
What's wrong with scraping them as strings and changing them at the end, in your output file?
@CeratiGilmour 3 months ago
Would it work together with Selenium?
@elmzlan 3 months ago
I hope you have a course