Web Scraping + Reverse Engineering APIs

4,374 views

Syntax

A day ago

Web scraping 101! Dive into the world of web scraping with Scott and Wes as they explore everything from tooling setup and navigating protected routes to effective data management. In this Tasty Treat episode, you'll gain invaluable insights and techniques to scrape (almost) any website with ease.
Show Notes
00:00 Welcome to Syntax!
03:13 Brought to you by Sentry.io.
05:00 What is scraping?
08:01 Examples of past scrapers.
10:06 Cloud app downloader.
16:13 Other use cases.
16:58 Scraping 101.
17:28 Client Side.
19:08 Private API.
22:40 Server rendered.
23:27 Initial state.
24:57 What format is the data in?
27:08 Working with the DOM.
27:12 Linkedom npm package.
29:02 querySelector everything.
31:28 How to find the elements without classes.
34:08 Use XPath selectors to select by word.
34:53 Make them as flexible as you can. Classes change!
35:10 AI is good at this!
36:26 File downloading.
38:20 Working with protected routes.
40:41 Programmatically retrieve authentication keys because they are short-lived.
43:20 Deal-breakers.
44:58 What happened with Amazon?
46:42 Wes' portable refrigerator utopia.
47:25 Sick Picks & Shameless Plugs.
All links available at syntax.fm/763
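
As a rough illustration of the "Initial state" item above (23:27): server-rendered pages often ship their data as JSON inside a script tag, so a scraper can parse that instead of walking the DOM. The sketch below assumes that is what the episode means by initial state. The sample HTML, the script id `initial-state`, the product fields, and the helper name `extractInitialState` are all invented for illustration and do not come from the show notes. It uses only built-in JavaScript, so it should run in Node as-is.

```javascript
// Hypothetical sample page: the markup, the script id "initial-state",
// and the product fields are invented for this sketch.
const sampleHtml = `
<!doctype html>
<html>
  <body>
    <div id="app"></div>
    <script id="initial-state" type="application/json">
      {"products":[{"name":"Portable fridge","price":199.99},{"name":"Cooler","price":49.5}]}
    </script>
  </body>
</html>`;

/**
 * Return the parsed JSON found inside <script id="..."> ... </script>,
 * or null when the tag is missing or its body is not valid JSON.
 * scriptId is inserted into the pattern unescaped, so this assumes
 * a plain id made of letters, digits, and hyphens.
 */
function extractInitialState(html, scriptId) {
  const pattern = new RegExp(
    `<script[^>]*\\bid=["']${scriptId}["'][^>]*>([\\s\\S]*?)</script>`,
    "i"
  );
  const match = html.match(pattern);
  if (!match) return null;
  try {
    // JSON.parse tolerates the surrounding whitespace and newlines.
    return JSON.parse(match[1]);
  } catch {
    return null;
  }
}

const state = extractInitialState(sampleHtml, "initial-state");
const names = state ? state.products.map((product) => product.name) : [];
console.log(names);
```

On a real page, a DOM parser such as the linkedom package mentioned at 27:12 would likely be more robust than a regular expression. The regex is used here only to keep the sketch dependency-free.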
------------------------------------------------------------------------------
Hit us up on Socials!
Scott: / stolinski
Wes: / wesbos
Randy: / @randyrektor
Syntax: / syntaxfm
www.syntax.fm
Brought to you by Sentry.io
#webdevelopment #webdeveloper #javascript

Comments: 18
@cguser 22 days ago
finally a talk on Web Scraping! good to see you again wesbos and scott!
@pedrogorilla483 22 days ago
Awesome! On the same line, I’d love an episode on reverse engineering scrambled or minified webapps 😏
@WesBos 22 days ago
good idea - I think there is also one on how to find objects of data in the JS heap
@chamithjanaka6040 19 days ago
Love you both from Sri Lanka...🇱🇰 ❤
@bingerminn 21 days ago
Awesome! I was using Puppeteer to scrape a site and converted it to pinging their API directly. So much faster, and no random errors when an element fails to load. Where would you host your scraping scripts that run every day, hour, or minute? I used a package to run it as a service on Windows.
@paullvindquist 20 days ago
I never thought I’d hear XPath mentioned on a podcast. It’s really too bad XML became a four-letter word. There were actually some cool things you could do with it that you can’t do with JSON. It has a DOM, for one thing.
@qnoox 22 days ago
Love this podcast and this episode, since I’m also a scraping OG / automation panda :) Side question: will the video format of the podcast ever pan into visual snapshots? For example, when the console is mentioned, pan into a snapshot of it, or when a website is mentioned, show a screenshot of it, like Wes did once during this video. I know this will add more work during editing, but it would be extra cool if it were included as standard. Thanks, keep up the awesomeness 🎉👍
@jayfiled 20 days ago
Yeah, I jumped off the audio version and onto KZbin hoping to see something in action. But I think that would slow down the time to upload, CJ probs has something in the mix no doubt.
@jayfiled 20 days ago
How would you alert if something was available? I want instant, attention-ambushing feedback if my scraper finds something. If I run a Cypress script in headless mode to check a site for tickets, say, and it finds one, I want a desktop alert somehow. Browser alerts work if I run it manually, but if I schedule it on a Mac, it runs in the background and I don't get any alerts.
@KevinMacKenzie61 22 days ago
Is there a course you recommend for this?
@buddy.abc123 22 days ago
Lol I've been watching every episode since CJ joined and yet I'm not subscribed 😅 Time to change that
@WesBos 22 days ago
yeahhh buddy
@stolinski 22 days ago
Working on a scraper rn.
@jayfiled 20 days ago
Public repo? Link us up
@jayfiled 20 days ago
Oh it's you Scott, hahah. I had a rush of enthusiasm to work on it with a fellow listener but now I feel silly.
@Stoney_Eagle 22 days ago
If someone scrapes for indexing and links to your site to consume it I am totally cool with it, but if someone scrapes to bypass the site I'm not.
@gofudgeyourselves9024 22 days ago
Ok