Super neat!! Also as a Swede I chuckled at "this is a pretty standard e-commerce site" when talking about Sweden's most valuable brand haha
@JohnWatsonRooney · 8 months ago
haha! yeah huge brand..! thanks for watching
@shubhammore6332 · 6 months ago
I never comment on youtube videos but this has been so helpful. Thank you. Subscriber++
@theonlynicco · 3 months ago
you are a bloody animal mate, love your work a ton!
@matthewschultz5480 · 6 months ago
Thank you very much John, great series. I am a bit stuck between this video and the cleaning-with-Polars video on taking the JSON terminal output and converting it for use in Polars. Is there a function (def) I can add to the code to output to CSV (or JSON)? I considered importing the csv and json libraries and creating a function and printing, but I'm unsure about this step. Many thanks again
@theonlynicco · 3 months ago
Check his playlists and go to the one about web scraping for beginners; I think in video number 4 he covers conversion to CSV and JSON quite well.
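For anyone stuck at the same step, here is a minimal sketch of writing scraped data out to CSV/JSON and loading it into Polars. The items list, keys and file names are made up for illustration, not taken from the video:

    import json
    import csv
    import polars as pl

    # 'items' stands in for the list of dicts scraped from the hidden API
    items = [
        {"name": "Product A", "price": 199},
        {"name": "Product B", "price": 299},
    ]

    # save the raw data as JSON so it can be reloaded later
    with open("items.json", "w") as f:
        json.dump(items, f, indent=2)

    # write a CSV using the dict keys as the header row
    with open("items.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=items[0].keys())
        writer.writeheader()
        writer.writerows(items)

    # Polars can also build a DataFrame straight from the list of dicts
    df = pl.DataFrame(items)
    print(df)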
@superredevil12 · 5 months ago
love your video man, great content!
@cagan8 · 8 months ago
Just followed, great content
@mattrgee · 8 months ago
Thanks! Another really useful video. What would be the best way to either remove unwanted columns or extract only the required columns, then output a JSON file containing only the required data? This and your 'hidden API' video have been so helpful.
@JohnWatsonRooney · 8 months ago
Thanks! You could remove the keys from the JSON (dict) in Python before loading it into a dataframe, or if you are going to use the dataframe, remove them there by dropping columns
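A short sketch of both approaches, with made-up record keys (the real key and column names depend on the site's JSON):

    import pandas as pd

    # hypothetical scraped records
    records = [
        {"name": "Product A", "price": 199, "internalId": "x1"},
        {"name": "Product B", "price": 299, "internalId": "x2"},
    ]

    # option 1: keep only the wanted keys before building the DataFrame
    wanted = ["name", "price"]
    trimmed = [{k: r[k] for k in wanted} for r in records]

    # option 2: build the DataFrame first, then drop unwanted columns
    df = pd.DataFrame(records).drop(columns=["internalId"])

    # either way, export just the required data
    df.to_json("products.json", orient="records")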
@LuicMarin · 7 months ago
I bet you can't make a video on how to get past Cloudflare-protected websites: not a simple test Cloudflare site, but proper ones where the Cloudflare detection actually works
@graczew · 8 months ago
Good stuff as always. I will try to use this with the fotmob website. 👍😉
@ying1296 · 8 months ago
Thank you so much for this! I always had trouble trying to scrape data from sites whose paging is based on "Load More"
@JohnWatsonRooney · 8 months ago
Glad it helped!
@TheJFMR · 8 months ago
I use polars instead of pandas. Everything rewritten in Rust will have better performance ;-)
@jayrangai2119 · 7 months ago
You are the best!
@RyanAI-kk1kv · 7 months ago
I'm currently working on a project that involves scraping Amazon's data. I tried a few methods that didn't work, which led me to your video. However, when I loaded Amazon and looked through the JSON files, I couldn't find any that included the products. Why is that? What do you recommend I do?
@mohamedtekouk8215 · 8 months ago
Kind of magic, thank you very much 😭😭😭 Can this be used for scraping multiple pages?
@rianalee3138 · 7 months ago
yes
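Roughly, it could look like the sketch below: loop over a page parameter on the hidden API and collect each page of results. The URL, parameter names and the 'items' key are placeholders; the real ones come from inspecting the site's network traffic as shown in the video.

    import requests

    # placeholder endpoint and parameters, swap in the real API details
    url = "https://example.com/api/search"
    results = []

    for page in range(1, 11):
        resp = requests.get(url, params={"query": "laptop", "page": page})
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:        # stop early when a page comes back empty
            break
        results.extend(items)

    print(len(results), "items collected")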
@negonifas · 8 months ago
not bad, thanks a lot.
@milesmofokeng1551 · 8 months ago
How long have you been using Linux or the Arch Linux distro? Would you recommend it?
@JohnWatsonRooney · 8 months ago
3 years full time, dual boot / on and off for 10+. I use Fedora at the moment, it seems to be a good mix. Unless you rely on Windows-specific software for work, or play games, go 100% Linux. The only thing I don't do on Linux is edit videos, and that's for convenience.
@heroe1486 · 7 months ago
@JohnWatsonRooney Most games are more than playable thanks to Proton now though; the only drawbacks are the ones with really intrusive anti-cheat, like Valorant's.
@JohnWatsonRooney · 7 months ago
@heroe1486 Yeah it's good to see, the last thing I played was PoE and that was absolutely fine
@viratchoudhary6827 · 8 months ago
I discovered this method three years ago🙂
@schoimosaic · 8 months ago
Thanks for the video, as always. In my attempt, the website's response didn't include a 'metadata' key. Instead, the page restriction was specified under the 'parameter' key, as shown below. Despite setting 'pageSize' to 1000, I only received a maximum of 100 items, which suggests a system preset limit by the admin. I'm uncertain about how to bypass this apparent restriction of 100 items.

    params = {
        ...
        'lang': 'en-CA',
        'page': '1',
        'pageSize': '1000',
        'path': '',
        'query': 'laptop',
        ...
    }
@JohnWatsonRooney · 8 months ago
There will be a restriction within their API; I was surprised the one in my example went up so high, 100 seems about right. You will have some kind of pagination available to get the rest of the results
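One rough way to page past the cap: keep pageSize at the server's limit and increment page until a response returns fewer items than requested. The URL and the 'items' key are assumptions; the parameter names follow the dict in the comment above.

    import requests

    url = "https://example.com/api/search"   # placeholder endpoint
    page_size = 100                          # the apparent server-side cap
    params = {"lang": "en-CA", "query": "laptop", "pageSize": str(page_size)}

    all_items = []
    page = 1
    while True:
        params["page"] = str(page)
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        items = resp.json().get("items", [])
        all_items.extend(items)
        if len(items) < page_size:   # last page reached
            break
        page += 1

    print(f"collected {len(all_items)} items")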