Super neat!! Also as a Swede I chuckled at "this is a pretty standard e-commerce site" when talking about Sweden's most valuable brand haha
@JohnWatsonRooney · 8 months ago
haha! yeah huge brand..! thanks for watching
@shubhammore6332 · 6 months ago
I never comment on youtube videos but this has been so helpful. Thank you. Subscriber++
@theonlynicco · 3 months ago
you are a bloody animal mate, love your work a ton!
@matthewschultz5480 · 6 months ago
Thank you very much John, great series. I am a bit stuck between this video and the cleaning-with-Polars video on taking the JSON terminal output and converting it for use in Polars. Is there a function (def) I can add to the code to output to CSV (or JSON)? I considered importing the csv and json libraries and creating a function and printing, but I'm unsure about this step. Many thanks again
@theonlynicco · 3 months ago
Check his playlists and go to the one about web scraping for beginners; I think in video number 4 he covers conversion to CSV and JSON quite well.
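For anyone stuck at the same step, here is a minimal sketch of writing scraped data out to CSV/JSON and loading it into Polars. The items list, keys and file names are made up for illustration, not taken from the video:

    import json
    import csv
    import polars as pl

    # 'items' stands in for the list of dicts scraped from the hidden API
    items = [
        {"name": "Product A", "price": 199},
        {"name": "Product B", "price": 299},
    ]

    # save the raw data as JSON so it can be reloaded later
    with open("items.json", "w") as f:
        json.dump(items, f, indent=2)

    # write a CSV using the dict keys as the header row
    with open("items.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=items[0].keys())
        writer.writeheader()
        writer.writerows(items)

    # Polars can also build a DataFrame straight from the list of dicts
    df = pl.DataFrame(items)
    print(df)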
@superredevil12 · 5 months ago
love your video man, great content!
@cagan8 · 8 months ago
Just followed, great content
@mattrgee · 8 months ago
Thanks! Another really useful video. What would be the best way to either remove unwanted columns or extract only the required columns, then output a JSON file containing only the required data? This and your 'hidden API' video have been so helpful.
@JohnWatsonRooney · 8 months ago
Thanks! You could remove the keys from the JSON (dict) in Python before loading it into a dataframe, or if you are going to use the dataframe, remove them there by dropping columns
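A short sketch of both approaches, with made-up record keys (the real key and column names depend on the site's JSON):

    import pandas as pd

    # hypothetical scraped records
    records = [
        {"name": "Product A", "price": 199, "internalId": "x1"},
        {"name": "Product B", "price": 299, "internalId": "x2"},
    ]

    # option 1: keep only the wanted keys before building the DataFrame
    wanted = ["name", "price"]
    trimmed = [{k: r[k] for k in wanted} for r in records]

    # option 2: build the DataFrame first, then drop unwanted columns
    df = pd.DataFrame(records).drop(columns=["internalId"])

    # either way, export just the required data
    df.to_json("products.json", orient="records")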
@LuicMarin · 7 months ago
I bet you can't make a video on how to get past Cloudflare-protected websites: not a simple test Cloudflare site, but proper ones where the Cloudflare detection actually works
@graczew · 8 months ago
Good stuff as always. I will try to use this with the fotmob website. 👍😉
@ying1296 · 8 months ago
Thank you so much for this! I always had trouble trying to scrape data from sites whose paging is based on "Load More"
@JohnWatsonRooney · 8 months ago
Glad it helped!
@TheJFMR · 8 months ago
I use polars instead of pandas. Everything rewritten in Rust will have better performance ;-)
@jayrangai2119 · 7 months ago
You are the best!
@RyanAI-kk1kv · 7 months ago
I'm currently working on a project that involves scraping Amazon's data. I tried a few methods that didn't work, which led me to your video. However, when I loaded Amazon and looked through the JSON files, I couldn't find any that included the products. Why is that? What do you recommend I do?
@mohamedtekouk8215 · 8 months ago
Kind of magic, thank you very much 😭😭😭 Can this be used for scraping multiple pages?
@rianalee3138 · 7 months ago
yes
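Roughly, it could look like the sketch below: loop over a page parameter on the hidden API and collect each page of results. The URL, parameter names and the 'items' key are placeholders; the real ones come from inspecting the site's network traffic as shown in the video.

    import requests

    # placeholder endpoint and parameters, swap in the real API details
    url = "https://example.com/api/search"
    results = []

    for page in range(1, 11):
        resp = requests.get(url, params={"query": "laptop", "page": page})
        resp.raise_for_status()
        items = resp.json().get("items", [])
        if not items:        # stop early when a page comes back empty
            break
        results.extend(items)

    print(len(results), "items collected")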
@negonifas · 8 months ago
not bad, thanks a lot.
@milesmofokeng1551 · 8 months ago
How long have you been using Linux or the Arch Linux distro? Would you recommend it?
@JohnWatsonRooney · 8 months ago
3 years full time, dual boot / on and off for 10+. I use Fedora at the moment, it seems to be a good mix. Unless you rely on Windows-specific software for work, or play games, go 100% Linux. The only thing I don't do on Linux is edit videos, and that's for convenience.
@heroe1486 · 7 months ago
@JohnWatsonRooney Most games are more than playable thanks to Proton now though; the only drawbacks are the ones with really intrusive anti-cheat, like Valorant's.
@JohnWatsonRooney · 7 months ago
@heroe1486 Yeah it's good to see, the last thing I played was PoE and that was absolutely fine
@viratchoudhary6827 · 8 months ago
I discovered this method three years ago🙂
@schoimosaic · 8 months ago
Thanks for the video, as always. In my attempt, the website's response didn't include a 'metadata' key. Instead, the page restriction was specified under the 'parameter' key, as shown below. Despite setting 'pageSize' to 1000, I only received a maximum of 100 items, which suggests a system preset limit by the admin. I'm uncertain about how to bypass this apparent restriction of 100 items.

    params = {
        ...
        'lang': 'en-CA',
        'page': '1',
        'pageSize': '1000',
        'path': '',
        'query': 'laptop',
        ...
    }
@JohnWatsonRooney · 8 months ago
There will be a restriction within their API; I was surprised the one in my example went up so high, 100 seems about right. You will have some kind of pagination available to get the rest of the results
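One rough way to page past the cap: keep pageSize at the server's limit and increment page until a response returns fewer items than requested. The URL and the 'items' key are assumptions; the parameter names follow the dict in the comment above.

    import requests

    url = "https://example.com/api/search"   # placeholder endpoint
    page_size = 100                          # the apparent server-side cap
    params = {"lang": "en-CA", "query": "laptop", "pageSize": str(page_size)}

    all_items = []
    page = 1
    while True:
        params["page"] = str(page)
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        items = resp.json().get("items", [])
        all_items.extend(items)
        if len(items) < page_size:   # last page reached
            break
        page += 1

    print(f"collected {len(all_items)} items")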