Writing a Screen Scraper in Python helped by AI

40 views

T.J Moir

1 month ago

How text can be taken off a web page and saved as a text file and an audio file.
The program is written in Python, and I had help from ChatGPT tools.

Comments: 4
@johnsim3722 10 days ago
I thought it would have come out with an HTML package of images and other content within the page, so that you could view it off-line like it was a live page. I've seen some bits of software that do this, with varying degrees of success. The problem seems to be with scripts behind the scenes that serve up a page, and with content constantly changing on each view. Those programs can get into a mess constantly trying to update themselves. Creating a PDF is certainly one way to take a static copy, better if it does it as a continuous strip rather than paginated. As for the Daily Mail website, apart from the content being a load of BS, their adverts are dangerous. My sister-in-law visited on her work computer and got infected by a virus from that site. The machine shut down, and she then had to get technical support from work, who ended up buying a new one just to get her back up and running quickly. Daily Mail is a nasty bit of work.
@TJMoir 10 days ago
Yes, you can do images too. That's a bit harder because it gives you everything on the page, including logos etc., and you need to sort it all out, but it can be done. I tried it and got tons of images. So you need to keep track of where they sit relative to the text. That's where AI is good: it can pop an image up relevant to a paragraph or whatever. But the text bit is the easiest.
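One way the "keep track of where the images are" step might be sketched is below. It is only an illustration, not the code from the video: it uses the standard-library html.parser instead of BeautifulSoup so it runs without extra installs, and the class name ParagraphImageParser and the sample HTML are made up for the example. Each image is recorded with the index of the paragraph it sits in or follows, with -1 meaning it appears before any paragraph (often a logo).

```python
from html.parser import HTMLParser


class ParagraphImageParser(HTMLParser):
    """Hypothetical helper: collect <p> text and note which paragraph each <img> belongs to."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # text of each paragraph, in page order
        self.images = []       # (paragraph_index, src) pairs, in page order
        self._in_p = False
        self._buf = []

    def _flush(self):
        # Close the paragraph currently being collected, if any
        if self._in_p:
            self.paragraphs.append(" ".join("".join(self._buf).split()))
            self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # copes with <p> tags that were never closed
            self._in_p = True
            self._buf = []
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Inside a paragraph: use its index; otherwise the last finished one
                index = len(self.paragraphs) if self._in_p else len(self.paragraphs) - 1
                self.images.append((index, src))

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


# Made-up sample page for illustration
SAMPLE = """
<img src="logo.png">
<p>First paragraph.</p>
<img src="chart.png">
<p>Second <img src="inline.jpg"> paragraph.</p>
"""

parser = ParagraphImageParser()
parser.feed(SAMPLE)
parser.close()

for index, src in parser.images:
    print(index, src)
```

Images with index -1 could then be treated as likely page furniture and dropped, although that rule is only a guess and would need checking against real pages.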
@johnsim3722 10 days ago
@TJMoir I've wondered if there was something smart enough to just get the main content with inline images, and not all the junk they push up the sides of any of these sites. Even removing all the in-line adverts. Some sites are almost impossible to read because they're so broken up with adverts. An ad blocker is essential, or they'll steal all the resources from your computer to run.
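A rough sketch of that idea, under the assumption that the junk lives inside structural tags such as nav, aside, header and footer. Many real sites put adverts in plain div elements instead, so this is a heuristic and not a complete ad remover. It uses the standard-library html.parser so it runs without extra installs; the names SKIP_TAGS and MainTextParser and the sample page are invented for the example.

```python
from html.parser import HTMLParser

# Assumed set of tags whose contents are treated as page furniture
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer"}


class MainTextParser(HTMLParser):
    """Hypothetical helper: keep <p> text only when it is outside the skipped tags."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._skip_depth = 0   # how many skipped containers we are currently inside
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "p" and self._skip_depth == 0:
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_p and self._skip_depth == 0:
            self._buf.append(data)


# Made-up sample page for illustration
SAMPLE = """
<nav><p>Home | News | Sport</p></nav>
<article>
  <p>Main story text.</p>
  <aside><p>Advert: buy now!</p></aside>
  <p>More of the story.</p>
</article>
<footer><p>Copyright notice</p></footer>
"""

parser = MainTextParser()
parser.feed(SAMPLE)
parser.close()

for paragraph in parser.paragraphs:
    print(paragraph)
```

The same filtering could be done with BeautifulSoup by removing the unwanted elements before calling find_all('p'); the tag list would still need tuning per site.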
@TJMoir 1 month ago
Code:

# Screen scraper from a URL
# T.J.Moir and chatGPT
import os
import shutil
import tkinter as tk
import winsound  # Windows only
from tkinter import simpledialog

import requests
from bs4 import BeautifulSoup
from gtts import gTTS

# for beep sound
frequency = 500  # Set frequency to 500 Hertz
duration = 500   # Set duration to 500 ms == 0.5 second

# Create the root window
root = tk.Tk()
root.geometry('500x100+1200+500')
root.title("Screen Scraper")  # Set window title

# Create a StringVar to associate with the label
text_var = tk.StringVar()
text_var.set("Screen Scraper")

# Create the label widget with all options
label = tk.Label(root, textvariable=text_var, anchor=tk.CENTER, bg="white",
                 height=3, width=30, bd=3, font=("Times", 16, "bold"),
                 cursor="hand2", fg="Grey", padx=15, pady=15,
                 justify=tk.CENTER, relief=tk.RAISED, wraplength=250)

# Pack the label into the window
label.pack(pady=20)  # Add some padding to the top

# hide root window
# root.withdraw()
winsound.Beep(frequency, duration)

# Load the URL of web page to scrape
url = simpledialog.askstring("Input", "URL for screenscrape")

# Send a GET request to the URL
response = requests.get(url)

# Parse the HTML content
soup = BeautifulSoup(response.content, 'html.parser')

# Extract the desired data (example: all paragraphs)
paragraphs = soup.find_all('p')

# Get the user's home directory
home_dir = os.path.expanduser('~')

text_var.set("Please wait, doing audio first")
text = ""
# Collect the text of each paragraph
for p in paragraphs:
    text += p.get_text()

tts = gTTS(text, lang='en-uk')

# mp3 directory is the same as the one for the text file
mp3_dir = os.path.join(home_dir, 'Downloads', 'screenscraper')

# Construct the path to the folder.
# I created a folder under Downloads called screenscraper
download_dir = os.path.join(home_dir, 'Downloads', 'screenscraper')

# Ensure the screenscraper directory exists
if not os.path.exists(download_dir):
    raise FileNotFoundError(f"The directory {download_dir} does not exist.")

winsound.Beep(frequency, duration)

# Create the input dialog
mp3_name = simpledialog.askstring("Input", "Enter .mp3 filename:")

# Save the audio file
tts.save(mp3_name)

# Move file to desired directory screenscraper
shutil.move(mp3_name, download_dir)
text_var.set("Finished audio part")
winsound.Beep(frequency, duration)

# Create the input dialog
file_name = simpledialog.askstring("Input", "Enter filename:")

# Destroy the root window
root.destroy()

# Full path to the file
file_path = os.path.join(download_dir, file_name)

# Save the text to a file
with open(file_path, 'w', encoding='utf-8') as file:
    for p in paragraphs:
        file.write(p.get_text() + ' ')