Getting Word Frequency from a Text File using Python Dictionaries

18,464 views

Adam Gaweda

2 years ago

Comments: 25
@user-qq1ul6jy7b 1 year ago
Hi, I am a college student majoring in computer engineering in South Korea. Your video really helped me a lot with my studies. Thank you.😊
@AMGaweda 1 year ago
Glad to help!
@comrade_dankbob6876 3 months ago
You are my most beautiful sunshine Adam Gaweda, you give me the light of my tunnel. You make the grey days bright with your wonderful smile. You are my pookie wookie stuffy bear-boy and I want to cherish you for days-on-end. Adam, I love you dai-dai-dai-dai-ski
@user-uv6rk8mo3s 1 year ago
Hi, thanks. Can I use it to count words in Arabic text?
@solomonngare8382 1 year ago
Thanks bro
@thomaskersig5291 1 year ago
Thanks for this! Using my own file (a .csv which I saved as .txt), I get the following output after running list(word_count.keys())[:10]: [' \x00']. Any suggestions of what to do? Does it make sense to rewrite the code to open the .csv, or will I run into the same problem? Best, Thomas
@AMGaweda 1 year ago
You'll most likely still run into the issue, since CSV files are just TXT files. It's mostly programs that treat them differently. I'd recommend doing a little "preprocessing" before you count your words, such as making all letters lowercase and removing excess white space. For your example, it might be good to do something like sentence = sentence.replace("\x00", "") to remove these kinds of characters from the analysis.
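The preprocessing suggested in this reply could look roughly like the sketch below. The helper names clean_text and count_words and the sample string are made up for illustration; they are not from the video.

```python
def clean_text(text):
    """Drop NUL characters, lowercase, and collapse runs of whitespace."""
    text = text.replace("\x00", "")   # remove the stray "\x00" characters
    text = text.lower()               # so "THE" and "the" count as one word
    return " ".join(text.split())     # split() with no argument eats extra whitespace


def count_words(text):
    """Count each word of the cleaned text with a plain dictionary."""
    word_count = {}
    for word in clean_text(text).split():
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


# Hypothetical input with NUL characters and uneven spacing
sample = "T\x00he  cat\x00 and   THE hat\n"
print(count_words(sample))
```

Cleaning before splitting keeps the counting loop itself unchanged, so the same dictionary logic works on messy and clean files alike.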
@LukaDonesnitch 1 year ago
Can you explain how to swap out the .txt file for a .csv file? I'm trying to add a user input line so that when the user searches for a word in column 3 of the CSV file, it prints how many times the word occurs. So far, when I make the changes for CSV and increase the increment by 1, I get the error message TypeError: string indices must be integers.
@AMGaweda 1 year ago
It depends on the format of the file, but take a look at my video on using CSVReader: kzbin.info/www/bejne/Z2KZfIqvgchpgJo You'll follow the same ideas: get a list of the words, then use a dictionary to get the count of each word. You may not even need the dictionary, since a running total just needs a for loop that iterates through a list. One trick I like to use is to load the contents of a file into a "contents" list first, à la contents = open(filename, 'r').readlines(). This way, I no longer need to worry about the file-handling aspect of my analysis and can instead rely on the list.
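As a rough sketch of the running-total idea for one CSV column: the function name count_in_column and the sample rows below are invented for illustration, and io.StringIO stands in for a real file so the example runs on its own. It uses the standard library's csv.reader, which accepts any iterable of lines.

```python
import csv
import io


def count_in_column(lines, column_index, target):
    """Count how often target appears as a word in one column (case-insensitive)."""
    target = target.lower()
    total = 0
    for row in csv.reader(lines):
        if len(row) <= column_index:
            continue  # skip short or blank rows instead of raising IndexError
        for word in row[column_index].lower().split():
            if word == target:
                total += 1
    return total


# Stand-in for: contents = open(filename, 'r').readlines()
contents = io.StringIO(
    "id,name,notes\n"
    "1,Ann,red apple\n"
    "2,Bob,green apple pie\n"
    "3,Cy,pear\n"
).readlines()

# "Column 3" is index 2 because Python counts from 0
print(count_in_column(contents, 2, "apple"))
```

Each row comes back from csv.reader as a list, so it is indexed with an integer position rather than a column name.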
@bahaminakhtari4997 1 year ago
Hello, I enjoy watching your videos. This video helped a lot. I do have a question. How would you put the top ten words into a dictionary, where the key would be the word and the count would be the value?
@AMGaweda 1 year ago
Around minute 8 there is a function that creates a sorted list of the most frequent words. If you wanted to put the top 10 in a dictionary, you'd need to create a new dictionary and add only the first 10 words from the sorted list to it.
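One possible way to build that smaller dictionary is sketched below. It assumes the counts already live in a dictionary called word_count; the function name top_n and the sample counts are hypothetical, and the sorting step may differ from the one shown in the video.

```python
def top_n(word_count, n=10):
    """Return a new dict holding only the n most frequent words."""
    # Sort (word, count) pairs by count, largest first
    pairs = sorted(word_count.items(), key=lambda pair: pair[1], reverse=True)
    top = {}
    for word, count in pairs[:n]:
        top[word] = count  # key is the word, value is its count
    return top


# Hypothetical counts
word_count = {"the": 5, "alice": 3, "rabbit": 2, "tea": 1}
print(top_n(word_count, 2))
```

Since Python 3.7 dictionaries keep insertion order, so the new dictionary also lists the words from most to least frequent.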
@bahaminakhtari4997 1 year ago
@AMGaweda I see. Thank you so much for replying!
@andytamburino1743 2 years ago
Do you teach a master's class at NCU? I'm about to finish my BA in Comp Sci and man, you are an awesome teacher.
@AMGaweda 2 years ago
Thanks, I'm finishing up my PhD now, but hopefully in the fall wherever I end up I'll be teaching there
@RRB47tv 1 year ago
How do I get the least frequent? Excellent video!! Thank you
@AMGaweda 1 year ago
When I did sorted_values = sorted(sorted_values) you would omit the [::-1] portion. The [::-1] reverses the list so the largest appear first, but sorted(sorted_values) will have the least frequent first. It will be a lot of 1 count words, but that should do what you are looking for.
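To illustrate the difference the [::-1] makes, here is a small sketch. It assumes the list being sorted holds (count, word) tuples, which is a guess at the video's structure; the sample counts are made up.

```python
# Hypothetical counts
word_count = {"the": 5, "alice": 3, "rabbit": 1, "tea": 1}

# Build (count, word) pairs so that sorting orders by count first
pairs = [(count, word) for word, count in word_count.items()]

most_frequent = sorted(pairs)[::-1]   # reversed: largest counts first
least_frequent = sorted(pairs)        # no reversal: smallest counts first

print(least_frequent[:2])
print(most_frequent[:1])
```

Words with the same count are ordered alphabetically here, because tuples compare on their second element when the first is tied.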
@Vagabund92 2 years ago
Thank you. I learned a lot. I also appreciate the comments in the code. The only thing is that I didn't get rid of all the punctuation in my text (which I wrote myself as a dummy: I mass copied "thousand.thousand.thousand.thousand" next to each other and it stayed that way). It would be cool if you could share the code and the Alice in Wonderland text.
@AMGaweda 2 years ago
I don't share my code, mostly to encourage students to code along with me, BUT you can download a copy of Alice in Wonderland on Project Gutenberg: www.gutenberg.org/ebooks/11
@Vagabund92 2 years ago
@AMGaweda Okay, I already replicated your code and thought that copying it would have been handy. :D
@pradnyakasar614 1 year ago
Sir, how can I find the count of unique words across multiple text files at one time?
@AMGaweda 1 year ago
I would still recommend using the counting method from this video, but process it across a list of files. Once you've finished every file, the dictionary will have a list of keys you can look at (using the .keys() method). This gives you the list of unique words, and you can get how many there are by using len().
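A minimal sketch of that approach follows. The function name count_words_in_files and the two throwaway files are invented for illustration; the demo writes the files into a temporary folder so that it runs on its own.

```python
import os
import tempfile


def count_words_in_files(filenames):
    """Count words across several files using one shared dictionary."""
    word_count = {}
    for filename in filenames:
        with open(filename, "r", encoding="utf-8") as handle:
            for line in handle:
                for word in line.lower().split():
                    word_count[word] = word_count.get(word, 0) + 1
    return word_count


# Demo with two throwaway files (hypothetical contents)
folder = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "the cat sat"), ("b.txt", "The dog sat down")]:
    path = os.path.join(folder, name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    paths.append(path)

word_count = count_words_in_files(paths)
unique_words = list(word_count.keys())
print(len(unique_words))
```

Because the same dictionary is reused for every file, a word that appears in several files is still stored under a single key.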
@sharma3226 2 years ago
Sir, could you please guide me on how to sort the most frequent words used in a PDF document? I want to learn the most important words for an exam, so it would be very helpful 🙏🏽🙏🏽.
@AMGaweda 2 years ago
There isn't a "clean" way to extract text from a PDF. However, you can use some of Python's third-party libraries to do this. For example, PDFPlumber (github.com/jsvine/pdfplumber) will allow you to extract text. Please note, this expects the PDF's text to be TEXT. Text inside of graphics or pictures, or pictures of text, will not get extracted.
@sharma3226 2 years ago
@AMGaweda OK, I am able to convert the PDF text into a text file. Then what?
@AMGaweda 2 years ago
@sharma3226 Then you can do the methods shown in the video
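Putting the two steps of this thread together might look like the sketch below. It assumes the third-party pdfplumber package is installed (pip install pdfplumber); the function names and the file name "notes.pdf" are hypothetical. The import sits inside the function so the counting helper still works without the package.

```python
def extract_pdf_text(pdf_path):
    """Return the text of every page in a PDF, joined with newlines."""
    import pdfplumber  # third-party package, assumed to be installed

    pieces = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Pages without a text layer (e.g. scanned images) yield nothing
            pieces.append(page.extract_text() or "")
    return "\n".join(pieces)


def count_words(text):
    """Same dictionary-counting idea as for a plain text file."""
    word_count = {}
    for word in text.lower().split():
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


# Usage with a hypothetical file name:
# text = extract_pdf_text("notes.pdf")
# print(count_words(text))
```

Once the text is a plain string, nothing about the counting step depends on the file having been a PDF.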