Bag of Words - Feature Extraction in Natural Language Processing (BoW in NLP)

3,147 views

Socratica

Mathematica Essentials - the first PRO COURSE from Socratica
Buy here: www.socratica.com/courses/mat...
Learn along with free Mathematica notebooks available on github:
github.com/socratica/wolfram
WANT MORE? snu.socratica.com/mathematica
To be notified about updates to our first Pro Course, "Mathematica Essentials," join our mailing list at: snu.socratica.com/mathematica
Natural Language Processing (NLP) is a specialized field within machine learning focused on interpreting and processing HUMAN, or "natural," language. This matters because only a small fraction of the population knows a programming language.
In this video, we explore the "Bag of Words" (or BoW) technique, a way to transform documents from something qualitative (text) into something quantitative (word frequencies, etc.). We'll discuss the math terminology used in this area, including sets and multisets, creating a vector (an embedding in feature space), normalization, and more. We'll use the Wolfram Language to work through these examples. In a future lesson, we will explore these concepts using Python as well.
BTW, Socratica offers a Pro Course, "Mathematica Essentials," which covers key concepts for mastering Wolfram products:
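A minimal sketch of the core idea in Python (the video itself works in the Wolfram Language). It assumes a deliberately naive tokenizer that lowercases the text and keeps runs of letters and apostrophes; the helper names `tokenize` and `bag_of_words` are hypothetical, not taken from the video. The set of words collapses repeats, while the bag of words is a multiset (here a `collections.Counter`) that records how many times each word occurs.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: a "word" is a run of lowercase letters or apostrophes.
    # The Wolfram Language's own tokenizer may split text differently.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text: str) -> Counter:
    # A multiset: each distinct word mapped to its number of occurrences.
    return Counter(tokenize(text))


sample = "The cat sat on the mat. The mat was flat."
words = tokenize(sample)
vocabulary = set(words)       # set: repeated words collapse to one entry
bag = bag_of_words(sample)    # multiset: repeats are counted

print(len(words))             # 10 words in total
print(len(vocabulary))        # 7 distinct words
print(bag.most_common(3))     # [('the', 3), ('mat', 2), ('cat', 1)]
```

The gap between 10 total words and 7 distinct words is exactly the information a set throws away and a multiset keeps.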
www.socratica.com/courses/mat...
You can jump to sections of the video here:
0:00 Intro & Conceptual Definition
0:48 Making text quantitative (word frequencies)
1:53 Feature Extraction
2:16 Example: The Foundation
3:34 Word Frequencies and repeats
4:03 Math terminology: Set and Multiset
5:06 Create a vector (embedding in feature space)
7:03 Example: War & Peace (Normalization)
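The vector and normalization steps from the chapters above can be sketched in Python under a few stated assumptions: the vocabulary is the sorted union of words across all documents, each document becomes a count vector over that shared vocabulary, and normalization divides each count by the document's total word count so that a very long text like War & Peace can be compared with a short one. Dividing by the total is one common choice; the video may use a different norm (for example, Euclidean length). The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase, then keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def to_vector(text, vocabulary):
    """Embed a document as a count vector over a fixed, ordered vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]  # absent words count as 0


def normalize(vector):
    """Divide by the total count so the entries are relative frequencies."""
    total = sum(vector)
    return [x / total for x in vector] if total else vector


docs = ["the cat sat on the mat", "the dog sat on the log with the cat"]
# Assumption: the feature space is the sorted union of all words seen.
vocabulary = sorted(set().union(*(tokenize(d) for d in docs)))
vectors = [to_vector(d, vocabulary) for d in docs]
normalized = [normalize(v) for v in vectors]

print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the', 'with']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2, 0], [1, 1, 1, 0, 1, 1, 3, 1]]
```

Every document lands in the same 8-dimensional feature space, so vectors can be compared position by position; after normalization each vector sums to 1 regardless of document length.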
Thank you to our VIP Patreon Members who helped make this video possible!
KW, M Andrews, Jim Woodworth, Massimiliano Pala, Marcos Silveira, Christopher Kemsley, Eric Eccleston, Jeremy Shimanek, Michael Shebanow, Alvin Khaled, Kevin B, John Krawiec, Umar Khan, and Tracy Karin Prell - we are so happy to have you on our team!
- Thank you, kind friends! 💜🦉
✷✷✷
We recommend the following (affiliate links):
The Wolfram Language
amzn.to/3D4jqvz
The Mythical Man Month - Essays on Software Engineering & Project Management
amzn.to/2tYdNeP
Innumeracy: Mathematical Illiteracy and Its Consequences
amzn.to/2ri1nf7
Mindset by Carol Dweck
amzn.to/2q9y8Nj
How to Be a Great Student (our first book!)
ebook: amzn.to/2Lh3XSP
Paperback: amzn.to/3t5jeH3
Kindle Unlimited: amzn.to/3atr8TJ
✷✷✷
If you find our work at Socratica valuable, please consider becoming our Patron on Patreon!
/ socratica
If you would prefer to make a one-time donation, you can also use
Socratica Paypal
www.paypal.me/socratica
✷✷✷
Written & Produced by Michael Harrison & Kimberly Hatch Harrison
Edited by Megi Shuke
About our Instructors:
Michael earned his BS in Math from Caltech, and did his graduate work in Math at UC Berkeley and University of Washington, specializing in Number Theory. A self-taught programmer, Michael taught both Math and Computer Programming at the college level. He applied this knowledge as a financial analyst (quant) and as a programmer at Google.
Kimberly earned her BS in Biology and another BS in English at Caltech. She did her graduate work in Molecular Biology at Princeton, specializing in Immunology and Neurobiology. Kimberly spent 16+ years as a research scientist and a dozen years as a biology and chemistry instructor.
Michael and Kimberly Harrison co-founded Socratica.
Their mission? To create the education of the future.
✷✷✷
Welcome to Socratica! We make SMART videos focusing on STEM - science, math, programming. Subscribe here: bit.ly/SocraticaSubscribe
PLAYLISTS
Study Tips bit.ly/StudyTipsPlaylist
Python programming bit.ly/PythonSocratica
SQL programming bit.ly/SQL_Socratica
Chemistry bit.ly/Chemistry_Playlist
Abstract Algebra bit.ly/AbstractAlgebra
Astronomy bit.ly/AstronomySocratica
Biology bit.ly/BiologySocratica
Calculus bit.ly/CalculusSocratica
Geometry bit.ly/GeometrySocratica
Mathematica bit.ly/SocraticaMathematica
#NaturalLanguageProcessing #BagOfWords #Mathematica

Comments: 9
@Socratica (5 months ago)
We worked these examples using the Wolfram Language. Socratica offers a pro course, 'Mathematica Essentials,' providing key concepts for mastering Wolfram products: www.socratica.com/courses/mathematica-essentials
@kirbymarchbarcena (5 months ago)
I didn't expect The Foundation, The Adventures of Sherlock Holmes, and War and Peace to be in this video as examples.
@jagadishgospat2548 (5 months ago)
Good one, team! It's about time we learned about algorithms before they take over.
@DasIllu (5 months ago)
I just wrote a small tokenizer to fit my needs; now I feel like I have to expand it massively. Thanks for the video.
@jim4859 (5 months ago)
I think this is really interesting.
@chlupatazarovka8201 (5 months ago)
What about lemmatization? Isn't it used?
@samson6707 (5 months ago)
WordCount[text]. Where are you taking these functions from?
@Socratica (5 months ago)
This is a built-in function in the Wolfram Language. WordCount["string"] gives the total number of words in string.
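For readers following along in Python, a rough analog of `WordCount` can be sketched as below. This is not the Wolfram implementation: the helper `word_count` is hypothetical, and it assumes a "word" is any run of letters, digits, or apostrophes, so punctuation-heavy or hyphenated text may be counted differently than Wolfram's built-in.

```python
import re


def word_count(text: str) -> int:
    """Rough analog of WordCount: the total number of words in a string.

    Assumption: a word is a run of letters, digits, or apostrophes.
    """
    return len(re.findall(r"[A-Za-z0-9']+", text))


print(word_count("Because cats are not vegan, they should eat meat."))  # 9
```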
@Her_Lovely_Tentacles (5 months ago)
"because cats are not vegan they should eat meat" vs "because cats are vegan they should not eat meat" Bag of Words: "It's the same sentence 🤷" In seriousness: is there a way around situations like this, for example by binding the "not" more tightly, or is this simply out of scope for this approach, and the only relevant features are cats and whether or not they are vegan, but with no conclusion if they actually are vegan?
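The commenter's observation can be checked directly. The sketch below, in Python with hypothetical helpers `unigrams` and `bigrams` and plain whitespace tokenization, shows that both sentences produce the identical bag of single words. It also shows one common extension that is not covered in the video: counting adjacent word pairs (bigrams) keeps some local word order, so "not vegan" and "not eat" become distinct features.

```python
from collections import Counter


def unigrams(sentence):
    # Bag of single words (whitespace tokenization assumed).
    return Counter(sentence.split())


def bigrams(sentence):
    # Bag of adjacent word pairs, which preserves some local word order.
    words = sentence.split()
    return Counter(zip(words, words[1:]))


a = "because cats are not vegan they should eat meat"
b = "because cats are vegan they should not eat meat"

print(unigrams(a) == unigrams(b))  # True: the same bag of words
print(bigrams(a) == bigrams(b))    # False: ("not", "vegan") vs ("not", "eat")
```

Bigrams only capture order within a two-word window, so longer-range dependencies are still lost; that limitation is inherent to bag-style features.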