NLP - Text Preprocessing and Text Classification (using Python)


Machine Learning TV


Hi! My name is Andre, and this week we will focus on the text classification problem. The methods that we will overview can be applied to text regression as well, but it will be easier to keep the text classification problem in mind. As an example of such a problem, we can take sentiment analysis. That is the problem where you have the text of a review as an input, and as an output you have to produce the class of sentiment. For example, it could be two classes, like positive and negative. It could be more fine-grained, like positive, somewhat positive, neutral, somewhat negative, and negative, and so forth.

An example of a positive review is the following: "The hotel is really beautiful. Very nice and helpful service at the front desk." We read that and we understand that it is a positive review. As for a negative review: "We had problems to get the Wi-Fi working. The pool area was occupied with young party animals, so the area wasn't fun for us." It's easy for us to read this text and to understand whether it has positive or negative sentiment, but for a computer that is much more difficult.

We'll start with text preprocessing. The first thing we have to ask ourselves is: what is text? You can think of text as a sequence, and it can be a sequence of different things. It can be a sequence of characters, which is a very low-level representation of text. You can think of it as a sequence of words, or maybe of higher-level features like phrases ("I don't really like" could be a phrase) or named entities, like the history museum or the museum of history. It could also be bigger chunks, like sentences or paragraphs, and so forth. Let's start with words, and let's define what a word is. It seems natural to think of a text as a sequence of words, and you can think of a word as a meaningful sequence of characters.
So, it has some meaning, and if we take the English language, for example, it is usually easy to find the boundaries of words, because in English we can split up a sentence by spaces or punctuation, and all that is left are words. Let's look at the example "Friends, Romans, Countrymen, lend me your ears;". It has commas, it has a semicolon, and it has spaces. If we split on those, we will get words that are ready for further analysis, like Friends, Romans, Countrymen, and so forth.

It could be more difficult in German, because in German there are compound words which are written without spaces at all. The longest word that is still in use is the following (you can see it on the slide), and it actually stands for insurance companies which provide legal protection. So for the analysis of this text, it could be beneficial to split that compound word into separate words, because every one of them actually makes sense. They're just written in such a form that they don't have spaces. The Japanese language is a different story.
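The splitting described above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function names `tokenize` and `split_compound` and the tiny `TOY_VOCABULARY` are hypothetical. The German word used here is also an assumption, since the transcript only points to the slide; it is a well-known long compound with the meaning given in the talk. The tokenizer uses only the standard `re` module, and the compound splitter is a simple backtracking search over a hand-written word list, where a real system would need a proper dictionary or a trained model.

```python
import re


def tokenize(text):
    """Split text into word tokens, dropping whitespace and punctuation."""
    # \w+ matches a run of letters, digits or underscores; the optional
    # group keeps apostrophes inside words such as "don't" together.
    return re.findall(r"\w+(?:'\w+)*", text)


# Hypothetical toy vocabulary, just large enough for the example below.
TOY_VOCABULARY = {
    "recht", "rechts", "schutz",
    "versicherung", "versicherungs",
    "gesellschaft", "gesellschaften",
}


def split_compound(word, vocabulary):
    """Split a compound into known parts, or return None if impossible.

    Tries the longest known prefix first and backtracks when the
    remainder cannot be split.
    """
    word = word.lower()
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in vocabulary:
            rest = split_compound(word[end:], vocabulary)
            if rest is not None:
                return [head] + rest
    return None


print(tokenize("Friends, Romans, Countrymen, lend me your ears;"))
# ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']

print(split_compound("Rechtsschutzversicherungsgesellschaften", TOY_VOCABULARY))
# ['rechts', 'schutz', 'versicherungs', 'gesellschaften']
```

Note that this tokenizer treats a hyphen as a boundary, so "Wi-Fi" from the negative review would come out as two tokens. Whether that is desirable depends on the task.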

Comments: 22
@SMPURNIMAWIJENDRA 5 years ago
This video is so insightful. Thank you so much.
@manwaiyeung9011 2 years ago
I am an NLP practitioner at work. This is great, with plenty of practical examples.
@ChandrajeetMaurya 3 years ago
I was looking for a text classification tutorial. This does not cover any, but it was still very useful for clearing up some basics.
@ting-yuhsu4229 3 years ago
It's very clear! Muchas gracias!
@jungjoonkil1918 4 years ago
The methods of text analysis in Korean and English are different. But I can learn very important basics here. Thank you.
@hamidrezabarari155 4 years ago
Very useful. Thank you :)
@13eau33 3 years ago
Thanks so much, really helpful.
@bhanupriyatham668 2 years ago
Is it possible to do text processing for multiple columns in the dataset?
@PAINFEELER007 1 year ago
Great. From India 🇮🇳🇮🇳🇮🇳
@moravec481 2 years ago
Why not lemmatize first and then stem (i.e. use both)?
@ganj51 3 years ago
Would domain identification (news, sports, etc.) come under text classification? Help me out!
@kouameamosbrou968 3 years ago
Good video.
@quintinsa5798 5 years ago
Hi, how can I use NLP to create an application that marks essays?
@lVaNeSsA90 3 years ago
I was waiting for some code..
@williamstorey5024 4 months ago
What is text regression?
@zmey2k 5 years ago
Dudes, I'm about to say something counterintuitive: are these lectures available in Russian? I know English, but when you live abroad, extra practice really is unnecessary. In your native language things are simply absorbed and remembered faster.
@mikkaruru 5 years ago
Hi! www.coursera.org/learn/language-processing , there is a Russian text transcript, and there you will find the original course and video.
@aoshi479 4 years ago
It is good, but my English is not good. Honestly, I can't understand it.
@MrSchlechtes 4 years ago
Same here 😅
@nileshkhatri8602 5 years ago
The title says Text Classification, but I didn't find a single mention of text classification. Waste of time...
@736939 5 years ago
Before text classification you should preprocess the text by tokenization, removing stop words, and stemming. Look forward.