News classification using Spacy word vectors: NLP Tutorial For Beginners - S2 E9

27,655 views

1 day ago

Comments: 40

@codebasics 2 years ago

Check out our premium machine learning course with 2 industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced

@Malayalam_learner 2 months ago

For the sake of completion, I started NLP with little ML knowledge, and I understood only 50%. Thanks for the tutorials.

@oriabnu1 2 years ago

Sir, there is no course on NLP steganography anywhere on YouTube. If you could work on this, it would be a great contribution, sir.

@terran008 1 year ago

Thank you for your content. It is really useful, and you explain everything so well and so clearly.

@prasanth123cet 2 years ago

In previous tutorials, you explained the vector representation of each word using continuous bag of words (CBOW). Here, the entire sentence or paragraph is converted into a single vector. What is happening in the background? Can you please explain?

@codebasics 2 years ago

It takes the word embeddings of the individual words and, for a sentence, simply averages them to come up with a sentence embedding.
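The reply describes mean pooling: the sentence vector is the element-wise average of its word vectors (spaCy documents `Doc.vector` as defaulting to the average of the token vectors). Below is a minimal NumPy sketch with made-up 3-dimensional vectors; the vocabulary, the dimension, and treating unknown words as zero vectors are all assumptions of this sketch, not spaCy internals.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors; a real spaCy model looks up
# much larger pretrained vectors for each token.
word_vectors = {
    "stocks": np.array([0.9, 0.1, 0.0]),
    "rise": np.array([0.7, 0.3, 0.2]),
    "today": np.array([0.2, 0.5, 0.8]),
}
DIM = 3

def sentence_vector(tokens, vectors, dim=DIM):
    """Mean-pool word vectors into one sentence vector."""
    if not tokens:
        return np.zeros(dim)
    # In this sketch, unknown words contribute a zero vector
    rows = [vectors.get(t, np.zeros(dim)) for t in tokens]
    return np.mean(rows, axis=0)  # element-wise average across words

vec = sentence_vector(["stocks", "rise", "today"], word_vectors)
print(vec)
```

Because the result has the same length as a word vector, every sentence or paragraph, whatever its length, maps to one fixed-size vector that a classifier can consume.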

@lapthanh-r3g 1 year ago

I have a question: why must we convert X_train and X_test into 2D arrays? I need that explained clearly. Thanks.
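scikit-learn estimators expect `X` as a 2-D array of shape `(n_samples, n_features)`. A DataFrame column that holds one vector per row comes out of `.values` as a 1-D array of Python objects (each element is itself an array), so the rows have to be stacked into a single matrix first. A small sketch, where the `text`/`vector` column names and the 4-dimensional stand-in vectors are illustrative assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one stand-in "document vector" per row, built with
# Series.apply the same way a per-text vector column is usually created.
df = pd.DataFrame({"text": ["a", "bb", "ccc"]})
df["vector"] = df["text"].apply(lambda t: np.full(4, float(len(t))))

raw = df.vector.values        # 1-D object array: shape (3,), elements are arrays
print(raw.shape, raw.dtype)

X = np.stack(raw)             # stack the rows into a 2-D float matrix
print(X.shape, X.dtype)       # one row per document, one column per dimension
```

`np.stack` requires every vector to have the same length, which holds when all vectors come from the same embedding model.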

@traveldiaries347 2 years ago

Hi all, how can I access all the datasets used in these tutorials? I used to go to the mentioned link, but I couldn't access the datasets.

@Essentialenglishwords-ii7ek 1 year ago

same

@Raaj_ML 8 months ago

Great tutorial. But can we use Naive Bayes on this? NB computes probabilities based on word counts (BOW), so how can we use a word2vec-style representation with NB?
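On the Naive Bayes question: scikit-learn's `MultinomialNB` is meant for count-like, non-negative features and raises a `ValueError` on negative input, which dense word2vec/spaCy vectors usually contain. `GaussianNB` models each feature as a normal distribution and accepts real-valued vectors directly. A sketch on synthetic data that stands in for document vectors (the class centres, spread, and dimension are hypothetical):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
# Hypothetical dense "embeddings": two classes centred at -1 and +1,
# so many feature values are negative, as with real word vectors.
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 5)),
               rng.normal(1.0, 0.5, size=(50, 5))])
y = np.array([0] * 50 + [1] * 50)

# GaussianNB: per-class mean and variance for each continuous feature
gnb = GaussianNB().fit(X, y)
print("GaussianNB training accuracy:", gnb.score(X, y))

# MultinomialNB rejects negative feature values
mnb_rejected = False
try:
    MultinomialNB().fit(X, y)
except ValueError as err:
    mnb_rejected = True
    print("MultinomialNB:", err)
```

Rescaling to a non-negative range (for example with `MinMaxScaler`) lets `MultinomialNB` run on such vectors, but whether that helps accuracy depends on the data.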

@mykhan1676 1 year ago

Sir, please make a video on Roman Urdu text, with a code implementation that explains lexical normalisation.

@devanshpurwar 2 years ago

Why should we use df.vectors.values instead of df.vector?

@bahaaltanreisoglu108 1 month ago

This is a bit of a late comment, but the training vectors should be in NumPy format.

@ravindarmadishetty736 1 year ago

Hi @Dhaval, can you make videos on Transformers, BERT, and LLMs? Your way of explaining is very impressive and easy to understand.

@codebasics 1 year ago

I have an entire series on LLMs and LangChain. Check the available playlists and look for the LangChain and LLM playlists.

@Ian-yi7ks 1 year ago

Love your videos! I have a question regarding the text corpus. Suppose each document has 2 to 6 sentences, and only one of them contains the information relevant to the classification task. Will the vector get diluted or distorted by the other 1 to 5 sentences?
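Averaging does dilute the signal from a single informative sentence. A toy NumPy illustration, assuming idealised sentence vectors in which the informative sentence and the unrelated ones are mutually orthogonal: the cosine similarity between the averaged document vector and the informative sentence then falls as 1/sqrt(k+1) for k unrelated sentences. Real sentence vectors are not orthogonal, so the actual amount of dilution depends on the content.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 6-dimensional sentence vectors: one informative sentence
# and five unrelated ones pointing in orthogonal directions.
relevant = np.array([1.0, 0, 0, 0, 0, 0])
others = [np.eye(6)[i] for i in range(1, 6)]

sims = []
for k in range(6):
    # Document vector = average of the informative sentence and k unrelated ones
    doc = np.mean([relevant] + others[:k], axis=0)
    sims.append(cosine(doc, relevant))
    print(k, round(sims[-1], 3))
```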

@Mr0o0o0o0o0o0o0o0 1 year ago

I'm trying to do something similar, but instead of labels to match against, I have values that were measured as reactions to my text data. One of my problems is that, since it's measured data, it's obviously normally distributed. I was wondering if there is a neat way to work around the goal of having an equal amount of data per label.
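One option for turning a continuous, bell-shaped target into balanced classes is quantile binning with `pandas.qcut`, which puts (roughly) the same number of samples in each bin. Another option is to skip the labels and fit a regression model on the measured values directly. A sketch of the binning option, using synthetic normal data as a hypothetical stand-in for the measured reactions, with four arbitrarily named bins:

```python
import numpy as np
import pandas as pd

# Hypothetical measured reactions: continuous, normally distributed
rng = np.random.default_rng(0)
reactions = pd.Series(rng.normal(loc=0.0, scale=1.0, size=100))

# Quantile binning: bin edges are the 25th/50th/75th percentiles, so each
# bin receives the same share of samples despite the bell-shaped values
labels = pd.qcut(reactions, q=4, labels=["low", "mid-low", "mid-high", "high"])
print(labels.value_counts())
```

The trade-off is that the middle bins are narrow in value terms while the outer bins are wide, so neighbouring classes may be hard to tell apart.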

@anoshpa-6531 11 months ago

This dataset is no longer available on Kaggle. Can you please upload a copy of it? Thanks!

@mdrehan4all 1 year ago

The dataset has changed on Kaggle. Can you please provide a stored copy of the dataset?

@chandukalisetti 2 years ago

If you don't mind, may I know how many days it will take you to complete this entire NLP tutorial series, sir? I really want to follow your NLP course to build a strong resume. I hope you will reply. Thanks in advance.

@codebasics 2 years ago

To be honest, I do not know. NLP is a vast, multidisciplinary field, and it may take another 6 months to finish the playlist (and even then I won't be able to truly finish it). The major topics that come to mind are information extraction, topic modeling, some fastText and Hugging Face tutorials, and a few other NLP applications, along with end-to-end projects.

@chandukalisetti 2 years ago

@codebasics Thank you so much for your response and for what you are doing for us ❤️😊

@mandeep8696 1 year ago

@codebasics I have a query: here we converted the text to vectors and then split the data into train and test sets. This would cause data leakage and overfitting. Shouldn't we split first and then vectorize?

@SHANMUKHASHARMA-s6h 1 year ago

Vectorization converts text to numbers so that the machine can understand it. Overfitting happens when a model works excellently on the training set but fails on the test data. But our model gives good recall, precision, and F1 score, which means the model is working great.
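On the split-then-vectorize question: assuming the spaCy model's static pretrained vectors are used, as in this video, each text's vector is computed on its own and nothing is learned from the other rows, so vectorizing before the split does not leak test information. The concern applies to vectorizers that are fitted on the data, such as TF-IDF or bag-of-words. A minimal scikit-learn sketch with hypothetical toy texts that splits first and keeps the fitted vectorizer inside a `Pipeline`, so it only ever sees the training split:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical, already-split toy data: 1 = real, 0 = fake
train_texts = ["stocks rise after report", "government passes budget",
               "aliens built the pyramids", "miracle cure hidden by doctors"]
train_labels = [1, 1, 0, 0]
test_texts = ["budget report released", "zebra miracle"]

# The vectorizer's vocabulary and IDF weights are fitted on train_texts only
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

vocab = model.named_steps["tfidfvectorizer"].vocabulary_
print("zebra" in vocab)           # words seen only in the test split are ignored
print(model.predict(test_texts))  # test texts are only transformed, never fitted
```

Wrapping the steps in a pipeline also keeps cross-validation honest, because the vectorizer is refitted inside each training fold.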

@aravinthmegnath3569 1 year ago

Hi Sir, how do we do negation detection? @codebasics For example, I have a dataset where the only label is real news, but some of the texts are actually fake news. How do we separate the fake news from the text?

@tharinduwickramaarachchi1791 2 years ago

Sir, how can we classify non-English text, such as Sinhala?

@tanzeelmohammed9157 1 year ago

Sir, should I do the train/test split before preprocessing, applying preprocessing to the whole dataset, or after, applying it only to the training dataset?

@venkatarohitpotnuru38 1 year ago

You need to preprocess it first.

@oriabnu1 2 years ago

Towards Near-imperceptible Steganographic Text (Falcon Z. Dai, Zheng Cai)

@aamiradeeb2488 2 years ago

When will the SQL course be out, sir? I am eagerly waiting.

@codebasics 2 years ago

In the October to November time frame. Interestingly, I read your comment while I was preparing a tutorial for the SQL course itself 😊

@asktostranger8296 2 years ago

@codebasics I want all the resources you use to prepare your material on machine learning, deep learning, the maths behind them, and data structures and algorithms in Python too. I'd be glad if you replied 🙏🙏🙏🙏

@marcellodichiera 2 years ago

@asktostranger8296 So greedy, haha 😅😅😂

@ssssss9311 2 years ago

Do machine learning engineers use Power BI?

@codebasics 2 years ago

@AbhishekSharma-gv2si 1 year ago

Can someone please provide this dataset? I've been searching for it for a while.

@riyan8p 1 year ago

Search for it and you will find two datasets, true and fake. Import them, add label columns, set a minimum number of samples to reduce the size, and then combine both data frames. That way you will have a perfect dataset for training.
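The steps in this reply can be sketched in pandas. The two small DataFrames are hypothetical stand-ins for the downloaded true and fake news files (which would normally be loaded with `pd.read_csv`), and "minimum samples" is interpreted here as downsampling both classes to the size of the smaller one:

```python
import pandas as pd

# Hypothetical stand-ins for the two downloaded files
true_df = pd.DataFrame({"text": [f"real story {i}" for i in range(5)]})
fake_df = pd.DataFrame({"text": [f"fake story {i}" for i in range(8)]})

# Add a label column to each frame (1 = real, 0 = fake)
true_df["label"] = 1
fake_df["label"] = 0

# Downsample both frames to the smaller size so the classes are balanced
n = min(len(true_df), len(fake_df))
combined = pd.concat(
    [true_df.sample(n=n, random_state=42), fake_df.sample(n=n, random_state=42)],
    ignore_index=True,
)

# Shuffle so real and fake rows are interleaved before splitting
combined = combined.sample(frac=1, random_state=42).reset_index(drop=True)
print(combined["label"].value_counts())
```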

@plslokeshreddy 1 year ago

It is in the same GitHub repo as the exercise.

@oriabnu1 2 years ago

Ph.D. students are stuck in this field; please help us.