Find Misspellings and Alternate Naming in Large Text Datasets (Tutorial)

678 views

Gary Eckstein


With just a few lines of script, you can create your own personalized way to detect alternate spellings, misspellings, and brand names in any text.
A significant issue in textual data analysis is that alternate spellings or misspellings are often used in place of the intended word. This is particularly prevalent in social media text from sites such as Facebook, Reddit, and Twitter. In this tutorial, I show how to use the fastText Python library to quickly find alternate spellings in textual data.
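fastText can catch misspellings because it represents each word partly through its character n-grams, so 'parucetamul' and 'paracetamol' share many subword pieces. With the fastText library itself, the usual route is to train vectors on your corpus with `fasttext.train_unsupervised` and query `model.get_nearest_neighbors(term)`. The sketch below is not that; it is a stdlib-only stand-in that scores words by raw character n-gram overlap (cosine similarity), just to show the subword idea. The function names and the 0.25 threshold are hypothetical choices. Unlike trained fastText vectors, this will not link a brand name such as 'Tylenol' to 'paracetamol', because that relationship comes from shared context, not shared spelling.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word: str, minn: int = 3, maxn: int = 5) -> Counter:
    """Count character n-grams of a word wrapped in '<' and '>' boundary
    markers, similar in spirit to the subword units fastText uses."""
    padded = f"<{word.lower()}>"
    grams = Counter()
    for n in range(minn, maxn + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def spelling_candidates(term: str, vocabulary, threshold: float = 0.25):
    """Return vocabulary words whose n-gram profile is close to `term`,
    most similar first. The default threshold is a hypothetical starting point."""
    target = char_ngrams(term)
    scored = []
    for word in set(vocabulary):
        if word.lower() == term.lower():
            continue  # skip the primary term itself
        score = cosine(target, char_ngrams(word))
        if score >= threshold:
            scored.append((score, word))
    return [word for _, word in sorted(scored, reverse=True)]


# Made-up vocabulary for illustration.
vocab = ["paracetamol", "parucetamul", "paracetomol", "tylenol", "ibuprofen", "headache"]
print(spelling_candidates("paracetamol", vocab))
```

On this toy vocabulary, 'paracetomol' and 'parucetamul' come back as candidates while 'tylenol' does not, which illustrates both what spelling-based similarity finds and what it misses. Candidates still need a manual review before you treat them as true variants.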
** PLEASE SUBSCRIBE / @garyeckstein **
Using 1.7 million Reddit records, I show how to identify where a similar, but not identical, term is used so that you can clean your data and get more accurate results from your analysis. For example, you may want to find all text that mentions 'paracetamol', yet some posts may use 'Tylenol' or 'parucetamul'. The method I show helps identify the words in your data that relate to your primary term.
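Once you have a reviewed list of variants for a term (whether from fastText neighbours or another method), the cleaning step can be as simple as mapping each variant back to the primary term before counting or filtering. The sketch below is one possible way to do that with Python's `re` module; `normalize_variants` and `mentions` are hypothetical helpers, and the example posts are invented, not taken from the Reddit data.

```python
import re


def normalize_variants(text: str, primary: str, variants) -> str:
    """Replace whole-word, case-insensitive occurrences of any variant
    with the primary term."""
    # Longest variants first so longer forms win over shorter overlapping ones.
    ordered = sorted(set(variants), key=len, reverse=True)
    if not ordered:
        return text  # an empty alternation would match everywhere
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in ordered) + r")\b",
        flags=re.IGNORECASE,
    )
    return pattern.sub(primary, text)


def mentions(posts, primary: str, variants):
    """Return the posts that mention the primary term after normalization."""
    target = re.compile(rf"\b{re.escape(primary)}\b", flags=re.IGNORECASE)
    return [p for p in posts if target.search(normalize_variants(p, primary, variants))]


# Invented example posts.
posts = [
    "Took two Tylenol for my headache",
    "parucetamul didn't help",
    "Ibuprofen works better",
    "Paracetamol is fine",
]
variants = ["Tylenol", "parucetamul", "paracetomol"]
print(mentions(posts, "paracetamol", variants))
```

Matching on word boundaries avoids rewriting substrings inside unrelated words, and keeping the variant list explicit means every substitution is one you have checked.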
If you're unsure how to use Python, see • Start using Python qui...
#Python #Tutorial #datascience #nlp
