With just a few lines of script, you can create your own personalized way to detect alternate spellings, misspellings, and brand names in any text.
A significant issue in textual data analysis is the use of alternate spellings or misspellings in place of a word. This is particularly prevalent in social media text from platforms such as Facebook, Reddit, and Twitter. In this tutorial, I show how to use the fastText Python library to quickly build a way of finding alternate spellings in textual data.
** PLEASE SUBSCRIBE / @garyeckstein **
Using 1.7 million Reddit records, I show how to identify where a similar, but not identical, term is used, so that you can clean your data and get more accurate results from your analysis. For example, you may want to find all text that mentions 'paracetamol', yet some posts may use 'Tylenol' or the misspelling 'parucetamul'. The method I show helps to identify all the words in your data that relate to your primary term.
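As a rough idea of what this can look like in code, here is a minimal sketch. It is not necessarily the exact script from the video. It assumes the posts are saved in a plain-text file with one post per line, and the helper names (train_model, find_variants), the n-gram range, and the 0.6 similarity cut-off are illustrative choices of mine. The two fastText calls it relies on are train_unsupervised and get_nearest_neighbors.

```python
from typing import List, Tuple


def train_model(corpus_path: str):
    """Train skip-gram word vectors on a text file (assumed: one post per line)."""
    import fasttext  # third-party package: pip install fasttext

    # minn/maxn set the character n-gram range. These subword n-grams are
    # what tend to place a misspelling close to the correctly spelled word.
    # The values 3 and 6 are an assumption, not a recommendation from the video.
    return fasttext.train_unsupervised(
        input=corpus_path, model="skipgram", minn=3, maxn=6
    )


def find_variants(
    model, term: str, k: int = 20, min_score: float = 0.6
) -> List[Tuple[str, float]]:
    """Return neighbours of `term` whose similarity score is at least min_score."""
    # get_nearest_neighbors returns a list of (score, word) pairs.
    neighbours = model.get_nearest_neighbors(term, k=k)
    return [
        (word, score)
        for score, word in neighbours
        if score >= min_score and word != term
    ]


# Example usage (the file name is hypothetical):
# model = train_model("reddit_posts.txt")
# for word, score in find_variants(model, "paracetamol"):
#     print(word, round(score, 3))
```

The returned words are candidates only. Review the list by hand before using it to clean your data, since close neighbours can also include related but different terms.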
If you're unsure how to use Python, see • Start using Python qui...
#Python #Tutorial #datascience #nlp