Clean Excel Data With Python Pandas - Removing Unwanted Characters

  Рет қаралды 110,093

Derrick Sherrill

Derrick Sherrill

4 жыл бұрын

Hey Everyone, in this one we're looking at the replace method in pandas to remove characters from your spreadsheet columns.
Be sure to post what you want to see next!
Kite helps fund the channel, thanks for checking them out and supporting me --
⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. www.kite.com/get-kite/?...
Support the Channel on Patreon --
/ derricksherrill
Join The Socials --
Reddit - / codewithderrick
FB - / codewithderrick
Insta - / codewithderrick
Twitter - / codewithderrick
LinkedIn - / derricksherrill
GitHub - github.com/Derrick-Sherrill
*****************************************************************
Full code from the video:
import pandas as pd
excel_file_path = 'office_info.xlsx'
df = pd.read_excel(excel_file_path)
print(df.head(2))
for column in df.columns:
df[column] = df[column].str.replace(r'\W',"")
df.to_excel("removed_characters.xlsx")
github.com/Derrick-Sherrill/D...
Packages (& Versions) used in this video:
Python 3.8
Pandas 0.25
Atom 1.41
*****************************************************************
Code from this tutorial and all my others can be found on my GitHub:
github.com/Derrick-Sherrill/D...
Check out my website:
www.derricksherrill.com/
If you liked the video - please hit the like button. It means more than you know. Thanks for watching and thank you for all your support!!
--- Channel FAQ --
What text editor do you use?
Atom - atom.io/
What Equipment do you use to film videos?
Blue Yeti Microphone - amzn.to/2PcNj5d
Mic sound shield - amzn.to/3bVNkEt
Soundfoam - amzn.to/37NV9ci
Camera desk stand - amzn.to/3bX8xhm
Box Lights - amzn.to/2PanL95
Side Lights - amzn.to/37KSNut
Green Screen - amzn.to/37SFFnc
What computer do you use/desk setup?
Film on imac (4k screen) - amzn.to/37SEu7g
Work on Macbook Pro - amzn.to/2HJ5b3G
Video Storage - amzn.to/2Pey8sw
Mouse - amzn.to/2PhCtv3
Desk - amzn.to/37O1Mv1
Chair - amzn.to/2uqHE4E
What editing software do you use?
Adobe CC - www.adobe.com/creativecloud.html
Premiere Pro for video editing
Photoshop for images
After Effects for animations
Do I have any courses available?
Yes & always working on more!
www.udemy.com/user/derrick-sh...
Where do I get my music?
I get all my music from the copyright free KZbin audio library
kzbin.info...
Let me know if there's anything else you want answered!
-------------------------
Always looking for suggestions on what video to make next -- leave me a comment with your project! Happy Coding!

Пікірлер: 199
@CodeWithDerrick
@CodeWithDerrick 4 жыл бұрын
Thanks for being here! What do you want to see next?
@shahzan525
@shahzan525 4 жыл бұрын
Make on mergesort..
@KostasPanagias
@KostasPanagias 4 жыл бұрын
Hi Derrick! I would like to see a way (if there is any) to extract part of a large excelsheet which is highlighted with specific color (font or background color). For example in an excel with 1000 rows, to extract only those rows that have yellow color (or red font, or even combination) in a new excel file (with the first row which could be the data labels).
@cu806
@cu806 4 жыл бұрын
I keep receiving the following error. I am trying to remove the pipes from my excel spread sheet. Traceback (most recent call last): File "C:/Users/User/Documents/remove_characters.py", line 18, in df[column] = df[column].str.replace('\|\|','') File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\generic.py", line 5175, in __getattr__ return object.__getattribute__(self, name) File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\accessor.py", line 175, in __get__ accessor_obj = self._accessor(obj) File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\strings.py", line 1917, in __init__ self._inferred_dtype = self._validate(data) File "C:\Program Files (x86)\Python37-32\lib\site-packages\pandas\core\strings.py", line 1967, in _validate raise AttributeError("Can only use .str accessor with string " "values!") AttributeError: Can only use .str accessor with string values! Here is my code and I formatted all my data as text as well. Added Noob comments in code lol import pandas as pd import os # file should be formatted as text using the format cells>text option # since were not working in the same file directory you need to specifiy the file path # Make sure you are using double backslashes to seperate and single quotes to enclose the path inside parenthesis excel_file_path = ('C:\\Users\\User\\Documents\\Teset2.xlsx') #creates data frame and reads from excel df = pd.read_excel(excel_file_path) # only returns the first 2 lines print(df.head(2)) for column in df.columns: df[column] = df[column].str.replace('\|\|','') print(df) df.to_excel('C:\\Users\\User\\Documents\\Clean_Data.xlsx')
@customnotion
@customnotion 4 жыл бұрын
But for the r'' is for raw string?, Can we use it for regex also?
@grizthegriffdog9939
@grizthegriffdog9939 4 жыл бұрын
Will this method work for removing a trailing comma from csv file?? For some reason, my AlphaVantage stockpuller/graph works fine for AV Daily data, however when I use intraday, I am getting a comma at the end of my lines ... same exact method. Not really sure what is going on.
@aatsw
@aatsw 3 жыл бұрын
I've binge-watched quite a lot of your videos in the last two days. Amazing contents, and you always managed to keep it concise yet very clearly explained. Thank you. Please keep updating.
@aduck24
@aduck24 4 жыл бұрын
This was what I needed and you made it so clear to understand. Thank you 😊
@subzeroLV
@subzeroLV 4 жыл бұрын
Thank you Derrick! That was exactly what I was looking for! I have no doubt I can apply this to my situation. I really appreciate you taking the time to make this video!
@CodeWithDerrick
@CodeWithDerrick 4 жыл бұрын
I'm glad it works for you. Always happy to help!
@vanakornsirijongprasert1726
@vanakornsirijongprasert1726 2 ай бұрын
I like how you went back explaining each methods and what they would do.
@sahilkhan2470
@sahilkhan2470 Жыл бұрын
clear, easy and straight to the point. A real man
@samr598
@samr598 3 жыл бұрын
This was what I needed and you made it so clear to understand. Thank you
@jesusbvasquezq6597
@jesusbvasquezq6597 2 жыл бұрын
Excellent video, straight to the point and clearly explained. It helped me a lot. Thanks a million for this.
@shawnpetersen5338
@shawnpetersen5338 4 жыл бұрын
Great video, keep up the good work. I just started my Data Analytics journey and using "R" but self-teaching myself on Python.
@petterstensland3888
@petterstensland3888 3 жыл бұрын
Derrick, super thank you for these videos. Invaluable!
@sudiptomitra
@sudiptomitra 3 жыл бұрын
Very simple, yet so effective & precise !
@JohnsonKongor
@JohnsonKongor 4 жыл бұрын
Thanks, Derrick. I love this cause I work with many spreadsheets.
@torque6389
@torque6389 4 жыл бұрын
Another great video! You are very good at explaining things.
@amritamishra2313
@amritamishra2313 3 жыл бұрын
I Was stuck here since 2 hours..your code did help..!!Thanks...
@brendafosmire6519
@brendafosmire6519 4 жыл бұрын
Thanks. Very useful for me since I’m learning Pandas.
@aqiltank
@aqiltank 3 жыл бұрын
Love from Malaysia, I like your videos. Keep it up 😇
@hasibulhasan2798
@hasibulhasan2798 4 жыл бұрын
Thanks man! This was very helpful also to remove unnecessary character from csv data.
@nkunam
@nkunam 4 жыл бұрын
You are the best Derrick. Thank you.
@monihareddy5491
@monihareddy5491 3 жыл бұрын
You are a life savior for me. Thank you so much for this.
@mohamed4743
@mohamed4743 2 жыл бұрын
Thank you Derrrick great video. I'm new to python and was a bit sceptical to use with excel. I'm quite proficient in excel, but this seems faster than text to columns functionality of excel. I'll give a bash and see.
@luismoreyra6804
@luismoreyra6804 2 жыл бұрын
wow! i'm really amazed with this vid! you saved me a lot of headaches dude!! thanks a lot!!!
@MildlyAmusingComedyC
@MildlyAmusingComedyC 3 жыл бұрын
Awesome Simple video. Exactly What I was looking for
@johnnynicolas4622
@johnnynicolas4622 2 жыл бұрын
thanks I spent about two days to solve that problem finally you help me !
@kadhraedgerly4267
@kadhraedgerly4267 3 жыл бұрын
Great video! I'm currently trying to remove the units (and characters) in 3 columns of a dataset so I can gain greater insights and analytics. Basically, I want to just leave the numbers and convert it to a float from an object. I'm working in Python for this project. Would love a video on this!
@jaredoirere4746
@jaredoirere4746 4 ай бұрын
Brilliant and precisely well explained
@darshitsolanki7352
@darshitsolanki7352 3 жыл бұрын
Amazing video 🎥 keep hustling 🔥🔥🔥🔥
@conscioussleeppill1093
@conscioussleeppill1093 3 жыл бұрын
Great 👍.. I got this useful for my Current code
@hassanmahamat-pz8fx
@hassanmahamat-pz8fx 2 ай бұрын
Very clear explanation
@sirfsimran482
@sirfsimran482 3 жыл бұрын
Thank You Derrick Bro :) Keep up the good Work ;)
@Snooch5991
@Snooch5991 3 жыл бұрын
you just saved my homework, thank you good sir
@animation__001
@animation__001 3 жыл бұрын
Thanks you so much this is what i need right now🔥
@waynefmj
@waynefmj 3 жыл бұрын
Very nicely done
@rodkrebs9140
@rodkrebs9140 3 жыл бұрын
Excelente video, very clear, thanks!!!
@darshitsolanki7352
@darshitsolanki7352 3 жыл бұрын
I like really easy and short videos to explain things and ur contents are really like mine category 😂
@sachinmaroky4600
@sachinmaroky4600 Жыл бұрын
thanks for the explanation
@priyanshurohilla737
@priyanshurohilla737 2 жыл бұрын
amazing sirr.
@anelisabolosha9934
@anelisabolosha9934 Жыл бұрын
Thank you so much this is really great💟💯
@muhanadkais
@muhanadkais 3 жыл бұрын
Thank you! So helpful!
@yashshah6919
@yashshah6919 3 жыл бұрын
Thank You Derrick
@ExcelVisual
@ExcelVisual 4 жыл бұрын
Great example, thanks! LIKE
@nooknaruk9
@nooknaruk9 4 жыл бұрын
You got my subscribe , thanks so much
@JohnsonKongor
@JohnsonKongor 4 жыл бұрын
I love this much. Thanks.
@chintu70017
@chintu70017 3 жыл бұрын
Derrick thank you!!!
@kelikisbiyantoro2518
@kelikisbiyantoro2518 3 жыл бұрын
Thanks Darrick very helpfull
@muhammadnouman307
@muhammadnouman307 Жыл бұрын
THANKS BROTHER
@adeboyeopatimehin578
@adeboyeopatimehin578 2 жыл бұрын
Good job, pls trying to do this on a data set that is on multiple excel spreadsheets. Will appreciate any useful input.
@chrisber
@chrisber 4 жыл бұрын
Thanks legend, subbed
@shankerm3959
@shankerm3959 4 жыл бұрын
Excellent job How about an advanced regex tutorial? That would be awesome to have.
@gerardogomez3601
@gerardogomez3601 Жыл бұрын
thank you!
@terboonway4346
@terboonway4346 4 жыл бұрын
Thanks for this great video! May I know what text editor do you use? Im using Jupyternotebook, and it seems like your text editor is better, as it has auto fill up function
@sivaarwin8816
@sivaarwin8816 Жыл бұрын
I love you bro thanks for this video
@hemantsah8567
@hemantsah8567 4 жыл бұрын
Great Work Dude... Can you make a video on extracting and cleaning data from excel file? e.g. those excel sheets contain some row as headers or uneven number of rows.
@hogobi
@hogobi 4 жыл бұрын
thank . powerful video
@cordularaecke
@cordularaecke 4 жыл бұрын
Great video, you explain things very clearly. I take your point about different approaches to apply cleaning function. You could perhaps elaborate about more advanced pandas like using lambda functions? For example: cleanup = lambda x: x.astype(str).str.replace(r'\W', '') df = df.apply(cleanup) # restore salary to integer value df.Salary = df.Salary.astype(int) # export to excel without index (not normally needed) df.to_excel('removed_characters.xlsx', index=False) What about grouping and aggregation? Or timeseries analysis ... I'm terrible at all of them ! :) Thanks again for great series.
@CodeWithDerrick
@CodeWithDerrick 4 жыл бұрын
Nice catch on returning the salary to int type, totally slipped my mind. I tend to stay away from Lambda's because my dislikes on the video tend to skyrocket haha. Very useful though, just hard to implement and not lose beginners. A full video on them in the future is a good idea though! I've done a little bit with groupby from time to time but it would be good to do a deep dive on more techniques. Timeseries are a must in the future too! Thanks for your kind words.
@phil.pinsky
@phil.pinsky 4 жыл бұрын
Fantastic video, thanks for the great content! Very tangential question, but how are you doing your face at the bottom right corner during the screen recordings?
@CodeWithDerrick
@CodeWithDerrick 4 жыл бұрын
Thanks for the kind words! I'm using a green screen behind me. So I key the screen out and resize the headshot during editing with premiere pro.
@MrStudent1978
@MrStudent1978 4 жыл бұрын
Can you please show how the word spellings can be corrected after removing special characters in an excel file...your video was really very helpful! Thanks for sharing! Respects to you from Punjab India
@digitaldiary9410
@digitaldiary9410 3 жыл бұрын
Great dude!
@greenchiptechnology5
@greenchiptechnology5 3 жыл бұрын
Hi Derrick, great video Create one video for set timer and auto trigger py script to run excel work and save in excel as xlsb format.
@davidallyn1818
@davidallyn1818 4 жыл бұрын
Ah! of course... regex!! Thank you sir!
@parshuramsinghthakur6405
@parshuramsinghthakur6405 4 жыл бұрын
Thank you Derrick for sharing this. It helped me a lot. Can you please help me on how to do this - Remove special characters from the column headings and replace all spaces with single underscore
@enricomendiola9952
@enricomendiola9952 4 ай бұрын
Hello Derrick great video. I just want to ask what IDE or editor are you using in this video?
@comprehensivemathsscience1217
@comprehensivemathsscience1217 3 жыл бұрын
Looks useful
@raphaelokoye4310
@raphaelokoye4310 4 жыл бұрын
Nice video. Very explanatory. The report I generate on a weekly basis is so messy. The software churns out four reports on a single sheet. When loaded in pandas most columns are lost in the dataframe and their corresponding column names in pandas becomes unnamed.. still figuring out the best way to handle that report. I could clean it in Excel but want to automate it with python
@matattz
@matattz Жыл бұрын
I need a recommendation for the BEST Regex course online, or books!It should be easy to digest but also all you really need. Thanks guys!
@nicolasjousson5162
@nicolasjousson5162 3 жыл бұрын
Great content, thank you Derrick. Do you know if this would also remove Enter's from a cell?
@namitachadha3882
@namitachadha3882 4 жыл бұрын
Great!! Thankyou. But when i execute the for loop for removing the special character ,cell consisting of ''no special character'' or noise word is giving empty cell . While exporting, I am getting only the clean data..I dont know why is this happening.. Please help!!
@bindassbaba3479
@bindassbaba3479 3 жыл бұрын
Love from India
@testbetadasa4367
@testbetadasa4367 4 жыл бұрын
Thank you Derrick! can we extract Excel charts(bar,stacked charts) to power point presentations(PPT) ,if possible can you make a video on it.
@ShadabKhan-sn9xc
@ShadabKhan-sn9xc 3 жыл бұрын
Hi you are great May i please request you to make video on excel formulas using python and getting it upto last row?
@sharathkanaparthi8430
@sharathkanaparthi8430 3 жыл бұрын
Amazing !!
@howstuffworks-channel
@howstuffworks-channel Жыл бұрын
This is amazing! Thank you! I wonder if there is a way to remove non-english characters as well like those in Chinese etc. from a list.
@heyrobined
@heyrobined 2 жыл бұрын
thanks
@loveyoutube22
@loveyoutube22 4 жыл бұрын
Thanks!!👌🙏
@yibrahim7
@yibrahim7 2 жыл бұрын
Hey man, thanks a lot. great work, simple and to the point just like we want it. but I have a question, in last step, when I removed the characters from all the columns, I had one cell that was removed completely(originally it contained only numbers and was converted to text as per your advise) so could you help on the reason and solution please?
@cindywang2658
@cindywang2658 2 жыл бұрын
This is very helpful. Thank you! Do you have example to remove all characters that are not in ascii[0-127]
@nhibnguyen5547
@nhibnguyen5547 4 жыл бұрын
Thank you so much for sharing Derrick! This video is really helpful. Also, how can we remove the unwanted characters but still keep the space?
@CodeWithDerrick
@CodeWithDerrick 4 жыл бұрын
Thanks for your kind words! You would just change the regular expression, so the W in this script. Haven't tested it (So you might need to tweak it) but I think it would look something like this: ^a-zA-Z0-9_ Where the underscore would represent the white space. Happy coding!
@muhanadkais
@muhanadkais 3 жыл бұрын
@@CodeWithDerrick What about the opposite case where to remove only the spaces and keep the characters?
@ZaidSoftware
@ZaidSoftware 3 жыл бұрын
Hi @Derrick Sherrill how can split words in one column to many columns in Excel by python thank for everything
@deepakdakhore
@deepakdakhore 4 жыл бұрын
Perfect
@MY83XJidMad
@MY83XJidMad 4 жыл бұрын
thank you
@Richkotite1
@Richkotite1 2 жыл бұрын
Great video, I wanted to output to excel at the end of the routine but its not creating the file. It worked in your "if replacement video" but not here, any ideas?
@leeblack2103
@leeblack2103 2 жыл бұрын
Whether if you wanted to reduce the length of each field. For example, if the field exceeds 35,000 characters truncated.
@bennri
@bennri 3 жыл бұрын
A bigger problem are things like donâ€TMt which should be changed to don't. I found python's ftfy module to be useful for this problem which arises because Windows Excel uses windows-1252 encoding by default, but pandas interprets it as utf-8. It affects not only apostrophes but also accents like café, “quotes,” etc.
@CarlosMoreno-vw4ir
@CarlosMoreno-vw4ir 4 жыл бұрын
This is great, it would be great if we can delete entire blank rows within the data Also how can we get all the prices of certain flights from a flight website? Thanks a lot! You are the best
@cordularaecke
@cordularaecke 4 жыл бұрын
Try this : df.dropna(axis=0, how='all', inplace=True)
@andreswanepoel4826
@andreswanepoel4826 4 жыл бұрын
HI please do a demo on searching for specific characters in a cell (Search and contain)
@leonchen4164
@leonchen4164 3 жыл бұрын
Buster Keaton!
@DebayanKar7
@DebayanKar7 3 жыл бұрын
What if i have a dataset full of mis-typed words and a well defined dictionary of correct, standard words. I there an efficient way to replace the mistyped words in pandas?
@swatisingh4041
@swatisingh4041 2 жыл бұрын
Please make a video on removing elements and their corresponding numeric values from an alloy. For example, Al20Co5Fe6 Cr2, How to split these numeric values in individual columns?
@rushas
@rushas 4 жыл бұрын
Thanks for the good content. The other day I needed to replace a non-empty column with non-empty column in the same sheet. Assume that we have these columns: col-A: Names col-B: Old phone numbers col-C: Email addresses col-D: Some other infos col-E: New phone numbers I simply want to replace col-B with the col-E. You know, in the Excel you can just select col-E and cut the column then select col-B and insert the new column and then delete the old data (col-B). I was wondering if there is an easy way to solve this kind of problems with pandas?
@ilmiahgunggus7078
@ilmiahgunggus7078 2 жыл бұрын
So when we use str.replace(special_characters), a column which contains integer will be ignored ?
@snakesnarroz
@snakesnarroz 4 жыл бұрын
Thanks for this. Oddly my excel file has all columns formatted as text. However, running the script throws an error AttributeError: Can only use .str accessor with string values! This is due to one of the columns is not a string, its a float64 or something. Any advice on how to correct this?
@shivanireddy7734
@shivanireddy7734 2 жыл бұрын
Hi, can you please do a video on how to remove text from an image? Maybe by using Opencv with python
@rish_1823
@rish_1823 Жыл бұрын
Hi Derrick how could we remove special characters from starting of a string and ending of a string in a dataframe
@douglaslopez2311
@douglaslopez2311 2 жыл бұрын
Thanks for your video. I tried to delete any special character of my pandas DF as you explained and I still have the error "None of Index are in the [columns]". Do you know what is the cause? Thank you.
@samuelmensahbaffoe8303
@samuelmensahbaffoe8303 3 жыл бұрын
hi, how do you remove Numbers attached to letters as a string in a column
@radoonhadoon
@radoonhadoon 4 жыл бұрын
hey, what should i do if i need to remove all the special characters and also the numbers
@habibtmg
@habibtmg 3 жыл бұрын
hay, kindly make a video on paths where we are out of the same directory and can perform different calcutions. thnanks.
@dennisasamoah2213
@dennisasamoah2213 4 жыл бұрын
amazing
@geriray4412
@geriray4412 2 жыл бұрын
Hi, how do I apply this with multiple sheets from the same workbook?
@jyotiali5670
@jyotiali5670 Жыл бұрын
I want to clean a column and replacing all the special characters with correct letters , how do I do
@customnotion
@customnotion 4 жыл бұрын
But for the r'' is for raw string?, Can we use it for regex also?
Quick Data Visualization of Excel Data Demo | Python Pandas Tutorial
8:05
Data Cleaning in Pandas | Python Pandas Tutorials
38:37
Alex The Analyst
Рет қаралды 266 М.
Scary Teacher 3D Nick Troll Squid Game in Brush Teeth White or Black Challenge #shorts
00:47
Looks realistic #tiktok
00:22
Анастасия Тарасова
Рет қаралды 98 МЛН
Мы никогда не были так напуганы!
00:15
Аришнев
Рет қаралды 6 МЛН
Introducing Python in Excel
19:01
Leila Gharani
Рет қаралды 1,5 МЛН
Real World Data Cleaning in Python Pandas (Step By Step)
40:01
Ryan Nolan Data
Рет қаралды 59 М.
Are You Still Using Excel? AUTOMATE it with PYTHON
7:19
John Watson Rooney
Рет қаралды 48 М.
Read Messy & Poorly Structured Excel Files Using Pandas (Python)
9:26
Replace Excel If Function with Python Pandas
8:05
Derrick Sherrill
Рет қаралды 129 М.
Replace Excel Vlookup with Python - Five Minute Python Scripts
6:33
Derrick Sherrill
Рет қаралды 227 М.
Python Excel - Reading Excel files with Pandas read_excel
19:33
Very Academy
Рет қаралды 71 М.
OZON РАЗБИЛИ 3 КОМПЬЮТЕРА
0:57
Кинг Комп Shorts
Рет қаралды 1,7 МЛН
iPhone socket cleaning #Fixit
0:30
Tamar DB (mt)
Рет қаралды 7 МЛН
АЙФОН 20 С ФУНКЦИЕЙ ВИДЕНИЯ ОГНЯ
0:59
КиноХост
Рет қаралды 217 М.