Three ways to optimize your Pandas data frame's memory footprint

2,008 views

Python and Pandas with Reuven Lerner

1 year ago

If you work with large data sets in Pandas, you'll quickly discover that they take up lots of memory. How can you stop wasting memory, and reduce the size of your data frame? In this video, I show you three techniques you can apply to nearly any project you work on.
The Jupyter notebook for this video (and all of my videos) is at github.com/reuven/youTube-not....
And don't forget my free, weekly "Better developers" newsletter, with new Python articles every week, at BetterDevelopersWeekly.com/.
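The description doesn't list the three techniques, so the sketch below is not necessarily what the video covers. It illustrates three widely used ways to shrink a data frame: downcasting integers, switching floats to `float32`, and storing repetitive strings as categories. The column names and data are hypothetical stand-ins, and `memory_usage(deep=True)` is used so that string storage is actually counted.

```python
import numpy as np
import pandas as pd

# Hypothetical data: these columns are illustrative, not from the video
n = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "passenger_count": rng.integers(1, 7, size=n),                  # int64 by default
    "fare": rng.uniform(2.5, 100.0, size=n),                        # float64
    "payment_type": rng.choice(["card", "cash", "other"], size=n),  # repeated strings
})

# deep=True measures the real size of string data, not just the pointers
before = df.memory_usage(deep=True).sum()

# 1. Downcast integers to the smallest type that holds every value (here int8)
df["passenger_count"] = pd.to_numeric(df["passenger_count"], downcast="integer")

# 2. Use float32 when the reduced precision is acceptable
df["fare"] = df["fare"].astype("float32")

# 3. Store low-cardinality strings as categories (small integer codes + lookup table)
df["payment_type"] = df["payment_type"].astype("category")

after = df.memory_usage(deep=True).sum()
print(df.dtypes)
print(f"{before:,} bytes -> {after:,} bytes")
```

Each of these trades something for space: `float32` loses precision, and categories only pay off when a column has few distinct values relative to its length.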

Comments: 14
@vhphan19 · 1 year ago
Before running df.loc[df['total_amount'] >= 0], the data frame's index is a RangeIndex, which is memory efficient. After the .loc operation, the index becomes an array with 6+ million entries. Running reset_index(drop=True) should bring your memory usage back down to 180 MB.
@ReuvenLerner · 1 year ago
Of course! I was concentrating on the data, but the index obviously takes up memory, as well, and a RangeIndex is far more compact than one with integers (or other objects). I even tell that to my students, but sometimes when recording live, you forget these obvious things. Thanks so much for the comment and correction! (And for a good exercise question to ask in my classes...)
@aflous · 1 year ago
That's what I was going to comment about. Other than that, this is a very helpful video. Keep 'em coming!
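The index issue raised in this thread can be reproduced with a small sketch. The `total_amount` column mirrors the comment, but the data here is random and hypothetical. Boolean filtering keeps the surviving row labels, so the compact RangeIndex is replaced by a materialized integer index; `reset_index(drop=True)` discards those labels and restores a RangeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the taxi data: one million rows, some negative amounts
rng = np.random.default_rng(0)
df = pd.DataFrame({"total_amount": rng.normal(loc=15, scale=10, size=1_000_000)})
print(type(df.index).__name__, df.index.memory_usage())  # RangeIndex: a few bytes

# Filtering keeps the original (now gappy) labels, so they must be stored explicitly
filtered = df.loc[df["total_amount"] >= 0]
print(type(filtered.index).__name__, filtered.index.memory_usage())  # ~8 bytes per row

# drop=True throws the old labels away instead of turning them into a column
compact = filtered.reset_index(drop=True)
print(type(compact.index).__name__, compact.index.memory_usage())
```

Without `drop=True`, `reset_index` would move the old labels into a new `index` column, so the memory would simply shift from the index to the data.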
@OhsoLosoo · 1 year ago
I appreciate that you left in the bit where the memory jumped up unexpectedly. When I ran into the same issue in one of my own tests, another comment here helped me resolve it easily. Keep up the amazing work!
@ReuvenLerner · 1 year ago
Glad you liked it - I feel it's important to show that yes, I might know more than the average Python/Pandas user, but I definitely make mistakes, forget things, and just plain ol' get things wrong. I've learned a *ton* from my students over the years, and I try to pass along what I learn to future groups. Glad you're enjoying and learning!
@Vijay-Yarramsetty · 1 year ago
Great content. Thanks, Reuven!
@ReuvenLerner · 1 year ago
So glad to know you're enjoying!
@gerardohernandez1317 · 1 year ago
Thank you for this! It was my first time working with something like this and it kept giving me a warning. Now I know how to fix it! :)
@ReuvenLerner · 1 year ago
I'm so happy to hear it helped!
@philtoa334 · 1 year ago
Thanks for the video! Is it possible to change datetime to 32 bits?
@ReuvenLerner · 1 year ago
What a great question! I was sure that datetime64 is all we've got, and so we're stuck with 64 bits. But I did some digging, and it looks like Apache Arrow supports 32-bit times and dates. PyArrow does as well, so *maybe* it's possible to get this to work behind the scenes using specialized dtypes. But it's not a core part of Pandas.
@philtoa334 · 1 year ago
@ReuvenLerner Thanks.
@jshiff3938 · 1 year ago
Obviously the taxi trips with negative distance were subtracting from your memory usage.
@ReuvenLerner · 1 year ago
Ha! That made me literally laugh out loud...