
Process HUGE Data Sets in Pandas

  39,623 views

NeuralNine

1 day ago

Today we learn how to process huge data sets in Pandas by using chunks.
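
A minimal sketch of the idea (the file name "huge_dataset.csv" and the "value" column are placeholders):

import pandas as pd

result = 0
# With chunksize set, read_csv returns an iterator of DataFrames,
# so only one slice of the file is in memory at a time.
for chunk in pd.read_csv("huge_dataset.csv", chunksize=100_000):
    # Process each chunk independently, e.g. accumulate a column sum.
    result += chunk["value"].sum()

print(result)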
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine...
💻 The Algorithm Bible Book: www.neuralnine...
👕 Programming Merch: www.neuralnine...
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine...
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/Neu...
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/

Comments: 44
@Open5to6 6 months ago
I can't always follow everything he says, because he moves pretty quickly and throws a lot at you, but he's always straight to the point, no fluff, and innovative. I always glean more things to look up after hearing it from NeuralNine first.
@DonaldRennie 13 days ago
Very good! I'm a beginner, and this guy spent more time explaining this topic than DataCamp did. The only thing I didn't understand was the "series" part.
@thisoldproperty 1 year ago
I like the simplicity. I wonder if a similar thing could be done with SQL queries, given they usually store incredibly large datasets.
@jaysont5311 1 year ago
I thought I read that you can; I could be wrong though.
@mikecripps2011 9 months ago
Yes, I do it all day long. I read 2.5 billion records, a new level for me, this week on a wimpy PC. I normally chunk it by 200K rows.
@nuhuhbruhbruh 8 months ago
@@mikecripps2011 The whole point of SQL databases, though, is that you can directly manipulate arbitrary amounts of data without having to load it all into memory. You don't need to do any chunking; just let the database run the query and retrieve the processed output.
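
That said, if the query result itself is too big for memory, pandas can stream it too: read_sql also accepts a chunksize (a sketch, assuming a SQLite file and a "measurements" table as placeholders):

import sqlite3
import pandas as pd

conn = sqlite3.connect("example.db")  # placeholder database
total_rows = 0
# With chunksize set, read_sql yields DataFrames batch by batch,
# so only one batch of rows is held in memory at a time.
for chunk in pd.read_sql("SELECT * FROM measurements", conn, chunksize=50_000):
    total_rows += len(chunk)

print(total_rows)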
@maloukemallouke9735 8 months ago
Thanks, but how do you deal with dependent rows, like time series data, or observations like text where the context is correlated across rows?
@mainak222 1 month ago
I have the same question, do you have an answer?
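
One common workaround (an illustration, not from the video; the file and column names are placeholders) is to carry the tail of each chunk into the next, so window computations that span a chunk boundary stay correct:

import pandas as pd

window = 7  # rolling window length (example value)
tail = None
results = []
for chunk in pd.read_csv("timeseries.csv", chunksize=100_000):
    if tail is not None:
        # Prepend the last window-1 rows of the previous chunk so the
        # rolling mean is computed correctly across the boundary.
        chunk = pd.concat([tail, chunk], ignore_index=True)
    rolled = chunk["value"].rolling(window).mean()
    if tail is not None:
        rolled = rolled.iloc[len(tail):]  # drop rows already emitted
    results.append(rolled)
    tail = chunk.iloc[-(window - 1):]

rolling_mean = pd.concat(results, ignore_index=True)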
@leythecg 1 year ago
As always, top content, perfectly presented!
@TomKnudsen 1 year ago
Thank you. Could you please make a tutorial on how you would strip out certain elements from a file that is not your typical "list", "csv" or "json"? I find this to be one of the most confusing and difficult tasks in Python. If needed, I can provide you with a text file which includes information about airports, such as runways, elevation, etc. Perhaps there is some way to clean such a file up or even convert it to JSON/Excel/CSV.
@lilDaveist 1 year ago
Can you explain what you mean? A list is a data structure inside Python; CSV is a file format (comma-separated values); and JSON is also a file format (JavaScript Object Notation). If you have a file which mixes different ways of storing data, someone has, either manually or with a script, copied another file into it line by line.
@kavetisaikumar 1 year ago
What kind of file are you referring to here?
@Ngoc-KTVHCM 8 months ago
In Excel files, the pd.read_excel method has no chunksize parameter. How do you handle big data spread across many sheets in Excel? Please help me!
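
read_excel indeed has no chunksize, but you can page through a sheet manually with skiprows and nrows (a sketch; the file and sheet names are placeholders):

import pandas as pd

chunk_size = 10_000
sheet = "Sheet1"  # placeholder sheet name
# Read just the header row once to recover the column names.
columns = pd.read_excel("big.xlsx", sheet_name=sheet, nrows=0).columns
start = 0
while True:
    chunk = pd.read_excel(
        "big.xlsx",
        sheet_name=sheet,
        skiprows=start + 1,  # +1 skips the header row
        nrows=chunk_size,
        header=None,
        names=columns,
    )
    if chunk.empty:
        break
    # process the chunk here, e.g. append it to a CSV
    start += chunk_size

Note that this reopens the workbook on every iteration, which is slow; for real workloads, openpyxl's read-only mode or converting the sheet to CSV first is usually faster.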
@aniv6346 1 year ago
Thanks a ton! This is very helpful!
@goku-np5bk 8 months ago
Why would you use the CSV format instead of Parquet or HDF5 for large datasets?
@chrisl.9750 8 days ago
pandas, for example, doesn't read Parquet in chunks. CSV is still relevant for small, easy data transfers.
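
For chunked Parquet reads specifically, pyarrow can iterate over record batches (a sketch; the file name and "value" column are placeholders):

import pyarrow.parquet as pq

pf = pq.ParquetFile("huge_dataset.parquet")
total = 0
# iter_batches yields RecordBatches of roughly batch_size rows,
# so the whole file never has to fit in memory at once.
for batch in pf.iter_batches(batch_size=100_000):
    df = batch.to_pandas()  # convert one batch to a DataFrame
    total += df["value"].sum()

print(total)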
@lakshay1168 1 year ago
Your explanation is very good. Could you do a video on a Python project that detects the position of an eye?
@hynguyen1794 7 months ago
I'm a simple man: I see Vim, I press like.
@csblueboy85 1 year ago
Great video, thanks!
@franklimmaciel 5 months ago
Thanks!
@wzqdhr 2 months ago
The hard part is how to append the new feature back to the original dataset without loading it all in one shot.
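
One streaming approach (an illustration, not from the video; names are placeholders) is to write each transformed chunk straight back out, appending to a new file:

import pandas as pd

first = True
for chunk in pd.read_csv("huge_dataset.csv", chunksize=100_000):
    chunk["value_squared"] = chunk["value"] ** 2  # the new feature (placeholder)
    # Append each chunk to the output file, writing the header only once.
    chunk.to_csv("huge_dataset_enriched.csv",
                 mode="w" if first else "a",
                 header=first, index=False)
    first = False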
@siddheshphaple342 10 months ago
How can I connect to a database in Python, and how do I optimise it if I have 60L+ (6 million+) records in it?
@wildchildhep 1 year ago
It works! Thanks!
@uzeyirktk6732 1 year ago
How can we work on it further? Suppose I want to use the groupby function on column 'A'.
@15handersson16 9 months ago
By experimenting yourself.
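
A common pattern (a sketch with placeholder file and column names) is to aggregate each chunk and then combine the partial results:

import pandas as pd

partials = []
for chunk in pd.read_csv("huge_dataset.csv", chunksize=100_000):
    # Aggregate within the chunk first; partial sums can be re-combined.
    partials.append(chunk.groupby("A")["value"].sum())

# Concatenate the partial results and reduce them again by group key.
result = pd.concat(partials).groupby(level=0).sum()
print(result)

This works for associative reductions such as sum, count, min, and max; a mean needs separate sum and count partials.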
@tcgvsocg1458 1 year ago
I was literally watching a video when you posted a new one... I like that!
@artabra1019 1 year ago
OMG, thanks. I was trying to open a CSV file with millions of rows and my PC collapsed, so I went looking for an i9 computer with 16GB of RAM to open it. Thanks, now I can open big files using pandas.
@FabioRBelotto 1 year ago
Can we use each chunk to spawn a new process and do it in parallel?
@Supercukr 1 month ago
That would defeat the purpose of saving the RAM.
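
Parallelism and bounded memory can be balanced if only a few chunks are in flight at once, e.g. with a process pool (a sketch; file and column names are placeholders):

import multiprocessing as mp
import pandas as pd

def chunk_sum(chunk):
    # Runs in a worker process; keep the work self-contained and picklable.
    return chunk["value"].sum()

if __name__ == "__main__":
    reader = pd.read_csv("huge_dataset.csv", chunksize=100_000)
    with mp.Pool(4) as pool:
        # imap dispatches chunks to workers as they stream off the reader.
        total = sum(pool.imap(chunk_sum, reader))
    print(total)

Be aware that the pool's feeder thread can outrun the workers, so in practice you may still need to throttle dispatch to keep memory truly capped.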
@ramaronin 4 months ago
Brilliant!
@JuanCarlosMH 1 year ago
Awesome!
@tauseefmemon2331 1 year ago
Why was the RAM usage increasing? Shouldn't it stop increasing once the data is loaded?
@thisoldproperty 1 year ago
It takes a while to load 4GB into memory. The example shown was captured while the load was still in progress.
@vishkerai9229 6 months ago
Is this faster than Dask?
@MegaLukyBoy 1 year ago
Is pickle better?
@WilliamDean127 1 year ago
It would still load all the data at one time.
@hkpeaks 1 year ago
Benchmark (Pandas vs Peaks vs Polars): kzbin.info/www/bejne/Z3zRZ2lrdqmGmc0
@RidingWithGerdas 1 year ago
Or, with really huge datasets, use Koalas; the interface is pretty much the same as pandas.
@Zonno5 1 year ago
Provided you have access to scalable compute clusters. Spark recently got a pandas API, so Koalas has become mostly unnecessary for that purpose.
@RidingWithGerdas 1 year ago
@@Zonno5 Are you talking about PySpark?
@pasqualegu 1 year ago
It all worked!
@ashraf_isb 4 months ago
1000th like 😀
@imclowdy 1 year ago
Awesome! First comment :D
@driouichelmahdi 1 year ago
Thank you!