Process HUGE Data Sets in Pandas

Views: 44,670

NeuralNine

A day ago

Comments: 45
@Open5to6 11 months ago
I can't always follow everything he says, cause he moves pretty quick and throws a lot at you, but he's always straight to the point, no fluff, and innovative. I always glean more things to look up after hearing it from NeuralNine first.
@Roman-kn7kt 4 months ago
As always, your tutorials are incredible!
@DonaldRennie 5 months ago
Very good! I'm a beginner, and this guy spent more time explaining this topic than DataCamp. The only thing I didn't understand was the "series" part.
@maloukemallouke9735 1 year ago
Thanks, but how do you deal with dependent rows, such as time series data or text observations where the context is correlated across rows?
@mainak222 6 months ago
I have the same question, do you have an answer?
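On the question above about dependent rows: one common approach is to carry a small tail of each chunk over into the next one, so that calculations spanning a chunk boundary still see all the rows they need. The following is only a minimal sketch under stated assumptions: the data, the column names `t` and `value`, the window size, and the helper `rolling_mean_chunked` are all hypothetical and not taken from the video.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a large time-series file.
CSV_TEXT = "t,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10))

WINDOW = 3  # rolling window size (an assumption for this sketch)


def rolling_mean_chunked(source, chunksize):
    """Rolling mean over `value`, computed chunk by chunk.

    The last WINDOW - 1 rows seen so far are kept and prepended to the
    next chunk, so windows that span a chunk boundary see all their rows.
    """
    tail = None
    pieces = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        block = chunk if tail is None else pd.concat([tail, chunk])
        rolled = block["value"].rolling(WINDOW).mean()
        # Keep only the results for rows that belong to the current chunk.
        pieces.append(rolled.iloc[len(block) - len(chunk):])
        tail = block.iloc[-(WINDOW - 1):]
    return pd.concat(pieces)


chunked = rolling_mean_chunked(io.StringIO(CSV_TEXT), chunksize=4)

# Reference result computed in one shot, only to check the sketch.
full = pd.read_csv(io.StringIO(CSV_TEXT))["value"].rolling(WINDOW).mean()
```

The same carry-over idea applies to other look-back operations (differences, lags), as long as the number of rows that must be carried is known in advance.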
@thisoldproperty 2 years ago
I like the simplicity. I wonder if a similar thing could be done with SQL queries, given that SQL databases usually store incredibly large datasets.
@jaysont5311 2 years ago
I thought I read that you could, I could be wrong tho
@mikecripps2011 1 year ago
Yes, I do it all day long. This week I read 2.5 billion records on a wimpy PC, a new level for me. I normally chunk it by 200K rows.
@nuhuhbruhbruh 1 year ago
@mikecripps2011 The whole point of SQL databases, though, is that you can directly manipulate arbitrary amounts of data without having to load it all into memory. You don't need to do any chunking: just let the database run the query and retrieve the processed output.
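Both approaches from this thread can be sketched side by side. The example below is a rough illustration only, using an in-memory SQLite database; the table `events` and its columns `category` and `amount` are hypothetical stand-ins for a real, much larger table.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for a large table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)],
)
conn.commit()

# Approach 1: stream the table into pandas a few rows at a time.
# With chunksize set, read_sql_query returns an iterator of DataFrames.
total_from_chunks = 0
for chunk in pd.read_sql_query("SELECT * FROM events", conn, chunksize=2):
    total_from_chunks += int(chunk["amount"].sum())

# Approach 2: let the database aggregate and fetch only the small result.
summary = pd.read_sql_query(
    "SELECT category, SUM(amount) AS total FROM events "
    "GROUP BY category ORDER BY category",
    conn,
)
conn.close()
```

Which one fits better probably depends on whether the processing can be expressed in SQL at all; chunked reads are mainly useful when the per-row logic has to run in Python.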
@goku-np5bk 1 year ago
why would you use csv format instead of parquet or hdf5 for large datasets?
@chrisl.9750 5 months ago
pandas, for example, doesn't read parquet in chunks. CSV is still relevant for small, easy data transfers.
@Ngoc-KTVHCM 1 year ago
The method "pd.read_excel" has no "chunksize" parameter. How can I handle big data spread across many sheets in an Excel file? Please help me!
@leythecg 2 years ago
As always, top content, perfectly presented!
@TomKnudsen 2 years ago
Thank you. Could you please make a tutorial on how you would strip out certain elements from a file that is not your typical "list", "csv" or "json"? I find this task to be one of the most confusing and difficult things you can do in Python. If needed, I can provide you with a text file which includes information about airports, such as runways, elevation, etc. Perhaps there is some way to clean such a file up or even convert it to JSON/Excel/CSV.
@lilDaveist 2 years ago
Can you explain what you mean? A list is a data structure inside Python, CSV is a file format (comma-separated values), and JSON is also a file format (JavaScript Object Notation). If you have a file which incorporates many different ways of storing data, you have copied a file line by line, either manually or with a script, and pasted it into another file.
@kavetisaikumar 2 years ago
What kind of file are you referring to here?
@siddheshphaple342 1 year ago
How can I connect to a database in Python, and how do I optimise it if it holds 60L+ records?
@tcgvsocg1458 2 years ago
I was literally watching a video when you posted a new one... I like that! (8)
@uzeyirktk6732 1 year ago
How can we work with it further? Suppose I want to use the groupby function on column ['A'].
@15handersson16 1 year ago
By experimenting yourself
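For the groupby question above, one possible pattern is to aggregate each chunk separately and then combine the small per-chunk results. This is a minimal sketch, assuming a sum is the aggregation wanted; the data and the value column `B` are hypothetical (only column `A` comes from the question).

```python
import io

import pandas as pd

# Hypothetical data; in practice this would be a path to a large CSV file.
CSV_TEXT = "A,B\nx,1\ny,2\nx,3\nz,4\ny,5\nx,6\n"

partials = []
for chunk in pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2):
    # Aggregate each chunk on its own; the result is small.
    partials.append(chunk.groupby("A")["B"].sum())

# Combine the per-chunk results and aggregate once more.
totals = pd.concat(partials).groupby(level=0).sum()
```

This two-step combine works directly for sums, counts, minima and maxima. A mean cannot be combined this way; it would need the per-chunk sum and count kept separately and divided at the end.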
@aniv6346 2 years ago
Thanks a ton ! This is very helpful !
@lakshay1168 2 years ago
Your explanation is very good can you do a video on the Python project that else the position of an eye
@wzqdhr 7 months ago
The hard part is how to append the new feature back to the original dataset without loading them in one shot
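One way to approach the problem raised above is to avoid writing back into the original file at all and instead stream each processed chunk into a new file. The sketch below assumes the new feature can be computed from columns within a single row; the column names `price`, `qty` and `total` and the output file name are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical input; in practice this would be a large CSV on disk.
CSV_TEXT = "id,price,qty\n1,2.0,3\n2,1.5,4\n3,10.0,1\n4,0.5,8\n5,3.0,2\n"

out_path = os.path.join(tempfile.mkdtemp(), "with_total.csv")

for i, chunk in enumerate(pd.read_csv(io.StringIO(CSV_TEXT), chunksize=2)):
    # Derive the new column from columns already present in the chunk.
    chunk["total"] = chunk["price"] * chunk["qty"]
    # Write the header once, then append later chunks below it.
    chunk.to_csv(
        out_path,
        mode="w" if i == 0 else "a",
        header=(i == 0),
        index=False,
    )

# Read the result back only to inspect it; a real file may be too large.
result = pd.read_csv(out_path)
```

Only one chunk is held in memory at a time, at the cost of writing a second copy of the dataset to disk.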
@FabioRBelotto 2 years ago
Can we use each chunk to spawn a new process and do it in parallel?
@Supercukr 6 months ago
That would defeat the purpose of saving the RAM
@hynguyen1794 1 year ago
i'm a simple man, i see vim, i press like
@csblueboy85 2 years ago
Great video thanks
@tauseefmemon2331 2 years ago
Why was the RAM increasing? Shouldn't it stop increasing once the data is loaded?
@thisoldproperty 2 years ago
It takes a while to load 4GB into memory, so the example shown was captured while the data was still loading.
@artabra1019 2 years ago
OMG, thanks! I was trying to open a CSV file with a million rows and my PC collapsed, so I went looking for an i9 computer with 16 GB of RAM to open it. Now I can open big files using pandas.
@TruthBomber42 2 years ago
Is pickle better?
@WilliamDean127 2 years ago
Still would load all data at one time
@wildchildhep 2 years ago
it works! thanks!
@vishkerai9229 11 months ago
is this faster than Dask?
@franklimmaciel 10 months ago
Thanks!
@ramaronin 9 months ago
brilliant!
@JuanCarlosMH 2 years ago
Awesome!
@hkpeaks 1 year ago
Benchmark (Pandas vs Peaks vs Polars) kzbin.info/www/bejne/Z3zRZ2lrdqmGmc0
@RidingWithGerdas 2 years ago
Or with really huge datasets, use Koalas, interface is pretty much the same as pandas
@Zonno5 2 years ago
Provided you have access to scalable compute clusters. Spark recently got a pandas API, so Koalas has more or less become unnecessary for that purpose.
@RidingWithGerdas 2 years ago
@Zonno5 Are you talking about PySpark?
@imclowdy 2 years ago
Awesome! First comment :D
@ashraf_isb 9 months ago
1000th like 😀
@pasqualegu 2 years ago
All worked
@driouichelmahdi 2 years ago
Thank you