Another excellent video! I've been using Pandas a lot lately, but should try this, given that most of my datasets have a few million rows and a few hundred columns. Given that my colleagues have variable laptop specs, some with an Nvidia GPU, some without, I guess the trick is to build in a check to see which engine to use.... oh what fun!
@PythonSimplifiedАй бұрын
Excellent idea and thank you so much for your comment!!! 😀😀😀 If you're using a notebook interface you can run a check along the lines of:
is_gpu = !nvidia-smi | head
if is_gpu[0] == "/bin/bash: line 1: nvidia-smi: command not found":
    print("GPU not available")
else:
    print("GPU is available")
It is probably not a perfect solution, because it:
A. works for Linux only given the hardcoded "/bin/bash".
B. may not be able to detect Nvidia based GPUs that are not CUDA capable (there aren't too many of them, but still...)
If you're not using a notebook, you can try to substitute it with:
import os
is_gpu = os.system('nvidia-smi') == 0  # exit status 0 means nvidia-smi ran successfully
(Without the pipe to head, because a pipeline reports the exit status of head, which is 0 even when nvidia-smi is missing.)
Thanks again, dear, and have an incredible day! 😉
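The check described above can also be written without comparing against a shell error message, which makes it less dependent on the operating system. The following is a minimal sketch, not the author's method: the function names `nvidia_gpu_available` and `pick_engine` are hypothetical, and it assumes that a working `nvidia-smi -L` listing at least one device is a good enough signal. It does not verify that the Polars GPU package itself is installed.

```python
import shutil
import subprocess


def nvidia_gpu_available(timeout: float = 10.0) -> bool:
    """Return True if nvidia-smi is on PATH and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")  # None when the tool is not installed
    if exe is None:
        return False
    try:
        # "nvidia-smi -L" prints one line per detected GPU, e.g. "GPU 0: ..."
        result = subprocess.run(
            [exe, "-L"],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


def pick_engine() -> str:
    """Hypothetical helper: choose "gpu" when an Nvidia GPU is visible, else "cpu"."""
    return "gpu" if nvidia_gpu_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected engine: {pick_engine()}")
```

With a Polars LazyFrame called `query`, the result could then be used as `query.collect(engine="gpu") if pick_engine() == "gpu" else query.collect()`, so colleagues without an Nvidia GPU simply stay on the default CPU engine.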
@cerealport2726Ай бұрын
@@PythonSimplified Ahh, this is so cool Mariya, thanks very much!!! I will give it a go when I'm back at the office tomorrow. My GPU sits around doing very little, generally, so I think it's about time it earned its keep. It's possible some of my colleagues have old clunker GPUs without CUDA (...bar a CUDA...? sorry, I'll see myself out), but that's pretty much on them to solve as they shouldn't have company computers older than 5 years anyway.
@katrinabryceАй бұрын
@@cerealport2726 If it is an Intel or AMD GPU, then it won't have CUDA. I do pretty much all my stuff remotely using JupyterHub, so on the laptop, or even iPad, I just need a web browser and internet connection. Having said that, the iPad is about twice as fast as my Threadripper on single-threaded CPU workloads.
@willy_fonkerАй бұрын
Awesome ❤..... It's always a learning experience with you around 📝📝
@Kaffeejunk1eАй бұрын
i switched after 24 years of windows to linux. with that, i started with some bash scripting and building a veeeeeery simple gui in python to manage them. my next step is going to be diving into pandas, maybe now polars and especially xlsxwriter. we use excel in the office so i have to import, transform and manipulate data sets and write them into new files. for now i am doing that stuff in vba and power query. we will soon get the excel version that includes python, and i want to do more with python in general and move away from MS exclusive languages. well, i guess i found my teacher for the transforming part. i am not an english native and often struggle to understand the spoken word in other yt vids. but with you i am totally fine and hell, i love your voice:)
@solomonmegerssa612229 күн бұрын
You always amaze me. I have learned a lot from you. Keep up the good work!
@WhizBackАй бұрын
Hi from YVR! Great content. You do make learning engaging, useful, and fun. Thanks so much. oh, and the production quality of your videos is amazing as well....
@PythonSimplifiedАй бұрын
Hello from Mission, neighbor!! 😀😀😀 Thank you so much for your incredible comment and feedback!! I'm super happy you enjoy my tutorials, and I wish you lots of sunshine, frequent vacations in Whistler, and many more Canucks wins! 🍁
@WhizBackАй бұрын
Hehe... you're very welcome! And the same to you... hopefully we're done with atmospheric rivers for this year...
27 күн бұрын
What an absolutely amazing voice 👌and a great tutorial 🚀
@RN-er7mzАй бұрын
Very informative, thanks for the video!
@PythonSimplifiedАй бұрын
Absolutely! Thank you for the comment and enjoy! :)
@sajithjames4692Ай бұрын
Awesome work! Loved your tutorial
@PythonSimplifiedАй бұрын
Thank you so much!!! 😀
@boogie.deveraАй бұрын
Brilliant video. Thanks
@PythonSimplifiedАй бұрын
Thank you so much!!! Glad you liked it! 😀
@rafaeloliveira3029Ай бұрын
Awesome! You're great on your explanations ;)
@PythonSimplifiedАй бұрын
Thank you so much! glad you enjoyed it! :)
@katrinabryceАй бұрын
Just did some benchmarking on a project I'm working on at the moment, 6.5m rows of data. I have a Threadripper Pro 3945WX with 512GB RAM and an RTX 3080Ti.
WSL with GPU - 17.4s
WSL with CPU - 4.2s
Debian with CPU - 1.69s
FreeBSD with CPU (CUDA isn't available on FreeBSD) - 0.6s
@PythonSimplifiedАй бұрын
Thank you so much for sharing the benchmarks, Katrina! It sure looks like the CPU takes the lead in smaller scale datasets! Also - I had no idea that WSL might be a limiting factor... it would be interesting to try this GPU/CPU comparison on a Linux system! 😀 In terms of dataset size, one of the feedback points I got on this tutorial was that GPU Polars is at peak performance on Parquet datasets between 8-10 GB (the one I used is only 4GB but has over 260M entries). So in theory, the closer we get to 8GB or 550M entries, the better the GPU will perform over CPU. Thanks again for the incredible input! 🙂🙂🙂
@balintlaszlo1999Ай бұрын
hello to everyone .... I'm very happy to see your tutorial and you must be a teacher because you explain very well.
@PythonSimplifiedАй бұрын
Thank you so much for your comment! Glad you enjoyed my video! 🙂
@trollenzАй бұрын
Wow ! Very useful, thanks a lot 👌🏻 Great, great content, as always.
@PythonSimplifiedАй бұрын
Thank you so much! Glad you liked it! :)
@medellinszАй бұрын
Superb explanation, I´d like you to make more videos using Polars, Greetings from Mexico!
@rembautimes8808Ай бұрын
Really great tutorial and I just joined as a sub. I used to work in a small bank 🏦 😂, good line of insight. By the way, Tx data analysis is usually for fraud or money laundering, but you added a dimension on customer behaviour which is good as well
@PythonSimplifiedАй бұрын
Thank you so much for the feedback! Super happy you enjoyed my tutorial and welcome aboard!! 😀😀😀 I don't know if the Simulated Transactions dataset is suitable for fraud detection use cases. I imagine I would like to know the interface used for the transaction, the time of day, the payee, etc. But it's definitely a great point! It would be a much more meaningful investigation 😉
@Alexandros.Alexandros777Ай бұрын
Mariichka, you're simply brilliant! You're the best programmer in the world! Thank you for your channel and video tutorials. I hope that someday I'll grow to 30 percent of your level 😅.
@Ilya-iu5ihАй бұрын
AWESOME! I need more of Maria and Data Analysis!
@HadiPirhosseinlouАй бұрын
I was waiting for this tutorial
@PythonSimplifiedАй бұрын
Awesome! I hope you like it! 😀
@vasylpavuk391Ай бұрын
Finally, the Polars :)
@ccnfrankbrАй бұрын
excellent!! thanks a lot !!
@katrinabryceАй бұрын
I would be tempted to use a log scale for Amount in that chart if I wanted to see the smaller values in more detail.
@PythonSimplifiedАй бұрын
That's an awesome idea, Katrina!!! I'll give it a try on my end, it sounds like it might be more efficient than casting!! 😀😀😀
@mnaliАй бұрын
Great tips, thank you.
@PythonSimplifiedАй бұрын
Thank you so much! Great South Park avatar! 🙂
@CaribouDataScienceАй бұрын
Polars wants to be Tidyverse when it grows up😮
@PythonSimplifiedАй бұрын
I'm not really a big fan of R, it doesn't feel as intuitive as Python to me... 🙂
@sanjeevrai42467 сағат бұрын
It was quite a helpful video. I was unable to run it on GPU earlier but thanks to your video, everything worked just fine. So, you got one more subscriber added just now. One quick question: what GPU are you using? Mine is returning a memory error (using an RTX 3080 with 10GB VRAM). I tried using streaming=True but that reverted to the CPU engine.
@Mstislav_Efimov27 күн бұрын
You are PythonnGod!
@patekreol974Ай бұрын
I love your videos!
@PythonSimplifiedАй бұрын
Thank you very much! :)
@zilph82Ай бұрын
Amazing library, thanks for sharing your great knowledge and insights. Do you think Polars will replace many of the tasks one used to do with SQL?
@tmb8807Ай бұрын
Different use cases really. As long as relational databases exist, so will SQL. But for working with file-based data on a single machine I don’t think there’s anything better than Polars at the moment.
@PythonSimplifiedАй бұрын
At this point, it's safe to assume that nothing will replace anything else! Folks have been lamenting PHP for ages, and it's still here! 😅😅😅 hahaha With that said, Polars is definitely more SQL-ish than Pandas! It's a full-blown query engine and its method names reflect SQL statements like SELECT, GROUP BY, CAST, etc. Other than that, I'm glad there's a variety of solutions which always results in healthy competition and continuous innovation! 😉
@zilph82Ай бұрын
@@PythonSimplified Very good explanation.. Yeah.. for many tasks SQL is still the best-suited tool. But after learning what you shared with us about Polars, some of the filtering and grouping I used to do in SQL to prepare my datasets, and that Pandas wasn't powerful enough for, seems doable with Polars + Parquet files in object storage like AWS S3. That is similar to having a data lake of big files that can do what SQL + a data warehouse does, especially if the final task is a huge dataset for visualization in Python or ML training with TensorFlow.
@orlaede28 күн бұрын
Super video. Thanks.
@alexanderzikal7244Ай бұрын
Thank you, very powerful solution! 2 questions: Does the GPU engine work with Apple Silicon? Is the same also possible with the original Polars in Rust?
@PythonSimplifiedАй бұрын
Hi Alexander! 😀 At this point in time, the Polars GPU engine is only available with Python (it's super new, so it might change in the future). The engine is also based on RAPIDS cuDF, which runs only on Nvidia (CUDA) hardware. But the good news is - you can access a free compatible GPU on Google Colab! You can even upgrade it there to an H100 beast! 😉
@solistoguitarАй бұрын
thank you ❤
@PythonSimplifiedАй бұрын
You're welcome! Enjoy!! 😀
@78wesley67Ай бұрын
Hi, lovely video. Have you already made a video about the "uv" package manager ? If not can you do it?
@thyche7866Ай бұрын
i guess for installation it need python > 3.9 ... cause up until now mine at 3.9 cant install it.. need some help
@PythonSimplifiedАй бұрын
The Polars GPU engine requires Python 3.11 and up, but you can install any Python version you'd like if you use Anaconda (just like I've used in this tutorial). Please check out my introduction to Anaconda tutorial here for more info: kzbin.info/www/bejne/g4a9pYl7ebCLqc0 Good luck! :)
@thyche7866Ай бұрын
@@PythonSimplified there u go thank you .. ill try later
@karag4487Ай бұрын
Can you please make more tutorials on Polars pleaseeee?
@PythonSimplifiedАй бұрын
What Polars topics/use cases would you like to learn? 🙂
@karag4487Ай бұрын
@@PythonSimplified more into data manipulation, more methods or anything to give us a competitive edge at interviews 😭
@PythonSimplifiedАй бұрын
Interesting!! You're the second dev who mentions Polars in the context of job interviews! It looks like it's becoming more and more popular in the workplace, so I'll definitely have another look at a follow-up project! But I must admit, it also depends on how many viewers will find this tutorial helpful, which we will see in the next few weeks😉
@karag4487Ай бұрын
@@PythonSimplified thank you so much!!! 🙏🙏🙏😭😭😭
@thunde7226Ай бұрын
oooooooooooooooh, the boom lady, So cute---> she got the Polars Tshirt..................................:) bye
@Rohittarkar175Ай бұрын
I am missing that line "We don't use AI we make AI" 😅
@PythonSimplifiedАй бұрын
Hahaha I'll try to incorporate it in my upcoming tutorial, but can't make any promises at the moment! 😅
@nikluz3807Ай бұрын
7:12 data.sink_parquet() took my PC about 7 mins to process, and I have an RTX 4090 with this CPU:
13th Gen Intel(R) Core(TM) i9-13900KF
Base speed: 3.00 GHz
Sockets: 1
Cores: 24
Logical processors: 32
Virtualization: Enabled
L1 cache: 2.1 MB
L2 cache: 32.0 MB
L3 cache: 36.0 MB
Utilization: 2%
Speed: 1.82 GHz
Up time: 16:01:51:30
Processes: 290
Threads: 8057
Handles: 195262
and I have 64GB of DDR5 RAM. So BE WARNED. Mariya, when the Jupyter Lab notebook is launched, it should automatically be configured to use the GPU, correct? I noticed about 60% CPU usage and about 0-23% GPU usage during the sink_parquet method.
@PythonSimplifiedАй бұрын
Great choice, Nik! Very similar to my laptop, just a slightly different CPU! (i9-13980HX) 😃 In terms of the GPU, it only kicks in when you call the .collect(engine="gpu") command. So when you call sink_parquet(), I believe the GPU shouldn't even be involved! I'm also a bit surprised that it takes 7 minutes, as we have very similar specs, and on my end, it's about a minute. Is there a chance you're using Windows? 🤔 (The GPU stuff is optimized for WSL/Linux only, that might explain the issues)
@nikluz3807Ай бұрын
@@PythonSimplified I’m on windows 11 and using WSL, I also noticed that my RAM usage went up by 20 GB during the process but that makes sense because the original CSV file is 20 GB. By the way, GPTdump is my x account. :)
@man0883927 күн бұрын
Why not new posts on Instagram account?
@DanishAlam-lp2cqАй бұрын
How can we integrate this in ML? Does Polars support ML or is it best for EDA?
@OkyCapriattoАй бұрын
Great content
@christopherc452622 күн бұрын
Well done.
@sriramsriram924627 күн бұрын
how many seconds did it take for the 1 million?
@taylormccoy7492Ай бұрын
Hey great video! I ran into an issue using the wget copy paste in google colab. Just returned errors for me for some reason. Just wanted to let you know! I did find a workaround.
@PythonSimplifiedАй бұрын
Hi Taylor, thank you so much for the feedback!! 😀 Do you mind sharing what error you got and how you were able to solve it? Maybe you can help other folks who struggle with the same issue 😉 Cheers, and thanks again!
@taylormccoy7492Ай бұрын
Sure! So when I sent the wget request I got a 400 and a 404 error (when directly copy/pasting and when copying the full url, respectively). I solved this by plugging the full file url directly into my browser, downloading the file onto my computer, then uploading it into my google drive (only took a couple minutes). Then you use the pl.scan_parquet(*insert file path here*) method to get the data. Also make sure you have already mounted your google drive!
@taylormccoy7492Ай бұрын
I also ran into a second issue where for some reason at the end I couldn't plot because it kept saying my altair package was out of date for polars to use, even though I directly upgraded both pip and my altair package in the google colab env. I don't know if that's fixable. I replicated it locally using vs code and it worked just fine, so just letting anyone know who ran into that one too.
@PythonSimplifiedАй бұрын
Yeyy!!! Thank you so much for sharing your solution, dear!! 😀😀😀 Thanks so much for catching it and taking the time to help others!!! :)
@lunstra_studios21 күн бұрын
new intro I see,, but always the batman ...
@rofsjan16 күн бұрын
Thank you. Interesting. Btw, Gandalf is 55023 years old.
@yeray-g9bАй бұрын
GPU❤
@ericmunschi4655Ай бұрын
Thanks for this new video. If I understand correctly, Polars is able to process huge data at high speed. But compared to pandas, for a reasonable amount of data, are there real advantages to learning and using Polars? Maybe it has some functionality that doesn't exist in pandas? And thanks for your work, Mariya 👍
@simankim17 күн бұрын
It always blows my mind how someone with English as their second language can explain something better than a native speaker.
@toulasanthaАй бұрын
Wow amazing
@LIFE-gm1onАй бұрын
Hi! It's very interesting
@ValnuratАй бұрын
I want to marry your brain. Thank you for an awesome video. ❤
@markhouАй бұрын
I still prefer dask
@PythonSimplifiedАй бұрын
I've heard lots of good things about Dask! I believe Polars is a bit faster and since it has a bear as a logo - nobody can compete!!! 😁😁😁 hahahaha
@ChandrashekarCNАй бұрын
💖💖💖💖
@DaesooLee-l4hАй бұрын
damm that's fast.
@nikluz3807Ай бұрын
good lord your hair is so flowy
@PythonSimplifiedАй бұрын
Hair compliments are wonderful but what do you think about Polars???😁😁😁 hahahaha
@billyblackburn864Ай бұрын
well, cuDF might be faster in some regards but I see what you mean. polars is not for gpu
@PythonSimplifiedАй бұрын
What do you mean it's not for GPU? Polars officially released a new GPU engine earlier this month and we're using it pretty much everywhere in the video 😀
@billyblackburn864Ай бұрын
@@PythonSimplified omg, I didnt know they did that. thats really cool. i love csv files!!!, im sorry for ever doubting you
@PythonSimplifiedАй бұрын
No worries!!! please feel free to doubt me anytime! 😉 (I often deserve it hahaha)
@djangoworldwide792511 күн бұрын
One cannot help but wonder why you wouldn't reference the public documentation, as if these are the only options. You basically gave apples, instead of teaching how to harvest and grow the tree
@PythonSimplified11 күн бұрын
Polars docs are written with simple non-technical language, providing plenty of code examples both in Python and Rust. You don't need video training to read them, just have a look yourself and if you're still struggling after - let me know what specific function/method you need help with.
@haraldhacker10 сағат бұрын
Python SIMP-lified
@_-Skeptic-_Ай бұрын
Now do the same using SQL and C#. I wish your ability to explain clearly was not wasted on how to use libraries. What about C++ that created python and those who wrote these libraries? can't find any content on that.
@PythonSimplifiedАй бұрын
I wouldn't think that using C++ or C# is a practical solution for data scientists, especially with how fast Polars/DuckDB/cuDF operate nowadays 😀 You probably can't find content because it's not a common way to deal with oversized datasets. Is there a reason why you're avoiding Python for these tasks?
@_-Skeptic-_Ай бұрын
@@PythonSimplified I'm talking in general not data analysis in particular, even in python, I would love seeing people showing how these libraries are created rather than import this pip that and this is how you do it.
@_-Skeptic-_Ай бұрын
@@PythonSimplified this was also a compliment on how good you are in explaining
@PythonSimplifiedАй бұрын
Aha!! I got ya! I actually started working on a video of that kind (create your own Python library and upload to pip), loaded a bunch of it on GitHub - and then moved to cover other topics before finishing it 😅😅😅 It will definitely be released sometime in the future, just not with a C based language, but with good old Python! 😉 (it's a bit challenging to release non Python videos on a channel named Python Simplified hahahaha) Thank you so much for your comments and compliments, and have a fantastic day! 😀
@_-Skeptic-_Ай бұрын
@@PythonSimplified looking forward for that video. I started adding python to my arsenal, and came across your videos, as I said all the topics are explained very well and clear .