I'm a simple engineering student and a modest Python user, but those dynamic histograms sent chills down my spine.
@mCoding 2 years ago
Check out plotly and the source code in the description!
@aarondewindt 3 years ago
The built-in dataclass also has default_factory for defining mutable default values.
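A minimal sketch of the `default_factory` point above. The `Inventory` class and its `items` field are made up for illustration and are not taken from the video's source code.

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    # Writing `items: list = []` here would raise ValueError when the class
    # is defined, because one mutable default would be shared by every instance.
    items: list = field(default_factory=list)  # a fresh list per instance


a = Inventory()
b = Inventory()
a.items.append("sword")
print(a.items, b.items)  # the two instances do not share a list
```

The factory is called once per instance, so any zero-argument callable (`list`, `dict`, a lambda) should work the same way.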
@mCoding 3 years ago
😬 oops, thanks for pointing this out! I should have been more careful when I made the feature matrix.
@PeterZaitcev 1 year ago
Furthermore, unlike slots support, this was available from the initial release.
@mpilosov 3 years ago
This is a great breakdown. I’ve had to explain this so many times to team members, now I’ll refer people to this video!
@franchiniitalo 3 years ago
Hey James, I just wanted to sincerely congratulate you for both the quality content and humor in your videos, amazing work!!
@mCoding 3 years ago
Thank you very much for your kind words and support!
@markasiala6355 3 years ago
I actually have a large ongoing project where I used namedtuples early on, with typing stored in a second tuple, then refactored to NamedTuple using the built in typing (which simplified storing the typing separately), and finally to dataclasses after seeing your video on that. It fit my application perfectly as I needed the flexibility of being able to modify the dataclass. If only I had known about dataclasses to start with. :) attr class also sounds interesting for my needs, I need to check that out.
@subjekt5577 1 year ago
I wish he covered classes extending from named tuple, one of my favorite pre attr methods....
@laurinneff4304 3 years ago
So when _are_ you going to explain slots? I have no idea what those are
@mCoding 3 years ago
Gulp, I feel the pressure.
@AzureCz 3 years ago
@@mCoding yeah, I don't know what you're talking about either D:
@cameronball3998 3 years ago
That was the Google search I made right after this video 😂 I am intrigued
@Elijah_Lopez 3 years ago
Classes usually use a dictionary to store their instance attributes. If you define `__slots__ = 'var1', 'var2'`, instances of your class can only have the attributes named in slots.
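A small sketch of what the comment above describes, assuming two throwaway classes (`PlainPoint` and `SlottedPoint` are hypothetical names used only here):

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # only these attribute names are allowed

    def __init__(self, x, y):
        self.x = x
        self.y = y


plain = PlainPoint(1, 2)
slotted = SlottedPoint(1, 2)

plain.z = 3  # fine: stored in the per-instance __dict__

rejected = False
try:
    slotted.z = 3  # "z" is not listed in __slots__
except AttributeError as exc:
    rejected = True
    print("AttributeError:", exc)

# The slotted instance has no per-instance dictionary at all, which is
# where the memory saving comes from.
print(hasattr(plain, "__dict__"), hasattr(slotted, "__dict__"))
```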
@AzureCz 3 years ago
@@Elijah_Lopez sir you're a legend
@sphereron 1 year ago
I've often struggled with ways people define hyperparameters and inputs to neural networks in open source code. This video definitely helped me in my choice going forward.
@Jakub1989YTb 3 years ago
2:06 got me .. "real life". Those air quotes are heavy.
@kosmonautofficial296 1 year ago
Great thanks so much for this video! I am starting to study pydantic and I haven't been made aware of these differences. This is a huge help and I wish more people would explain these important differences when telling others they should use this or that.
@r2_rho 2 years ago
This is really the best Python channel on YouTube. I've learned more on this channel than on all others combined.
@mCoding 2 years ago
Wow thank you!
@Aang139 2 years ago
Also would have loved thoughts on TypedDict which mirrors NamedTuple for dictionaries, giving type hinting and string key checking
@Yotanido 3 years ago
I've used most of these, it turns out. Started with the class that repeats everything. I then used the dict to try and make things slightly more convenient, but that was only feasible in very limited circumstances. Then I discovered named tuples, but... they are tuples. Wasn't a huge fan. Then, finally, I came across attr. That was a huge revelation and I absolutely loved it. Finally something decent. And then dataclasses were introduced to the standard library and I basically switched to using those. attr can do more, sure - but the dataclasses are easier to use and don't need the dependency. Unless I actually need the power of attr, I'll just use these.
@PanduPoluan 2 years ago
Depending on the data, tuples can be very suitable. For instance, I have to consume a YAML file containing a HUGE sequence of geo-coordinates (lat/long). For this kind of data, the kind that you read, keep in memory, and must not change, tuples are perfectly suitable, use less memory, and are very fast. And NamedTuple, just like other classes, can have methods defined within. So for instance I can write a distance_to() method which will calculate the great-circle distance between one geo-coordinate and another. If you need mutability, though, of course a tuple just won't cut it.
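A rough sketch of the idea in the comment above. The commenter's actual code is not shown, so `GeoPoint`, the haversine formula, and the 6371 km Earth radius are all assumptions made for illustration.

```python
import math
from typing import NamedTuple


class GeoPoint(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees

    def distance_to(self, other: "GeoPoint", radius_km: float = 6371.0) -> float:
        """Great-circle distance in kilometres, using the haversine formula."""
        phi1 = math.radians(self.lat)
        phi2 = math.radians(other.lat)
        dphi = phi2 - phi1
        dlam = math.radians(other.lon - self.lon)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
        return 2 * radius_km * math.asin(math.sqrt(a))


origin = GeoPoint(0.0, 0.0)
quarter_turn = GeoPoint(0.0, 90.0)

# Two points on the equator, 90 degrees apart: a quarter of the circumference.
quarter_km = origin.distance_to(quarter_turn)
print(round(quarter_km, 1))
```

Instances are still plain tuples, so they unpack (`lat, lon = origin`) and cannot be modified after creation.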
@jemand771 3 years ago
I really enjoyed the text comment/annotation overlays in this video. they both add useful background info and give the video a more relaxed vibe without distracting from the main points! :D
@PanduPoluan 2 years ago
Basically, one very strong rule of thumb is: If you need immutability and you can validate the data on your own, NamedTuple will _always_ be the best, hands down.
@zacky7862 3 years ago
Yeah, pydantic is so great for parsing/serializing JSON data. I've been using it. But for simple data, I use the built-in dataclass.
@hackergr325 3 years ago
At first you got my interest, after the "Presenting with meaningless example" you got my attention. Awesome video once again!
@GRAYgauss 3 years ago
Type hint gang. I came from a C background, so duck typing felt like a godsend. Then I got into Rust and realized how much time I was spending debugging code because it was duck-typable. (Let's not forget Rust's awesome toolchain compared to Python's... well, yeah.) Hell, I was using i_var, etc. just because it made it easier to reason about and not have to backtrack, which is when I first started wondering about it... It didn't fully click until I made the switch, though.
@danielrhouck 3 years ago
Iʼm starting a new Python project and Iʼm using `attrs` because of this video. Otherwise I would have used `namedtuple`, because I think without your videos I somehow would have missed even `dataclass`.
@FranciscoCorreaDias 3 years ago
1:24 "Will I ever explain slots?" One week later... Thank you so much for your explanations, James!
@doc0core 2 years ago
This is serious pro stuff. I started using dataclass thanks to your vid, then YT pushed another vid for pydantic and I was like bleh. Luckily this vid set me straight. Now I understand each one's use case. THANKS
@mCoding 2 years ago
Glad it helped!
@Mutual_Information 3 years ago
I do not use data classes nearly enough. This is good motivation to change that.
@falxie_ 3 years ago
Really glad to see slots supported in dataclasses now. When you have a lot of instances of one class slots can save a ton of memory
@mishikookropiridze 3 years ago
This was added in 3.10?
@falxie_ 3 years ago
@@mishikookropiridze That's more of a statement than a question isn't it
@mishikookropiridze 3 years ago
@@falxie_ It is a statement, and hence you can assign it a boolean value.
@jlp2011 1 year ago
Pydantic 2.0’s just out, built around a Rust core. They claim up to 50x perf improvement so some of this might be changed. Still, kudos for covering v1’s overhead.
@mCoding 1 year ago
Great point! Maybe I'll have to do an update video!
@arkadiuszszydeko7264 1 year ago
@@mCoding Looking forward to seeing how it compares to what you presented here :)
@fartzy 2 years ago
Wow this is amazing man thanks for putting this together
@jochengietzen 3 years ago
Please keep the onscreen comments coming! Adds the perfect amount of fun to an informative topic "cries in mypy" 😁 Great video, thanks 😊
@pedrokalil4410 1 year ago
I am the owner of a backend project at my company and I use only pydantic. Since we perform multiple API calls, the validations are essential, and it integrates really well with FastAPI.
@Scranny 3 years ago
I have used almost all of these, so I can say this is a fantastic summary of the various options.
@tudor1899 4 months ago
Best Python channel on YouTube. Thank you. If you used neovim too, it would be out of this world 😅
@mCoding 4 months ago
Thank you! Not a neovim user although vim was my main editor for a while there.
@maimee1 2 years ago
4:36 There's TypedDict to consider too tho. (As in the type safety thing. You could type both dict and tuple and use a static type checker. If you use PyCharm and single quotes, accessing data by key is also not typo prone.)
@PanduPoluan 2 years ago
Why single quotes? There's no difference between single quotes and double quotes.
@maimee1 2 years ago
@@PanduPoluan Idk, ask PyCharm (and also VS Code, I just found out). To clarify: not typo prone => there's IntelliSense / auto completion.
@PanduPoluan 1 year ago
@@maimee1 well personally I don't find any difference between using single quotes and double quotes. But then again I always use double quotes because Black enforces that.
@MrLiuHai 2 years ago
Thx for the explanation! But it seems at this point Python is contradicting its own Zen: "There should be one-- and preferably only one --obvious way to do it." IMHO one should always prefer immutability. The difference between creating a new instance and using a setter can be ignored. If performance is that critical, maybe one shouldn't choose Python in the first place.
@relsunkaev 3 years ago
The apischema package is a good middle ground between Pydantic and dataclasses. It allows you to do the same runtime validation on dataclasses if you need to and has the same features as well as a GraphQL schema generator. It also performs validation faster than Pydantic.
@mCoding 3 years ago
Never used that one, thanks for sharing!
@MrBoubource 3 years ago
Wonderful last seconds, but wonderful video too!
@tamles937 3 years ago
Great video! As always, the topic is well explained and I learnt something new. The on-screen comments are really fun, I hope you'll put more of this in future videos!
@Azzonith 3 years ago
Great stuff! It would be even better if you made a follow-up video about serialization of those objects and the libs that can help. Often you need to send tuple/dataclass/etc. data over Kafka, to a DB, or save it as JSON, etc. Include the 'marshmallow' lib in the vid as well!
@PythonisLove 3 years ago
your videos are always the best
@mikegazes 1 year ago
Thanks! This is exactly what I needed.
@soberhippie 3 years ago
Creating a new tuple still looks just as fast as modifying a value in a dict, interesting
@mCoding 3 years ago
Yeah that was the biggest surprise for me, but I guess it kinda makes sense since a tuple can be implemented as a thin wrapper around raw memory, but a dict has to do hashing and such.
@NateROCKS112 3 years ago
However, you'll likely end up needing to read the tuple's existing values in order to instantiate a new one. So doing the equivalent of a dict item assignment would come at a significant cost.
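For named tuples specifically, the standard library's `_replace` method does the "copy the old values into a new tuple" step described above. A minimal sketch, with a hypothetical `Sample` class whose `n`, `f`, `s` fields are placeholders:

```python
from typing import NamedTuple


class Sample(NamedTuple):
    n: int
    f: float
    s: str


old = Sample(1, 2.0, "a")

# _replace builds a brand-new tuple with one field changed; `old` is untouched.
new = old._replace(f=3.5)
print(old, new)
```

This is convenient, but it still creates a new object on every "update", so it is not expected to match in-place mutation of a dict or a regular class for write-heavy code.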
@SvetlinTotev 3 years ago
A few arguments for dict gang: Everybody knows how it works and what the syntax is. Many libraries use it as inputs or outputs. If used as an interface it is easy to change it without breaking things. It is trivial to load and store them in json or send them over the network. I generally don't like speed comparisons of python code since that should never be the bottleneck of your program (you are using the wrong language if it is) but it is nice to know that dicts are fast af. I also haven't had any problems with reliability. But I guess that's partially due to my vscode extensions checking what I'm typing and giving suggestions. But I have to agree with you that the syntax is quite ugly compared to accessing elements with a dot. Though I don't think it would be too bad if the language added similar syntax for that (basically any kind of shorthand for ["string"]. But I guess . would be ambiguous. and most other characters already have a meaning. so maybe a double dot? some_dict..some_element)
@Alex-uh6qh 3 years ago
The problem with dicts is that you cannot check types ahead of time. You can store different data types in one field at any point during runtime. By the way, static analyzers cannot predict the type of a dict's elements, so other IDEs (like PyCharm) cannot help you with suggestions, especially with the available methods for each field. In your IDE, you use an AI-based extension that predicts data types, but it is not a static analyzer.
@masheroz 3 years ago
This is timely. I've got a program in writing now, and am using dictionaries. I still think that formatting my data as nested dictionaries is the best representation of that data. Also, the original data format is actually defined as a dictionary.
@SvetlinTotev 3 years ago
@@Alex-uh6qh This is true, but I think by picking python as your programming language you've already given up on being able to easily track the types of objects. With all the type hinting and other type-related stuff you are still quite far from the type information you have in languages like C++.
@oxey_ 3 years ago
I feel like I should go to casinos more often because I have no idea what slots are :) Great video! Typehint gang
@ripp_ 2 years ago
I think in the past, because I've been lazy, I've used tuples but, because I don't hate myself, I had constants for which index was which. I don't recommend this but that would give you the speed power of tuples with some of the naming power of namedtuple
@brooklyngerma5674 2 months ago
Hi mCoding, this is a pretty good video. It would be really cool if you could maybe do an updated version? Specifically, I would like to see the addition of TypedDict.
@aaronm6675 3 years ago
Already know this is gonna be helpful and instructive!
@mrgeniasworld4374 3 years ago
For sure 🙂
@mrtnsnp 3 years ago
OK, on type hints. I probably need a kick in the you-know-where, but I can't get it to play nicely with a few packages and features I need. I frequently use numpy, and a lot of functions don't really care about receiving a single number or a full array of numbers. They may not even care if the number is a float or an int, but let's focus on floats here. The return value typically has the same shape as the main input, but may be a single number. How do I set up type hinting for numpy arrays? How do I set type hinting up for polymorphism?
@wsrgs4 3 years ago
I haven't looked into it extensively, but I'm aware there is a numpy.typing module which includes an ArrayLike type for anything that can be converted into an array, including scalars. you might want to look into the module documentation. specifying the dimensions of an array in python's type hinting system is generally difficult however, so I'm not sure there's a way to incorporate that information in your annotations.
@PanduPoluan 2 years ago
Use TypeVar. For instance, here's a made-up function:
T = TypeVar("T")
def makelist(n: int, item: T) -> list[T]:
    return [item for _ in range(n)]
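A runnable version of the snippet from the comment above, with the import it needs. It assumes Python 3.9 or newer, since `list[T]` is used as a generic directly; on older versions `typing.List[T]` would be the equivalent.

```python
from typing import TypeVar

T = TypeVar("T")


def makelist(n: int, item: T) -> list[T]:
    """Return a list holding n references to item; the element type follows item."""
    return [item for _ in range(n)]


ints = makelist(3, 0)    # a static checker should infer list[int] here
strs = makelist(2, "x")  # and list[str] here
print(ints, strs)
```

The TypeVar ties the return type to the argument type, which covers the "same type in, same type out" part of the question. It does not say anything about array shapes.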
@ManuelBTC21 3 years ago
If you care about correctness, I would argue for NamedTuple. The fact that it's immutable is a feature, not a bug.
@mCoding 3 years ago
Immutability is definitely a feature, but mutability is also a feature. As always you should choose based on what is most appropriate for your problem.
@mystisification 3 years ago
Very informative video, thanks James!
@adirmazhir9159 3 years ago
its also possible to use namedtuple like this: T = namedtuple('T', 'n f s')
@LerikPav 3 years ago
There's also TypedDict (since 3.8) with typesafety
@mCoding 3 years ago
TypedDict is actually just a dict at runtime; its value is only for static typing.
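A short sketch of that point, assuming Python 3.8+ and a made-up `Movie` type:

```python
from typing import TypedDict


class Movie(TypedDict):
    title: str
    year: int


m: Movie = {"title": "Alien", "year": 1979}
print(type(m) is dict)  # the value is an ordinary dict at runtime

# Nothing is validated when the program runs; only a static type checker
# would complain that "year" is not an int here.
bad: Movie = {"title": "Alien", "year": "not a number"}
print(bad["year"])
```

So TypedDict helps with key typos and value types in an editor or in mypy, but it adds no runtime checks and no runtime cost beyond a plain dict.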
@noteverymonday 3 years ago
"As well as BaseModel, pydantic provides a dataclass decorator which creates (almost) vanilla python dataclasses with input data parsing and validation."
@tamerelsayed6368 2 years ago
thank you for the thorough explanation
@hansdietrich1496 1 year ago
Good comparison, thanks!
@TechSY730 2 years ago
In current versions of attrs, you only need to assign a field to `attr.ib` if you need a per-field option beyond a default. Otherwise you can use regular variable declarations like dataclasses does. (You might need to use the "next-gen" API, I can't remember at the moment.)
@rdean150 2 years ago
I started using pydantic bc it allows specifying a conversion function to try to cast input values to the desired type. I didn't realize how much of a performance hit that library incurs, or that attrs can do this also but much more cheaply. I guess I should switch to attrs.
@mCoding 2 years ago
I didn't specifically compare times for when you are doing conversions. Make sure to time your use case yourself since Pydantic may still be faster if you are doing conversions.
@rdean150 2 years ago
@@mCoding Ah, thanks for the heads up. That probably accounts for a decent chunk of the time difference, as I think pydantic is always going to try to do basic type casting on all values when instantiating new instances, which surely comes with some overhead, particularly when you supply a custom function for it.
@sevdalink6676 1 year ago
For me Pydantic is great for prototyping, and the losses are acceptable for the sake of always being informed in detail about data errors. It even enables you to skip writing early tests because of that. Still, the charts are extremely useful to show that Pydantic can be an important target in optimization.
@mCoding 1 year ago
An excellent point. This is Python after all, raw speed is not usually what we optimize for and paying some extra runtime cost for data validation when it "shouldn't" be needed may be worth it depending on the situation.
@heroe1486 1 year ago
Is it tho ? 5 microseconds for creation, 9 and 400 ns for getting and setting, and it was before pydantic v2 enhanced by rust. Unless doing several thousands of those are we really concerned about those numbers in python ? Especially when writing an API where the network latency and DB queries could easily reach the 100ms mark in good conditions.
@sevdalink6676 1 year ago
@@heroe1486 I agree that it would be great to see this video with Pydantic V2 performance included. They made amazing progress. I agree with the rest of what you said as well. You asked and answered your own question. Like I said, it can be an important part, not everywhere, but it's good to have it on your checklist.
@12nites 2 years ago
man, you really hammered down on this issue. No need to watch anything else.
@ananzero8751 3 years ago
What library was used to generate the graph? It looks nice.
@the_crypter 3 years ago
Plotly, It's easily the most interactive Visualization Library. It's as simple as matplotlib.
@mCoding 3 years ago
Yep, plotly express specifically. Check out the code on github! Link in desc.
@saadisave 3 years ago
@@the_crypter That's a bad measure of simplicity
@etienneboutet7193 3 years ago
Great video ! But I feel like the onscreen comments were a bit distracting
@elnico5623 1 year ago
I wish there was a channel like this for lua
@ОлегАндрус-ю5е 2 years ago
thank you man! You helped me a lot!
@behnam_salehi 2 years ago
Thank you. This information is really useful
@meneereenhoorn 3 years ago
Great video, thanks so much!
@mCoding 3 years ago
And thank you for watching!
@eniocc 3 years ago
Perfect video. Congrats
@yky49 3 years ago
It is possible to use @dataclass(init=False) and custom __init__() for a parsing purpose. With slots for sure ;)
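A minimal sketch of the `init=False` idea from the comment above. The `Version` class and its "MAJOR.MINOR" text format are invented for illustration. Slots are left out here because `@dataclass(slots=True)` needs Python 3.10+.

```python
from dataclasses import dataclass


@dataclass(init=False)
class Version:
    major: int
    minor: int

    def __init__(self, text: str) -> None:
        # Hypothetical parsing rule: "MAJOR.MINOR", for example "3.10".
        major, minor = text.split(".")
        self.major = int(major)
        self.minor = int(minor)


v = Version("3.10")
print(v)  # the generated __repr__ and __eq__ are still available
```

Only the generated `__init__` is skipped. The fields are still declared through annotations, so `repr`, equality, and `dataclasses.asdict` keep working.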
@khoda81 3 years ago
How did u measure memory usage?
@ilyam.1872 3 years ago
Yeah that's cool and whatnot, but have you ever tried this? class D(dict): __getattr__=dict.__getitem__; __setattr__=dict.__setitem__; __delattr__=dict.__delitem__
@mCoding 3 years ago
Lol no i never considered that :)
@ilyam.1872 3 years ago
@@mCoding absolutely should, it's so easy and error-prone, practically a cheeseburger of python.
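For anyone curious why the dict-subclass trick above is called error-prone, here is a small sketch of the same class spread over several lines, with two pitfalls shown. The variable names are arbitrary.

```python
class D(dict):
    # Route attribute access to item access, as in the one-liner above.
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


d = D(n=1)
d.f = 2.0  # stored as d["f"]
print(d.n, d["f"])

# Pitfall 1: a missing attribute raises KeyError rather than AttributeError,
# so code that relies on hasattr() or getattr() defaults can break.
got_key_error = False
try:
    d.missing
except KeyError:
    got_key_error = True
print(got_key_error)

# Pitfall 2: real dict methods win over keys with the same name.
d["items"] = "shadowed"
print(callable(d.items))  # still the dict.items method, not the stored value
```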
@scottbrewer474 3 years ago
And here I was thinking I was fancy by bundling data into a dictionary vs lots of variables! (Stupid Dunning-Kruger effect)
@zachwhite2716 9 months ago
Give yourself enough time and you’ll come back to the wisdom of simply using dictionaries instead of complex nested objects.
@Timmie_Tudor 1 year ago
Hello, if you didn't know, I decided to use the dataclass Python decorator as my handle
@mCoding 1 year ago
Haha you are gonna get a lot of accidental mentions with a handle like that!
@t2udu 3 years ago
Really liked the visualization. Is that plotly?
@mCoding 3 years ago
Yep! See the code to produce it on GitHub!
@t2udu 3 years ago
@@mCoding will check that out
@hdtlab 3 years ago
Still prefer dataclass since there is no need to install additional packages :)
@VegetableJuiceFTW 2 years ago
would have been cool to compare pydantic with the validation turned off for fairness sake :D
@joshbennett5908 2 years ago
What tool are you using for your bar chart?
@mCoding 2 years ago
Plotly express! It can export to html you can share in your browser without python even installed.
@red13emerald 3 years ago
Awesome comparison! What did you create the interactive graph at the end with? Looks like a nicer version of matplotlib.
@aflous 1 year ago
Plotly
@florianfuchs325 2 years ago
Hi Excellent Video! I was wondering what would be the right choice if I wanted to use the created class in a jit compiled numba function? As far as I have seen, namedtuples seem to be most suitable?
@PanduPoluan 2 years ago
I think you need a class that is serializable. namedtuple and NamedTuple are serializable by default.
@Destrolll 3 years ago
Please care to explain why shouldn't I assign attributes to an instance of an empty class? 4:50
@lex_darlog_fun 3 years ago
@mCoding are you REALLY sure you've measured the memory footprint correctly? What was your test methodology? The difference between NamedTuple/dataclass/class is supposed to be quite different from what you've shown (they do differ, but not THAT much). According to this video (it's in Russian, but the code is clearly visible): youtube /tsEG0WM3m_M?t=60 :
1. The author uses the pympler.asizeof() function instead of the built-in ones, since it's the only right way to measure the *FULL* memory consumption of a given object. I personally re-tested it (generated HUGE collections, taking literally gigabytes of RAM), and yes, the built-in ones were returning some ridiculous results, not even close to the actual RAM taken by the Python interpreter.
2. According to his tests, the difference is actually like this (on 1k instances):
2:05 - dict = ~ 1.2 MB
3:44 - dataclass = ~ 1 MB
5:04 - namedtuple = ~ 720 KB
5:54 - typed NamedTuple = also ~ 720 KB
@mCoding 3 years ago
It's hard to say whether the way I counted things is the "correct" way because it depends on what you wanted to count, but the numbers are approximately the same with pympler vs the getsize method I used. The order of which classes use the most memory is exactly the same with either method. The main difference between what pympler does and what I did is that pympler tries to account for object alignment. pympler assumes that all Python objects are 8-byte aligned and no packing is done (hence why the pympler answers are all multiples of 8), counting padding bytes in the total size count. On the opposite end, my getsize assumes all objects are optimally packed together, not including padding bytes in the total size. The truth is probably somewhere in the middle and also an implementation detail that could change at any moment. But, in any case, I wouldn't call either method the "correct" one; they are both good estimates and their difference is pretty small. Also note that depending on the way you do your tests, the data can make a big difference in how much space is actually used. For example (1,1) uses less memory than (1,2) because the 1 objects in the first tuple are the same.
pympler
0: dataclass (slots) - 168 bytes
1: plain class (slots) - 168 bytes
2: tuple - 176 bytes
3: NamedTuple - 176 bytes
4: namedtuple - 176 bytes
5: attr class (slots) - 176 bytes
6: dataclass - 432 bytes
7: plain class - 432 bytes
8: attr class - 432 bytes
9: dict - 512 bytes
10: SimpleNamespace - 552 bytes
11: pydantic - 560 bytes
method I used in the video
0: dataclass (slots) - 162 bytes
1: plain class (slots) - 162 bytes
2: tuple - 170 bytes
3: NamedTuple - 170 bytes
4: namedtuple - 170 bytes
5: attr class (slots) - 186 bytes
6: dataclass - 408 bytes
7: plain class - 408 bytes
8: attr class - 408 bytes
9: dict - 488 bytes
10: SimpleNamespace - 528 bytes
11: pydantic - 536 bytes
@lex_darlog_fun 3 years ago
@@mCoding thanks for such a detailed response.
> For example (1,1) uses less memory than (1,2)
Obviously, when you do performance tests, you need to intentionally break those under-the-hood optimisations. Back then, when I was checking the examples from the aforementioned video myself, I used the simplest values for items I could think of. IIRC, each class (simple class, dataclass, dict, set, list, tuple and various types of named tuples) had just 3 values:
1. an int, unique for each item (and I know that small ints are internally cached up to 256 or so, but that's negligible relative to the total number of items I had for the test; IIRC, it was millions or tens of millions, something of that order).
2. the same int, converted to a string, padded with random ASCII characters to make all the strings of equal length (I used random characters instead of zeroes, just to be sure).
3. a float in the [0, 100.0] range, also unique for each item.
And to be as precise as possible, as I said, I kept increasing the number of items until the total collection size went above 1 GB. Each measurement attempt was done in a separate Python session. And that's the thing I was most interested in when I asked about your methodology. With your method, did you just create a single instance and measure it, or did you generate a big enough number of them, measure the total consumption and divide it by the number of items? I mean, a single item difference might be 168 bytes vs 162. But if you have a tuple with a million dataclass instances vs the same tuple type storing the same million items with the same underlying data, but with the items themselves now being NamedTuples, my results were very different from what you've shown. At the end of the day, it doesn't matter that each individual instance is reported as about the same. What matters is that when you have a ton of them, and the only varying factor is the type of an item, you should count the total difference as overhead.
You won't use just a single instance of that dataclass/namedtuple in your program. So I don't know the theory behind it, but in practice my own tests gave the same results that the Russian guy reports in the video. And dataclass vs NamedTuple were nowhere near the 162 vs 170 numbers you provide. Speaking of which, I have no idea how it's even possible for a dataclass to take less memory than a named tuple or even the simplest tuple. So, could you disclose your methodology? To be clear: I'm not attacking, I really want to know the actual difference between the various types of data containers. I'm just concerned that the numbers you provide conflict with basically everything I've ever heard on the subject and with my own synthetic tests.
@AleksandarLazov 1 year ago
Also dataclasses can be "frozen" so they are not modified, which to me is better than pydantic's BaseModel
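A brief sketch of a frozen dataclass, using a made-up `Config` class:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Config:
    host: str
    port: int = 8080


cfg = Config("localhost")

blocked = False
try:
    cfg.port = 9090  # assignment is rejected on a frozen instance
except FrozenInstanceError:
    blocked = True
print(blocked)

# replace() builds a modified copy instead of mutating in place.
cfg2 = replace(cfg, port=9090)
print(cfg, cfg2)
```

With the default `eq=True`, frozen instances are also hashable, so they can be used as dict keys or set members.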
@jmcantrell 3 years ago
What are you using for the visualizations at the end?
@rikschaaf 3 years ago
Can't you throw your python code through some optimizer to convert everything to a tuple wherever possible? Your source code would still be your own readable code, but the optimized code that comes from that will be more optimized for speed and memory usage. Best of both worlds!
@luisraguzzoni5409 3 years ago
Your videos are so good that I believe you could create a good intermediate-advanced python course. Just saying
@vekyll 2 years ago
I'm a bit confused... do you have any idea why SimpleNamespace's get is so horribly slow? I mean, it's a hash lookup anyway.
@dylan-dylan-dylan 1 year ago
Accessing a dictionary's values by key is its primary purpose...it's only error-prone if you are ignorant to the pass-by rules of the value's type. #teamdict
@0730pleomax 3 years ago
Pydantic, attr, dataclasses, NamedTuple
@alansnyder8448 1 year ago
@mCoding. Could you redo this video with Pydantic 2.0? I get what you are saying about @dataclass being used in internal applications but sometimes you don't know for sure if it won't eventually be serialized into JSON, so pydantic is something I choose if I'm not sure. I want to know if the new 2.0 with Rust implementation has gotten the speed into the same ballpark as the other options.
@mCoding 1 year ago
Hmm, perhaps. While a rust implementation under the hood may improve performance, I suspect that it will not change the qualitative picture very much. Pydantic is slower primarily because it is fundamentally doing more work, namely validation and conversion, whereas the other options do neither validation nor conversion.
@alansnyder8448 1 year ago
@@mCoding Maybe a good video might be how to use Dataclass and Pydantic together. I think in my case half of my projects are with FastAPI which I love and it depends on Pydantic. I've seen too many videos that compare Pydantic with Dataclasses (yours included) and have come to think of them in the same category. Since I'm already working with Pydantic in half my projects I've just gotten very comfortable with them. Knowing the performance hit puts a slightly different spin on the situation so maybe Dataclasses should be used for all internal-only data that won't be parsed. So then maybe just wrap a Dataclass in a field of a Pydantic class when you need to parse it. I'll keep this in mind myself in the future. Pydantic + Dataclasses would be an interesting video for me if you solicit ideas.
@plato4ek 2 years ago
4:43 "SimpleNamespace is just like object, except it allows you to set attributes on it at runtime whereas object doesn't"
Maybe I don't get something, but objects do allow you to set attributes (= instance variables, right?) at runtime:
```python-repl
>>> class A:
...     pass
...
>>> a = A()
>>> a
<__main__.A object at 0x...>
>>> a.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'a'
>>> a.a = 5
>>> a.a
5
```
@mCoding 2 years ago
Glad to see you are watching so many of my videos! Yes, classes let you set attributes at runtime. What I'm referring to here is that objects whose type is literally object, as in "x = object()", cannot have attributes set on them. If you try "x.a = 0" you get an error!
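The distinction in the reply above can be checked directly. A minimal sketch (variable names are arbitrary):

```python
from types import SimpleNamespace

ns = SimpleNamespace()
ns.a = 0  # works: SimpleNamespace instances carry a __dict__

bare = object()
failed = False
try:
    bare.a = 0  # fails: a plain object() instance has no __dict__
except AttributeError as exc:
    failed = True
    print("AttributeError:", exc)

print(ns, failed)
```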
@plato4ek 2 years ago
@@mCoding OK, I see now. Thanks. Yes your videos are really interesting and useful. So I decided to watch them all. But I guess I should spend less time watching videos and more time writing code. :)
@jesavius 3 years ago
Pydantic is always the answer, since you can use the built-in dataclasses within Pydantic if needed. But if you work in corporate, then yeah, dataclasses is the only answer 😅
@grzegorzryznar5101 2 years ago
@mCoding How do you measure execution speed in a repeatable way? I was trying to measure performance, but for the same setup I got scores that differed a lot (by more than a few percent). The code was purely in Python, no external sources, no IO, but still the differences were very noticeable.
@mCoding 2 years ago
For this video I believe I used timeit since they are tiny snippets, and the timing code is available in the GitHub repository in the description. Timing measurements may vary drastically depending on things such as your CPU and version of Python, which is why it is always best to verify the timings for your own setup!
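A hedged sketch of one common way to get steadier numbers out of `timeit`. This is not the video's benchmark code (that lives in the linked repository); the two statements being timed and the repeat counts are arbitrary choices for illustration.

```python
import timeit
from types import SimpleNamespace

ns = SimpleNamespace(n=1)
d = {"n": 1}
NUMBER = 100_000  # executions per timing run

# repeat() returns one total per run; taking the minimum filters out runs
# that were disturbed by other processes on the machine.
attr_time = min(timeit.repeat("ns.n", globals={"ns": ns}, number=NUMBER, repeat=5))
item_time = min(timeit.repeat("d['n']", globals={"d": d}, number=NUMBER, repeat=5))

print(f"attribute get: {attr_time / NUMBER * 1e9:.1f} ns per op")
print(f"dict item get: {item_time / NUMBER * 1e9:.1f} ns per op")
```

Results will still differ between machines and Python versions, so the absolute values matter less than the ratios measured on your own setup.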
@PanduPoluan 2 years ago
@@mCoding Also with Intel's franken-CPU having "P" cores and "E" cores, it will be a gamble.
@chriskeo392 2 years ago
What is the use case for slots?
@deekshantwadhwa 2 years ago
Which software/package/language are you using for the graphs UI in the end?
@mCoding 2 years ago
Plotly! See the source code in the description if you would like to see the exact code i use to generate the plots.
@cicik57 2 years ago
Okay, so first: dataclass has no type checking, and with attrs you must supply a validator with validator=, so the annotation n: int alone does nothing. These third-party library classes are horrible. Here is how I do it: it is no problem to write a class that is defined as in pydantic, reads kwargs and sets the args with type checking in __init__ and in methods, including checking the item types of collections such as List (which I am almost sure these libraries do not do), but with the nice ability to turn it all off with one command once debugging is done.
@mCoding 2 years ago
Hi, it seems like you are new to Python. The notation x: int is not supposed to be something checked at runtime, these hints are completely ignored at runtime as this would be a huge (think 10x) performance penalty, which is shown in the graphs in the video. Most type errors can be found by static analyzers, which is who the x: int is for. The only case when you need to do runtime checking is when you don't know the types ahead of time. The most common situation this happens is parsing since you don't know what data you are going to read in next, and this is why pydantic purposefully pays the cost of runtime type checking.
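A tiny sketch of the point in the reply above, that annotations on a standard dataclass are recorded but not enforced at runtime. The `Item` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Item:
    n: int


# No error at runtime: the annotation is not checked when the object is built.
item = Item(n="not an int")
print(type(item.n).__name__)

# The hint is still stored on the class, which is what static analyzers
# and libraries such as pydantic can read.
print(Item.__annotations__)
```

Running a static checker such as mypy over this file would be expected to flag the `Item(n="not an int")` call, without adding any cost when the program runs.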
@cicik57 2 years ago
@@mCoding hey, I am not new. It is the types of function parameters that are ignored; here it is a declaration of a static class field (n) set equal to a class (int): n = int. I thought that in THESE tools, for example @dataclass, the notation SHOULD type check, because why else would we write the dataclass definition like that? And I just checked by passing a float instead of an int, and it works without complaint. So my solution would be to keep the @dataclass syntax, which I find kind of convenient because it retains order and there is no need to pass all arguments by name, create default type checkers and turn them on, and if you want a custom checker, you can write something like a = lambda x: 0
@liesdamnlies3372 3 years ago
ALL the dataclasses
@mCoding 3 years ago
I'm sure to get comments about others I forgot :)
@jakubjakubec9693 3 years ago
I have my own class decorator that returns dataclass(cls), but I get no type hints this way. Is there a way to fix it ?
@guzziiw 2 years ago
Do you mind explaining why using dict is error-prone? Doesn't seem trivial to me.
@PanduPoluan 2 years ago
Unless you define a TypedDict, you might accidentally mistype a key, resulting in a KeyError.
@zachwhite2716 9 months ago
Personally I find that the “potential typo” issue is overstated. I have 20 years of python experience and it’s never been a serious source of errors. Code that isn’t easily understood, like when you use a mess of nested classes instead of a simple data structure with a dictionary at its root, however, has caused me a ton of problems and really hard to debug situations.
@vxsery 2 years ago
🎉🎉🎉🎉
@korbiniankoch 2 years ago
Which tool are you using to create the interactive bar charts?
@mCoding 2 years ago
Plotly express
@chaseduckett135 3 years ago
Are you using R ggplot for the plot?
@mCoding 3 years ago
im using plotly!
@tamilvanan342 3 years ago
I see you import modules inside function. Any particular reason?
@mCoding 3 years ago
This was just to make it easier to see which imports were needed for which examples.
@xBZZZZyt 3 years ago
What about list?
@BosonCollider 1 year ago
I like msgspec
@viktornerlander1409 2 years ago
if i have a very large set of data, with different types of data like multiple timeseries, single character/digit variables etc, should i use dataclasses to store them? and if so how? do i pickle classes? right now i'm using pandas for everything. thanks for the video
@zachwhite2716 9 months ago
I may be in the extreme minority here, but IMO dataclasses are not a good fit in most situations, but particularly here where you have large sets of nested data. Just stick with dict or pandas.
@NYKevin100 2 years ago
My 2¢ on attrs vs dataclasses:
* For application code, you can do whatever you want, but you should consider the cost of taking a dependency. This cost varies depending on the nature of your application and how you build/test/package/etc. it, so there's no one-size-fits-all answer here.
* For library code, don't take a dependency unless you absolutely have to, because you will be forcing it on all of your clients. dataclasses exists, so attrs is firmly in the "not absolutely necessary" bucket, and libraries should not depend on it in most cases.
@juliejones8785 2 years ago
If only python-box was included. It provides both dictionary style and dot style access