Which Python @dataclass is best? Feat. Pydantic, NamedTuple, attrs...

110,776 views

mCoding

1 day ago

Comments: 219

@LiamInviteMelonTeee 2 years ago

I'm a simple engineering student and a modest python user but those dynamic histograms sent chills down my spine

@mCoding 2 years ago

Check out plotly and the source code in the description!

@aarondewindt 3 years ago

The built-in dataclass also has default_factory for defining mutable default values.
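A minimal sketch of the `default_factory` point, using a made-up `Inventory` class (the class and field names are illustrative, not from the video):

```python
from dataclasses import dataclass, field


@dataclass
class Inventory:
    # Writing `items: list = []` raises ValueError when the class is defined,
    # so a mutable default has to be supplied through default_factory.
    items: list = field(default_factory=list)


a = Inventory()
b = Inventory()
a.items.append("sword")  # only a's list changes; b got its own fresh list
```

Each instance calls the factory once, so the two lists are independent objects.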

@mCoding 3 years ago

😬 oops, thanks for pointing this out! I should have been more careful when I made the feature matrix.

@PeterZaitcev 1 year ago

Furthermore, unlike slots support, this was available from the first release.

@mpilosov 3 years ago

This is a great breakdown. I’ve had to explain this so many times to team members, now I’ll refer people to this video!

@franchiniitalo 3 years ago

Hey James, I just wanted to sincerely congratulate you for both the quality content and humor in your videos, amazing work!!

@mCoding 3 years ago

Thank you very much for your kind words and support!

@markasiala6355 3 years ago

I actually have a large ongoing project where I used namedtuples early on, with typing stored in a second tuple, then refactored to NamedTuple using the built in typing (which simplified storing the typing separately), and finally to dataclasses after seeing your video on that. It fit my application perfectly as I needed the flexibility of being able to modify the dataclass. If only I had known about dataclasses to start with. :) attr class also sounds interesting for my needs, I need to check that out.

@subjekt5577 1 year ago

I wish he covered classes extending from named tuple, one of my favorite pre attr methods....

@laurinneff4304 3 years ago

So when _are_ you going to explain slots? I have no idea what those are

@mCoding 3 years ago

Gulp, I feel the pressure.

@AzureCz 3 years ago

@mCoding yeah, I don't know what you're talking about either D:

@cameronball3998 3 years ago

That was the Google search I made right after this video 😂 I am intrigued

@Elijah_Lopez 3 years ago

Classes usually use a dictionary to store instance variables. If you define __slots__ = 'var1', 'var2', instances of your class can only have the attributes named in __slots__.
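A small sketch of that behaviour, with a hypothetical `Point` class chosen only for illustration:

```python
class Point:
    # Only these attribute names may be set on instances.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
p.x = 10  # fine: "x" is listed in __slots__

try:
    p.z = 3  # not listed, and there is no per-instance __dict__ to fall back on
except AttributeError as exc:
    slot_error = exc

has_dict = hasattr(p, "__dict__")  # False for a fully slotted class
```

Dropping the per-instance dictionary is where the memory saving mentioned elsewhere in the thread comes from.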

@AzureCz 3 years ago

@Elijah_Lopez sir you're a legend

@sphereron 1 year ago

I've often struggled with ways people define hyperparameters and inputs to neural networks in open source code. This video definitely helped me in my choice going forward.

@Jakub1989YTb 3 years ago

2:06 got me .. "real life". Those air quotes are heavy.

@kosmonautofficial296 1 year ago

Great thanks so much for this video! I am starting to study pydantic and I haven't been made aware of these differences. This is a huge help and I wish more people would explain these important differences when telling others they should use this or that.

@r2_rho 2 years ago

this is really the best Python channel on YouTube. I've learned more on this channel than all others combined

@mCoding 2 years ago

Wow thank you!

@Aang139 2 years ago

Also would have loved thoughts on TypedDict which mirrors NamedTuple for dictionaries, giving type hinting and string key checking

@Yotanido 3 years ago

I've used most of these, it turns out. Started with the class that repeats everything. I then used the dict to try and make things slightly more convenient, but that was only feasible in very limited circumstances. Then I discovered named tuples, but... they are tuples. Wasn't a huge fan. Then, finally, I came across attr. That was a huge revelation and I absolutely loved it. Finally something decent. And then dataclasses were introduced to the standard library and I basically switched to using those. attr can do more, sure - but the dataclasses are easier to use and don't need the dependency. Unless I actually need the power of attr, I'll just use these.

@PanduPoluan 2 years ago

Depending on the data, tuples can be very suitable. For instance, I have to consume a YAML file containing a HUGE sequence of geo-coordinates (lat/long). For this kind of data, the kind that you read, keep in memory, and must not change, tuples are perfectly suitable, use less memory, and are very fast. And a NamedTuple, just like other classes, can have methods defined on it. So for instance I can write a distance_to() method that calculates the great-circle distance between one geo-coordinate and another. If you need mutability, though, of course a tuple just won't cut it.
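One way the `distance_to()` idea could look, as a sketch only: the `GeoPoint` name, the haversine formula, and the 6371 km mean Earth radius are assumptions made for illustration, not the commenter's actual code.

```python
import math
from typing import NamedTuple


class GeoPoint(NamedTuple):
    lat: float  # degrees
    lon: float  # degrees

    def distance_to(self, other: "GeoPoint") -> float:
        """Great-circle distance in kilometres, via the haversine formula."""
        earth_radius_km = 6371.0  # mean radius; an approximation
        phi1 = math.radians(self.lat)
        phi2 = math.radians(other.lat)
        dphi = phi2 - phi1
        dlmb = math.radians(other.lon - self.lon)
        h = (math.sin(dphi / 2) ** 2
             + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
        return 2 * earth_radius_km * math.asin(math.sqrt(h))


origin = GeoPoint(0.0, 0.0)
east = GeoPoint(0.0, 90.0)
quarter = origin.distance_to(east)  # a quarter of the equator

try:
    origin.lat = 1.0  # fields of a NamedTuple cannot be reassigned
except AttributeError as exc:
    immutable_error = exc
```

The instance is still a plain tuple underneath, so it unpacks and indexes like one while also carrying the method.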

@jemand771 3 years ago

I really enjoyed the text comment/annotation overlays in this video. they both add useful background info and give the video a more relaxed vibe without distracting from the main points! :D

@PanduPoluan 2 years ago

Basically, one very strong rule of thumb is: If you need immutability and you can validate the data on your own, NamedTuple will _always_ be the best, hands down.

@zacky7862 3 years ago

Yeah, pydantic is so great for parsing/serializing JSON data. I've been using it. But for simple data, I use the built-in dataclass.

@hackergr325 3 years ago

At first you got my interest, after the "Presenting with meaningless example" you got my attention. Awesome video once again!

@GRAYgauss 3 years ago

Type hint gang. I came from a C background, so duck typing felt like a godsend. Then I got into Rust and realized how much time I was spending debugging code because it was duck-typable. (Let's not forget Rust's awesome toolchain compared to Python's... well, yeah.) Hell, I was using i_var, etc. just because it made the code easier to reason about without having to backtrack, which is when I first started wondering about it... It didn't fully click until I made the switch, though.

@danielrhouck 3 years ago

Iʼm starting a new Python project and Iʼm using `attrs` because of this video. Otherwise I would have used `namedtuple`, because I think without your videos I somehow would have missed even `dataclass`.

@FranciscoCorreaDias 3 years ago

1:24 "Will I ever explain slots?" One week later... Thank you so much for your explanations, James!

@doc0core 2 years ago

This is serious pro stuff. I started using dataclass thanks to your vid, then YT pushed another vid for pydantic and I was like bleh. Luckily this vid set me straight. Now I understand each one's use case. THANKS

@mCoding 2 years ago

Glad it helped!

@Mutual_Information 3 years ago

I do not use data classes nearly enough. This is good motivation to change that.

@falxie_ 3 years ago

Really glad to see slots supported in dataclasses now. When you have a lot of instances of one class slots can save a ton of memory

@mishikookropiridze 3 years ago

This was added in 3.10?

@falxie_ 3 years ago

@mishikookropiridze That's more of a statement than a question isn't it

@mishikookropiridze 3 years ago

@falxie_ It is a statement, and hence you can assign it a boolean value.

@jlp2011 1 year ago

Pydantic 2.0’s just out, built around a Rust core. They claim up to 50x perf improvement so some of this might be changed. Still, kudos for covering v1’s overhead.

@mCoding 1 year ago

Great point! Maybe I'll have to do an update video!

@arkadiuszszydeko7264 1 year ago

@mCoding Looking forward to seeing how it compares to what you presented here :)

@fartzy 2 years ago

Wow this is amazing man thanks for putting this together

@jochengietzen 3 years ago

Please keep the onscreen comments coming! Adds the perfect amount of fun to an informative topic "cries in mypy" 😁 Great video, thanks 😊

@pedrokalil4410 1 year ago

I am the owner of a backend project at my company and I use only pydantic. As we perform multiple API calls, the validations are essential, and it integrates really well with FastAPI.

@Scranny 3 years ago

I have used almost all of these, so I can say this is a fantastic summary of the various options.

@tudor1899 4 months ago

Best Python channel on YouTube. Thank you. If you used neovim too, it would be out of this world 😅

@mCoding 4 months ago

Thank you! Not a neovim user although vim was my main editor for a while there.

@maimee1 2 years ago

4:36 There's TypedDict to consider too tho. (As in the type safety thing. You could type both dict and tuple and use a static type checker. If you use PyCharm and single quotes, accessing data by key is also not typo prone.)

@PanduPoluan 2 years ago

Why single quotes? There's no difference between single quotes and double quotes.

@maimee1 2 years ago

@PanduPoluan Idk, ask PyCharm (and also VS Code, I just found out). To clarify: not typo prone => there's IntelliSense / auto completion.

@PanduPoluan 1 year ago

@maimee1 well personally I don't find any difference between using single quotes and double quotes. But then again I always use double quotes because Black enforces that.

@MrLiuHai 2 years ago

Thx for the explanation! But it seems at this point Python is contradicting its own Zen: "There should be one-- and preferably only one --obvious way to do it." IMHO one should always prefer immutability. The difference between creating a new instance and calling a setter can be ignored. If performance is that critical, maybe one shouldn't choose Python in the first place.

@relsunkaev 3 years ago

The apischema package is a good middle ground between Pydantic and dataclasses. It allows you to do the same runtime validation on dataclasses if you need to and has the same features as well as a GraphQL schema generator. It also performs validation faster than Pydantic.

@mCoding 3 years ago

Never used that one, thanks for sharing!

@MrBoubource 3 years ago

Wonderful last seconds, but wonderful video too!

@tamles937 3 years ago

Great video! As always, the topic is well explained and I learnt something new. The on-screen comments are really fun, I hope you'll put more of this in future videos!

@Azzonith 3 years ago

Great stuff! It would be even better if you made a follow-up video about serialization of those objects and the libs that can help. Often it's required to send tuple/dataclass/etc. data over Kafka, to a DB, or save it as JSON, etc. Include the 'marshmallow' lib in the vid as well!

@PythonisLove 3 years ago

your videos are always the best

@mikegazes 1 year ago

Thanks! This is exactly what I needed.

@soberhippie 3 years ago

Creating a new tuple still looks just as fast as modifying a value in a dict, interesting

@mCoding 3 years ago

Yeah that was the biggest surprise for me, but I guess it kinda makes sense since a tuple can be implemented as a thin wrapper around raw memory, but a dict has to do hashing and such.

@NateROCKS112 3 years ago

However, you'll likely end up needing to get the tuple's values in order to instantiate a new one. So performing a function similar to dict setattr would be at a significant cost.
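A sketch of what "modifying" a named tuple involves in practice: `_replace` builds a whole new tuple from the old values plus the changed one. The `Config` class and its fields are made up for illustration.

```python
from typing import NamedTuple


class Config(NamedTuple):
    n: int
    f: float
    s: str


old = Config(1, 2.0, "a")
new = old._replace(n=5)  # copies f and s from old, swaps in the new n
```

`old` is left untouched, so every "update" pays for reading the existing fields and constructing a fresh object.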

@SvetlinTotev 3 years ago

A few arguments for dict gang: Everybody knows how it works and what the syntax is. Many libraries use it as inputs or outputs. If used as an interface it is easy to change it without breaking things. It is trivial to load and store them in json or send them over the network. I generally don't like speed comparisons of python code since that should never be the bottleneck of your program (you are using the wrong language if it is) but it is nice to know that dicts are fast af. I also haven't had any problems with reliability. But I guess that's partially due to my vscode extensions checking what I'm typing and giving suggestions. But I have to agree with you that the syntax is quite ugly compared to accessing elements with a dot. Though I don't think it would be too bad if the language added similar syntax for that (basically any kind of shorthand for ["string"]. But I guess . would be ambiguous. and most other characters already have a meaning. so maybe a double dot? some_dict..some_element)

@Alex-uh6qh 3 years ago

The problem with dicts is that you cannot check types ahead of time. You can store different data types in one field over the course of a run. Also, static analyzers cannot predict the type of a dict's elements, so other IDEs (like PyCharm) cannot help you with suggestions, especially with the available methods for each field. In your IDE, you use an AI-based extension that predicts data types, but it is not a static analyzer.

@masheroz 3 years ago

This is timely. I've got a program in writing now, and am using dictionaries. I still think that formatting my data as nested dictionaries is the best representation of that data. Also, the original data format is actually defined as a dictionary.

@SvetlinTotev 3 years ago

@Alex-uh6qh This is true, but I think by picking Python as your programming language you've already given up on being able to easily track the types of objects. With all the type hinting and other type-related stuff you are still quite far from the type information you have in languages like C++.

@oxey_ 3 years ago

I feel like I should go to casinos more often because I have no idea what slots are :) Great video! Typehint gang

@ripp_ 2 years ago

I think in the past, because I've been lazy, I've used tuples but, because I don't hate myself, I had constants for which index was which. I don't recommend this but that would give you the speed power of tuples with some of the naming power of namedtuple

@brooklyngerma5674 2 months ago

Hi mCoding, this is a pretty good video, it would be really cool if you could maybe do an updated version? Specifically I would like to see the addition of TypedDict.

@aaronm6675 3 years ago

Already know this is gonna be helpful and instructive!

@mrgeniasworld4374 3 years ago

For sure 🙂

@mrtnsnp 3 years ago

OK, on type hints. I probably need a kick in the you-know-where, but I can't get it to play nicely with a few packages and features I need. I frequently use numpy, and a lot of functions don't really care about receiving a single number or a full array of numbers. They may not even care if the number is a float or an int, but let's focus on floats here. The return value typically has the same shape as the main input, but may be a single number. How do I set up type hinting for numpy arrays? How do I set up type hinting for polymorphism?

@wsrgs4 3 years ago

I haven't looked into it extensively, but I'm aware there is a numpy.typing module which includes an ArrayLike type for anything that can be converted into an array, including scalars. you might want to look into the module documentation. specifying the dimensions of an array in python's type hinting system is generally difficult however, so I'm not sure there's a way to incorporate that information in your annotations.

@PanduPoluan 2 years ago

Use TypeVar. For instance, here's a made-up function:

    from typing import TypeVar

    T = TypeVar("T")

    def makelist(n: int, item: T) -> list[T]:
        # The element type of the returned list follows the type of `item`.
        return [item for _ in range(n)]

@ManuelBTC21 3 years ago

If you care about correctness, I would argue for NamedTuple. The fact that it's immutable is a feature, not a bug.

@mCoding 3 years ago

Immutability is definitely a feature, but mutability is also a feature. As always you should choose based on what is most appropriate for your problem.

@mystisification 3 years ago

Very informative video, thanks James!

@adirmazhir9159 3 years ago

It's also possible to use namedtuple like this: T = namedtuple('T', 'n f s')
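A short runnable version of that one-liner, with sample values made up for illustration:

```python
from collections import namedtuple

# Field names can be given as one whitespace-separated string.
T = namedtuple("T", "n f s")

t = T(1, 2.0, "three")
by_name = t.f    # attribute access
by_index = t[2]  # still an ordinary tuple underneath
```

This form carries no type annotations, which is the main difference from `typing.NamedTuple`.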

@LerikPav 3 years ago

There's also TypedDict (since 3.8) with typesafety

@mCoding 3 years ago

TypedDict is actually just a dict at runtime; its value is only for static typing.
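A minimal sketch of what "just a dict at runtime" means, using a hypothetical `Movie` type:

```python
from typing import TypedDict


class Movie(TypedDict):
    title: str
    year: int


m = Movie(title="Alien", year=1979)

# A static type checker would flag the str passed for `year`,
# but nothing is checked when the code actually runs.
wrong = Movie(title="Alien", year="nineteen seventy-nine")

runtime_type = type(m)  # plain dict
```

So TypedDict helps the editor and the type checker catch misspelled keys and wrong value types, without adding any runtime cost or protection.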

@noteverymonday 3 years ago

"As well as BaseModel, pydantic provides a dataclass decorator which creates (almost) vanilla python dataclasses with input data parsing and validation."

@tamerelsayed6368 2 years ago

thank you for the thorough explanation

@hansdietrich1496 1 year ago

Good comparison, thanks!

@TechSY730 2 years ago

In current versions of attrs, you only need to assign a field to `attr.ib` if you need a per-field option beyond a default. Otherwise you can use regular variable declarations like dataclasses does. (You might need to use the "next-gen" API, I can't remember at the moment)

@rdean150 2 years ago

I started using pydantic bc it allows specifying a conversion function to try to cast input values to the desired type. I didn't realize how much of a performance hit that library incurs, or that attrs can do this also but much more cheaply. I guess I should switch to attrs.

@mCoding 2 years ago

I didn't specifically compare times for when you are doing conversions. Make sure to time your use case yourself since Pydantic may still be faster if you are doing conversions.

@rdean150 2 years ago

@mCoding Ah, thanks for the heads up. That probably accounts for a decent chunk of the time difference, as I think pydantic is always going to try to do basic type casting on all values when instantiating new instances, which surely comes with some overhead, particularly when you supply a custom function for it.

@sevdalink6676 1 year ago

For me Pydantic is great for prototyping, and the losses are acceptable for the sake of always being informed in detail about data errors. It even enables you to skip writing early tests because of that. Still, the charts are extremely useful to show that Pydantic can be an important target in optimization.

@mCoding 1 year ago

An excellent point. This is Python after all, raw speed is not usually what we optimize for and paying some extra runtime cost for data validation when it "shouldn't" be needed may be worth it depending on the situation.

@heroe1486 1 year ago

Is it tho ? 5 microseconds for creation, 9 and 400 ns for getting and setting, and it was before pydantic v2 enhanced by rust. Unless doing several thousands of those are we really concerned about those numbers in python ? Especially when writing an API where the network latency and DB queries could easily reach the 100ms mark in good conditions.

@sevdalink6676 1 year ago

@heroe1486 I agree that it would be great to see this video with Pydantic V2 performance included. They made amazing progress. I agree with the rest of what you said as well. You asked and answered your own question. Like I said, it can be an important part, not everywhere, but it's good to have it on your checklist.

@12nites 2 years ago

man, you really hammered down on this issue. No need to watch anything else.

@ananzero8751 3 years ago

What library was used to generate the graph? It looks nice.

@the_crypter 3 years ago

Plotly, It's easily the most interactive Visualization Library. It's as simple as matplotlib.

@mCoding 3 years ago

Yep, plotly express specifically. Check out the code on github! Link in desc.

@saadisave 3 years ago

@the_crypter That's a bad measure of simplicity

@etienneboutet7193 3 years ago

Great video ! But I feel like the onscreen comments were a bit distracting

@elnico5623 1 year ago

I wish there was a channel like this for lua

@ОлегАндрус-ю5е 2 years ago

thank you man! You helped me a lot!

@behnam_salehi 2 years ago

Thank you. This information is really useful

@meneereenhoorn 3 years ago

Great video, thanks so much!

@mCoding 3 years ago

And thank you for watching!

@eniocc 3 years ago

Perfect video. Congrats

@yky49 3 years ago

It is possible to use @dataclass(init=False) and custom __init__() for a parsing purpose. With slots for sure ;)
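A sketch of the `init=False` idea, assuming a made-up `"name=value"` input format and a hypothetical `Reading` class. Slots are left out here only so the example also runs on Python versions before 3.10.

```python
from dataclasses import dataclass


@dataclass(init=False)  # keep the generated __repr__/__eq__, write __init__ by hand
class Reading:
    sensor: str
    value: float

    def __init__(self, raw: str) -> None:
        # Hypothetical input format: "name=value"
        name, _, number = raw.partition("=")
        self.sensor = name.strip()
        self.value = float(number)  # raises ValueError on malformed input


r = Reading("temp = 21.5")
```

The hand-written `__init__` does the parsing and conversion, which is roughly the part pydantic would otherwise generate.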

@khoda81 3 years ago

How did u measure memory usage?

@ilyam.1872 3 years ago

Yeah that's cool and whatnot, but have you ever tried this? class D(dict): __getattr__=dict.__getitem__; __setattr__=dict.__setitem__; __delattr__=dict.__delitem__

@mCoding 3 years ago

Lol no I never considered that :)

@ilyam.1872 3 years ago

@mCoding absolutely should, it's so easy and error-prone, practically a cheeseburger of python.
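To make the "error-prone" part concrete, here is the same dict subclass spelled out with two of its pitfalls; the sample keys are made up.

```python
class D(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__


d = D(n=1)
d.f = 2.0  # stored as d["f"]

# Pitfall 1: a missing attribute raises KeyError, not AttributeError,
# so hasattr() and getattr(obj, name, default) do not behave as usual.
try:
    d.missing
except KeyError as exc:
    missing_error = exc

# Pitfall 2: names of dict methods are shadowed on read.
d.keys = "oops"                    # stored under the key "keys"
keys_is_method = callable(d.keys)  # lookup still finds the dict.keys method
```

Attribute access only falls through to `__getattr__` when normal lookup fails, which is why any key that collides with a dict method becomes unreachable via the dot syntax.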

@scottbrewer474 3 years ago

And here I was thinking I was fancy by bundling data into a dictionary vs lots of variables! (Stupid Dunning-Kruger effect)

@zachwhite2716 9 months ago

Give yourself enough time and you’ll come back to the wisdom of simply using dictionaries instead of complex nested objects.

@Timmie_Tudor 1 year ago

Hello, if you didn't know, I decided to use the dataclass Python decorator as my handle

@mCoding 1 year ago

Haha you are gonna get a lot of accidental mentions with a handle like that!

@t2udu 3 years ago

Really liked the visualization. Is that plotly?

@mCoding 3 years ago

Yep! See the code to produce it on GitHub!

@t2udu 3 years ago

@mCoding will check that out

@hdtlab 3 years ago

Still prefer dataclass since there is no need to install additional packages :)

@VegetableJuiceFTW 2 years ago

would have been cool to compare pydantic with the validation turned off for fairness sake :D

@joshbennett5908 2 years ago

What tool are you using for your bar chart?

@mCoding 2 years ago

Plotly express! It can export to html you can share in your browser without python even installed.

@red13emerald 3 years ago

Awesome comparison! What did you create the interactive graph at the end with? Looks like a nicer version of matplotlib.

@aflous 1 year ago

Plotly

@florianfuchs325 2 years ago

Hi Excellent Video! I was wondering what would be the right choice if I wanted to use the created class in a jit compiled numba function? As far as I have seen, namedtuples seem to be most suitable?

@PanduPoluan 2 years ago

I think you need a class that is serializable. namedtuple and NamedTuple are serializable by default.

@Destrolll 3 years ago

Please care to explain why shouldn't I assign attributes to an instance of an empty class? 4:50

@lex_darlog_fun 3 years ago

@mCoding are you REALLY sure you've measured the memory footprint correctly? What was your test methodology? The difference between NamedTuple/dataclass/class is supposed to be quite different from what you've shown (they do differ but not THAT much). According to this video (it's in Russian, but the code is clearly visible): youtube /tsEG0WM3m_M?t=60 :
1. The author uses the pympler.asizeof() function instead of the built-in ones since it's the only right way to measure the *FULL* memory consumption of a given object. I personally re-tested it (generated HUGE collections, taking literally gigabytes of RAM), and yes, the built-in ones were returning some ridiculous results, not even close to the actual RAM taken by the Python interpreter.
2. According to his tests, the difference is actually like this (on 1k instances):
2:05 - dict = ~1.2 MB
3:44 - dataclass = ~1 MB
5:04 - namedtuple = ~720 KB
5:54 - typed NamedTuple = also ~720 KB

@mCoding 3 years ago

It's hard to say whether the way I counted things is the "correct" way because it depends on what you wanted to count, but the numbers are approximately the same with pympler vs the getsize method I used. The order of which classes use the most memory is almost exactly the same with either method. The main difference between what pympler does and what I did is that pympler tries to account for object alignment. pympler assumes that all Python objects are 8-byte aligned and no packing is done (hence why the pympler answers are all multiples of 8), counting padding bytes in the total size count. On the opposite end, my getsize assumes all objects are optimally packed together, not including padding bytes in the total size. The truth is probably somewhere in the middle and also an implementation detail that could change at any moment. But, in any case, I wouldn't call either method the "correct" one; they are both good estimates and their difference is pretty small. Also note that depending on the way you do your tests, the data can make a big difference in how much space is actually used. For example (1,1) uses less memory than (1,2) because the 1 objects in the first tuple are the same.

pympler:
0: dataclass (slots) - 168 bytes
1: plain class (slots) - 168 bytes
2: tuple - 176 bytes
3: NamedTuple - 176 bytes
4: namedtuple - 176 bytes
5: attr class (slots) - 176 bytes
6: dataclass - 432 bytes
7: plain class - 432 bytes
8: attr class - 432 bytes
9: dict - 512 bytes
10: SimpleNamespace - 552 bytes
11: pydantic - 560 bytes

Method I used in the video:
0: dataclass (slots) - 162 bytes
1: plain class (slots) - 162 bytes
2: tuple - 170 bytes
3: NamedTuple - 170 bytes
4: namedtuple - 170 bytes
5: attr class (slots) - 186 bytes
6: dataclass - 408 bytes
7: plain class - 408 bytes
8: attr class - 408 bytes
9: dict - 488 bytes
10: SimpleNamespace - 528 bytes
11: pydantic - 536 bytes
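For anyone who wants to poke at this themselves, here is a rough sketch of one way to compare per-instance container overhead with only the standard library. This is an assumption-laden illustration, not the video's actual getsize code (that is in the linked repository); the `shallow_size` helper and both classes are made up, and the exact byte counts depend on the Python version.

```python
import sys


class Plain:
    def __init__(self, n, f, s):
        self.n, self.f, self.s = n, f, s


class Slotted:
    __slots__ = ("n", "f", "s")

    def __init__(self, n, f, s):
        self.n, self.f, self.s = n, f, s


def shallow_size(obj) -> int:
    """Size of the object plus its __dict__ if it has one.

    The field values themselves are deliberately not counted, so this
    measures container overhead only.
    """
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size


plain_bytes = shallow_size(Plain(1, 2.0, "three"))
slotted_bytes = shallow_size(Slotted(1, 2.0, "three"))
```

Whether to add the sizes of the stored values, and how to treat shared objects such as small ints, is exactly the kind of choice that makes the two sets of numbers above differ.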

@lex_darlog_fun 3 years ago

@mCoding thanks for such a detailed response.

> For example (1,1) uses less memory than (1,2)

Obviously, when you do performance tests, you need to intentionally break those under-the-hood optimisations. Back then, when I was checking the examples from the aforementioned video myself, I used the simplest values for items I could think of. IIRC, each class (simple class, dataclass, dict, set, list, tuple and various types of named tuples) had just 3 values:
1. an int, unique for each item (and I know that int is internally optimised up to 256 or so, but that's negligible relative to the total number of items I had for the test; IIRC, it was millions or tens of millions, something of that order).
2. the same int, converted to a string, padded with random ASCII characters to make all the strings of equal length (I used random characters instead of zeroes, just to be sure).
3. a float in the [0, 100.0] range, also unique for each item.

And to be the most precise, as I said, I kept increasing the number of items until the total collection size reached above 1 GB. Each measurement attempt was done in a separate Python session. And that's the thing I'm most interested in when I ask about your methodology. With your method, did you just create a single instance and measure it, or did you generate a big enough number of them, measure the total consumption and divide it by the number of items? I mean, the single-item difference might be 168 bytes vs 162. But if you have a tuple with a million dataclass instances vs the same tuple type storing the same million items with the same underlying data, but with the items themselves now being NamedTuples, my results were very different from what you've shown. At the end of the day, it doesn't matter that each individual instance is reported as about the same. What matters is that when you have a ton of them, and the only varying factor is the type of an item, you should count the total difference as overhead. You won't use just a single instance of that dataclass/namedtuple in your program.

So I don't know the theory behind it, but in practice my own tests gave the same results that the Russian guy reports in the video. And dataclass vs NamedTuple were nowhere near the 162 vs 170 numbers you provide. Speaking of which, I have no idea how it's even possible for a dataclass to take less memory than a named tuple or even the simplest tuple. So, could you disclose your methodology? To be clear: I'm not attacking, I really want to know the actual difference between the various types of data containers. I'm just concerned that the numbers you provide conflict with basically everything I've ever heard on the subject and with my own synthetic tests.

@AleksandarLazov 1 year ago

Also dataclasses can be "frozen" so they are not modified, which to me is better than pydantic's BaseModel
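A minimal sketch of a frozen dataclass, with a hypothetical `Settings` class:

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Settings:
    host: str
    port: int = 8080


s = Settings("localhost")

try:
    s.port = 9000  # assignment is blocked on a frozen instance
except FrozenInstanceError as exc:
    frozen_error = exc

s2 = replace(s, port=9000)  # build a modified copy instead
```

With the default `eq=True`, a frozen dataclass is also hashable, so instances can be used as dict keys or set members.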

@jmcantrell 3 years ago

What are you using for the visualizations at the end?

@rikschaaf 3 years ago

Can't you throw your python code through some optimizer to convert everything to a tuple wherever possible? Your source code would still be your own readable code, but the optimized code that comes from that will be more optimized for speed and memory usage. Best of both worlds!

@luisraguzzoni5409 3 years ago

Your videos are so good that I believe you could create a good intermediate-advanced python course. Just saying

@vekyll 2 years ago

I'm a bit confused... do you have any idea why SimpleNamespace's get is so horribly slow? I mean, it's a hash lookup anyway.

@dylan-dylan-dylan 1 year ago

Accessing a dictionary's values by key is its primary purpose...it's only error-prone if you are ignorant to the pass-by rules of the value's type. #teamdict

@0730pleomax 3 years ago

Pydantic, attr, dataclasses, NamedTuple

@alansnyder8448 1 year ago

@mCoding. Could you redo this video with Pydantic 2.0? I get what you are saying about @dataclass being used in internal applications but sometimes you don't know for sure if it won't eventually be serialized into JSON, so pydantic is something I choose if I'm not sure. I want to know if the new 2.0 with Rust implementation has gotten the speed into the same ballpark as the other options.

@mCoding 1 year ago

Hmm, perhaps. While a rust implementation under the hood may improve performance, I suspect that it will not change the qualitative picture very much. Pydantic is slower primarily because it is fundamentally doing more work, namely validation and conversion, whereas the other options do neither validation nor conversion.

@alansnyder8448 1 year ago

@mCoding Maybe a good video might be how to use Dataclass and Pydantic together. I think in my case half of my projects are with FastAPI, which I love, and it depends on Pydantic. I've seen too many videos that compare Pydantic with Dataclasses (yours included) and have come to think of them as being in the same category. Since I'm already working with Pydantic in half my projects, I've just gotten very comfortable with it. Knowing the performance hit puts a slightly different spin on the situation, so maybe Dataclasses should be used for all internal-only data that won't be parsed. So then maybe just wrap a Dataclass in a field of a Pydantic class when you need to parse it. I'll keep this in mind myself in the future. Pydantic + Dataclasses would be an interesting video for me if you solicit ideas.

@plato4ek 2 years ago

4:43 "SimpleNamespace is just like object, except it allows you to set attributes on it at runtime whereas object doesn't" Maybe I don't get something, but objects do allow you to set attributes (= instance variables, right?) at runtime:
```python-repl
>>> class A:
...     pass
...
>>> a = A()
>>> a
<__main__.A object at 0x...>
>>> a.a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'A' object has no attribute 'a'
>>> a.a = 5
>>> a.a
5
```

@mCoding 2 years ago

Glad to see you are watching so many of my videos! Yes, classes let you set attributes at runtime. What I'm referring to here is that objects whose type is literally object, as in "x = object()", cannot have attributes set on them. If you try "x.a = 0" you get an error!
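The distinction can be checked directly with a small sketch:

```python
from types import SimpleNamespace

x = object()
try:
    x.a = 0  # bare object instances have no __dict__ to store attributes in
except AttributeError as exc:
    object_error = exc

ns = SimpleNamespace()
ns.a = 0  # SimpleNamespace instances accept arbitrary attributes
```

An instance of any ordinary user-defined class (without `__slots__`) behaves like the `SimpleNamespace` case, which is what the REPL session in the question shows.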

@plato4ek 2 years ago

@mCoding OK, I see now. Thanks. Yes, your videos are really interesting and useful. So I decided to watch them all. But I guess I should spend less time watching videos and more time writing code. :)

@jesavius 3 years ago

Pydantic is always the answer, since you can use the built-in dataclasses within Pydantic if needed. But if you work in corporate, then yeah, dataclasses is the only answer 😅

@grzegorzryznar5101 2 years ago

@mCoding How do you measure execution speed in a repeatable way? I was trying to measure performance, but for the same setup I got scores that differed a lot (by more than a few percent). The code was purely Python, no external sources, no I/O, but the differences were still very noticeable.

@mCoding 2 years ago

For this video I believe I used timeit since they are tiny snippets, and the timing code is available in the GitHub repository in the description. Timing measurements may vary drastically depending on things such as your CPU and version of Python, which is why it is always best to verify the timings for your own setup!
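A sketch of one common way to get steadier numbers out of `timeit`: repeat the measurement several times and keep the minimum. This is not the video's benchmark code; the `Item` class and the loop counts are made up.

```python
import timeit
from dataclasses import dataclass


@dataclass
class Item:
    n: int
    f: float
    s: str


NUMBER = 20_000  # calls per run

# repeat() returns one total time (in seconds) per run. The minimum is
# usually the least noisy figure, since slower runs mostly reflect
# interference from other processes.
runs = timeit.repeat(lambda: Item(1, 2.0, "three"), repeat=5, number=NUMBER)
best_ns_per_call = min(runs) / NUMBER * 1e9
```

The lambda adds a small constant call overhead to every measurement, so this is better for comparing alternatives than for quoting absolute timings.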

@PanduPoluan 2 years ago

@mCoding Also with Intel's franken-CPU having "P" cores and "E" cores, it will be a gamble.

@chriskeo392 2 years ago

What is the use case for slots?

@deekshantwadhwa 2 years ago

Which software/package/language are you using for the graphs UI in the end?

@mCoding 2 years ago

Plotly! See the source code in the description if you would like to see the exact code I use to generate the plots.

@cicik57 2 years ago

Okay, so first, dataclass has no type checking, and with attrs you must supply a validator with validator=, so the notation n: int alone does not check anything. These third-party library classes are horrible. Here is how I do it: it is no problem to write a class that is defined as in pydantic, reads kwargs and sets the arguments with type checking on init and in methods, including checking the types of collection items, like List (which I am almost sure these libraries do not do), but with the nice ability to turn it all off with one command once debugging is done.

@mCoding 2 years ago

Hi, it seems like you are new to Python. The notation x: int is not supposed to be something checked at runtime. These hints are not checked at runtime because checking them would be a huge (think 10x) performance penalty, which is shown in the graphs in the video. Most type errors can be found by static analyzers, which is who the x: int is for. The only case when you need to do runtime checking is when you don't know the types ahead of time. The most common situation where this happens is parsing, since you don't know what data you are going to read in next, and this is why pydantic purposefully pays the cost of runtime type checking.
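
To make the distinction above concrete, here is a minimal standard-library sketch. `Unchecked`, `Checked`, and the `__post_init__` check are hypothetical and written only for illustration; they are not from the video's repository and do not reflect how pydantic or attrs implement validation. The check handles plain classes such as int only, not generic annotations such as List[int].

```python
from dataclasses import dataclass, fields


@dataclass
class Unchecked:
    n: int  # annotation only; nothing verifies it at runtime


@dataclass
class Checked:
    n: int

    def __post_init__(self):
        # Hypothetical opt-in runtime check, run once after __init__.
        for f in fields(self):
            value = getattr(self, f.name)
            # Only check annotations that are plain classes (e.g. int).
            if isinstance(f.type, type) and not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )


# A float is accepted where an int was annotated: no runtime check happens.
unchecked = Unchecked(1.5)

# The manual check rejects the same value, at the cost of extra work per object.
try:
    Checked(1.5)
    rejected = False
except TypeError:
    rejected = True
```

A static analyzer would report `Unchecked(1.5)` before the program runs, which is the zero-runtime-cost route described above.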

@cicik57 2 years ago

@@mCoding Hey, I am not new. It is function parameter annotations that are ignored; here it is a declaration of a static class field (n) set equal to a class (int): n = int. I thought that in THESE tools, for example @dataclass, the notation SHOULD type-check, because why else would we write the dataclass definition like that? I just tried passing a float instead of an int, and it works without complaint. So my solution would be to retain the @dataclass syntax, which I find kind of convenient because it preserves order and there is no need to pass all arguments by name, then create default type checkers and turn them on, and if you want a custom checker, you can write something like a = lambda x: 0

@liesdamnlies3372 3 years ago

ALL the dataclasses

@mCoding 3 years ago

I'm sure to get comments about others I forgot :)

@jakubjakubec9693 3 years ago

I have my own class decorator that returns dataclass(cls), but I get no type hints this way. Is there a way to fix it?
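
One possible approach, assuming Python 3.11+ (or the typing_extensions backport) and a type checker or editor that supports PEP 681, is to mark the wrapper with `dataclass_transform`. This is a sketch, not a confirmed fix for the commenter's setup; `my_dataclass` and `Point` are hypothetical stand-ins.

```python
import dataclasses

try:
    from typing import dataclass_transform  # Python 3.11+
except ImportError:
    # Assumes the third-party typing_extensions backport is installed.
    from typing_extensions import dataclass_transform


@dataclass_transform()
def my_dataclass(cls):
    """Hypothetical custom decorator that wraps dataclasses.dataclass."""
    # Any extra behaviour of the custom decorator would go here.
    return dataclasses.dataclass(cls)


@my_dataclass
class Point:
    x: int
    y: int = 0


# Tools that understand PEP 681 treat Point like a regular dataclass,
# so the generated __init__(x, y=0) is visible to them.
p = Point(1, y=2)
```

The decorator has no effect on runtime behaviour beyond what `dataclasses.dataclass` already does; it only tells static tools that the decorated class has dataclass-like semantics.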

@guzziiw 2 years ago

Do you mind explaining why using dict is error-prone? Doesn't seem trivial to me.

@PanduPoluan 2 years ago

Unless you define a TypedDict, you might accidentally mistype a key, resulting in a KeyError.
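
A small sketch of the failure mode described above, with a hypothetical `Movie` type and made-up data. At runtime a TypedDict is still an ordinary dict, so the typo only surfaces when that line executes; the benefit is that a static type checker can flag the unknown key beforehand.

```python
from typing import TypedDict


class Movie(TypedDict):  # hypothetical example type
    title: str
    year: int


movie: Movie = {"title": "Alien", "year": 1979}

try:
    # Typo in the key: a static checker can flag this for a TypedDict,
    # but at runtime it behaves like any dict lookup and raises KeyError.
    movie["titel"]
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print(f"lookup failed for key: {missing_key!r}")
```

With a plain dict and no TypedDict annotation, the same typo goes unnoticed until the lookup runs, and a mistyped key in an assignment silently creates a new entry instead of failing.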

@zachwhite2716 9 months ago

Personally I find that the “potential typo” issue is overstated. I have 20 years of Python experience and it’s never been a serious source of errors. Code that isn’t easily understood, like when you use a mess of nested classes instead of a simple data structure with a dictionary at its root, however, has caused me a ton of problems and really hard-to-debug situations.

@vxsery 2 years ago

🎉🎉🎉🎉

@korbiniankoch 2 years ago

Which tool are you using to create the interactive bar charts?

@mCoding 2 years ago

Plotly Express

@chaseduckett135 3 years ago

Are you using R ggplot for the plot?

@mCoding 3 years ago

I'm using Plotly!

@tamilvanan342 3 years ago

I see you import modules inside functions. Any particular reason?

@mCoding 3 years ago

This was just to make it easier to see which imports were needed for which examples.

@xBZZZZyt 3 years ago

What about list?

@BosonCollider 1 year ago

I like msgspec

@viktornerlander1409 2 years ago

If I have a very large set of data, with different types of data like multiple time series, single character/digit variables, etc., should I use dataclasses to store them? And if so, how? Do I pickle the classes? Right now I'm using pandas for everything. Thanks for the video.

@zachwhite2716 9 months ago

I may be in the extreme minority here, but IMO dataclasses are not a good fit in most situations, and particularly not here, where you have large sets of nested data. Just stick with dict or pandas.

@NYKevin100 2 years ago

My 2¢ on attrs vs dataclasses:
* For application code, you can do whatever you want, but you should consider the cost of taking a dependency. This cost varies depending on the nature of your application and how you build/test/package/etc. it, so there's no one-size-fits-all answer here.
* For library code, don't take a dependency unless you absolutely have to, because you will be forcing it on all of your clients. dataclasses exists, so attrs is firmly in the "not absolutely necessary" bucket, and libraries should not depend on it in most cases.