CppCon 2017: Fedor Pikus “C++ atomics, from basic to advanced. What do they really do?”

202,557 views

CppCon

CppCon.org
-
Presentation Slides, PDFs, Source Code and other presenter materials are available at: github.com/CppCon/CppCon2017
-
C++11 introduced atomic operations. They allowed C++ programmers to express a lot of control over how memory is used in concurrent programs and made portable lock-free concurrency possible. They also allowed programmers to ask a lot of questions about how memory is used in concurrent programs and made a lot of subtle bugs possible.
This talk analyzes C++ atomic features from two distinct points of view: what do they allow the programmer to express? what do they really do? The programmer always has two audiences: the people who will read the code, and the compilers and machines which will execute it. This distinction is, unfortunately, often missed. For lock-free programming, the difference between the two viewpoints is of particular importance: every time an explicit atomic operation is present, the programmer is saying to the reader of the program "pay attention, something very unusual is going on here." Do we have the tools in the language to precisely describe what is going on and in what way it is unusual? At the same time, the programmer is saying to the compiler and the hardware "this needs to be done exactly as I say, and with maximum efficiency since I went to all this trouble."
This talk starts from the basics, inasmuch as this term can be applied to lock-free programming. We then explore how the C++ lock-free constructs are used to express programmer's intent clearly (and when they get in the way of clarity). Of course, there will be code to look at and to be confused by. At the same time, we never lose track of the fact that the atomics are one of the last resorts of efficiency, and the question of what happens in hardware and how fast it happens is of paramount importance. Of course, the first rule of performance - "never guess about performance!" - applies, and any claim about speed must be supported by benchmarks.
If you have never used C++ atomics but want to learn, this is the talk for you. If you think you know C++ atomics but are unclear on a few details, come to fill these few gaps in your knowledge. If you really do know C++ atomics, come to feel good (or to be surprised, and then feel even better).
-
Fedor Pikus: Mentor Graphics - Siemens business, Chief Scientist
Fedor G Pikus is a Chief Engineering Scientist in the Design to Silicon division of Mentor Graphics Corp. His earlier positions included a Senior Software Engineer at Google and a Chief Software Architect for Calibre PERC, LVS, DFM at Mentor Graphics. He joined Mentor Graphics in 1998 when he switched from academic research in computational physics to the software industry. His responsibilities as a Chief Scientist include planning long-term technical direction of Calibre products, directing and training the engineers who work on these products, design and architecture of the software, and research in new design and software technologies. Fedor has over 25 patents and over 90 papers and conference presentations on physics, EDA, software design, and the C++ language.
-
Videos Filmed & Edited by Bash Films: www.BashFilms.com
*-----*
Register Now For CppCon 2022: cppcon.org/registration/
*-----*

Comments: 133
@get_youtube_channel_name · 5 years ago
i used to think all these conferences were bs, but man.. these days i just kept watching cppcon when i was supposed to watch memes and vines
@Bakuta1103 · 4 years ago
relatable.....
@DanielKierkegaardAndersen · 3 years ago
I just read that while staying up for 10hours.. watching almost cppcon only 🙃
@DrumMeister · 2 years ago
This one is if you write Rust code :-D
@VivekYadav-ds8oz · 2 years ago
The best part is that you can learn from them regardless of if you use C++ or not. I write Rust but since there isn't much native content for Rust, I watch these CppCon videos and they're made really well by really smart and experienced people.
@TranscendentBen · 1 year ago
Memes and WHATS? This comment sure shows its age ...
@dirtbikersteve · 6 years ago
Finally a good explanation of memory orders without hand waving and saying 'it's too complex to worry about'
@somehrjdbddhsjsbnsndndk751 · 2 years ago
Hello, can you please explain to me what he meant at the 56:00 timestamp, when he was talking about loading an atomic value in a destructor? I don't understand it and it has bothered me for 4 days now. If you can just point me to some resource at least, I would be really thankful. I even asked a question in the cpp forum, but it seems no one really knows, as I got zero answers. Two things I don't understand: 1) Why is reading that value in the destructor wrong, or why can it cause undefined behavior? 2) How come the fact that we could use a non-atomic read solves the problem? It is even harder for me to understand as I don't seem to detect or have a hint at what the issue is. Thank you very much.
@FedorPikus · 2 years ago
@@somehrjdbddhsjsbnsndndk751 It's not wrong and it won't cause undefined behavior. However, it's also not necessary: atomic accesses are needed only when two threads may be accessing the same value at the same time. Usually, when the destructor runs, no other thread is accessing the same object. Therefore, atomic access is not necessary. To explicitly state that I do not expect any other thread to call methods on an object at the same time as it's destroyed, I would like to use explicit non-atomic accesses in the destructor. On the other hand, there are some rare cases when another thread may observe the object while it's being destroyed. To highlight that I'm dealing with this specific and tricky situation, I would like to use atomic accesses in the destructor only when they are truly needed. The standard gives me no way to express this difference. Every access to an atomic variable is atomic. Even if I know that particular access happens in a single-threaded context, I can't convey this knowledge to the reader of my program.
@philipye7936 · 2 years ago
Really love this talk; spending one hour watching this video is actually more efficient than spending a day reading cppreference :)
@Cons-Cat · 2 years ago
I totally agree. Very few talks make me feel this way.
@bofengluo1287 · 2 years ago
Maybe it is just me who spent more than one hour watching this video 😅
@jacob_90s · 1 month ago
Well on the one hand you're watching something by a person who wants to teach you something, on the other hand you're reading something written by someone who assumes you're already an expert in what you're reading about.
@sumeshnb2 · 3 years ago
Finally, this guy made me understand memory ordering. Nice presentation...
@CppCon · 3 years ago
Glad it helped!
@Sychonut · 4 years ago
I really enjoy listening to Fedor. Knowledgeable speaker doing a great job explaining complicated topics.
@Gulzt · 6 years ago
Very insightful. Especially with the pictures, things fell into place!
@kullatnunu6894 · 5 years ago
Very good talk! Thank you very much for speeding up my learning about atomics and lock free algorithms.
@BartoszZasieczny · 2 months ago
Best video explaining memory ordering ever.
@ogalevn · 4 years ago
That's a great talk! Worth watching!
@user-vs6kl8ph5v · 1 year ago
the most decent and clear explanation of atomics i've seen so far
@flisboac · 5 years ago
Amazing presentation. Learned a lot. Thank you!
@JiveDadson · 6 years ago
I was hoping for a clear explanation of the various std::memory_order types. It starts at 42:26
@openyourmind1694 · 2 years ago
You have to be patient
@Astfresser · 2 years ago
How I understand it: A memory barrier forces the cpu to commit its variable-changes to a place where they are visible to other threads on other cores such as main memory or through special sockets between caches (as he talks about in the q&a). Otherwise changes are accumulated in L1 and L2 caches and written to main memory together (flush of the cache line). x.store(1, std::memory_order_release) commits all writes that happened on the executing CPU before the memory barrier to main memory and makes it instantly visible to other threads. x.load(std::memory_order_acquire) makes sure that all the writes that happened until the writing of x (with x.store()) in the other thread are now visible to this thread -> it "acquires" those writes , basically requesting clarity of memory at that point
@pavelperina7629 · 2 years ago
@@Astfresser If I understand it, release/acquire must be paired and then it says everything to the point of the store operation is guaranteed to be written. SEQ_CST is quite hard to understand. And then it's a bit hard to understand what happens when there are pairs like release/seq_cst. And I guess once I use condition_wait and notify_one, I can just use relaxed on all atomic variables because these synchronization primitives and mutexes use memory barriers as well.
@hl2mukkel · 5 years ago
What an amazing talk! Very well explained!
@ilnurKh · 6 years ago
Great. Thanks for the speech, examples and benchmarks.
@assonancex · 2 years ago
This talk was superb. Complex topic delivered eloquently. Thanks for that.
@CppCon · 2 years ago
Glad you liked it!
@radioleta · 6 years ago
These conferences are pure gold
@rohith805 · 6 years ago
Amazing talk... I just loved it
@QT-yt4db · 1 year ago
Excellent. First time I really understand these ideas.
@supercrazy2388 · 6 years ago
i like this guy
@afigegoznaet · 6 years ago
Fantastic!
@DevCurioso · 3 months ago
Eye opening presentation, very well explained
@nguyendaison27 · 2 years ago
Thank you, Mr. Fedor!
@shengliangxu8252 · 4 years ago
Thank you sir. Really well explained.
@joeybluee · 1 year ago
Great talk, it's the clearest explanation I ever watched.
@CppCon · 1 year ago
Thank you for your comment. Pleased to hear that this presentation was helpful.
@wyhao31 · 4 years ago
Amazing talk! Thanks!
@anassgoldenboy · 5 years ago
Great talk.
@vbelyanin · 5 years ago
Great video. Thanks
@alexey104 · 2 years ago
That was a really interesting and useful talk, thanks a lot!
@CppCon · 2 years ago
Glad to hear that!
@2005kpboy · 1 year ago
All these conferences on cppcon are awesome.
@CppCon · 1 year ago
Thank you so much!
@user-jo7ue3xq1x · 3 months ago
it really helps a lot !!!!
@mohsenzare2511 · 4 months ago
c++ is beautiful, and this talk is great
@potter2806 · 2 years ago
You taught me more about how atomics work under the hood. When it comes to hardware, the optimization granularity can be really small.
@marcel_pi · 4 years ago
Is there a reason for the omission of atomic single bit operations in the standard? (Excellent presentation!)
@EmperorKatax · 1 year ago
Great work :) In case you'd like to review a topic again:
11:24 limitations of atomic
15:40 compare_exchange_strong
23:06 atomic vs mutex speed
26:30 the secret of atomic!
33:58 compare_exchange_weak
42:12 memory order
56:48 review
@TranscendentBen · 1 year ago
13:15 This is crazy! I would NEVER have guessed this. To misquote Haldane's Law, things aren't just more complicated than we imagine, they're more complicated than we CAN imagine.
@notsojharedtroll23 · 2 years ago
To be really honest, with this I finally understood why threads are important and, with it, got a little intro to parallel computing. Really getting into the C++ magic stuff
@scolic03 · 6 years ago
at 48:00 he is referencing some other talk, about locks? what video is that?
@asdfasdfasdfasdfadsf · 6 years ago
As to the naming of compare_exchange_weak(), from Anthony Williams, "C++ Concurrency in Action: Practical Multithreading", pg. 113: "For compare_exchange_weak(), the store might not be successful even if the original value was equal to the expected value, in which case the value of the variable is unchanged and the return value of compare_exchange_weak() is false. This is most likely to happen on machines that lack a single compare-and-exchange instruction, if the processor can't guarantee that the operation has been done atomically - possibly because the thread performing the operation was switched out in the middle of the necessary sequence of instructions and another thread scheduled in its place by the operating system where there are more threads than processors. This is called a spurious failure, because the reason for the failure is a function of the timing rather than the values of the variables."
@intvnut · 6 years ago
On LoadLink/StoreConditional architectures, the StoreConditional could fail for arbitrary reasons. For example, you might have a single hardware monitor, and you need to invalidate it (lose the link) on a context switch, or when cache coherence messages require you to move the line associated with the monitor. "TimedLock" is not necessarily a bad way to model this. However, it's a bit misleading in the sense that time isn't expressed in terms of a simple time unit; rather, it's saying "enough time has passed for a frustrating operation to arrive before I make progress, forcing me to try again."
@ZiyangJiangSheep · 5 years ago
@@Key_C0de hey man...
@user-tv8vl8mk2h · 4 months ago
How fantastic!
@gubgubbubbub · 2 years ago
Great presentation! That intro was great, LOL
@CppCon · 2 years ago
Glad you liked it!
@pruibiebehastoet1914 · 5 years ago
Program B (code at 2:20) performs the computation locally and only locks the mutex to store the result. No wonder it is so much faster than A, where an atomic read-modify-write has to lock the cache line for each update. That's comparing apples and oranges.
@timothymusson5040 · 5 years ago
Pruibiebe Hastoet that’s his point. Algorithm trumps everything. Then, measure performance to verify assumptions.
@homomorphic · 6 years ago
Interesting. std::atomic actually offers more guarantees than I assumed it would (the miracle of low expectations :-).
@selvalooks · 1 year ago
Wonderful!!!
@CppCon · 1 year ago
Many thanks!!
@haxpor · 3 years ago
For the 2nd question: memory_order_relaxed just guarantees that the atomic operation will be performed, but no ordering constraint is applied.
@AlexYu-nm3dg · 11 months ago
Great talk. I have my own benchmark, but it shows: atomic > mutex > spinlock ('>' means faster). My benchmark definitely doesn't agree with his. Any chance I can take a look at the source code of his benchmark?
@lkshtfl7594 · 2 years ago
Thanks
@szymoniak75 · 5 months ago
Am I the only one who is slightly confused, especially about the memory order part? Everyone says this talk is great, but I'd watched Herb Sutter's talk on atomics before and thought I somewhat understood it; now that I've watched this, I'm confused.
@tuobak4375 · 3 years ago
49:22: thx!! Solved my doubt about CAS
@sanjayts · 1 year ago
Would someone please know which talk he is referring to when he says "I'm not going to go into as much details as Paul did"?
@user-fm9gn5wh1d · 6 months ago
On slide 53 (38:00) what happens if one thread sets, say, q[7] while another sets q[8]? If they are both in the same cache line, won't the threads interfere with each other?
@user-ff8qw2ry6u · 4 months ago
Logarithmic conference cppcon. Thanks.
@microcolonel · 4 years ago
It's funny to me that the default is not a CAS-based multiply. On the other hand, it is tragic that they made the method name for CAS so goddamn long. It's somehow longer than the standard name for the operation (compare and swap).
@TheWarTurkey · 2 years ago
Awesome talk; I've watched it several times! I have a question, though: I understand that unsynchronized reading and writing to the same location in memory is undefined behavior. I also understand that the CPU/memory subsystem moves data in terms of cache lines. What I am still unclear on is what happens if I have two variables in my program that just so happen to be on the same cache line (e.g. two 32-bit ints), but I didn't know that: My program has two threads that each read and write from their own variable, but I, the programmer, thought it was OK because there's no (intentional) "sharing?" If these threads are non-atomically writing to their own variables, do they still have the same problem where they can "stomp" on each other's writes?
@dingliu5273 · 2 years ago
For modern processors that can do single-byte reads, there's no such worry.
@FedorPikus · 2 years ago
If two variables are on the same cache line but not at the same address, correctness-wise nothing bad happens (caches are hardware optimizations that are supposed to be invisible where observable results are concerned). Performance-wise, if two threads are accessing the same cache line, it's very bad: it uses the same hardware lock as it does when using a truly shared value like a mutex. Each thread locks the entire cache line and blocks the other thread until it's done. So you get the performance of a mutex without the benefits.
Now, in order to get two values that close, you have to either declare them as local variables next to each other or put them in the same data structure. You are very unlikely to get this by calling new/malloc twice. But if you have two threads operating on parts of the same data structure (like elements in a vector) then you need to be careful. If it's a large vector (array, etc.) and you divide it between threads (thread 0 gets elements 0 to N-1, thread 1 gets N to 2N-1), don't worry about it; odds are the threads will be looping over their sections of the vector and won't hit the boundary around the Nth element at the same time. If you have a data structure with one or two small elements per thread (like an array of ints, one per thread) then you will get dismal performance unless you scatter them across different cache lines. The easiest way is using alignas to align each per-thread value on a 64-byte boundary (in C++17, alignas(std::hardware_destructive_interference_size) is available instead of hardcoding 64).
@TheWarTurkey · 1 year ago
Thank you, @FedorPikus!
@Astfresser · 2 years ago
I figure that atomics work a lot like git: you prepare memory in your local cache ("local repo") and release it ("pushing to a remote origin")
@thejezzi5219 · 7 months ago
What a tough crowd
@PizzaPhD · 6 years ago
Spurious failures are there for LL/SC architectures. SC will fail if the atomic value (cache line) has been written, even if the same value was written as was originally stored. This is useful, as it makes it easier to write ABA-safe algorithms. SC will also fail "spuriously" in other situations, such as a context switch (or cache flush, etc.) between the LL and SC operations. cmpxchg has no equivalent failure mode, because the operation is atomic with respect to context switches. LL and SC, as separate instructions, don't have this same guarantee.
LL/SC platforms can spuriously fail the SC in many ways; a cache flush/context switch is the biggie, but they might well not allow multiple locations to be LL'ed simultaneously; they might not even allow a regular load to occur between an LL/SC pair. I think the compiler should make sure to respect these rules (there's no point doing nested LL/SC pairs if they'll always fail, after all), but the compiler can't prevent context switches.
If you're writing for an LL/SC architecture, trapping these spurious failures and handling them specially can allow certain algorithmic improvements. In this situation, you'd use the weak version. If you're using an algorithm written for a CAS architecture, and have no special handling of SC failures, then you'll want to use the strong version anyway.
@intvnut · 6 years ago
Yep, pretty much. In the one LL/SC architecture I helped architect, if anything came within a hair's breadth of the location being monitored (due to LL), we'd shoot down the monitor and force a loop.
One thing we did mess up on that architecture: we didn't properly invalidate the monitor when taking an interrupt, so we required a costly software workaround to manually do so. This was an artifact of our memsys folks not really talking with our CPU folks, and the fact that the architect above it all was being somewhat ignored by both, at least with respect to LL/SC, since multicore systems were still rare in our space at that time. It was a rather unfortunate situation, as the feature could have been quite nice. IIRC, we ended up only shipping LL/SC support on one device, which is rather disappointing to me. It was a casualty of politics, I suppose.
Our particular variant of LL/SC actually split it into _three_ operations: LoadLink/StoreLink/CommitLink. The reason had to do with our pipeline behavior (exposed pipeline VLIW). We didn't have a natural way to store a value to memory and write a result to the register file afterward. So, SL would deposit data in the monitor alongside the monitored address, and CMTL would actually pull the trigger (write the value to memory) if the monitor hadn't been invalidated. From the CPU's perspective, CMTL's pipeline behavior was similar to a load.
@az09letters92 · 4 years ago
To prevent ABA, I simply include a counter within some bits of the cmpxchg'd value. That way not only do I know the same value has been written, but also how many times... not that knowing that count is that helpful in real-life scenarios.
@openyourmind1694 · 2 years ago
50:56 On x86:
- all loads are acquire-loads, all stores are release-stores
- but adding the other barrier is expensive
Does it mean that on x86, even if I specify memory_order_relaxed for a store (just to give an example), the real memory order continues to be memory_order_release? No possibility of a "downgrade" on this architecture?
@ckingpro · 1 month ago
I saw this 2 years late, but not necessarily. Nothing prevents the compiler from rearranging things around even if x86 architecture doesn’t. Or worse, the compiler doesn’t rearrange now, but does in a newer version in the future and the bug becomes visible
@shelvacu · 3 years ago
"It's hard to write if you don't care about occaisonal bugs; It's *really* hard if you do"
@mywtfmp3 · 6 years ago
Note that for struct B { long x; long y; }, b.is_lock_free() == true before gcc 7, but b.is_lock_free() == false since gcc 7. This is due to an ABI change in libatomic. For more info, please look at the gcc mailing list: gcc.gnu.org/ml/gcc-patches/2017-01/msg02344.html.
@eugeniyavoroshilo7400 · 1 year ago
25:42 I am confused. How does the spinlock "catch up" at a large number of threads? In your benchmark it is faster all the way; it beats atomic easily, doesn't it? Also, at 51:13, "release on read and acquire on write are really expensive" - isn't it prohibited at compile time to do those operations?
@FedorPikus · 1 year ago
Depends on the hardware, but yes, spinlock can be faster (a lot of effort went into optimizing and eliding locks in hardware). Note that even when it gives faster throughput, spinlock burns a lot more CPU time, so it's possible that the overall application is better off doing atomics or CAS. In C++, it is UB (but not diagnosed by the compiler) to use acquire order with store() or release order with load(). You have to use the exchange hack. That being said, this is a C++ limitation, hardware may be able to do it, and X86 can but it's very expensive relative to the "natural" memory order (of course, you have to write inline assembly to get this behavior).
@NolePTR · 6 years ago
On Windows the fastest way I've found in general is to use a benaphore.
@origamibulldoser1618 · 6 years ago
Double plus good.
@somehrjdbddhsjsbnsndndk751 · 2 years ago
Hello, can you please explain to me what he meant at the 56:00 timestamp, when he was talking about loading an atomic value in a destructor? I don't understand it and it has bothered me for 4 days now. If you can just point me to some resource at least, I would be really thankful. I even asked a question in the cpp forum, but it seems no one really knows, as I got zero answers. Two things I don't understand: 1) Why is reading that value in the destructor wrong, or why can it cause undefined behavior? 2) How come the fact that we could use a non-atomic read solves the problem? It is even harder for me to understand as I don't seem to detect or have a hint at what the issue is. Thank you very much.
@simhendra2377 · 10 months ago
I believe the implication was that it is nonsensical/bad for your program to allow one thread to be using an object at the same time another thread is destroying that object - one should never write code that allows this case to happen. So, it is not the "reading that value in the destructor" which is wrong, it is the idea that another thread can be using the object while it is being concurrently destroyed which is wrong. And finally, the fact that there is a std::memory_order written there implies to the reader of the code that this nonsensical situation exists.
@user-zp3nd6ht8v · 4 years ago
18:23, the joke is so good that it makes someone need an ambulance.
@Raattis · 6 years ago
How come no programming language handles cache lines explicitly?
@michaellee2786 · 3 years ago
That is the job of the OS and micro architecture.
@user-hk3ej4hk7m · 3 years ago
It's not portable to do so, different architectures have different cache models.
@theIpatix · 5 years ago
28:51 Isn't that supposed to be an xmm register rather than a mmx register?
@az09letters92 · 4 years ago
I'd assume so. Using MMX has been counterproductive in last 15-20 years. Of course, if sizeof(long) == 4 (bytes), you could fit two of them in an MMX register. No idea why anyone would ever want to do that. To clarify, his sizeof(long) seems to be 8, which is *not* true on all x86-64 platforms.
@oraz. · 9 months ago
the message I get is "use someone else's lock free queue"
@rakesh4a1 · 6 years ago
Can anyone please help me with this: why is "struct a { long x; };" lock-free but not "struct b { long x; long y; long z; };"? Links or any other source would help. Thanks in advance.
@teekwick · 6 years ago
He explained it himself starting at 28:00.
@dkierans · 5 years ago
“Demons fly out your nose”. Ha! Excellent talk.
@thebellster · 6 years ago
The example at the beginning of the talk is quite misleading. The mutex is only acquired once per thread, where the atomic int is incremented once per person in every thread! For a fair comparison, the atomic version should have a local accumulator like the mutex version does.
@jcsahnwaldt · 6 years ago
Fedor Pikus deliberately compared a slow algorithm that uses an atomic with a fast algorithm that uses a lock to prove the point he makes at 03:22 (slide 7): "Algorithms reign supreme". In other words, we should first make sure that we use good algorithms before we fiddle with details like locks versus atomics. I think his point was to remind his audience that everything they learn in this class is less important than good algorithms. Maybe he should have made that clearer. He addresses a few more details in his answer to the question at 01:06:55.
@JiveDadson · 6 years ago
The instant I saw the slide, I blurted out, "That's stoopid."
@JiveDadson · 6 years ago
Not a great idea to encourage people to raise their hands and then tell the audience that everyone who raised his hand got it wrong.. Grrrrrr.
@homomorphic · 6 years ago
Why not? That's a great way of letting people know that their current mental model is incorrect.
@timothymusson5040 · 5 years ago
JiveDadson the problem with these types of questions is the either/or presentation. He should have asked: did you expect this? 1) Yes; 2) No; 3) I don’t have enough information to have an informed opinion.
@daleowens7695 · 3 years ago
9:13, could someone explain the difference between:
#include <atomic>
std::atomic<int> x(0); // vs
std::atomic<int> x = 0;
Presumably the behavior would not be the same, otherwise he probably wouldn't have called attention to it. The top one is calling the direct (I think) constructor and the other the copy (I think) constructor? How would that result in different behavior? Is it the implicit cast of 0 to atomic? It's such a minor difference, and C++ is filled with these sorts of quirks. It's almost impossible to write "good" code because there are so many of these land mines. In any case, could someone clarify this situation a bit, please?
@FalyPr1nc3 · 3 years ago
Yes, the second one is about copy constructor. This is explained quite well here: stackoverflow.com/questions/15249998/why-are-stdatomic-objects-not-copyable
@mertmalazalumni1815 · 3 years ago
Hey, You can check the answer to your questions here. stackoverflow.com/questions/43939343/in-class-initialization-of-atomic
@BGFutureBG · 3 years ago
Honestly, syntactically, I don't quite get behind that one. The two lines are both initialization (not assignment!) and should both call the very same copy ctor. Maybe this is something compiler specific (I'm on MSVC), but both versions compile just fine for me and there are no runtime issues whatsoever.
@andreaszetterstrom7418 · 3 years ago
The way I interpreted it is that atomic simply doesn't have a copy constructor or assignment operator. And when I looked at the source that confirmed my suspicion. The reason for this I think is to prevent users from making the mistaken assumption that if you had two atomic variables x and y and you wrote x = y you'd think that was an atomic operation, which it wouldn't be, so they simply removed that option altogether.
@NolePTR · 6 years ago
Can't SSE and AVX extensions be used to increase the atomic lock free size?
@intvnut · 6 years ago
It sounds like Intel doesn't want to guarantee it: software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/536979
@fwang3695 · 9 days ago
Future lectures should just invite a CPU architect to explain
@m0rphiumsuchtig · 4 years ago
Why "acquire and release barrier" instead of "sync caches"?
@deepanarasut9471 · 2 years ago
The gripe about the C++ standard is wrong (56:16). Reason: constructors and destructors execute in isolation. Of course, unless you call the destructor directly.
@patchouli000 · 3 months ago
"Welcome to the defense of dark art class" 🤣
@magno5157 · 2 years ago
The expression "become visible" is just awful. What Pikus meant by "becoming visible" after an acquire barrier is simply that operations that appear after the barrier in the program are guaranteed to execute after the barrier. Likewise, "becoming visible" before a release barrier means operations before the barrier in the program are guaranteed to have been executed by the time the barrier is reached. A little caveat is that for the acquire barrier, operations before the barrier can be moved to execute after the barrier, but operations after the barrier in the program cannot be moved to execute before the barrier is reached. The opposite applies to release barrier.
@azadalishah2966 · 3 years ago
🤔 At 10:00: my understanding is that all threads (cores) share the same caches. If they don't have different caches, then what's meant by communication between caches?
@ryanobrady569 · 3 years ago
As I understand it, all cores usually share the L3 cache, the L2 is usually shared by a smaller group, and the L1 is usually per core. In Zen, the L1 and L2 are shared by one core (2 threads) and the L3 is shared by all cores.
@user-cw3nb8rc9e · 2 years ago
Why didn't the designers of C++ make it so that each new thread gets a unique copy of all variables in the program, so that such problems never arise?
@z08840 · 2 years ago
Measurably failed attempt to create drama at the beginning - they are different algorithms - you can't compare apples to oranges
@MrZapper1960 · 2 years ago
Great talk but the plot @ 22:50 has the worst legend ever