No video

Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023

  Рет қаралды 17,876

CppCon

CppCon

Күн бұрын

cppcon.org/
---
Single Producer Single Consumer Lock-free FIFO From the Ground Up - Charles Frasch - CppCon 2023
github.com/Cpp...
In this session we will construct a Single Producer Single Consumer Lock-free Wait-free Fifo from the ground up. But why write our own if we can get one from reliable sources such as Boost.Lockfree? There are a couple of answers to this question.
* Writing such a fifo is a fairly gentle introduction to lock free programming.
* There are some interesting performance optimizations that can be made.
* There may be some specific requirements that are not met in out-of-the box implementations.
In the presentation we will first develop a simple circular fifo. Next we will make it thread-safe as well as lock-free and wait-free. After that we will address issues associated with cache coherency and false sharing in particular. We will show a few additional optimizations that can be added as needed to meet specific requirements. Finally, the performance of our fifo will be compared with a few out of the box implementations. Along the way we will touch on subjects such as thread safety, data-races, false sharing, object lifetime, and relaxed atomics.
---
Charles Frasch
Charles Frasch is a Senior Core Developer with the IEX Group where he is working to re-platform their core trading infrastructure in modern c++. Charles has worked in the financial technology world for more than twenty years in areas such as high-frequency trading and low-latency, high-performance infrastructure.
__
Videos Filmed & Edited by Bash Films: www.BashFilms.com
KZbin Channel Managed by Digital Medium Ltd: events.digital...
---
Registration for CppCon: cppcon.org/reg...
#cppcon #cppprogramming #cpp

Пікірлер: 42
@anon_y_mousse
@anon_y_mousse 5 ай бұрын
I've probably given this tip to more people than any other, but if you're optimizing around an array of any variety, use a size that's a power of two. If you need to constrain an index into such an array, a `bitwise and` is the fastest way to do so. Never use division/modulus and oddly sized arrays, even when setting up the backing store for a hash table. Prime number sizes may yield better distribution, but it's always at the expense of performance. Also, with regards to hash tables, always store the unnormalized hash with the keys so that when you need to resize the table you don't need to query the hashing function again to rebuild the entire table.
@phusicus_404
@phusicus_404 5 ай бұрын
Are there benchmarks which show that power of 2 size is better?
@Dominik-K
@Dominik-K 5 ай бұрын
I very much agree with this advice. Especially for anything that is performance sensitive, bitwise and is a go-to means to restrict value ranges.
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
While I completely agree regarding power of 2 + bitwise and on one minus the size, I would also not be surprised if the hardware didn't recognize this optimization. In my day job I constrain the size to be a power of 2.
@MenaceInc
@MenaceInc 5 ай бұрын
Seems like he was going to say the same thing at 8:30 before he lost his train of thought
@MenaceInc
@MenaceInc 5 ай бұрын
Ah, he gets it out by 9:50
@fabiogaluppo2635
@fabiogaluppo2635 5 ай бұрын
Great talk. We need more of this type of content in C++ talks!
@natedoggstyle
@natedoggstyle 5 ай бұрын
One of the best explanations for implementing a lock-free SPSC I've come across. Kudos!
@amaama4140
@amaama4140 5 ай бұрын
Wow, what a marvelous presentation! I enjoyed every single second of it. I have always wanted some similar learning material to start with lock-free programming, and this one was just the right one. Thank you very much, Mr. Frasch.
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
You are welcome. The references in the slides are also very informative.
@user-mc9wh3df5v
@user-mc9wh3df5v 9 ай бұрын
I completely misunderstood Roi's question about 1:03. Yes, I agree that if you grab multiple pushers or poppers in the same block they will be destructed in the reverse order in which they were acquired. And, if you do that you must carefully consider that fact. The pusher and popper destructors just advance the cursor. So, it is possible that at the end of the block the cursors are correct. But, my real answer is, if you need to read or write multiple values from and to the FIFO you should extend the API to add functions that do that correctly.
@szirsp
@szirsp 5 ай бұрын
Well, yeah, they just advance the cursor... But because of that only the cursor values will be correct at the end! The FIFO won't be! Because when you are getting multiple pushers in a row they are getting a reference to the same memory, and when one lifetime ends only the cursor position is changes the still existing pushers won't be, so they will overwrite the previous item and skip one item when they go out of scope. Instead of advancing the cursor implicitly in the destructor I would use a commit() method. That way you can get a pusher, commit and then get another pusher in the same scope. I'm not sure if I would actually implement pusher and popper at all, just use some kind of peek methods to get direct access the write and read location and use commit and free or similarly named methods to advance the write and read head.
@youtou252
@youtou252 5 ай бұрын
how is your comment 3 months old when the video was published 2 days ago ?
@Jeanpierrec19
@Jeanpierrec19 5 ай бұрын
Wonderful talk. Have done a similar implementation but never needed the extra performance over fifo2 so never looked into how it worked. Your talk explained everything very well. Thank you.
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
@@youtou252 I'm the presenter; they allowed me to preview the video.
@DedmenMiller
@DedmenMiller 5 ай бұрын
41:00 I had that issue too. My solution was to not store objects in the queue, but just store pointers. the new/delete are done outside. I used another fifo of "Available buffers", pre-allocated pointers that can be reused. Instead of new grab one from it, instead of delete push it back to be reused. That way I get rid of the push/pop memcpy's, and ALSO of runtime allocations. Problem is just that you have to handle when the push fails and the queue is full, you need to either throw the pointer back into the available buffers, or hold onto it until it can be pushed.
@starchsky
@starchsky 5 ай бұрын
So many good things explained so well in one go. Bravo.
@yb9737
@yb9737 5 ай бұрын
Beautiful talk
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
Thank you.
@zmweb4828
@zmweb4828 Ай бұрын
I’m glad I’m not the only one struggling to understand memory ordering 😂 To understand this talk fully, I read chapter 5 from the book “C++ Concurrency in Action” (2nd edition), which goes into depth on the topic of memory ordering. Had to read it twice to understand everything, but it was worth it. Thank you for this excellent talk!
@ferinzz
@ferinzz Ай бұрын
Me a noob thinking I was optimizing by messing with the std lib and working around those. This talk expanding my understanding of how you can truly optimize a system at a deep level.
@dian-lunlin7349
@dian-lunlin7349 5 ай бұрын
Great talk! I enjoyed it a lot.
@LaserFur
@LaserFur 5 ай бұрын
One way to handle a pointer request when near the end of the FiFo wrap around point is to allocate extra space past the main circular buffer. Then the get pointer function can in the rare cases where the buffer wraps around to do a copy into this extra space so that the caller has a linear buffer. and then the release can copy data back into the FiFo buffer. The only problem is when the FiFo can auto expand the buffer. In that case a mutex is needed and all callers can't be holding pointers. Luckily in my case I don't need lock free and the code just grabs a mutex for any operation and also keeps the mutex held between the getting of the pointer and the release of the pointer. I also use RAII in that my get pointer returns a class to release the pointer and the mutex.
@szirsp
@szirsp 5 ай бұрын
It's not clear to me what problem you are describing or trying to solve. But your proposal doesn't seem like a sound solution to me.
@user-bk4ux9rv3t
@user-bk4ux9rv3t Ай бұрын
Great talk! Just a warning though, it is not a working FIFO, once pushCursor will get to the end of max size_t, it will start overwriting the buffer at [0] and on wards.
@aioia3885
@aioia3885 21 күн бұрын
it's assumed that std::size_t is 64 bits I think, therefore it is not realistic for it to reach the maximum value
@ksvishvajith
@ksvishvajith 2 ай бұрын
That gentleman looks so much like the cousin of Bjourne, cool presentation
@tulipshaar5361
@tulipshaar5361 Ай бұрын
I am sure I have missed it but in bool Fifo5::push(T const& value) , how does *pusher = value; end up invoking the pusher::get() when no access to the underlying Fifo5 object has been provided (operator*()) to allow access
@szirsp
@szirsp 5 ай бұрын
12:07 While this would work for most, this is not correct [when capacity is not 2's power]. It relies on size_t not overflowing. (Which is a safe bet when it's 64 bit since it would take hundreds of years, but it is NOT when it's 32 bit or less, for example when running on a 32 bit or 8 bit microcontroller, where it only takes minutes to overflow.) To demonstrate the problem create a FIFO with 10 capacity and set the cursors to 0xfffffffaUL (or push and pop that many times), then push 10 items into the FIFO (compiled to a 32 bit platform). The write index will be 0,1,2,3,4,5,0,1,2,3. Meaning even though the FIFO was initially empty and had enough space to hold 10 items, it overwrites 4 of those items... and it will try to destruct some items twice. Also I usually don't like using modulo operation for indexing. Because capacity is not a compile time constant modulo division won't be optimized which could be a problem (mostly on processors without hardware division unit) (even when capacity is 2's power). I'd rather limit the cursor to valid index range / capacity to make sure it at all times contains actual valid array index (which makes debugging easier). (of course you would need to differentiate between full and empty in another way) I have written a number of lock free FIFOs for microcontrollers over the decades, so it seemed trivial to me that this is possible (even though the implementation might not be trivial, and may include memory barriers) I also realize that it might not be obvious for "modern"/young developers. I still wondered what could be said about them to fill out a 1 hour long presentation... So I still watched it (on 2x speed) to see if he ever addresses this overflow issue, and the talk did contain some interesting pitfalls. This shows that multi-threading and modern processors are complicated, so nearly everyone should use standard libraries instead of rolling their own FIFO.
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
That's right. The presentation serves its purpose as is. It would be far too complicated to to deal with ABA issues; perhaps I should have mentioned that.
@user-ms2if2un7f
@user-ms2if2un7f 5 ай бұрын
slide 28, in push function, shouldn't we load popCursor with acquire first, and then load pushCursor with relaxed, correct me if I misunderstand this.
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
From a correctness point of view the load order shouldn't make any difference. In the push function and while the cursors are being loaded the fifo can only become emptier. If the cursor comparison shows full but the reader thread popped something off, the function will return full and noop. The next time through a slot will be obtained and the push will succeed. The full condition is already the slow path. From a performance point of view it might be a bit more efficient to start the load acquire first since it is a more expensive operation than load relaxed. OTOH, the processor's instruction scheduler show run the loads concurrently. Regardless, the comparison cannot proceed until both values are available.
@Justin_Wang
@Justin_Wang 5 ай бұрын
Great talk. I wonder how hardware works and how often do fifo1 with out atomic get error, but after 10,000 times of test in release mode, it still work correctly, do any one have any hint of it?
@Justin_Wang
@Justin_Wang 5 ай бұрын
I find out that x86 is a strongly-ordered system; everything is release-acquire by default. So, whether Fiio1 will pass the test or not is decided by how the program is compiled, not at runtime. Am I right about it?
@rocknroooollllll
@rocknroooollllll 5 ай бұрын
Don't we have to compare/exchange in order to get thread safety here?
@user-mc9wh3df5v
@user-mc9wh3df5v 5 ай бұрын
You can certainly use atomic compare and exchange. It is a more complicated and expensive operation that the atomic load and stores I use.
My Cheetos🍕PIZZA #cooking #shorts
00:43
BANKII
Рет қаралды 26 МЛН
Кадр сыртындағы қызықтар | Келінжан
00:16
Lehanga 🤣 #comedy #funny
00:31
Micky Makeover
Рет қаралды 29 МЛН
Databases are the endgame for data-oriented design
20:31
SpacetimeDB
Рет қаралды 1,6 М.
How This New Battery is Changing the Game
12:07
Undecided with Matt Ferrell
Рет қаралды 34 М.
My Cheetos🍕PIZZA #cooking #shorts
00:43
BANKII
Рет қаралды 26 МЛН