Showing benchmarks for SC *loads* is somewhat misleading, as all of the cost of sequential consistency is in SC *stores* (especially on x86). That holds with the normal mapping, anyway; the alternative mapping, with expensive SC loads and cheap stores, is theoretically possible.
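
To make the asymmetry concrete, here is a minimal C++ sketch. The names `shared_value`, `sc_store`, `sc_load`, and `release_store` are made up for illustration, and the instruction choices in the comments are what mainstream compilers typically emit for x86 under the normal mapping, not something the language guarantees.

```cpp
#include <atomic>

// Hypothetical shared variable, for illustration only.
std::atomic<int> shared_value{0};

// SC store: with the normal x86 mapping this is typically XCHG
// (or MOV followed by MFENCE), which is where the SC cost lands.
void sc_store(int v) {
    shared_value.store(v, std::memory_order_seq_cst);
}

// SC load: with the normal x86 mapping this is typically a plain MOV,
// the same instruction an acquire or relaxed load would compile to.
int sc_load() {
    return shared_value.load(std::memory_order_seq_cst);
}

// Release store, for comparison: typically a plain MOV on x86,
// so the difference from sc_store is the cost being discussed.
void release_store(int v) {
    shared_value.store(v, std::memory_order_release);
}
```

A benchmark that only times something like `sc_load` against a weaker load would therefore be expected to show little or no difference on x86; comparing `sc_store` against `release_store` is where the gap should appear.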