Since the logic die in the HBM stack uses a different process node, doesn't it make more sense to move such ISA extensions into the memory controller in the processing chiplet (assuming HBM on an interposer package, because nothing else exists yet)? I don't see why data pin routing would be a limit in such an architecture. Further, my proposal to introduce atomics as memory instruction primitives would be more generally useful than (linear) maths-based extensions. Rather frustratingly, people outside of data structure and database design remain (willfully/blissfully) unaware of the immense cost imposed by cache coherence in such complex architectures. You or someone on your staff removed my last comment. I think such censoring sends the wrong message in any context, but most of all in scientific discourse.