On the AMD IO die layout causing extra memory copies: I figured this was mostly solved once AMD moved the memory controllers off the compute dies and onto the IOD. Was your testing done with the CPU configured as NPS=1 (where the memory or NVMe could be remote to the cores running the OSD) or as NPS=4, to keep the OSD process local to both its memory and its NVMe? Also, if the path over the Infinity Fabric (IF) is the bottleneck, you should see a performance difference between otherwise similar processors with different GMI link widths, for example the 9334 (GMI-wide) versus the 9354 (GMI-narrow), both 32-core parts. Has that been the case?
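
If it helps to rule locality in or out, here is a rough sketch of how one could check whether an OSD's allowed CPUs sit on the same NUMA node as its NVMe controller. It assumes Linux, a PCIe-attached drive that exposes `numa_node` under `/sys/class/nvme/<ctrl>/device/`, and the usual `/sys/devices/system/node/nodeN/cpulist` files. The function names and the `nvme0` controller name are made up for illustration and are not taken from any Ceph tooling.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def nvme_numa_node(ctrl, sysfs_root="/sys"):
    """Return the NUMA node of an NVMe controller, or None if it is unknown."""
    path = Path(sysfs_root) / "class" / "nvme" / ctrl / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when the device has no NUMA affinity.
    return node if node >= 0 else None


def node_cpus(node, sysfs_root="/sys"):
    """Return the set of CPU ids that belong to the given NUMA node."""
    path = Path(sysfs_root) / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def affinity_is_local(allowed_cpus, ctrl, sysfs_root="/sys"):
    """True if every allowed CPU is on the controller's node, None if unknown."""
    node = nvme_numa_node(ctrl, sysfs_root)
    if node is None:
        return None
    return set(allowed_cpus) <= node_cpus(node, sysfs_root)
```

For a running OSD, something like `affinity_is_local(os.sched_getaffinity(osd_pid), "nvme0")` would do, where `osd_pid` is a placeholder for the process ID. With NPS=1 the whole socket is a single node, so this can only flag cross-socket placement; with NPS=4 it should show whether the OSD is pinned to the quadrant its NVMe hangs off.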