Day 2 15:15: Closing Remarks (10:55)
Day 2 14:00: Nutanix (4:45)
Day 2 14:00: Lenovo (2:16)
Day 2 11:00: GigaIO (15:35)
Day 2 09:00: Welcome (5:38)
Comments
@黃奕鈞-p2q 10 days ago
Hello, are the PDF files of the handouts available? Thank you
@souravzzz 11 days ago
Excellent work! Very insightful.
@ZhengZhou-n8o 1 month ago
This video is fantastic for those who want a 101 on HPC networking; the key concepts are explained very well. The comparisons also help a lot in forming an overview of the HPC network landscape. Thanks to Prof. Panda and Prof. Subramoni.
@mIbrahim1981 7 months ago
I didn't find the slides at the mentioned path. I'd appreciate it if you could share the right link here.
@mIbrahim1981 7 months ago
Wonderful explanation of very interesting topics. Many thanks for this lecture.
@thoughtbox 7 months ago
It was mentioned that DGX has an 8:1 BW taper, but that is not correct. Each DGX has 72 NVLink Network ports. I only mention it for clarification.
@jebtang 2 months ago
On DGX GH200, each DGX chassis has 36 OSFP 400 Gb ports, which correspond to 72 NVLink Network ports.
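For reference, the arithmetic those numbers imply, assuming each 400 Gb OSFP cage carries two 200 Gb/s NVLink Network ports: 36 OSFP x 2 = 72 NVLink Network ports per chassis, i.e. 36 x 400 Gb/s = 14.4 Tb/s of aggregate NVLink Network bandwidth.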
@brandydogish 11 months ago
From an ASIC/SoC technical writer's perspective, and as the producer of the Semiconductor Subject Matter Expert Database, your presentation is excellent. Ideal reference material for writing about silicon photonics. It was a good idea to introduce the photonic building blocks, fiber-optic packaging fundamentals, and the associated technical issues. As you say, many in the ASIC domain are new to photonics. Lightmatter's approach, that is, replacing the 2D arrays of multiply-accumulate units used in present-day GPU-based natural-language AI processors with an optical equivalent, i.e. programmable Mach-Zehnder interferometer or photonic vector-matrix multiplication units, doubles down on the benefits of near-zero-latency photonic IO ports. I have been researching heterogeneous ASIC solutions with silicon photonic waveguide interconnect as an alternative to wire interconnect and as a way around silicon IO latency, power, and delay obstacles. This has brought me to other photonic interconnection alternatives such as microLEDs, MEMS mirrors, LiFi, nanowire LEDs, photodetector arrays, and lasers.
@levuong8077 11 months ago
Thanks for sharing!
@kbgexplores 11 months ago
Great presentation, very insightful, and I learned a lot. Thanks!
@xbsong8409 1 year ago
How can I find the Slack channel? Thanks.
@TaekyungHeo 1 year ago
32:42 Impact of RoCE Congestion Control Policies on Distributed Training of DNNs (Tarannum Khan, Saeed Rashidi, Srinivas Sridharan, Pallavi Shurpali, Aditya Akella and Tushar Krishna)
@jn6038 1 year ago
Could you please share the source code? I am really looking forward to studying libfabric. Thanks.
@grasswater001 2 years ago
In the middle of the video, the picture is broken while the audio continues. Please check and fix it, thanks.
@logiclogic8526 2 years ago
0:02 The HPE Cassini NIC # Keith Underwood, HPE
35:39 GPU Scaling with Intel oneAPI Level Zero Peer-to-Peer Solution # Jaime Arteaga and Ravindra Babu Ganapathi, Intel
1:02:31 Bunch of Wires: An Open and Versatile PHY Standard for Die-to-Die Interconnects # Elad Alon, Shahab Ardalan, Boris Murmann, Bapi Vinnakota and Venkata Satya Rao
1:23:04 Synchronous and Low-Latency Die-to-Die Interface for the IBM z16™ Telum Processor # Chad Maquart, IBM
@Mr_ST_720 2 years ago
Big fan, sir. One more question: could there be a traffic-steering approach on a CXL switch, like SDN at the CPU-to-device interconnect level?
@Mr_ST_720 2 years ago
Sir, do you see memory devices becoming detached and turned into separate memory boxes, like storage and routers, with CXL at the interconnect combined with RDMA/NVMe at the network to build this future cloud infrastructure?
@scottschweitzer 3 years ago
I moderated this panel; a full transcript follows in the comments below. I was thrilled we had the following speakers:
Cary Ussery (Marvell) - prefers the term DPU, Data Processing Unit
Mario Baldi (Pensando) - likes to use DSC, Distributed Services Card
Michael Kagan (NVIDIA) - they may have coined DPU
Jim Dworkin (Intel) - IPU, Infrastructure Processing Unit
Nick Ilyadis (Achronix) - SmartNIC
Rip Sohan (Xilinx) - SmartNIC
@scottschweitzer 3 years ago
Here are the timestamps and question summaries, followed by a summary of the answers:
15:10 On queueing packets into accelerator memory:
Cary - A hybrid approach of on-chip vs. off-chip memory. Data paths are being built with batch packet processing in mind.
Jim - Temporal locality of the packets and flows to the compute engine, especially during initial flow setup.
Mario - SmartNICs need to work at line rate; they should not queue packets, so most queueing is on the chip. Queueing may be more critical when working on packets at the messaging level, and then using off-chip memory is appropriate.
Nick - Coming from an FPGA point of view, supporting high-speed external memory is critical for flexibility.
Michael - Queues shouldn't be kept on a chip; use host memory, particularly on entry-level NICs. You need to design your NIC so it is balanced. The attached memory, DDR, is used on a DPU to retain context. On-chip, there should be caches that are as small as possible.
Rip - Keep expensive on-chip memory as small as possible. There is a case for slower, larger-capacity memory; most systems will leverage tiered memory, where host memory is the final tier. You're only as good as the primitives you have for sorting, managing, and moving data.
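Several of these answers converge on the same tiered-memory pattern: keep the fast on-chip tier small and spill overflow to larger, slower tiers. A minimal Python sketch of that idea, with illustrative names and capacities rather than any vendor's design:

```python
from collections import deque

class TieredQueue:
    def __init__(self, capacities):
        # One FIFO per tier, ordered fastest/smallest to slowest/largest.
        self.tiers = [deque() for _ in capacities]
        self.capacities = capacities

    def enqueue(self, pkt):
        # Place the packet in the fastest tier that still has free space.
        for q, cap in zip(self.tiers, self.capacities):
            if len(q) < cap:
                q.append(pkt)
                return True
        return False  # every tier full: drop or assert backpressure

    def dequeue(self):
        # Drain the fastest tier first; fall back to slower tiers.
        for q in self.tiers:
            if q:
                return q.popleft()
        return None

q = TieredQueue([4, 64, 4096])   # on-chip, DDR, host (illustrative sizes)
for i in range(10):
    q.enqueue(f"pkt{i}")         # pkt4..pkt9 spill into the DDR tier
print(q.dequeue())               # -> "pkt0", served from the on-chip tier
```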
@scottschweitzer 3 years ago
27:35 How should classification and action on packets be handled?
Jim - In any CPU architecture you have caches, L1 through L3 and DDR, and as a result added latency as you work on data; this is not the case with FPGAs. Based on the temporal locality of the data to the compute, you'll find that these SoC-based solutions take much longer to process packets.
Mario - It should not be fixed logic, as requirements and needs change over time. Our approach is to use ASICs with software running on processor cores to manage packets; that way, things are reconfigurable.
Nick - We want to provide the user with maximum flexibility, as we know that protocols keep evolving. The FPGA allows you to change the way the hardware is programmed.
Michael - Most things should not be done in the CPU; the majority of the traffic should go through the fast path you've designed for. The CPU is the slow path, for the exceptions, for when you need the flexibility to do something not found in the ASIC. It's a balanced design; the art is combining CPUs with ASIC blocks for the best performance for what is required.
Rip - Your most precious resource is your CPU, so by the time data reaches the CPU, your accelerators should have done everything they could. That way, the CPU can just do the thing it's supposed to do.
Cary - You don't want to do classification on a general-purpose CPU. It shouldn't be hard-coded logic either, as too much is changing in the market; you want something that is both flexible and programmable, almost on a minute-by-minute basis. Microcode engines, not CPUs.
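Mario's and Michael's answers describe the same fast-path/slow-path split: a match-action flow table absorbs known flows at line rate, and only exceptions reach the flexible CPU path. A minimal sketch with hypothetical names (not any vendor's API):

```python
# The dict stands in for the ASIC's match-action flow table.
flow_table = {}

def apply_action(entry, pkt):
    # Fixed-function action: forward to the port chosen at classification time.
    return ("fwd", entry["out_port"], pkt)

def slow_path(key, pkt):
    # Flexible CPU path: classify the new flow, then offload it to hardware.
    entry = {"out_port": hash(key) % 64}   # placeholder policy decision
    flow_table[key] = entry                # install, so later packets stay fast
    return apply_action(entry, pkt)

def handle_packet(key, pkt):
    entry = flow_table.get(key)            # line-rate table lookup
    return apply_action(entry, pkt) if entry else slow_path(key, pkt)

flow = ("10.0.0.1", "10.0.0.2", 12345, 80, "tcp")   # 5-tuple key
print(handle_packet(flow, "syn"))    # first packet: slow path, installs entry
print(handle_packet(flow, "data"))   # subsequent packets: fast path only
```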
@scottschweitzer 3 years ago
36:00 On-chip network bandwidth: what is your approach, and how is it adapting?
Mario - The architecture has to be hierarchical; you cannot possibly have all components talking to all components. You'll need some coherent caches, and the design is not trivial; you may have more than one network on the chip. On-chip bandwidth has a huge impact on the performance of the DPU. The key is how you put together the various components, how tightly you integrate things.
Nick - Flexibility is very important when tying the pieces together. We enable our customers to configure their architectures the way they want, and we strive to offer the most flexible interconnect to meet their needs. At Achronix, our 2D NoC (Network on Chip) is finely tuned for this purpose.
Michael - On the SmartNIC, the way you set up the on-chip network is much more straightforward than on FPGAs. The right caching and buffering structure is one of the key things to having a successful product.
Rip - With a hierarchical structure, as clock speeds and volumes go up, you replicate more of the same components; we buy into that approach. What we aim to do with our architecture is to move data around faster; it turns out that half the time is spent moving data around. When things need to take a step up, make sure the core logic provides for that. Our challenge is: can we scale the way people write applications so that those applications scale up with increased traffic? A 10 GbE application is completely different from a 100 GbE application. The basic structure is the same; it's the optimizations that matter. It's not just about the hardware itself; the way you structure the software will ultimately influence how the hardware is designed.
Cary - There isn't an on-chip network per se; there is a series of optimized subsystems interconnected via a crossbar switch. Taking an SoC DPU approach means getting these interconnects optimized and done right, which is hard work.
Jim - It's our job to figure out how to handle the internal plumbing. The strongest vendors in the market will be those offering customers the most choice. Working with hyperscalers, we've worked hard to tune systems to their specific configurations. We harden certain functions, like crypto; with that hardened data path, we can further tune and balance the system.
@scottschweitzer 3 years ago
46:05 Balancing what we can do versus what we should do in hard logic as opposed to soft logic or programming: which three DPU functions do you feel should be in hard logic?
Nick - Interface logic like Ethernet and PCIe, as well as TCAM and a classifier.
Michael - Classification, data processing, compression, DMA.
Rip - Assuming PCIe and DMA are hard: packet classification (L3-L4), crypto for both storage and networking, and basic packet-processing functions like LRO, checksum insert, and checksum verification; those three are must-haves.
Cary - Crypto and IPsec, and, adding something different, work scheduling across multiple components on the chip.
Jim - On-chip data movers, compression, and RDMA (e.g., iWARP), etc.
@scottschweitzer 3 years ago
49:25 Now that we're thinking beyond 100 or even 400 GbE, how do you intend to scale your architecture?
Michael - 400 GbE is kind of history; we're working on 800 GbE. You scale vertically and horizontally. Vertically, you move things that used to be programmable into hardware, with concurrency and more sophisticated data handling.
Rip - Increase bus widths, increase line rates, and take the most common soft-logic functions and move them to hard logic. Add more cores to handle greater packet rates.
Cary - Optimizing sub-networks and data movers, and diversifying that network to achieve higher rates.
Jim - Packaging technologies to keep solutions efficient and low power, so power stays manageable. Software is important to make it easier for customers to move between deployment models, on-prem to cloud, etc. Having worked at other companies on this panel, I'm not sure some of the other panelists' solutions will scale as we go to 400 and beyond.
Mario - All the devices are programmable, and alongside the software are the tools; a compiler can make a huge difference. The architecture and the way the components are integrated can also make a huge difference.
Nick - Our 2D NoC is very capable of 400G and scalable to 1.6 Tbps, so I believe having a high-speed network on the chip allows you to move data between parallel processing blocks to achieve these high data rates. It's not all about the speed of the interface but about traffic management and shaping.