In this session, Argonne National Laboratory postdoctoral appointee Krishna Teja Chitty-Venkata provides an overview of large language model (LLM) inference, optimization techniques, memory management, and hardware parallelism methods. The session also includes a hands-on portion in which a sample LLaMA-3-8B model is run using HuggingFace and vLLM inference. This presentation is taken from the 2024 ALCF Hands-on HPC Workshop.