Episode 82 of the Stanford MLSys Seminar Series!
Democratizing Foundation Models via k-bit Quantization
Speaker: Tim Dettmers
Abstract:
Foundation models are effective tools for many tasks but are challenging to finetune and run inference on due to their GPU memory requirements. Compressing foundation models with k-bit quantization makes them accessible with minimal resources, but k-bit quantization can degrade model quality. In this lecture, I will talk about fundamental insights into how to compress foundation models with quantization while maintaining their predictive performance. We will learn about emergent outliers in large language models (LLMs) and how they affect performance during 8-bit quantization. We will learn how to do effective k-bit compression of pretrained large language models such that we maximize their density of predictive performance per bit. We will also talk about how to do efficient fine-tuning of quantized 4-bit LLMs (QLoRA) and how this helps to build state-of-the-art chatbots.
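As background for the talk, the core idea of k-bit quantization can be sketched with symmetric absmax quantization, one of the basic schemes used in this line of work: scale each tensor (or block) by its largest absolute value, round to a small signed-integer grid, and keep the float scale for dequantization. This is a minimal illustrative sketch, not the exact scheme from the talk or the bitsandbytes library:

```python
def absmax_quantize(xs, bits=8):
    # Symmetric absmax quantization: one float scale per tensor/block,
    # values rounded to the signed integer grid [-qmax, qmax].
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit
    scale = max(abs(x) for x in xs) / qmax
    q = [round(x / scale) for x in xs]
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original floats.
    return [v * scale for v in q]

weights = [0.1, -0.5, 0.25, 1.0]
q, s = absmax_quantize(weights)
approx = dequantize(q, s)
```

Because a single large outlier inflates the scale and coarsens the grid for every other value, emergent outlier features in LLMs are exactly what makes naive 8-bit quantization lossy, which motivates the techniques discussed in the talk.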
Bio:
Tim Dettmers is a graduating PhD student advised by Luke Zettlemoyer at the University of Washington in Seattle. He holds degrees in applied math and computer science and has a background in industrial automation. His primary research goal is to democratize foundation models by making them more efficient and accessible through quantization, sparsification, and building machine learning systems that use consumer-grade hardware. He is the creator of the bitsandbytes library. Tim runs a blog about deep learning, GPUs, and PhD life at timdettmers.com.
--
Stanford MLSys Seminar hosts: Simran Arora, Dan Fu
Twitter: @simran_s_arora, @realdanfu
--
Check out our website for the schedule: mlsys.stanford.edu
Join our mailing list to get weekly updates: groups.google....
#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford