37C3 - What is this? A machine learning model for ants?

3,198 views

media.ccc.de

3 months ago

media.ccc.de/v/37c3-11844-wha...
How to shrink deep learning models, and why you would want to.
This talk gives a brief introduction to deep learning models and the energy they consume for training and inference. We then discuss the methods that currently exist for handling their complexity, and how neural network parameter counts could still grow by orders of magnitude despite the end of Moore's law.
Declared dead numerous times, the hype around deep learning is bigger than ever. With large language models and diffusion models becoming a commodity, we ask how bad their energy consumption really is, what we can do about it, and how it is possible to run cutting-edge language models on off-the-shelf GPUs.
We will look at the various ways people have come up with to rein in deep learning models' hunger for resources, and why we still struggle to keep up with the demands of modern neural network architectures: low-bitwidth integer representation, pruning of redundant connections, using a large network to teach a small one (knowledge distillation), and quickly adapting existing models using low-rank adaptation (LoRA).
This talk aims to give the audience an estimate of the energy modern machine learning models consume, to allow for more informed decisions around their usage and regulation. In the second part, we discuss the most common techniques for running modern architectures on commodity hardware, outside of data centers. Hopefully, deeper insight into these methods will help improve experimentation with and access to deep learning models.
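One of the techniques the description names, low-bitwidth integer representation, can be sketched in a few lines. Below is a minimal, hypothetical example of symmetric per-tensor post-training quantization to int8 using NumPy; the weight matrix is made up for illustration, and real frameworks typically use per-channel scales and calibration data.

```python
import numpy as np

# Hypothetical weight matrix standing in for one layer of a model.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(4, 4)).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 via one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding error is bounded by half the quantization step.
print(q.dtype, float(np.abs(weights - restored).max()))
```

Storing the int8 values plus one float scale cuts memory to roughly a quarter of float32, at the cost of a small, bounded rounding error per weight.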
etrommer
events.ccc.de/congress/2023/h...
#37c3 #SustainabilityClimateJustice

Comments: 9
@eldoprano 3 months ago
Love the Sakamoto reference at 15:57. A nice detail when talking about MoE.
@hackjealousy 3 months ago
Excellent title.
@moccamixer 3 months ago
😂 i wonder how many got it 🤣
@keyworksurfer 3 months ago
@moccamixer Literally everyone; it's an insanely old and mainstream reference.
@jadeaffenjaeger6361 1 month ago
@keyworksurfer Everyone of a certain age... Not entirely sure how much sense it makes to people under 25.
@stuartwilson4960 2 months ago
This is inaccurate: if a company distributes trained weights, it is giving away its model. Inference is much the same thing as training; the only differences are that there is no comparison against expected output and no backpropagation. If you have an LLM you can run inference on, you have an LLM you can train.
@Eunakria 2 months ago
I think they're referring to companies distributing only quantized/pruned weights and keeping the original weights used for training private. It's not to say that you can't train off the quantized/pruned weights; it's a question of the computational feasibility of either. And there's also something to be said about oversized models being easier to train.
@jadeaffenjaeger6361 1 month ago
The concern is that you typically lack the recipe to reproduce the training weights (practical considerations like required compute aside). So the weights are somewhat analogous to a compiled binary, rather than the actual source code for a program. It's a whole lot better than nothing, but means that significant portions of the training process of the foundational model (and, by extension, everything that is derived from it) are opaque to the public. I hope this clarifies the intent of the remark a little bit. (I'm the speaker)