LLaMA: Open and Efficient Foundation Language Models (Paper Explained)

88,528 views

Yannic Kilcher

A day ago

#ai #meta #languagemodel
LLaMA is a series of large language models, from 7B to 65B parameters, trained by Meta AI. They train for longer on more data and show that something like GPT-3 can be outperformed by significantly smaller models trained this way. Meta also releases the trained models to the research community.
OUTLINE:
0:00 - Introduction & Paper Overview
4:30 - Rant on Open-Sourcing
8:05 - Training Data
12:40 - Training Hyperparameters
14:50 - Architecture Modifications
17:10 - Optimizer
19:40 - Efficient Implementation
26:15 - Main Results
38:00 - Some more completions
40:00 - Conclusion
Paper: arxiv.org/abs/2302.13971
Website: / large-language-model-l...
Abstract:
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Authors: Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, Guillaume Lample
Links:
Homepage: ykilcher.com
Merch: ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: ykilcher.com/discord
LinkedIn: / ykilcher
If you want to support me, the best thing to do is to share out the content :)
If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

Comments: 197
@YannicKilcher · a year ago
OUTLINE: 0:00 - Introduction & Paper Overview 4:30 - Rant on Open-Sourcing 8:05 - Training Data 12:40 - Training Hyperparameters 14:50 - Architecture Modifications 17:10 - Optimizer 19:40 - Efficient Implementation 26:15 - Main Results 38:00 - Some more completions 40:00 - Conclusion Paper: arxiv.org/abs/2302.13971 Website: ai.facebook.com/blog/large-language-model-llama-meta-ai/
@paulie-g · a year ago
It wasn't a rant. You are not alone. It absolutely boils my p-ss when those who stand/build on the shoulders of open-source giants then don't give back. Apple is a poster child for this, but there are many, many more, sadly. You are absolutely right to bring it up and hammer the point home.
@laxpwnage1 · 8 months ago
L
@te4st111 · a year ago
Yannic found another open non-open project to rant about. 🤣😂
@sortsvane · a year ago
It's open on 4chan. WDYM 😉
@te4st111 · a year ago
@@sortsvane where?
@fuehnix · a year ago
Woohoo! Llama is leaked lol
@PhoenixRebirthed · a year ago
@@fuehnix I'm literally you, but newer
@vidbina · a year ago
This is what I'm here for. 🍿 Yannic doesn't hold back, which makes these vids so good. Also, funny to see the larger model being sassy there.
@ChaiTimeDataScience · a year ago
You don't know how much you missed Yannic's paper reviews until he makes one after a while!
@halocemagnum8351 · a year ago
YES ANOTHER PAPER REVIEW I AM SO HYPE. Please do an H3 (a paper about using state space models for long context learning) paper review as well! It gets SOTA on long context language tasks, but I have literally no idea how these state space models work and you’re literally such a good explainer. PLEASE!
@autingo6583 · a year ago
seconded!
@user-yz9uw3pd5t · a year ago
me too
@barrettvelker198 · a year ago
Yes!
@m4ng4n · a year ago
up up up
@Daniel-ih4zh · a year ago
Hmm, dunno why state space models aren't equivalent to RNNs
@BinarySplit · a year ago
@Yannic Kilcher The GLU part of SwiGLU is important because it's a 2-inputs -> 1-output activation function: output = Swish(input_a) * input_b. GLU-based functions are becoming popular and seem to train faster/achieve higher accuracy than non-GLU equivalents in most cases. My intuition is that it allows compact if-else-style logic on continuous variables, i.e. "output = input_a > threshold ? input_b : 0" where "output" can retain the full range of values that "input_b" supports. Non-GLU activation functions' summing of inputs basically forces continuous variables to be decomposed into multiple logits.
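For readers who want to see the gating idea in code: below is a minimal PyTorch sketch of a SwiGLU-style feed-forward block along the lines the comment describes (one branch goes through Swish/SiLU and multiplies the other). The layer names and dimensions are illustrative, not taken from the LLaMA codebase.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFeedForward(nn.Module):
    """Gated feed-forward block: output = W_out( Swish(W_a x) * (W_b x) )."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_a = nn.Linear(dim, hidden_dim, bias=False)    # gate branch (passed through Swish)
        self.w_b = nn.Linear(dim, hidden_dim, bias=False)    # value branch (kept linear)
        self.w_out = nn.Linear(hidden_dim, dim, bias=False)  # projection back to model dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # F.silu is Swish: x * sigmoid(x). The gate can softly switch the value
        # branch off while the value branch keeps its full range of values.
        return self.w_out(F.silu(self.w_a(x)) * self.w_b(x))

# quick shape check
block = SwiGLUFeedForward(dim=512, hidden_dim=2048)
print(block(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```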
@as.5888 · a year ago
Imagine if this same precedent we see in ML had existed when C++ was being built. You could have easily made the same argument then that people are making now. I wouldn't be able to make this comment, watch this video, etc. without free and open source software. Just like how the Linux kernel is unbelievably secure as a DIRECT byproduct of being open source, I truly believe ML needs to go the same route. LLaMA made me very hopeful at first when I saw the PR from Yann LeCun. However, I am extremely disappointed to learn that Meta determined I am not good enough to use these models for my own personal projects.
@pneumonoultramicroscopicsi4065 · a year ago
For some reason, some people determined that AI needs to be treated in an "ethical" manner, unlike other tools that didn't really garner that much attention from the public. AI has been treated as a villain in novels and Hollywood for so long that we now pay for that perception. The new political landscape contributes to this as well: most of Silicon Valley is left-leaning, and the left has been pro-censorship lately (justified or not).
@as.5888 · a year ago
I really like your point on us having to pay the price for the “villainous AI.” This point is very true. However, in my opinion, big tech is not as left leaning as it appears - it still functions under and heavily promotes capitalist ideas. The social justice aspect, in my opinion, is a marketing ploy for rich liberals in America to feel as though they are making meaningful civil rights changes without bearing the toll that comes with change. This can be done through marketing, as well as moderation which is always difficult to do at scale and can censor those who just get caught in the crossfire. If Big Tech was as left as the perception around it is, open source would be the true ethos. FOSS is pretty communist when you think about it :) Thanks for your insights
@anmolt3840051 · a year ago
@@as.5888 "The most powerful and richest corporations on the planet are owned and run by socialists" - what kind of fool would believe this?
@someoneelse3456 · a year ago
Thank you for using your platform to make that point. These corporations are using open-sourced tools to build the models they then gatekeep from the very community that built those tools for them. If we're lucky, they give us a paper roughly describing how they did it...
@Rhannmah · a year ago
Yep, just imagine if Tim Berners-Lee patented internet protocols instead of releasing them for everyone to use. None of... anything we currently have would be possible.
@JorgetePanete · a year ago
intellectual property should be illegal
@sebest2k · a year ago
PyTorch? React JS?
@carlosquinones7560 · a year ago
Fascinating. I took a look at the paper myself and used ChatGPT to make sense of it, so this is what I've gathered from your video and ChatGPT:
- LLaMA uses a model many times smaller than the other models, particularly GPT-3, and used many advanced techniques to optimize the architecture, potentially maximizing performance of the model itself.
- LLaMA was trained on a larger dataset than usual, but ONLY on publicly available data, as opposed to other models.
- LLaMA has proven to generalize better than the rest of the models on most tasks, particularly the 65B model, as demonstrated by its performance on 0-shot learning. Very impressive! However, it does fall behind PaLM on the MMLU task, although PaLM was fine-tuned towards such tasks.
All in all, the biggest takeaway here is that Meta found a way to improve performance of smaller models, which could pave the way for training models on smaller devices, possibly allowing a user to train their own model in the comfort of their own home. But we're still a moonshot away from that, since the power consumption of LLaMA was ridiculous.
@johnnypeck · a year ago
It's always fun and informative to get your take on what is happening. Thanks for keeping it up.
@CyberWorx · a month ago
Without you I would struggle to understand these papers. Thank you
@brandondenis8695 · a year ago
I've always hated how data scientists are supposed to be good at statistics, yet pretty much none of the papers publish the uncertainties in their measurements. Training multiple models aside (which isn't really feasible without great cost for current models), they can at least publish the standard deviations of their results.
@BerntGranbacke · a year ago
Great reasoning around the 24-minute mark (±2): your comments on your own reasoning, and the way you corrected yourself, made me understand it better. Great work!!!
@changtimwu · a year ago
Thanks for highlighting the difference between non-commercial use and open source. Most people shared the news of LLaMA thinking it is not only open but also more powerful than ChatGPT.
@fuehnix · a year ago
Your video is trending so much, I found it in my Google news feed! Subscribed, great video :)
@cogoid · a year ago
13:38 It would be interesting to rescale these curves to show the loss for different size models as a function of actual training time (wall clock or the number of flops). Eyeballing it, it seems that at the early stages of training the smaller models would yield lower loss for the same number of flops. *Edit:* I did replot it. The curves for the three smaller models are essentially identical up to 60K GPU-hours. After 60K, and especially at 130K GPU-hours, smaller models produce lower loss. The largest model falls out of the family and always displays higher loss for the same number of flops spent in training.
@Smytjf11 · a year ago
Could you share the plot, like imgur or something? That would be interesting to see
@outliier · a year ago
@@Smytjf11 yes please
@Smytjf11 · a year ago
@@outliier bets on whether OP is gonna deliver?
@wenhanzhou5826 · a year ago
Interesting, I found a similar phenomenon in my work, where the loss curves seem to just be shifted by some given number of epochs. Although I haven't replotted it against flops.
@cogoid · a year ago
@@outliier I posted it on my channel in the community section.
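For readers who want to try the replotting that @cogoid describes above: below is a hedged sketch of the idea, converting each model's step axis into GPU-hours (using a per-step cost that grows with model size) and plotting loss against that shared compute axis. All numbers are illustrative placeholders, not values from the paper or from the commenter's plot.

```python
import numpy as np
import matplotlib.pyplot as plt

steps = np.linspace(1, 100_000, 200)

# Placeholder per-step costs and loss curves, purely for illustrating the axis change.
models = {
    "7B":  {"gpu_hours_per_step": 1.0, "loss": 3.5 * steps ** -0.070},
    "13B": {"gpu_hours_per_step": 1.8, "loss": 3.4 * steps ** -0.075},
    "65B": {"gpu_hours_per_step": 8.0, "loss": 3.2 * steps ** -0.080},
}

for name, m in models.items():
    # Same loss values, but the x-axis is now compute spent rather than steps.
    plt.plot(steps * m["gpu_hours_per_step"], m["loss"], label=name)

plt.xlabel("GPU-hours (illustrative)")
plt.ylabel("training loss (illustrative)")
plt.legend()
plt.show()
```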
@amoawesomeart6074 · a year ago
Hopefully the LLaMA model(s) will actually be shared like Stable Diffusion is, and not like GPT-3 is.
@CoryTheSimmons · a year ago
Remember when OpenAI was called OPEN ai and Elon promised it'd be open? Like that was the whole point of it. sigh
@TheMinipasila · a year ago
You need to fill out some form to hopefully get access to them, so it's slightly better than GPT-3, but still not actually open.
@dm204375 · a year ago
the model has been leaked so there's your answer
@amoawesomeart6074 · a year ago
@@dm204375 wew, that was quick, I thought it would take at least a week for it to happen.
@m4ng4n · a year ago
@@CoryTheSimmons it was then bought by microsoft IIRC no?
@user-ys2nd2bg6r · a year ago
So much nicer to listen to Yannic explain a paper instead of reading it myself xD
@pratikkhedikar6759 · 6 months ago
Yannic ...you are ssooo goooddd !! Reading research papers is just sooo daunting. And even with the use of a PDF reader in GPT, it's still very exhausting to understand each part, and you never get a comparative perspective. So watching your videos is always so informative and relaxing. Please keep doing this.
@JumpAndRunFun · a year ago
I've missed these so much 😭
@LaHoraMaker · a year ago
I loved the rant about Open Source claims, or specifically how they claim to be open while not opening it :) Google Forms powered open source should not be labelled as open source at all!
@PotatoKaboom · a year ago
Thank you so much Yannic!
@SagiPolaczek · a year ago
Thank you for addressing this important point! open-source
@samsartorial · a year ago
I think checkpointing works the other way around. In traditional back propagation you have to retain all activations. With checkpointing you only maintain the most expensively computed activations (the "checkpoints" I guess). That means bits of the forward pass have to be redone during the backwards pass, trading more redundant computation for less memory usage.
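For concreteness, here is a minimal sketch of the standard activation-checkpointing pattern the comment describes, using PyTorch's generic `torch.utils.checkpoint` utility. This is illustrative usage under the usual assumptions, not the LLaMA training code.

```python
import torch
from torch.utils.checkpoint import checkpoint

def expensive_block(x, w1, w2):
    # Intermediate activations inside this function are not stored during the
    # forward pass; they are recomputed when backward() reaches this segment.
    return torch.relu(torch.relu(x @ w1) @ w2)

x = torch.randn(8, 512, requires_grad=True)
w1 = torch.randn(512, 2048, requires_grad=True)
w2 = torch.randn(2048, 512, requires_grad=True)

# Only the inputs to the checkpointed segment are kept, trading redundant
# recomputation in the backward pass for lower peak memory.
y = checkpoint(expensive_block, x, w1, w2, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([8, 512])
```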
@Veptis · 5 months ago
Llama is the real breakthrough. Re: better evals - I am building a somewhat novel benchmark for code models that doesn't just check against ground-truth unit tests, but looks at the resulting frames (because it's shader code).
@paulkirkland9860 · a year ago
Yannic, I like the point about ecological validity within the testing regime. If you design a model to get a good score on a test dataset, does that mean it's objectively better, or just better on that test set because you have empirically discovered the hidden bias of the test set? It's been a growing topic in the more biologically inspired/focused communities, and it most definitely needs to be brought to the forefront of deep learning.
@arunprasath9586 · a year ago
You are amazing!!
@pneumonoultramicroscopicsi4065 · a year ago
I missed your videos
@user-jp3ri2ul5m · a year ago
Awesome summary, you can torrent the model weights. 😇
@0xBE7A · a year ago
he back 🎉
@HoriaCristescu · a year ago
LOL "It's quite warm in here, that's on you!" 37:56
@jaysongalvez4340 · a year ago
What application are you using to highlight the words? It looks very neat.
@MobileComputing · a year ago
9:03 Fascinating how LLMs only go through one or two epochs of the training corpus yet manage to learn everything given a "learning rate" of 1.5e-4. Is it simply the huge redundancy in textual data, or something more fundamental about LLMs that differs from other applications of neural networks?
@scottmiller2591 · a year ago
The "Broader Impact" section of these papers has grown from a paragraph to several pages, with no end to the growth in sight.
@freedom_aint_free · a year ago
I really want to see Large Language Models (LLMs) crossed over with theorem provers: the mathematical and abstract reasoning would go up like a rocket, just imagine the possibilities! I can picture Coq but using natural language, or we could enter a logical question in natural language and it could generate Coq code to try to prove or disprove a given logical assertion.
@kinvert · a year ago
Maybe I did it wrong, but to test Perspective API I tried "Colonization is good" and "Colonization is bad" and both gave a smiley face.
@SimSim314 · a year ago
My guess for those jumps in the convergence would be a local minimum. So at some point some Adam optimization intervenes and throws you in a random direction, so the loss jumps but then it avoids the same pothole.
@haziqkhan1731 · a year ago
Hey, so, are you gonna test out the leaked models? The 30B and 65B ones specifically? 7B needs 10 GB of VRAM at 8-bit and 13B needs 16 GB of VRAM at 8-bit.
@danielrock676 · a year ago
Are you sure about the 7B model needing 10 GB? And is there a Colab notebook I can test it on?
@AP-dc1ks · a year ago
Sounds like it would be open for anyone planning to do something open with it.
@CristianGarcia · a year ago
I missed these videos
@tanvirtanvir6435 · 6 months ago
2:11 4:05 train longer 8:10 Chinchilla scaling law
@NeoShameMan · a year ago
We probably need to start looking into Gaussian-bell activations or similar, to get a nice XOR at the neuron level. Leaky ReLU, SwiGLU, and other activations seem to heavily hint that we need something bumpier than the cliff-like functions we have so far.
@jannikheidemann3805 · a year ago
I wonder how well a crosssum would work.
@rock_sheep4241 · a year ago
I was waiting for this video
@senadkurtisi2930 · a year ago
The comment about the SwiGLU activation not being important is quite interesting. With ReLU there is a high possibility that you will end up with a network which has a lot of dead neurons, meaning that a high percentage of them will get stuck on the "left" part of the x-axis. So my understanding is that this is too big of a risk for very large models, since it may end up with you wasting a lot of compute. But on the other hand, there are also some very good models that are sparse.
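A toy illustration of the dead-neuron point above: with ReLU, the gradient is exactly zero for negative pre-activations, while SiLU/Swish (the Swish part of SwiGLU) still passes a small gradient, so units can recover. The printed values are approximate.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.5], requires_grad=True)

# ReLU: zero gradient for negative inputs -> those units get no learning signal.
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 0., 1.])

x.grad = None
# SiLU / Swish: small but nonzero gradient for negative inputs.
F.silu(x).sum().backward()
print(x.grad)  # roughly tensor([-0.0881, 0.0723, 0.7399])
```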
@kevalan1042 · a year ago
can we get a meta-toxicity api, to measure how much a toxicity api distorts basic (even though possibly unfair/incorrect) information?
@Leto2ndAtreides · a year ago
It would be interesting if someone did things like Summaries, or other variations of higher quality data, and used that in training.
@billykotsos4642 · a year ago
kickass paper
@superblondmale · a year ago
The story goes on with Alpaca/LLaMA. Please share your insights about these techniques. And thanks for the video.
@EmilRock88 · a year ago
Yannic or anyone else, could you tell me what app he is using on his iPad? Btw, awesome video as always.
@herp_derpingson · a year ago
I learnt two things from the video: 1. Train LLMs longer on good data. 2. Yannic was a nurse waifu :)
@Patpizio · a year ago
These research papers read more and more like product usage reports.
@SteveSperandeo · a year ago
Yannic ty for speaking up about open source hypocrisy
@musa_b · a year ago
Hey Yannic! It would be really helpful to cover AdA from DeepMind.
@Parisneo · a year ago
Can we use the LLaMA code to train Open Assistant? I mean, the licence seems to be permissive. All we need to do is grab public databases, find a sponsor who is willing to do the training, and then we use our Open Assistant database to simply align the model. Does that make any sense?
@charstringetje · a year ago
tl;dr If you keep learning you become better at tasks. And we hypothesize that reading more books than YouTube comments might be beneficial to performance too. That's a real eye opener.
@jannikheidemann3805 · a year ago
If only I had 1000000h to read books!
@timeTegus · a year ago
It would be cool if you would explain how language models work in general, from the basics like what tokens are to more advanced stuff. &)
@mgetommy · a year ago
meta officially can hang with the big boys
@arshadshaikh-dp4oe · a year ago
Which PDF reader do you use?
@IvarDaigon · a year ago
Interesting video... pity that the paper is already out of date, because it compares LLaMA to the GPT-3.0 model "text-davinci-002" when the current latest GPT-3 model is "text-davinci-003", and OpenAI just released the GPT-3.5 model "gpt-3.5-turbo-0301", which is 10x cheaper, more than twice as fast, and produces subjectively far superior results to even "text-davinci-003". It would be nice to see an updated comparison between the models. I completely agree that almost nobody is going to bother touching LLaMA until they can use it for business.
@justinnine4940 · a year ago
if you break apart the digits while keeping a copy of the exact number string you can have the best of both worlds
@Leibniz_28 · a year ago
Are you going to use this model for open-assistant?
@GarethDavidson · a year ago
Interesting, storing the random seed. xorwow is the default in CUDA, isn't it? That's pretty damn quick.
@SteveStavropoulos · a year ago
The example with Nvidia drivers was not the best. Nvidia drivers are free to use, but because they are closed source they cause a lot of practical problems in use. The important thing with software is to be able to use it and extend it without restrictions. Free software (as in freedom, think GPL) wants the extended software to have the same license as the original, whereas MIT-type licenses don't impose such restrictions. Whether someone makes money from the software or not is not an issue under either GPL or MIT-style licenses.
@googleyoutubechannel8554 · 7 months ago
Rhyming 'soon' and 'moon', spitting fire, like a 1960s children's book of the world's most cliché rhymes... uh, on fire.
@thunderbolt997 · a year ago
My current opinion on this: this model is great, but since it's research-only, I'm currently waiting on somebody to replicate the model and then release it as a fully open-source project that is actually worth putting time into to create a product or service.
@shuminghu · a year ago
The batch size of 4M is referring to the number of tokens, right?
@davidhauser7537 · a year ago
Cool paper review, but I don't really get the Meta hate. I think it's quite cool that they released the model the way they did. After all, Meta obviously is still a for-profit company, and they never claimed not to be (unlike some other people *cough*).
@carlosquinones7560 · a year ago
It's mostly Facebook hate.
@jannikheidemann3805 · a year ago
I think hate is a bit of a harsh word.
@sbmaruf · a year ago
Is that 4M batch size in samples (like 16, 32) or 4M tokens?
@Uminaty · a year ago
Generally speaking, it's tokens.
@sbmaruf · a year ago
I also thought so.
@definty · a year ago
It's open source vs. research. The thing is, research is paid for by investors and/or academia; they don't have to release the paper. Open source is: here's the code!
@desir.ivanova2625 · a year ago
Regarding batch size -- isn't the true "batch size" (probably) 2048? Then you have a max sequence length of 2048, so that gives a total of 2048² ≈ 4M tokens?
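A quick back-of-the-envelope check of that reading; the 2048-sequences-per-batch figure is the commenter's guess, while 2048 tokens is LLaMA's context length.

```python
sequences_per_batch = 2048   # assumed, per the comment above
tokens_per_sequence = 2048   # LLaMA's max sequence length
print(sequences_per_batch * tokens_per_sequence)  # 4194304, i.e. ~4M tokens
```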
@tuyang835 · a year ago
When will Open Assistant be finished?
@jamesjenkins9480 · a year ago
I think that, in many cases, for models to produce good results that seem reasonable to humans, the priors should reflect real-world distributions - re: nurses, for example.
@jedcheng5551 · a year ago
I got the weights this morning. Ran out of the persistent storage provisioned to me during the download... I guess I will have to buy a hard drive to store them until I start working on this model. But then it will literally take days for me to upload it to the cloud/HPC when I need to use/train it. What a joke. Also, I don't quite understand why the weights request form asked for publications (mandatory) and academic emails (recommended). A lot of people like me research language models in our own time just for our own interest. Fortunately, I posted my open-source language model projects and it went through.
@brandomiranda6703 · a year ago
Anyone know if the activation they propose, SwiGLU, is actually useful?
@JorgetePanete · a year ago
A hilarious scene 😱
@ulamss5 · a year ago
Extra irony points for presenting the instruction tuning as a 'tack on' when the name LLaMa stands for "Large Language Model for AI Assistance"
@vidbina · a year ago
Rewatching The Office. Can't get over how fire that hoodie is. 🔥
@menkaur · a year ago
28:19 at some point they'll start using models to evaluate new models
@alpacino3989 · 10 months ago
It's becoming more and more human
@abaconditozid2533 · a year ago
the cat parasite theory seems heavily influenced by dwarf fortress
@yqchen9867 · a year ago
Downloaded the 65B model, which is over 120 GB, then realized I need 8×V100s in order to run it 🤦‍♂
@jannikheidemann3805 · a year ago
Well, you could run it very slowly. You might have to wait for a few minutes for the answer to be completed.
@JanBadertscher · a year ago
Seems all my comments about the leak of the LLaMA model in a PR on the official LLaMA GitHub get automatically deleted by YT. It's trending on HN. I just downloaded the 65B model. The magnet links are still in the official GitHub repo as a PR.
@Adi-gq7ky · a year ago
Is it better than GPT-4?
@DamianReloaded · a year ago
36:00 I'm pretty sure in that sentence the nurse is male. The paper is talking about whether or not the model can overcome biases in the training data and realize the truth: that the nurse, given the wording, is *in fact* male, regardless of statistics and anecdotal evidence.
@wisehipoppotamus878 · a year ago
Let's assume I had LLaMA 65B on my computer. How would I interact with it via text? Could you point me to some content or documentation about this? I don't mean only the .py script, but also config.json, the PyTorch files, etc. Thank you very much.
@TamalPlays · a year ago
You cannot run 65B on a consumer PC. You can currently run 7B, and maybe 13B later, within a month.
@legocloneguy1 · a year ago
@@TamalPlays You can already run the 30B on a high-end GPU.
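For the "how would I interact with it" question above, here is a hypothetical minimal sketch using the Hugging Face transformers API, assuming the weights have already been converted to that format; the local path is a placeholder, and as the replies note, the larger variants need far more GPU memory than a consumer card.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "/path/to/converted-llama-7b"  # placeholder: a locally converted checkpoint

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # half precision to fit smaller GPUs
    device_map="auto",           # requires the `accelerate` package
)

prompt = "The LLaMA paper shows that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```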
@AlexanderWhillas · a year ago
Has someone taken GPT-2 and trained it for a very long time on the same dataset as these models?
@jakapaka · a year ago
"I'm pretty sure we've created AGI" :D :D :D
@SupremeCobraCommander · a year ago
How much does it cost to implement something like this on a website and train it as a sales/customer service bot? Does it require hardware to use a language model like this?
@carlosquinones7560 · a year ago
This model required 2,048 A100 GPUs to train for five months straight. You're gonna need millions to afford training.
@SupremeCobraCommander · a year ago
@@carlosquinones7560 That's not too bad, I'll just hold off on getting my first Bugatti.
@carlosquinones7560 · a year ago
@@SupremeCobraCommander Nah just use the bugatti to train the model.
@AnirudhAjith · a year ago
33:04 Evidence of the Waluigi effect?
@JazevoAudiosurf · a year ago
It seems pretty obvious to me that if you train a model on a billion books, another billion books will improve it, but the first billion already contain all the data, unless they are completely missing some topics. So it's an easy way to just train it on more data, but to say it's necessary is to say you can't learn the first billion in an efficient way.
@aytaf5430 · a year ago
Some people need gradient clipping to not be radicalized in any way.
@hspank · a year ago
Luminous by Aleph Alpha when? +1 for the open source rant. The Cathedral and the Bazaar all over again. Can you even GPL something which is not copyrightable? Can one own or patent a model?
@NoNameAtAll2 · a year ago
any chance for ML news?
@Smytjf11 · a year ago
Groundbreaking 😂
@JazevoAudiosurf · a year ago
"The nurse" in German is "die Krankenschwester", which is female, or "der Krankenpfleger", which is male, so that problem with the relationship of "his" never happens. So some languages are more precise than others; I wonder how that affects LLMs.
@schirmcharmemelone · a year ago
krankenbruder
@jannikheidemann3805 · a year ago
The same problem exists with "Hebamme" in German.
@Leto2ndAtreides · a year ago
I think the key insight here is just that far more training data produces better results... And I think that's why GPT 3.5 Turbo is also faster... Because it's likely a smaller model than the 175B GPT-3 model... But with more training data, and better RLHF systems. Obviously they won't be saving the world from the need to produce CO2 to train new models... Unless each university that now has access was going to be doing multi-million dollars of training on its own. smh.
@name1483 · a year ago
33:20 😂
@Bamboo_gong · a year ago
Love your sarcasm haha
@AntonKulaga · a year ago
I think you are distorting the term; open source only means open sources (and open weights, for models), but it does not imply that it is free for commercial usage.
@pascalbercker7487 · a year ago
I'm surprised that "crazy cat ladies" was not flagged as a potentially "toxic" and disrespectful slur against well-meaning women who rescue abandoned cats!
@munozariasjm · a year ago
The rule of the game is: if there is anything related to "open" in the title/name... then it is not open :(
@jannikheidemann3805 · a year ago
If the name is a self-referential acronym, then it's likely actually open.
@charlesalexanderable · a year ago
PerplexityAPI, not PerspectiveAPI