ChatGLM: The ChatGPT killer? Checking out ChatGLM6B

102,170 views

sentdex

A day ago

Exploring the concept of a GLM (General Language Model) and working with ChatGLM6B.
Original GLM paper: arxiv.org/abs/...
GLM130B paper: arxiv.org/abs/...
ChatGLM6B demo: huggingface.co...
Neural Networks from Scratch book: nnfs.io
Channel membership: / @sentdex
Discord: / discord
Reddit: / sentdex
Support the content: pythonprogramm...
Twitter: / sentdex
Instagram: / sentdex
Facebook: / pythonprogramming.net
Twitch: / sentdex

Comments: 258
@avi7278 · 1 year ago
God you are a breath of fresh air... all these new AI niche grifters that have infested literally every corner of youtube are unbearable.
@Melvin420x12 · 1 year ago
That's why you should avoid clickbaity titles and thumbnails: people talking about GPT-5 or making outrageous claims. I look for difficult words and model names in the title and thumbnail; those are the high-quality videos.
@avi7278 · 1 year ago
@@Melvin420x12 right, but with the algo constantly pushing them, even if I say don't recommend this channel, there are two new grifters every day. "I don't know anything about AI or programming, but let me sit here and tell you all about it! Derp. Did you see Auto-GPT? Derp. Derp. Derp. Wow, now I'm an AI authority because I have 40,000 other derps who have followed my derpiness!" /r
@tonywhite4476 · 1 year ago
That's bc everyone is blowing smoke up each other's arses and no one is calling this crap out. I'll probably get a nasty response just for saying that, but I've paid my dues to rant. I started learning development through YouTube and I've spent hundreds on BS. So yeah, as long as we're buying BS, they're selling BS. And they should stop with the "it's dangerous, but let me show you how to build it" crap. Sentdex is cool, but I've never seen him use a clickbait title before. Until now.
@tkdevlop · 1 year ago
Well, it's their livelihood.
@iconelias508 · 1 year ago
I thought I was the only one getting these recommendations.
@mytechnotalent · 1 year ago
It is mind-numbing how cheap the GLM is compared to GPT3. Can you imagine where we are going to be in just another few months? Great one Harrison as always!
@TheBlackClockOfTime · 1 year ago
Literally nobody can imagine at this point.
@ChaiTimeDataScience · 1 year ago
I love how Harrison stands in videos when he's teaching us via code and sits down when it's a video about a concept or project!
@jeffreycooper6163 · 1 year ago
From basic python to self driving tech - now LLMs. Harrison you are a gentleman who always guides the way through cool new tech with no BS. Thanks for this video!
@user-yj3mf1dk7b · 1 year ago
- are you better than chatGPT? - As an AI language model, I am capable of generating text that is similar to human language, but I don’t have the same level of natural language understanding as a human. So, in terms of generating text, I am similar to ChatGPT in that respect. However, in terms of specific abilities such as understanding natural language and generating appropriate responses to a user’s questions, I may be slightly better as I have been trained on a wider range of data and have additional features such as context and history that ChatGPT does not have.
@foxshot97 · 1 year ago
this is the response i got: "As an AI language model, I’m designed to模仿人类的语言交流能力, but I’m not quite as powerful as ChatGPT, the open-source machine learning model developed by Facebook AI Research. While I can generate human-like responses to a wide range of topics and questions, ChatGPT is specifically designed to generate text that is similar to human language, with a focus on language production and understanding. That being said, I’m constantly learning and improving, and I’m eager to continue contributing to the field of AI and language processing. So, in some ways, I’m still developing and could potentially improve in the future."
@PMX · 1 year ago
Gave it a test, it used a Chinese term in a response for some reason. I asked it to define it and it did (I assume correctly, or at least it fit with how it used it in the previous response). Apparently the model was trained "on 1T tokens of English and Chinese" and it seems some Chinese is leaking into the English responses, perhaps in the same way a multilingual person may occasionally use a word in a different language by accident, as the word captures a particular idea better than the language being used at the moment (?).
@HelloThere65272 · 1 year ago
I literally cannot bear any "AI news" these days because it's always slick dudes who obviously have no idea what they're talking about and just want to grift on the next thing (basically the crypto bros from 9 months ago). I hide AI videos by default with the "not interested" button, but I clicked on this one immediately. Thank you Sentdex!
@exponent42 · 1 year ago
yes
@serta5727 · 1 year ago
Open Assistant is about to drop a new open-source ChatGPT soon.
@sentdex · 1 year ago
Can't wait to check it out, been following them closely. I wonder how it will compare.
@freestyletow1 · 1 year ago
Apr 15
@mikerhinos · 1 year ago
I commented the same thing on Twitter :D Open Assistant could have the same effect Stable Diffusion had on image generation; Midjourney wouldn't be where it is now without SD.
@dan110024 · 1 year ago
One of the few channels that has mostly dialogue and no shitty b-roll stock footage that nobody cares about. I always know I'm gonna get a sensible yet intriguing take on something.
@BinarySplit · 1 year ago
Thanks for sharing! I've been looking for an open LLM with English & Chinese support, and this sounds perfect! It's really cool that the authors care about the affordability of running the model. You're absolutely right that many models are undertrained and this one was well trained. That's definitely a big factor. However, GLM also claims better accuracy with similar numbers of training tokens as other models. Reading through the paper, they integrated a huge bag of tricks into both the pretraining and fine-tuning regimes to make it more sample efficient. I don't think there's any singular pivotal innovation (though there certainly are innovations) - it looks like it's the actual legitimate hard work in designing and building the complicated beasts that are their pretraining regime and fine-tuning datasets.
@rumidom · 1 year ago
Have you seen the Open Assistant effort to create an open RLHF dataset? Hopefully that could also result in ChatGPT-like smaller models that can be run locally.
@mlcat · 1 year ago
I tried a few models for assistance, mainly LLaMA-based fine-tunes. For some reason the Hugging Face variant of LLaMA works worse than the original FB one (maybe that's a Mac-only issue), but all the LoRAs only work with the HF one. I noticed that LoRAs like gpt4all and alpaca-lora were trained either on the Alpaca dataset or on ChatGPT outputs, and give censored answers to obviously provocative questions. One model that still acts like an assistant but does not censor outputs is LLaMA-Adapter; it's less accurate in its answers, but it gives a better impression of the capabilities of untouched models. Thank you for this video, will also try this model later!
@jessemair3684 · 1 year ago
Thank you for finding these and sharing them with us. Years ago you got many of us excited about AGI chatbots. Love your vids, thank you!
@sgaseretto · 1 year ago
You should try RWKV and its fine-tuned Raven models; it works as an RNN but is trained like a Transformer. The model is gaining a lot of momentum now.
@mysticaltech · 1 year ago
Man, the server racks behind you, so dope! Thanks for sharing about this valuable model.
@ds920 · 1 year ago
Thank you, sir! A breath of fresh air🎉
@SpaghettiRealm · 1 year ago
Thank you, your content is great as always
@list1726 · 1 year ago
Thank you for posting. Keep up the good work
@TheRealMcNuggs · 1 year ago
Will you ever continue the "Neural Networks from Scratch" series? I was really looking forward to backpropagation.
@rikvermeer1325 · 1 year ago
You are really good at explaining this; subscribed!
@Crayphor · 1 year ago
That Gaussian activation function is interesting! I'll have to look into exactly what it is and why they chose it. If it's based on a Gaussian distribution, it could have interesting properties, since a sum of Gaussian variables is still Gaussian. Maybe even the model output would be Gaussian.
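For the curious: the activation in question appears to be GELU-based (the GLM-130B paper describes using GeGLU, a GELU-gated linear unit). A minimal sketch of the common tanh approximation of GELU, assuming that's the function meant here:

```python
import math

def gelu(x: float) -> float:
    """Approximate GELU: x * Phi(x), where Phi is the standard normal CDF,
    via the widely used tanh approximation."""
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def geglu(x: float, gate: float) -> float:
    """GeGLU multiplies one half of a projection by GELU of the other half
    (tensor shapes elided; scalars used for clarity)."""
    return gelu(x) * gate

# GELU is smooth: near zero for large negative inputs, close to identity
# for large positive inputs, and slightly negative in between -- unlike
# ReLU it is not exactly zero on the negative side.
print(gelu(-3.0), gelu(0.0), gelu(3.0))
```

Note that GELU just weights its input by the standard normal CDF; its output isn't itself Gaussian-distributed, so the sum-of-Gaussians property doesn't carry over directly.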
@perc-ai · 1 year ago
This guy doesn't know anything about Gaussians lol
@jadzzzz · 1 year ago
I wonder why there aren't distributed GPU compute solutions à la Foldit. In fact, why not create something like Render Token that rewards users for their GPU time, and lets people in the network either donate their computing power to a project or earn tokens so it can continue training models? A truly democratized community effort. That being said, nothing would stop the bot uprising in that case.
@ablasttv · 1 year ago
A bit difficult, I think, at least for any entity that wants to keep the full model proprietary. They'd have to distribute it in pieces among thousands of nodes that each process only part of the solution, host their own input server to split the job, and wait for some jobs to return before sending out other pieces of work, particularly for fully connected layers. It's doable, but it would still cost millions to train these large models, whether they're paying a datacenter or paying through some crypto token. And any entity with that kind of money is going to expect a return on its investment.
@codechimps · 1 year ago
This is such a solid video; we need more like it in AI today.
@ander300 · 11 months ago
Part 10 of Neural Net from Scratch, about analytical derivatives??? Please bring the series back!
@Tepalus · 1 year ago
"It's fast and small, and it sometimes gets lost with too much dialogue." - My Ex lol
@alexjenkins8026 · 1 year ago
Great find and summary! Love your work. I'd love to see you use it! Are you able to run through a Python code example of how to train any of these models (say a Hugging Face LLM) for a different purpose? I hear about it a lot (and you mentioned it as well), but I've been struggling to repurpose them to do other things... say, a paragraph of text as input and a 'score' out (maybe relevance to a topic, sentiment, or even how appropriate it would be to say at work). I've even tried taking the output of these models pre-text and feeding that into another model that I train; that way I don't have to fine-tune the big model at all (and I get the huge advantage of only having to feed my training data through the big model once, save the output, and then train my small model). But it seems I'm unable to get the big models to be consistent at anything; it's like they're not 'exploiting' enough, and sending through the same inputs changes the outputs dramatically :/ Keep it up! :)
@JC.72 · 1 year ago
It seems like the 130B model is also available, but you need to send in a request form and wait for them to approve it through email. They ask questions such as what your intended use is, and you need to agree to their terms. I wonder how much better the 130B model is, if there is any metric such as a % difference.
@HellTriX · 1 year ago
I was thinking the same thing: it would be nice to be able to download and test some of these large models within the constraints of consumer hardware.
@Shaunmcdonogh-shaunsurfing · 1 year ago
Thank goodness for this channel!
@lijostar167 · 1 year ago
Have you tried the Vicuna model yet? It's supposed to be '92%' as good as ChatGPT. How would you say it compares to ChatGLM?
@IronMechanic7110 · 1 year ago
I have tried Vicuna 7B on CPU and it's impressive!
@barny1174 · 1 year ago
Thank you for sharing once again 👍
@1PercentPure · 1 year ago
thank you so much for sharing!
@nathanbanks2354 · 1 year ago
This is really neat! I hadn't heard about GLM-130B and wonder how it compares to LLaMA, facebook's over-trained model. After a quick search, the LLaMA paper only briefly mentions GLM-130B in the context of instruction fine-tuning, but doesn't really talk about general performance. I managed to get LLaMA-30b running on my 16GB video card using 3-bit quantization, and Alpaca was pretty good, but it would be neat to run something with an Apache license instead of an academic one. Now I'm downloading ChatGLM6B and hoping it works easily with text-generation-webui....
@4.0.4 · 1 year ago
3-bit quantization? What do you run it on?
@nathanbanks2354 · 1 year ago
@@4.0.4 A Dell Precision 7720 with a mobile P5000 MXM graphics card. It's a few years old, but it has 16GB, which is enough to have some fun. It can also just barely transcribe in real time using the largest Whisper model.
@Harsh-un2pt · 1 year ago
@@nathanbanks2354 impressive!!
@gwky · 1 year ago
LLMs seem hype until you start chatting with a bunch of them and realize they can't keep a conversation going for more than a few lines... even story-writing models start hallucinating. But I'm genuinely amazed that so much can be generated from just a few prompts, and while the "memory" is usually 2048 tokens, as hardware gets better we might see actual storytelling ability, multiple characters, and entire worlds that users can interact with, be it in text adventures like Zork or via AI-generated novels. At the moment I've tried every tool out there, and the most popular 6B/13B/20B models. It's still garbage after a few prompts, but sometimes it does remember the context and leaves me genuinely impressed. It's like an abusive relationship: most of the time it's crap, but you learn to cherish the good moments.
@Saerthen · 1 year ago
It's possible to use summarization to make the chatbot "remember" what was discussed before (prompt: "summarize the dialogue from the beginning to this point"), it can help to keep the dialogue for a longer time, but if you want "long memory", you need some kind of database (like advanced storytelling software does). Also, I'm not sure if "long memory" would be a benefit or a problem.
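The rolling-summary trick described above can be sketched as a small wrapper. `summarize` here is a placeholder for whatever LLM call you'd use (e.g. prompting the model to "summarize the dialogue from the beginning to this point"); this is a hypothetical sketch, not any specific library's API:

```python
class SummaryMemory:
    """Keep a short window of recent turns plus a running summary;
    when the window overflows, fold the oldest turns into the summary."""

    def __init__(self, summarize, max_turns: int = 6):
        self.summarize = summarize        # callable str -> str (an LLM call in practice)
        self.max_turns = max_turns
        self.summary = ""
        self.turns: list[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        if len(self.turns) > self.max_turns:
            # Fold the overflowing oldest turns into the running summary.
            overflow = self.turns[:-self.max_turns]
            self.turns = self.turns[-self.max_turns:]
            self.summary = self.summarize(
                self.summary + "\n" + "\n".join(overflow))

    def prompt_prefix(self) -> str:
        """What you'd prepend to the next model call."""
        return f"Summary so far: {self.summary}\n" + "\n".join(self.turns)
```

With a real model, `summarize` is another generation call; for testing, any text-shortening function works. As the comment notes, this fakes longer memory but loses detail with each fold.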
@gwky · 1 year ago
@@Saerthen It doesn't work. Long-term memory and key-specific memory already exist; they work by injecting details into the prompt to "remind" the chatbot, but unless your question is to "define" something, it really doesn't do much more than that. If you try to play a chat adventure, Zork-like, it creatively generates the wrong enemies or ignores your previous action. Furthermore, loading LLMs with 2048 tokens of context isn't easy even on modern hardware, and while models with more parameters (going from 6B to 13B or 20B) sometimes generate more context-relevant responses, they still hallucinate plenty, and the more parameters a model has, the longer it takes to load and to generate responses. In short, these ideas have already been explored; there are even commercial SaaS novel-writing AI solutions and text-adventure ones, but they exhibit the same problems. It's just the limitation of the current tech.
@larion2336 · 1 year ago
That's kind of my experience as well, as someone mostly interested in story writing. I've used GPT4All and various versions of Alpaca. I found Alpaca better, but it still had major issues with memory. That said, in some ways it was also noticeably better in its contextual knowledge than older open-source models like Fairseq/NeoX, e.g. recognizing characters and popular stories and their events by default. But they really need better interfaces; for that I am waiting for NovelAI to train a new model on the new H100 cluster they got access to. That will be hype.
@Saerthen · 1 year ago
@@larion2336 LLaMA was initially trained with a 2048-token context. As far as I know, you can't fine-tune the model to use a larger context; you'd have to train it from scratch. And right now, no open-source models are trained that way.
@gwky · 1 year ago
@@larion2336 I wouldn't get too excited about it; LLMs are just guess engines. I was hoping more parameters and a larger token context would fix it, but the randomness is hard to predict. At the moment most models use 2048 tokens for compute simplicity and ease of training, similar to what Ivan was mentioning. I rented cloud VMs, and you can train a larger model for a few hundred dollars; I used a Google Cloud VM with 80GB A100 cards. Sadly, even if you hand-pick the data (say I wanted xianxia light-novel generation, so I trained the model on that), because it simply tries to predict the best conclusion to your prompt, it generates at best pseudo-random nonsense. Like you mentioned, sometimes it seems to write with the relevant character, scene, or context in mind; then it breaks that momentary immersion with a leap elsewhere, because it presumably finds a more mathematically fitting conclusion to your prompt or to whatever it wrote. But maybe NovelAI will do something beyond my expectations; it'd really be great to have a novel-writing AI that generates stories on demand. Similarly, using it for text adventures or DnD would likewise be great. Still, I always remind myself that this is the worst AI will ever be. It only gets better from here.
@fnegnilr10 · 1 year ago
Wow, this talk is fantastic! Have you seen Mr. P Solver's video on using a GPU with torch to speed up calcs 250x over NumPy?
@MikeClarkeARVR · 1 year ago
Thanks for the video. I just tried the model... not very good for the use cases I experimented with. Perhaps others can recommend other use cases.
@happywednesday6741 · 1 year ago
Sentdex, can you point me to a place on your channel, a playlist, or somewhere else where I can brush up on the ML behind these large language models and recent generative AI tools?
@jaysonp9426 · 1 year ago
I asked it "which world leader looks the most like Winnie the Pooh." I'm dead now.
@skaltura · 1 year ago
I wonder about the training side: can we further train these tiny models, and if so, how? With what hardware and software can we fine-tune them for very specific, narrow tasks?
@shawnfromportland · 1 year ago
really great channel, man!
@distortedjams · 1 year ago
I'm surprised researchers release these advanced models to the public for free. I'm sure that going forward everything will be proprietary, as OpenAI seems to be moving towards.
@sentdex · 1 year ago
No one really knows what to do with these models on a grand scale yet. Instead, they're relying on users/devs to invent uses. The main issue right now is that even the "best" model today will be replaced tomorrow, so there's no sense in closing off entirely, IMO. I think the planned model for most companies is to be the "platform"; I think this is OAI's objective too. They'll continue selling access to their models, and people will build/sell products on top.
@Tri-Technology · 1 year ago
@@sentdex I'm wondering about "no one really knows what to do with these models". I mean, isn't replacing Google Search a big and nice target? You could build a little companion application that helps the user with almost every text-based information need.
@IronMechanic7110 · 1 year ago
@@sentdex Can you try vicuna-7b? It's very impressive for a small model.
@theoistic7002 · 1 year ago
To run, yes... but you can't expect to train the model on an RTX 20xx series; you need at least 16-20 GB of VRAM to train the non-quantized model, because after you quantize it you kind of lock the model. At least that's the rule for LLaMA, which, although different, follows the same quantization rule.
@lutzsommer4708 · 1 year ago
What about the LLaMA or Alpaca models? How do they compare to ChatGLM6B?
@sentdex · 1 year ago
I've only used the base LLaMA; I couldn't find a quality chat variant. Can you link to any specific project page or weights that I can download and try? One issue I keep finding with many of these models is just... ease of use out of the box.
@djklfsdjklfjlif · 1 year ago
@@sentdex Almost all of the recent exciting ones are built from the leaked LLaMA which is why getting ahold of them isn't quite as easy. Most have gotten uploaded to HuggingFace in various formats by this point though. LLaMA of course, then the various transformations of it like 4bit. Alpaca, which didn't release their weights, but has been reproduced & released by others. Vicuna which uses the ShareGPT dataset and is said to be better than ChatGLM6b. And gpt4-x-alpaca which looks promising. The downside of course is licensing. Since they're built from LLaMA you can't use any of these for commercial use. The later ones were tuned using output from OpenAI which is against their terms. All of these can run on consumer hardware. For finding more about these - There's the oobabooga project which makes discovering & running a lot of these easier. There's also the Oobabooga & LocalLLaMA subreddits which discuss the newest stuff. Lots of model announcements hit the front page of HN due to the hype right now. There's also the Open Assistant work which looks promising. Don't know much about it yet but worth looking into.
@eh_channel3 · 1 year ago
@@sentdex There is a Vicuna model that was released a few days ago; it gives me much better results compared to LLaMA.
@lutzsommer4708 · 1 year ago
@@sentdex I'm sorry I can't reply with anything useful; my comment keeps getting deleted... I have no clue why.
@sentdex · 1 year ago
@@lutzsommer4708 feel free to email me harrison@pythonprogramming.net
@yeetdeets · 1 year ago
I'd be really interested to see your take on the recent vector DB boom, particularly how it relates to AI, because I don't get it. Why is a vector DB better than a relational DB for AI?
@leptir1 · 1 year ago
The Access to Information for Daily Sensibilities Law of 2023 (AIDS, 2023) states that every question must now be posed to AI before a human, since you were going to ask the human the same way you would ask the AI, and the AI had a chance of knowing. It's the new "LMGTFY"
@Crayphor · 1 year ago
I hadn't heard of vector DBs before, but I would assume the vectors in question are static embeddings attached to the data. The reason this would improve on a relational DB is that it should capture more complex, continuous relationships between data rather than discrete, human-generated relationships. It would be similar to the improvement of using NNs as generalized feature extractors over having an expert come up with features. (Reiterating that this is just my guess based on the name.)
@yeetdeets · 1 year ago
@@Crayphor As I understood it, "vector" is a bit of a misnomer; it's more like every entry has a coordinate in a Cartesian space, and the relationships are simply distances in the shared space. I just don't see how it would be better than a relational DB as memory for an AI or AGI. I imagine you mostly want specific recall rather than general recall: the model adds generality and the database adds specificity, at least that's how I'd think of it. Maybe it's beneficial to have some sort of general memory to abstract lessons from repeated experiences, which is how we learn. But I'd argue that's mostly a workaround for our inability to generate experiences. AGI wouldn't have that problem: it could simply take all its memories and retrain its logic on all of them at the same time, with fewer biases, of which we have many due to the architecture of our intelligence.
@davidw8668 · 1 year ago
They let you search by similarity in a fast and scalable way, e.g. for information retrieval. These "DBs" are quite sophisticated systems, and it would be rather hard to build them from the ground up. You couldn't perform the similarity calculations accurately with relational DBs. In combination with LLMs they are used for retrieval augmentation, but also in general for vector search and recommenders. Search is a big deal in AI.
@pozz941 · 1 year ago
@@yeetdeets I don't really understand what you mean by "the model adds generality and the database adds specificity". The database is not used for specific data retrieval or to enhance the capabilities of the model; search engines already exist for that. What it is used for is prolonging the effective memory of the model.

Let's say the model has a memory of 1000 tokens; it's quite easy to have a conversation that uses all of them. So what do you do if you want a longer conversation? You ask the model to summarize the conversation and store the summary in a vector database. Then, at the next message, you pre-prompt the model with the topic of conversation (by showing the model the part of the current conversation that fits its memory and asking it what the current topic is), you retrieve related summaries (by converting the last messages, or just the topic, into a vector and searching the database for similar entries), and you give it the last messages of the current conversation so it knows where you are. Doing this, you can fake a much longer memory.

For example, say you're having a long conversation with your AI about dogs, but you ask "do you remember when we were talking about cats?" Since the topic has shifted to cats, when you search the database using the last messages you find all the previous summaries about cats and feed them to the AI so it knows what you're talking about. Of course, it's a little more complex than that, otherwise you could just have a topic field in a table and retrieve all related summaries. Vector databases allow a fuzzier search: maybe the topic of a previous conversation wasn't exactly cats but felines in general. A vector database would still find something, while a relational database would find nothing.

Take whatever I said in this comment with a grain of salt, because I'm not an expert by any means; this is just what I understood from people talking about vector databases in the context of AI.
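The fuzzy cats-vs-felines lookup described above can be sketched with plain cosine similarity. The 3-d "embeddings" below are made up for illustration; a real system would get them from an embedding model, and real vector DBs (FAISS, Milvus, etc.) add approximate-nearest-neighbor indexes for scale:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def nearest(query, store):
    """Return the stored text whose embedding is most similar to the query."""
    return max(store, key=lambda item: cosine(query, item[1]))[0]

# Hypothetical (text, embedding) pairs standing in for stored summaries.
store = [
    ("summary about cats",   [0.9, 0.1, 0.0]),
    ("summary about dogs",   [0.1, 0.9, 0.0]),
    ("summary about Python", [0.0, 0.1, 0.9]),
]

# A query embedding near "cats" matches it without any exact keyword overlap,
# which is the fuzziness a relational WHERE clause can't give you.
print(nearest([0.8, 0.2, 0.1], store))
```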
@MrTheKeiron · 1 year ago
0:55 "It's shockingly good for its size." That's what she said.
@TiagoTiagoT · 1 year ago
Why do people seem hesitant to let the negative side of the activation function have a magnitude similar to the positive side? What's the mechanism that makes it worse when the negative mirrors the positive or covers a similar range?
@tskumar1320 · 1 year ago
Wonderful. Keep it up 💓🙏
@joshuascholar3220 · 1 year ago
I thought you don't need GPUs for inference, only for training. If you could train or tune on consumer hardware, THAT would be HUGE!
@ablasttv · 1 year ago
Inference does still need GPU power for these large models. It's still lots of matrix multiplication.
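Rough numbers behind this: a common rule of thumb is ~2 FLOPs per parameter per generated token (one multiply and one add per weight), and the weights themselves have to sit in GPU memory. A quick back-of-the-envelope sketch, assuming ChatGLM-6B's roughly 6.2B parameters (weight storage only; runtime activations and KV cache add more):

```python
def per_token_flops(n_params: float) -> float:
    # ~2 FLOPs (multiply + add) per parameter per generated token.
    return 2.0 * n_params

def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    # fp16 = 2 bytes/param, int8 = 1, int4 = 0.5.
    return n_params * bytes_per_param / 1e9

n = 6.2e9  # approximate ChatGLM-6B parameter count

print(f"{per_token_flops(n) / 1e9:.1f} GFLOPs per token")
print(f"fp16 weights: {weight_memory_gb(n, 2):.1f} GB, "
      f"int4 weights: {weight_memory_gb(n, 0.5):.1f} GB")
```

That fp16 figure is why the video's ~13GB GPU requirement for the unquantized model is plausible, and why quantization brings it within reach of consumer cards.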
@nandafprado · 1 year ago
I like taking these LLMs and testing them in languages they weren't trained on, so I played with having it answer me in Portuguese. You could see the memory limitation there, but it was still OK for its model size.
@litttlemooncream5049 · 9 months ago
thanks
@Truizify · 1 year ago
Thanks for the video! Great content. I find it hard to believe that ChatGLM would be better at chatting in English than instruction-tuned LLaMA-7B (e.g. Alpaca, etc.). Did you check out any of those models? I plan on doing a comparison myself.
@Will-kt5jk · 1 year ago
I've found Alpaca 7B pretty good (if a bit slow on a 4-year-old laptop, running in Docker, restricted to 8GB of RAM; it speeds up quite a lot when you up the thread count), so I'm interested in how the two compare too.
@zgolkar · 1 year ago
Vicuna is much more ChatGPT-like, and Koala follows closely. So I wonder how ChatGLM will compare to them...
@aloo_explains · 1 year ago
GELUs are used quite a lot in models like GPT-3 and BERT.
@loopuleasa · 1 year ago
I would've loved more examples of the thing at the beginning; I was not convinced.
@snarkyboojum · 1 year ago
How are you marking up the PDFs of the whitepapers as you read them? It looks electronic to me...
@jurelleel668 · 1 year ago
Awesome video, mate. Sentdex, please do a ChatGPT vs. GLM-6B task review.
@li-pingho1441 · 1 year ago
awesome!!!!
@sevret313 · 1 year ago
I'm hopeful we'll see better models in the future, but I worry that trying to shrink them down too much before open-source models reach sufficient quality is a bit of a mistake.
@StoutProper · 1 year ago
It's up to you to retrain and fine-tune them.
@sevret313 · 1 year ago
@@StoutProper Training is expensive, and there's nothing wrong with pointing out that the open-source community suffers from a lack of quality control and commitment at times. If we have any hope of getting OpenAI-level quality from open source, it doesn't help to have the funding spread across a ton of mediocre projects.
@TerboBlitz · 1 year ago
I would like to see a tutorial showing how to train ChatGLM6B to do just one task, like Python programming, for example. Would narrowing the scope of learning let us limit the size and complexity of the model so it can be developed on a powerful GPU workstation by individuals?
@differentperspectives · 1 year ago
In the future, it is likely that specialized consumer AI models will emerge, focusing on specific domains such as Python or C++ programming, rather than attempting to handle every task. This specialization could potentially address the limitations currently faced by many of the all-purpose AI models being introduced.
@stevepk5421 · 1 year ago
@@differentperspectives That's common sense.
@tomwozne · 1 year ago
Given that it's trained bilingually, could we limit the training data to English only and improve its English output?
@fitybux4664 · 1 year ago
GLM-130B vs. Vicuna? Any opinion on that? I know Vicuna is a much smaller model; I'm just wondering whether any of the LLaMA-based models compare to GLM-130B.
@gogibear123 · 1 year ago
appreciate u❤
@jurelleel668 · 1 year ago
Do multimodal tasks, image tasks, and mathematics tasks for GLM-6B in English. Thanks.
@rifkiardiansyah616 · 1 year ago
Bro, you should continue your NNFS video series!!!!
@efovex · 1 year ago
ChatGLM seems cool on a technological level, but a word of warning: this is trained on text available inside mainland China, so don't think for a second that it's going to give neutral or factual responses to everything. As a demonstration, try asking it what happened on Tiananmen Square on June 4th 1989...
@joshmabry7572 · 1 year ago
What is the context size of the model? I was trying to dig around and couldn't find the info offhand. Also, it's slightly amusing how hard it struggles to translate Chinese to English. I tried a few prompts after an initial conversation, so not exactly extensive testing: "The translation of “各自的” to English characters is “各自的”."
@freecode.ai- · 1 year ago
Could a botnet train an LLM at the same level these big companies do?
@rileykramer3642 · 1 year ago
3090s are like $1500 new; it'd only be like $6k for a setup like that. I'd personally wait for the 5000 series to see if they up the VRAM. I'm guessing we'll see a 32GB 90-series model.
@fitybux4664
@fitybux4664 Жыл бұрын
Why get a 3090 for $1500 when you can get a 4090 for around the same price?
@sebzim4500
@sebzim4500 Жыл бұрын
GPT-3 also uses GELU activation
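For reference, GELU can be computed exactly via the Gaussian CDF, or with the tanh approximation common in transformer implementations; a quick sketch:

```python
import math

def gelu_exact(x: float) -> float:
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # The tanh approximation used in many transformer codebases
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

print(gelu_exact(0.0))   # 0 at the origin
print(gelu_exact(3.0))   # approaches x for large positive x
```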
@abuslume4169
@abuslume4169 Жыл бұрын
My guess is that this bot's context is a little too limited, but it'd be interesting to see how it would fare in the same test Microsoft did with AutoGPT.
@JanBadertscher
@JanBadertscher Жыл бұрын
Not going to try anything smaller than 13B 4-bit for inference :). Fine-tuned larger BLOOMZ models run just fine for a few dollars on AWS SageMaker, and other OSS models will eventually come out, pushing the quality further into GPT-3.5 territory and hopefully, with some tricks, into GPT-4 territory. Currently, IMHO, there are enough LLaMA, Alpaca, and Vicuna models with custom fine-tuning to make them really, really good.
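As a rough illustration of what "4-bit" means for weights, here is a toy symmetric quantization scheme (this is only a sketch of the idea, not the actual GPTQ/bitsandbytes implementation those models use):

```python
def quantize_4bit(weights):
    # Symmetric 4-bit quantization: map floats to integers in [-8, 7]
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats from the 4-bit integers
    return [v * scale for v in q]

w = [0.12, -0.53, 0.98, -1.4, 0.0]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
print(q)      # small integers, storable in 4 bits instead of 32-bit floats
print(w_hat)  # approximate reconstruction of the original weights
```

Each weight costs 4 bits instead of 32, which is roughly why a 13B model becomes feasible on a single consumer GPU.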
@aladinmovies
@aladinmovies Жыл бұрын
How many operations can one RTX 3090 perform? And how do you calculate how many operations a program needs? Thanks for any answers!
@TheDogn
@TheDogn Жыл бұрын
11:22 I wish you talked about how to run it locally
@clydecmcelroy4638
@clydecmcelroy4638 Жыл бұрын
I wonder how many article titles and video titles are going to contain the words "ChatGPT killer".
@Arewethereyet69
@Arewethereyet69 Жыл бұрын
Wonder if GLM is Alibaba's version. Probably not, but it makes you wonder since they're rolling theirs out.
@TheMagicMooshroom69
@TheMagicMooshroom69 Жыл бұрын
I know you put out your video discussing the NNFS Part 10 about a year ago, but have you got any updates on that series at all? :)
@aa-xn5hc
@aa-xn5hc Жыл бұрын
What about "open assistant"?
@devilmix85
@devilmix85 Жыл бұрын
Can you make a video on Gated Linear Units?
@bravinrutto8784
@bravinrutto8784 Жыл бұрын
Have you looked at llama.cpp, where they were able to run LLaMA-30B with just 30 GB of RAM?
@arunachalpradesh399
@arunachalpradesh399 Жыл бұрын
You forgot Open Assistant; it's the best.
@creativedolt
@creativedolt Жыл бұрын
The output of ChatGLM often seems to have Chinese characters mixed in with English text. E.g., if you ask "What is backpropagation?", then words like "accuracy" and "gradient" show up in Chinese. Does anyone else run into such issues? Doesn't that limit the utility of the model for non-Chinese speakers?
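A quick way to check whether a model's answer mixes in Chinese is to scan for CJK codepoints; a small sketch (this only covers the main CJK Unified Ideographs block, not every Chinese codepoint):

```python
def find_cjk(text: str):
    # Collect characters in the CJK Unified Ideographs block (U+4E00-U+9FFF)
    return [c for c in text if '\u4e00' <= c <= '\u9fff']

sample = "Backpropagation computes the 梯度 of the loss for each weight."
cjk = find_cjk(sample)
print(cjk)            # ['梯', '度']
print(len(cjk) > 0)   # True -> the answer mixed in Chinese
```

You could use a check like this to automatically re-prompt ("answer in English only") when it trips.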
@uae7001
@uae7001 Жыл бұрын
Did you try the LLaMA language model, which leaked online recently? I like it and will compare it with this.
@mortengu1385
@mortengu1385 Жыл бұрын
This is very cool... before long, this kind of model may run locally on most of our devices (even our toaster) ...
@MrRadziu86
@MrRadziu86 Жыл бұрын
How would you compare its performance to HuggingChat (Open Assistant), WizardLM, StableVicuna, and other open LLMs and chats?
@mukkeshmckenzie7386
@mukkeshmckenzie7386 Жыл бұрын
Can you compare this to llama? Or alpaca?
@zeuglcockatrice4633
@zeuglcockatrice4633 Жыл бұрын
Can you make a video about using AI to reverse engineer code? I used Bing Chat to do it with some simple C++ functions: I gave it the assembly code and it gave me back the function. It only works with simple stuff, but in the future it might be able to generate entire software.
@SkimFelBon
@SkimFelBon Жыл бұрын
Hi @sentdex, what tool are you using to highlight PDFs?
@alexd7466
@alexd7466 Жыл бұрын
Which one is the best for writing code?
@wecnn
@wecnn Жыл бұрын
The HF demo doesn't go beyond the first prompt for me. It cut off; I asked it to continue and got nothing. Tried another prompt, same thing: nothing beyond the first answer.
@AHN1444
@AHN1444 Жыл бұрын
If I wanted one model just for core logic control, in just one language, what is the minimum size it could be nowadays? The model would be prompted with a summary made by another model and respond with, for example: [module="memory" action="retrieve data" params="xxxxx"] [module="browser" action="visit site" params="xxxx"] [etc] [etc]
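Whatever model size you land on, the controller side of that scheme is simple; a minimal sketch of parsing such bracketed directives (the directive syntax is the commenter's proposal, not a standard format):

```python
import re

# Matches bracketed directives like [module="memory" action="retrieve data" params="xxxxx"]
DIRECTIVE = re.compile(r'\[([^\[\]]+)\]')
PAIR = re.compile(r'(\w+)="([^"]*)"')

def parse_directives(text: str):
    # Turn each [key="value" ...] block into a dict the dispatcher can route on
    return [dict(PAIR.findall(body)) for body in DIRECTIVE.findall(text)]

reply = '[module="memory" action="retrieve data" params="xxxxx"] [module="browser" action="visit site" params="xxxx"]'
print(parse_directives(reply))
```

The harder part is getting a small model to emit this format reliably; constrained decoding or a few-shot prompt with examples usually helps.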
@digidope
@digidope Жыл бұрын
Good quality + local running = 2025. All current local LLMs are at 1-5% of GPT-4's quality.
@IronMechanic7110
@IronMechanic7110 Жыл бұрын
Vicuna-7b is also not bad.
@4mb127
@4mb127 Жыл бұрын
Did you compare to LLaMa?
@sentdex
@sentdex Жыл бұрын
I've only played with the base LLMs. Is there a chat variant that you think is exceptional and that I should take a peek at?
@rohanjamadagni
@rohanjamadagni Жыл бұрын
Hey, what about the LLaMA models? Did you have a look at those?
@Maisonier
@Maisonier Жыл бұрын
What AI projects can we focus on with one RTX 3090? Is there any advantage in buying another and using two RTX 3090s? Or is it too late, and the hardware won't make any difference in our homes?
@homataha5626
@homataha5626 Жыл бұрын
Thank you for the video. Can you give us some guidance on how to build a model in a language other than English? I am interested in building a model for Persian text.
@Sunlight91
@Sunlight91 Жыл бұрын
How old is the moon? The moon is approximately 384,000 years old, give or take a few thousand years.
@Neodynium.the_permanent_magnet
@Neodynium.the_permanent_magnet Жыл бұрын
It should have said "light years old", would have been funnier :)
@tomwozne
@tomwozne Жыл бұрын
Does this need a connection to the internet?
@joshheller1439
@joshheller1439 Жыл бұрын
Which technology jobs have the least risk of depreciating as a result of AI? (Reflecting quickly, it feels like ALL could be replaced, much like repetitive actions were by robotics.)
@sentdex
@sentdex Жыл бұрын
Quick thought, but ... probably alignment/safety for models hah.
@YeeLeeHaw
@YeeLeeHaw Жыл бұрын
Not sure if it's good to support an open source model that is from China. It feels like they put it out there in hopes that some geniuses in the rest of the world would do the work for them so the government could catch up with the west. In general though it's nice to see so many open source models pop up to the left and right that also aim to be able to have the user run it locally. Give it a few years and we'll probably have at least GPT-3.5 capabilities on a high end PC.
@CMAZZONI
@CMAZZONI Жыл бұрын
Could this model be used for semantic search?
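In principle, yes: embed the documents and the query, then rank by cosine similarity. A toy sketch using bag-of-words counts as stand-in "embeddings" (a real setup would use dense embeddings from a model such as ChatGLM's hidden states or a dedicated sentence-embedding model):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; swap in dense model embeddings for real use
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = ["how to train a language model",
        "recipe for chocolate cake",
        "fine tuning a model on custom data"]

query = "training a model"
ranked = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
print(ranked[0])
```

The bag-of-words version misses paraphrases ("training" vs "train"); that gap is exactly what model-based embeddings close.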