Check out our updated course on running private AI chatbots on your computer. bit.ly/skillleap
@mal-avcisi9783 · 4 months ago
Are you serious? Using macOS? No one is using macOS. Stop making how-to videos on macOS.
@joeldowner2991 · 4 months ago
Try AIJoel - Multi Generator: text, code, image (create stickers, remove image backgrounds, add color to black-and-white images, image to video, logo design), plus music and video in beta.
@IndieGamesSpotlight · 1 month ago
So I installed Ollama and asked it, "Are you cloud-based or are you running locally on my computer?" It replied that it is a cloud-based AI. Why is that?
@Alvin-i2t7o · 2 months ago
One new SSD and a full install later... It works!! Docker was giving me an error which I couldn't resolve, but it was all straightforward on a clean installation!
@AliHassan-wc6nb · 1 month ago
Any luck?
@mr.cannon · 4 months ago
PC USERS: IF YOU ARE GETTING THE WSL ERROR WHEN RUNNING DOCKER - enable virtualization in the BIOS. This process varies depending on your computer manufacturer and model. Generally, you'll need to restart your computer and enter the BIOS settings (often by pressing F2, F10, or Del during startup), then look for an option related to virtualization or VT-x and enable it.
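Before rebooting, it's worth checking whether virtualization is already on; a minimal sketch for a standard Windows 10/11 install (the exact output wording varies, and Hyper-V machines may only report that a hypervisor is detected):

```bat
:: Check whether virtualization is already enabled in firmware
systeminfo | find "Virtualization"

:: If Docker still reports a WSL error, updating WSL itself often resolves it
wsl --update
wsl --set-default-version 2
```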
@joeduffy52 · 3 months ago
I get this error but can't see anything in the BIOS like you mention. My motherboard is the Gigabyte X570 Aorus Elite.
@mr.cannon · 3 months ago
@joeduffy52 Enter the BIOS in Advanced Mode, go over to the Tweaker tab, go down to Advanced CPU Settings, and enable SVM Mode (AMD's name for virtualization).
@faridgasimov1742 · 3 months ago
Just returned to leave this comment.
@marcusstone6273 · 4 months ago
Hey bro, I just want to say that your grind is on another level. So good that you can go for so many years and still create new channels and succeed. Nice transition to AI content, and your views are great too. Hope you get a lot of sponsorship and affiliate deals.
@SkillLeapAI · 4 months ago
I appreciate that!
@muhammadasad8549 · 4 months ago
Brilliant. I have been looking for this video since Meta announced 3.1. Hats off.
@muhammadasad8549 · 4 months ago
@SkillLeapAI I would be really thankful if you could make a video on deploying it on a server.
@Hilmz · 4 months ago
Not private - it's a hybrid setup. It caches data when offline, then when you connect back you can see it sends data. Use Wireshark; it will show you it's sending data.
@longboardfella5306 · 3 months ago
This is an important comment, because many channels are saying Open WebUI and Llama 3 are private to your machine. Is there any way to turn off the cache-sending process? I would like to analyse documents that are not permitted to be publicly uploaded.
@MihaMartini · 3 months ago
@longboardfella5306 Ollama and Llama 3 are private, but Open WebUI might send some analytics and other data.
@bryanjuho · 3 months ago
Is this true? Any resource that supports this statement?
@DudethatGross · 3 months ago
@longboardfella5306 Run it in a VM or container that has the network adapter disconnected from the internet, or with Wi-Fi off entirely.
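If you want to check the traffic claim yourself without Wireshark, a minimal Linux sketch is to log anything non-local while you chat with the model (11434 and 3000 are the default Ollama and Open WebUI ports; adjust to your setup):

```sh
# Log any non-loopback traffic that isn't the local Ollama/Open WebUI ports;
# if the stack were phoning home, it would show up here while you prompt it
sudo tcpdump -i any -n 'not host 127.0.0.1 and not port 11434 and not port 3000'
```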
@andresshamis4348 · 3 months ago
Llama is literally private; maybe Open WebUI isn't... However, what would the UI need to cache and send over the internet? Doesn't make sense to me.
@tikkivolta2854 · 4 months ago
The really interesting part around 07:15 is that in a few months, computing power and the size of these models will make it possible to run them on practically anything. When they get more efficient we'll have them in our phones.
@SantiagoAbud · 4 months ago
That's the future of this technology. Not that I endorse it or judge it in any way, but it's the way the development is heading.
@thinhngo7244 · 4 months ago
OpenELM from Apple is already runnable on mobile devices, I think.
@tikkivolta2854 · 4 months ago
@thinhngo7244 I ran Llama 3.1 8B on my laptop with Docker and Ollama. Like a breeze.
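For anyone who wants to reproduce that, the laptop setup boils down to a couple of terminal commands once Ollama is installed; a minimal sketch:

```sh
# Pull and chat with the 8B model straight from the terminal
ollama run llama3.1:8b

# See which models are installed locally and how big they are
ollama list
```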
@skywalkerjedi95 · 4 months ago
Thanks! This video was awesome and really detailed! Can't wait to start trying this.
@Ilan-Aviv · 3 months ago
Great video for local AI. Easy and clear explanation, thank you!
@iskandarhussain · 4 months ago
Perfect ‼️ Just the video I was looking for, with an intro to how to upload files to the model‼️ THX‼️
@GmanBB · 4 months ago
You have great teaching skills. Thank you for making it so simple!
@pertsiya · 4 months ago
Thank you for your willingness to share with us!
@womble_1034 · 4 months ago
Subscription incoming!! Great content, keep up the good work.
@FastWReX · 4 months ago
No joke, I've always hated Docker. Hated everything about it. However, seeing you run the Open WebUI command and it just showing up in the Docker app is making me reconsider. Holy moly! Might have to reinstall it on my Raspberry Pi 5.
@southcoastinventors6583 · 4 months ago
Sounds like step one is to buy a $5,000 M3 if you want to run it locally now, at least until they release smaller quantizations of Llama 70B and 8B.
@melaronvalkorith1301 · 4 months ago
Llama 3.1 8B runs much faster than you can read on an RTX 2060 Super. Not dirt cheap, but I built my PC for $1.4K four years ago - it should be cheaper now, and I built it for gaming, not AI. You can host it on a desktop and connect your other devices with a VPN like Tailscale. Don't spend extra money for less performance by going small-form-factor or laptop.
@HitsInSandbox · 4 months ago
Go grab one of the older servers that can be had for $100-$150 with dual 10-core Xeon CPUs, and boost it from 64 GB to 128 GB. Running 20 cores it does well, though you won't be able to stick a modern video card in it. It still does a good job for a system that can be built for about $200 - much better than three i10 systems. Fair warning though: get ready for a bigger hydro bill.
@CodyAvant · 4 months ago
I run 8B on my 2020 M1 MacBook Air (8 GB RAM and 7-core GPU) and the output token rate is much faster than a normal speech cadence.
@mcombatti · 4 months ago
I have a 9-year-old Windows computer that runs 13B and 8B models at the same speed as your brand-new $5,000 Mac 👀 It was purchased at a Walmart for $270 in 2015 😂
@bassamel-ashkar4005 · 4 months ago
Bullshit detected @mcombatti
@b34k97 · 4 months ago
"We need to go to an app that a lot of people have never used before... it's called 'Terminal'." OMG, that line had me dying!
@SkillLeapAI · 4 months ago
I've made videos for 8 years and 99% of people have never used Terminal.
@b34k97 · 4 months ago
@SkillLeapAI No, I understand, and it makes perfect sense. Just, as someone who's used a terminal at school and at work for the past decade... the delivery of the line tickled my funny bone.
@ZavierBanerjea · 4 months ago
Absolutely hilarious...
@danielrossy7453 · 3 months ago
Same, dude :) And then my wife came into the room and said "are you really?" HAHAHA
@kamaboko1 · 15 days ago
Awesome! Thanks for the post!!!
@micbab-vg2mu · 4 months ago
Great - I will try 70B :) Thanks for the instructions on how to do it :)
I hope they give it access to the internet. Thank you for this.
@wettingfairy6764 · 4 months ago
Very clearly explained - a great beginner's guide.
@hydron7150 · 4 months ago
Wanted to try 70B with a 4070 Ti 12 GB, Ryzen 5 7600X, and 64 GB of 6000 MHz RAM, and it is pretty slow - it takes 20 seconds to respond to a "hi" prompt 😄
@tikkivolta2854 · 4 months ago
I would love for you to create a tutorial on how to train these models on specific data. Any chance?
@SkillLeapAI · 4 months ago
Adding to the list
@nohandle8008 · 2 months ago
@SkillLeapAI Awesome, thank you. I have a specific use case requiring very specific data to train the model; I would love to see how effective it can be. I'm also concerned about data flowing back online - can you elaborate on what is sending data back out when the machine is reconnected to the internet, as others have mentioned?
@digigoliath · 4 months ago
I do appreciate this informative walkthrough though! TQVM!!
@vladyslavklochan4181 · 4 months ago
Thank you for the tutorial.
@Kevin-fp5zo · 4 months ago
Hello! What kind of hardware setup do you need to run open-source LLMs? LLMs are actually quite heavy, and they require a lot of GPU, RAM, and CPU power. Can you do a YouTube video about which computer specs or PC brands are optimal for running them smoothly? I love your content. Keep up the good work! :)
@betabishop3144 · 2 months ago
They are indeed quite heavy, but in case you haven't noticed, each LLM's documentation should have a section dedicated to hardware requirements - Google has it, Meta has it, and the others probably do too. You can look them up like "Llama 3.1 hardware requirements" and the first link should take you there.
@yetkindev · 3 months ago
You are the perfect man, thank you.
@fangeming1 · 4 months ago
How much VRAM is needed to run the model depends on whether the model is quantized or not. This should be explained in the video instead of giving contradictory information.
@tikkivolta2854 · 4 months ago
As much as you are correct, I am fairly certain "giving contradictory information" wasn't the intent.
@moonduckmaximus6404 · 4 months ago
Hey, do you know where we can get a comprehensive explanation of what we downloaded? I can't afford his course.
@naeemulhoque1777 · 3 months ago
Nice, straightforward video. Can you make a hardware buying guide, please?
@II-qh7xn · 3 months ago
Worked, with some issues. Hats off.
@MrAshwin27 · 1 month ago
Huge respect
@null4624 · 2 months ago
Thanks dude, I was able to run Llama 3.1 8B on a Linux laptop with 8 GB RAM and am impressed.
@shampaghosh1241 · 2 months ago
Wow, did you do any special modifications? My device also has 8 GB RAM and an Intel i3 processor - do you think I can possibly run it at a decent speed?
@null4624 · 2 months ago
@shampaghosh1241 No special mods, just selected the smallest model, followed the steps from this video, and ran some prompts.
@HitsInSandbox · 4 months ago
The 405B version needs at least 768 GB of RAM, as it uncompresses the 200+ GB in memory and runs uncompressed vector links from memory to run effectively. But should you manage it, it beats OpenAI hands down. Fully uncompressed it could be 2 to 4 terabytes in size.
@gaathastory · 4 months ago
Reminds me of the last two seasons of the TV series Person of Interest... store the "AI" on massive-capacity RAM sticks.
@yetkindev · 3 months ago
You have a great internet connection :D
@mikemaldanado6015 · 1 month ago
Nice video. I don't like that you have to sign up for the web UI, and I don't like Docker for security reasons, but your first two steps helped a lot. BTW, you can upload or pass large files on the command line. Finally, Meta's Llama does not meet the traditional definition of open source, because it's not open source. What they did is create a new definition, their definition, and put it in the terms of service... Must be nice to be able to change the definition of words willy-nilly. Also, nothing we have today meets the official comp-sci definition of AI, not by a long shot.
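On the command-line point: since the Ollama CLI takes a prompt argument, the shell can inline a file's contents; a minimal sketch, assuming a plain-text file small enough to fit the model's context window (notes.txt is a placeholder):

```sh
# Pass a local document to the model without any web UI
ollama run llama3.1 "Summarize this document: $(cat notes.txt)"
```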
@rafaeel731 · 1 month ago
Thanks for the vid. A couple of confusions to share: at 12:18, how can a Llama 3.1 model not know anything about Llama 3 because of a training-data delay? It doesn't make a lot of sense. Plus, you compared the 8B on a text exchange, while you gave the 70B model Python code to decipher, then you gave the 8B a text file to summarise. We can't compare execution times unless the task is identical.
@puccaso · 4 months ago
9:20 - I believe there is also a Docker credentials package that works and doesn't require the GUI bloat.
3 months ago
Perfect. Thanx
@mediatechtube · 2 months ago
Nice video. What are the use cases for running AI locally on your computer at home? What's the purpose when people can get a subscription? I can think of a few, but I would like to know what others think. Cheers!
@MoonyongKim · 4 months ago
Hi. First of all, thanks for the video. It's really useful and easy to follow step by step. I am running an M1 MacBook Air and it seems it's not good enough to run Llama 3.1, as it freezes my computer. Which model would you recommend for an M1 MacBook Air?
@blackgptinfo · 4 months ago
Great video. Did you test how many documents you can upload at once and have it summarized?
@SkillLeapAI · 4 months ago
I haven't yet, but I think it's a good amount.
@sabuein · 4 months ago
Thank you.
@andyli541 · 4 months ago
Is there a way to bring this locally running Llama 3.1 onto my website? I want to share my trained AI with other people. Thanks!
@bilza2023 · 3 months ago
There are special servers on DigitalOcean... but simply put, you install it on your server and make it available through an API.
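For context, Ollama already serves a local REST API (port 11434 by default) that a website backend could proxy; a minimal sketch of the call:

```sh
# Ollama's local REST API - a web backend could forward user prompts here
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```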
@robwin0072 · 3 months ago
Great video. Question: can I redirect all models downloaded (installed) in Ollama to a secondary drive inside my laptop? C: primary system M.2 SSD, 2 TB; D: secondary SSD, 2 TB.
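Ollama reads its model storage location from the OLLAMA_MODELS environment variable, so a sketch for Windows (the D:\ path is just an example) would be:

```bat
:: Point Ollama's model storage at the secondary drive, then restart Ollama
setx OLLAMA_MODELS "D:\ollama\models"
```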
@walter3663 · 3 months ago
Thanks for the great tutorial. Can you let me know the default path for storing chat histories? Is it possible to change it?
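For what it's worth, with the Docker command used in this setup, Open WebUI keeps its data (chats included) inside the volume mounted at /app/backend/data; assuming the standard -v open-webui:/app/backend/data mount, you can locate it on disk with:

```sh
# Show where Docker physically stores the open-webui volume
docker volume inspect open-webui
```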
@NVX_Ink · 4 months ago
What would be an affordable, yet ideal, desktop workstation?
@terrysh7264 · 3 months ago
Hi. TY for this video. I'm wondering, do I need to train the model that I install?
@SkillLeapAI · 3 months ago
No, you can just use it after installing.
@ArekMateusiak · 2 months ago
Hi, does anyone know what is needed to run the full 70B Llama 3.1 model well, so it responds quickly?
@RiftWarth · 4 months ago
Wow, the 405B model file size is still smaller than some of the Call of Duty games. LOL
@SkillLeapAI · 4 months ago
Yeah, that's true, but it's basically text. If it was video, it would be a million times bigger.
@HitsInSandbox · 4 months ago
It is layer-compressed, and AI models are some of the most tightly compressed computer files around. It may be 250 GB compressed, but fully uncompressed it might be more like 2 to 4 terabytes, and it still uses a pointer index for all the words in all the languages it supports. The same compression would take a 4.5 GB Blu-ray movie and squeeze it down to 50 or 60 MB. But the layout of the data is totally different for different needs.
@hiteshdesai2152 · 4 months ago
This is great - thanks for putting it in such a simple and understandable way. I can run it locally now. Is there a way I can point my Python code, or my LangChain/LlamaIndex application code, at these local models?
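Ollama also exposes an OpenAI-compatible endpoint, which is what most LangChain/LlamaIndex integrations can be pointed at (usually by setting the base URL and a dummy API key); a minimal sketch of the raw call:

```sh
# OpenAI-compatible chat endpoint served by Ollama; frameworks that speak
# the OpenAI API can use http://localhost:11434/v1 as their base URL
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```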
@karthikb.s.k.4486 · 4 months ago
Inference looks fast locally. What configuration of Mac laptop are you using? Please let me know. Thank you for the nice tutorial.
@SkillLeapAI · 4 months ago
It's the top-of-the-line M3 Mac with 64 GB of RAM.
@gRosh08 · 4 months ago
Crazy cool.
@gaganmadhan733 · 2 months ago
Can we store the files in cloud storage like AWS S3 and then run or deploy it?
@Kingkimabdu4090 · 2 months ago
Everything seemed fine until I clicked the link in Docker. The page opened with an error message stating, "This page isn't working." Can anyone offer assistance?
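A common first step when the page won't load is to check whether the container is actually running and what its logs say; a sketch, assuming the container was named open-webui as in the video:

```sh
# Is the container up, and which host port is it published on?
docker ps

# If it's running but the page still fails, the logs usually explain why
docker logs open-webui
```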
@arnolda7150 · 4 months ago
Thank you so much. Can you tell me how to access OpenAI with Wi-Fi on or off?
@kick_kisu · 3 months ago
Great video. How can I run the Docker container for Open WebUI on my local Linux server, with Ollama running on my Windows PC?
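Open WebUI can point at a remote Ollama through its OLLAMA_BASE_URL environment variable, and Ollama needs OLLAMA_HOST set to listen beyond localhost; a sketch, with 192.168.1.50 standing in for the Windows PC's LAN address:

```sh
# On the Windows PC (cmd): make Ollama listen on all interfaces, then restart it
#   setx OLLAMA_HOST "0.0.0.0"

# On the Linux server: point Open WebUI at the Windows machine's Ollama
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```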
@HitsInSandbox · 4 months ago
You can use Msty without Docker.
@NeoDon1 · 4 months ago
Those specs for the 405B are not right. I have 64 GB of RAM and mine flies, with a 4080 Super and an AMD 5900X.
@piska4f · 2 months ago
Well, if you try to use the same token length that's suggested on their website, it would require much more computing power than what you already have... it's set to 2048 by default...
@Eldorado66 · 1 month ago
Well, it's probably walking, not flying.
@HolographicKode · 2 months ago
What's the hardware setup you run this model on?
@SkillLeapAI · 2 months ago
I'm on an M3 Mac with 64 GB RAM. I can run the 70B model, and the small models respond almost instantly.
@HolographicKode · 2 months ago
@SkillLeapAI MacBook Pro? Mac Pro? How much VRAM? (Complete configuration, please.) This has to be a $5K+ setup, I suspect.
@frankvasquez4827 · 3 months ago
If I installed the 8B model first and then I want to install the 70B, will I have both installed, or will the larger one overwrite the 8B? Can I uninstall models, just to save some storage space? 😅 (Asking because I'm not too technical about it; I'm using Windows, btw.) Thanks in advance.
@SkillLeapAI · 3 months ago
It will install both and you can choose between them. It won't overwrite. And yes, you can remove models. My new video covers that and some new upgrades.
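For reference, installing, listing, and removing models all happen through the same Ollama CLI; a quick sketch:

```sh
# List every locally installed model and its size on disk
ollama list

# Remove a model you no longer need to free the space
ollama rm llama3.1:8b
```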
@frankvasquez4827 · 3 months ago
@SkillLeapAI Thank you, I will watch it. I managed to install them with this video!
@0reo2 · 4 months ago
I see ChatGPT is becoming the Edge of LLMs: "Hey ChatGPT, how do I install this other LLM I want to have?"
@gRAVItation1988 · 4 months ago
Great job! I have an M1 Mac with 16 GB RAM. Can I run 8B?
@SkillLeapAI · 4 months ago
I think so
@tikkivolta2854 · 4 months ago
@uncannyrobot Do you also train models, and would you care to elaborate? I'd be all ears.
@tikkivolta2854 · 4 months ago
@uncannyrobot I will find one, thank you!
@naveenkumarmurugan1962 · 2 months ago
Lovely
@matrixpredator · 23 days ago
You can run 70B models with 30 GB of RAM... and for 8B models you don't need 32. Of course, I mean quantized models.
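For anyone unfamiliar, Ollama publishes quantization levels as model tags (the default pull is already a 4-bit quant); a sketch - the exact tag below is an example, so check the library page for the real list:

```sh
# Pull a specific quantization instead of the default tag
# (browse ollama.com/library/llama3.1 for the tags that actually exist)
ollama pull llama3.1:8b-instruct-q4_0
```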
@andreac7389 · 4 months ago
Hi, is this model multimodal like GPT-4 Omni? I mean, can it generate code, solve mathematical problems, etc., or is it purely a linguistic model capable of easy conversation but unable to handle complex issues? In other words, my question is: do only the models hosted on the servers of OpenAI, Anthropic, or Meta have the capability to manage complex problems, or does this offline model also have that capability? Thank you.
@SkillLeapAI · 4 months ago
They have some of that capability, but online models are going to be much better. It's very difficult to run the better models offline. The best version of Llama 3.1 is too big to run on a personal computer, and OpenAI and Anthropic don't have an open-source model that you can run locally.
@thevoice6853 · 1 month ago
Can you do a tutorial on how to do it on Windows? Thanks.
@billl1715 · 2 months ago
The steps don't show how the models get loaded into the WebUI. When I'm in the WebUI, there are no models.
@SkillLeapAI · 2 months ago
They should show up automatically. I didn't do anything in the WebUI to add them; they were added through the terminal in the earlier steps.
@Xiaoklunar · 27 days ago
In the top corner there is a "choose model" option.
@zunairakhalid7358 · 4 months ago
Can we make an API call to this local LLM from my code?
@frankbaron1608 · 2 months ago
The terminal is called "cmd" (Command Prompt) on Windows.
@christerjohanzzon · 4 months ago
So, you don't need a fancy GPU to run Llama locally? It does say that you need an Nvidia GPU... but you're running a Mac? Please elaborate.
@SkillLeapAI · 4 months ago
I have the built-in Apple GPU. These are my specs - Chipset Model: Apple M3 Max; Type: GPU; Bus: Built-In; Total Number of Cores: 40.
@SkillLeapAI · 4 months ago
The 8B model should run on a variety of GPUs.
@christerjohanzzon · 4 months ago
@SkillLeapAI Ah, I see. Thanks for explaining. :)
@longboardfella5306 · 3 months ago
I believe modern Macs use a unified memory model which combines and distributes GPU and CPU memory as needed. PCs don't do this, so they need a sufficient amount of VRAM on a dedicated Nvidia GPU to run models. I have an RTX 8000, which is 24 GB of VRAM and runs all 8B models fine - but it completely chokes on 70B models regardless of quantisation. For Macs it's all about your total memory available and having enough modern GPU cores to do the processing, as I understand it.
@filipskerik1477 · 3 months ago
Is the Docker "thing" only for hooking the previously installed model up to the WebUI? Or is it something that pushes all of the stuff to the cloud, so I don't need that NASA computer? Thanks.
@sairajulu4395 · 4 months ago
I gave it a prompt to produce a word without vowels; Llama 3 gave START, then I said it's wrong, and the final output was STRT. It still needs more training, lol.
@WIWUPRODUCTIONS · 12 days ago
My Docker won't show the address/ports after installing everything. What could be the problem? I've been trying to Google it all day and can't find an answer.
@Caged_Monuments-x6p · 4 months ago
So I got it working, and my CPU is running at 75% - and I have a 16-core AMD 3950X. My GPU is barely being used.
@SkillLeapAI · 4 months ago
Nice
@Repz98 · 4 months ago
*Requirements* at 7:56
@Repz98 · 4 months ago
Get 25 "AMD Radeon RX 7600 XT 16GB" cards. When not using Llama, go back to mining crypto.
@extremelylucky999 · 4 months ago
Would like to learn how to combine Llama + Groq + iPhone Shortcuts to run Llama.
@ndidiahiakwo7412 · 4 months ago
Will the website version be capable of uploading documents anytime soon? My computer isn't powerful enough to run the offline models.
@SkillLeapAI · 4 months ago
Not sure. I hope so
@moonduckmaximus6404 · 4 months ago
"I can try to help you find information on the web, but I'm a large language model, I don't have direct access to the internet..."
@sumitksrivastava · 1 day ago
Why do I get a "400: 'NoneType' object has no attribute 'encode'" error anytime I try to upload a document?
@ai_snillet · 1 month ago
When I try to run Open WebUI, Flowise opens up. I guess they are listening on the same port =/ Can I solve this in any way?
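A likely fix is remapping the published port, since the left-hand side of Docker's -p flag is the host port; a sketch using 3001 as an arbitrary free port:

```sh
# Publish Open WebUI on host port 3001 instead of the usual 3000
docker run -d -p 3001:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
# Then browse to http://localhost:3001
```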
@jeremy4510 · 2 months ago
Can you do a video like this on using Llama in Python?
@jitendravishwakarma7949 · 3 months ago
gemma2:2b is an insane model for its size.
@nessim.liamani · 4 months ago
Can we locally remove restraints on Llama models, including ethical safeguards? Thanks.
@moonduckmaximus6404 · 4 months ago
Thank you for your effort. Any reason why my Llama 3.1 70B would respond one letter at a time?
@HitsInSandbox · 4 months ago
You need more memory, and likely an AMD or Nvidia video card with at least 8 GB of VRAM - 12 to 16 is better. The 70B needs at least 64 GB of RAM, not 32 or less; otherwise you're just using virtual memory, which will eventually crash it as it chokes on larger inputs. If you have the specs, then your antivirus is likely bogging it down by checking all data movement in memory.
@moonduckmaximus6404 · 4 months ago
@HitsInSandbox 4090 with 128 GB of RAM... I don't use an antivirus.
@NeptuneGadgetBR · 4 months ago
Hi, I couldn't run Docker on my GPU. I have an RTX 4090, which should help a lot, while on the CPU it's slow. Do you have any idea how to enable my GPU in Docker on Windows 11?
@fl028 · 3 months ago
Use the --gpus all option :)
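Expanded into a full command, that flag hands the container every GPU Docker Desktop can see (on Windows 11 this relies on the WSL 2 backend and working Nvidia drivers); a sketch using the CUDA-enabled image from the Open WebUI README:

```sh
# Run Open WebUI with GPU access enabled
docker run -d -p 3000:8080 --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:cuda
```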
@moonduckmaximus6404 · 3 months ago
Hey, I keep getting a notification that Llama 3.5 is ready to update from Llama 3.1. I click the notification in Windows 11 but nothing happens. How can I verify or update Llama 3.1?
@volebien · 4 months ago
Thanks a lot. That's what I wanted to do. Now I can upload files and not pay for ChatGPT Plus. But it is very slow and uses a lot of CPU. Do you know any tweaks where I can share the workload with the GPU?
@SkillLeapAI · 4 months ago
Only to use smaller models - not Llama, but something like Phi-3 or the other small ones.
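Trying a smaller model is a one-liner with the same CLI; a sketch (model name per the Ollama library):

```sh
# Phi-3 Mini (~3.8B parameters) is far lighter on CPU-only machines
ollama run phi3
```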
@SimonFeay · 1 month ago
When I go to Workspace, I don't seem to have the "Documents" tab.
@nosuchthing8 · 1 month ago
Wait, what does Docker do if the LLM is already running?
@dawnbunty7 · 4 months ago
I have a MacBook M3 Pro with 18 GB RAM and a 500 GB hard disk - will this be sufficient?
@lucifergaming9491 · 4 months ago
I use Ubuntu; my web UI doesn't show any models after a correct installation.
@IbrahimAli-hf7mq · 4 months ago
+1
@swarnimdubey · 3 months ago
How much total storage does it require, anyway?
@Carlzora · 4 months ago
Are you able to upload images to use with prompts?
@SkillLeapAI · 4 months ago
Most open models don't have vision, and if they do, they are not good. I would use ChatGPT for that.
@longboardfella5306 · 3 months ago
LLaVA models are pretty good for image analysis.
@mohdalki7271 · 1 month ago
Can I train it on my data?
@SelvamuthuMR · 4 months ago
The Hugging Face Llama 3.1 model repo takes 60 GB of storage but runs very slowly for a single response, while Ollama runs the same Llama 3.1 model faster and its download is only around 5 GB. What is the difference?
@prajwalm.s7976 · 4 months ago
Can I fine-tune the 70B model and use it with Open WebUI?
@HaraldBendschneider · 4 months ago
I downloaded the Windows file "OllamaSetup.exe" and installed Ollama. What now? After clicking on the app icon nothing happens. Is there any tutorial for Windows out there? Running the app in CMD, I cannot use the shortcuts: C:\Users\user\AppData\Local\Programs\Ollama>show returns: The command "show" is either misspelled or could not be found.
@fotszyrzk79 · 4 months ago
Hey! Open cmd once again and type "ollama run llama3.1" - you can play with it in the command window. I'm looking for an interface now, to run and play with it in a nicer way. The Docker app mentioned by the OP needs an account - it can be $0, but I don't like signing up.
@fotszyrzk79 · 4 months ago
If you type just "ollama" it will print the available commands for you.
@HaraldBendschneider · 4 months ago
@fotszyrzk79 Thank you! This worked. I can chat in the command window. But I wanted to have a UI, and I don't understand what the icon in the taskbar is for. "View logs" and "Quit Ollama" are all I can do.
@longboardfella5306 · 3 months ago
@HaraldBendschneider Open WebUI gives you the Windows chat interface. I use it on Windows 11 and it works fine with Ollama and Docker. You can look at Matthew Berman's channel for simple install instructions as well. Bottom line: most tutorials assume a Mac, but it's not hard to get it all working on Windows - you just have to hunt a bit more for workarounds.
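For completeness, the Windows install is the same one-liner once Docker Desktop and Ollama are running; a sketch of the standard command from the Open WebUI README (cmd syntax; then open http://localhost:3000):

```bat
:: Start Open WebUI against the local Ollama install
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway ^
  -v open-webui:/app/backend/data --name open-webui ^
  --restart always ghcr.io/open-webui/open-webui:main
```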
@DenofLore · 4 months ago
1.2 tokens a second at fp16 on a 4090 and 14900K. 😒
@SkillLeapAI · 4 months ago
Wow, that is slow.
@DenofLore · 4 months ago
@SkillLeapAI And that's the 8B model 🙄
@ronilevarez901 · 4 months ago
@SkillLeapAI Slow is 2 seconds per token with no previous context using GPT-2, which is the only model my PC can load. You guys are being given the world on a silver platter and you still call it slow 😑
@HitsInSandbox · 4 months ago
Go with the Q8 version and work up or down from there. Remember that the GPU still takes orders from the much slower computer CPU and motherboard memory. If you only have 16 GB of RAM and a 16 GB Nvidia card, the computer will slow the flow of data to the video card, which will always be waiting for more. Think of it like a 64-bit computer trying to feed a multi-core RISC-based 128/256-bit GPU: the video card is slowed down by as much as 400% waiting on the slower CPU - more so if it is an Intel, which is not designed around RISC-style information flow; that is where AMD tends to leap ahead of Intel. Many people today don't realize that graphics processors left regular CPUs in the dust for speed, performance, and pricing many years ago. There are RISC graphics chips that can be had for around $8 that outperform i10 Intel chips going for around $360 - the only problem is that boards to run them are not openly available to consumers.
@DenofLore · 4 months ago
@HitsInSandbox 64 GB of 6800 and a 24 GB Nvidia card with Resizable BAR enabled on a Z790 Aorus Master. Mistral's new 13B with only a partial offload (23 layers) gets 9-14 at the same quant and size. It's an Ollama-side inference issue. What that is I have no idea - I just implement the things and can narrow it down based on experience. Yes, different models, but unless Meta is just bad at this, or the model has inherent training and architecture that makes inference difficult to optimize, this generation of the engine just isn't there yet to handle it. Give it a few updates and it'll probably improve, but right now it's dogshit.
@CodeCraftHub-NAS · 4 months ago
Could you put the file on your web server, then use it as your search and/or help assistant?
@JarppaGuru · 2 months ago
8:41 - No need for Docker blah blah blah. We just unzip on Windows and hit a .bat file.