The first 500 people to use my link skl.sh/nikodembartnik12241 will get a 1 month free trial of Skillshare!
@X862go2 күн бұрын
Amazing work, mate 👏
@MilanKarakasКүн бұрын
What is missing here is memory. Llama can understand a few things and may hold a small amount of context, but after you cycle power, it forgets everything. It would be great to write a Python script and record the whole conversation. Also, some kind of 3D mapping, where the robot can store past experience and mark the obstacles.
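A rough sketch of what that persistence could look like in Python (the file name and helper names are made up for illustration): append every exchange to a JSONL file and reload the last few entries after a power cycle.

```python
# Minimal sketch of giving the robot "memory" across power cycles:
# log every prompt/response pair to disk and reload it on boot.
import json
import time
from pathlib import Path

LOG = Path("conversation_log.jsonl")  # hypothetical log file

def remember(prompt: str, response: str) -> None:
    entry = {"t": time.time(), "prompt": prompt, "response": response}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def recall(last_n: int = 20) -> list[dict]:
    # Return the most recent exchanges so they can be fed back into the next prompt.
    if not LOG.exists():
        return []
    lines = LOG.read_text().splitlines()[-last_n:]
    return [json.loads(line) for line in lines]
```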
@StevenIngram4 күн бұрын
I hope you get the Jetson working. I like the idea of it being self contained. :)
@domramsey4 күн бұрын
I think the biggest issue here is your overall approach and your prompt. You have a distance sensor that gives a precise result in cm, yet the quantities you're using are arbitrary: "low", "medium". Instead, tell the LLM in the prompt that the nearest object in front is (say) 85 cm away, the nearest to the left is 10 cm, and the nearest to the right is 200 cm, then ask it to output an angle to turn and a distance to travel forward. It will come back with "Angle: 20, Forward: 50" or similar, which should be easy for the robot code to process. Make every move an angle followed by a distance, but use actual measurements. Your prompt could probably also do more to get the LLM to estimate how far away the objects it sees are likely to be. Oh, and get more distance sensors and mount them at 45 degrees left and right. I really feel like these should be the primary input for guiding movement. Yes, it's entirely possible that won't work at all. :)
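A small sketch of the scheme this comment proposes (the prompt wording and helper names are invented for illustration): feed real readings in centimetres and parse the "Angle: X, Forward: Y" reply.

```python
# Sketch only: build a prompt from real distance readings and parse the
# "Angle: <deg>, Forward: <cm>" reply suggested above.
import re

def build_prompt(front_cm: float, left_cm: float, right_cm: float) -> str:
    return (
        f"Nearest obstacle: front {front_cm:.0f} cm, left {left_cm:.0f} cm, "
        f"right {right_cm:.0f} cm. Reply ONLY in the form 'Angle: <degrees>, Forward: <cm>'."
    )

def parse_move(reply: str):
    # Returns (angle_deg, forward_cm) or None if the reply doesn't match.
    m = re.search(r"Angle:\s*(-?\d+(?:\.\d+)?).*?Forward:\s*(\d+(?:\.\d+)?)", reply, re.S)
    return (float(m.group(1)), float(m.group(2))) if m else None

# Example: parse_move("Angle: 20, Forward: 50") -> (20.0, 50.0)
```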
@RemoteAccessGG4 күн бұрын
I think you should make the robot remember previous image outputs (if you haven't already), so it has some continuity. Also add a lidar sensor if you find the camera hard to set up. Giving that information to the LLM will be tough, though, because it can't understand what a bunch of random numbers given to it mean.
@SLRNT4 күн бұрын
I think the LLM could understand it if given the "format" of the lidar data, e.g. an array of 1, 2, 3, telling the LLM the first number(s) mean distance to the left, 2 the distance to the front, and 3 the distance to the right. Of course the real array would be longer, so you could average the numbers or just separate the directions in code.
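One way that averaging could look in Python (a sketch only; it assumes the scan array runs from left to right across roughly 180 degrees, which may not match the actual sensor):

```python
# Sketch: reduce a raw lidar scan to three LLM-friendly numbers (left/front/right).
def summarize_scan(distances_cm: list[float]) -> str:
    n = len(distances_cm)
    third = n // 3
    left = sum(distances_cm[:third]) / third
    front = sum(distances_cm[third:2 * third]) / third
    right = sum(distances_cm[2 * third:]) / (n - 2 * third)
    return (f"Average clearance - left: {left:.0f} cm, "
            f"front: {front:.0f} cm, right: {right:.0f} cm")
```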
@Davidfirefly2 күн бұрын
that would require some machine learning implementation
@Math2C3 күн бұрын
Here's what you should do next. Use computer vision software to identify the names of the items in the room and draw bounding boxes around the identified items. Combine the area of the bounding box with the distance from your distance sensor to determine each object's size. I'm not sure whether you did this already, but your robot needs to know its actual location. Use the LLM to distinguish between objects that are permanently placed and those that are just lying around, and record the various directions the rover has already looked. So for each object the rover should know its size, its relative direction, and how far away it is. Finally, provide that information to the LLM to determine which direction it should move, or whether it should rotate.
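An illustrative sketch of the bounding-box-plus-distance idea, using an off-the-shelf ultralytics YOLO model as the detector (that choice, the field-of-view figure, and the pinhole approximation are all assumptions, not what the video uses):

```python
# Sketch only: detect objects and combine each bounding box with one distance
# reading to guess a rough physical size, producing text the LLM can read.
import math
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small pretrained model; an example choice

def describe_objects(image_path: str, distance_cm: float,
                     hfov_deg: float = 60.0, image_width_px: int = 640) -> list[str]:
    results = model(image_path)[0]
    # crude pinhole approximation: real-world width spanned by the full image
    view_width_cm = 2 * distance_cm * math.tan(math.radians(hfov_deg / 2))
    cm_per_px = view_width_cm / image_width_px
    lines = []
    for box in results.boxes:
        x1, _, x2, _ = box.xyxy[0].tolist()
        name = results.names[int(box.cls[0])]
        lines.append(f"{name}: ~{(x2 - x1) * cm_per_px:.0f} cm wide, "
                     f"~{distance_cm:.0f} cm away")
    return lines
```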
@OriNachumКүн бұрын
With a Hailo-8L/Hailo-8 you can do that on a Raspberry Pi with surprising processing power.
@v1ncendКүн бұрын
When I saw your first robot, it brought back memories.
@M13RIXКүн бұрын
Man, your videos are so inspiring! They really help me not to give up on my own AI projects. I would love to see further improvements in this one, for example completely dropping paid services in favor of local but still high-quality ones (for TTS you can use Coqui XTTS - it runs locally, has a realtime version, and you can clone any voice).
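A minimal sketch of the Coqui XTTS suggestion (the reference-voice clip and output file are placeholders; check the coqui-TTS docs for the exact model tag on your install):

```python
# Local TTS with Coqui XTTS v2: clone a voice from a short reference clip.
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Obstacle ahead, turning left.",
    speaker_wav="my_voice_sample.wav",  # any short clip of the voice to clone
    language="en",
    file_path="speech.wav",
)
```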
@DearNoobs4 күн бұрын
I love this project, wish I wasn't so far behind on all my other projects, because I want one of these too!! hahah GJ bud
@soeasy22Күн бұрын
Imagine a robot equipped with 16 NVIDIA H100 GPUs, running the 405B parameter LLaMA model, packed with sensors, exploring the world.
@GlobalScienceNetwork15 сағат бұрын
Yeah, this is a great thought. Easy to create and super powerful. You just need to give it a platform so it can interact with the world as well. We will see many products using this coming out soon. It could be similar to the Tesla Optimus humanoid robot with very little development.
@MyPhone-qg2eh3 күн бұрын
But your text to speech isn't local.
@colinmcintyre1769Күн бұрын
You don't want it to be.
@slapcitykustomz1658Күн бұрын
@@colinmcintyre1769 Why not? NVIDIA has local text-to-speech (llamaspeak) and speech recognition (OpenAI Whisper); both libraries can be run locally on the Jetson.
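For the speech-recognition half, a minimal local Whisper sketch (the openai-whisper package; the model size and audio file name are just examples):

```python
# Run OpenAI Whisper locally to transcribe a recorded voice command.
import whisper

model = whisper.load_model("base")        # small model that fits modest hardware
result = model.transcribe("command.wav")  # placeholder recording
print(result["text"])
```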
@ChigosGamesКүн бұрын
@@colinmcintyre1769 Why not? If the computer can create beautiful voices as well, then everything could be fully local.
@colinmcintyre1769Күн бұрын
@ChigosGames You want to use as much compute as you can for the best results, I'd assume. By trying to do everything locally, it instantly becomes much more expensive and less practical.
@ChigosGamesКүн бұрын
@colinmcintyre1769 I fully understand you. But if you outsource everything to paid APIs, real-life products become unaffordable. Imagine making a product that only consumes APIs: you could only sell it at a hefty price with a steep subscription attached.
@catalinalupak3 күн бұрын
Great progress on your project. I like your attitude and thinking. You should also try n8n for more local logic on that Jetson Orin Nano. It will also help you build a map of the environment and store it locally, which will speed up decision making as well. Looking forward to your next steps.
@yt742k2 күн бұрын
We are now in the age of "talking animals" - that prediction was spot on.
@anonym_user_nksnskdnkdnksndkn4 күн бұрын
Do you think you could make a drone, controlled by GPT? would be sick XD.
@manuel_elor4 күн бұрын
Up
@sergemarlon3 күн бұрын
Seems possible. A drone can hover in place, acting like these robots do when they aren't moving their wheels. The issue I see is that you would need a large drone to handle the payload of the electronics. It may be possible to stream the video from the drone to a stationary PC, which then computes and sends the radio signals back to the drone.
@GlobalScienceNetwork15 сағат бұрын
Cool video. Just a heads up that latency could be one of the issues for real-time obstacle avoidance: Bluetooth is roughly 100-200 ms, WiFi 15-30 ms, and analog RF systems around 5-10 ms. These systems should be entirely on board, or analog if you're sending data to an external machine for computing. The LLM will add further delay, but it should be quick if you use an already-trained network. However, if you want to train on your environment from the robot's own sensor perspective, I would think you'd need to do some training and build a custom network; I'm not sure how difficult that would be to achieve. Personally, I'm going to try a more basic approach, stay analog for everything, and not use an LLM. So it might take me more than 10 minutes to program.
@andyburnett49183 күн бұрын
I love watching your videos. They are very inspiring.
@HamzaKiShorts4 күн бұрын
I love your videos!
3 күн бұрын
For very simple things, a microcontroller can be nice for learning programming, but IMO something like a Raspberry Pi (a computer with GPIO) is much more useful for starting out in robotics. Imagine you are building a robot and you want to change the code, see what the code is doing, see the camera, etc. You find a bug, you just SSH in over WiFi, change the code with nano, run it, see what it does, and so on. Now imagine doing that with a microcontroller: for any bug, you need to get to the robot, turn it off, plug in a USB cable, program it (and for Arduino, wait for the compile...), unplug it, power it back on... it gets tiring pretty quickly.
@MSworldvlog-mr4rsКүн бұрын
You should use ultrasonic sensors in all directions, and you could send a depth map along with the image to the LLM.
@loskubalos4 күн бұрын
Well, this looks interesting - I'll have to watch it when I have a free moment.
@jakub382004 күн бұрын
There are more Poles here than you think.
@loskubalos4 күн бұрын
@jakub38200 I know, because Nikodem is from Poland - I watch both of his channels.
@bananabuilder22484 күн бұрын
Just a suggestion, what if you added LIDAR to improve obstacle avoidance!
@64jcl2 күн бұрын
Using Llava image descriptions alone is not really enough for navigation, although it is an interesting experiment. One thing you should try is making your robot scan the environment: rotate it 90 degrees, take a picture, analyze it, and repeat. Once you have 4 descriptions you can make a judgement about where to go based on whatever the goal is. Of course this is somewhat slow. You could also run the image through a depth-estimation model. That spits out a gradient image based on estimated depth, and those are very good for judging where the best path might be, although you'd have to calculate an approximate rotation based on which area of the image you decided the robot should navigate towards (either towards the closest object or towards where there are no objects).
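A sketch of that rotate-and-scan loop using the ollama Python client with a llava model; rotate_90() and capture_image() are hypothetical stand-ins for the robot's own motion and camera code, not anything from the video.

```python
# Sketch: rotate in 90-degree steps, photograph, ask a vision model to describe
# each view, and collect the four descriptions for a later "where to go" decision.
import ollama

def scan_surroundings() -> list[str]:
    descriptions = []
    for heading in ("front", "right", "back", "left"):
        image_path = capture_image()   # hypothetical: grab a frame from the camera
        reply = ollama.chat(
            model="llava",
            messages=[{
                "role": "user",
                "content": f"Describe obstacles and open space in this view ({heading}).",
                "images": [image_path],
            }],
        )
        descriptions.append(f"{heading}: {reply['message']['content']}")
        rotate_90()                    # hypothetical: turn the robot 90 degrees
    return descriptions
```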
@jtreg3 күн бұрын
So good! Messing about with my Tesla K80 today... it's a bit limited in what it can run, but llava works fine, no problem.
@MrMcBauerКүн бұрын
You should try two LLMs on one robot: one for decision making and one for the controls.
@MeinSchatzEra4 күн бұрын
When you love engineering, also love 3D printing and also find the video creator cute ❤, = my world falls apart 😂😂.
@_taki.debil_3 күн бұрын
You can also use Piper TTS; it runs locally and you can train a custom voice for it.
@g0ldER4 күн бұрын
I have a bot called Cozmo, which has some basic internal 3D modeling and uses it for object permanence! You should try this; it only uses one really bad camera.
@CheerApp2 күн бұрын
Great project, just a thought... maybe initially the robot should detect types of objects, and when it finds something of type "book", attempt to read the cover title?
@angelbarКүн бұрын
You need to generate volumetric data from stereoscopic images and create a semi-permanent map with constant updates.
@legendaryboyyash4 күн бұрын
damn I was actually learning robotics just to make this exact same robot controlled with ollama after watching your chatgpt one, I was planning on using raspberry pi 5 for everything, guess u beat me to it and made it even better lol
@nikodembartnik4 күн бұрын
Thank you! Keep working on your project and make it better than I did!
@legendaryboyyash4 күн бұрын
thanks :D I'll try my best to meet your expectations :D @@nikodembartnik
@AmoZ-u7b4 күн бұрын
Hey man, I love this series.
@OmPrakash-ai3 күн бұрын
I feel like adding more sensors, like LiDAR, could help the LLM make better decisions. Also, what if all the data from those sensors and cameras were used not just for reacting, but for planning ahead and executing smoothly? It might make the robot feel less… stuttery, you know?
@nikobellic5703 күн бұрын
This is the coolest thing
@stony42069Күн бұрын
Seems like the turning speed is faster than the computational speed.
@jathomas09103 күн бұрын
I'm watching this high as hell; when she said "let's head over there to the wall where humans hang out" I nearly died laughing omfg 🤣😂😂🤣😂🤣😂🤣😂💀🙏🏾😇 12:00
@truthtoad4 күн бұрын
Maybe adding a FLIR cam could assist its navigation. Great work. I want a Nano 😝
@sethunthunder4 күн бұрын
Here before "1 hour ago". Creative project bro, keep up the work!
@stumcconnel3 күн бұрын
This is so damn cool, what a huge step up to have everything local! In that part around 13:50 where you omitted all the extra output processing and just let it run around, operating immediately on each result, it looks like it never really paused in its movement but was processing an image roughly every second? Maybe the images were all just blurry because it was moving and couldn't be processed well? Or did it pause briefly to get each shot? Sorry if you'd already accounted for that - maybe the camera frame rate is plenty fast enough!
@josh-barth2 күн бұрын
Explain the prompts and the data exchanged. How did you form the context? You say you're changing the prompt and the task, but you don't tell us what actually changed. How does the machine actually move? I get the multimodal RAG aspect, but how does the LLM know to respond with an intent to move? Then how is that passed to the Pi? What does that datagram look like?
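The video doesn't show the actual wire format, but here is one plausible shape such a "datagram" could take: the LLM's decision reduced to a small JSON object and written over a USB serial link with pyserial. The port name and field names are invented for illustration.

```python
# Hypothetical move command sent from the LLM host to the drive controller.
import json
import serial

ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)  # example port/baud

def send_move(angle_deg: float, forward_cm: float) -> None:
    cmd = {"cmd": "move", "angle": angle_deg, "forward": forward_cm}
    ser.write((json.dumps(cmd) + "\n").encode("utf-8"))  # newline-delimited JSON

send_move(20, 50)  # e.g. turn 20 degrees, then drive 50 cm
```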
@X862go2 күн бұрын
Awesome 👌
@power_death_drag2 күн бұрын
You should add a lidar sensor to measure distance; it seems like it can't tell 5 meters from 10 cm.
@christiansrensen38103 күн бұрын
I like your vids, great job. But you could have cleaned up a bit before filming, no?
@redthunder6183Күн бұрын
If you're using Ollama, you should try setting the context window to be bigger. The default is 2,048, which fills up very fast after 2-3 API calls, especially with images. If you're using a GPU like the 4060 Ti, you can easily bump that up to at least 16,000 while still getting the same performance. I have 12 GB of VRAM and I am able to run llama3.1 8b with a 28k context size, for comparison. This should help significantly for things that require more than 3 steps; you can also keep track of the prompt size as it builds up, so you know when it overflows and starts to truncate and forget stuff. Also, as for navigation, the LLM has no context of its position or where it is relative to the world. You would need to design a system that gives it enough information to gauge its relative position and make informed decisions. For example, if you can get the relative position of 3 random points, it should be possible to triangulate your exact relative position, and you could overlay those 3 points as 3 different colored dots on the image. This is a bad example because you're asking the LLM to triangulate its own position, but it shows the idea of modifying the image to give it more context.
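A minimal sketch of raising the context window per request with the ollama Python client; num_ctx is the relevant option, while the model name and size here are just examples.

```python
# Ask Ollama for a larger context window than the 2,048-token default.
import ollama

response = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Summarize the last three observations."}],
    options={"num_ctx": 16384},  # raise this only if you have the VRAM for it
)
print(response["message"]["content"])
```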
@newmonengineering3 күн бұрын
I just got my Orin edition a few days ago. I am having issues using the GPIO, but I'm not going to bother with it; I'm just going to use an Arduino or ESP32 for that instead and communicate over USB serial. My last Jetson worked well and I was able to use everything, but it was significantly slower than the newer version. My only question for you is: are you using ROS 2, or are you just running your own routines entirely? I ask simply because I know ROS 2 is capable but can be a real pain to set up and get working. I'm assuming you may not be using it because of the position requirement for motors, i.e. needing encoders for speed and direction feedback. I know there are a few hacks around this, so I was wondering if you did some sort of hack to use ROS 2. Great job with the robot though. He/she is a pretty neat robot. Thanks.
@grigorione4 күн бұрын
Add some GPS / topographical ability (cameras on the ceiling, etc.) and a way for it to draw a picture as it moves, like a printer... kind of :D
@OZtwo4 күн бұрын
Very, very cool! I've been waiting for someone to try this! I stopped playing with my robots when LLMs came out, knowing they would be better than classic DL - was I right? Please prove me right! :) Also, by mixing both the Pi and the Jetson you get much better overall servo control, since the Jetson really doesn't have the power to drive them. Very cool! Hint: I hope you use two LLMs that can talk to each other, the way our brain works... (edit: are you using the LLM's API or chatting with it directly?)
@sandinopaulguerroncruzatty4440Күн бұрын
A little drone with AI would be interesting.
@miltontavaresinfo2 күн бұрын
Very nice 👏🏽 New subscriber! 👍🏽
@Willie-vr6gk3 күн бұрын
The Jetson can output a decent audio signal (I think at least 16 kHz), so why not connect the speaker to the Jetson's output pins (I don't think the amplifier is really required here)?
@jros40572 күн бұрын
Are the GPIOs on the Jetson working at all - can you get an LED to flash? Or are they just not working as intended?
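A quick sanity-check sketch for that question using NVIDIA's Jetson.GPIO library (the pin number is an example; check the carrier board's pinout before wiring anything):

```python
# Blink an LED on a Jetson header pin to confirm the GPIO stack works at all.
import time
import Jetson.GPIO as GPIO

LED_PIN = 7  # board pin 7, example only
GPIO.setmode(GPIO.BOARD)
GPIO.setup(LED_PIN, GPIO.OUT)
try:
    for _ in range(10):
        GPIO.output(LED_PIN, GPIO.HIGH)
        time.sleep(0.5)
        GPIO.output(LED_PIN, GPIO.LOW)
        time.sleep(0.5)
finally:
    GPIO.cleanup()
```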
@erniea584323 сағат бұрын
Working with Jetson nano is such a pain in the A
@nikodembartnik22 сағат бұрын
Why? So far it seems like any other Raspberry Pi / Linux-based SBC.
@simeonnnnn3 күн бұрын
Zima Blue 💙
@tsungxu3 күн бұрын
Could you just use the Orin Nano without the Pi? It can basically do everything the Pi can do, right, but with much more compute?
@alrimvt02Күн бұрын
You should have used XTTS for the voice instead - it's free and local. Also, the model you should use for your purpose is Gemini Flash 2.0; it has much more advanced vision than the model you used.
@beastmastern1592 күн бұрын
You can use the Raspberry Pi AI HAT with the Hailo-8 module to run Llama 3.2. The Raspberry Pi 5 has good graphics computing capacity, but the AI module is great for running AI models. I follow you and I like your content, keep going.
@grigorione4 күн бұрын
Could you let it make barcodes that it can stick to entryways, so it acts a certain way when it enters a room?
@Math2C3 күн бұрын
Doesn't the Jetson have an I2S interface?
@5fsrinikshith4364 күн бұрын
Under 15 mins batch here!!
@narpwa2 күн бұрын
You should buy a good mechanical keyboard; your fingers will thank you, because what you've got looks like a pain to type on or be productive with.
@el3print3 күн бұрын
"end the project before it is too late...' ;-) first robot-killer...? By the way, happy 2025
@maglatКүн бұрын
Just use Piper for TTS
@Dj-Mccullough2 күн бұрын
I'd suggest Llava-Llama rather than just llava
@nishant38994 күн бұрын
Hello brother, I really want to build a robot, but it's very complicated for me. Can you please make an easy robot to start with? Please 🙏🙏🙏
@g0ldER4 күн бұрын
<a href="#" class="seekto" data-time="310">05:10</a> you should try using the Intel Arc B580, it has more VRAM than the 4060 (useful for LLMs) and is way cheaper
@Willie-vr6gk3 күн бұрын
But slower (OpenVINO is really slow for now)
@Davidfirefly2 күн бұрын
There's a better choice you can explore: use the Raspberry Pi 5 with the AI HAT (28 TOPS) and combine it with the Raspberry Pi AI Camera - that covers the logic. Also try adding a 360° lidar sensor and a couple of 24 GHz radar human-presence sensors. Your project is cool.
@irkedoff3 күн бұрын
I would love to know where I can get one of these powerful GPUs that isn't that expensive. My 4090 seems expensive.
@Willie-vr6gk3 күн бұрын
You can use some Radeons (AMD GPUs); they are supported by Ollama directly. I also tested driving Intel GPUs, but they aren't supported by Ollama - I had to change a little bit of llama.cpp's source code to make it work.
@Willie-vr6gk3 күн бұрын
Also, you can use multiple GPUs at once
@AB-cd5gd3 күн бұрын
Try with deepseek vl2
@drj2220Күн бұрын
great!
@narpwa2 күн бұрын
skill chair 💀
3 күн бұрын
Use Mistral's Pixtral or Llama 11B.
@jp58622 күн бұрын
This is exciting...
@indramal4 күн бұрын
Where is the code?
@zebatagirl1348Күн бұрын
So don't get the Raspberry Pi AI HAT?
@anispinner3 күн бұрын
Niko Bellic
@phizicks2 күн бұрын
Just use PulseAudio to send the sound to a remote Pi with a TCP pulse server, like I have on my system.
@stardobasКүн бұрын
Wonderful and very interesting project... For me llava is too slow... I will try a new approach: not a multimodal LLM, but a tool/Python implementation using YOLO to find objects and their positions (about 5 FPS)... and ROS 2 for navigation... all on a Pi 5...
@takerapartererКүн бұрын
"offline" and then it uses paid closed source APIs
@imdaboythatwheheryeah2 күн бұрын
Says he wants a local system, then 5 minutes later uses a paid TTS service because he gets money from them. Your goals are flimsy and disappointing.
@ecp57584 күн бұрын
I have the Jetson Nano and did a project with it a couple of years ago (kzbin.info/www/bejne/a3LShWyEdt6Maa8), but honestly the configuration is very complicated. I prefer the Raspberry Pi whenever I can use it; it just needs a GPU (kzbin.info/www/bejne/gJO7qJVjoJyCm8U). Very good video!!
@g0ldER4 күн бұрын
<a href="#" class="seekto" data-time="215">03:35</a> trump is putting tariffs on GPUs… yay.
@SaintLouisWeatherClub4 күн бұрын
Sorry, what were you saying?
@RenardThatchКүн бұрын
So you need to figure out how to tell it how much distance it would cover with each of the predefined movement scripts you use, or get super sweaty and give it all the specs of the hardware it's running on so it can determine how far or fast to go, with the Pi measuring power to each motor... Then give it a sense of scale with a secondary model designed to run on stereoscopic cameras... Use that model to feed dimensions back to the decision-making model, then figure out how to run it all on the Jetson to keep everything self-contained, and then... I think you should call Mark Rober, because you're basically designing a Curiosity rover for consumers at that point.