ESP32 stereo camera for object detection, recognition and distance estimation

Views: 48,755

1 year ago

In this video, I show how I made an ESP32 stereo camera that can be used for object detection and distance estimation, with quite good results. The camera consists of two ESP32-CAMs, a perf board, some female header pins, some wires, a battery shield (with 5 V output and charger) and a battery. The camera streams into a Python notebook, which uses the Mask R-CNN model from the PyTorch library to segment the left and right images. It then matches the objects in the left image to the objects in the right image and estimates their distance.
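The matching step described above can be sketched roughly as follows. This is a minimal illustration and not the code from the repository: the `Detection` type, the `match_stereo` function and the `max_dy` tolerance are hypothetical names, and the pairing rule (same class label, similar vertical position, object further left in the right image) is an assumption about how such matching is commonly done.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str   # class name reported by the segmentation model
    cx: float    # box centre x, in pixels
    cy: float    # box centre y, in pixels


def match_stereo(left, right, max_dy=20.0):
    """Greedily pair left/right detections of the same class.

    Assumes roughly row-aligned cameras, so a true pair has a similar cy
    and sits further left in the right image (positive disparity).
    """
    pairs = []
    used = set()
    for l in left:
        best, best_dy = None, max_dy
        for i, r in enumerate(right):
            if i in used or r.label != l.label:
                continue
            dy = abs(l.cy - r.cy)
            if dy <= best_dy and l.cx - r.cx > 0:
                best, best_dy = i, dy
        if best is not None:
            used.add(best)
            pairs.append((l, right[best], l.cx - right[best].cx))
    return pairs


# Made-up detections, purely for illustration.
left = [Detection("cup", 400, 240), Detection("person", 200, 250)]
right = [Detection("person", 150, 252), Detection("cup", 370, 238)]
for l, r, disparity in match_stereo(left, right):
    print(l.label, disparity)
```

The disparity returned for each pair is the quantity a distance estimate would then be built on.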
GitHub code:
github.com/jonathanrandall/es...
Artificial Intelligence graduate certificate that I am doing:
www.latrobe.edu.au/courses/gr...
And graduate diploma:
www.latrobe.edu.au/courses/gr...
Daniel Rossi’s ESP32-CAM Python stream OpenCV project:
www.hackster.io/onedeadmatch/...
Battery Shield
www.aliexpress.com/item/32870...

Comments: 132

@belalsafy7993 6 months ago

I think this is the best video ever on KZbin. It contains exactly what I need and is clearly explained. Thank you with all my heart. Please keep me going.

@jonathanr4242 6 months ago

Thank you. That is very kind of you.

@zachd4785 9 months ago

Jon, trying to wrap my head around this process, but must say, it's clear you know what you're talking about! Very well done on this video and the project.

@jonathanr4242 9 months ago

Thank you for the kind comment.

@deoarlo 2 months ago

Amazing work Jonathan! Esp is hands down my favourite board so thank you for documenting your work!

@jonathanr4242 2 months ago

Thank you, Deo. That is very kind.

@ArduinoTex 1 year ago

Quite an interesting idea for the project. Thank you for the useful information.

@jonathanr4242 1 year ago

Thank you. That is very kind of you to say.

@avikdas18 1 year ago

Amazing tutorial. Thank you so much for the beautiful explanation ❤❤

@jonathanr4242 1 year ago

Thank you, Avik.

@aditya_01 10 months ago

Thanks, that was really helpful.

@gouledawad2377 1 year ago

Great content. Thank you for making it easy to understand.

@jonathanr4242 1 year ago

Thanks Gouled

@Tech_Talks_By_Marsh 1 year ago

Appreciate your time and good work ..kudos

@jonathanr4242 1 year ago

Thank you. I appreciate it.

@sermadreda399 1 year ago

Great video, thank you for sharing

@jonathanr4242 1 year ago

Thank you for the kind words.

@maheshpatel2005 1 year ago

Nicely explained

@jonathanr4242 1 year ago

Thank you, Mahesh.

@OMNI_INFINITY 1 year ago

Thanks for posting that and the files! Been wanting a nice stereo camera to use with an HMD, so I guess I may build a small edition of that.

@jonathanr4242 1 year ago

Sounds Awesome! There is so much cool stuff you can do with a stereo camera. I made my own because I had some extra esp32-cams. But, the good thing about getting a proper stereo camera is that you can sync the frames a bit more easily, which is important if you want to do stereo video. I had some difficulties with synchronisation when I tried to use the esp32-cams for stereo video.

@OMNI_INFINITY 1 year ago

@@jonathanr4242 Oh? Surprised. With short wires the latency should be short enough to synch the first frame.

@jonathanr4242 1 year ago

@@OMNI_INFINITY I was using wifi with frame buffering. I didn't realise it at the time, but one solution would be to grab the most recent frame in the buffer, and drop the others.

@OMNI_INFINITY 1 year ago

@@jonathanr4242 Ah. Can connect rx and tx of those seemingly.

@rithikroshan7252 9 months ago

amazing video!!!

@jonathanr4242 9 months ago

Thank you

@3dnoob 1 year ago

Wow! This is really great, you explained and simplified a lot of topics in this video. I hope you can modify your stereo-vision setup to create a structured-light scanner with a projector and OpenCV.

@jonathanr4242 1 year ago

Thank you. There are a number of projects, tweaks and modifications I'd like to make. The structured-light scanner is a very good idea.

@R00kTruth 8 months ago

I wish @@jonathanr4242 that I had come across your channel/video before I bought the kinectv2 adapter!!!

@chenbenzvi2164 7 months ago

First, this video is amazing! Deploying the model and communicating with the esp32 are quite straightforward. The core of the project is the tracking and the distance calculation, which I felt you kind of breezed over. I wish you could do an in-depth video just on that part. A great video anyway, though. Keep it up! After a second look: from my understanding, there is a dangerous state when the object is close to the cameras. It will calculate the distance from the opposite corners of the object, which will return... something that is probably not correct. Good luck! (The object should be symmetrical, like a screen, so the model predicts it as the same class.)

@jonathanr4242 7 months ago

Hi. Thanks for your comment. Yes, the project needs a bit of tweaking. There are probably a fair few heuristics one can employ. In fact, every AI project will have some tweaks applied that greatly improve it. Also, I discovered in another project that when I bring in the image from the esp32 cam, I'm taking it from a queue, so I don't get the most recent image and there can be a bit of a delay. To fix this, I found a wrapper for the image capture that avoids this bug. I've used the wrapper in another video with the time of flight camera. The github repo should be referenced in the description of that project. Basically, it uses a thread that keeps reading and discarding frames in the queue, so that when you want to grab a frame, it grabs the most recent one, which is essential if you have moving objects.
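The thread-based idea described above might look something like the sketch below. It is not the wrapper from the referenced repository: `LatestFrameReader` and its `read_frame` argument are hypothetical names, and the frame source here is a stand-in counter so the example runs without a camera.

```python
import threading
import time


class LatestFrameReader:
    """Keep only the newest frame from a blocking `read_frame` callable."""

    def __init__(self, read_frame):
        # Hypothetical contract: read_frame() returns a frame,
        # or None when the stream has ended.
        self._read_frame = read_frame
        self._lock = threading.Lock()
        self._latest = None
        self._running = True
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        # Read continuously so stale frames never pile up in a queue.
        while self._running:
            frame = self._read_frame()
            if frame is None:
                break
            with self._lock:
                self._latest = frame

    def latest(self):
        with self._lock:
            return self._latest

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)


# Stand-in source for illustration: frame numbers 0..49, then the stream ends.
counter = iter(range(50))
reader = LatestFrameReader(lambda: next(counter, None))
time.sleep(0.1)          # give the background thread time to drain
print(reader.latest())   # the newest frame seen so far, not the oldest
reader.stop()
```

With OpenCV, the callable could plausibly be something like `lambda: cap.read()[1]` for an opened `cv2.VideoCapture` named `cap`; that wiring is an assumption and is not tested here.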

@collinb5524 1 year ago

Very interesting stuff. I'm currently learning OpenCV from Paul McWhorter. Please keep posting.

@jonathanr4242 1 year ago

Thank you, Collin. Yes, openCV is powerful stuff. I am working on another project with the stereo camera which I hope to post in the next couple of weeks.

@darthvader4899 2 months ago

Great, amazing video! Another great video buried in the YouTube cemetery.

@JamesNewton 1 year ago

Have you heard of the old laser line trick for doing distance measurements? Super cheap, super fast, one camera and a (you guessed it) laser line. This is nice, because it tells you what the object is, which the laser line won't do. Nice work!

@jonathanr4242 1 year ago

Thanks James. I will look at the laser line trick. I am planning to add a LIDAR to the esp32 camera at some stage. Hopefully, this will happen in the near future sometime.

@JamesNewton 1 year ago

@@jonathanr4242 LiDAR are the cool toy everyone wants to play with. And they are wonderful, no doubt. A little spendy, but I'm seeing that the prices are getting down into the below $100 range? Hmmm... looks like a lot of those are NOT really LiDAR. LOL. The real problem with LiDAR and most 3D cameras is that they bury you in data. In the same way that the NN takes several seconds to lock in on the objects in this project, sorting through the LiDAR data and making use of it takes time. The laser line thing is nearly instant. It's a great first shot at ranging, which then helps things like image recognition because instead of handing the NN an entire frame, you can hand it just the one area where the object is known to be. That really speeds up recognition. And the total cost is like $5... or less actually, you can get cheap modules for $1 from China.

@jonathanr4242 1 year ago

Thank you, James. I will definitely look into it. Is the module you're referring to a time of flight sensor? Do you have any references for projects that have done this?

@JamesNewton 1 year ago

@@jonathanr4242 I did it multiple times back in the 80's and it's been done many times since then but usually in slightly different form. It's nothing more than a laser line connected to an io pin to turn it on and off, and a camera. You capture a frame with the laser line off, then turn it on and capture another frame. Subtract the two frames, and the only thing left will be any reflections of the laser line against objects in front of the camera. Because of parallax between the line and the camera (the line is placed a bit below the camera), the distance from the bottom of the picture, up to the pixel where the line shows up, will be the range to the object. To clarify, the image subtraction is pixel by pixel. The first pixel (upper left corner) of the first image has a value (or values in RGB) and you subtract that value from the first pixel of the second image, making a new image with a pixel value which is the difference. One common use is with a turntable to make a low cost 3D scanner. like this from my site: techref.massmind.org/techref/new/letter/news0310.htm I'm suggesting that you don't use the turntable and you turn the laser and camera on their side. The laser projects out a plane under the camera, and only detects things that are resting on the surface. But then... what doesn't rest on the surface?
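The frame-subtraction procedure described above can be sketched in plain Python as follows. The function names are hypothetical, frames are represented as simple lists of grayscale rows, and the range formula is a pinhole-camera assumption (laser plane parallel to the optical axis, a fixed offset below the lens) rather than anything taken from the comment.

```python
def laser_rows(frame_off, frame_on, threshold=50):
    """For each column, return the row where the laser line is brightest.

    Frames are lists of rows of grayscale values (0-255). Columns where the
    difference never exceeds `threshold` yield None.
    """
    height, width = len(frame_on), len(frame_on[0])
    rows = []
    for x in range(width):
        best_row, best_diff = None, threshold
        for y in range(height):
            diff = frame_on[y][x] - frame_off[y][x]   # pixel-by-pixel subtraction
            if diff > best_diff:
                best_row, best_diff = y, diff
        rows.append(best_row)
    return rows


def range_from_row(row, centre_row, focal_px, laser_offset_cm):
    """Pinhole triangulation: Z = f * b / v, with v in pixels below centre."""
    v = row - centre_row
    if v <= 0:
        return None               # line at or above the image centre: no estimate
    return focal_px * laser_offset_cm / v


# Tiny synthetic frames: uniform background, one laser reflection.
off = [[10] * 4 for _ in range(6)]
on = [row[:] for row in off]
on[5][1] = 200                    # reflection in column 1, row 5
rows = laser_rows(off, on)
print(rows)
print(range_from_row(rows[1], centre_row=3, focal_px=100.0, laser_offset_cm=4.0))
```

The `focal_px` and `laser_offset_cm` values are made up; in practice they would come from calibrating against objects at known distances.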

@jonathanr4242 1 year ago

@@JamesNewton Thanks James! That sounds awesome. It gives me so many ideas. I had a project where I tried to use a laser to guide an electric water gun to the target. But I couldn't get this to work because the red light from the laser saturated the camera (showed up as all white), so it didn't work very well. I think you've just told me how I can solve this.

@dhiyaulhaqfairaazjamil3134 8 months ago

Hello sir, Can it be used to detect 3-dimensional geometry shapes?

@jacekf 10 months ago

Binocular camera product introduction: Based on the OV2640 camera, the binocular camera switches display, the pins are compatible with the standard 24P camera, and it can be connected directly through a 24P FPC cable. The binocular distance is 60mm, which is close to the real spacing of human eyes. It can be used for depth vision research. MaixPy provides a binocular operation interface.

@arnabbarua1670 1 year ago

Amazing video and well explained. Do you have any video on how we could implement stereovision for human flow with esp32cams in real time?

@jonathanr4242 1 year ago

Thank you, Arnab. I'm working on something at the moment with a depth camera that might help. Hoping to have it out by the end of this month or middle next month at the latest. R-CNN is a bit slow, might be better to use YOLO if you want to do it in real time. In the video after this (esp32cam for autonomous task completion) , I do an implementation with YOLO, which is much faster than the R-CNN implementation.

@arnabbarua1670 1 year ago

@@jonathanr4242 Yes, I have seen that you are doing a lot of work with esp32 cams and implementing it in various cases. I have seen the video on your yolov8 implementation. I have one concern though. As I am working on people counting, I might use it at night too. In that case, can stereovision help with IR footage?

@jonathanr4242 1 year ago

@@arnabbarua1670 I don't see that there would be an issue with using two IR cameras, as long as you can get a decent image from each camera and do feature mapping. You can also estimate the size of the person, and so estimate how far away they are from a single camera, which might help a bit. I mean it might be an advantage if you have both cameras with IR LEDs because you'd have more IR light, but on the other hand there might be too much IR light (I'd say the more the better, given those IR LEDs are not very bright). Sounds very interesting. I was looking at one of those the other day for raspberry pi, but it was a bit more expensive than I wanted to spend.

@ansjaved3484 1 year ago

Hi, I appreciate your work. You have done an excellent tutorial. I would like to ask about the system requirements to run the MaskRCNN_ResNet50_FPN_V2. Can you tell me about it? As I'm doing the same work but running on jetson nano. I'm not able to find a quite fast object detection model with good accuracy. I will appreciate your help. Thanks!

@jonathanr4242 1 year ago

Hi Ans. Thank you for your kind words. I am using an old-ish lenovo yoga slim laptop. It takes around 5 seconds per image. In my next video I use YOLOV8 (model m), which is about 10 times faster. I haven't tried on a jetson nano, but I would like to. Please let me know how it goes.

@jacekf 10 months ago

1pcs Sipeed OV2640 Binocular Camera Development Board Stereo Vision Depth Vision

@kevindarren756 5 months ago

Amazing video, just got some minor questions: - Is it a fair assumption that the focal length of this ov2640 is the same for all of them across different esp32cams? - Have you worked on calibrating the sensor to get the projection matrix? Regards, Kevin

@jonathanr4242 5 months ago

Hi Kevin. Thanks for your comment. In my experience, ov2640 from different sellers (buying from ali express) can have varying degrees of quality. So, I would say you need to calibrate every one of them. I haven't really spent much time on calibration. I usually just plug in the sensor and hope for the best.

@truetech4158 1 year ago

It would be nice to have stereoscopic cams linked to a VR eyewear setup, with the cams mounted on the face of a droid robot that walks around: realism for when you are too lazy to go to the mall but also want your robot to provide realism as if you were at the mall. Stereoscopic hearing, sight, even stereo nostril sensors to detect exactly where smells are coming from. This could detect covid in a crowd: if anyone there farts and expels an observable, testable, dangerous, toxic, invisible cloud, a life-saving alert can be given to people preemptively.

@jonathanr4242 1 year ago

That sounds very cool, indeed. It would be amazing to build something like that.

@sidharthpisharody 9 months ago

When you replace d3 with dc, what should the pixel count p be? Will it remain the same as the number of pixels of the object, or be the pixels within distance dc?

@jonathanr4242 9 months ago

The pixels making up distance dc. I'm trying to replace all distance measurements with pixel counts, so that it makes sense in digital image processing.

@m7mold128 9 months ago

Can you do a tutorial on connecting the hardware?

@jonathanr4242 9 months ago

Thanks for your comment. It's actually very simple to connect the hardware. All I do is make sure the two esp32 cams are on a grid, so that they are fixed in position, and then upload the sketch to them. In all honesty, the kind of stereo cam I've used is a bit messy. I would probably recommend using a proper usb type stereo camera.

@MunavirZamanPK 2 months ago

Thank you for the video. How do I upload the code, because there is no port? Is there any hardware circuit diagram?

@jonathanr4242 2 months ago

You need to use an FTDI adapter to connect the AI Thinker board to USB. You can get these from Amazon or AliExpress. There are other esp32 cams which have a USB connection, like the XIAO esp32 cam. They're generally a bit more expensive, but worth it.

@mohamedmohsen1859 1 year ago

Hello, thanks for this great video, I tried using your code with different images it gives me wrong distances, it only works with the image in your repo

@jonathanr4242 1 year ago

Did you calibrate your camera?

@utkucicek6664 2 months ago

Do you think it would be possible to do SLAM with the stereo cameras? My only worry is the processing power. Thanks for sharing this tutorial!

@jonathanr4242 2 months ago

Yes. 100%. There is a great clip of Elon Musk with Andrej Karpathy saying lidar is pointless for cars. But you’re right about the processing power, especially for a small hobby robot. As well, there are still challenges that need to be addressed, like visual understanding.

@junyang1710 8 months ago

Can you do a project, which uses only one camera and odometry to detect if the distance between two cars is enough for my car to side parking? Sometimes it is very hard to park if the distance is not enough.

@jonathanr4242 8 months ago

Hi. I think it would be possible, but would require decent labeled data for training.

@Moon-D0G 4 months ago

Thanks for the great video. Will this process be faster with my own gpu ? I'm thinking to implement this to my school project. Camera will track the object, i thought it will be more precise but if it's thaat slow 😢

@jonathanr4242 4 months ago

It will be much faster with a gpu. Also YOLOv8 will be faster than mask r-cnn.

@Mohammedalhalabi16 1 year ago

Question: If you disconnect the internet from it, will it still work on tracking moving objects without its connection to the internet, or should it be connected to the internet in order to continue tracking the movement of objects, and if it works, can we make a project to track an object using Arduino Uno

@jonathanr4242 1 year ago

You can run it on a local network, but it needs a computer that can run open cv and pytorch.

@tusheyy6033 3 months ago

hello! the feed of the camera appears to be slow when object recognition and distance estimation are running, how do i speed it up?

@jonathanr4242 3 months ago

Thanks for your message. There are a couple of things you can do.
1. If you want to speed up the object recognition part, you can use YOLOv5 or YOLOv8 instead of the R-CNN I am using. This is much faster. Running it on a GPU will also make it faster. I have implemented this in the next video. The code is on this github repo: github.com/jonathanrandall/autonomous_task_stereo_cam
2. If you just want to speed up the video capture part, you can try running it on a separate thread. See the solution to this question on stackoverflow: stackoverflow.com/questions/43665208/how-to-get-the-latest-frame-from-capture-device-camera-in-opencv You might want to try moving the imshow inside the videocapture class if you implement the above thread.

@mohammedrhaiouz2668 3 months ago

Hi, I just want to ask whether I can use this camera to detect only a white ball and estimate the ball's trajectory? Waiting for your answer, please 🙏🏻

@jonathanr4242 3 months ago

Thanks for your question. You’d be better off looking at key point detection in something like yolov8 or v9.

@migueldiaz8669 1 month ago

I wonder if you would consider a new project for a stereo cam that can film fast action sports with a 180 FOV using a raspberry pi.

@jonathanr4242 1 month ago

Very cool idea.

@59vijaiyaaravindthsr39 11 months ago

What is the maximum range at which it can detect the distance of an object?

@jonathanr4242 11 months ago

Thanks for your thought-inspiring comment. Using 640x480 resolution, I think I was able to accurately measure up to around 120cm (if I remember correctly). As the resolution increases, so does the maximum distance. You can calculate the theoretical maximum by taking the left/right disparity as 1 pixel. But I would probably divide this by at least three, taking practical considerations into account. But then I guess the distance resolution would also decrease as the distance away increased.
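As a rough illustration of the reasoning above, the sketch below uses the textbook pinhole stereo relation Z = f * B / d. This is a swapped-in standard model, not the tan(theta) formulation used in the video, and `focal_px` is a hypothetical value rather than a measured one.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_cm):
    """Standard pinhole stereo relation Z = f * B / d."""
    return focal_px * baseline_cm / disparity_px


def theoretical_max_range(focal_px, baseline_cm, safety_factor=3.0):
    """Range at a 1-pixel disparity, divided by a practical safety factor."""
    return depth_from_disparity(1.0, focal_px, baseline_cm) / safety_factor


def depth_step(depth_cm, focal_px, baseline_cm):
    """Approximate depth change caused by a 1-pixel change in disparity."""
    return depth_cm ** 2 / (focal_px * baseline_cm)


focal_px = 500.0      # hypothetical focal length in pixels
baseline_cm = 7.05    # hypothetical camera spacing in cm

print(theoretical_max_range(focal_px, baseline_cm))
for z in (30.0, 60.0, 120.0):
    print(z, round(depth_step(z, focal_px, baseline_cm), 2))
```

The `depth_step` values grow with the square of the distance, which is one way to see why the distance resolution gets worse for far objects.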

@welsonfy5246 1 year ago

Hello, good job! with stereo cameras where can we find the width, height, fx, fy, cx, cy values from a photo taken or a video? Thanks in advance

@jonathanr4242 1 year ago

Thanks. Most of the calibration information should be in the python notebooks on the github.

@jonathanr4242 1 year ago

Also, I should probably mention that my approach is fairly simple. I'm only using the distance of the object from the camera, and the distance apart of the two cameras. My assumption is that both cameras have the same focal length.

@welsonfy5246 1 year ago

Thank you but the parameters when taking the photo or video which concerns the values width, height, fx, fy, cx, cy that I quoted how are they found?

@jonathanr4242 1 year ago

@@welsonfy5246 I think my approach is a little different to what you find in the textbook. I have two parameters, which are tan(theta) and focal length. Then I take the corner point of the object as the matching point in each image to calculate the distance. I go through the calculations in the video, and they should be included in my code. There is room for improvement. For example, once you find the matching objects, you could correct for parallax.

@abang_poi 10 months ago

Hello, sir. Thank you for your content which helps me to complete my assignments at school. I'm from Indonesia, I'm confused about how to take an image from the file name so that it displays two different images, for example the left image is 50cm and the right image is 50cm, can you help me sir?

@jonathanr4242 10 months ago

Hi. I’m not sure I understand the question.

@MuhammadIqbalft 21 days ago

To calculate the distance estimate, do we need two cameras?

@jonathanr4242 21 days ago

@@MuhammadIqbalft for this method you will need two cameras but there are other methods that might use one camera, although I couldn’t name any off the top of my head.

@Sumathy.v 1 year ago

I am new to embedded systems. Can you suggest where I can learn esp32 programming (not using the Arduino IDE)?

@jonathanr4242 1 year ago

My go-to places are DroneBot Workshop, Random Nerd Tutorials and Paul McWhorter from toptechboy.com. The best way to learn is to do projects.

@Sumathy.v 1 year ago

@@jonathanr4242 hey man ...thanks for the reply u gain my respect as a teacher 🛐

@amalcz6066 4 months ago

Hi, I am currently doing a distance estimation project based on your video. I made the setup exactly like yours, with the esp32 cams kept 7.05 cm apart. Do I need to calibrate my camera, or can I use the values calculated in the video for focal length and tantheta, since we are using the same setup? Which values should I change in the code for tuning it, apart from focal length and tantheta? I have to submit the project soon😢😂

@jonathanr4242 4 months ago

You need to calibrate the camera. Basically, you need to measure the same object at a few different distances and adjust the focal length and tantheta.
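One way to turn "measure the same object at a few different distances" into numbers is a small least-squares fit. The sketch below is an assumption-laden stand-in: it fits a single constant k in the textbook relation Z = k / d (where k plays the role of focal length times baseline) instead of the two parameters, focal length and tantheta, used in the video, and the sample measurements are invented.

```python
def fit_scale(disparities_px, distances_cm):
    """Least-squares estimate of k in Z = k / d.

    Minimising sum((Z_i - k / d_i)**2) over k gives
    k = sum(Z_i / d_i) / sum(1 / d_i**2).
    """
    num = sum(z / d for d, z in zip(disparities_px, distances_cm))
    den = sum(1.0 / d ** 2 for d in disparities_px)
    return num / den


# Hypothetical calibration shots: the same object at three known distances.
disparities = [100.0, 50.0, 25.0]   # measured pixel offsets between left and right
distances = [30.0, 60.0, 120.0]     # tape-measured distances in cm

k = fit_scale(disparities, distances)
print(k)
print(k / 40.0)   # predicted distance for a new disparity of 40 px
```

With real measurements the points will not lie exactly on the curve, and the residuals give a feel for how well the setup is calibrated.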

@shreshthjha997 1 year ago

Hello sir. I am doing a project exactly the same as the one in your video. Previously I was using one esp32 camera, and every time I got the wrong distance to the object. Sir, can you help? If I use two esp32 cameras, can I do it like you?

@jonathanr4242 1 year ago

You need to use two esp32.

@shreshthjha997 1 year ago

@@jonathanr4242 Sir, Should I train model for object detection? I had used two esp32 like you but it was not detecting any object like person etc. Can you please help?

@lancelotnub 11 months ago

Sir, how do I solve this?
Traceback (most recent call last):
  File "C:/Users/ASUS/OneDrive/Documents/test.py", line 59, in <module>
    faces = face_classifier.detectMultiScale(gray)
cv2.error: OpenCV(4.5.3) C:\Users\runneradmin\AppData\Local\Temp\pip-req-build-uzca7qz1\opencv\modules\objdetect\src\cascadedetect.cpp:1689: error: (-215:Assertion failed) !empty() in function 'cv::CascadeClassifier::detectMultiScale'

@jonathanr4242 11 months ago

It's difficult to tell without knowing which part of the code is causing the exception. Usually, I would try to isolate the error.

@goldcoasttime 1 year ago

Jonathan, where do you live? Are you interested in collaborating on a project? I'm on the Gold Coast.

@jonathanr4242 1 year ago

Thanks for your comment. I am quite far from the Gold Coast. I am in Sydney. Happy to help if you have any ideas you want to bounce off me.

@hamid9083 10 months ago

Hi, thank you for this great video. I learned a lot from you just in this one video. But I think you made a mistake in one part of the code that calculates the focal length. You use this formula: f1 = 30 - 38.44*50/68.75. I think you should use this: f1 = (30 - 38.44*50/68.75)/(1 - 38.44/68.75). I also have a question. Can we use this distance to get the horizontal and vertical distance of the object to the camera?
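To make the difference between the two expressions quoted above concrete, they can simply be evaluated side by side. This only compares the arithmetic; it does not settle which expression matches the geometry in the video, and the variable names `a` and `b` are placeholders for the two measured values.

```python
a, b = 38.44, 68.75   # the two measured values quoted in the comment

f_video = 30 - a * 50 / b                       # expression attributed to the video
f_proposed = (30 - a * 50 / b) / (1 - a / b)    # expression proposed in the comment

print(round(f_video, 3), round(f_proposed, 3))
print(round(f_proposed / f_video, 3))           # how far apart the two results are
```

The proposed version divides by (1 - a/b), so the two results differ by a factor of b / (b - a).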

@jonathanr4242 10 months ago

Hi. Thanks for your comment. Yes. It’s not difficult to measure height and width. You can do it from the geometry by taking the ratio of pixels across to the pixel width.
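A minimal sketch of that geometry, assuming a simple pinhole model with a focal length expressed in pixels. The function names, the principal point and the numbers are hypothetical and may not match how the ratio is taken in the video.

```python
def extent_from_pixels(extent_px, distance_cm, focal_px):
    """Real-world width or height of something spanning `extent_px` pixels."""
    return extent_px * distance_cm / focal_px


def lateral_offset(pixel, principal_point, distance_cm, focal_px):
    """Horizontal or vertical offset of a pixel from the optical axis."""
    return (pixel - principal_point) * distance_cm / focal_px


focal_px = 500.0      # hypothetical focal length in pixels
distance_cm = 60.0    # distance already estimated from the stereo pair

print(extent_from_pixels(100, distance_cm, focal_px))   # object width in cm
print(lateral_offset(420, 320, distance_cm, focal_px))  # offset right of centre in cm
```

The same two functions work for the vertical direction if row coordinates and the vertical principal point are passed instead.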

@waltherhumberto271 4 months ago

Can I create a YOLOv8 model to find specific mice, and can I simply deploy it in the program?

@jonathanr4242 4 months ago

Yes. The only issue is you will need lots of training data.

@waltherhumberto271 2 months ago

@@jonathanr4242 Thank you, I did and now it is working perfectly. One thing, though: I watched your video many times, and I still didn't get the idea behind the focal length calculation. Could you write a summary here, please?

@waltherhumberto271 2 months ago

Or point me to a paper?

@waltherhumberto271 2 months ago

@@jonathanr4242 I didn't get how you got the formula f = 30-38*50/68

@jonathanr4242 2 months ago

@@waltherhumberto271 This is the formula at the bottom right at 15:38. I placed the object 30 cm away for calibration and measured the other values manually.

@a7medrafat31 1 year ago

I have a problem with the weights file. I downloaded it but still get the same problem.

@jonathanr4242 1 year ago

what is the error that you are getting?

@a7medrafat31 1 year ago

@@jonathanr4242 PytorchStreamReader failed reading zip archive: failed finding central directory
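An error like the one above typically points to a weights file that is incomplete or corrupted, since recent PyTorch checkpoints are stored as zip archives. A quick way to check a downloaded file with only the standard library might look like the sketch below. The helper name is hypothetical, and the demo files are throwaway placeholders rather than real weights.

```python
import os
import tempfile
import zipfile


def looks_like_complete_checkpoint(path):
    """Return True if `path` is a readable zip archive with no corrupt members."""
    if not zipfile.is_zipfile(path):
        return False          # central directory missing, e.g. truncated download
    with zipfile.ZipFile(path) as archive:
        return archive.testzip() is None


# Demo with throwaway files: one intact archive and one truncated copy.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.pth")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("data.pkl", b"placeholder")
    bad = os.path.join(tmp, "bad.pth")
    with open(good, "rb") as src, open(bad, "wb") as dst:
        dst.write(src.read()[:20])    # simulate an interrupted download
    print(looks_like_complete_checkpoint(good), looks_like_complete_checkpoint(bad))
```

If the check fails on the real weights file, downloading it again is usually the first thing to try.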

@jonathanr4242 1 year ago

@@a7medrafat31 Oh! I've never seen that error. In the next video, I do the stereo pair matching with yolov8. I've put the notebooks on the github for those as well. It's much faster than the pytorch Mask R-CNN model, but not as reliable. And you can tweak it, I suppose. The git repository is called autonomous_task_stereo_cam

@a7medrafat31 1 year ago

@@jonathanr4242 I resolved the error, but I want to know how to run inference continuously?

@jonathanr4242 1 year ago

@@a7medrafat31 You will probably need a stereo camera that can synchronise the left and right images. If you're using the esp32-cam, there will be a lag between left and right images. Also, you will most likely need to implement the object segmentation and detection on a GPU to get it to keep up with the frame rate.

@mohamedabdelmalekbouchrika6134 10 months ago

Can it do real-time detection?

@jonathanr4242 10 months ago

Yes. You would need to run it on a gpu, and also YOLO would be a bit faster.

@homeexperimenter5781 1 month ago

Is there a way to reduce processing time (Possibly realtime)?

@jonathanr4242 1 month ago

Yes. To do in real time, you would need to use Yolov8 and run on a gpu.

@user-bx9rn5ix6q 1 month ago

The main thing you can try is using smaller images. Pixel count grows quadratically with image dimensions, which means significant memory savings from downsizing images. The latest YOLO versions have lightweight versions (tiny, nano) which are designed to run faster on basic hardware. You can probably get 5-10fps on small images with v8-tiny on a decent laptop.
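A quick bit of arithmetic shows the effect of downsizing. The 640x480 size is the resolution mentioned earlier in the thread; the 240x240 size is just a hypothetical smaller input.

```python
def pixel_count(width, height):
    """Number of pixels the model has to process per frame."""
    return width * height


full = pixel_count(640, 480)
small = pixel_count(240, 240)
half = pixel_count(320, 240)   # halving each dimension

print(full, small, round(full / small, 2))
print(full // half)            # halving both sides quarters the pixel count
```

Actual speed-ups depend on the model and hardware, so the pixel ratio is only a rough guide.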

@homeexperimenter5781 1 month ago

@@user-bx9rn5ix6q Thanks a lot for reply, we will probably test on 240 x 240 images initially. Thanks again

@homeexperimenter5781 1 month ago

@@jonathanr4242 Yup, we are working with YOLOv8 nano for now. Thanks a lot.

@homeexperimenter5781 1 month ago

@@jonathanr4242 We are working with Yolov8 nano for now. Thanks a lot