Everyone will be very interested in seeing old YouTube videos in instant stereo like this.
@vibeknight (2 years ago)
Really excited to see this channel pop up with a new video!
@cfrend (a year ago)
Love the exploratory workflow format.
@Atmosphaeres (2 years ago)
It really does look like the example from Minority Report, which is amazing. Great work, Josh!
@grinhorum3855 (2 years ago)
kzbin.info/www/bejne/m5Pccmqghtupa5I
@ShinyTechThings (a year ago)
You are worthy to be subscribed to 🤓😎
@JoshGladstone (a year ago)
🙏
@cristiandominguezreinloring (2 years ago)
Gotta love the name of the script!
@JoshGladstone (2 years ago)
🏢 🦍
@wrestlingcommunity (2 years ago)
Hey Josh, it's Chris from down the street on Moncado. I'm stoked to learn about this! Thanks for the video.
@Naundob (2 years ago)
Great work! If I had your tools at hand, I would probably test this on 3D movie scenes: Avatar, Life of Pi, Pixar stuff, and so on.
@JoshGladstone (2 years ago)
Good suggestion, I can give that a shot. I actually have a pretty big collection of 3D Blu-rays.
@echeng (2 years ago)
This is all going to shine if depth estimation ever becomes really good.
@Copa20777 (a year ago)
2023 and it is😂
@fhoniemcphonsen8987 (2 years ago)
Cool. Came here from Reel3D.
@danrojas (7 months ago)
Josh, amazing work! Thank you for sharing! I think there is now a lot of interest in multiplane video because of the Apple Vision Pro and its Spatial Video mode. Have you tried it with that type of video? The separation is really low, almost 2 cm, compared to the distance between the pupils.
@JoshGladstone (7 months ago)
Thanks! I haven't tried it specifically with spatial videos, as I've mostly moved on to capture with a few more cameras and a different technique that allows for more movement (see my more recent video on Layered Depth video). But it almost certainly would work, assuming the subject isn't too far away (and after converting the spatial format to separate left and right images). Thanks for asking and let me know if you have any other questions!
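For reference, once the spatial clip has been exported to side-by-side frames (ffmpeg can do this, as mentioned elsewhere in the thread), the left/right split is just a crop down the middle. A minimal sketch using OpenCV, with placeholder file names and assuming the left eye is on the left half:

    import cv2  # pip install opencv-python

    # Split one side-by-side stereo frame into separate left and right images.
    # "sbs_frame.png" is a placeholder for a frame exported from the spatial clip.
    frame = cv2.imread("sbs_frame.png")
    w = frame.shape[1]
    left, right = frame[:, : w // 2], frame[:, w // 2 :]
    cv2.imwrite("left.png", left)
    cv2.imwrite("right.png", right)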
@danrojas (7 months ago)
@@JoshGladstone I'm really interested in trying to view multiplane video on the Quest. I don't know much Unity, but I've been learning a bit. I downloaded stereo-magnification but couldn't make it work; I'm sure it's something really easy. Do you think you could make a tutorial on how to install it? Best!
@vanillagorilla8696 (2 years ago)
Thank you for this.
@BenEncounters (2 years ago)
Btw, have you seen the work on LifecastVR? They do something similar but with 180 3D VR cameras, which enables a wider field of view. I was suggesting they use it to improve comfort in 180 VR experiences: you still lock the position, to avoid leaving the perspective sweet spot, but you enable rotation at the level of the neck, which feels much more natural, and you also allow customizable stereo in terms of the distance between the eyes (although I think the Quest 2 doesn't allow this anymore).
@JoshGladstone (2 years ago)
Yeah, if you headlock this or RGBD, it could solve a few stereoscopic discomfort issues, like the parallax from our necks and the IPD adjustment you mentioned, and also being able to roll your head to the side without ruining the stereo effect. The issue is that it's a large amount of added effort in post-production for a sort of minor payoff that's hard to demonstrate the advantage of to people. But I'm with you!
@BenEncounters (2 years ago)
@@JoshGladstone Yeah, totally! I forgot head tilting, definitely another advantage! For the post-production, I feel this could be automated. I'm sure you could even edit it like regular VR180 footage, and just after export, load the XML into another piece of software to render an OpenXR build (in the future it could be a specific file accepted by Oculus TV, for example). The IPD adjustment would just be determined by the headset used and its settings. So to me, much of the effort would be in that software, which only has to be created once and which LifecastVR could do quite easily. I'm gonna post that on their Facebook group. So I agree it's hard to show the added value, but to me it's not just a minor payoff; it's a huge one if people are having a good experience in the headset, unlike what I could observe with classic 180 3D. Not to mention all the dynamic moves it would then be possible to do with the camera.
@BenEncounters (2 years ago)
Because full 6DoF video is more of a huge issue, with the distortions and all. And the interactivity breaks the narrative. If you want to tell a story, presence and immersion are fine, but interactivity is the enemy of narration. That's why I think this would be a bigger deal for narrative content. Interactive content like games is another story, and there I think it would be better to have it completely interactive, with separate assets, etc.
@importon (2 years ago)
Great video! I had tried firing up that stereo magnification code a while back but I could not get it working with all the dependencies. Will you be sharing your scripts? Maybe that could help me get it running.
@JoshGladstone (2 years ago)
The original release of stereo mag was 2018, so it's relatively old at this point. If you want to get it up and running, you have to use TensorFlow 1.11.
@brettcameratraveler (2 years ago)
Great to see your continued work on this project. Have you experimented with extruding the pixels in each of the 32 layers so that they each have depth and pseudo-fill the gaps between layers? The end game to all this might be phones with stereo lenses as standard, plus machine learning that gets to the point where it takes that depth data, combines it with its ability to recognize objects/scenes, turns them into 3D meshes, and then uses the video to texture most of those meshes. Angles without any texture data would use a content-aware fill technique.
@JoshGladstone (2 years ago)
What you're describing is remarkably similar to a 'Layered Depth Image' or a 'Layered Mesh Representation', which is sort of the next advancement from the MPIs that Google used in their 2020 paper. I haven't tried it yet, but others have and it works very well. My gut instinct is that's probably a bit much for two cameras to capture, but honestly I never thought this or NeRFs would really be possible, so it's all absolutely within the realm of possibility. This idea actually overlaps with some of the tech developments in self-driving cars and AR, especially detection and segmentation.
@brettcameratraveler (2 years ago)
@@JoshGladstone Extruding depth to those pixels seems like it would be relatively easy to implement. As for the stereo cameras, they would only be recording video; all the video-to-mesh and temporal content-aware fill would happen in the cloud later. Not sure how computationally heavy that would be, though. Until then, there's that technique I posted on your Reddit months ago that seemed like something that could easily be widely adopted. In short, my idea was to shoot stereo 180 video pointed towards the subject of most interest and, if afterwards you decide it was a special memory you want to preserve in a 360 environment, you then take one minute to shoot a handful of panoramic shots with your arm extended outward to create a 360 photo with a double-arm's-length light field of depth information. You then compare and composite the stereo video with the 360 light-field photo to place them within 6DoF space. You wouldn't be able to kneel down without some tearing, but a 5 ft circular 6DoF view from the standing position should be good. Much less data as well.
@JoshGladstone (2 years ago)
@@brettcameratraveler Similar to your idea, I've heard some talk about using photogrammetry or NeRF for static elements and then something else for dynamic content. Could be interesting.
@brettcameratraveler (2 years ago)
@@JoshGladstone How do you like that Kandao camera? Couldn't find the spec info on their site. What resolution does it shoot at per eye? How does it compare to the Insta360 Evo? What are you using for your custom rig?
@JoshGladstone (2 years ago)
@@brettcameratraveler I think it's a fun camera. Very portable. Different from the Evo in that it's not VR180; it shoots rectilinear 16:9 content, 1080p per eye. The custom rig is two Yi 4K action cameras synced with an oddly proprietary cable; this is the same camera that Google used for its dome rig. I designed a 3D-printed frame that gives me a variable baseline. The current frame is adjustable from about 130 to 220 mm.
@importon (2 years ago)
This is really cool! Will you be sharing your script and/or a tutorial perhaps?
@JoshGladstone (2 years ago)
Thanks! Not releasing any code at the moment, but the original project is open source: github.com/google/stereo-magnification
@importon (2 years ago)
@@JoshGladstone Yes, I've been trying to get it running locally for years now, HA! If you take a look at the sheer number of issues on the GitHub, you'll see I'm not alone. It's kind of a miracle that you've managed to get it working so well, hence why I'm hoping you might do a detailed tutorial.
@JoshGladstone (2 years ago)
@@importon IIRC, the main issue is that you have to use an old version of TensorFlow: 1.11 or earlier.
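As a minimal sanity check before running the repo (this only encodes the version constraint mentioned above, nothing more):

    # Check the TensorFlow version before running stereo-magnification;
    # the code targets TensorFlow 1.x (1.11 or earlier, per the comment above).
    import tensorflow as tf

    assert tf.__version__.startswith("1."), (
        "stereo-magnification expects TensorFlow 1.x, found " + tf.__version__)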
@Instant_Nerf (a year ago)
I'm really curious to see how volumetric videos work on the Apple Vision…
@Instant_Nerf (2 years ago)
Well, that is the next barrier to break through: NeRF videos. I was wondering if this would be possible. I have 3 drones and about 6 iPhones, from the 8 Plus to the 13 Pro Max. If we were to capture a scene from all angles with the different cameras and drones, how would we go about joining all those different viewpoints so that we can play the NeRF back like a video but still be able to move the camera as the scene plays? So if someone is walking in the scene, we would see the actual movement from point A to point B and be able to look at it from different points of view. This is not currently possible, but what would it take? Maybe a bunch of 360 cameras positioned at the right distances? A new algorithm to combine all the data?
@zippyholland3001 (2 years ago)
Thanks. This is really good.
@marcoszanre (4 months ago)
Thanks, and congrats on this awesome project! Quick question, please: is the Unity project available somewhere? I'd like to test this approach on a spatial video whose side-by-side frames were extracted using ffmpeg, to then be viewed in Unity as an immersive video. Thanks in advance.
@philipyeldhos (a year ago)
Amazing techniques with just two images! Any clue how Apple's new spatial video works?
@JoshGladstone (a year ago)
Thanks! As far as I'm aware, Spatial Video is just stereoscopic video. There's no 6DoF movement, despite what the marketing videos imply.
@echeng (2 years ago)
If Minority Report didn’t do infill, maybe we don’t need to either. :)
@JoshGladstone (2 years ago)
Agreed!
@BenEncounters (2 years ago)
That's pretty cool. Have you tried using NeRFs as well? It seems this kind of image representation, being closer to light fields and combined with AI, will really be the solution.
@BenEncounters (2 years ago)
I just got to the end of your video, where you mention NeRFs, ahah. But yeah, still to be seen for videos...
@JoshGladstone (2 years ago)
@@BenEncounters I've also found in my limited experience with NeRFs so far (and other view-synthesis techniques that require a lot of inputs) that they are extremely reliant on accurate camera poses for good results, and generally COLMAP is used to figure out the camera poses. I've never had good luck with COLMAP, although I've only tried it a few times. That's another reason I like stereo inputs: you can avoid camera poses altogether. Although when it does work, the results from NeRF are really incredible, especially with shiny and refractive surfaces, so I definitely do want to play with it.
@BenEncounters (2 years ago)
@@JoshGladstone Yes, definitely a trade-off there. And it's true that for many video use cases, like adventure content, you cannot have a rig of cameras (only if shooting in a studio). As for photogrammetry, it's at the moment of taking the pictures that it's even more important to do it well, so that COLMAP and other tools produce accurate results, ahah.
@JoshGladstone (2 years ago)
@@BenEncounters I could definitely see a capture stage set up that way. Although a lot of the appeal of this stuff for me is the ability to capture motion in the background as well. The real goal for me is to have a camera or camera system that you can set up on location and capture the whole scene. 'Reality Capture' or something like that
@BenEncounters (2 years ago)
@@JoshGladstone Yes, I am totally in line with you there. Again, I think you should check out and play with the LifecastVR tool if you haven't yet! On my side, I only wish it were possible to publish their format on Oculus TV with a lock at the neck level, like I mentioned in my other comment above :)
@yeknommonkey (a year ago)
So I'm guessing you have strong opinions on the Apple Vision Pro camera feature...?
@JoshGladstone (a year ago)
I love stereoscopic media, so I'm all for it. And the more devices that can play immersive content, the better!
@n1ckFG (2 years ago)
Nice! Do you know if the Kandao QooCam Ego has got first-party depth map support for video? That would open up a lot of other processing techniques.
@JoshGladstone (2 years ago)
I don't believe it does. At least not that I've seen
@TonyAaronII (a year ago)
Keep working.
@StereoPicture3D (a year ago)
Great video! Is your MPIre Python script available? I just found this video through the 6/14/23 PetaPixel article "Filmmaker Uses Action Cams and AI to Create Incredible Volumetric Video".
@Boostlagg (a year ago)
We're totally living in a simulation
@Twoeggsq (a year ago)
That's really what you got from this?
@kitws (2 years ago)
coool!
@smartpotatoMNL (a year ago)
Two raw Insta360 videos side by side, stitched with machine learning. Has anyone done this yet?
@AhmedAlYousify (8 months ago)
Is it possible to re-create a 3D scene/environment from a 2D video?
@JoshGladstone (8 months ago)
Yes and no; it depends on what your expectations are and what you're trying to capture. For a static scene with no moving objects or people, you can recreate a fairly full environment by moving the camera around and getting shots from a lot of different angles and all sides of the subject/environment. If that's your goal, look into NeRFs and/or Gaussian splatting. Luma.ai makes this very easy and user friendly (lumalabs.ai/)
@AhmedAlYousify (8 months ago)
@@JoshGladstone The idea is to convert a video of a scored goal, captured from different angles, into 3D. Is it possible?
@JoshGladstone (8 months ago)
@@AhmedAlYousify It's possible, but not simple. Look into "Dynamic NeRF" for more info. It's an active area of research.
@AhmedAlYousify (8 months ago)
@@JoshGladstone Thank you for the direction. Much appreciated.
@vanillagorilla8696 (2 years ago)
Can you play them back as point clouds?
@JoshGladstone (2 years ago)
No, not the way it's currently designed. The neural network doesn't export geometries or depth; it outputs rasterized images in layers. You could in theory turn this into a point cloud, I suppose, but you'd still just have discrete layers of points.
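For anyone who wants to experiment with that anyway, here's a rough sketch of what the conversion could look like: lift every sufficiently opaque pixel of each layer to a 3D point at that layer's assumed depth. The layer depths and pinhole intrinsics are placeholders, not values from this project.

    import numpy as np

    def mpi_to_points(layers_rgba, depths, fx, fy, cx, cy, alpha_thresh=0.5):
        # layers_rgba: list of (H, W, 4) float arrays in [0, 1], one per plane.
        # depths: one assumed depth (in meters) per layer.
        # fx, fy, cx, cy: pinhole intrinsics of the reference camera.
        points, colors = [], []
        for rgba, z in zip(layers_rgba, depths):
            v, u = np.nonzero(rgba[..., 3] > alpha_thresh)  # opaque pixels only
            x = (u - cx) / fx * z
            y = (v - cy) / fy * z
            points.append(np.stack([x, y, np.full_like(x, z)], axis=-1))
            colors.append(rgba[v, u, :3])
        # The result is still just discrete planes of points, as noted above.
        return np.concatenate(points), np.concatenate(colors)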
@vanillagorilla8696 (2 years ago)
@@JoshGladstone Either way, I want to learn more.
@vanillagorilla8696 (2 years ago)
@@JoshGladstone My interest is in converting monocular video into these depth images, and this video has shown me another way of getting information out of certain shots. Every shot is different, and some have more depth information than others. Sometimes, when the camera pans around a subject, photogrammetry and other techniques might provide something comparable to stereo. Other times automated depth maps work fine, and other times they need correction, or geometric reconstruction. But this is pretty badass.
@JoshGladstone (2 years ago)
@@vanillagorilla8696 Check THIS out: single-view-mpi.github.io/
@vanillagorilla8696 (2 years ago)
@@JoshGladstone Thanks, I've been trying to find this page for a while. I had forgotten where it was.
@PLANET-EATER (a year ago)
Could this process be done with live-streamed video, or would the need to apply the ML algorithm prevent that?
@JoshGladstone (a year ago)
It's unlikely this particular implementation would ever run in real-time, but that isn't to say that a similar approach couldn't
@JackAaronOestergaardChurchill (2 years ago)
What happens if you use stereo 360 video/photos? Or at least take a middle section of the equirectangular image to remove the nadir and zenith from the calculations. You could use the Kandao Obsidian R to capture those. It can also produce a depth map in the Kandao software, which I've seen but not yet tried to do anything interesting with.
@JoshGladstone (2 years ago)
I haven't tried 360, but the cameras I used for the Chinese Theater video have a 160° FOV, so that's pretty wide. I did try VR180 videos about a year ago, and they didn't work when stitched, but the raw fisheye recordings did work. I then had to use hemispheres as opposed to flat planes to display them (which is something Google also did with MSIs, multi-sphere images), and that worked pretty decently. There was still quite a lot of fisheye distortion though, and it takes an already limited resolution and stretches it over an even larger area, so it looks less detailed too. But it did work for the most part.
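To make the hemisphere idea concrete, here's a rough sketch of building the vertex grid for one multi-sphere-image layer; the radius, field of view, and grid resolution below are made-up numbers, not the ones used in the video:

    import numpy as np

    def hemisphere_layer(radius, fov_deg=160.0, n_theta=32, n_phi=64):
        # Vertex grid for one MSI layer: a partial sphere of the given radius
        # facing +Z, covering roughly the camera's field of view.
        half = np.radians(fov_deg) / 2.0
        theta = np.linspace(-half, half, n_theta)  # vertical angle
        phi = np.linspace(-half, half, n_phi)      # horizontal angle
        t, p = np.meshgrid(theta, phi, indexing="ij")
        x = radius * np.cos(t) * np.sin(p)
        y = radius * np.sin(t)
        z = radius * np.cos(t) * np.cos(p)
        return np.stack([x, y, z], axis=-1)        # (n_theta, n_phi, 3)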
@JackAaronOestergaardChurchill (2 years ago)
Really interesting. I've not done a lot with VR180, so I'm not that familiar with the workflow and output. I'd be happy to share some images of the full 360 from the Obsidian R if you'd like to try. I think cropping the stitch to a somewhat undistorted letterbox could produce something, even if it's a flat panoramic image.
@JoshGladstone (2 years ago)
@@JackAaronOestergaardChurchill Btw, I found a capture I did last year that I never published with a VR180 sample, if you're curious: kzbin.info/www/bejne/ZprOl6mqipeIntU
@JackAaronOestergaardChurchill (2 years ago)
That's really cool! Thanks for sharing! It's interesting to see that 180 distortion.
@smetljesm2276 (a year ago)
It's great how well your technique is working. The Looking Glass looks awfully bad, though. Can't believe they raised 2.5 million and that so many people thought it was worth $250 😅
@JoshGladstone (a year ago)
The Looking Glass is one of very few volumetric displays that even exist right now, and the fact that they can produce and sell one for $250 is pretty impressive, in my opinion.
@smetljesm2276 (a year ago)
@@JoshGladstone I know, and you are right. I just didn't get that feeling of a beta product until your demo. I kinda feel that, given its resolution limit, it would look more impressive with an OLED and some fancy eye-tracking cameras for passers-by 😁
@JoshGladstone (a year ago)
@@smetljesm2276 You can definitely see the resolution limitation on the Looking Glass, but I just got a Lume Pad 2, which uses eye tracking with an autostereoscopic display, and while it's cool and the perceived resolution is much higher, I'd still say the Looking Glass *feels* more like an actual volumetric hologram.
@smetljesm2276 (a year ago)
@@JoshGladstone Cool tech 😎
@Instant_Nerf (2 years ago)
Holy crap, man... this is the exact problem I have been trying to solve. I started 10 months ago and even used the same movie scene from Minority Report to explain what I'm trying to do, plus one from the movie Déjà Vu... creepy! :D Link: kzbin.info/www/bejne/i3ykmKh7isdlgsU Now I've been trying to do that with lidar, and I have had some success, but nothing like this... wow!
@underbelly69 (2 years ago)
How about a rig with 2 or 3 of those Kandao stereo cameras mounted side by side (sync potential?)... Would you also need another layer of cams above/below, or is L/R parallax "enough"?
@JoshGladstone (2 years ago)
It wouldn't help this neural network, as this one takes two inputs and outputs MPIs. But other techniques, such as the one in the Google 2020 paper and the various flavors of NeRF, all require more inputs, so the more cameras the better. Left and right are sort of more important because human vision works that way, and generally we don't see a lot of vertical disparity unless you're intentionally moving up and down. Plus, my goal for output is the Looking Glass, which only displays horizontal views anyway. But if you want to be able to move around freely in the space in VR or something, then you really do need cameras at a variety of angles, especially for things like NeRF that also map more advanced view-dependent effects like refractions.
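For anyone wondering what displaying an MPI actually involves: each view is the layer stack reprojected plane by plane and then composited back to front. A bare-bones sketch of just the compositing step (straight alpha; the per-plane reprojection is omitted):

    import numpy as np

    def composite_mpi(layers_rgba):
        # layers_rgba: list of (H, W, 4) arrays in [0, 1], farthest plane first.
        out = np.zeros_like(layers_rgba[0][..., :3])
        for layer in layers_rgba:  # back-to-front 'over' compositing
            rgb, a = layer[..., :3], layer[..., 3:4]
            out = rgb * a + out * (1.0 - a)
        return out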