Can you please explain why the model also predicts the latitude if you are interested only in the up vector?
@johanvonliebert4812 1 month ago
Impressive and fantastic work
@juicedatom 3 months ago
Really impressive work!
@rikugamingofficial 3 months ago
Can you explain how to run the project?
@fluffy_shark_studio 3 months ago
Thank you for sharing the good vid 😊
@mlachahesaidsalimo9958 9 months ago
Your work is incredible! Thank you for sharing. I really like the dynamism and playfulness of the presentation. Which software did you use to make the video presentation? Thank you in advance for your reply.
@pesarlin 9 months ago
Thank you! I used only PowerPoint :)
@HiwotAmlaku 1 year ago
Very impressive work! Question: can I generate a neural map for localization only from a bird's-eye view? Say, using images from a downward-looking camera on a flight from Brussels to Amsterdam.
@anywallsocket 1 year ago
How did you choose the validation areas within the training areas?
@pesarlin 1 year ago
We randomly sampled a fixed number of S2 cells in each training city.
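As an illustration only, that hold-out step could be sketched like this (a minimal sketch with made-up cell IDs and counts; the real pipeline would enumerate S2 cells covering each city's extent with the S2 geometry library):

```python
import random

# Hypothetical per-city lists of S2 cell IDs covering each training city.
# The names and counts here are invented for illustration.
cells_per_city = {
    "zurich": [f"zurich_cell_{i}" for i in range(100)],
    "paris": [f"paris_cell_{i}" for i in range(150)],
}

def sample_validation_cells(cells_per_city, num_per_city, seed=0):
    """Hold out a fixed number of randomly chosen S2 cells in each city."""
    rng = random.Random(seed)
    return {
        city: sorted(rng.sample(cells, num_per_city))
        for city, cells in cells_per_city.items()
    }

# Hold out 10 cells per city for validation; the rest remain for training.
val_cells = sample_validation_cells(cells_per_city, num_per_city=10)
```

Fixing the seed keeps the train/validation split reproducible across runs.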
@Unique-Concepts 1 year ago
Fantastic work....Love it👏👏👏🙏🙏👌👌👌👍👍👍
@RicanSamurai 1 year ago
Fascinating! Very interesting novel approach to this problem. At 6:39, it appears as though you have ~12 map images that cover the area of interest (of which you highlight four), and then you are able to successfully get a position prediction from a query image. Do you have a sense of how densely that area needs to be covered by your map images before SNAP beats other models? Similarly, is there a map image density at which you see diminishing returns? I'm just curious how many training images are necessary to cover a given region before SNAP's predictions become useful. For that same region in your example, would 50 map images of the region make a meaningful difference to the prediction? Thanks!
@pesarlin 1 year ago
We use a rig with 3 cameras so we actually have 36 images in these examples (each triangle is a camera pose). We have an ablation study in Table 1 in the paper: aerial-only is a bit worse than semantic maps, while StreetView-only is a bit worse than aerial+StreetView. So aerial-only can already get you quite far but having some coverage of ground-level images is important. During training we actually map with fewer images (20 instead of 36) so the model is pretty robust to sparse views, but indeed more is better. I don't have numbers at hand, but I guess that the performance is already quite saturated at 36 views (0.6 per meter), unless there is strong occlusion (e.g. from trucks) in most views.
@Patrick-vq4qz 1 year ago
Awesome work!
@simsonyee 1 year ago
How do you get the ground truth for Aria data?
@pesarlin 1 year ago
We combine multi-session SLAM poses with GPS and OrienterNet predictions. See section B.3 of the paper for more details: arxiv.org/pdf/2304.02009.pdf
@simsonyee 1 year ago
@@pesarlin Thanks! Great work!
@yeon6761 1 year ago
His voice is similar to Richard Ayoade's from the sitcom The IT Crowd!
@gujiaqi5849 1 year ago
Thanks for your excellent video! I want to ask: if I don't know the intrinsics of an image, can I still localize with this method?
@pesarlin 1 year ago
There are works that can predict the intrinsics of an image using a deep network, for example: jinlinyi.github.io/PerspectiveFields/. This is what we use in our demo: skydes-orienternet.hf.space/
@Reeves-k2k 1 year ago
Cool!
@nicolasyang5582 2 years ago
Learning how to do the oral presentation here hahaha
@BenLu-rc9lt 2 years ago
Thanks
@PixelPulse168 2 years ago
Thanks
@liubonan821 2 years ago
nice
@mikoychinese7735 2 years ago
Nice dataset for studying, haha. How can I get a good model for an indoor environment? I tried hloc but got a bad result 😂 I'm new to SfM.
@alpharealcat 2 years ago
great work!
@alpharealcat 2 years ago
awesome work😃
@fbayes1811 2 years ago
Awesome, bro!
@zubairmalik613 3 years ago
WOW!!! I will be shocked if they did not give you the Marr prize.
@johncasey434 3 years ago
Excellent work!
@v000000000000v 3 years ago
Very interesting work; the output is very clean. How does it deal with lens distortion? Is it compensated by the FBA, where the sample with the least distortion is chosen as the reference? Or does the FBA internally estimate the distortion and apply it to all keypoints from the same sample?
@Matojeje 3 years ago
Can't wait to see a software implementation of this!
@MarkRuvald 3 years ago
What is the ballpark query latency? And is my understanding correct that PyTorch/machine learning is used during the query process, i.e. the computation can't all be done offline beforehand?
@christianmoore8795 3 years ago
Congrats Paul! Best title yet!
@GTARobotics 3 years ago
Super cool paper and demos, Paul-Edouard! I can't wait for the code to be released so I can integrate it into the #OSSDC_VisionAI open-source real-time video processing platform.