Fascinating! Very interesting novel approach to this problem. At 6:39, it appears as though you have ~12 map images that cover the area of interest (of which you highlight four), and then you are able to successfully get a position prediction from a query image. Do you have a sense of how densely that area needs to be covered by your map images before SNAP beats other models? Similarly, is there a map image density at which you see diminishing returns? I'm just curious how many training images are necessary to cover a given region before SNAP's predictions become useful. For that same region in your example, would 50 map images of the region make a meaningful difference to the prediction? Thanks!
@pesarlin 1 year ago
We use a rig with 3 cameras so we actually have 36 images in these examples (each triangle is a camera pose). We have an ablation study in Table 1 in the paper: aerial-only is a bit worse than semantic maps, while StreetView-only is a bit worse than aerial+StreetView. So aerial-only can already get you quite far but having some coverage of ground-level images is important. During training we actually map with fewer images (20 instead of 36) so the model is pretty robust to sparse views, but indeed more is better. I don't have numbers at hand, but I guess that the performance is already quite saturated at 36 views (0.6 per meter), unless there is strong occlusion (e.g. from trucks) in most views.
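A quick sanity check of the numbers in this reply, as a minimal sketch: the 12 poses come from the question above, the 3 cameras and 0.6 views per meter from the reply. Reading "0.6 per meter" as views divided by covered length is an assumption, and the helper names are hypothetical.

```python
def total_views(num_poses: int, cameras_per_pose: int) -> int:
    """Number of map images when every pose contributes one image per camera."""
    return num_poses * cameras_per_pose


def covered_length_m(num_views: int, views_per_meter: float) -> float:
    """Length implied by a view density, ASSUMING density = views / length."""
    return num_views / views_per_meter


views = total_views(num_poses=12, cameras_per_pose=3)  # 12 poses x 3 cameras
length = covered_length_m(views, views_per_meter=0.6)  # roughly 60 m under the assumption
print(views, round(length, 1))
```

Under that reading, the example covers a stretch of roughly 60 m, which gives a rough scale for the "saturated at 36 views" remark.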
@HiwotAmlaku 11 months ago
Very impressive work! Question: can I generate a neural map for localization from bird's-eye-view images only? For example, using images from a downward-looking camera on a flight from Brussels to Amsterdam.
@mlachahesaidsalimo9958 8 months ago
Your work is incredible! Thank you for sharing. I really like the dynamism and playfulness of the presentation. Which software did you use to make the video presentation? Thank you in advance for your reply.
@pesarlin 8 months ago
Thank you! I used only PowerPoint :)
@anywallsocket 1 year ago
How did you choose the validation areas within the training areas?
@pesarlin 1 year ago
We randomly sampled a fixed number of S2 cells in each training city.
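For readers wondering what such a split could look like, here is a minimal sketch, not the authors' actual code. It assumes each city's data has already been bucketed into S2 cells identified by opaque string IDs; the function name, the cell IDs, and the seed are all hypothetical, and no S2 library is used.

```python
import random


def split_cells_per_city(cells_by_city, num_val_cells, seed=0):
    """Hold out a fixed number of randomly chosen cells per city for validation.

    cells_by_city: dict mapping a city name to a list of cell IDs (hypothetical).
    Returns a dict mapping each city to {"train": [...], "val": [...]}.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    split = {}
    for city in sorted(cells_by_city):
        cells = sorted(set(cells_by_city[city]))
        if num_val_cells > len(cells):
            raise ValueError(f"{city} has fewer than {num_val_cells} cells")
        val = set(rng.sample(cells, num_val_cells))
        split[city] = {
            "train": [c for c in cells if c not in val],
            "val": sorted(val),
        }
    return split


# Hypothetical cell IDs, for illustration only.
example = {
    "city_a": [f"a{i}" for i in range(10)],
    "city_b": [f"b{i}" for i in range(6)],
}
result = split_cells_per_city(example, num_val_cells=2, seed=1)
print({city: parts["val"] for city, parts in result.items()})
```

Holding out whole cells rather than individual images keeps validation areas spatially disjoint from training areas within the same city.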