Planetary-scale image geolocation, the process of identifying the geographic location of an image, represents a significant challenge in computer vision due to the enormous variety and complexity of global images. Traditional methods, primarily focused on landmark images, struggle to generalize to unfamiliar locations.
The game “GeoGuessr,” which has amassed 65 million players, highlights this challenge by tasking players with identifying the location of a Street View image from anywhere in the world. The research paper titled “PIGEON: PREDICTING IMAGE GEOLOCATIONS” details how to address this challenge. Researchers at Stanford University have developed PIGEON and PIGEOTTO, two innovative models that mark significant advances in image geolocation technology.
PIGEON (short for “Predicting Image Geolocations”) is a model trained on planetary-scale Street View data that takes a four-image panorama as input and predicts its geographic location. PIGEON places over 40% of its predictions within a 25 km radius of the correct location anywhere on the globe, a remarkable achievement in this field. The model has demonstrated its power by competing against the best human players on GeoGuessr, ranking in the top 0.01% of players and consistently outperforming its human opponents.
In contrast, PIGEOTTO is trained on a more diverse dataset of over 4 million photos from Flickr and Wikipedia without relying on Street View data. This model takes a single input image and has achieved state-of-the-art results in various image geolocation benchmarks, significantly reducing mean distance errors and demonstrating robustness to changes in image location and distribution.
The technical backbone of these systems includes semantic geocell creation, multi-task contrastive pretraining, a new loss function, and downstream guess refinement. Together, these methods help minimize distance errors and improve the accuracy of geolocation predictions.
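To make the idea of a distance-aware loss over geocells more concrete, here is a minimal sketch. The article does not spell out the loss function, so everything below is an assumption for illustration: the function names (`smoothed_targets`, `soft_cross_entropy`), the temperature `tau_km`, the normalization of the targets, and the use of cell centroids are hypothetical choices, not the authors' implementation. The sketch spreads the training target across geocells according to their great-circle (haversine) distance from the true location, so that a guess in a neighboring cell is penalized less than a guess on another continent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def smoothed_targets(true_latlon, cell_centroids, tau_km=75.0):
    """Soft labels over geocells that decay with distance from the true location.

    tau_km is a hypothetical temperature: larger values spread more
    probability mass onto distant cells.
    """
    dists = [haversine_km(*true_latlon, *c) for c in cell_centroids]
    d_min = min(dists)  # distance to the nearest cell, treated here as the "true" cell
    weights = [math.exp(-(d - d_min) / tau_km) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(logits, targets):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(t * (l - log_z) for l, t in zip(logits, targets))


# Hypothetical geocell centroids: Paris, Berlin, Sydney.
centroids = [(48.86, 2.35), (52.52, 13.40), (-33.87, 151.21)]
targets = smoothed_targets((48.85, 2.29), centroids)
loss_good = soft_cross_entropy([5.0, 0.0, 0.0], targets)  # model favors Paris
loss_bad = soft_cross_entropy([0.0, 0.0, 5.0], targets)   # model favors Sydney
```

With these inputs, the logits that favor the nearby cell yield a lower loss than those that favor the distant one, which is the behavior a distance-aware classification loss is meant to encourage.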
Training these models is a complex process. PIGEON is trained on a purpose-built dataset of 100,000 randomly sampled locations from GeoGuessr, while PIGEOTTO’s training dataset is significantly larger and more diverse. The models are evaluated on a set of metrics covering average distance error and accuracy within several kilometer-based distance thresholds, ranging from street level to continent level.
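The evaluation described above can be sketched in a few lines. This is an illustrative example rather than the authors' evaluation code: the threshold values (1, 25, 200, 750, and 2,500 km for street, city, region, country, and continent) are the ones commonly used in image geolocation benchmarks and are assumed here, and the names `THRESHOLDS_KM` and `evaluate` are hypothetical.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Assumed thresholds; the article only says "street level to continent level".
THRESHOLDS_KM = {"street": 1, "city": 25, "region": 200, "country": 750, "continent": 2500}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def evaluate(predictions, truths):
    """Return mean/median distance error and the share of guesses within each threshold."""
    errors = [haversine_km(*p, *t) for p, t in zip(predictions, truths)]
    report = {"mean_km": mean(errors), "median_km": median(errors)}
    for name, limit in THRESHOLDS_KM.items():
        report[f"acc_{name}"] = sum(e <= limit for e in errors) / len(errors)
    return report


# Two hypothetical guesses: one exact, one placing a Paris photo in Berlin.
predictions = [(48.8584, 2.2945), (52.52, 13.405)]
truths = [(48.8584, 2.2945), (48.8566, 2.3522)]
report = evaluate(predictions, truths)
```

Under this scheme, a statement such as "over 40% of predictions within 25 km" corresponds to the city-level accuracy (`acc_city` in this sketch) exceeding 0.4.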
Although the advances these models bring are significant, they also raise important ethical considerations. The precision and capabilities of such technologies can have both useful applications and potential for abuse. This duality necessitates a careful balance in the development and implementation of image geolocation technologies.
In conclusion, PIGEON and PIGEOTTO represent a major leap forward in image geolocation technology, achieving state-of-the-art results while being adaptable to changes in distribution. Their development highlights the importance of various technological innovations and points to the potential future of image geolocation technologies that are either truly planetary or focused on narrowly defined distributions.