AI can create accurate images of streets just by listening to them

Published Dec 9th, 2024 7:38PM EST

AI has entranced the scientific community. While chatbots like ChatGPT might be the most prominent AI we see in our daily lives, there’s a lot more you can do with AI than just talk to it. In fact, some researchers have even found a way to create a sound-based AI image generator that uses soundscapes to create accurate street images.

In a new paper published in Computers, Environment and Urban Systems, researchers showed that it is possible to take the “soundtracks” of real urban and rural locations and use AI to recreate what those places look like. Researchers at the University of Texas at Austin carried out the study, converting audio recordings into fairly accurate street-view images like those you might see on Google Street View.

Comparison of images from the sound-based AI image generator with real-world images. Image source: University of Texas at Austin

It’s quite an accomplishment, and it reminds me of the AI-powered camera that takes photos without a lens by using location data to recreate wherever the photographer has pointed it. These researchers used both audio and visual data to train their sound-based AI image generator. They then tested it using only audio to recreate some of the locations from which they had captured soundscapes.

The results are compelling, showing how much an area’s acoustic environment can reveal about what the location looks like. The researchers used a YouTube video, as well as audio clips from cities in North America, Asia, and Europe, to carry out their tests. They created 10-second audio clips and image stills from the locations to train the AI model used in their image generator.

They then compared the images generated from 100 audio clips with photos of the corresponding real-world locations, using both human and computer evaluations. They found that the sound-based AI image generator could capture a scene accurately based on its acoustic properties alone, something that was previously a uniquely human capability.

Josh Hawkins has been writing for over a decade, covering science, gaming, and tech culture. He also is a top-rated product reviewer with experience in extensively researched product comparisons, headphones, and gaming devices.

Whenever he isn’t busy writing about tech or gadgets, he can usually be found enjoying a new world in a video game, or tinkering with something on his computer.