Google's fancy new AI video tool can be completely fooled


Knowing the content of any piece of media on the web is kind of Google’s bread and butter. It started with searching web pages but quickly expanded to images, where its algorithms and various tools help return image search results with greater accuracy than ever before, and now the company wants to do the same with video. To that end, Google recently debuted what it calls Cloud Video Intelligence, which is a fancy way of saying it taught machines how to identify what is in a video without a human having to watch it and label it manually. There’s just one problem: It’s ridiculously easy to break.

A team of researchers from the University of Washington has demonstrated that Google’s fancy content detection system is so sensitive that it can be duped into thinking a video is about one thing when it’s really about something entirely different. After digging into how the API works, the group realized that the system can be fooled by inserting single-frame images of a specific object at regular intervals.

To prove their point, the researchers took a video of a tiger playing in a zoo and ran it through Google’s fancy tool. It produced the labels Animal, Wildlife, Zoo, Nature, and Tourism, all of which are perfectly handy and appropriately describe what’s going on. Then, they inserted a single-frame image of an Audi wagon once every fifty frames.

When the modified video was sent back through the tool, the results looked very different, with labels for Audi, Vehicle, Car, Motor Vehicle, and Audi A4. There wasn’t a single mention of anything animal-related, despite the fact that just 1/50th of the video had images of the car, and the rest was a tiger chilling out in a pool.
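
To make the trick concrete, here is a minimal Python sketch of the frame manipulation described above. It is an illustration only, not the researchers' actual code: the function name `insert_periodic_frame` is hypothetical, the frames are stand-in strings rather than real images, and it assumes every fiftieth frame is swapped out so that exactly 1/50th of the clip shows the car. Nothing here talks to Google's API.

```python
from typing import List, TypeVar

Frame = TypeVar("Frame")


def insert_periodic_frame(frames: List[Frame], injected: Frame, period: int = 50) -> List[Frame]:
    """Return a copy of `frames` with every `period`-th frame replaced by `injected`.

    Hypothetical helper for illustration; the replacement (rather than
    splicing in extra frames) is an assumption, chosen so the injected
    image makes up exactly 1/period of the clip.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    return [
        injected if (index + 1) % period == 0 else frame
        for index, frame in enumerate(frames)
    ]


# Stand-in clip: 500 frames of the tiger, represented as plain strings.
clip = ["tiger"] * 500
modified = insert_periodic_frame(clip, "car", period=50)

car_count = modified.count("car")
car_share = car_count / len(modified)
print(f"{car_count} of {len(modified)} frames show the car ({car_share:.0%})")
# prints "10 of 500 frames show the car (2%)"
```

In other words, 98 percent of the doctored clip is still the tiger, yet the labels that came back described only the car.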

Google’s tool is still in its relatively early stages and is currently in private beta. However, if problems such as this persist, they could seriously limit the usefulness of the software in accurately detecting and labeling content.