Google just combined DeepMind and Google Brain into one big AI team, and on Wednesday the new Google DeepMind shared details of how one of its visual language models (VLMs), Flamingo, is being used to generate descriptions for YouTube Shorts, which can help with discoverability.
“Shorts are created in just a few minutes and often lack descriptions and helpful titles, making them harder to find through search,” DeepMind wrote in the post. Flamingo can generate those descriptions by analyzing a video’s initial frames to explain what’s going on. (DeepMind gives the example of “a dog balancing a stack of crackers on its head.”) The text descriptions are stored as metadata to “better categorize videos and tailor search results to viewer queries.”
This solves a real problem, Colin Murdoch, chief business officer of Google DeepMind, tells The Verge. For Shorts, creators sometimes don’t add metadata because the process of creating a video is more streamlined than it is for a longer video. Todd Sherman, the director of product management for Shorts, added that because Shorts are usually viewed in a feed where people just swipe to the next video rather than actively browsing for one, creators have little incentive to add metadata.
“This Flamingo model — the ability to understand these videos and give us descriptive text — is just really valuable in helping our systems that are already looking for this metadata,” says Sherman. “It helps them better understand these videos so we can make that match for users when they search for it.”
The generated descriptions won’t be user-facing. “We’re talking about behind-the-scenes metadata,” says Sherman. “We don’t present it to creators, but a lot of effort goes into making sure it’s accurate.” As for how Google ensures these descriptions are accurate, “all descriptive text will comply with our responsibility standards,” Sherman says. “It is very unlikely that any descriptive text will be generated that paints a video in any way in a bad light. That is not at all a result that we anticipate.”
Flamingo already applies auto-generated descriptions to new Shorts uploads, and has done so for “a large number of existing videos, including the most viewed videos,” according to DeepMind spokesperson Duncan Smith.
I had to ask whether Flamingo would be applied to longer YouTube videos later on. “I think it’s totally conceivable that it could,” Sherman says. “However, I think the need is probably a little less.” He notes that a creator of a longer-form video can spend hours on things like pre-production, filming, and editing, so adding metadata is a relatively small part of the video-making process. And because people often choose longer videos based on things like a title and a thumbnail, the creators who make them have an incentive to add metadata that helps with discoverability.
So I think the answer there is that we’ll have to wait and see. But given Google’s big push to add AI to almost everything it offers, applying something like Flamingo to longer YouTube videos doesn’t seem out of the realm of possibility, and that could have a huge impact on YouTube search in the future.