Why it matters: Researchers continue to find new ways to apply artificial intelligence and machine learning as the technologies evolve. Earlier this week, scientists at Google's DeepMind announced Transframer, a new framework that can generate short videos from a single input image. The technology could one day augment traditional rendering, allowing developers to build virtual environments with machine learning.
The name of the new framework (and, in some ways, the concept) is a nod to another AI model known as the Transformer. Introduced in 2017, the Transformer is a neural network architecture that generates text by modeling how each word in a sentence relates to every other word, a mechanism known as attention. The architecture has since been built into standard deep learning frameworks such as TensorFlow and PyTorch.
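The way the Transformer relates words to one another is called attention: every word is scored against every other word, and those scores decide how much each word's representation contributes to the others. The sketch below is a minimal, pure-Python illustration of scaled dot-product self-attention, assuming toy two-dimensional embeddings invented for the example. It is not how TensorFlow or PyTorch implement the operation, and it leaves out the learned projections, multiple heads, and batching that a real Transformer uses.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    # Subtracting the max first avoids overflow in math.exp for large scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention written with plain Python lists.

    queries, keys and values each hold one vector per token. For every query,
    the function scores all keys with a dot product scaled by sqrt(d), turns
    the scores into weights with softmax, and returns the weighted average of
    the value vectors.
    """
    d = len(keys[0])
    scale = math.sqrt(d)
    outputs = []
    for q in queries:
        # Compare this token with every token in the sequence (itself included).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to the attention weights.
        blended = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(blended)
    return outputs


# Hypothetical 2-d embeddings for a three-token "sentence". A real model would
# first multiply them by learned query, key and value matrices; this sketch
# uses the embeddings directly for all three roles.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = self_attention(tokens, tokens, tokens)
for vec in contextualized:
    print([round(x, 3) for x in vec])
```

Each output vector is a weighted average of the value vectors, so a token ends up carrying the most information from the tokens it scores highest against.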
Just as the Transformer uses the surrounding text to predict likely outputs, Transframer combines context images that share similar attributes with a query annotation to create short videos. The resulting videos move around the target image and show accurate perspectives, even though the input images contain no geometric data.
Transframer is a general-purpose generative framework that can handle many image and video tasks in a probabilistic framework. New work shows it excels at video prediction and view synthesis, and can generate 30-second videos from a single frame: https://t.co/wX3nrrYEEa 1/ pic.twitter.com/gQk6f9nZyg
— DeepMind (@DeepMind) August 15, 2022
The technology, demonstrated by Google's DeepMind AI lab, works by analyzing a single context image to extract key pieces of image data and generate additional images. During this analysis, the system identifies the image's framing, which in turn helps it predict the surrounding environment.
The context images are then used to predict how the scene would appear from different angles. The model estimates the probability of additional frames based on the image data, annotations, and any other information available from the context frames.
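Written out loosely, with symbols chosen here for illustration rather than taken from DeepMind's paper, the prediction is a conditional distribution over a target frame y given K context frames x_k, their annotations a_k (such as a timestamp or camera viewpoint), and the query annotation a_y. Assuming the target frame is generated piece by piece as a sequence of T tokens y_t, the chain rule of probability splits that distribution into one prediction per token:

```latex
p\left(y \mid \{(x_k, a_k)\}_{k=1}^{K},\, a_y\right)
  = \prod_{t=1}^{T} p\left(y_t \mid y_{<t},\, \{(x_k, a_k)\}_{k=1}^{K},\, a_y\right)
```

Because each token is sampled from a distribution rather than computed deterministically, running the model again on the same inputs can produce a different but still plausible frame.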
The framework marks a milestone in video technology by generating reasonably accurate video from a very limited set of data. Transframer has also shown extremely promising results on other image- and video-related tasks and benchmarks, such as semantic segmentation, image classification, and optical flow prediction.
The implications for video-based industries such as game development could be huge. Today's game development environments rely on conventional rendering techniques such as shading, texture mapping, depth of field, and ray tracing. Technologies such as Transframer could offer developers an entirely new path, using AI and machine learning to build environments while reducing the time, resources, and effort required to create them.