It can perform commonsense reasoning, understand and reason about video, generate interleaved images and text, reason through problems, and write code. It can also generate high-quality images from just one example, answer questions that draw on several types of media, understand and reason about images, check a student's solution to a physics problem, and interpret highly complex images.