On Thursday, Google unveiled Gemini 1.5 Pro, which the company describes as delivering “dramatically enhanced performance” over the previous model. The launch continues an AI push that the company views internally as increasingly critical for its future. It follows last week’s unveiling of Gemini 1.0 Ultra, alongside the rebranding of the Bard chatbot (to Gemini) to align with the new model’s more powerful and versatile capabilities.
In an announcement blog post, Google CEO Sundar Pichai and Google DeepMind CEO Demis Hassabis try to balance reassuring their audience about AI ethics and safety with touting their models’ rapidly advancing capabilities. “Our teams continue pushing the frontiers of our latest models with safety at the core,” Pichai summarized.
The company needs to emphasize safety for AI skeptics (including one former Google CEO) and government regulators. But it also needs to stress its models’ accelerating performance for AI developers, potential customers and investors concerned the company was too slow to react to OpenAI’s breakout success with ChatGPT.
Pichai and Hassabis say Gemini 1.5 Pro delivers results comparable to Gemini 1.0 Ultra. However, Gemini 1.5 Pro performs at that level more efficiently, with reduced computational requirements. Its multimodal capabilities include processing text, images, videos, audio and code. As AI models advance, they’ll continue to offer a more versatile array of capabilities in one prompt box (another recent example was OpenAI integrating DALL-E 3 image generation into ChatGPT).
Gemini 1.5 Pro can also handle up to one million tokens in a single request. Tokens are the units of data that AI models process. Google says Gemini 1.5 Pro can process over 700,000 words, an hour of video, 11 hours of audio and codebases with over 30,000 lines of code. The company says it’s even “successfully tested” a version that supports up to 10 million tokens.
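For a rough sense of scale, Google’s own figures (about 700,000 words in one million tokens) imply roughly 10 tokens for every 7 words. The Python sketch below uses that ratio to estimate whether a document fits in a given context window. The function names are hypothetical and the ratio is only an assumption drawn from those two numbers; real tokenizers vary by language and content.

```python
# Rough estimate only: assumes the 10-tokens-per-7-words ratio implied by
# Google's "over 700,000 words" in a one-million-token window.
TOKENS_PER_7_WORDS = 10


def estimate_tokens(word_count: int) -> int:
    """Estimate the token count for a text of `word_count` words (rounded up)."""
    return (word_count * TOKENS_PER_7_WORDS + 6) // 7


def fits_context(word_count: int, context_limit: int) -> bool:
    """Return True if the estimated token count fits within `context_limit`."""
    return estimate_tokens(word_count) <= context_limit
```

Under this assumption, an 80,000-word manuscript would fit in a 128,000-token window, while a 700,000-word archive would need the full one-million-token window.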
The company says Gemini 1.5 Pro maintains high accuracy even as queries grow to larger token counts and give it more new data to take in. It says the model impressed in the Needle In a Haystack evaluation. In this test, developers insert a small piece of information inside a long text block to see if the AI model can pick it out. Google said Gemini 1.5 Pro could find the embedded text 99 percent of the time in data blocks as long as one million tokens.
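The test described above can be sketched in a few lines of Python. This is a minimal illustration, not Google’s actual evaluation harness: the “magic number” needle, the function names and the `ask_model` callback are all hypothetical, and `toy_model` is a stand-in that simply scans the text where a real run would call a language model.

```python
import random


def build_haystack(filler_sentences, needle, depth):
    """Insert `needle` at a relative depth (0.0 = start, 1.0 = end) of the filler."""
    position = int(len(filler_sentences) * depth)
    sentences = filler_sentences[:position] + [needle] + filler_sentences[position:]
    return " ".join(sentences)


def run_trials(ask_model, filler_sentences, trials=20, seed=0):
    """Return the fraction of trials in which the model's answer contains the needle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        secret = str(rng.randint(100000, 999999))
        needle = f"The magic number is {secret}."
        context = build_haystack(filler_sentences, needle, rng.random())
        answer = ask_model(context, "What is the magic number?")
        hits += secret in answer
    return hits / trials


def toy_model(context, question):
    """Hypothetical stand-in for a real model: returns the sentence with the needle."""
    for sentence in context.split(". "):
        if "magic number" in sentence:
            return sentence
    return ""
```

A recall figure like the 99 percent Google cites would correspond to `run_trials` returning 0.99 over many trials with long filler text and a real model in place of `toy_model`.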
Google says Gemini 1.5 Pro can reason about various details from the 402-page Apollo 11 moon mission transcripts. In addition, it can analyze plot points and events from an uploaded 44-minute silent film starring Buster Keaton. “As 1.5 Pro’s long context window is the first of its kind among large-scale models, we’re continuously developing new evaluations and benchmarks for testing its novel capabilities,” Hassabis wrote.
Google is launching Gemini 1.5 Pro with 128,000-token capabilities, the same number at which OpenAI’s (publicly announced) GPT-4 models max out. Hassabis says Google will eventually introduce new pricing tiers that support queries of up to one million tokens.
Gemini 1.5 Pro is also adept at learning new skills from information in long prompts, without additional fine-tuning (“in-context learning”). In a benchmark called Machine Translation from One Book, the model learned from a grammar manual for Kalamang, a language with fewer than 200 speakers globally that it hadn’t previously been trained on. The company says that when translating English to Kalamang, Gemini 1.5 Pro performed at a level similar to that of a human learning from the same content.
In a piece of the announcement that will catch developers’ attention, Google says Gemini 1.5 Pro can perform problem-solving tasks across longer code blocks. “When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications and give explanations about how different parts of the code works,” Hassabis wrote.
On the ethics and safety front, Google says it’s taking “the same approach to responsible deployment” it took with Gemini 1.0 models. That includes developing and applying red-teaming techniques, where a group of ethical developers essentially serve as devil’s advocate, testing for “a range of potential harms.” In addition, the company says it heavily scrutinizes areas like content safety and representational harms. The company says it continues to develop new ethical and safety tests for its AI tools.
Google is launching Gemini 1.5 Pro in early access for developers and enterprise customers. The company plans to make it more widely available eventually. Gemini 1.0 is currently available for consumers, alongside an Ultra variant that costs $20 monthly.