Google's Gemini 2.0: A New Era of AI
Information is at the core of human progress. For more than 26 years, Google's mission has been to organize the world's information and make it universally accessible and useful. That commitment continues to drive its pursuit of AI advances: organizing information across every type of input and making it accessible via any type of output, so the information itself becomes far more useful.
That vision guided the introduction of Gemini 1.0 last December. Gemini 1.0, the first natively multimodal model, and its successor Gemini 1.5 drove major advances in multimodality and long-context understanding, processing vast amounts of information across text, video, images, audio, and code.
Millions of developers now build with Gemini, which has been instrumental in reimagining Google's existing products (including seven that each serve over 2 billion users) and in creating new ones. NotebookLM, which has proven widely popular, is a prime example of what multimodality and long context make possible.
The Agentic Era of AI
Google's efforts over the past year have focused on developing more agentic models: models that can understand the world around them, think multiple steps ahead, and take action on a user's behalf, with the user's supervision.
That work culminates in the launch of Gemini 2.0, Google's most capable model to date. With advances in multimodality, including native image and audio output, and with native tool use, Gemini 2.0 enables a new class of AI agents and brings Google closer to its vision of a universal assistant.
Gemini 2.0's Capabilities
Gemini 2.0 Flash, an experimental model, is available immediately to all Gemini users. A new feature, Deep Research, uses advanced reasoning and long-context capabilities to act as a research assistant, exploring complex topics and compiling reports on a user's behalf; it is available today in Gemini Advanced.
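Gemini 2.0 Flash is also reachable by developers through the Gemini API. A minimal sketch using the google-generativeai Python SDK might look like the following; the model identifier gemini-2.0-flash-exp reflects the experimental release described here and is an assumption that may change over time.

```python
# Minimal sketch: calling Gemini 2.0 Flash through the Gemini API with the
# google-generativeai Python SDK. The model id "gemini-2.0-flash-exp" is an
# assumption based on the experimental release and may change.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash-exp")
response = model.generate_content(
    "Summarize the key advances in Gemini 2.0 in three bullet points."
)
print(response.text)
```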
AI's impact on Search is already clear. AI Overviews now reach 1 billion people, letting them ask entirely new types of questions, and have quickly become one of Google's most popular Search features. Gemini 2.0's advanced reasoning is being brought to AI Overviews to tackle more complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding. Limited testing began this week, broader rollout is planned for early next year, and expansion to more countries and languages will follow over the coming year.
Technological Underpinnings of Gemini 2.0
Gemini 2.0's advances are rooted in Google's decade-long investment in a full-stack approach to AI innovation. The model is built on custom hardware: Trillium, Google's sixth-generation TPU, powered 100% of Gemini 2.0's training and inference, and it is now generally available so customers can build with it too.
If Gemini 1.0 focused on organizing and understanding information, Gemini 2.0 is about making it significantly more useful.
Project Astra: A Universal AI Assistant
Project Astra, a research prototype exploring the future of a universal AI assistant, has improved significantly with Gemini 2.0. Trusted testers have been trying it on Android phones, and their feedback has shaped its development, particularly around safety and ethics. The latest version adds enhanced capabilities and integration with prototype glasses, making the experience more immersive and helpful.
Project Astra's Features and Applications
Project Astra holds real-time conversations and responds quickly. It also has memory: it can recall details from past conversations and retains up to 10 minutes of context within the current session, which it uses to refine its answers. It can call on Google Search, Maps, and Lens to gather information. A waitlist is open for those interested in becoming testers.
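The 10-minute in-session retention suggests a time-windowed memory. The sketch below is purely hypothetical, with invented names (SessionMemory, add, recall); it only illustrates how such a rolling window could work, not how Project Astra is actually built.

```python
# Hypothetical sketch of a rolling in-session memory: utterances older than
# the retention window (10 minutes here) are evicted. These names are
# invented for illustration and are not Project Astra's real interfaces.
import time
from collections import deque

class SessionMemory:
    def __init__(self, window_seconds: float = 600.0):  # 10 minutes
        self.window = window_seconds
        self.events = deque()  # (timestamp, utterance) pairs, oldest first

    def add(self, utterance: str) -> None:
        """Record an utterance with the current time, then evict stale ones."""
        self.events.append((time.monotonic(), utterance))
        self._evict()

    def recall(self) -> list[str]:
        """Return everything still inside the retention window."""
        self._evict()
        return [utterance for _, utterance in self.events]

    def _evict(self) -> None:
        cutoff = time.monotonic() - self.window
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()
```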
Project Mariner: Navigating the Web with AI
Project Mariner, another early research prototype built with Gemini 2.0, explores human-agent interaction starting in the browser. Through an experimental Chrome extension, it reads the information in the browser window, including pixels, text, code, images, and forms, and completes tasks on the user's behalf. It achieved a state-of-the-art 83.5% success rate on the WebVoyager benchmark, demonstrating that AI-driven web navigation is technically feasible, even though today's prototype is not always accurate and can be slow; both are expected to improve rapidly. Safety and security are central to its design, with humans kept in the loop so the agent acts in a controlled, responsible way.
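As a rough mental model of the human-in-the-loop pattern described above, consider the hypothetical observe-plan-act loop below. Browser, capture, plan_next_action, and execute are stand-ins invented for illustration; they are not Project Mariner's real interfaces.

```python
# Hypothetical observe-plan-act loop for a browser agent with a human in the
# loop. All names here are stand-ins for illustration only.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # e.g. "click", "type", "scroll", or "done"
    target: str = ""   # element the action applies to
    text: str = ""     # text to type, if any

def run_agent(browser, model, task: str, max_steps: int = 20) -> None:
    """Drive a browser toward `task`, pausing for user approval at each step."""
    for _ in range(max_steps):
        observation = browser.capture()  # pixels, text, code, images, forms
        action = model.plan_next_action(task, observation)
        if action.kind == "done":
            break
        # Human in the loop: nothing executes without explicit confirmation.
        if input(f"Execute {action.kind} on {action.target!r}? [y/N] ") != "y":
            break
        browser.execute(action)
```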
Jules: An AI-Powered Code Agent
Jules, an experimental AI-powered code agent, integrates directly into GitHub workflows. Under a developer’s supervision, it can address issues, develop plans, and execute them, reflecting Google’s long-term goal of creating AI agents helpful across domains, including coding.
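The details of the GitHub integration are not public, so the sketch below only illustrates the kind of issue-to-plan step the section describes: fetching an issue through GitHub's public REST API (a real endpoint) and asking a model to draft a plan for a developer to review. draft_plan and the model id are assumptions, not Jules's actual implementation.

```python
# Hypothetical issue-to-plan step for a code agent in the spirit of Jules.
# The GitHub REST endpoint is real; the rest is illustrative only.
import requests
import google.generativeai as genai  # assumes genai.configure(api_key=...) ran

def draft_plan(owner: str, repo: str, issue_number: int) -> str:
    """Fetch a GitHub issue and ask a model for a plan a developer can review."""
    issue = requests.get(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}",
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    ).json()
    model = genai.GenerativeModel("gemini-2.0-flash-exp")  # assumed model id
    prompt = (
        "Draft a step-by-step plan to resolve this GitHub issue. "
        "A developer will review the plan before any code changes are made.\n\n"
        f"Title: {issue['title']}\n\nBody: {issue['body']}"
    )
    return model.generate_content(prompt).text
```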
The Future is Agentic: A Glimpse into a World Enhanced by AI
Gemini 2.0 Flash and the research prototypes (Project Astra, Project Mariner, and Jules) represent Google's commitment to testing and refining cutting-edge AI capabilities, and these advances are poised to make Google products significantly more helpful over time. The responsible development of these technologies, prioritizing safety and security, is paramount: Google is taking a measured, gradual approach, employing iterative safety training, collaborating with trusted testers and experts, and conducting extensive risk assessments.
The release of Gemini 2.0 Flash and the research prototypes mark a significant milestone in the Gemini era. Google looks forward to exploring new possibilities as they continue to progress toward Artificial General Intelligence (AGI).