Google’s biggest annual event, the I/O developer conference, was held on May 14, 2024.
The company typically uses the Google I/O keynote to announce new software updates and the occasional hunk of hardware.
The I/O 2024 conference celebrated the past 12 months of rapid AI evolution, including smart assistants; generative systems for images, text, music, and speech; and large language models trained on specialized areas such as coding and medical knowledge.
Here are the most exciting new AI updates from the I/O 2024 conference.
Gemini Live and Project Astra
Google unveiled Gemini Live, a voice AI agent with enhanced multimodal capabilities, and Project Astra, a groundbreaking prototype AI assistant that can interpret video input.
Gemini Live, which will be available this summer, expands on Gemini’s multimodal capabilities to let users “have an in-depth two-way conversation using your voice.”
In a video demonstration, Project Astra showed its versatility by identifying objects in a camera feed and comprehending code displayed on a computer screen, among other tasks.
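Project Astra remains a research prototype, but the same kind of mixed image-and-text prompting is already exposed through the Gemini API. Below is a minimal sketch using the google-generativeai Python SDK; the API key, file name, and model choice are illustrative placeholders, not the Astra system itself.

    import google.generativeai as genai
    import PIL.Image

    # Authenticate with a Gemini API key (placeholder -- supply your own).
    genai.configure(api_key="YOUR_API_KEY")

    # A local screenshot of code; the file name is illustrative.
    image = PIL.Image.open("code_screenshot.png")

    # Gemini accepts mixed image-and-text prompts passed as a single list.
    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content(
        [image, "Explain what the code in this screenshot does."]
    )
    print(response.text)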
The Google news comes a day after Microsoft (MSFT)-backed OpenAI announced improved voice capabilities in ChatGPT, powered by the new GPT-4o model.
Veo, Imagen 3, and Music AI Sandbox
Alphabet’s Google also unveiled AI-powered generation tools for images, videos, and music: Veo, Imagen 3, and Music AI Sandbox.
Veo is Google’s most advanced video generation model to date. It can create video content from text and video prompts and produces high-quality 1080p videos in various styles.
This AI understands cinematic concepts such as “timelapse” and “aerial shots,” allowing it to produce videos that faithfully adhere to a filmmaker’s vision. Veo’s introduction promises to be a game-changer for video creators, providing a tool that captures the essence of their prompts and maintains realism and coherence across sequences.
The company also introduced Imagen 3, which can generate high-quality images from text prompts. According to Google, the model represents a significant leap forward, producing photorealistic images with intricate detail and minimal distracting visual artifacts. Imagen 3 excels at understanding and interpreting the nuances of text prompts, allowing it to create images that closely match user specifications.
From detailed wildlife portraits to dynamic landscape shots, Imagen 3’s capabilities were showcased through a series of generated images, each reflecting a specific prompt’s complexity and style. This model’s enhanced text rendering feature also broadens its applicability, extending to personalized messaging and professional presentations.
Alphabet’s chief executive officer, Sundar Pichai, said it is the “best model yet for rendering text,” a task whose telltale errors often reveal that an image is AI-generated. Users can sign up to try Imagen 3 at Labs.Google, the company’s AI workspace; it will later come to developers and enterprise customers.
Google also said it has been working with YouTube on a music generator called Music AI Sandbox, a tool designed and tested with artists. Music AI Sandbox will let people create instrumental sections from scratch or transform existing sounds.
AI Overview in Search
AI Overview is another generative tool powered by Gemini that brings multi-step reasoning to Google Search.
The tool summarizes content from Search at the top of the page. It can use data from Google’s other services, like Maps, to answer users’ typed questions and respond to video inputs.
The company said AI Overview will begin rolling out in the United States on Tuesday, with other countries to follow soon.
“Google Search is generative AI at the scale of human curiosity,” Pichai said, adding, “This is our most exciting chapter of Search yet.”
Gemini Nano
Google announced that its AI technology will be integrated into Android devices through Gemini Nano, the smallest Gemini model. The company said that Pixel phones will gain multimodal AI capabilities from the model later this year.
“This means your phone can understand the world the way you understand,” a Google employee explained at the event, adding that a device can respond to text, visual, and audio inputs with Gemini Nano.
The model uses context gathered from the user’s phone and runs its workload locally on the device, which could minimize some privacy concerns. Running the model on-device also avoids the latency of round trips to remote servers, and features can work without an internet connection since all the processing happens on the device.
Gemini 1.5, Gemma Updates and Next-Generation Hardware
The company announced improvements to its AI model, Gemini 1.5 Pro, launched the new Gemini 1.5 Flash model, added two new Gemma models, and unveiled a new version of its tensor processing unit (TPU).
The Gemini 1.5 Pro updates improve quality across translation, coding, reasoning, and other uses. The new Gemini 1.5 Flash is a smaller model optimized for more narrowly defined tasks where speed is the priority. Both models are available in preview starting Tuesday and will be generally available in June.
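For developers, switching between the two models is a one-line change in the Gemini API. Here is a minimal sketch using the google-generativeai Python SDK; the API key and prompt are placeholders.

    import google.generativeai as genai

    # Authenticate with a Gemini API key (placeholder -- supply your own).
    genai.configure(api_key="YOUR_API_KEY")

    # Gemini 1.5 Flash targets speed-sensitive, well-defined tasks;
    # swap in "gemini-1.5-pro" when quality matters more than latency.
    model = genai.GenerativeModel("gemini-1.5-flash")

    response = model.generate_content(
        "Summarize the main announcements from Google I/O 2024 in three sentences."
    )
    print(response.text)

Per Google’s positioning, Flash is the natural default when throughput and cost matter more than maximum quality.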
Google also added two models to Gemma, its family of “lightweight open models”: PaliGemma, a vision-language open model the company says is the first of its kind, available Tuesday; and Gemma 2, the next generation of Gemma, coming in June.
Google unveiled Trillium, the sixth generation of its TPU, which the company said delivers a 4.7x improvement in compute performance per chip over its predecessor. The company also reiterated that it would be among the first cloud providers to offer Nvidia’s Blackwell GPUs in early 2025.
Photo Credit: Google