Google Announces Project Astra, OpenAI GPT-4o’s Competition

Published on: May 15, 2024

Google unveiled Project Astra at this year’s Google I/O, just a day after OpenAI announced its new GPT-4o model. OpenAI’s announcement, which was timed to coincide with Google’s big event, did not catch the internet giant off guard. Instead, Google was prepared with a potent counterpunch, demonstrating its advances in artificial intelligence (AI) through its Gemini suite and through Project Astra, a new multimodal research prototype. Like GPT-4o, which can interpret a live video feed and hold a conversation about what it sees, Project Astra is designed to understand what a device’s camera shows and to talk about it.

ASTRA – Google’s new advanced AI project

Astra stands for Advanced Seeing and Talking Responsive Agent. It is intended to be a useful, all-purpose agent in the real world. To be genuinely helpful, an agent must be able to comprehend and react to the complex and dynamic world in the same way that people do. It must also be able to absorb and retain what it hears and sees in order to understand the situation and act. Alongside Project Astra and updates to its Gemini chatbot, Google unveiled other additions to its AI lineup, including a search feature called AI Overviews. The company also unveiled Imagen 3, the most recent iteration of its image generation model, and Gemini Live, a conversation-driven feature.

A demonstration that accompanied the announcement drew wide attention on the social media platform X (formerly known as Twitter). Google showed how its own AI, Gemini, could analyze a room and make educated guesses about what was going on in it, a direct response to OpenAI’s most recent accomplishments. The demo points to head-to-head competition in multimodal AI, mirroring the capabilities OpenAI demonstrated with its most recent ChatGPT model.

Google DeepMind just announced Project Astra.

It’s a universal AI agent that can see AND hear what you do live in real-time, and take action on your behalf.

Google just made it very clear that it’s transforming Gemini from a chatbot into a personal AI agent.

Public access… https://t.co/wdj7dWeucn

— Rowan Cheung (@rowancheung) May 14, 2024

The field of generative AI has exploded in the 18 months since ChatGPT’s launch, with products ranging from Google’s Gemini (formerly Bard), Microsoft’s Copilot, and Adobe Firefly to entries from startups like Perplexity and Anthropic, creator of the Claude chatbot.

Astra’s AI capabilities

According to Google, Astra assists users in their daily lives by using the camera and microphone on their devices. Astra builds a timeline of events by continuously processing and encoding speech and video input, then caching the data for easy recall. According to the company, this makes it possible for the AI to recognize objects, respond to queries, and recall things it has seen that are no longer in the camera’s frame.
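Google has not published how Astra implements this, so the following is only a minimal Python sketch of the general idea described above: events are recorded with a timestamp, kept in a bounded cache, and searched from newest to oldest when something needs to be recalled. The names `Event`, `TimelineCache`, `record`, and `recall` are hypothetical, and the plain-text descriptions stand in for whatever encoded video and speech data a real system would store.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class Event:
    timestamp: float   # seconds since the session started
    modality: str      # "video" or "speech"
    description: str   # stand-in for an encoded frame or utterance


class TimelineCache:
    """Bounded, time-ordered cache of observed events (illustrative only)."""

    def __init__(self, max_events: int = 1000) -> None:
        # A deque with maxlen silently drops the oldest event when full.
        self._events: Deque[Event] = deque(maxlen=max_events)

    def record(self, timestamp: float, modality: str, description: str) -> None:
        self._events.append(Event(timestamp, modality, description))

    def recall(self, keyword: str) -> Optional[Event]:
        """Return the most recent event whose description mentions keyword."""
        needle = keyword.lower()
        for event in reversed(self._events):
            if needle in event.description.lower():
                return event
        return None


cache = TimelineCache(max_events=3)
cache.record(0.0, "video", "glasses on the desk next to a red apple")
cache.record(2.5, "speech", "user asks what the code on the screen does")
cache.record(5.0, "video", "whiteboard with a system diagram")

# The glasses are no longer in frame, but the cached event can still be found.
hit = cache.recall("glasses")
print(hit.description if hit else "not found")
```

The bounded cache is a deliberate simplification: once it is full, the oldest events are forgotten, which is one simple way to keep memory use fixed during a long session.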

AI Overviews

Google is introducing AI Overviews, a new search experience, to the US market starting this week. The stated objective is to “take the work out of searching”: Google wants to do some of the legwork for its users, using a custom Gemini model created just for search. Gemini’s multistep reasoning lets Google carry out more advanced research on a user’s behalf, taking into account factors like location, hours, and offers, so users can find what they are looking for faster. Someone looking for a nearby yoga studio, for example, no longer needs to ask several separate questions. Google also uses generative AI to group the results by topic and by the user’s likely interests.

Astra was just one of several Gemini announcements at the event. One noteworthy launch is Gemini 1.5 Flash. This model is intended to meet the growing need for speed and efficiency in AI operations by handling common tasks like summarization and captioning much faster. Gemini Nano, which also received an update, is focused on speed as well. It is designed to run on small devices, like smartphones. According to Google, Nano is the fastest model for on-device applications because of its improved performance over earlier versions.

Veo Model

Additionally, the tech giant unveiled Veo, a model that can create videos based on text prompts. Alongside these models, Google has made a significant improvement to Gemini 1.5 Pro’s context window, which determines how much data the model can handle in a single query. The window has been doubled to 2 million tokens. With this improvement, Gemini 1.5 Pro can handle more complicated commands and give more thorough answers.

Gemini Live

Google has announced Gemini Live, which aims to improve user engagement with these sophisticated models. This new feature is a voice assistant made to facilitate smooth communication. Users can have a natural, back-and-forth conversation with the AI; they can even interrupt it if it starts talking too much, or go back and review previous exchanges. Google’s aim with this feature is to make AI assistance easier and more intuitive for a wider audience.

Google Lens

Google Lens is gaining a new feature, unveiled at Google I/O: users can record a video, narrate a question about it, and run a web search on the result. Adding video input to web search offers a more dynamic and engaging way to find information online.

Tags: Artificial intelligence, chatbot, gemini AI, Google, Project Astra

Author Profile

About Netra

Netra is a Dual Masters graduate in International Business and Marketing. She is a content-writing enthusiast and a social media addict. In her downtime, you will find her headbanging to Pop songs from around the world. She is also a sports fanatic and especially loves F1, Volleyball, and Cricket. Her hobbies are baking and watching Anime.
