Google’s Gemini 1.5 Pro can now hear

Illustration: The Verge

Google’s update to Gemini 1.5 Pro gives the model ears. The model can now listen to uploaded audio files and churn out information from things like earnings calls or audio from videos without the need to refer to a written transcript.

During its Google Cloud Next event, Google also announced that it will make Gemini 1.5 Pro available to the public for the first time through Vertex AI, its platform for building AI applications. Gemini 1.5 Pro was first announced in February.

This new version of Gemini Pro, which is supposed to be the middle-weight model of the Gemini family, already surpasses the biggest and most powerful model, Gemini Ultra, in performance. Gemini 1.5 Pro can understand complicated instructions and eliminates the need to fine-tune models, Google claims.

Gemini 1.5 Pro is not available to people without access to Vertex AI. Right now, most people encounter Gemini language models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot, and while it is powerful and also able to understand long commands, it’s not as fast as Gemini 1.5 Pro.

Gemini 1.5 Pro is not the only large AI model from Google getting an update. Imagen 2, the text-to-image generation model that helps power Gemini’s image-generation capabilities, will also add inpainting and outpainting, which let users add or remove elements from images. Google also made its SynthID digital watermarking feature available on all pictures created through Imagen models. SynthID adds a watermark, invisible to the viewer, that marks an image’s provenance when the image is checked with a detection tool.

Many of the new features of Imagen, especially inpainting and outpainting, have been part of other text-to-image models like Stability AI’s Stable Cascade and Getty’s Generative AI by iStock, not to mention wider consumer availability on newer Samsung Galaxy phones.

Google says it’s also publicly previewing a way to ground its AI responses in Google Search so they answer with up-to-date information. That’s not always a given with large language models, sometimes by design: Google has intentionally kept Gemini from answering questions related to the 2024 US election.

Gemini was also recently criticized for generating historically inaccurate images of people.
