Google has finally taken the covers off Project Gemini after almost a year of secrecy, and the world now gets to take a look at its capabilities. Google Gemini is the company's largest AI model: a multimodal system that, in its most powerful version, can produce outputs in image, video, and audio formats. The model will compete directly with OpenAI's GPT-4, and Google has already fired the first shots. At launch, the company claimed that Gemini beats every other model out there on most benchmarks. So how does Google Gemini compare to GPT-4, and can it surpass the ChatGPT maker? Let us take a look.
The Gemini model’s problem-solving skills are being touted by Google as being especially adept in math and physics, fueling hopes among AI optimists that it may lead to scientific breakthroughs that improve life for humans.
“This is a significant milestone in the development of AI, and the start of a new era for us at Google,” said Demis Hassabis, CEO of Google DeepMind, the AI division behind Gemini.
Google claimed that Gemini is its most flexible model yet, able to run efficiently on everything from data centers to mobile devices, and that its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI. It is available in three variants: Gemini Nano, the basic model; Gemini Pro; and the most advanced model, Gemini Ultra, which can generate results in image, video, and audio formats.
Gemini vs GPT-4
Google has also tested Gemini's benchmarks against those of GPT-4, and the company claims that its AI model beat OpenAI's LLM on 30 out of 32 benchmarks. The blog post said, "We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio, and video understanding to mathematical reasoning, Gemini Ultra's performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development."
So, what were some of the benchmarks where Google Gemini took the lead? The first and most significant was MMLU (massive multitask language understanding), which combines 57 subjects such as math, physics, history, law, medicine, and ethics to test both world knowledge and problem-solving abilities. According to the company, Gemini became the first model to outperform human experts on it, with a score of 90.0 percent. GPT-4, in comparison, scored 86.4 percent.
Gemini was also ahead in the Big-Bench Hard (multistep reasoning) and DROP (reading comprehension) benchmarks under the Reasoning umbrella, scoring 83.6 percent and 82.4 percent respectively against GPT-4's 83.1 and 80.9 percent. It also swept the OpenAI LLM in coding- and math-based benchmarks. GPT-4, however, scored a massive 95.3 percent in HellaSwag (commonsense reasoning for everyday tasks), beating Gemini's 87.8 percent.