Google I/O

✦ It was an event filled with Gemini stories. Various use cases for the Gemini models were introduced, along with tools that make each use case stand out, including updates to the Gemini 1.5 Pro and Gemini 1.5 Flash models. The presentation by Google DeepMind's Demis Hassabis naturally wove in efforts to quickly turn ongoing research into services.

✦ The path for Google seems very clear. By providing customized AI services to users already embedded in Google's various products and platforms, it can completely eliminate the friction of having to switch to a separate app for other AI services. This is why the multimodal model, which can consider all kinds of data at once (Gmail, Docs, Sheets, Slides, Search, and Photos), was designed this way from the beginning. Google also focused on extending the context length to accommodate large amounts of personalized data. Much as with the Apple ecosystem, customers who are satisfied with Gemini will find it difficult to leave the Google ecosystem.

✦ At the beginning of the year, Gemini 1.5 Pro had already been tested with 10M-token context lengths, and the actual service/API level has now jumped to 2M tokens. That is double the existing 1M; depending on the length and subject of a book, you can fit dozens of reasonably sized books into the context. It can take in the entire code base of almost any project for analysis. Currently the 1M-token version is what has been applied to Gemini Advanced, but this is expected to be expanded gradually.

  • When I first beta-tested it, processing was so slow that I thought it was actually unsuitable for end users. However, the processing speed has improved tremendously; to me, it does not feel much different from when I used 1.0 Ultra.
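As a rough sanity check on how many books a 2M-token context holds, here is a back-of-the-envelope estimate. The figures of ~1.3 tokens per English word and ~90,000 words per typical book are my own assumptions, not numbers from the event:

```python
# Back-of-the-envelope: how many average-length books fit in a given context window?
# Assumed figures (not from the event): ~1.3 tokens per English word,
# ~90,000 words per typical book.

TOKENS_PER_WORD = 1.3
WORDS_PER_BOOK = 90_000

def books_that_fit(context_tokens: int) -> int:
    """Return how many whole books of the assumed size fit in the context."""
    tokens_per_book = WORDS_PER_BOOK * TOKENS_PER_WORD  # ~117,000 tokens per book
    return int(context_tokens // tokens_per_book)

print(books_that_fit(1_000_000))  # previous 1M-token window -> 8
print(books_that_fit(2_000_000))  # new 2M-token window -> 17
```

Under these assumptions the jump from 1M to 2M tokens roughly doubles the number of whole books that fit, from about eight to about seventeen.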

✦ Gemini 1.5 Pro has been further strengthened internally. It does not surpass Gemini 1.0 Ultra on the MMLU number, but it appears to perform significantly better on all other tasks. In practice, Gemini 1.0 Ultra, with its very limited context length, will remain a reference point only, while the Gemini 1.5 lineup serves as the practical commercial model. The backbone of Gemini Advanced, the consumer product comparable to ChatGPT, has already been replaced with Gemini 1.5 Pro, and the interface has been improved to accept the various input types the model supports.

✦ Gemini 1.5 Flash is a lightweight model that feels like a slimmed-down Pro. Its MMLU score is lower, but its other performance is quite comparable to 1.5 Pro, and it also supports a 1M-token context. For anyone using Gemini through the API, 1.5 Flash is the sensible choice: the API pricing of 1.5 Pro is steep, so while it works for research purposes, it seems difficult to integrate into actual services.

  • With 1.5 Flash taking the default position, the 1.0 Pro model may be retired soon. The 1.0 Pro (not Ultra) is also the more expensive of the two.
  • In practice, text generation with the 1.5 Flash model is incredibly fast. Considering that it can process 1M tokens, a great deal of optimization must have gone into it.
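To make the pricing gap concrete, here is a rough cost comparison. The per-1M-token prices below are assumed snapshot figures from around the launch period (for prompts under 128K tokens) and may well be outdated; check the official pricing page before relying on them:

```python
# Rough API cost comparison between Gemini 1.5 Pro and 1.5 Flash.
# Prices are ASSUMED per-1M-token figures (input $, output $) from around
# launch and may be outdated; they are illustrative only.

PRICES = {
    "gemini-1.5-pro": (3.50, 10.50),
    "gemini-1.5-flash": (0.35, 1.05),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A hypothetical service: 10,000 requests/day, 2,000 input + 500 output tokens each.
daily_requests = 10_000
for model in PRICES:
    daily_cost = request_cost(model, 2_000, 500) * daily_requests
    print(f"{model}: ${daily_cost:,.2f}/day")
```

Under these assumed prices, Flash comes out about 10x cheaper per request, which is the difference between a research budget line and something you can actually ship.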

✦ Something called Project Astra was introduced. I'm not sure how it works, but it appears to be a project aimed at an experience similar to GPT-4o. Judging from the Gemini Live demo and the Twitter video in which an Astra developer watched the I/O event through Astra, it processes incoming audio/video data in real time and generates responses quickly. Whether the user can interrupt it mid-speech seems to be the big open question, since that is an application implementation issue rather than a model capability.

  • In the demo video, a prototype device resembling Google Glass was worn for the demonstration, which was quite impressive.

✦ Image, audio, and video generation models were introduced: Imagen 3, the next-generation version of Imagen, and Veo, a video generation model that blends the techniques of several published papers. I can't judge everything, but the videos Veo generated in the demo look to be at or beyond the level of OpenAI's Sora. The clips it can generate are not very long, but generation can be continued, and there is little visible seam between the continued segments.

✦ Gemini has been incorporated into almost all Google products, including Google Search and Gmail. You can feel Google's powerful infrastructure: beyond the platforms themselves, because search optimization is already well established, Google already has a system that enables very fast and effective RAG over huge amounts of data. The integration also appeared specialized and optimized for each service/platform rather than being a simple connection; a feature called AI Overview, integrated with Search, retrieves data and generates content at incredible speed.
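The RAG pattern described above can be sketched minimally: retrieve relevant documents first, then build a grounded prompt for the generator. The naive keyword-overlap retrieval here is a stand-in; Google's advantage is that this retrieval step is backed by a mature search index instead:

```python
# Minimal RAG sketch: retrieve supporting documents, then assemble a
# grounded prompt. Retrieval here is naive keyword overlap, purely for
# illustration; a production system would use a real search/vector index.

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query words they share, return top k."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Assemble a prompt that grounds the model in retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\nQuestion: {query}"

docs = [
    "Gemini 1.5 Flash supports a 1M token context window.",
    "Veo is a video generation model.",
    "AI Overview integrates generation with Google Search.",
]
print(build_prompt("What context window does Gemini 1.5 Flash support?", docs))
```

The speed of AI Overview suggests this retrieve-then-generate loop runs on top of the existing search stack rather than a separate pipeline.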

✦ Gemma, previously released as an open-source model, has already been updated; I personally like the rapid pace of evolution. PaliGemma, a 3B vision-language model reflecting Google's PaLI-3 research, was introduced and has already been distributed in a form available to the open-source community; it can also be tried out of the box on Hugging Face Spaces. Gemma 2, the next-generation version of Gemma, was also announced and will be released in versions with weights up to ~27B.

  • The 27B version of Gemma 2 is still in pre-training, yet it is already said to approach the performance of the LLaMA 3 70B model.
Published by
tslaaftermarket
