Apple's WWDC 2024
I think I’ve been watching WWDC for about 14 years now. From Obj-C to Swift, it’s been fun for a long time, and as a developer there’s something about WWDC that always makes my heart flutter. Anyway, I enjoyed watching this WWDC too.
The keynote itself was a bit disappointing because it flew by so quickly, but the blog posts and several session videos released afterward are quite technical, which is another part of the fun. Personally, WWDC seems to gather the most technically deep-dive sessions of any big-company conference. Apple Intelligence (AI) in particular was very impressive, so let me summarize why.
✦ Hierarchical LLM Operation with On-Device and Cloud Models
- The idea of putting an LLM on-device is still pretty fresh. Google also offers Gemini Nano for on-device use on some Pixel and Galaxy phones, but it is still at a very experimental stage and has a long way to go in terms of operating-system integration and cross-platform support. A cursory look at what Apple has been doing shows various efforts to get an LLM running on-device. Beyond boosting the performance of small models, they have built tools like Talaria to inspect the performance, latency, and power consumption that result from model compression, which shows how serious they are about on-device models. Local-first.
- From the perspective of controlling device operations, an omnipotent, super-large single model such as GPT-4o could be used through heavy prompt engineering, but having adapters that can be applied individually to different situations is likely much more stable in terms of security and consistency. Not only the on-device model but also the model hosted on Private Cloud Compute (PCC) follows this adapter approach. It is a good reference for why you fine-tune and how to get meaningful results by building task-specific adapters (a rough sketch of the adapter idea follows this list). In fact, if you look at Apple's research, performing specific tasks with multiple adapters reaches GPT-4 level while scoring very well on safety. Making sure device operations cannot be manipulated by a prompt is a very important point for preventing hacking.
- Hierarchical operation of on-device and cloud models is a fairly common pattern, but covering every use case in an integrated way will likely require a very complex architecture. First, you need to understand the user's intent, decide whether a higher-performance model is needed, and route to the relevant adapter model (see the routing sketch after this list). On top of that, model versions will become increasingly fragmented as devices age and operating systems update, and this has to be managed as well.
- It seems Apple has built a cluster environment on Apple Silicon for its own LLM inference (JAX and Google Cloud appear to be used for training). It is presumably heavily optimized internally, but it made me imagine that ordinary users might one day buy several Mac mini-like machines and configure their own cluster inference environment. It runs on low power and is very cheap compared to NVIDIA GPUs, which is a picture I personally liked.
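
To make the adapter idea from the second bullet concrete, here is a minimal Swift sketch of a low-rank ("LoRA-style") adapter applied on top of frozen base weights. This is only my illustration of the general technique; the types, shapes, and numbers are invented and have nothing to do with Apple's actual implementation.

```swift
// Hypothetical illustration of a task-specific low-rank adapter (not Apple's API).
// The base weight matrix W stays frozen; each task ships only two small matrices
// A (out x r) and B (r x in), and the effective weights are W' = W + A·B.
struct LowRankAdapter {
    let a: [[Double]]   // out x r
    let b: [[Double]]   // r x in

    func apply(to base: [[Double]]) -> [[Double]] {
        let outDim = base.count
        let inDim = base[0].count
        let rank = b.count
        var result = base
        for i in 0..<outDim {
            for j in 0..<inDim {
                var delta = 0.0
                for k in 0..<rank { delta += a[i][k] * b[k][j] }
                result[i][j] += delta
            }
        }
        return result
    }
}

// The same base weights serve every task; each task only needs its own tiny adapter.
let baseWeights: [[Double]] = [[1.0, 0.0], [0.0, 1.0]]
let summarizeAdapter = LowRankAdapter(a: [[0.1], [0.2]], b: [[0.5, -0.5]])
print(summarizeAdapter.apply(to: baseWeights))   // base weights plus a rank-1 update
```

Swapping adapters at runtime means loading only a small set of matrices per task rather than a whole model, which is why this approach fits on-device constraints.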
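
And here is a hedged sketch of the hierarchical routing described in the third bullet: classify the request, prefer the small on-device model when a matching task adapter exists, and escalate to the Private Cloud Compute model otherwise. The request fields, the complexity score, and the threshold are all assumptions for illustration, not Apple's actual architecture.

```swift
// Hypothetical router between an on-device adapter model and a cloud model.
enum ModelTier {
    case onDevice(adapter: String)   // small local model plus a task-specific adapter
    case privateCloud                // larger model hosted on Private Cloud Compute
}

struct ModelRequest {
    let task: String                 // e.g. "summarize", "rewrite", "open-ended"
    let estimatedComplexity: Int     // 0...10, assumed to come from an intent classifier
}

struct Router {
    let availableAdapters: Set<String>
    let complexityThreshold = 6      // assumption: above this, escalate to the cloud

    func route(_ request: ModelRequest) -> ModelTier {
        // Stay on-device only if an adapter exists for the task and it looks simple enough.
        if availableAdapters.contains(request.task),
           request.estimatedComplexity <= complexityThreshold {
            return .onDevice(adapter: request.task)
        }
        return .privateCloud
    }
}

let router = Router(availableAdapters: ["summarize", "rewrite", "proofread"])
print(router.route(ModelRequest(task: "summarize", estimatedComplexity: 3)))   // onDevice
print(router.route(ModelRequest(task: "open-ended", estimatedComplexity: 8)))  // privateCloud
```

A real system would also have to factor in device generation and OS/model versions, which is the fragmentation problem mentioned above.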
✦ Third-Party LLM Integration
- At first, calling out to ChatGPT seemed like a bit of a gimmick, but Apple's models, which use adapters to accomplish specific tasks, are not suited to general-purpose, arbitrary tasks. I read this as a position of integrating well-made external models from third parties instead. In line with this trend, OpenAI also released a free tier of GPT-4o: you can talk to GPT-4o without signing up, and link a paid account for a more comfortable experience. If you want your LLM integrated into Apple devices in the future, you will basically have to provide a free tier of your own powerful model (in terms of user experience, I think Apple will demand a free tier unconditionally). A speculative sketch of such an integration surface follows this list.
- You might think Apple would want to build its own GPT-4, but there is no particular reason to. Apple does not have that kind of computing resource right now, whatever may happen in the future (my guess is that Google Cloud is its training environment). They may build their own eventually, but what matters right now is device control, and that does not require a super-large model. It is also quite risky for Apple that models like GPT keep changing, legacy models go out of service, and prompt engineering has to be redone each time. Baking adapters that work well for a set of tasks, pinning versions, and removing external dependencies entirely is the desirable approach, and that is what they did.
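
As for the integration surface itself, here is a speculative Swift sketch of how third-party models could plug in: providers conform to a common protocol, and a request only leaves Apple's own models after an explicit user confirmation. The `ExternalModelProvider` protocol and everything around it are hypothetical; Apple has not published such an interface.

```swift
// Hypothetical interface for third-party model integration (not a real Apple API).
protocol ExternalModelProvider {
    var name: String { get }
    var requiresAccount: Bool { get }        // free tier vs. linked paid account
    func complete(prompt: String) -> String
}

struct ChatGPTProvider: ExternalModelProvider {
    let name = "ChatGPT"
    let requiresAccount = false              // free tier, no sign-up required
    func complete(prompt: String) -> String {
        // Placeholder: a real integration would call the provider's API here.
        return "external response for: \(prompt)"
    }
}

// Arbitrary, open-ended tasks are forwarded only after the user explicitly agrees.
func handleArbitraryTask(prompt: String,
                         provider: ExternalModelProvider,
                         userConsented: () -> Bool) -> String? {
    guard userConsented() else { return nil }
    return provider.complete(prompt: prompt)
}

let reply = handleArbitraryTask(prompt: "Write a haiku about WWDC",
                                provider: ChatGPTProvider(),
                                userConsented: { true })
print(reply ?? "declined")
```

The per-request consent gate mirrors what the keynote showed for the ChatGPT integration; whether other providers would get the same treatment is my guess.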
✦ AI Integration Across Products
- The era of widespread LLM use has arrived, and many people were astonished, but in fact no particular killer app or killer service has appeared. There may be several reasons, but while it is easy to play around and build a toy project that draws a "wow", making practical use of it requires substantially renovating the existing software/platform/service structure (interfaces, pre- and post-processing of inputs and outputs, and so on). Chat is easy enough that many companies have introduced it, but pouring time, money, and manpower into internal innovation still seems rare. The risk is high.
- Apple is