Microsoft has announced an innovative technology that runs LLMs with 1-bit operations.

In an era where a power plant must be built to run a single model and a GPU costs as much as a car, the advancement of artificial intelligence has become a war of capital. It is not strange to call the handful of companies that control huge data centers and power grids a kind of GPU mafia.

Last week, however, Microsoft quietly released a framework called bitnet.cpp: a complete 1-bit computational stack that lets large language models (LLMs) run quickly on CPUs, without GPUs.

The idea of BitNet is simple: do we really need 16-bit or 32-bit operations? Can't we get by with 1-bit operations?

It is an idea that returns to the essence of the computer, which represents the world in 1s and 0s. This model does not just shrink the numbers; it asks again about the philosophy of computation: for what fraction of a percent of accuracy are we burning thousands of watts?

BitNet compresses the world into an odd number: 1.58 bits (three states carry log₂ 3 ≈ 1.58 bits of information). Weights are expressed in three values (-1, 0, +1), and activations are converted to 8-bit integers. Do this, and the model is no longer a multi-gigabyte monster but a small, hard engine of about 400 megabytes. It runs on desktops and laptops, and power consumption drops to roughly a tenth.
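The trick above can be sketched in a few lines. This is a minimal, hypothetical illustration (the function names are mine, not from bitnet.cpp): weights are mapped to {-1, 0, +1} with a single per-tensor scale, roughly in the spirit of the paper's absmean quantization, and a matrix-vector row then needs only additions and subtractions, no multiplications.

```python
def quantize_ternary(weights):
    """Map float weights to {-1, 0, +1} plus one float scale per tensor."""
    scale = sum(abs(w) for w in weights) / len(weights)  # mean absolute weight
    q = [max(-1, min(1, round(w / scale))) for w in weights]  # round, then clip
    return q, scale

def ternary_matvec_row(q_row, scale, x):
    """One row of W·x with ternary weights: only adds/subtracts."""
    acc = 0
    for w, xi in zip(q_row, x):
        if w == 1:
            acc += xi
        elif w == -1:
            acc -= xi        # w == 0 contributes nothing
    return scale * acc

q, s = quantize_ternary([0.9, -0.05, -1.2, 0.4])
print(q)  # → [1, 0, -1, 1]
```

Because every weight is one of three values, the heavy inner loop of inference degenerates into integer accumulation over 8-bit activations, which is exactly the kind of work a plain CPU does well.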

The combination of bitnet.cpp and native 1-bit training is likely to be a turning point for CPU inference. The commercial threshold for small-model inference has been crossed, and the need for GPUs looks significantly reduced for edge, on-prem, and cost-sensitive workloads.

However, high-quality output, long-context processing, and retraining of large-scale models clearly remain GPU territory. The most reasonable approach is therefore a hybrid strategy combining CPUs (lightweight inference) with GPUs (heavy inference and training), and adopting it to optimize the cost structure seems especially important in multi-agent systems that depend on expensive large-scale LLM calls.

Source: https://arxiv.org/abs/2504.12285

Published by
tslaaftermarket
