Microsoft has announced an innovative technology that can implement LLM with 1-bit operations.

In an era when running a single model demands its own power plant, and a GPU costs as much as a car, the advancement of artificial intelligence has become a war of capital. It is not strange to speak of a kind of GPU mafia: a handful of companies that control huge data centers and power grids.

Last week, however, Microsoft quietly announced a framework called bitnet.cpp, unveiling a complete 1-bit computation framework that lets large language models (LLMs) run quickly on CPUs, without GPUs.

The idea of BitNet is simple. Do we really need 16-bit or 32-bit operations? Can't we get by with 1-bit operations?

It is an idea that returns to the essence of the computer, which represented the world with 1s and 0s. This model does not just shrink numbers; it asks again about the philosophy of computation: for what fraction of a percent of accuracy are we burning thousands of watts?

BitNet compresses the world into the strange figure of 1.58 bits, the information content of a three-valued weight (log2 3 ≈ 1.58). Weights are expressed as three values (-1, 0, +1), and activations are quantized to 8-bit integers. Done this way, the model is no longer a two-gigabyte monster but a small, hard engine of roughly 400 megabytes. It runs on desktops and laptops, and power consumption drops to roughly one-tenth.
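The core trick can be shown in a few lines. The sketch below, a simplified illustration with my own function names rather than the bitnet.cpp implementation, rounds weights to {-1, 0, +1} using their mean absolute value as a scale, and then computes a matrix-vector product with no multiplications at all: each output is just a sum of kept inputs minus a sum of negated ones.

```python
import numpy as np

def absmean_ternary_quantize(W, eps=1e-6):
    """Scale by the mean absolute value, then round each weight to -1, 0, or +1."""
    gamma = np.mean(np.abs(W)) + eps
    Wq = np.clip(np.round(W / gamma), -1, 1).astype(np.int8)
    return Wq, gamma

def ternary_matvec(Wq, gamma, x):
    """Matrix-vector product with ternary weights: no multiplications,
    each output element is a sum of inputs where the weight is +1
    minus a sum of inputs where the weight is -1."""
    y = np.empty(Wq.shape[0], dtype=np.float32)
    for i, row in enumerate(Wq):
        y[i] = x[row == 1].sum() - x[row == -1].sum()
    return gamma * y

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=8).astype(np.float32)
Wq, gamma = absmean_ternary_quantize(W)
```

Replacing multiplications with additions and sign flips is what makes the arithmetic cheap enough for plain CPUs; a real kernel would also pack the ternary weights into a compact bit encoding rather than storing one byte per weight.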

The combination of bitnet.cpp and natively trained 1-bit models is likely to be a turning point for CPU inference. The commercial threshold for small-model inference has been crossed, and the need for GPUs appears significantly lower in edge, on-premises, and cost-sensitive workloads.

However, it is also clear that high quality, long-context processing, and retraining of large models remain the domain of GPUs. A hybrid strategy that combines CPUs (lightweight inference) with GPUs (heavy inference and training) is therefore the most reasonable approach, and adopting it matters especially for optimizing the cost structure of multi-agent systems that depend on expensive large-scale LLM calls.
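One way such a hybrid strategy could be wired up is a simple request router. Everything here is an illustrative assumption, the backend names and thresholds are invented for the sketch and are not part of bitnet.cpp or any Microsoft API:

```python
# Hypothetical router for a CPU/GPU hybrid stack. Backend names
# ("cpu-bitnet", "gpu-large-llm") and the 4096-token threshold are
# illustrative assumptions, not real identifiers from any library.
def choose_backend(prompt_tokens: int, long_context: bool, training: bool) -> str:
    # Training and long-context work stay on the GPU tier; short,
    # cost-sensitive inference goes to the 1-bit CPU model.
    if training or long_context or prompt_tokens > 4096:
        return "gpu-large-llm"
    return "cpu-bitnet"
```

In a multi-agent system, a router like this would let the many cheap, short agent calls land on the CPU tier while reserving GPU capacity for the few requests that genuinely need it.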

Source: https://arxiv.org/abs/2504.12285

Published by
tslaaftermarket
