[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

rufus@discuss.tchncs.de · edited 3 months ago

Hmm. I meant kind of both. I think them not releasing a model isn’t a good sign to begin with. That wouldn’t matter if somebody else picked it up. (What I read from the paper is that they trained models up to 3B(?!) and then scaled that up in some way to get more measurements without actually training larger models. So even internally they don’t seem to have any real larger models, and not even the small models seem to have been published. I also don’t have any insight into how many GPUs the researchers/companies have sitting around, or what they’re currently working on and using them for. It’s a considerable amount, though.)

It’s only been a few weeks. I couldn’t find a comprehensive test or follow-up of their approach yet. However, last week they released some more information: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf

And I found this post from 2 days ago where someone did a small training run and published the loss curve.

And some people have started working on implementations on GitHub. I’m not sure, though, where this is supposed to go without actual models being available.
