In BitNetMCU, combining a deep depthwise CNN architecture with variable quantization achieves state-of-the-art MNIST accuracy on a low-end 32-bit microcontroller with 4 kB of RAM and 16 kB of flash.
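A minimal sketch of the kind of building block this refers to, assuming a PyTorch training setup: a depthwise 3x3 convolution (one filter per channel) followed by a pointwise 1x1 convolution that mixes channels. The layer widths here are illustrative assumptions, not the exact BitNetMCU architecture.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """Depthwise 3x3 conv + pointwise 1x1 conv (illustrative sketch)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise: groups=in_ch gives one 3x3 filter per input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution mixes channel information
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a 16x16 single-channel input, matching the downscaled MNIST images
x = torch.randn(1, 1, 16, 16)
block = DepthwiseSeparableBlock(1, 16)
print(block(x).shape)  # torch.Size([1, 16, 16, 16])
```

Splitting a convolution this way drastically reduces parameter count and multiply-accumulate operations, which is what makes a "deep" network fit within a few kilobytes of flash.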
(Guest article on the Nous Research blog.) Anecdotal evidence suggests that open-weight models produce significantly more tokens than closed-weight models for similar tasks. This report investigates these observations systematically. We confirm the trend to be generally true, but observe significant differences depending on the problem domain.
Is it possible to implement reasonably accurate inference of MNIST, the handwritten digits dataset, on a “3 cent” microcontroller with only 64 bytes of RAM and 1K of instruction memory?
BitNetMCU is a project focused on training and inference of low-bit quantized neural networks, designed to run efficiently on low-end microcontrollers such as the CH32V003. Quantization-aware training (QAT) and fine-tuning of the model structure made it possible to surpass 99% test accuracy on a 16x16 MNIST dataset using only 2 kB of RAM and 16 kB of flash.
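Quantization-aware training typically applies a fake-quantize step to the weights in the forward pass while letting gradients flow through unchanged (the straight-through estimator). A minimal PyTorch sketch of that idea, assuming symmetric per-tensor quantization at 4 bits; the bit width and scaling scheme are illustrative, not BitNetMCU's exact configuration:

```python
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=4):
        # Symmetric per-tensor quantization: scale so the largest weight
        # maps to the top quantization level, then round and clamp
        qmax = 2 ** (bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat quantization as identity
        # in the backward pass (no gradient for the bits argument)
        return grad_output, None

class QuantLinear(torch.nn.Linear):
    """Linear layer that trains against 4-bit fake-quantized weights."""
    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 4)
        return torch.nn.functional.linear(x, w_q, self.bias)

# Usage: drop-in replacement for nn.Linear during training
layer = QuantLinear(256, 10)
out = layer(torch.randn(8, 256))
```

Because the network sees its own quantization error during training, accuracy degrades far less than with post-training quantization, which is what allows sub-2 kB weight footprints to stay above 99% on this task.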