Ternary Bonsai

PrismML 1.58-Bit Language Models

Ternary Bonsai is a PrismML family of 1.58-bit language models announced on April 16, 2026. The lineup includes 8B, 4B and 1.7B variants that use ternary weights {-1, 0, +1} to cut memory use roughly 9x versus standard 16-bit models while improving on the earlier 1-bit Bonsai tradeoff.

Ternary Bonsai WebGPU Demo

Embedded from the official Hugging Face Space for the Ternary Bonsai browser demo.

The full-width layout keeps the embedded Ternary Bonsai experience close to the original Space, with more details, benchmarks and background below.

What Ternary Bonsai Is

Ternary Bonsai is a family of compact language models from PrismML built around a true 1.58-bit representation. In the official announcement, PrismML positions the family as a step beyond its earlier 1-bit Bonsai line: slightly larger, but meaningfully stronger on standard evaluations while staying dramatically smaller than conventional 16-bit models.

True ternary architecture

PrismML says Ternary Bonsai uses ternary weights throughout the network, including embeddings, attention layers, MLPs and the LM head. There are no higher-precision escape hatches in the published design.

1.58-bit representation

Each weight is constrained to one of three values, {-s, 0, +s}, encoded as (-1, 0, +1) with a shared FP16 scale for each group of 128 weights. Three states carry log2(3) ≈ 1.58 bits of information per weight, which is why the family is described as 1.58-bit.
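
PrismML has not published its exact quantization recipe, so the following is only an illustrative sketch. It assumes the absmean-style scheme popularized by BitNet b1.58: each group's scale is the mean absolute weight, and scaled weights round to the nearest of {-1, 0, +1}. The function names are hypothetical.

```python
# Illustrative sketch of group-wise ternary quantization (absmean style).
# Assumptions, not PrismML's published method: the scale is the group's
# mean absolute value, and codes are round-then-clamp to {-1, 0, +1}.

def quantize_group(weights):
    """Map one group of float weights to ternary codes plus a shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # absmean scale s
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    """Recover the effective weights {-s, 0, +s} from the ternary codes."""
    return [c * scale for c in codes]

# Toy group (a real group would hold 128 weights sharing one FP16 scale).
codes, s = quantize_group([0.9, -0.05, -1.1, 0.4])
```

With 128 weights per group, the per-group FP16 scale adds only 16 / 128 = 0.125 extra bits per weight on top of the ternary codes.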

Why people search it

Most searches for Ternary Bonsai are trying to find the PrismML announcement, the Hugging Face demo, benchmark context, memory numbers, or practical deployment details for edge and Apple hardware.

  • Ternary Bonsai WebGPU
  • PrismML Ternary Bonsai
  • ternary-weight language model
  • 1.58-bit LLM
  • Ternary Bonsai 8B
  • Ternary Bonsai 4B
  • Ternary Bonsai 1.7B
  • on-device inference
  • MLX Apple models
  • efficient language model

Ternary Bonsai Specs and Benchmarks

The published Ternary Bonsai materials give enough detail to cover the main informational queries around the keyword: model sizes, memory footprint, benchmark deltas, throughput and supported platforms.

Model sizes and memory

  • Ternary Bonsai 8B: about 1.75 GB
  • Ternary Bonsai 4B: about 0.86 GB
  • Ternary Bonsai 1.7B: about 0.37 GB
  • Roughly 9x smaller than comparable 16-bit models
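
Those figures line up with near-1.58-bit storage plus per-group scales. A common packing trick stores five ternary digits per byte (3^5 = 243 fits in 256 values), i.e. 1.6 bits per weight. The sketch below is a back-of-envelope estimate under those assumptions, not PrismML's published packing scheme:

```python
# Back-of-envelope ternary memory estimate (assumptions, not published specs):
# five trits packed per byte (3**5 = 243 <= 256) and one 2-byte FP16 scale
# per group of 128 weights.

def ternary_footprint_gb(num_weights, group_size=128):
    packed_bytes = num_weights / 5                  # 1.6 bits per weight
    scale_bytes = (num_weights / group_size) * 2    # FP16 scale per group
    return (packed_bytes + scale_bytes) / 1e9

for n in (8e9, 4e9, 1.7e9):
    print(f"{n / 1e9:.1f}B -> {ternary_footprint_gb(n):.2f} GB")
```

This yields roughly 1.73 GB, 0.86 GB and 0.37 GB, matching the published 4B and 1.7B figures and coming close to the 1.75 GB listed for 8B; a 16-bit 8B model at roughly 16 GB is about 9x larger, consistent with the claim above.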

Benchmark highlights

PrismML reports that Ternary Bonsai 8B reaches a 75.5 average benchmark score versus 70.5 for 1-bit Bonsai 8B, while using only about 600 MB more memory. The announcement calls out results across MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval and BFCLv3.

Throughput and platform coverage

According to PrismML, Ternary Bonsai 8B runs at 82 toks/sec on M4 Pro and 27 toks/sec on iPhone 17 Pro Max. The family runs natively on Mac, iPhone and iPad through MLX, with weights released under Apache 2.0.
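
Taken at face value, those decode rates translate directly into response latencies. A minimal sketch, assuming steady-state decoding and ignoring prompt processing and time-to-first-token:

```python
# Convert a published decode rate (tokens/sec) into wall-clock time for a
# reply of a given length. Steady-state decode only; prefill is ignored.

def seconds_for_reply(tokens, toks_per_sec):
    return tokens / toks_per_sec

reply_tokens = 500  # illustrative medium-length reply
m4_pro_s = seconds_for_reply(reply_tokens, 82)   # reported M4 Pro rate
iphone_s = seconds_for_reply(reply_tokens, 27)   # reported iPhone 17 Pro Max rate
```

At those rates a 500-token reply takes about 6 seconds on M4 Pro and about 18.5 seconds on the phone.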

Published memory and weight format per model:

  • Ternary Bonsai 8B (about 1.75 GB, 1.58-bit ternary): main flagship in the launch materials, with published benchmark and throughput examples.
  • Ternary Bonsai 4B (about 0.86 GB, 1.58-bit ternary): middle deployment tier for tighter memory budgets.
  • Ternary Bonsai 1.7B (about 0.37 GB, 1.58-bit ternary): smallest published variant for the lightest footprint.

Published memory figures above come from the launch materials and PR Newswire summary referenced at the bottom of the page.

Ternary Bonsai vs 1-Bit Bonsai

The launch framing is not “replace 1-bit Bonsai.” It is a tradeoff story: Ternary Bonsai spends a modest amount of extra memory to get a stronger model, while 1-bit Bonsai remains the choice when absolute minimum footprint matters most.

Ternary Bonsai 8B versus 1-bit Bonsai 8B, dimension by dimension:

  • Published memory: about 1.75 GB vs about 1.15 GB. Ternary Bonsai adds roughly 600 MB for a stronger capability profile.
  • Average benchmark score: 75.5 vs 70.5. The published delta is 5 points in favor of Ternary Bonsai 8B.
  • Positioning: higher performance at a still-small size vs minimum footprint first. The two families target different points on the efficiency frontier.
  • Best fit: when a bit more memory is acceptable vs when the smallest possible footprint wins. A useful distinction for product teams choosing between quality and memory ceilings.

Why Ternary Bonsai Matters

Rather than replacing 1-bit Bonsai, Ternary Bonsai extends the efficiency frontier with a different tradeoff: slightly more memory for a meaningful increase in capability.

Extending the Pareto frontier

PrismML describes Ternary Bonsai as pushing the performance-versus-size Pareto frontier outward. The claim is that developers can get stronger reasoning while staying inside footprints that remain practical for edge devices and local deployment.

Edge to datacenter relevance

The marketing and technical framing both emphasize intelligence density: more useful model quality per unit of memory, compute and energy. That matters for phones, laptops and also for server-side infrastructure efficiency.

Search intent coverage

Common follow-up searches include Ternary Bonsai 8B, Ternary Bonsai memory size, Ternary Bonsai benchmark, Ternary Bonsai MLX and Ternary Bonsai WebGPU demo.

Ternary Bonsai Use Cases

The interesting deployment story for Ternary Bonsai is not just compression for its own sake. It is the combination of local memory efficiency, practical throughput and enough model quality to be useful in real products and on real devices.

On-device assistants

Apple-device support through MLX makes Ternary Bonsai relevant for local chat, summarization, writing help and lightweight agent experiences on Mac, iPhone and iPad.

Browser demos and product previews

The Hugging Face WebGPU demo is useful for instant evaluation. Teams can show a live experience before asking users to install anything or sign into a hosted playground.

Memory-constrained inference

Ternary Bonsai is a fit when memory limits are strict but product quality cannot collapse. That tradeoff is relevant for edge devices, embedded workflows and smaller inference budgets.

Published details and how to read them:

  • Benchmarks mentioned: MMLU Redux, MuSR, GSM8K, HumanEval+, IFEval, BFCLv3. The launch message is broad performance gains, not just one benchmark spike.
  • Throughput on M4 Pro: 82 toks/sec for Ternary Bonsai 8B. Strong local speed for laptop-class deployment.
  • Throughput on iPhone 17 Pro Max: 27 toks/sec for Ternary Bonsai 8B. Shows the family is positioned for real mobile use, not just desktops.
  • Energy framing: roughly 3-4x better energy efficiency than 16-bit counterparts. Important when battery life, thermal limits or inference cost matter.

Ternary Bonsai FAQ

These FAQs map to the main informational searches around the Ternary Bonsai announcement and model family.

What is Ternary Bonsai?

Ternary Bonsai is a PrismML family of 1.58-bit language models announced on April 16, 2026. It includes 8B, 4B and 1.7B variants designed to trade a small increase in size for stronger performance than the earlier 1-bit Bonsai models.

How much memory does Ternary Bonsai need?

PrismML and PR Newswire list the family at about 1.75 GB for 8B, 0.86 GB for 4B and 0.37 GB for 1.7B, which is roughly 9 times smaller than standard 16-bit models of similar size.

How does Ternary Bonsai compare with 1-bit Bonsai?

The official comparison says Ternary Bonsai 8B scores 75.5 on average across the published benchmark set versus 70.5 for 1-bit Bonsai 8B, while requiring only about 600 MB more memory.

Where does Ternary Bonsai run?

PrismML says Ternary Bonsai runs natively on Apple devices including Mac, iPhone and iPad via MLX. The announcement also highlights local throughput results on M4 Pro and iPhone 17 Pro Max.

Is Ternary Bonsai open source?

The released model weights are available under the Apache 2.0 license according to the official announcement, with additional technical details linked through the published whitepaper.

Source References

These are the primary references used for the factual content on this page.