What is the main difference between Holo3 and the new Holo3.1?

While Holo3 focused on state-of-the-art performance, Holo3.1 prioritizes production-readiness. The key differences are improved robustness across mobile, desktop, and web environments; native support for function-calling; and, most importantly, the introduction of smaller models and quantized checkpoints (FP8, Q4 GGUF, NVFP4) designed specifically for fast, local inference on consumer and enterprise hardware.

Holo3.1: Fast & Local Computer Use Agents

Hcompany Pushes Agents On-Device with Holo3.1

Hcompany has released Holo3.1, a new family of computer-use models that shifts focus from benchmark performance to production viability. The release responds to developer demand for agents that run efficiently and locally, and it introduces, for the first time, quantized checkpoints and smaller model sizes. The goal is agents that operate across web, desktop, and mobile platforms and can be deployed anywhere from the cloud to a user's personal machine.

Technical Enhancements for Production Workflows

Built on the Qwen architecture, the Holo3.1 family is engineered for robustness in real-world conditions. Because performance in one setting often fails to transfer to another, Hcompany has improved the models' behavior across platforms and agent frameworks. The models now support function-calling protocols natively, performing at near-parity with the original structured JSON output format. The release also reflects a collaboration with NVIDIA to enable efficient local inference.
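To make the distinction between function calling and structured JSON output concrete, here is a minimal sketch of how an agent harness might normalize both into one internal action type. The `click` tool, its `x`/`y` parameters, and the `{"action": ..., "args": ...}` layout are hypothetical illustrations; the article does not describe Holo3.1's actual action schema. The tool-call shape follows the widely used OpenAI-style convention, in which arguments arrive as a JSON-encoded string.

```python
import json
from dataclasses import dataclass

# Hypothetical tool definition in the common OpenAI-style "tools" format.
# The name "click" and its parameters are illustrative, not Holo3.1's real schema.
CLICK_TOOL = {
    "type": "function",
    "function": {
        "name": "click",
        "description": "Click at a screen coordinate.",
        "parameters": {
            "type": "object",
            "properties": {
                "x": {"type": "integer"},
                "y": {"type": "integer"},
            },
            "required": ["x", "y"],
        },
    },
}


@dataclass(frozen=True)
class Action:
    """Internal representation shared by both output styles."""
    name: str
    arguments: dict


def from_tool_call(tool_call: dict) -> Action:
    """Parse a function-calling response; arguments are a JSON-encoded string."""
    fn = tool_call["function"]
    return Action(fn["name"], json.loads(fn["arguments"]))


def from_structured_json(text: str) -> Action:
    """Parse a structured JSON output of the assumed form {"action": ..., "args": {...}}."""
    payload = json.loads(text)
    return Action(payload["action"], payload["args"])


# The same intended action, expressed in each style.
tool_call = {
    "type": "function",
    "function": {"name": "click", "arguments": '{"x": 120, "y": 340}'},
}
structured = '{"action": "click", "args": {"x": 120, "y": 340}}'

same_action = from_tool_call(tool_call) == from_structured_json(structured)
```

Normalizing early means the rest of the harness (execution, logging, retries) does not need to know which output style the model was prompted with.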

Quantized Checkpoints: The first release to include quantized weights, with FP8, Q4 GGUF, and NVFP4 options for the 35B-A3B model.
Performance Gains: On NVIDIA DGX Spark hardware, the NVFP4 W4A16 checkpoint delivers 1.74x the token throughput of the unquantized BF16 model.
Expanded Environment Support: The 35B-A3B model's score on the AndroidWorld mobile benchmark improved from 67% to 79.3%.
New Model Sizes: New 0.8B, 4B, and 9B models are available for cost-effective and private deployments.
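The figures above can be put in perspective with some back-of-the-envelope arithmetic. The sketch below assumes that "35B" in the model name denotes total parameters (with "A3B" indicating roughly 3B active per token, as in the usual mixture-of-experts naming convention) and counts weights only. Real quantized formats also store scaling factors, so actual files are somewhat larger than the nominal 4-bit figure, and runtime memory adds activations and KV cache on top.

```python
# Assumption: "35B" is the total parameter count taken from the model name.
TOTAL_PARAMS = 35e9

# Nominal bits per weight for each precision; real formats add scale metadata.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "4-bit (Q4 GGUF / NVFP4)": 4}


def weight_gigabytes(params: float, bits: int) -> float:
    """Weight-only storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


footprints = {
    name: weight_gigabytes(TOTAL_PARAMS, bits)
    for name, bits in BITS_PER_WEIGHT.items()
}

# Throughput: 1.74x tokens per second means the same output takes 1/1.74 of the time.
speedup = 1.74
time_fraction = 1 / speedup  # about 0.575 of the BF16 generation time

# AndroidWorld: absolute gain in percentage points vs. relative improvement.
before, after = 67.0, 79.3
absolute_gain = after - before           # about 12.3 percentage points
relative_gain = absolute_gain / before   # about 0.184, i.e. roughly 18% relative

for name, gb in footprints.items():
    print(f"{name}: ~{gb:.1f} GB of weights")
print(f"Generation time vs BF16: {time_fraction:.1%}")
print(f"AndroidWorld: +{absolute_gain:.1f} points ({relative_gain:.1%} relative)")
```

Under these assumptions, the weights shrink from about 70 GB at BF16 to about 35 GB at FP8 and about 17.5 GB at a nominal 4 bits, which is the difference between needing datacenter-class memory and fitting on a well-equipped workstation.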

Impact on the AI Agent Ecosystem

The launch of Holo3.1 represents a pragmatic step toward the widespread adoption of AI agents by lowering the barrier to entry for local deployment. By providing optimized checkpoints like GGUF for consumer hardware (Windows, Mac) and NVFP4 for enterprise systems, Hcompany is enabling developers to build private, low-latency applications that do not rely on cloud infrastructure. This challenges the dominant cloud-centric deployment model and could accelerate the integration of capable agents into internal enterprise tools and consumer software where data privacy and responsiveness are paramount.

Hcompany's focus on quantization and multi-platform robustness with Holo3.1 reflects the market's shift from benchmark scores to production viability. By providing a clear path to local, private, and fast agent deployment on consumer hardware, the company is positioning itself not just as a model provider but as an enabler of the next wave of on-device AI applications.

Source: original post on Hugging Face.