AiPhreaks ← Back to News Feed

Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI

By Jakub Antkiewicz

2026-05-29T11:29:30Z

StepFun's Multimodal AI Model Goes Enterprise-Ready on NVIDIA

StepFun has released Step 3.7 Flash, a large-scale vision-language model now available on NVIDIA-accelerated infrastructure for enterprise deployment. The model is engineered for agentic AI workflows that require a combination of perception, search, and multi-step reasoning across text, images, and video. This move provides a direct pathway for organizations to integrate advanced multimodal capabilities into production environments using a standardized, performance-optimized hardware and software stack.

Technically, Step 3.7 Flash is a 198 billion-parameter Mixture-of-Experts (MoE) model, which keeps inference costs manageable by activating only a fraction of its parameters—approximately 11 billion—for any given task. It features a 256k context window and native support for image and video inputs, making it suitable for high-throughput use cases like financial document analysis and concurrent coding agents. A checkpoint quantized to the NVFP4 data type is available on Hugging Face, designed to reduce memory and storage requirements for more efficient inference.

  • Total Parameters: 198B
  • Active Parameters: ~11B per forward pass
  • Model Type: Mixture-of-Experts (MoE) Vision-Language Model
  • Context Window: 256k tokens
  • Input Modalities: Native image and video

From Prototyping to Production with NVIDIA's Toolchain

The collaboration extends beyond model hosting to encompass a full development and deployment toolchain. Developers can use open-source frameworks like NVIDIA TensorRT-LLM and vLLM for optimized performance on NVIDIA hardware. For production, NVIDIA NIM offers the model as a containerized inference microservice with standardized APIs, simplifying deployment across on-premises, cloud, or hybrid setups. Additionally, the NVIDIA NeMo framework enables Day 0 fine-tuning with techniques like SFT and LoRA, allowing businesses to customize Step 3.7 Flash for specific domains without extensive model engineering.

The tight integration of StepFun's open model with NVIDIA's full-stack platform, from hardware-optimized libraries to NIM microservices, signals a strategic shift towards providing enterprises with end-to-end, production-ready AI solutions rather than just individual components.
End of Transmission
Scan All Nodes Access Archive