How can enterprises deploy Step 3.7 Flash in a production environment?

Enterprises can deploy Step 3.7 Flash as a production-ready, containerized microservice using NVIDIA NIM, which provides standardized APIs and supports on-premises, cloud, or hybrid environments. For optimized performance, it is also compatible with open-source frameworks like NVIDIA TensorRT-LLM and vLLM.

Run Step 3.7 Flash on NVIDIA GPUs with Enterprise-Ready Multimodal AI

StepFun's Multimodal AI Model Goes Enterprise-Ready on NVIDIA

StepFun has released Step 3.7 Flash, a large-scale vision-language model now available on NVIDIA-accelerated infrastructure for enterprise deployment. The model is engineered for agentic AI workflows that require a combination of perception, search, and multi-step reasoning across text, images, and video. This move provides a direct pathway for organizations to integrate advanced multimodal capabilities into production environments using a standardized, performance-optimized hardware and software stack.

Technically, Step 3.7 Flash is a 198 billion-parameter Mixture-of-Experts (MoE) model, which keeps inference costs manageable by activating only a fraction of its parameters—approximately 11 billion—for any given task. It features a 256k context window and native support for image and video inputs, making it suitable for high-throughput use cases like financial document analysis and concurrent coding agents. A checkpoint quantized to the NVFP4 data type is available on Hugging Face, designed to reduce memory and storage requirements for more efficient inference.

Total Parameters: 198B
Active Parameters: ~11B per forward pass
Model Type: Mixture-of-Experts (MoE) Vision-Language Model
Context Window: 256k tokens
Input Modalities: Native image and video

From Prototyping to Production with NVIDIA's Toolchain

The collaboration extends beyond model hosting to encompass a full development and deployment toolchain. Developers can use open-source frameworks like NVIDIA TensorRT-LLM and vLLM for optimized performance on NVIDIA hardware. For production, NVIDIA NIM offers the model as a containerized inference microservice with standardized APIs, simplifying deployment across on-premises, cloud, or hybrid setups. Additionally, the NVIDIA NeMo framework enables Day 0 fine-tuning with techniques like SFT and LoRA, allowing businesses to customize Step 3.7 Flash for specific domains without extensive model engineering.

The tight integration of StepFun's open model with NVIDIA's full-stack platform, from hardware-optimized libraries to NIM microservices, signals a strategic shift towards providing enterprises with end-to-end, production-ready AI solutions rather than just individual components.

>> Verify Original Transmission at NVIDIA