AiPhreaks

Build an Agent That Thinks Like a Data Scientist: How We Hit #1 on DABStep with Reusable Tool Generation

By Jakub Antkiewicz • March 13, 2026

NVIDIA's Kaggle Grandmasters (KGMON) research team has developed a new AI agent architecture that secured the top rank on the Data Agent Benchmark for Multi-step Reasoning (DABStep). The system, called the NVIDIA KGMON Data Explorer, introduces a structured, multi-phase approach for complex data analysis. This method addresses a common weakness in deep research agents that struggle with structured tabular data, demonstrating a significant performance improvement by having the agent first build a specialized toolkit rather than attempting to solve each problem from scratch.

The architecture operates in three distinct phases to mimic the workflow of a human data scientist. First, a 'Learning' phase uses a large, powerful model (like Opus 4.5) to analyze a representative set of tasks, identify overlapping logic, and synthesize its findings into a library of reusable Python functions. Second, a 'Fast Inference' phase employs a smaller, more efficient model (like Haiku 4.5) to solve new problems by simply calling the pre-built functions. Finally, an 'Offline Reflection' phase uses a heavyweight model to review the agent's performance, find inconsistencies, and feed those insights back into the system prompt to improve future accuracy without slowing down the live inference process.
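The three phases can be sketched as a simple pipeline. This is an illustrative toy, not NVIDIA's actual implementation: the model calls are stubbed out, and all names here (`synthesize_toolkit`, `solve_with_toolkit`, `reflect`) are hypothetical. What it shows is the division of labor: the Learning phase produces reusable functions once, the Fast Inference phase only dispatches to them, and Reflection folds lessons back into the system prompt offline.

```python
def synthesize_toolkit(sample_tasks):
    """Phase 1 (Learning): in the real system, a heavyweight model studies
    representative tasks, spots overlapping logic, and emits reusable Python
    functions. Here we hard-code two tools it might produce for tabular QA."""
    def filter_rows(rows, col, value):
        # Keep only rows where the given column matches the value.
        return [r for r in rows if r[col] == value]

    def total(rows, col):
        # Sum a numeric column over the given rows.
        return sum(r[col] for r in rows)

    return {"filter_rows": filter_rows, "total": total}


def solve_with_toolkit(task, toolkit):
    """Phase 2 (Fast Inference): a small model only has to pick which
    pre-built tool to call, rather than writing analysis code from scratch."""
    tool = toolkit[task["tool"]]
    return tool(*task["args"])


def reflect(transcripts, system_prompt):
    """Phase 3 (Offline Reflection): a heavyweight model reviews finished
    runs for inconsistencies and appends the lessons to the system prompt,
    keeping live inference unaffected."""
    lessons = [t["lesson"] for t in transcripts if t.get("lesson")]
    return system_prompt + "".join(f"\n- {lesson}" for lesson in lessons)


# Tiny worked example on a toy table of merchant fees.
rows = [
    {"merchant": "A", "fee": 10},
    {"merchant": "B", "fee": 5},
    {"merchant": "A", "fee": 7},
]
toolkit = synthesize_toolkit(sample_tasks=[])
subset = solve_with_toolkit(
    {"tool": "filter_rows", "args": (rows, "merchant", "A")}, toolkit
)
answer = solve_with_toolkit({"tool": "total", "args": (subset, "fee")}, toolkit)
print(answer)  # → 17

prompt = reflect(
    [{"lesson": "Filter before aggregating to avoid double counting."}],
    "You are a data analyst.",
)
print(prompt)
```

The expensive step (tool synthesis) runs once over sample tasks; every subsequent query reduces to cheap dispatch, which is where the reported latency gains would come from.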

The results reveal a trade-off that favors efficiency and advanced reasoning. A baseline using a large model for every task scored slightly higher on simple queries, but NVIDIA's agent dominated the benchmark's 'Hard' tasks, scoring 89.95 versus the baseline's 66.93. It also completed tasks roughly 30x faster, averaging 20 seconds per task compared to the baseline's 10 minutes. This methodology suggests a viable path for deploying cheaper, faster data analysis agents in enterprise environments by enabling smaller models to handle complex, multi-step reasoning with the aid of pre-built, domain-specific tools.

The key insight from NVIDIA's benchmark success is the codification of a human expert's workflow: invest significant upfront effort in building a robust, reusable toolkit to make subsequent, similar tasks drastically more efficient. This 'Learning Loop,' which generates specialized tools for a 'Fast Inference Loop' to consume, marks a practical shift away from the brute-force, single-shot reasoning that has limited many agentic systems.