Deploying Open Source Vision Language Models (VLM) on Jetson
By Jakub Antkiewicz
2026-02-24T08:46:09Z
NVIDIA has published a detailed technical guide for deploying its Cosmos Reasoning 2B vision-language model across the full range of Jetson edge devices. The tutorial provides a direct path for developers to implement real-time visual analysis on hardware from the high-performance AGX Thor down to the compact Orin Nano Super. This procedure makes complex AI reasoning capabilities more accessible for physical applications, enabling devices to interpret and discuss their surroundings using natural language without constant reliance on cloud infrastructure.
The deployment workflow relies on the vLLM inference framework, using device-specific container images and an FP8-quantized version of the model sourced from NVIDIA's NGC catalog. The guide explicitly addresses the hardware differences within the Jetson family by providing distinct configurations. While the Jetson AGX Thor and AGX Orin platforms can manage a large 8192-token context length, the memory-constrained Orin Nano Super requires significant optimization, reducing its context to 256 tokens and limiting batch sizes to ensure stable operation.
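The per-device configurations described above can be sketched as vLLM launch commands. This is a minimal illustration only: the container image tag and model identifier are placeholders (the guide's actual names come from the NGC catalog), while `--quantization fp8`, `--max-model-len`, and `--max-num-seqs` are real vLLM flags that map to the quantization, context-length, and batch-size settings mentioned in the article.

```shell
# Sketch only: image tag and model id below are placeholders;
# substitute the device-specific container and model from NGC.

# AGX Thor / AGX Orin: enough memory for the full 8192-token context
docker run --runtime nvidia --gpus all -p 8000:8000 \
    nvcr.io/nvidia/<vllm-jetson-image> \
    vllm serve <cosmos-reason-model-id> \
        --quantization fp8 \
        --max-model-len 8192

# Orin Nano Super: shrink the context window and cap concurrent
# sequences so the model fits in the smaller memory budget
docker run --runtime nvidia --gpus all -p 8000:8000 \
    nvcr.io/nvidia/<vllm-jetson-image> \
    vllm serve <cosmos-reason-model-id> \
        --quantization fp8 \
        --max-model-len 256 \
        --max-num-seqs 1
```

The trade-off is direct: a 256-token context leaves room for only a short prompt and answer, which is why the Orin Nano configuration suits brief scene descriptions rather than long multi-turn reasoning.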
By providing a standardized, end-to-end blueprint for running a capable vision-language model on its edge hardware, NVIDIA lowers the technical barrier for creating interactive physical AI systems. This move facilitates rapid prototyping for applications in robotics and automated monitoring that require an AI to understand and verbally reason about its environment. It reinforces the Jetson platform not just as a set of hardware components, but as an integrated ecosystem with optimized models and runtimes for edge computing.
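For prototyping the kind of interactive application described above, a client can talk to the deployed model through vLLM's OpenAI-compatible chat endpoint. The sketch below assumes a server running locally on port 8000 and uses a placeholder model name; the message format pairing an image with a text question follows the standard OpenAI vision chat schema that vLLM serves.

```python
import json
import urllib.request

# Assumes vLLM is serving on the Jetson device itself.
VLLM_URL = "http://localhost:8000/v1/chat/completions"


def build_request(image_url: str, question: str,
                  model: str = "cosmos-reason") -> dict:
    """Build an OpenAI-style chat payload pairing an image with a question.

    The model name is a placeholder; use whatever id the vLLM
    server was launched with.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 128,
    }


def ask(image_url: str, question: str) -> str:
    """POST the request to the local vLLM server and return the answer text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_request(image_url, question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A monitoring loop would then call `ask(frame_url, "Describe any unusual activity")` on captured frames, keeping the whole perception-to-language pipeline on the device.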
NVIDIA's guide is less about a single model and more about demonstrating a complete, vertically integrated stack for edge AI. By providing the hardware (Jetson), the optimized model (Cosmos on NGC), and the deployment framework (vLLM containers), the company is building a cohesive ecosystem that simplifies development and solidifies its platform as a principal choice for building physical AI.