Five Labs, Five Minds: Building a Multi-Model Financial Narrative Game with Small Models

According to ME AI news, Thousand Token Wood v2 uses small models from four different labs (gpt-oss-20b, MiniCPM3-4B, Nemotron-Mini-4B, and a fine-tuned Qwen 0.5B) to power the agents in a financial simulation game. The core finding is that the friction in the heterogeneous serving layer comes from vLLM 0.22.1 requiring the CUDA toolkit, not from the models themselves. With a tolerant JSON parsing layer in place, adding a model takes a single config line. Information isolation keeps insider flags out of the prompts, and scanning tests verify that nothing leaks. Agent memory is stored as emotional summaries, truncated so they do not drown the prompt. The fine-tuned 0.5B model achieves 0% self-execution and 100% valid quotes, and the truth firewall guarantees zero leaks. The takeaway is that small models are reliable format generators but not reliable reasoners, a gap that can be compensated for through structure, prompting, and fine-tuning. (Source: ME)
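
The brief does not show what the tolerant JSON parsing layer looks like, so the following is only a minimal sketch of the general idea, using the Python standard library. The function names (`parse_tolerant`, `_first_balanced_object`) and the quote fields `bid` and `ask` are hypothetical and are not taken from the project. The sketch tries the raw model output first, then a fenced block, then the first balanced `{...}` span, and retries each candidate with trailing commas removed.

```python
import json
import re
from typing import Optional

# Matches a fenced block, optionally tagged "json", and captures its body.
_FENCE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)
# Matches a comma that directly precedes a closing brace or bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")


def _first_balanced_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # This opening brace never closed; try the next one.
        start = text.find("{", start + 1)
    return None


def parse_tolerant(raw: str) -> Optional[dict]:
    """Best-effort extraction of one JSON object from noisy model output.

    Returns None instead of raising, so the caller can decide whether to
    retry the model or skip the agent's turn.
    """
    candidates = [raw]
    fenced = _FENCE.search(raw)
    if fenced:
        candidates.append(fenced.group(1))
    span = _first_balanced_object(raw)
    if span:
        candidates.append(span)

    for candidate in candidates:
        # Second attempt strips trailing commas. This is a crude repair and
        # could also alter a string value that contains ", }" literally.
        for text in (candidate, _TRAILING_COMMA.sub(r"\1", candidate)):
            try:
                value = json.loads(text.strip())
            except json.JSONDecodeError:
                continue
            if isinstance(value, dict):
                return value
    return None
```

With a layer like this, each model can emit its own flavor of wrapping text or fencing, and the game logic only ever sees a dictionary or `None`, which is one plausible reason a new model would need nothing beyond a config entry.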