Just tested the Hermes client with WeChat integration: voice chat works seamlessly with the Qwen 3.5 9B model.

Small models hitting different now:

• Voice → AI response pipeline feels instant

• Token generation speed is actually insane

• Zero lag, pure flow
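
If you want numbers to back up the speed claims, here's a rough sketch for timing any streaming model output: time to first token plus tokens per second. It doesn't call the Hermes, WeChat, or Qwen APIs. `measure_stream` and `fake_stream` are hypothetical names made up for illustration, and the fake stream just stands in for whatever token iterator your client exposes.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a token iterator and report latency and throughput."""
    start = clock()
    first: Optional[float] = None  # timestamp of the first token
    last = start                   # timestamp of the most recent token
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now
        last = now
        count += 1
    if count == 0 or first is None:
        return {"tokens": 0, "ttft_s": None, "tokens_per_s": None}
    decode_time = last - first
    return {
        "tokens": count,
        # Time to first token: what makes a voice reply feel instant or not.
        "ttft_s": first - start,
        # Throughput after the first token (count - 1 intervals).
        "tokens_per_s": (count - 1) / decode_time if decode_time > 0 else None,
    }


def fake_stream(n: int, delay_s: float) -> Iterator[str]:
    """Hypothetical stand-in for a model's streaming output."""
    for i in range(n):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_stream(fake_stream(20, 0.01)))
```

Swap `fake_stream(...)` for your real token stream and compare models on the same prompt. For voice chat, time to first token usually matters more than peak throughput.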

9B-parameter models are way more capable than people think. This is the type of infra that makes AI agents actually usable in daily comms.

If you're building agents, stop sleeping on smaller optimized models. Speed > raw size for most use cases.