Noticed something odd while running ~14 inference sessions on OpenGradient over the past few days. Same prompts, same temperature settings, but response latency kept swinging in a way that didn’t feel like normal network noise. One batch sat around 180–210 ms, then the next cluster jumped to 380–420 ms with no obvious change in load on my side. It only made sense later, when I looked at the request traces and saw the routing shifts.
What’s interesting is that the hybrid compute behavior doesn’t announce itself. There’s no flag in the UI saying “this went edge” or “this got offloaded,” but you can feel it in the consistency gaps. Out of ~60 calls, roughly 27 seemed to hit a faster path (judging by response timing clusters and token start delta), while the rest quietly drifted into what looks like slower cloud execution. The model output itself stays identical in style, but the timing variance starts to reveal an underlying split architecture.
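For anyone who wants to check this on their own traces, here is a minimal sketch of how the fast/slow split could be pulled out of raw timings. It is not anything OpenGradient exposes: the function name `split_by_largest_gap` is hypothetical, the sample numbers are made up to fall in the two ranges mentioned above, and it assumes the latencies really do form two well-separated groups.

```python
from statistics import median


def split_by_largest_gap(latencies_ms):
    """Split 1-D latencies into a fast and a slow group at the widest gap."""
    ordered = sorted(latencies_ms)
    if len(ordered) < 2:
        raise ValueError("need at least two samples")
    # Index i of the widest gap between ordered[i] and ordered[i + 1].
    i = max(range(len(ordered) - 1), key=lambda k: ordered[k + 1] - ordered[k])
    # Put the cut-off halfway across that gap.
    threshold = (ordered[i] + ordered[i + 1]) / 2
    fast = [x for x in ordered if x < threshold]
    slow = [x for x in ordered if x >= threshold]
    return threshold, fast, slow


# Made-up sample values in the two ranges described above, not real measurements.
samples = [185, 392, 204, 410, 198, 381, 190, 418]
threshold, fast, slow = split_by_largest_gap(samples)
print(f"threshold: {threshold:.1f} ms")
print(f"fast: n={len(fast)}, median={median(fast)} ms")
print(f"slow: n={len(slow)}, median={median(slow)} ms")
```

Cutting at the widest gap is about the simplest thing that works when the two groups barely overlap. If the timings smear into each other, a proper clustering method would be a better fit.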
I also noticed cost estimates flickering by ~8–12% between identical workloads. That stands out even more than the latency, since nothing in the prompt changed. It makes me wonder whether the system is dynamically optimizing compute per request in a way that’s intentionally hidden from the user-facing surface.
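One way to put a number on that flicker, as a rough sketch: take the estimates for a single repeated workload and measure the spread relative to the lowest one. The helper name `relative_spread` and the sample values are hypothetical, and measuring against the minimum is just one possible baseline.

```python
def relative_spread(values):
    """Return (max - min) / min as a percentage."""
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("values must be positive")
    return (hi - lo) / lo * 100


# Hypothetical cost estimates for one repeated workload, in arbitrary units.
estimates = [1.00, 1.08, 1.12, 1.03]
spread_pct = relative_spread(estimates)
print(f"spread: {spread_pct:.1f}%")
```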
It’s not necessarily bad, just slightly unsettling in a “you can’t quite pin down where your computation happened” kind of way. I keep expecting a consistent mental model to form, but it hasn’t fully settled. Maybe that’s the point, or maybe I’m just missing one more layer in the traces that I haven’t correlated properly yet…

#opg $OPG @OpenGradient