Ran roughly 140–160 requests through OpenGradient over a few sessions, mostly small inference calls, nothing exotic. What stood out wasn’t accuracy or output quality. It was how uneven the infrastructure felt under normal use.
Some requests would settle in around 180–220ms, then the same type of call minutes later would jump to 500–650ms without any obvious change on my side. Payload size stayed under ~2KB most of the time, so data transfer doesn’t look like the bottleneck. It feels more like routing or cold-path handling kicking in unpredictably. I logged roughly 9–12 spikes where latency doubled or tripled within the same “steady” workload window.
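One way to tally spikes like these, as a minimal sketch: take the median of a run as the baseline and flag anything at 2x or more. The function name, the 2x threshold, and the sample numbers are all illustrative assumptions, not values taken from the logs described above, and nothing here calls OpenGradient itself.

```python
from statistics import median

def count_spikes(latencies_ms, factor=2.0):
    """Return (baseline, spikes): baseline is the median latency,
    spikes are (index, latency) pairs at or above factor * baseline."""
    baseline = median(latencies_ms)
    spikes = [(i, ms) for i, ms in enumerate(latencies_ms)
              if ms >= factor * baseline]
    return baseline, spikes

# Made-up samples for illustration only: mostly the fast band, two slow calls.
samples = [190, 205, 198, 560, 210, 185, 620, 200, 215, 195]
baseline, spikes = count_spikes(samples)
print(f"baseline={baseline}ms, spikes={spikes}")
```

Using the median rather than the mean keeps the baseline from being pulled upward by the spikes it is supposed to detect.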
There’s also this odd pattern where the first request after an idle stretch (say 20–30 minutes) consistently hits the slower band. After that, performance stabilizes again for maybe 15–20 requests, then drifts. That drift is the part that sticks in my head more than anything else. It’s not dramatic, just persistent enough to notice.
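To check the after-idle pattern rather than eyeball it, requests can be split into "cold" (first call after a long gap) and "warm" (everything else) and the medians compared. This is a rough sketch under assumptions: the 20-minute gap threshold, the record layout of (timestamp in seconds, latency in ms), and the sample data are all hypothetical.

```python
from statistics import median

def split_cold_warm(records, idle_gap_s=20 * 60):
    """Split latencies into cold and warm lists.

    records: list of (timestamp_s, latency_ms), sorted by timestamp.
    A request counts as cold if it is the first one seen, or if at
    least idle_gap_s seconds passed since the previous request.
    """
    cold, warm = [], []
    prev_ts = None
    for ts, ms in records:
        if prev_ts is None or ts - prev_ts >= idle_gap_s:
            cold.append(ms)
        else:
            warm.append(ms)
        prev_ts = ts
    return cold, warm

# Made-up records for illustration: two bursts separated by ~25 minutes.
records = [(0, 540), (5, 200), (10, 190),
           (1500, 610), (1505, 210), (1510, 205)]
cold, warm = split_cold_warm(records)
print(f"cold median={median(cold)}ms, warm median={median(warm)}ms")
```

If the cold median sits in the slow band and the warm median in the fast one, that supports the cold-path reading. It says nothing about the later drift, which would need a per-request position within each burst.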
What makes it interesting is that nothing in the output suggests strain. No degradation in responses, no throttling signals, just timing variance. Feels like the system is making background decisions I can’t see: maybe cache misses, maybe node selection, maybe something else entirely.
I kept expecting it to “settle” into a predictable range after enough calls, but it never really did. Across 3 different times of day, the same 200–600ms spread showed up again.
Not sure if this is early-stage infrastructure behavior or just how it’s meant to operate when distributing load. Either way, it doesn’t behave like a single pipeline. More like something constantly negotiating where your request should live, and you can almost feel that negotiation happening in the delay before the response lands…

#opg $OPG @OpenGradient