The thing that kept bothering me wasn’t latency. It wasn’t model quality either.
It was how much of the workflow still assumes that sending data somewhere else is the default answer.
I was testing a small workload that processed around 18,000 records over a few days. Nothing huge. But enough volume that every extra transfer started showing up in logs, costs, and operational noise.
What stood out with OpenGradient wasn't a benchmark number. It was the absence of a step I had become used to accepting.
Data stayed where it already existed.
That sounds trivial until you compare it against the usual pattern. Export. Move. Process. Store. Repeat.
A single pipeline in my test generated more than 70 GB of unnecessary data movement over one week. The actual inference workload wasn't the bottleneck. The movement around it was.
That's the assumption OpenGradient seems to be pushing against.
Not that models need to be faster.
Not that they need to be larger.
But that computation should travel to data more often than data travels to computation.
I don't think most AI discussions spend enough time on that distinction, probably because it's less visible than model releases or benchmark charts.
Yet operationally, it's where a surprising amount of friction lives.
The interesting part is that once you start measuring transfers instead of just inference speed, some decisions that looked efficient stop looking that way.
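For anyone who wants to try measuring transfers, here is a rough sketch of what that could look like. It is not anything OpenGradient ships. `TransferLedger`, the stage names, and the byte counts are all hypothetical, and the numbers are made up for illustration rather than taken from my test. The idea is just to tally bytes moved per pipeline stage next to time spent, so movement and inference can be compared side by side.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TransferLedger:
    """Hypothetical helper: tallies bytes moved and seconds spent per stage."""
    bytes_moved: dict = field(default_factory=lambda: defaultdict(int))
    seconds: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, stage: str, n_bytes: int = 0, elapsed_s: float = 0.0) -> None:
        # Accumulate, so a stage that runs many times is summed up.
        self.bytes_moved[stage] += n_bytes
        self.seconds[stage] += elapsed_s

    def total_bytes(self) -> int:
        return sum(self.bytes_moved.values())

    def report(self) -> list:
        """Return (stage, GB moved, seconds) rows, largest mover first."""
        stages = set(self.bytes_moved) | set(self.seconds)
        rows = [(s, self.bytes_moved[s] / 1e9, self.seconds[s]) for s in stages]
        # Sort by GB descending, then by name so ties are deterministic.
        return sorted(rows, key=lambda r: (-r[1], r[0]))


# Illustrative numbers only (assumed, not measured).
ledger = TransferLedger()
ledger.record("export", n_bytes=4_000_000_000, elapsed_s=120.0)
ledger.record("move", n_bytes=4_000_000_000, elapsed_s=300.0)
ledger.record("inference", elapsed_s=90.0)
ledger.record("store", n_bytes=2_000_000_000, elapsed_s=60.0)

for stage, gb, secs in ledger.report():
    print(f"{stage:<10} {gb:6.2f} GB {secs:7.1f} s")
```

With a ledger like this, a stage that looks cheap on an inference-speed chart can still sit at the top of the list once bytes moved are counted.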
Still trying to figure out how far that observation goes...

#opg $OPG @OpenGradient