A few days ago, I was chatting with a friend who works on customer service systems. He vented that his AI customer service agent is a "two-faced" character: polite and courteous in the morning, then arguing with users over the very same question in the afternoon; it handled refunds accurately yesterday, but today it suddenly starts talking nonsense.
“The same question, asked through a different channel or at a different time, can get a completely different answer from the AI, and users are complaining that our system is unstable.”
This problem is actually quite common. Many teams assume that as long as the model is strong enough and the data abundant enough, the AI will behave stably. The reality is different: no matter how smart the AI is, without a stable "decision framework" it will easily develop a "split personality" in complex scenarios.
Smart models ≠ Reliable systems
Think about how humans make decisions: we rely not only on intuition but also on past experience, company rules, records of what has been done, and clear lines of responsibility. These structures are what let different people stay broadly consistent when handling similar issues.
But many AI systems have only the "intuitive response" layer: a problem comes in, it is thrown straight at the large model, an answer comes out, and that's it. What is missing:
A shared memory pool (how was the last issue handled? What is the user's history?)
Rule guardrails (what absolutely cannot be said? Which processes must be followed?)
State tracking (what step has this request reached? Who is responsible for it?)
A boundary between thinking and execution (when should the system "think it over", and when should it "get to work"?)
The result is that the AI "improvises" freely and can even contradict itself.
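To make those missing pieces concrete, here is a minimal Python sketch (all names are illustrative, not taken from any particular framework): a shared memory field, a list of rule guardrails, a state field, and an explicit split between a "deliberate" step and an "execute" step.

```python
from dataclasses import dataclass, field

# Illustrative skeleton of the structure a bare model call usually lacks.

@dataclass
class Ticket:
    user_id: str
    question: str
    state: str = "waiting_for_info"                 # state tracking: where are we?
    history_summary: str = ""                       # shared memory: what happened before?
    decisions: list = field(default_factory=list)   # audit trail of past decisions

FORBIDDEN_PHRASES = ["guaranteed refund", "unapproved discount"]  # rule guardrails

def deliberate(ticket: Ticket) -> str:
    """'Think it over': choose the next action, with no side effects."""
    if "refund" in ticket.question.lower():
        return "verify_identity"
    return "answer_directly"

def execute(ticket: Ticket, action: str, draft_answer: str) -> str:
    """'Get to work': apply guardrails, update state, record the decision."""
    for phrase in FORBIDDEN_PHRASES:
        if phrase in draft_answer.lower():
            draft_answer = "Let me double-check that before I confirm anything."
    ticket.state = action
    ticket.decisions.append({"action": action, "answer": draft_answer})
    return draft_answer
```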
The "kite string" metaphor: it has to be let out, but also pulled back.
I like to use flying a kite as a metaphor for this relationship:
A large model is like a kite: it needs room to fly, to be creative, and to handle unforeseen situations. But if you let go of the string entirely, it flies off wildly and may even crash into a tree.
The crucial "kite string" is the decision framework. It does not limit how high the AI can fly; it guarantees three core elements:
Consistency anchors: similar inputs should yield logically consistent responses.
State continuity: the system remembers what it was just doing and what the next step should be.
Explainable paths: every decision can be traced back to a specific rule or data point.
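One way to implement a consistency anchor together with an explainable path (the policy table, the canonicalization helper, and the call_model callback below are all hypothetical) is to normalize the question and look up a fixed policy answer before the model is ever called, tagging every answer with its source:

```python
import hashlib

# Hypothetical consistency anchor: similar questions canonicalize to the same
# key and get the same policy answer; the model is only a fallback, and every
# answer carries a trace of where it came from.

POLICY_ANSWERS = {
    "how do i get a refund": "Refunds are processed within 7 business days after approval.",
}

def canonicalize(question: str) -> str:
    return " ".join(question.lower().strip("?!. ").split())

def answer(question: str, call_model):
    key = canonicalize(question)
    if key in POLICY_ANSWERS:
        # Explainable path: the response traces back to a named policy entry.
        trace = "policy:" + hashlib.md5(key.encode()).hexdigest()[:8]
        return POLICY_ANSWERS[key], trace
    return call_model(question), "model:freeform"
```

With this in place, "How do I get a refund?" and "how do i get a refund" land on the same policy entry instead of producing two independent model generations.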
Recently, after we added this "string" to a financial customer service project, the rate of divergent answers to the same question dropped from 37% to 4%, and user complaints about the AI being "incoherent" have essentially disappeared.
Stability under pressure is the true test.
Many AI systems perform stunningly in demos but fall apart in real scenarios. Why?
There are three major obstacles in real scenarios:
Concurrent requests (handling many users at the same time)
Long-running tasks (a single request takes several steps to complete)
Unexpected input (a user suddenly gets angry or asks something bizarre)
At that point, relying solely on the model's "intelligence" is not enough. You need a structure that works like a spine: quiet support most of the time, and the thing that keeps the system from "collapsing" at critical moments.
This spine needs to do four things (the last two are sketched in code after this list):
Solidify key rules (e.g. "never promise an unapproved discount")
Maintain dialogue state ("the user is angry; calming them down takes priority")
Isolate risk areas ("this sensitive question must be escalated to a human")
Record decision logs ("why did we respond this way at that moment?")
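As a rough sketch of the last two duties (the topic list, file name, and return convention are made up for illustration), sensitive topics get routed to a human while every decision is appended to a log together with its reason:

```python
import json
import time

# Hypothetical risk isolation + decision log: escalate sensitive topics to a
# human, and write an append-only record explaining every routing decision.

SENSITIVE_TOPICS = {"legal threat", "chargeback", "account deletion"}

def route(topic: str, draft_answer: str, log_path: str = "decisions.log") -> str:
    escalate = topic in SENSITIVE_TOPICS
    final = "Let me transfer you to a human colleague." if escalate else draft_answer
    record = {
        "ts": time.time(),
        "topic": topic,
        "escalated": escalate,
        "reason": "sensitive topic" if escalate else "within policy",
        "answer": final,
    }
    with open(log_path, "a") as f:   # append-only: "why did we respond this way?"
        f.write(json.dumps(record) + "\n")
    return final
```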
Actual implementation: from "intelligent toys" to "work partners"
Building such a framework doesn’t always mean reinventing the wheel. Many teams approach it from three levels:
Level One: Context Management
Give the AI a "work notepad" that records the key information for the current task.
Maintain a summary of the user's interaction history instead of starting from scratch every time.
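A minimal sketch of such a "work notepad", using nothing beyond plain Python (the class and method names are invented for illustration):

```python
# Hypothetical "work notepad": a small, bounded record of key facts and a
# rolling summary of recent turns, instead of replaying the full transcript.

class Notepad:
    def __init__(self, max_notes: int = 10):
        self.facts = {}           # key facts about the current task
        self.notes = []           # rolling summary of recent turns
        self.max_notes = max_notes

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value                      # e.g. remember("order_id", "A1023")

    def add_turn(self, summary: str) -> None:
        self.notes.append(summary)
        self.notes = self.notes[-self.max_notes:]    # keep the summary bounded

    def to_prompt(self) -> str:
        facts = "; ".join(f"{k}={v}" for k, v in self.facts.items())
        return "Known facts: " + facts + "\nRecent turns: " + " | ".join(self.notes)
```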
Level Two: Rule Engine
It's not about replacing the AI with rules, but about setting up "guardrails".
For example, "the refund amount must be confirmed twice".
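A possible shape for that guardrail (the approval limit and the action names are invented for illustration): the rule engine does not write the answer, it only decides whether the model's proposed refund may go through yet.

```python
# Hypothetical guardrail for "the refund amount must be confirmed twice".

def check_refund(amount: float, confirmations: int, approved_limit: float = 200.0):
    if amount > approved_limit:
        return "escalate", "amount above the auto-approval limit"
    if confirmations < 2:
        return "ask_confirmation", f"please confirm the refund of {amount:.2f} again"
    return "execute", "refund approved"

# First pass asks for another confirmation, second pass lets it through.
print(check_refund(59.90, confirmations=1))   # ('ask_confirmation', ...)
print(check_refund(59.90, confirmations=2))   # ('execute', 'refund approved')
```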
Level Three: State Machine
Clarify which states each task can be in (waiting for information → verifying identity → processing → completed)
Prevent the AI from "skipping steps" or "getting lost" along the way.
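One simple way to enforce this, sketched with state names matching the list above (the transition table itself is an assumption, not a prescribed design):

```python
# Hypothetical task state machine: only the listed transitions are legal, so
# the AI cannot skip identity verification or lose track of where it was.

ALLOWED = {
    "waiting_for_info":   {"verifying_identity"},
    "verifying_identity": {"processing", "waiting_for_info"},
    "processing":         {"completed"},
    "completed":          set(),
}

def advance(current: str, target: str) -> str:
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

state = "waiting_for_info"
state = advance(state, "verifying_identity")
# advance(state, "completed")   # would raise: the processing step cannot be skipped
state = advance(state, "processing")
state = advance(state, "completed")
```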
After one e-commerce company adopted this approach, its AI customer service's resolution rate for order issues improved by 22%. Most important of all, the situation where the AI contradicted its earlier promises never happened again.
In closing: an AI's "professionalism".
When we call a professional "reliable", we don't mean they never make mistakes; we mean their performance is predictable, their logic is understandable, and their errors are traceable.
Today's AI too easily comes across as "brilliant but capricious": sometimes stunning, sometimes ridiculous. What truly lets AI integrate into business scenarios is usually that steady professionalism.
A good AI system should be like an experienced customer service manager: flexible in response, while remembering the company's bottom line; able to handle complex situations while maintaining consistent service standards.
If your AI still shows a "split personality", maybe you should ask: have we given it enough "structural support", or are we just expecting a single model to solve every problem?
A truly intelligent system is not a one-off burst of inspiration; it is repeatable, rational decision-making. And that thin "kite string" is often the key to moving from demo to production.


