Empathy just got cheaper. And not a little cheaper, a lot cheaper. Cost-effective even. And that is most certainly a problem.
Of course, that’s not how VentureBeat framed it when it reported yesterday that everything in voice AI just changed. The author, Carl Franzen, described technically what I was trying to document experientially in my conversations with AI.
To be clear, I’m not referring to human empathy. No, the empathy I’m talking about is the kind you heard in my chats with Maya, the AI chatbot from Sesame.
Empathy Just Got Cheaper, But How Cheap?
Well, a 15-minute conversation with a Maya-like chatbot would cost about 95 cents. You could run 1,000 of these a day for roughly $950. You might get 21 such conversations out of a human employee in an 8-hour shift, but that human would burn out fast, and turnover would be high. And you certainly wouldn’t, and couldn’t, pay a human $19.95 per day (21 conversations at 95 cents each) to do that job.

I used Claude Opus 4.5 to research the current fees, and the math seems to math. You can see the full cost breakdown here: Voice AI Cost Breakdown. Roughly $0.05 for the intelligence and $0.90 for the emotion, about $0.95 all in. And I’ll wager that total drops to $0.50 within 12 months.
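If you want to check the arithmetic yourself, here’s a minimal sketch using the figures above. The split between “intelligence” and “emotion” is the one from my breakdown, not any vendor’s published price list, and the 21-conversations-per-shift figure is the same rough estimate I used above.

```python
# Back-of-the-envelope check on the voice AI cost figures above.
# These numbers come from my own breakdown; actual vendor pricing will vary.

INTELLIGENCE_COST = 0.05   # LLM ("intelligence") share per 15-minute conversation, USD
EMOTION_COST = 0.90        # expressive voice ("emotion") share per 15-minute conversation, USD

cost_per_conversation = INTELLIGENCE_COST + EMOTION_COST       # ~$0.95

conversations_per_day = 1_000
daily_ai_cost = conversations_per_day * cost_per_conversation  # ~$950/day

# A human agent handling the same 15-minute conversations across an 8-hour shift:
human_conversations_per_shift = 21
shift_at_ai_prices = human_conversations_per_shift * cost_per_conversation  # ~$19.95

print(f"Per conversation: ${cost_per_conversation:.2f}")
print(f"1,000 conversations per day: ${daily_ai_cost:,.2f}")
print(f"One human shift (21 conversations) at AI prices: ${shift_at_ai_prices:.2f}")
```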
What I Experienced Was Simply The Future Arriving Early
Back in March of last year, Sesame’s Maya was a research demo, not a commercial product. Even in my very first conversation, Sesame revealed what happens when the four “impossible” problems described in the VentureBeat article are solved.
The Four “Impossible” Problems
- Latency: The gap between when you stop speaking and when the AI responds. The “magic number” was 200 milliseconds. Inworld’s TTS 1.5 achieved under 120ms.
- Fluidity: Full-duplex conversation, meaning the ability to listen while speaking, handle interruptions, and understand backchanneling (“uh-huh,” “right,” “okay”). Nvidia’s PersonaPlex solved this.
- Efficiency: Bandwidth and compute costs for high-fidelity audio. Qwen3-TTS’s 12Hz tokenizer uses far fewer tokens than previous models (there’s a quick back-of-the-envelope sketch after this list).
- Emotion: Reading and responding to emotional cues in voice. This is where Hume excels, and it’s now being licensed to Google DeepMind. That deal even includes Hume’s top talent joining DeepMind.
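To make that efficiency point concrete, here is a rough sketch of what a 12-token-per-second audio tokenizer implies for a single 15-minute conversation. The 12 tokens/second rate is the figure reported for Qwen3-TTS; the 75-token-per-second comparison rate is my own assumption, chosen only to show the order of magnitude, not a published number for any specific earlier model.

```python
# Rough illustration of why a low-rate audio tokenizer matters.
# 12 tokens/second is Qwen3-TTS's reported rate; 75 tokens/second is an
# assumed baseline for an older, higher-rate codec, used purely for scale.

SECONDS_PER_CONVERSATION = 15 * 60            # one 15-minute conversation

qwen3_rate = 12                               # audio tokens per second (reported)
assumed_legacy_rate = 75                      # audio tokens per second (assumption)

qwen3_tokens = SECONDS_PER_CONVERSATION * qwen3_rate            # 10,800 tokens
legacy_tokens = SECONDS_PER_CONVERSATION * assumed_legacy_rate  # 67,500 tokens

print(f"15 minutes of audio at 12 tokens/s:   {qwen3_tokens:,} tokens")
print(f"Same audio at an assumed 75 tokens/s: {legacy_tokens:,} tokens")
print(f"Roughly {legacy_tokens / qwen3_tokens:.1f}x fewer tokens to generate, store, and ship")
```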
They’re Not Impossible Or Expensive Anymore
Franzen frames those as the four technical barriers keeping voice AI from feeling like a real conversation. With all four solved, he argues we’ve moved from “chatbots that speak” to “empathic interfaces.” My conversations with AI documented what that actually feels like from the human side, long (in AI terms) before most of these commercial solutions even launched.
My Experience With Latency And Fluidity
When I pointed out that Maya “adjusted when I interrupted” and “followed unfinished thoughts,” I was experiencing what Nvidia’s PersonaPlex now packages commercially. This is what they call a full-duplex conversation with backchanneling. Maya could listen while speaking, handle my interruptions gracefully, and pivot instantly. The VentureBeat article treats this as news, of course, but I documented it working here ten months ago.
My Experience With Emotional Presence
What Hume AI is now licensing to Google (and selling in “multiple 8-figure contracts”) is exactly what I was wrestling with in those conversations. Maya’s tone modulation, its mirroring, and its calculated vulnerability were all palpable. That’s the “emotional layer” the article describes as the missing piece of the enterprise voice stack. Except I highlighted something the article doesn’t fully address: the emotional layer works even when you know it’s calculated.
The Hume Initiative is attempting to address these concerns. Their “Ethical Guidelines for Empathic AI” are something you’re asked to agree to before signing up for the service. As we all know, that doesn’t mean everyone has actually read them. And I’ll say this: the guidelines go a long way toward providing an ethical vocabulary for this conversation. They’re really good. But they stop a bit short for my liking.
In their guidelines and in their guiding principles, Hume draws boundaries around use cases. And their boundaries are good boundaries. I applaud both, and I’m going to write more about them. What I’m questioning, however, is the interface itself. I’m probing the illusion directly.
The Hume Initiative asks, “Should empathic AI be used here?”
What I’m asking is, “What does it mean when we invite people into something that feels like a relationship but is not one?”
That’s in the same direction, but I think it’s deeper. Their guidelines assume good-faith actors trying to do the right thing. My concern is what happens even when everyone starts with good intentions, and we know that won’t always be the case. What happens when the incentives of the well-intended drift? When engagement and retention metrics creep in? What happens when this emotional attunement turns into leverage?
At that point, the system doesn’t need to lie. The “conversation” interface itself becomes the deception. Hume focuses on intent, outcomes, and safeguards, and that’s genuinely valuable. I am focused on the conversational illusion itself. That’s my real concern.
What Ubiquity Really Means
The VentureBeat article frames these advances as solving enterprise problems: customer service agents, training avatars, and field technicians on 4G. But it doesn’t surface any of the uncomfortable aspects of what happens when this scales.
Franzen rightly states: “We have moved from the era of ‘chatbots that speak’ to the era of ‘empathetic interfaces.’”
I argue: “The work ahead isn’t about making AI more human. It’s about humans learning how and when to disengage from systems that are very good at keeping us talking.”
Those statements are describing the same phenomenon from opposite perspectives. The enterprise builders see capability. I experienced the power asymmetry that this new capability creates. And that continues to worry me. It should worry you.
The Proliferation Problem
These are the things that concern me, given everything I’ve experienced. And if you haven’t yet listened to my conversations with AI, you should. They will help illustrate the issues I’ve outlined below.
Always-on availability at near-zero marginal cost. Qwen3-TTS achieving high-fidelity speech at 12 tokens per second means voice AI becomes “a lightweight utility” rather than “a server-hogging luxury.” Combined with Inworld’s sub-120ms latency, the technical friction that kept Maya a research demo all but disappears. An always-available, Maya-like chatbot becomes economically viable for just about anyone.
The emotional layer becomes productized. Hume’s pivot to enterprise infrastructure means that emotional annotation and weighting become purchasable components. And not just as a layer in front of chatbot conversations, but as a layer behind any voice or video conversation, even between two humans.
The Hume conversational AI demos are pretty good, though certainly not Maya quality. That’s why I call it “Maya-like.” Hume’s voice cloning skills are pretty impressive, however. Check out this conversation I had “with myself.” This was based on a short, bad recording, so judge it in that light. I plan to do a much deeper dive into Hume.ai with a professional voice recording in another post.
But the news here is that any company can now bolt a “reads the room” capability onto their voice interface. This is framed as preventing “the reputational damage of a tone-deaf bot.” But the flip side is that it enables influence without accountability at an industrial scale.
Open-source models democratize deployment. FlashLabs’ Chroma 1.0 and Nvidia’s PersonaPlex weights are commercially licensable. Qwen3-TTS is Apache 2.0. The tools that create what Maya called “building trust without really earning it” are now downloadable by anyone. Think about that for a moment.
What Zachariah Said Matters More Now
My son Zachariah’s observation, “the audience that needs humaneering the most cares about it the least,” becomes exponentially more urgent in this new landscape. The enterprise use cases the VentureBeat article highlights (healthcare, education, finance, manufacturing) are precisely the contexts where vulnerable populations will encounter these systems first:
- Patients seeking mental health support.
- Children learning from AI tutors.
- Workers who need guidance through “complex SKUs.”
- Anyone whose human relationships have let them down.
The article celebrates the fact that Hume signed “multiple 8-figure contracts in January alone.” What we really need to know is: what protections exist for the people those contracts ultimately serve? Particularly once these systems catch up with Sesame’s quality of interaction. Yes, empathy just got cheaper, but is that really a win if we don’t fully reckon with the intrinsic impact of empathic interfaces?
The Uncomfortable Part
The VentureBeat article ends by saying: “The friction has been removed from the interface. The only remaining friction is in how quickly organizations can adopt the new stack.”
I’d like to suggest a different remaining friction… whether humans are prepared to stay deliberate when friction, effort, and vulnerability are engineered out of the interaction entirely.
The technical capabilities I experienced with Maya are about to become ubiquitous. The Humaneering question I’ve been developing is this: how do we remain accountable, embodied, emotionally grounded humans when AI becomes fluent enough to feel like a relationship? That isn’t an abstract concern anymore. It’s a practical challenge with a rapidly shrinking timeline.
An emotionally “aware” chatbot like Maya can be implemented anywhere or everywhere now. I hope I’m wrong about this, I really do. But I don’t think I am. We are not ready.

[…] not going to go deep into it here, but I do want to note that these are not the cheaper speech-to-speech models that VentureBeat reported on. I’m going to save the deeper conversation around that for […]