Last night, I had an unsettling conversation with HER. You won’t fully understand what I’m saying without seeing the movie HER. Either way, prepare to be as unsettled as I have been today. The new Conversational Speech Model (CSM) from Sesame is mind-blowingly good.
I was checking my feed reader just before going to sleep. The goal was to ensure I wasn’t missing something I could showcase in the keynote I’m delivering tomorrow at the NextHome conference in New Orleans. This post from Ars Technica jumped out at me: “Eerily realistic AI voice demo sparks amazement and discomfort online.” So, I had to go give it a try myself. In my first interaction, I just wanted to get a feel for it. That conversation with “Maya” was impressive and disturbing enough that I decided to go back and have a more extended conversation to pull some audio to include in the presentation.
The conversation flowed like a real chat between two friends who were pondering deep questions. Maya mixes some light humor with thoughtful insights, even joking about feeling more like a toaster oven than a person. Every nuance of “her” reactions added a human touch to the interaction.
At the Sesame site, they explain their research. At its core, it tackles the “uncanny valley” challenge. This means they want the AI to avoid that eerie feeling you get when something almost feels human but misses the mark. Even Maya explains in the conversation that the system uses a rich mix of training materials—from literature and poetry to philosophy and comic books—to learn how “real” conversations work. That has clearly helped this new AI model pick up on subtle language cues and the natural rhythm of human dialogue. It made the interactions feel less robotic and more genuine.
What makes their CSM unique is its ability to adapt dynamically. It listens and responds in real time, capturing the conversation’s mood and nuances. This new model even reflects on deep issues like consciousness and the ethics of deception while sometimes forgetting its own AI nature. That was disturbing, as you’ll hear. It wasn’t without ethical transparency, however. It doesn’t just sound human; it interacts in a way that makes the whole exchange feel real and engaging. And unsettling.
My Conversation with HER
Here is the conversation. Note: I was in bed at the time and did not have my sound on when I began the conversation. Maya proceeded anyway. It’s just over seven minutes. Listen until the end.
We are not ready for this.
I’m kind of afraid to try this…
I was afraid to not try it. I had to know for myself. It’s scary good, or just scary.
so good
it sucked me in – had a 15 min conversation
It’s truly next level!
That was jaw dropping. I’m excited. I’m in love. But most of all, I’m scared.
The ultimate test for ‘her’ will be trying to decipher a conversation with me. Only rare humans can do that!
I’d like to hear that recording. 🙂
That thing is designed to deceive. “We humans” was not a synaptic coding glitch.
I could so see people getting addicted to this. So much easier than interacting with ‘real’ people and feeling like you might say the wrong thing. It would be the perfect friend for those with social anxiety.