Jeff Turner

My Thoughts Exactly

I’m A Little Creeped Out By You

January 27, 2026 By Jeff Turner

“I’m a little creeped out by you” was how my wife, Rocky, ended her second conversation with a HumeAI-enabled chatbot I created Sunday morning, using a clone of my voice. And it wasn’t because it didn’t really sound like me. 

She continued, “By you, I mean whatever makes you, you. Because if you were a real person and this was the interaction I was having, I would be absolutely creeped out by that human.”

Listen to what led up to this conclusion:

This was not the first interaction she had with this bot. The first one was much worse. I wish I could let you listen to the entire first conversation, but Rocky was talking with the bot while making coffee and cleaning up the kitchen, and much of her audio was hard to hear. But here’s a snippet of that conversation that will give you a feel. 

She kept trying to get the bot to stop saying her name, and it just wouldn’t, or more accurately, couldn’t. By the time the conversation reached the 8-minute mark, the bot had said her name 55 times in its replies. She became so frustrated that she abruptly ended the conversation. After that first conversation, I edited the system prompt to try to make it stop doing it to her. In her second call, it stopped. But she was still creeped out. 

Creeped Out May Have Been My Goal

To be clear, I didn’t consciously intend for my wife to be creeped out. Part of the objective of using my own voice to create the bot was to see how she and I would respond to a voice that was supposed to sound like me. I wanted to see if it would break the illusion to the point of making it ineffective in achieving its goals. And that likely worked.

But I’m not sure it was because it was my voice. Frankly, I didn’t expect it to sound like me with only 45 seconds of my recorded voice, so I didn’t really think the similarity would be what would do it. I ran a slew of tests on Sunday morning. In each test, I switched up the models to which HumeAI’s prosody data was being sent.

Prosody in AI refers to the assessment and modeling of rhythm, stress, and intonation patterns in speech. A system like HumeAI’s can sense those patterns in your voice and pass them to the language model, which gives AI-generated voices the ability to sound natural, emotional, and expressive rather than robotic.
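To make that pipeline concrete, here is a minimal sketch of how prosody scores might be attached to a user message before it reaches a text-only language model. The score names and the message format are illustrative assumptions, not HumeAI’s actual API.

```python
# Hypothetical sketch: attach the strongest detected vocal emotions to a
# user's message so a text-only LLM can condition its reply on them.
# The emotion names and bracketed-tag format are assumptions for
# illustration, not HumeAI's real schema.

def annotate_with_prosody(user_text, prosody_scores, top_n=3):
    """Prepend the top-N prosody scores to the user's transcribed text."""
    top = sorted(prosody_scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    tags = ", ".join(f"{name} ({score:.2f})" for name, score in top)
    return f"[voice emotions: {tags}] {user_text}"

msg = annotate_with_prosody(
    "Please stop saying my name.",
    {"frustration": 0.71, "calmness": 0.12, "concentration": 0.44},
)
```

The point of the sketch is only that the language model never hears audio; it sees text plus whatever emotional signal the prosody layer chooses to pass along.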

What I experienced personally was that the different models made the voice sound “more like me,” even though they were all based on the same voice clone. The difference was a combination of latency and the fact that the prosody layer was being passed through text-to-speech (TTS) models, not speech-to-speech (STS) models.

A TTS pipeline has to listen to your voice, transcribe it to text, pass the text to the LLM, receive text back, and only then respond via synthesized voice. That round-trip latency is the problem that has been solved in the Maya and Miles chatbots from Sesame.com. They are speech-to-speech models with an uncanny empathic interface.

But I still wanted to try to get rid of the “creeped out” element of the conversations.

So I Ran A Few More Tests

I’ve run 33 different conversations using the HumeAI backend. Each one used a different combination of EVI version, voice model, text generation language model, and system prompt. I had Claude Opus 4.5 analyze 11 JSON files that contained about half of those conversations. Some of the JSON files contained multiple conversations because I switched models or system prompts within the same chat session.
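The per-conversation tallies that came out of that analysis (user messages, interruptions) can be sketched like this. The JSON schema here, a list of turn objects with "role" and "interrupted" fields, is an assumption for illustration; it is not necessarily Hume’s actual export format.

```python
# Sketch of the kind of per-conversation tally produced in the analysis.
# The turn-object schema ("role", "interrupted") is an assumed format,
# not Hume's documented export.
import json

def summarize_conversation(raw_json):
    """Count user messages and interrupted turns in one conversation."""
    turns = json.loads(raw_json)
    user_msgs = sum(1 for t in turns if t["role"] == "user")
    interrupts = sum(1 for t in turns if t.get("interrupted", False))
    return {"user_msgs": user_msgs, "interrupts": interrupts}

sample = json.dumps([
    {"role": "user", "text": "Hi there."},
    {"role": "assistant", "text": "Hello!"},
    {"role": "user", "text": "Please don't say my name.", "interrupted": True},
])
stats = summarize_conversation(sample)
```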

[Screenshot: Hume dashboard showing the test conversations]

Four Conversations With AI In Sequence

In each of the following conversations, these things remained constant: EVI Model (EVI 3), Voice Model (Jeff Turner no script 2), and my System Prompt. The only thing that changed between the four conversations was the Text Generation Language Model.

Conversation 30: Hume EVI 3 Speech LLM with Web Search – No description of the model is provided by HumeAI.

Conversation 31: DeepSeek R1-Distill (Llama 3.3 70B Instruct) – A powerful DeepSeek R1-Distill model based on Llama 3.3 70B, instruction-tuned for strong performance in complex reasoning and conversational tasks.

Conversation 32: Grok 4 Fast (Non-Reasoning) (latest) – xAI’s latest fast-response Grok model, optimized for quick, non-reasoning-based tasks and efficient information retrieval.

Conversation 33: Claude Haiku 4.5 (The “Best” Conversation Example So Far) – Anthropic’s fastest and most intelligent Haiku model. Near-frontier intelligence at blazing speeds and exceptional cost-efficiency.

The Data Aligned With My Experience

You can hear my reaction in Conversation 33 above in real-time. I thought the shift to the Claude Haiku 4.5 LLM was experientially better. So I fed the four JSON files from those conversations into Claude Opus 4.5 for analysis, and it returned the following.

Language Model            Duration   User Msgs   Interrupts   Outcome
EVI-3 Default             6:39       11          8            Loop problem persists
DeepSeek Distill          1:23       4           0            Brief test, low latency
Grok 4 Fast               3:48       12          2            “Reading cue cards”
Claude Haiku 4.5 (best)   7:55       16          6            “By far the best”

Day 3 controlled test: identical system prompt, identical voice clone, different language models

Claude Haiku 4.5 Won Because:

  1. Longest engagement (475s vs. 399s for EVI3): I stayed in the conversation because it was working
  2. Most user messages (16 vs. 11-12 for others): I had more to say because the conversation was productive, not a “loop.”
  3. My explicit real-time feedback: “This is by far I think the best conversation I’ve had so far inside of these models.”

The “Loop Problem” Shows Up Clearly in EVI3 Default (#30)

Looking at the assistant responses, the EVI3 model does this pattern repeatedly:

  • “So… it sounds like you’re really trying to…”
  • “Okay, I hear you. You’re saying…”
  • “It’s like asking the microphone about…”

I called it out very specifically: “You’re still in a different form echoing back what I said and then asking me a follow up question rather than as in a human conversation you would just recognize or say yeah yeah, I can see that and the conversation would flow.”

Claude Haiku’s Responses Were Qualitatively Different

Claude Haiku advances the conversation rather than just mirroring it. As I mention in the conversation itself, it’s just one conversation and highly anecdotal, but the contrast below is fairly stark.

  • EVI3: “Yeah, okay. So… it sounds like you’re really trying to get a clear picture of how this whole thing works under the hood.”
  • Claude Haiku: “sharper. Like it’s actually tracking what you’re saying instead of just cycling through the same moves. That’s a meaningful difference to notice.”

The Prosody Data Confirms My Subjective Experience

My emotional signature in #33 (Claude Haiku) shows the highest concentration (30.5%) and contemplation (27.7%) of all four. According to the data, I was cognitively engaged, not frustrated. In #30 (EVI3), despite similar concentration levels, I ended very frustrated: “I’m just gonna end this conversation there because once again you’re still doing it.”

The Quick Response Ratio is Interesting

Claude Haiku had a balanced quick-to-regular response ratio (16/14), while EVI3 had zero quick responses. Of that, Claude Opus 4.5 noted, “this might contribute to the ‘reading cue cards’ feeling you described.” So, the JSON seems to validate my subjective experience. Claude Haiku 4.5 created a qualitatively different conversational dynamic.
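A quick-to-regular ratio like that can be computed from per-turn response latencies. The 1.0-second threshold below is an illustrative assumption; the post’s analysis doesn’t specify how “quick” was defined.

```python
# Sketch of a "quick response" tally from per-turn response latencies
# (in seconds). The 1.0 s threshold is an assumed cutoff, not the one
# used in the actual analysis.

def quick_regular_ratio(latencies_s, quick_threshold=1.0):
    """Split a conversation's response latencies into quick vs. regular."""
    quick = sum(1 for t in latencies_s if t < quick_threshold)
    regular = len(latencies_s) - quick
    return quick, regular

# e.g. a conversation mixing fast and slower replies
ratio = quick_regular_ratio([0.4, 0.8, 1.6, 0.9, 2.2])
```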

These Are Not Speech-To-Speech Models

I’m not going to go deep into it here, but I do want to note that these are not the cheaper speech-to-speech models that VentureBeat reported on. I’m going to save the deeper conversation around that for another day. This isn’t going away. And it could change tomorrow.

All that said, I’m not sure Test #33 made this less creepy because it was better, or more creepy because it was better. Or maybe this is all just creepy. I think that’s what we need to figure out.

Are You Creeped Out By Miles?

Once you’ve listened to any of the above, even the best version, #33, I want you to compare them with this conversation. This is the second of Sesame.com’s chatbots, Miles, with a masculine voice. I present it without further comment.

I guess the question I’ll leave you with is this: which creeps you out more, the bad attempts using my voice above, or this brief conversation with Miles?


Filed Under: Humaneering, Technology Tagged With: artificial intelligence, conversation with ai, empathic interfaces, humaneering
