Why would I want to publish a public version of a chatbot using a clone of my voice? It’s a fair question. I did it simply to better understand the process and expose it to a wider audience. Period.
For the technically curious in the crowd, I’ll explain what took place and the tools used in more detail. I’ll do that after the embedded chat window below. Fair warning: the more technical explanation may get a bit long-winded.
Building My Voice Into A Chatbot: The Simple Explanation
For the more generally curious, this was built primarily using Hume AI, GitHub, and Vercel. I became interested in doing this when Hume’s founders joined Google DeepMind after signing a major licensing deal earlier this year. Hume builds models that try to understand and respond to human emotions, not just the words.
It listens to the “musical” qualities of speech: rhythm, pitch, stress, intonation, pace, and pauses. It processes that information and passes it along to the language models. You’ll see this in the chat window below, which exposes that information as you talk. I went through 22 configuration changes (a few of which freaked my wife out) to reach the base configuration used below. Give it a try.
Fair warning: while this chatbot has no memory from conversation to conversation, it does record every interaction. So, be kind. And don’t get too personal.
A quick note before you dive in: I’m running this voice chatbot on Hume AI’s Creator plan, which means there are some real-world constraints you might bump into. The plan allows five concurrent connections, so if a handful of people happen to land here at the same time and try to start a conversation, you may get an error or find it unresponsive. That’s not a bug on your end or in the coding of this chatbot. It’s just the ceiling of the tier I’m on.
There’s also a monthly limit of 200 minutes of conversation time built into the plan. I’ll do my best to monitor that and upgrade if needed, but if the chatbot seems unavailable, that could be why.
I’m sharing this because I’d rather you know upfront than wonder what went wrong. This is an experiment in progress, and I’m learning as I go. If you hit a wall, try back later or reach out to me in the comments on this post. The chatbot will only live in this post for the duration of my experiment.
Building My Voice Into A Chatbot: The Longer Explanation
Prerequisites
- A Hume AI account with your cloned voice already created and assigned to an EVI configuration.
- A GitHub account (free)
- A Vercel account (free, sign up at vercel.com using your GitHub login)
- Your Hume API Key and Secret Key (found in the Hume Platform under API Keys), and your EVI configuration ID if you’re using your cloned voice or want to use a specific Hume voice.
Let’s Start With Hume
Setting Up Hume: Head to platform.hume.ai/sign-up and create your account. You can sign up with your email, or use your Google or GitHub credentials to speed things along. Once you verify your email and log in, you’ll land on the Hume Platform dashboard.
From there, click Settings in the sidebar, then API Keys. Click Create API Key and copy both your API Key and Secret Key somewhere safe. You’ll need both of these to authenticate any app you build with Hume’s voice technology. You can get back to these keys from several places in the Hume Platform, so don’t worry if you lose track of the page.
Once your account is set up, explore the EVI Playground to test voice configurations, or head to the Voice Library to browse over 100 prebuilt voices, design your own from a text prompt, or clone an existing voice from just a few seconds of audio.
What goes in: audio of your voice (or text, or video in some modes). The key is that the models aren’t just reading what you say; they’re analyzing how you say it: tone, pitch, pace, hesitation, vocal tension. This is called “prosody.”
Prosody is the term for the musical qualities of speech: rhythm, pitch, stress, intonation, pace, and pauses. The how of speech, separate from the what.
Hume’s core insight is that prosody carries an emotional signal that words alone don’t. The same sentence said slowly with a falling pitch means something different than it does when said quickly with a rising pitch. Standard speech-to-text (the semantic layer) throws that away. Hume tries to preserve and use it.
So when it is “listening” to you, it’s processing the prosodic layer along with the semantic layer, and when it responds, it’s also attempting to generate “prosodically appropriate” speech rather than emotionally flat text-to-speech (TTS).
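To make that concrete, here’s a small, made-up illustration of the kind of output the prosodic layer produces. The emotion labels are real dimensions Hume reports, but the numbers and the exact payload shape below are hypothetical, purely to show the idea.

// Hypothetical illustration only: the same sentence, two deliveries.
// The score values are invented; the general shape (a map of emotion
// dimensions to 0-1 scores) mirrors what a prosody model produces.
type ProsodyScores = Record<string, number>;

const saidSlowlyFallingPitch: ProsodyScores = {
  Tiredness: 0.61,
  Disappointment: 0.54,
  Calmness: 0.32,
};

const saidQuicklyRisingPitch: ProsodyScores = {
  Excitement: 0.58,
  Interest: 0.49,
  Surprise: 0.37,
};

// Same words, different prosody, different inferred emotional signal.
// That difference is exactly what a plain transcript throws away.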
What Hume Is Doing With My Voice
What the model detects: Hume was trained on a massive dataset of human expressions mapped to emotional states. Their core model (called the “Semantic Vocal Burst” model) identifies dozens of emotional dimensions from voice patterns, not just the standard happy/sad/angry buckets. Things like amusement, confusion, awkwardness, determination, and more nuanced states.
How EVI (their conversational voice API) uses that: Their EVI-3 model, the one the chatbot above is built on, runs a continuous loop:
- Listens to your voice input
- Transcribes the words AND scores the emotional content of the audio
- Passes both the semantic meaning and the emotional signal to the language model
- The language model generates a response that’s shaped by both what you said and how you seemed to feel saying it
- The text-to-speech output is also emotionally modulated, so the voice response isn’t flat
The key idea: Most AI voice systems treat emotion as an additional layer. Hume bakes it into the response generation itself, so the model isn’t just answering your question, it’s theoretically calibrating to your emotional state in real time.
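If you want to see where that emotional signal actually shows up in code, here’s a rough sketch of how messages surface in Hume’s @humeai/voice-react SDK, the library the starter project described later is built on. The message shape and property names here are my assumptions from reading the SDK and can differ between versions, so treat this as a sketch, not the quickstart’s exact code.

"use client";

// Rough sketch: reading EVI messages and their prosody scores with the
// useVoice() hook. Assumes this component renders somewhere inside the
// SDK's VoiceProvider so the hook has a live connection to read from.
import { useVoice } from "@humeai/voice-react";

export function EmotionReadout() {
  const { messages } = useVoice();

  return (
    <ul>
      {messages.map((msg, i) => {
        // User speech turns carry prosody scores alongside the transcript.
        if (msg.type !== "user_message") return null;
        const scores = (msg.models?.prosody?.scores ?? {}) as Record<string, number>;
        const top3 = Object.entries(scores)
          .sort(([, a], [, b]) => b - a)
          .slice(0, 3)
          .map(([name, score]) => `${name} ${score.toFixed(2)}`)
          .join(", ");
        return (
          <li key={i}>
            {msg.message.content} ({top3})
          </li>
        );
      })}
    </ul>
  );
}

That per-turn emotion readout is essentially what the chat window above exposes alongside the transcript.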
The Limitation: It’s still pattern-matching on acoustic signals. It infers emotional state from my voice, your voice, or any voice. It doesn’t actually perceive it. And since this still relies on a Speech-To-Text (STT) and Text-To-Speech (TTS) loop, not Speech-To-Speech (STS) or Sesame’s Conversational Speech Model (CSM), it’s limited.
It’s still pretty good, even though it was trained on only a few minutes of my voice. But latency becomes the tell. You’ll notice the uncanny valley, though it will be more obvious in some chats than in others. There are a lot of variables in play.
And you can test all of this inside Hume’s playground. However, if you want it to live somewhere outside of their playground, you’ll need to do some work on your own. For the voice chatbot shown above, this involves the following services.
GitHub
Setting Up GitHub: Go to github.com and click Sign up in the upper right. GitHub is free for everything you’ll need here. You don’t need a paid plan.
Once you’re in, you’ll see your dashboard. The main thing to understand is that GitHub stores your code in “repositories,” which are just project folders that live in the cloud. When you “fork” a project (like the Hume EVI starter), GitHub copies it into your account so you can make changes without affecting the original.
You won’t need to install anything on your computer for the approach we’re taking here. All file edits can be done directly in GitHub’s web interface by clicking on a file and then the pencil icon to edit it.
The biggest issue with GitHub wasn’t GitHub. GitHub is pretty straightforward. The problem was finding the right repository to fork. My first search took me to an older version of Hume’s Next.js app project. And though it worked at first, it broke down quickly. Choosing the wrong project will bring you hours of pain. Trust me on this one.
So, if you care to try this yourself, this is the right place: HumeAI App Router Quickstart. You don’t need to fork the whole repo yourself; just use the “Deploy” button to deploy this example project with Vercel, and it will create the fork in your GitHub account for you. You’ll also want to pay particular attention to the “modify the project” section.
Vercel
Setting up Vercel: Set up your Vercel account before you fork anything in GitHub. The reason is that when you click the “Deploy to Vercel” button on Hume’s starter project, Vercel will automatically fork the repo into your GitHub account and deploy it in one step. If your Vercel account is already linked to GitHub, the whole process is seamless.
Go to vercel.com and click Sign Up. Choose Continue with GitHub as your sign-up method. This is important. Vercel will ask you to authorize access to your GitHub account, which is what allows it to read your repos and auto-deploy when you push changes. Approve the authorization, and your accounts are linked.
Once your Vercel account is set up and connected to GitHub, you’re ready to deploy the Hume EVI starter. Go to HumeAI App Router Quickstart and click the Deploy to Vercel button in the README. Vercel will prompt you to create a Git repository (this is the fork into your GitHub account), then ask for your environment variables. Enter your HUME_API_KEY and HUME_SECRET_KEY from the Hume platform, click Deploy, and in about a minute you’ll have a live URL with the EVI chat widget running.
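For context, here’s roughly what the starter does with those two keys on the server, based on my read of the quickstart (file names and paths may differ slightly): it exchanges them for a short-lived access token, so the raw keys never reach the browser.

// Simplified sketch of the server-side token exchange in the starter's
// page component. The env vars are the two values entered during the
// Vercel deploy; only the resulting token is passed to the client.
import { fetchAccessToken } from "hume";
import Chat from "@/components/Chat"; // the starter's chat component (path assumed)

export default async function Page() {
  const accessToken = await fetchAccessToken({
    apiKey: String(process.env.HUME_API_KEY),
    secretKey: String(process.env.HUME_SECRET_KEY),
  });

  return <Chat accessToken={accessToken} />;
}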
This will, however, default to the standard Hume voice. To use your own voice configuration, add “NEXT_PUBLIC_HUME_CONFIG_ID” to your Vercel environment variables, set to your config ID (shown on any configuration you create in the Hume Platform). Then redeploy. This brings in the voice you configured in Hume’s playground environment.
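If you’re curious where that variable ends up, here’s a rough sketch of the config ID being read on the client and handed to the voice provider. This is not verbatim from the starter, and the exact props vary between @humeai/voice-react versions (newer releases pass auth and configId to connect() instead), so take it as the shape rather than gospel.

"use client";

// Rough sketch: NEXT_PUBLIC_ variables are exposed to the browser, which is
// why the config ID, unlike the API keys, can be read client-side.
import { VoiceProvider } from "@humeai/voice-react";

export default function Chat({ accessToken }: { accessToken: string }) {
  const configId = process.env.NEXT_PUBLIC_HUME_CONFIG_ID;

  return (
    <VoiceProvider
      auth={{ type: "accessToken", value: accessToken }}
      configId={configId} // omit this and EVI falls back to its default voice
    >
      {/* the starter's chat UI (messages, start-call button) renders here */}
    </VoiceProvider>
  );
}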
WordPress
Once your Vercel deployment is live and working, embedding it in WordPress takes about two minutes.
Log into your WordPress admin dashboard and navigate to the page or post where you want the voice chatbot to appear. In the block editor, click the + button to add a new block and search for Custom HTML. Select it, and you’ll get an empty code box.
Paste in the following (my setup here, but you can add any styling you wish), replacing the URL with your actual Vercel deployment URL:
<div style="max-width: 800px; margin: 0 auto;">
<iframe
src="https://your-project-name.vercel.app"
width="100%"
height="600"
allow="microphone"
style="border: 1px solid #e0e0e0; border-radius: 12px; box-shadow: 0 4px 16px rgba(0, 0, 0, 0.12);"
></iframe>
</div>
The allow="microphone" attribute is critical. Without it, the browser will block the widget from accessing your visitor’s microphone, and the voice chat won’t work. Your site also needs to be running on HTTPS, which most WordPress hosts provide by default these days.
Update or publish the page, then visit it to test. You should see the EVI chat widget inside a clean, rounded container with a subtle shadow. Click the Start Call button, grant microphone permission when your browser asks, and you should be talking to your voice-cloned EVI assistant.
A few things to keep in mind. Every conversation uses Hume API credits, so visitors to your site will be consuming your account’s usage. On the free tier that’s limited, so if you expect meaningful traffic you’ll want to upgrade your Hume plan. Your visitors will each need to grant microphone access in their own browser, and the widget won’t work on browsers that don’t support the Web Audio API, though all modern browsers do.
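If you want to be a bit defensive about that last point, a small feature check before rendering the embed doesn’t hurt. This is a generic browser-capability helper of my own, not something the Hume widget requires.

// Optional, generic browser check (my own helper, not part of the Hume
// widget): confirms the Web Audio API and microphone capture are available
// before you bother rendering the iframe.
function supportsVoiceChat(): boolean {
  const hasWebAudio =
    typeof window !== "undefined" && "AudioContext" in window;
  const hasMicCapture =
    typeof navigator !== "undefined" &&
    !!navigator.mediaDevices?.getUserMedia;
  return hasWebAudio && hasMicCapture;
}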
