We’ve seen incredible advancements in artificial intelligence over the last year. Not just language models like ChatGPT, but also artistic expression models like Stable Diffusion, DALL-E 2, and Midjourney. We’ve seen experiments that let you make AI-generated music based on a prompt, and now Microsoft is experimenting with a new one called VALL-E, a voice language model that can recreate a person’s voice, complete with the ability to convey emotion, based on only 3 seconds of audio.
Where’s this all going? It feels like we’re on the precipice of something big and interesting here.
I have to imagine that the convergence of these technologies is the logical conclusion of the advancements we’ve seen so far. Currently, these AI systems are siloed from one another, but I imagine a world where they work together to perform much more complicated tasks.
For better or for worse, there are currently a lot of people and companies very interested in the idea of creating simulated versions of real people. The wild part is that the technology is already there. From language models that can be fine-tuned on custom training data, to the ability to create completely new images of a person based on previous images combined with customized prompts, to the ability to generate a voice model based on a few disparate audio recordings, we’re literally moments away from the ability to create a synthesized version of anyone.
Here’s a photo of a guy on a beach, except he’s never been to this beach, ever.
On the one hand, the idea of conversing with your deceased dad, pouring your heart out about the things you wish you had told him when he was alive, receiving words of comfort from him, as he replies in the subtle ways that only he would, might sound incredibly creepy and unsettling. On the other, though, there are millions of people who need to have this sort of conversation in order to heal.
This is only one of a thousand potential applications of such a technology. At the very least, I imagine a world where we’ll be able to customize our voice assistants to our liking: from the voice, to the personality, to the name. We’re a ways off from realizing the movie Her, but you can probably already see a near-future version where our digital assistants have gotten a lot better at showing us compassion and empathy, and connecting with us on a more personal level.
It’s a new and exciting time to be alive. I find it fascinating to witness these emerging technologies. There are indeed murky waters ahead, and we’re for sure going to make mistakes along the way, but I do imagine a better version of the future, one where these technologies are harnessed for good and have a net positive impact on our world.