Innovative Voice Technology Meets Strict Safeguards
At its Ignite 2023 conference, Microsoft unveiled a text to speech avatar feature for Azure AI Speech. The software can create new avatars and voices or mimic an existing person’s appearance and speech. That capability has raised alarm that it could make deepfakes, AI-generated videos depicting events that never occurred, easier to produce.
Azure AI Speech’s avatar feature, trained on human images, lets users enter a script that is then spoken aloud by a photorealistic, AI-generated avatar. Users can select a prebuilt Microsoft avatar or upload images and voice samples of a person they wish to replicate. Microsoft’s blog post highlighted the tool’s potential applications in building conversational agents, virtual assistants, chatbots, and more.
The post explained: “Customers can opt for a prebuilt or a custom neural voice for their avatar. If the custom neural voice and the custom text to speech avatar both use the same person’s voice and likeness, the avatar will closely resemble that person.”
Microsoft emphasized that the new text-to-speech software comes with various limits and safeguards to prevent misuse. “As part of our commitment to responsible AI, this tool is designed with the intention of protecting individual and societal rights, fostering transparent human-computer interaction, and counteracting the spread of harmful deepfakes and misleading content,” the company stated.
The software trains on a video recording of an individual, referred to as “avatar talent,” to produce a synthetic video of the custom avatar speaking.
Despite these precautions, the announcement was met with immediate criticism. Detractors labeled Azure AI Speech a “deepfakes creator,” arguing that it makes it easier to replicate a person’s likeness and show them saying or doing things they never did. The concern echoes an earlier statement by Microsoft president Brad Smith, who called deepfakes his “biggest concern” about AI.
In response, Microsoft clarified that custom avatars are a “limited access” feature: customers must apply for it and be approved by Microsoft. Users must also disclose when AI was used to create a synthetic voice or avatar.
Sarah Bird from Microsoft’s responsible AI engineering division said, “These safeguards are designed to mitigate potential risks and enable customers to safely and transparently integrate advanced voice and speech capabilities into their AI applications.”
The release comes amid a rush among major tech firms to capitalize on the AI boom. Following the success of ChatGPT from Microsoft-backed OpenAI, companies like Meta and Google have accelerated the rollout of their own AI products.
Amidst AI’s rapid development, there are increasing concerns about the technology’s capabilities. OpenAI CEO Sam Altman warned Congress about AI’s potential role in election interference, calling for stringent safeguards.
Deepfakes, in particular, are seen as a significant threat to election integrity. Earlier this month, Microsoft released a tool that lets politicians and campaigns authenticate and watermark their videos to help curb the spread of deepfakes. Meta has also taken steps, announcing policies that require disclosure of AI use in political ads and prohibit the use of Meta’s generative AI tools for such ads.