There’s no denying the advancement of smart voice assistant technologies into people’s homes. According to a 2021 Statista report, shipments of smart speaker units were at an all-time high in the fourth quarter of 2020. Amazon led the way with 16.5 million and Google was next at 13.2 million. These smart speakers come with integrated virtual assistants that respond to user voice commands. They also provide control over smart home functions.
But how do marketers and customer experience professionals leverage these technologies for voice content experiences that support their brand? How do you embed voice content into digital experience programs? Is that a road you’re even ready to travel?
CX Decoded invited voice-content expert and author Preston So, a CMSWire contributing author who writes about voice and digital customer experience. In this episode, So joins CX Decoded’s Rich Hein and Dom Nicastro to cut through the noise, discuss successful case studies and dig into the challenges facing the modern marketer.
Note: This transcript has been edited for space and clarity.
Rich Hein: Hello, and welcome to CX Decoded by CMSWire, where we explore the technologies, people and practices defining the next generation of digital customer experiences. I’m Rich Hein, editor-in-chief of CMSWire brought to you by Simpler Media Group. CMSWire is the world’s leading publication focused on digital customer experience and modern marketing. Our mission is to keep you up to speed on all the latest information, trends, news and best practices from the world of customer experience, digital marketing and the modern workplace. Come join our growing community at CMSWire.com. As always, I’m joined by my co-host and CMSWire senior reporter, Dom Nicastro. Dom, great to be here with you today.
Today we’re joined by Preston So, a product architect and strategist, a digital experience futurist, a developer advocate, a three-time SXSW speaker, and author of Voice Content and Usability, Gatsby: The Definitive Guide and Decoupled Drupal in Practice. He’s also one of our favorite CMSWire contributing authors. Preston, thanks for joining us today.
Preston So: Hey, Rich. Hey, Dom. Real pleasure to be here with you today.
Dom: Preston, give our audience the who’s Preston So, really. I mean, Rich did a nice intro there, but how did you arrive at the space of digital customer experiences?
Preston: Sure. It’s a great question. I mean, one thing I always like to say is that I’ve been in this industry for so long, but it’s always changing. And there’s so many things happening all the time. I was very lucky to start out as a web designer, print designer, a typical agency background back in 2001, which was about 20 years ago. And ever since then, I’ve been really working in the area of primarily web-focused customer experience, so websites, web applications, all of these really important digital experiences that we work on today.
And I’ve been on all sides of the equation, from the agency mindset over to working for an actual B2C brand, Time Inc., as well as now working on the platform side, and on the partner side, working primarily on that underlying foundation of software: content management systems, all of those marketing technology applications that help out with a lot of things on that front. And ever since I got into web design, I’ve really transitioned over into working primarily on content as my focus. Ever since 2007, I’ve been working in the content management space. And ever since 2016, I’ve been working in the voice content space. So I’ve been in this space for a long time, and I’ve got several other books that I have written on this topic as well, not just about voice content, but also about really modern content architectures. So it’s a real pleasure to be here today.
Rich: Could you start by telling us a little bit how you define voice content? And what are the popular platforms in this arena and the technology behind it?
Preston: Sure. That’s a great question. And I’ll preface this by saying that voice content as a term is still very much in its infancy when it comes to how we’re thinking about it from the CX perspective, how marketing organizations are thinking about it. But, simply put, voice content is content delivered through the medium of voice, and it’s spoken content, it’s conversational content.
But there’s a couple of really important nuances in this notion of what voice content is. And the first is that there’s a very big difference between written content and conversational content. The written content that we find on our web browsers that we scroll endlessly through when we’re reading the news, or do scrolling on social media; and the really distinct kind of content that we deal with when it comes to audio content and spoken content. And there’s a further distinction as well, where one of the most important things I always tell people, is that when you’re talking about voice content, it’s not just about the difference between written content as in text and the sorts of images that we deal with, versus conversational content. There’s also that distinction within conversational content, because the kind of content you’re delivering through a chatbot, or that you’re delivering through a textbot with SMS messages, is really not the same thing as the kind of content that you’re going to interact with on some of the major platforms or devices that are voice assistants, or smart speakers or, you know, these other voice interfaces that don’t have a screen and operate primarily on this medium of sound.
So we’re talking about some of these really important devices that have come out in recent years like Amazon Alexa, Google Home, Apple Siri as well from the application standpoint. And there’s a variety of different platforms that people who are in this space are beginning to use on a regular basis to build these conversational interfaces, platforms like Dialogflow, which is now owned by Google, Oracle Digital Assistant, which is by my employer, Oracle, as well as some really interesting new startups.
I think one of the really interesting trends we’re seeing now, and I talked about this in my book, Voice Content and Usability, is the fact that a lot of people are moving into a realm where they want to be able to build once and deliver everywhere, or publish everywhere, which I know is a really big tenet of content marketing these days. But it’s really also about this notion of let me build a single conversation, as opposed to building, you know, a single page. And let me focus on some of the ways in which I can unify all of these different platforms so that I build once and it goes out to Alexa, it goes out to Google Home, it goes out to a textbot that I’ve got on my phone. One of the dangers with that approach to agnosticism or unification of these different devices, of course, is once again that distinction between spoken conversational content and written conversational content.
Rich: Could you just share a few examples of like the way people search? Because I mean, clearly, there’s a different way that people would search for something when they’re looking in Google and when they’re asking their Alexa for something, and I would like to highlight some of the ways that is.
Preston: I think one of the really interesting things about voice search or voice content and the ways in which we deliver results through search, it’s really about asking questions these days, and commanding tasks or asking for tasks to be performed. And this brings me to, I think, a really interesting distinction that I cover in my book, Voice Content and Usability as well, which is that voice content is really actually a subset of the kinds of voice interfaces that we interact with on a daily basis.
When I first started out with voice about five or six years ago, one of the interesting trends that I noticed was that there were all sorts of people building really great and interesting, compelling voice interfaces and voice assistants that would help you order a takeout pizza, or check your credit card balance or check when a store was open or book a flight or book a taxi. But those are predominantly what I call transactional voice interfaces, which means that you’re primarily asking for somebody to do something on your behalf.
When it comes to the other category of voice interfaces, those that sling voice content, we’re talking about informational voice interfaces, and informational voice interfaces are very different, because you’re really dealing with this realm of conducting searches of asking questions, you’re wanting to discuss a movie that just came out, you’re wanting to talk about certain music that just came out, or you want to be able to have a conversation about a museum and certain features that it has. And then maybe go into booking a ticket or registering for an event. You don’t necessarily want to have all of those transactions be the only function that your voice interface can provide.
And so this is a really important distinction because transactional voice interfaces and transactional conversational interfaces have been the bread and butter of a lot of brands for a long time. If you think about airlines, if you think about hotel conglomerates, they’re constantly helping people with canceled reservations or flight cancellations. But there’s still comparatively few organizations that are working through this problem of how do we deliver content out to these audiences? How do we have a conversation that’s not so much about getting the job done, but more about informing them or giving them information that they’re looking for, which is a very different concern.
Rich: So Preston, you mentioned a couple of different types of this technology, transactional versus informational? Are there others?
Preston: So one thing I will say is that there’s not really a distinction between transactional and informational voice interfaces that are encoded into the various platforms, the various players that are in the voice interface market. But you definitely do see some players in the market that focus on one or the other. And one of the things I’ve noticed is that voice content, informational voice interfaces, are still something that’s not actually really handled very commonly by a lot of these players.
One example of this is Apple Siri. For the longest time throughout Apple’s Siri history, it was really focused on doing things that were very simple. So either transactional things like booking a cab or scheduling a calendar event. And you know, maybe conducting a search on your behalf, but not necessarily taking you all the way through those results and giving you maybe the best results you’re looking for or helping you lead through those. And now, one of the really interesting things about a lot of these platforms is that they’ve begun to offer a lot of custom flexibility for people to build in their own types of voice interfaces.
And there’s a really interesting problem in voice interface design, which is, how do you manage to put a personality into this voice interface that doesn’t have any humanity or any sort of personality of its own, in a way that allows for a conversation to proceed naturally? So it’s not just the fact that there are voice interfaces that fall into these categories of transactional and informational. But the individual conversations or the individual voice interactions that we have within the purview of these interfaces are very different.
When I start talking to Alexa, for example, if I’ve never used Alexa before, I might start off by saying, hey, Alexa, how’s your day, because that’s the way I kick off a conversation with anybody. And that’s an example of a pro-social voice interaction, or something that really helps us to kind of set the stage with the other person, make sure that they know that we’re of the same mindset when it comes to conversations.
And then once that’s done, once the glad-handing, the small talk is done, then you jump into either that transactional or informational voice interaction. One of the interesting things is that today with Alexa and Google Home, they’re really competing with each other, right? If you look at Google Home, Amazon Alexa, you know, Samsung’s Bixby, they’re all competing with one another on their ability to shift seamlessly, and at any given moment, move into these different modes of conversation at any given time, which is a really interesting prospect.
Rich: Yeah, I actually have both I have Siri and Alexa. And this is absolutely 100% opinion. But I am definitely an Alexa guy. I have found at this point that Alexa does things far better for me than Siri has been able to. I don’t know if you guys have any experience there. But I’m just sharing mine.
Dom: Rich. Did I see in the Simpler Media Group mailing slot a check from Alexa?
Rich: Yeah, they’re covering my prime for this ringing endorsement.
Dom: Everyone has their preferences. Preston, I want to just ask you about something. If I’m on a marketing team, I’m on a customer experience team, and we’re actually thinking about this, where are they even beginning? Like, what’s the market research on this, whether it’s a good idea for them to even begin to think about getting into voice content programs? Because, look, when we want to boost our SEO, we want to boost our website, we all have an idea how to go about that, right? You start looking at some tools with SEO, you start to look at maybe some keywords here and there, see what people are searching for in Google, kind of mimic that? Where do they start with this?
Preston: That’s a really good question. And I think this really revolves around all the questions that are coming up more and more about content strategy, about content design, about really looking at how content is going to manifest beyond the web. And one of the things I’ll say is, for better or worse, you know, and this is not to say that we’ve done anything bad in terms of us as an industry. But I think one of the things that marketing organizations today that are on their digital transformation journeys, what they have to think about is we’ve been really biased toward the web for a long time. We’ve really predominantly been focused on the website as the primary and even oftentimes only conduit into our content. And it’s a very big difference from how we’ve been operating for a long time. And I think there was a little bit of this interesting transition where, in a lot of ways, the content that we work with on our websites, the content that we put together for our websites, the content that we manage for our websites, is really just an extension of the printed content that we were interacting with before.
If you think about newspaper websites, and microfilm archives, there’s not really much difference, right? You can just endlessly scroll and read more and more text. And you know, nowadays with websites, we have infinite scrolls, so you can keep on reading however long you want to on some of these websites.
But when it comes to things like SEO, in particular, we’re really operating in a completely different universe. And there’s a really important distinction I like to make between the notion of what I call macro content, which is these big chunks of long-form, Russian-novel-style content, you know, these epics that stretch across the entire browser and extend for miles down a browser page. And what Anil Dash calls microcontent, which is these text messages that we might get from a friend, tweets. We’ve had a recent president that’s very tweet-happy. But a lot of this kind of content that is more microcosmic or atomic is really different from the content that we’ve been working with on the web, because there’s a lot less context. And there’s a lot less length to this kind of content.
And so I think one of the issues that we have to think about is, a lot of the content that we currently have on the web is not ready for what I would call a voice interface or a voice-ready content strategy. And a lot of that is because so much of our content is trapped in these gigantic body fields in a content management system or in a gigantic text box in our blogging tool. And it’s not something that’s really been sliced and diced in a way that allows for it to be bereft of any context or unmoored from the browser or from a screen.
So the way that I always recommend that organizations start out, and this is what I tell all my clients, and what I’ve told all of the people that we’ve done discovery workshops with is, you want to look for the lowest hanging fruit. And I think this is true of all content strategy when it comes to experiences beyond the web, especially those that deal with the channels that I’ve worked in: voice, chatbots, augmented reality, virtual reality.
Frequently asked questions are a great way to help the organization start off and think about, well, you know, a lot of these questions are very conversational in tone. They’re things that we know a lot of our customers want to know about. And they’re also sliced and diced already. So just to kind of share, this is exactly what we did for the first ever voice interface for the residents of the state of Georgia, Ask GeorgiaGov (when So was with Acquia), and this was one of the first ever content driven voice interfaces or voice content interfaces in the history of conversation design, which is a massive achievement for the residents of Georgia.
Rich: And what was the goal of that project?
Preston: Georgia has always been focused and very much ahead of the curve when it comes to accessibility, when it comes to really finding out and honing in on what are the ways that we can reach some of our audiences, namely, the citizens of Georgia, who don’t necessarily have access to a computer or don’t have access to an agency office or might not have internet access even. And their big question was, we’ve got this website, it’s accessible, it’s great. It operates really well. But a lot of people, especially elderly Georgians, and retiree Georgians, don’t have a whole lot of ability to use a computer really well.
And to this end, they’re much more comfortable having a conversation, just like you go to the supermarket and have a conversation with your favorite friend behind the deli counter. You want to be able to have a conversation that’s relaxed with somebody who’s able to actually carry a conversation with you. So their goal was, hey, we’ve conquered all of the questions when it comes to web accessibility, right? Georgia.gov is one of the paragons of web accessibility out there for government content for public sector.
Now how do we extend that legacy of accessibility and make our experiences for our residents even more equitable, by reaching them in their homes through Amazon Alexas, through Google Homes, through chatbots on their phones? The result was Ask GeorgiaGov, which, sadly, has now been decommissioned because it was built on a little bit of an older version of Alexa.
Dom: Preston, what was the market research on that — determining that there was actually a need for voice content programs with the state of Georgia? Because were they able to determine that through surveys?
Preston: That’s a really good question. I definitely don’t have let’s say the data right in front of me. But I can definitely share that the average age of a lot of folks in Georgia, especially in rural areas of Georgia, was a big factor in their decision to think about Amazon Alexa as one of their first forays into this realm of content beyond the web.
One thing I’ll also share as well is that they received feedback from folks across the state who were unable to reach, let’s say, agency offices. So one example of this is: let’s say that you need to get your driver’s license renewed, but you don’t necessarily have the means or you don’t have a vehicle handy to make your way over to your county seat to visit the local DMV, or you might not necessarily have a phone handy or great phone service that allows you to call somebody and get those questions answered.
A lot of this also was predicated on the fact that a lot of organizations everywhere, I think, are dealing with this fact that a lot of customer service agents, a lot of these frontline agents who are responsible for answering calls on these government hotlines around these 1-800 numbers are very overloaded; they get tons of calls every day. And part of their calculus for this as well was to say, hey, how can we think about potentially offloading some of this burden? Because a lot of these phone agents are completely overwhelmed by a lot of these calls. How can we help people who might have an alternative actually use those conduits instead?
And I think this is really relevant, especially for state and local government. You know, budgets have been slashed in a lot of different jurisdictions. A lot of these hotlines in these offices have fewer and fewer people able to handle a growing volume of calls. So for the public sector, I think this is a really important and useful consideration. But also for those organizations that have customer service departments that also were overloaded, or certain locations that just can’t handle the flood or the inundation of people who are constantly coming in asking questions — that could potentially be answered more efficiently in the comfort of their own home sitting on the couch having a conversation with their Alexa device.
Rich: So when they went about this process, was it like a headless CMS situation where you developed all of the content, and then you were just able to push it out to those other platforms like we had discussed earlier?
Preston: Really good question there to follow up good questions. So what I’ll share is that this is an interesting situation, right? Because I think one of the big advantages of headless CMS and obviously I write about this quite a bit on my blog, I think one of the things that a lot of folks forget when it comes to a lot of the older CMS or content management systems that we work with, is that they’re also capable of not only managing the websites that we work in on a daily basis, but also these other interfaces.
So one good example of this is that Georgia.gov was already using a Drupal website. Drupal is an open-source content management system that’s used by you know, approximately 2% of the entire web. It’s used by a lot of government organizations. It was used by whitehouse.gov at one time, and one of the things that Drupal does really well is, I actually wrote the book on this, by the way, Decoupled Drupal in Practice a couple years ago, Drupal lends itself really well to an architecture where you can not only leverage it as that website builder, that amazing foundation for your website, but also as that headless CMS back-end or that headless CMS data layer for all of the experiences that are off the web or that are separate from the website that you need to serve.
So what was really interesting about this project is, I think there’s a lot of counter reaction that’s happened recently to some of the ways in which content silos have appeared. One of the biggest problems I think a lot of organizations have faced, especially in the realm of conversational interfaces, is you start building a chatbot. You have your chatbot text over here, but then suddenly that’s completely separate and decoupled from your main content repository, your main content store. And so when you have your content team going in there to try to edit your website and edit this chatbot text, you’re suddenly having two different versions and you build another chatbot for the next season, you build another voice interface for this purpose. Suddenly, you have 15 different content databases that are all out of sync with each other.
A lot of marketers only have one content team. We can’t be in the business of managing five or six different versions of content, one for voice, one for chatbot, one for web, one for mobile. It has to be part of the same content repository. So I think one of the interesting things about this project is that it involves a single source of truth for content. It involves what a lot of omnichannel content strategists today call single sourcing so you avoid those silos, and it involves content reuse to a very large scale.
What that means is that the same exact content that you find on Georgia.gov is the exact same content that you’re going to encounter on the Amazon Alexa, and you update it in one single place in the CMS. And you don’t have to update it anywhere else, those percolate out to the respective digital experiences that you have to serve.
And this is a model that I think is really important for a lot of organizations to recognize, especially as we move into more pageless experiences, or these user experiences that are very much off the web. How do we actually make sure that the content that we’re managing stays in this really cohesive corpus that doesn’t get out of sync within itself, and doesn’t require us to chase down a whole bunch of different versions of content?
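Note: The single-sourcing model described above can be illustrated with a minimal Python sketch. It is not the Ask GeorgiaGov or Drupal implementation; the `FaqItem` shape, the `render_web` and `render_voice` functions, and the sample question and answer text are all hypothetical placeholders. The point is only that one stored item feeds both channels, so one edit reaches both.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class FaqItem:
    """One chunk of content stored once in the CMS (hypothetical shape)."""
    question: str
    answer: str


def render_web(item: FaqItem) -> str:
    """Render the item as an HTML fragment for the website."""
    return f"<h2>{escape(item.question)}</h2><p>{escape(item.answer)}</p>"


def render_voice(item: FaqItem) -> str:
    """Render the same item as plain text for a voice assistant to speak."""
    return f"{item.question} {item.answer}"


# A stand-in for the single content repository; the text is invented.
repository = {
    "renew-license": FaqItem(
        question="How do I renew my driver's license?",
        answer="You can renew online, by mail, or in person.",
    )
}

# An editor updates the answer in one place...
repository["renew-license"].answer = "You can renew online or in person."

# ...and both channels pick up the change on the next render.
web_output = render_web(repository["renew-license"])
voice_output = render_voice(repository["renew-license"])
```

Because neither renderer keeps its own copy of the text, there is no second version to fall out of sync.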
Rich: You bring up a good point. I mean, you talked about Drupal. And so the first question that comes to mind is what are the other tools that marketers should be looking at? But really, the question is, you know, how is the enterprise going to create and manage all this? What are the foundational tools that they’re going to need to get this done? Drupal is just a CMS, I don’t mean to say they’re just a CMS, they’re a CMS, that’s obviously one, but what are the other items?
Preston: Yeah, this is an example of how the voice interface world has kind of mirrored the evolution of the Internet of Things (IoT) universe as well as the augmented reality/virtual reality world and the mobile world as well. We’re finding that what you need is essentially that baseline for your content, which is a CMS, and it could be you know, any CMS, it could be Oracle Content Management, right, it could be a headless CMS.
But the second thing that you really need, apart from obviously, the capability on that CMS platform to be able to provide for a sort of delivery to voice interfaces, and that’s usually through an API, which is basically a layer that exposes this content out for any other application, be it AR/VR, be it IoT, to be able to take in. The other side of it, though, is that you really also need to have that development experience on the other side for a technical team to actually implement this content. Because one of the issues with voice content is that great, you’ve got the voice content, it’s ready for your voice interface. But then you got to glue the pieces together, actually connect the dots between all of the different pieces of content that you have. And that really involves mapping out flows, designing these dialogues that eventually become code on the other side. And there are certain platforms that allow you to do this.
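Note: As a rough illustration of the "glue" between content items and a dialogue, here is a small Python sketch. It assumes the content has already been fetched from a CMS API and sits in `CONTENT_ITEMS`; that list, the `tokenize` and `answer` functions, and the word-overlap matching are all illustrative assumptions, not how any particular platform maps questions to content.

```python
import re

# Stand-in for items a CMS API might return; titles and bodies are invented.
CONTENT_ITEMS = [
    {"title": "Renewing a driver's license", "body": "You can renew online or in person."},
    {"title": "Registering to vote", "body": "You can register online or by mail."},
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def answer(utterance: str) -> str:
    """Pick the item whose title shares the most words with the utterance."""
    words = tokenize(utterance)
    best = max(CONTENT_ITEMS, key=lambda item: len(words & tokenize(item["title"])))
    if not words & tokenize(best["title"]):
        # No overlap at all: reprompt instead of guessing.
        return "Sorry, I couldn't find that. Could you rephrase your question?"
    return best["body"]


reply = answer("How do I renew my driver's license?")
fallback = answer("What's the weather?")
```

A real dialogue would also need follow-up turns and reprompts, which is the flow-mapping work described above; this sketch covers only a single question-and-answer turn.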
One of those is Alexa Skills Kit, which is what we used to build Ask GeorgiaGov, which allows you to basically create all of this glue, and all of this dialogue, this connective tissue between all of these content items in a way that really makes sense. There are also visual tools or low-code tools that exist. There are companies out there that have done a lot of work in this regard, especially Google with Dialogflow, for example; with my employer as well, with Oracle, you’ve got the ability to do this.
And then there are some startups that are coming up as well. There’s one interesting one that’s been on my radar for quite a long time, which is Voxable, as well as other ones that are agnostic to these technologies. So you can build once and have those percolate out to all these other platforms, like Botsociety and certain other tools that are friendly for design organizations, that are friendly for editorial groups, that are also friendly for marketing organizations that don’t necessarily want to get into the business of writing code.
Because a lot of what has happened, especially in the last five years when it comes to a lot of these platforms, and this really started I would say with the advent of Alexa Skills Kit and Dialogflow, is this notion of low-code or no-code voice interface design, which is something that really didn’t exist before. Nowadays, it’s really not about understanding these acronyms, like Speech Synthesis Markup Language, or some of these other formats that are out there, because the tools have gotten a lot better. Of course, one of the issues with that, though, is that it’s created these walled gardens that you see in the mobile landscape and IoT landscape.
Rich: Do you think there will be a standard at some point?
Preston: What worries me the most about the future of voice interfaces and the future of voice content and voice-driven marketing is this fragmentation that we see in the landscape. We see this also occurring in the mobile universe as well. Android versus iOS. It’s one thing that really concerns me, but honestly, to be frank, I don’t think there’s going to be a standard that emerges unless one of these upstarts that is trying to level the playing field and allow you to build for all of these different platforms is somehow able to gain enough of that market share and that mindshare among marketing organizations that that becomes kind of the de facto standard, in much the same way that, you know, Salesforce became the de facto CRM.
Dom: How do you measure ROI on a voice content program?
Preston: Sure, this is a great question. And I think it really harkens back to a lot of the issues that marketing organizations and customer experience groups deal with on a daily basis. We’ve got these experiences out there, we’ve got content out there, we’ve got people consuming it, how do we actually instrument that or introduce analytics or log that so we can understand how that’s going?
So ROI is obviously a big, big concern. You know, you don’t want to go out into a direction especially that’s as experimental as voice without having some ground rules or some understanding of how it’s going to perform. One of the things that we introduced for Georgia, and this is one of the other reasons why I always recommend that organizations use the same underlying content store or the same underlying content corpus or content repository to actually manage all of these different customer experiences. Georgia wanted to have the same kind of understanding of 404 errors, of most commonly searched terms, most commonly viewed FAQs, in the same way that they had their web analytics and their web reporting.
So what’s great about this is that Drupal, obviously already has a lot of this logging and performance tracking in place. And what we did is we simply implemented a custom built logging and analytics mechanism that sat right alongside their web analytics and web logging. And now the really big benefit of this is that you can cross-compare, and you can look at, hey, what’s the content that’s performing well on the website, but that might not be performing as well on voice? And how do we address how those things actually might match up in the future?
So one example of this is that first and foremost, one of the things that we tracked was success rates, which is one of the most challenging things to measure when it comes to a voice interface, because it’s a metric that can really depend on a lot of different nuanced considerations. When you think about success rate, what does that mean? Does that mean successful delivery of content? Does that mean that the machine returned a response? Does that mean that content was delivered? So settling that underlying metric was a really important benchmark for us.
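Note: To show why the definition of "success" matters, here is a small Python sketch that computes a success rate from the same log under three different definitions. The `Interaction` fields and the sample log are invented for illustration, and `user_confirmed` in particular is an assumed extra signal that the interview does not mention; this is not the logging mechanism built for Georgia.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged voice request (hypothetical fields)."""
    got_response: bool       # the interface returned any response at all
    content_delivered: bool  # the response contained a content item
    user_confirmed: bool     # assumed signal: the user said the answer helped


def success_rate(log: list[Interaction], definition: str) -> float:
    """Share of interactions that count as successes under one definition."""
    if not log:
        return 0.0
    hits = sum(1 for entry in log if getattr(entry, definition))
    return hits / len(log)


# Invented sample log of four requests.
log = [
    Interaction(True, True, True),
    Interaction(True, True, False),
    Interaction(True, False, False),
    Interaction(False, False, False),
]

rates = {
    name: success_rate(log, name)
    for name in ("got_response", "content_delivered", "user_confirmed")
}
```

With this sample log the three definitions give 0.75, 0.5 and 0.25 respectively, which is why the definition has to be settled before voice numbers are compared against web analytics.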
Rich: Where do you see voice content evolving over the next two to five years?
Preston: That’s a really interesting one. You know, I think that voice content itself is still very much in its infancy, and I think we’re going to see a lot more organizations that are today just starting to dip their toes in voice get much more involved. But I think there’s going to be an interesting paradigm shift here in the next two to five years. And that is that what we’re going to see is a lot more focus on not these, let’s say, manually defined voice interfaces or really pixel-perfect ones, which is, you know, mixing metaphors here. But not this approach where you necessarily define every single possible conceivable response that the voice interface will issue, but more flexible approaches where Alexa can respond with a little bit more of a range of different responses, and you maybe don’t necessarily write those in, but those will be automated.
You know, one thing that I think is really interesting, and I talked about this at length in my book is when it comes to conversation centric design, right, is this notion of being able to have a natural sounding conversation that veers between the social, the informational, the transactional, and also has no barriers in terms of topic range. That’s still very much a ways away.
Dom: Preston, where are you seeing the most innovation in general in customer experience?
Preston: I think the place where I’m seeing the most innovation right now is in two separate realms. And I’ll share the first, which is obviously voice content. Voice content I think is huge right now, and I think everyone, especially over the course of this coronavirus pandemic, we’ve been stuck at home, stuck in our routines, we’ve had to engage with interfaces that give us a little bit of that escape. Voice interfaces are a great way to provide that escape, and a lot of brands have now latched on to this idea that OK, voice is part of the future. Obviously voice isn’t the most accessible to everybody; folks who are deaf or hard of hearing still need other conduits to experience their content as well.
The other area that I think that’s going to be really pressing and really urgent for a lot of brands in the near future, especially in the next two to five years, is what I call immersive content. And immersive content is just like voice content, which operates in the realm of sound and speech. And this is really about things like digital signage, augmented reality, virtual reality, geolocation content. And I think that’s another area that we’re going to see a lot more change because let me tell you over the course of the coronavirus pandemic, not only have sales of voice interfaces and smart speakers skyrocketed, like Rich said earlier, we’ve also seen a huge uptake of gaming headsets, of virtual reality interfaces, of these formerly really kind of wonky interfaces that are now commonplace in homes across the United States.
So immersive content is definitely a huge area that I imagine we’ll see a huge amount of growth in the near future, especially with regard to how to deliver, let’s say, these experiences where people don’t want to set foot in a brick and mortar store anymore. They want to bring the brick and mortar store into their own homes, and actually maybe browse some of these products within a VR headset, as opposed to going down the street to the big box store.