“Alexa, play Grand Tour!” If you always wanted to bark commands at someone without getting an angry response back, you might just love Alexa. Alexa is the smart assistant behind Amazon’s smart speaker Echo. But there are better reasons to use voice-controlled smart assistants than barking commands.
Amazon’s Alexa is similar to the Kindle platform and the Prime Video platform in some respects and very different in others. It also differs from Apple’s Siri and Google’s smart assistant.
Join me today on a tour through one of Amazon’s biggest bets, one that promises to change how humans interact with machines.
(1) The installed base
It started with smart speakers, but by now there are thousands of types of devices that can be used to talk to Alexa.
Amazon’s Alexa is far from the only smart assistant. Google, Microsoft and Apple have joined the race to capture market share on the smart assistant battlefield. Amazon seems to be the one pushing their platform into as many other devices as possible. This is not a surprise. Apple’s assistant Siri is built into their iPhones, and Google’s assistant can be accessed via Android, which runs on more than 80% of smartphones (though it depends on which type of Android the respective phone provider has implemented).
With this, a few things are clear. First, Amazon has to work hard to get onto a physical installed base, as their own phones have not captured a large one. Secondly, looking purely at smart speaker sales/ownership (as per the chart below) does not represent the real installed base of smart assistants. And lastly, multi-homing is likely widespread, i.e. there will be many people using more than one smart assistant depending on the job they are trying to get done.
(2) Use cases and customer value propositions
With this said, it is not unlikely that Amazon will (at least initially) focus on certain types of use cases. Here is a short excerpt of what Alexa and connected devices can do at this stage.
- Voice control Amazon Prime Video (“Alexa, play Star Trek Beyond”, “Alexa, what’s on TV tonight?”)
- Voice control music streaming, e.g. from Amazon Music Unlimited or Prime Music (“Alexa, play Pink Floyd Dark Side of the Moon”)
- Get Alexa to read books or play audiobooks (“Alexa, read Philip K Dick’s Ubik”); it will continue from where you left off on your Kindle device
- Get one of the streaming services started, such as TuneIn, Spotify, etc (“Alexa, play John Denver on Spotify”)
- Use Alexa devices in different rooms of your house as an intercom to communicate with your family (“Alexa, drop in kitchen”)
- Communicate/call others on their Alexa-enabled devices (incl their smartphones or landline) (“Alexa, call dad” or “Alexa, call 555-0000”)
- Message (voice message or SMS) to Alexa-enabled devices or smartphones (“Alexa, message dad” followed by your message on prompt)
- Voice control your lights (e.g. as you are about to enter your home) (“Alexa, turn on the kitchen [bathroom, bedroom] lights”)
- Smart plugs to turn on/off anything, e.g. lamps, fans, electrical heaters, etc (“Alexa, turn on the coffee maker”)
- Thermostats, air conditioners, etc (“Alexa, heat the room to 24 degrees (Celsius)”)
- (Security) cameras or video devices (“Alexa, show the backyard”)
- Define routines of several actions combined, e.g. start the coffee maker, the lights, the heater in the kitchen as you are getting up
Laptops & PCs:
- Alexa-enabled PCs and laptops (here Acer) can save some typing and be used as an interface to control other Alexa devices
- Vehicle entertainment systems can be controlled via Alexa and used as an input device
- Reordering of items, mainly through Prime Now (“Alexa, reorder apples”)
- Ordering new stuff (“Alexa, add bananas to my Whole Foods cart”)
- At this stage, shopping is not yet that sophisticated, good for grocery ordering, but limited otherwise
- There is massive innovation potential foreseeable in this space and surely an area that Amazon is focusing on (see below)
- Daily updates: weather, traffic
- News, information, sports updates, etc
- Simple queries such as calculations
- To-do lists, shopping lists, reminders, alarms
- Local search: restaurants, shops(!), etc
Skill Blueprints: These are skills built from a template that you can fill out within minutes (the blueprints portal lets you fill out one of the templates to see how easy it is). You can activate the skill using “Alexa, start My Workout” (if that is how you named the skill). Examples:
- Workout routines: a skill that allows populating your own workout routine for the week
- Inspirations: create your own list of inspirational quotes
- Chore chart: weekly household to-do list
- And lots more
A lot of these use cases are also available via smartphones. But given that Amazon was late to smartphones (and not particularly successful with their own), voice control might just be the way for them to enter important markets such as communication, home and car entertainment systems, etc. Amazon has been working secretly for years to be one of the early/first movers on this new value proposition and platform.
Since the success of smartphones, we know the importance of having an ecosystem. It is an important part of the customer value proposition and success.
The ecosystem of smartphones is heavily centred on media, games, the internet and apps. Alexa’s ecosystem, by contrast, has incorporated physical extensions heavily from its early days. This is evident in the fact that many physical devices have built-in Alexa-enablement, can be voice-controlled or are sold in a bundle with Alexa.
There are currently over 20,000 Alexa-controllable devices in many categories:
- Smart speakers, headphones
- TVs, set-top boxes
- Phones, tablets, cameras
- Laptops and desktops
- Smart home devices
- Lights, plugs
- Vehicle entertainment systems
- And more
(4) The platform: deep integration
Now we are getting to the core of the system. The platform revolves around voice control. It is of course not about the gadget (smart speakers) but the capability behind it. It is a true platform business model in that it brings together a demand side (users) and various supply sides (those providing devices, voice-controllable services and skills).
To understand the platform of Alexa-enabled devices and services let’s first look at some important technical characteristics:
- It is a client-server architecture
- The equivalent to apps are called skills
- Like apps, skills are open to 3rd party developers
- Unlike most smartphone apps, most of the code runs on Amazon’s servers (Amazon Web Services Lambda), paid for on a pay-per-use model
- Amazon provides interfaces and programming tools to develop the 3rd party code
- The 3rd party code allows the definition of utterances, which are the words used to control the respective 3rd party device or skill
- The voice commands can be captured through (1) devices with a microphone and speaker, e.g. Amazon Echo, Fire TV, Fire tablet, or 3rd party devices such as laptops, smartphones, tablets, etc, or (2) through the Alexa app
- Devices need to be set up via the Alexa app, e.g. by naming room-device combinations, which will then be available to Alexa in the cloud
- Any of the input devices can be used to control any of the Alexa-controllable devices/skills
- Some devices are pure output devices, e.g. bulbs, plugs, etc, as they have no microphone or speaker
- Alexa will interpret the user’s commands (through the code in the cloud), identify the relevant device and send the commands to the device for execution
The platform architecture is very different to what we know of smartphones in that most of the code runs on Amazon servers. This allows devices and skills that have been set up to be controlled through any of the input devices.
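The flow described in the list above can be sketched in a few lines of Python. This is a toy model only, not Amazon’s code or API: the `AlexaCloud` class, `register_device`, `handle_utterance`, `speak_into` and the utterance patterns are all hypothetical names chosen for illustration. What it tries to show is that the input device merely forwards what it hears, while the account-level registry in the cloud decides which device acts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Device:
    """A controllable device the user has set up in their account (hypothetical model)."""
    name: str
    state: dict = field(default_factory=dict)

    def execute(self, command: str, value=None) -> str:
        # The device only executes simple commands; it does no language understanding.
        self.state[command] = value
        return f"{self.name}: {command}={value}"


class AlexaCloud:
    """Toy stand-in for the cloud side: holds the device registry and interprets utterances."""

    # Hypothetical utterance patterns, each mapped to (device name, command, value).
    PATTERNS = [
        (re.compile(r"turn (on|off) the (.+)"),
         lambda m: (m.group(2), "power", m.group(1))),
        (re.compile(r"set the (.+) to (\d+) degrees"),
         lambda m: (m.group(1), "target_temp", int(m.group(2)))),
    ]

    def __init__(self):
        self.devices = {}

    def register_device(self, device: Device) -> None:
        self.devices[device.name] = device

    def handle_utterance(self, text: str) -> str:
        text = text.lower()
        if text.startswith("alexa, "):
            text = text[len("alexa, "):]
        for pattern, extract in self.PATTERNS:
            match = pattern.fullmatch(text)
            if match:
                name, command, value = extract(match)
                device = self.devices.get(name)
                if device is None:
                    return f"Sorry, I can't find a device called {name}"
                return device.execute(command, value)
        return "Sorry, I did not understand that"


def speak_into(input_device: str, cloud: AlexaCloud, utterance: str) -> str:
    # Any microphone-equipped device simply forwards the utterance to the cloud;
    # which input device was used makes no difference to the outcome.
    return cloud.handle_utterance(utterance)


cloud = AlexaCloud()
cloud.register_device(Device("kitchen lights"))
cloud.register_device(Device("thermostat"))

from_car = speak_into("car", cloud, "Alexa, turn on the kitchen lights")
from_echo = speak_into("echo", cloud, "Alexa, set the thermostat to 24 degrees")
```

Because the registry lives with the account rather than on the input device, adding a new microphone-equipped device in this sketch requires no change to the lights or the thermostat.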
Take the example of turning on your home’s lights and thermostat via your car’s Alexa-enabled entertainment system (or via your Echo Auto) as you are driving into the garage. There is no trace of code for the lights or the thermostat on the car entertainment system or the Echo Auto. They are input devices that connect to the Alexa Voice Service, which then handles most of the rest.
If you wanted to control your smart lights or thermostat through your smartphone, you would need to install the respective control app on your phone. If you also wanted to control them via your tablet, you would need to install the app on that device too, and the same again for every further device. In other words, you would need an app for each device you want to control on each device you want to control it from.
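The difference can be put in numbers. Here is a minimal sketch, assuming one control app per controlled device and ignoring the one-time account sign-in on each input device (both are simplifications of mine, not figures from Amazon; the function names are made up):

```python
def app_installs_needed(controllers: int, controlled: int) -> int:
    """App-per-device model: every controller needs an app for every controlled device."""
    return controllers * controlled


def cloud_setups_needed(controllers: int, controlled: int) -> int:
    """Cloud model: each controlled device is set up once in the account.
    Controllers need nothing device-specific (sign-in is ignored here)."""
    return controlled


# Example household: phone, tablet and car controlling lights, thermostat, plug and camera.
installs = app_installs_needed(3, 4)
setups = cloud_setups_needed(3, 4)
```

In this example the app-based approach needs 12 installs, while the cloud approach needs 4 setups, and the gap widens with every additional input device.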
The two APIs (application programming interfaces) at the heart of the Alexa platform are the Alexa Voice Service (AVS) and the Alexa Skills Kit (ASK). It is important to understand, at least at a high level, how these work.
Alexa Voice Service (AVS)
Devices with a microphone and speaker can become an Alexa built-in product by using the Alexa Voice Service (AVS). Here are some characteristics of this interface that show the deep integration of the service:
- The AVS API (application programming interface) provides access to Amazon’s speech recognition capabilities and to the skills that are available on the respective user account
- Speech recognition and natural language understanding happen in the cloud, eliminating the need for developing such complex capabilities for the device/skills developer (and making Amazon’s part indispensable)
- AVS-enabled devices (i.e. those with a microphone and speaker) listen to the user constantly but don’t transmit anything to AVS until they pick up the wake word “Alexa”. Once they do, the device starts streaming what it records to the Amazon servers, plus the 0.5 seconds prior to the wake word so that AVS can calibrate to the ambient noise levels. AVS will then verify whether it was really the wake word or a similar word (this is done by the Cloud-Based Wake Word Verification Alexa API)
- The SpeechRecognizer interface, which is at the heart of the interaction between AVS and the client device, will then tell the device when to start and stop capturing. The latter happens if it was a false wake or when the user’s command is complete. This interface receives the voice recording but also important parameters, such as the distinction between near and far field, which makes a big difference for the speech recognition algorithms. Alexa will also trigger the specific commands that a device can execute, along with the necessary parameters
- While this architecture is quite different to that of smartphone apps, one of its benefits is a consistent customer experience across the broad variety of devices and skills. The strong dependency on AVS is a welcome side effect for Amazon. The architecture also helps Alexa’s machine learning algorithms to get better and better (direct data network effects)
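The wake-word behaviour described above (listen constantly, transmit nothing, then stream the command together with the preceding half second) is commonly built on a small rolling buffer. The following is a sketch under stated assumptions, not the AVS client code: audio frames are plain strings, the frame length of 100 ms is assumed, and `is_wake_word` and `is_end_of_command` are hypothetical callbacks standing in for the on-device detector and for the stop signal that, per the description above, comes from the cloud.

```python
from collections import deque

FRAME_MS = 100                            # assumed frame length
PREROLL_MS = 500                          # the 0.5 seconds mentioned above
PREROLL_FRAMES = PREROLL_MS // FRAME_MS   # 5 frames


def capture(frames, is_wake_word, is_end_of_command):
    """Return the frames that would be streamed to the cloud.

    frames: iterable of audio frames (plain strings here, for illustration)
    is_wake_word / is_end_of_command: hypothetical detector callbacks
    """
    preroll = deque(maxlen=PREROLL_FRAMES)  # rolling buffer that stays on the device
    streamed = []
    streaming = False
    for frame in frames:
        if not streaming:
            if is_wake_word(frame):
                # Wake word heard: send the buffered pre-roll, then the wake word itself.
                streamed.extend(preroll)
                streamed.append(frame)
                streaming = True
            else:
                # Old frames fall out of the buffer and are never transmitted.
                preroll.append(frame)
        else:
            if is_end_of_command(frame):
                break  # stand-in for the cloud telling the device to stop capturing
            streamed.append(frame)
    return streamed


frames = ["tv1", "tv2", "tv3", "tv4", "tv5", "tv6", "tv7",
          "alexa", "play", "music", "<silence>", "private chat"]
sent = capture(frames,
               is_wake_word=lambda f: f == "alexa",
               is_end_of_command=lambda f: f == "<silence>")
```

In this run only the five frames before the wake word, the wake word and the command itself leave the device; the earlier frames and everything after the end of the command do not.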
Alexa Skills Kit (ASK)
The other significant interface for developers is the Alexa Skills Kit API (ASK). It gives developers a way to develop skills (=apps) for Alexa and get them distributed via the skills store – very similar to apps. Alexa’s customer value proposition expands with an increasing set of devices and skills.
- We can distinguish between skills linked to a physical device
- and skills that are “virtual” which are similar to smartphone apps, such as games, productivity, timers/alarms, calculator and much more
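To make the app analogy concrete, here is a minimal sketch of what the server side of a “virtual” skill might look like when hosted as a Lambda function and written against the raw JSON request and response, without any SDK. The intent name `GetWorkoutIntent`, the slot `day`, the helper `speak` and the workout plan are hypothetical; a real skill would also declare sample utterances for this intent and handle more request types than shown here.

```python
WORKOUTS = {  # hypothetical data for a "My Workout" style skill
    "monday": "20 push-ups and a 3 km run",
    "wednesday": "30 squats and 10 minutes of stretching",
}


def speak(text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the skill response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Entry point invoked with the skill request; speech recognition has
    already happened in the cloud, so the code only sees intents and slots."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        return speak("Welcome to My Workout. Which day?", end_session=False)
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "GetWorkoutIntent":
        slots = request["intent"].get("slots", {})
        day = (slots.get("day", {}).get("value") or "").lower()
        plan = WORKOUTS.get(day)
        if plan:
            return speak(f"On {day}, your workout is {plan}.")
        return speak("I have no workout for that day.")
    return speak("Sorry, I did not understand that.")


# Example request, shaped like what the skill would receive for "what is my workout on Monday"
event = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetWorkoutIntent",
            "slots": {"day": {"name": "day", "value": "Monday"}},
        },
    },
}
reply = lambda_handler(event)
```

Note that the handler never touches audio: it receives a structured intent and returns text for Alexa to speak, which is why the developer does not need any speech capabilities of their own.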
That said, I believe the biggest mistake would be to think that skills are merely voice-controlled apps.
- Sure, in some cases, voice control eases using the respective services. You don’t have to find your smartphone, unlock it, scroll to the respective app and then navigate to what you wish to do (provided you make it through the notifications and other distractions that will be in your way)
- This in itself is a great value proposition, and Alexa users report how different voice control is from using a smartphone
- And it opens creative, new opportunities similar to the way the Nintendo Wii did (but more powerfully so), maybe in the travel industry
- And of course, there will be things that are still better done on smartphones
Apple, realising that they have fallen behind on voice control, has recently announced that Siri (their voice assistant) allows apps to be accessed via voice control shortcuts. Apple’s equivalent of ASK is SiriKit for developers. Apple now has a new layer called “Your App Services” hosted on their cloud (but the architecture keeps app code within the app and not on the server).
The downsides of the chosen deep integration into AVS are also obvious:
- Lose your internet connection and all voice services and devices become unusable
- Concerns about privacy due to the amount of data captured on Amazon’s servers. I have been arguing for a while that these types of concerns are the biggest risks to the platform business model and that they often remain dormant until something triggers them to surface
(5) Network effects
If you are a regular reader of my posts, you know by now of the importance and power of network effects in combination with the platform business model. The architecture of the Alexa platform enables powerful network effects of which I want to call out only a few:
- Direct network effects on the user’s side: as users add devices to their personal Alexa ecosystem, its value grows. This is also an example of indirect cross-side network effects. Moreover, as the number of users increases, there is more incentive to develop devices due to the potential customer base
- Direct data network effects on Amazon’s side: as more users use Alexa voice control, Amazon’s machine learning algorithms get more data to improve the voice recognition, in turn enhancing the customer experience (and providing a consistent experience as opposed to every device maker coming up with their own voice control philosophy)
Many factors will determine who will succeed in the smart assistant race. Among those factors are product-related ones and non-product-related ones (such as business model, marketing, etc).
Among the product-related attributes, two of the most important determinants are
- the user experience, which will be enabled by the Alexa Voice Service, especially how well the voice recognition works
- the customer value proposition which will be influenced by the skills and devices available
More network effects
“One of the things that made Alexa so attractive to me is that once you have a device in the market, you have the resource of feedback. Not only the customer feedback, but the actual data that is so fundamental to improving everything, especially the underlying platform,” says Ravi Jain, an Alexa VP of machine learning. “So as more people used Alexa, Amazon got information that not only made that system perform better but supercharged its own machine-learning tools and platforms, and made the company a hotter destination for machine-learning scientists. The flywheel was starting to spin.”
If you want a leading indicator of what Amazon is focusing on in terms of Alexa’s development, look no further than the advertised jobs. Here are some of the leading categories (as of Oct 2018):
- Alexa AI: 180 open jobs
- Alexa shopping: 155
- Alexa Engine: 140
- Alexa Skills: 139
- Alexa Voice Services: 90
- Alexa Communications: 88
- Alexa Information: 73
- Alexa Smart Home: 62
- and more
This is on top of the >5,000 people reported to be working on Alexa as of Sept 2017.
(6) Business models
Now we are equipped with all we need to know to understand their business models. There is some overlap with, and some elements complementary to, what I have said about Amazon’s Kindle platform business model and Fire TV/Prime Video business model. Thus, I will be focusing on the differences.
Stimulation of consumption: a whole new shopping experience?
The most obvious business model element is shown in the figure below.
You may have noticed in the open jobs statistics above that Alexa shopping ranks second. As I mentioned further above, Alexa’s shopping functionality is limited at this stage. There is still massive potential.
It is one thing to reorder groceries, but browsing for books, apparel, furniture or electronics is a different use case. This will be an interesting test for voice control, as browsing for products, especially on a tablet, is a naturally fitting tactile experience. People don’t mind spending some time doing so. In this use case, clicking, tapping and swiping do not take much time, and it may not be easy to control such a process by voice.
It is one of the things for which tablets, smartphones and laptops are well suited. Trying to just mimic the known browsing experience may not do the job. It might be necessary to change the whole presentation and navigation layer of the shopping experience (e.g. it might be more useful to have different types of pages for voice-controlled shopping, similar to having different versions of the pages for desktop and mobile but potentially with more far-reaching differences). Amazon already has display card API functionality (and UX guides) that they can build upon, and can even use 3rd party screen devices via this API.
Due to their network, infrastructure and shopping value proposition, Amazon is well positioned to succeed in this important vertical.
Here’s now a quick summary of business models enabled, affected or enhanced by the Alexa platform:
- Stimulation of consumption: Greatly eases the reordering of previously purchased products; currently works well for Prime Now groceries (“Alexa, buy 1kg oranges”). Amazon is working on expanding this. Alexa may also bring new customers to Amazon and/or the Prime subscription
- Bundling/cross-/up-sell: all sorts of bundling, cross- and up-selling of Alexa-enabled devices with Echo, e.g. smart light bulb + smart plug + Echo Dot, smart thermostat and Echo Dot, and many other bundles; other implied bundles are Alexa and Prime subscription expanding available functionality
- Partnering business model: The integration of the Alexa interface into all sorts of devices is a partnering business model for the purpose of expanding the ecosystem (=Alexa’s value proposition) and gaining a larger installed base
- Razor-and-blade model v2.0: The required hardware is sold (likely) at low/no margin (Echo Plus 2G = $149; Apple HomePod = $349) in order to gain an installed base and to capture data that improves the voice recognition through machine learning
- Online retail business model: Well there is always the classic sales model for all the hardware components of the Echo and 3rd party Alexa-enabled devices
- Platform business model: a multi-sided platform with various supply-side actors increasing the value proposition and leveraging various network effects
- Premium/extensions retail model: more premium versions of the hardware at higher price points (e.g. Echo Plus) or other items
- Subscription business model: Amazon is exploring some subscription models, e.g. the cloud security cam. Now, this business model does not necessarily require Alexa, but there is the appeal of a smart home and also the ease of use (“Alexa drop in show security cam”, which would display it on the screen)
- Pay-per-use: I haven’t said much about this, but Amazon Web Services Lambda (a code execution platform paid for by usage only) is the way to host the server-side Alexa device code in the cloud. This affects basically all Alexa skills and thus Alexa-enabled 3rd party devices
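To give a feel for what pay-per-use means for a skill developer, here is a rough cost sketch. Lambda is metered by the number of requests and by memory-time (GB-seconds); everything else below is an assumption for illustration: the function name `monthly_cost` is made up, and the two rates are placeholder values, not quoted AWS prices, so check the current price list before relying on any number.

```python
def monthly_cost(invocations: int,
                 avg_seconds: float,
                 memory_gb: float,
                 price_per_request: float,
                 price_per_gb_second: float) -> float:
    """Usage-based bill: a per-request charge plus a charge for memory-time consumed."""
    compute_charge = invocations * avg_seconds * memory_gb * price_per_gb_second
    request_charge = invocations * price_per_request
    return compute_charge + request_charge


# Placeholder rates for illustration only (NOT actual AWS prices).
RATE_PER_REQUEST = 0.0000002
RATE_PER_GB_SECOND = 0.00002

# A skill handling one million voice requests a month, 0.2 s each, with 0.125 GB of memory.
busy_month = monthly_cost(1_000_000, 0.2, 0.125, RATE_PER_REQUEST, RATE_PER_GB_SECOND)

# A skill nobody uses costs nothing to keep available.
idle_month = monthly_cost(0, 0.2, 0.125, RATE_PER_REQUEST, RATE_PER_GB_SECOND)
```

The point of the model is the second case: with no usage there is no charge, which lowers the barrier for 3rd parties to publish skills and device integrations.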
There are many strategic considerations at play in conjunction with Alexa. It is not easy to predict which one will be the most important one.
- Traffic shift from other devices (laptop, pc, mobiles) to voice-controlled devices, due to a new user experience for some(?) or many(?) fields of application
- Various forms of direct and indirect network effects of the Alexa platform
- Multi-homing vs single-homing: will any of the smart assistant platforms become the predominant one (i.e. winner-take-most?) or, more likely, will they become complementary ways of accessing their respective sets of value propositions (=use cases)? I.e. people may be using Alexa for certain use cases, Google Assistant for others (related to smartphones, search) and Cortana for yet others (e.g. in relation to Microsoft Office use cases). In this context, how will switching barriers be designed?
- How and by whom will the smart assistant service be monetised (other than through data)? Where will the value be captured?
With all this said, I’d like to wrap up with a brief comparison among the three Amazon digital platform business models that we looked at over the last few weeks:
- Kindle: A relatively open platform with a high share of independent self-publishing authors
- Prime Video: Quite a strongly controlled platform where most of the content is owned/licensed by generally large commercial players who have many alternate distribution channels for their content
- Alexa: A platform fuelled by various network effects with the value proposition enhanced by the ecosystem of smart devices and a customer experience driven by the voice control
I hope you have found this journey as intriguing as I did!
This article by Murat Uenlue is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.