How to use Google speakable schema markup

Google speakable schema markup is structured data, currently in beta testing by Google, that designates the sections of a web page or article best suited to text to speech (TTS) playback by voice assistant devices. You can use it to flag a specific word, sentence or paragraph for playback.

Keep in mind, though, that Google or any other TTS tool may still play a different section of the page if it finds a better fit. You can follow this thread on GitHub to see how engineers discuss the latest trends in TTS schemas.

You may wonder why this type of schema markup matters for businesses. If you create and enable speakable schemas for your B2B resource center, people searching for answers by voice may hear your content as the response. For B2C businesses, whose end users often seek products, solutions and answers to their problems, speakable schemas signal to Google what to play.

These latest search engine optimization (SEO) developments resonate with the increasing popularity of audio content. According to Edison Research, three out of four Americans over 12 years old have listened to online audio in the previous month.  

Why did Google release the speakable markup schema?

Google released speakable structured data to help webmasters control which content plays on audio devices through TTS. For now, speakable markup works with Google Assistant only. However, other applications may pick up and play these sections as well.

Speakable schemas will allow web publishers to annotate sections of their content that are recommended for audio playback, enabling better accessibility, user experience and voice-based search experiences.

☝️Interested in all the benefits of structured data? Read our ultimate guide on schema markups.

Sample Google speakable markup schema

The table below lists the properties of this structured data, whether each one is required for the schema to be complete, and what each one is for:

| Property | Is this required? | Purpose |
| --- | --- | --- |
| Name | Yes | This property will contain the page name for the schema |
| Speakable | Yes | This property will sub-house the type, css selector or xPath properties |
| Type | Yes | This property will contain “SpeakableSpecification” by default to signal the correct schema type |
| cssSelector | Yes - or xPath (1 only) | This property will address the content in annotated pages |
| xPath | Yes - or cssSelector (1 only) | XPath is a language for navigating and selecting elements in XML or HTML documents. |

Sample schema:

{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "Sample Website Name – Blog Post",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["headline", "summary"]
  },
  "url": ""
}
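
The required properties from the table above can be checked mechanically before a page is published. The following is a minimal Python sketch, not an official validator: the function name `check_speakable` and its messages are hypothetical, and the `https://schema.org` context value is an assumption, since the sample above shows it empty.

```python
import json

def check_speakable(page):
    """Return a list of problems found in a WebPage dict with speakable markup."""
    problems = []
    if not page.get("name"):
        problems.append("missing name")
    spec = page.get("speakable")
    if not isinstance(spec, dict):
        problems.append("missing speakable")
        return problems
    if spec.get("@type") != "SpeakableSpecification":
        problems.append("speakable @type must be SpeakableSpecification")
    # The table asks for cssSelector or xPath, one only.
    has_css = bool(spec.get("cssSelector"))
    has_xpath = bool(spec.get("xPath"))
    if has_css == has_xpath:
        problems.append("use exactly one of cssSelector or xPath")
    return problems

# The sample schema from this article, with an assumed @context value.
sample = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "Sample Website Name – Blog Post",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["headline", "summary"]
  },
  "url": ""
}
""")
print(check_speakable(sample))  # prints []
```

An empty list means the sketch found nothing missing. It only checks the properties in the table and says nothing about whether Google will actually use the markup.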

What is the best Google schema markup generator tool?

The best way to generate speakable markup is to start from the published syntax. Presently, there is no tool that generates these schemas automatically, so you will have to write them manually.
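
If you maintain many pages, a short script can take some of the manual work out of writing the markup. The sketch below is one possible approach rather than an established tool: the function name `speakable_script_tag`, the example URL and the `.headline` and `.summary` class names are all hypothetical placeholders, and the `https://schema.org` context value is assumed.

```python
import json

def speakable_script_tag(name, url, css_selectors):
    """Build a JSON-LD <script> tag with speakable markup for one page."""
    data = {
        "@context": "https://schema.org",  # assumed context value
        "@type": "WebPage",
        "name": name,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": list(css_selectors),
        },
        "url": url,
    }
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so page text can never close the script element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"

tag = speakable_script_tag(
    name="Sample Website Name – Blog Post",
    url="https://www.example.com/blog-post",   # placeholder URL
    css_selectors=[".headline", ".summary"],   # hypothetical class names
)
print(tag)
```

The printed tag can be pasted into the page's HTML. You still choose the selectors by hand, so this only removes the typing, not the editorial decision.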

What sections of an article should be marked with speakable schema?

Mark up the sections of an article that best answer users’ likely questions or that best summarize the content of the webpage.

Focus on intent and the general questions being answered. There are usually one or two paragraphs that are critical for users, and these should be marked up.

Types of businesses that would benefit from speakable structured data markup

Here are the top types of businesses that could benefit from using speakable schemas for the text content on their web pages:

Restaurants

If users ask about reservation policies, recipes or trends in the food industry (assuming the restaurant’s blog covers these topics), the speakable schema makes these items easy to play back.

See a restaurant’s reservation policy example below. You can mark it up for users making a voice search.

[Image: restaurants using Google speakable schema]

Parcel shipping industry

Answers to questions about tracking, last-mile tracking, shipping and returns can be marked up with the schema and played to the user. For example, you could mark up the following text for voice enablement:

[Image: post-purchase company using speakable schema]


News publishers

There’s nothing like a morning coffee paired with news content read aloud to you. In fact, more and more people are getting their morning briefings from smart home devices. Research from the Reuters Institute indicates that 80% of media publishers are planning to invest more in audio content to meet this demand.

SaaS businesses

Software as a service (SaaS) tools cover productivity, task management, cybersecurity, inventory management and more. Speakable schemas help mark up the answers to questions users have about these businesses’ value propositions, provided that there is text on the page to answer them.

Customer service 

If your users search for definitions, key traits and examples of customer service, the marked-up section can be played to them. Below you can see how Zendesk could mark up the summary of their blog post “What is customer service” and have a voice assistant play that opening section as the answer.

[Image: Zendesk customer service Google speakable schema markup]

🗣️Read our guide to learn how text to speech tools enhance B2B marketing strategies.

Pair indirect voice assistant playback with live audio listening

With the direct listening integration provided by our TTS tool, your website’s users can listen to the precise content of their choice, with the ability to fast forward or play any single heading they want. See below how your SaaS business can use our TTS tool and learn about its features:

SaaS and TTS: Implementing text-to-speech for data-based user analysis

With Productive Shop’s TTS tool for WordPress and other CMS platforms, you can integrate a playback module within your page. 

[Image: Productive Shop's TTS tool]

TTS features

Productive Shop’s text-to-speech tool has the following features for your B2B and B2C website:

  • Aesthetics: Our tool allows for complete visual customization of the player to match your brand’s guidelines. 
  • Easy integration with WordPress: After you upload the plugin, individual audio snippets will be applied to posts.
  • Easy embedding within any other CMS: We are compatible with Webflow, Contentful, Sanity, Drupal etc.
  • Content tagging: Our tool also allows individual tagging of the audio files for easy content management. You can manage multiple sets of content based on your topical clusters. 
  • Premium AI voices: Choose the voice that best matches your brand.
  • Audio segments: Do you have many headings on your page? No problem. TTS automatically recognizes them and lets users select or skip the heading to listen to.
  • Analytics: Our TTS solution measures the number of playbacks of a given file, the articles that were played, the demographics of the users who listened etc.
  • Multiple users: This feature enables multiple marketing team members to be on the platform simultaneously to manage content.
  • Easy backups: TTS is a standalone app. If you migrate to another website platform (e.g., from Webflow to WordPress), the audio files that correspond to your blogs will not be lost.

 👉 Easily add an audio widget to your content with the TTS tool we’ve built for growing businesses. Get started now.

Frequently Asked Questions

What are the benefits of the audio widget for UX?

Users love to skim content. They’re more likely to leave the site when they see an endless scrollbar. An audio tool gives you another chance at retaining them, as busy users would often rather listen to a blog post than read it.

Let’s fly through the benefits of audio widgets for UX: 

  • Increased engagement: By providing an audio option, blogs can captivate and engage users differently. Combining visual and auditory elements can create a richer and more immersive experience, making the content more engaging and memorable.
  • Enhanced accessibility: The audio widget allows users with visual impairments or reading difficulties to access blog content more easily. Converting text into speech enables individuals to consume information through auditory means, making the content more inclusive and accessible to a wider audience.
  • Improved multitasking: TTS enables users to listen to blog posts while performing other tasks, such as driving, exercising or working. It frees them from the need to focus solely on reading, providing a convenient and time-saving option for consuming content.
  • Personalized experience: Audio widgets allow users to choose between reading or listening, catering to individual preferences. Users can switch between text and audio modes based on their preference, creating a more tailored and enjoyable browsing experience.
  • Better language comprehension: Listening to text can enhance language comprehension, allowing users to hear the pronunciation, intonation and emphasis that may not be conveyed through reading alone. Audio content can particularly benefit users learning a new language or those who prefer auditory learning styles.

What are the benefits of TTS for accessibility?

Text to speech (TTS) tools provide accessibility by converting text into spoken words, aiding individuals with visual impairments or learning disabilities. They also enhance multitasking capabilities as users can listen to content while engaging in other activities, promoting efficiency and convenience.

What is cssSelector?

cssSelector is a pattern-based syntax used in web development to target and select specific HTML elements for styling or interaction purposes. It allows precise control over element selection and is commonly employed in conjunction with CSS to apply styles or perform actions on specific elements within a web page.
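
To make this concrete, the sketch below mimics what a class selector such as `.summary` would pick out of a page. It is an illustration only: it uses Python's standard `html.parser` and handles a single class name, not full CSS selector syntax, and the page content, class names and helper names (`ClassTextCollector`, `text_for_class`) are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

class ClassTextCollector(HTMLParser):
    """Collect the text inside elements that carry a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

def text_for_class(html, class_name):
    """Return the text a selector like '.class_name' would cover."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    return " ".join(parser.chunks)

# Hypothetical page fragment for illustration.
page = """
<article>
  <h1 class="headline">Reservation policy</h1>
  <p class="summary">Tables are held for <b>15 minutes</b> past the booking time.</p>
  <p>Call us to change a booking.</p>
</article>
"""
print(text_for_class(page, "summary"))
# prints: Tables are held for 15 minutes past the booking time.
```

Running a check like this on your own page is a quick way to confirm that a selector covers only the sentences you want read aloud.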

What is xPath?

xPath is a language used in web development to navigate and locate elements in XML or HTML documents based on their structure, attributes or text content. It provides a powerful way to address and extract specific elements for manipulation or automation purposes.
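
As a rough illustration, the sketch below resolves two paths against a small page using Python's standard `xml.etree.ElementTree`. It is only an approximation of what a crawler does: ElementTree needs well-formed markup and supports just a subset of XPath, and the page content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page used only for illustration.
page = """<html>
  <head>
    <title>Shipping and returns</title>
    <meta name="description" content="How to track a parcel and send it back."/>
  </head>
  <body><p>Full policy text.</p></body>
</html>"""

root = ET.fromstring(page)  # root is the <html> element

# ElementTree paths are relative to the root element,
# so the absolute path /html/head/title becomes ./head/title here.
title = root.find("./head/title").text

# A predicate in square brackets filters elements by attribute value.
description = root.find("./head/meta[@name='description']").get("content")

print(title)        # prints: Shipping and returns
print(description)  # prints: How to track a parcel and send it back.
```

The two results are the kind of short, self-contained text that suits audio playback: a page title and a one-sentence description.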

Can speakable schema be multilingual? 

Yes. Speakable schema handles multilingual content by associating the spoken representation of the content with its corresponding language code. Webmasters can mark up different language versions of their content using the “speakable” property and provide the appropriate language code for each section. This way, voice assistants can deliver the content in the user’s desired language.

Momin Malik

Momin Malik is a Senior SEO Consultant and Project Manager with experience in optimizing search engine rankings for B2B SaaS clients. He believes a deep understanding of search engine algorithms and data-driven strategies is important to drive measurable results. Here he posts his musings to help readers understand Search and manage SEO and Web projects.
