PART 1 - Voice System Foundations
Chapter 1: Say Hello to Voice Systems
Chapter goal: Introduce the reader to voice-first technology, its core concepts, and typical phases of development through an explanatory background for the current state and challenges of voice.
No of pages - 20
Sub-topics
1. Voice-first, voice-only, and conversational everything
2. Introduction to voice technology components (Speech to text, Natural language understanding, Dialog management, Natural language generation, Text to speech)
3. The phases of voice development success (Plan, Design, Build, Test, Deploy & Assess, Iterate)
4. Hope is not a strategy - but to plan & execute is
Chapter 2: Keeping Voice in Mind
Chapter goal: Explain to the reader how humans and computers talk and listen. What's easy and hard for the human user and the technology in a dialog, and why.
No of pages - 15
Sub-topics
1. Why voice is different
2. Hands-on: A pre-coding thought experiment
3. Voice dialog and its participants
* The Human: spoken natural language understanding
* The Computer: voice system recognition and interpretation
* Human-computer voice dialog - Successful voice-first development is all about coordinating human abilities with the technology to allow conversations between two very different dialog participants.
Chapter 3: Running a Voice Implementation - and Noticing Issues
Chapter goal: Allow the reader to put into practice their newly learned foundation by implementing and running a simple voice application in the Google Assistant framework, and experiencing how quickly even a simple voice interaction needs improvement.
No of pages - 15
Sub-topics
1. Hands-on: Preparing a restaurant finder
2. Introducing voice platforms
3. Hands-on: Implementing the restaurant finder (Basic setup, Specifying a first intent, Doing something, What the user says, What the VUI says, Connecting Dialogflow to Actions on Google, Testing the app, Saving the voice interaction)
4. Google's voice development ecosystem, and why we're using it here
5. The pros and cons of relying on tools
6. Hands-on: Making changes - testing and iterating (Adding phrases to handle the same meaning, additional content, and more specific)
PART 2 - Planning Voice System Interactions
Chapter 4: Defining Your Vision: Building What, How, and Why for Whom
Chapter goal: Introduce voice-focused requirement discovery, highlighting differences from other modalities and devices and showing
No of pages - 25
Sub-topics
1. Functional requirements: What are you building? (General and detailed functionality)
2. Non-functional business requirements: Why are you building it? (Purpose, underlying service and existing automation, branding and terminology, data needs, access and availability, legal and business constraints)
3. Non-functional user requirements: Who will use it and what do they want? (User population demographics and characteristics, engagement patterns, mental models and domain knowledge, environment and state of mind)
4. Non-functional system requirements: How will you build it? (Available options for recognizer, parser, and interpreter, external data sources, data storage and data access, other system concerns)
Chapter 5: From Discovery to UX and UI Design: Tools of the Voice-First Trade
Chapter goal: Show how to turn discovery findings into high-level architectural designs, using flow diagrams, sample dialogs, and detailed dialog management specs.
No of pages - 20
Sub-topics
1. Where to find early user data on any budget (online research, crowd sourcing, dialog participant observation, focus groups, interviews, and surveys)
2. How discovery results feed into VUI design decisions (dialog manager graphs)
3. Capturing and documenting VUI design (dialog flows, sample dialogs, detailed design specifications, VUI design documentation approaches)
4. Prototyping and testing your assumptions (early voice UX and prototyping approaches)
PART 3 - Building Voice System Interactions
Chapter 6: Applying Human 'Rules of Dialog' to Reach Conversation Resolution
Chapter goal: Learn that voice-first dialogs have resolutions. Learn how to design and implement fully specified requests in the 3 core dialog types: question-answer, action requests, and task completion requests.
No of pages - 30
Sub-topics
1. Dialog acts, games and turns - and Grice
2. Question answering
3. Action requests
4. Task completion requests
5. Fully specified request (Single slot and Multi-slot requests)
6. Determining dialog acts based on feature discovery
7. Dialog completion (Responding to 'goodbye' and 'thanks')
Chapter 7: Resolving Incomplete Requests Through Disambiguation
Chapter goal: Explain how to handle incomplete and ambiguous requests, including common disambiguation methods (yes/no, A/B sets, lists and menus) and when to apply each.
No of pages - 30
Sub-topics
1. Incomplete requests - how to reach completeness
2. Ambiguous requests
3. Disambiguation methods (Logic-based assumptions, Yes/No questions, A/B sets, Static lists, Dynamic lists, Open sets, Menus)
4. Testing on the device to find and solve issues
5. Toward code independence: using webhooks (fulfillment, contexts, context parameters, and follow-up)
Chapter 8: Conveying Reassurance with Confidence and Confirmation
Chapter goal: Teach the importance of conveying reassurance and how to apply different confirmation strategies. Introduce discourse markers and backchannels.
No of pages - 30
Sub-topics
1. Conveying reassurance and shared certainty - Setting expectations
2. Webhooks, Take 2 (Dialogflow system architecture, webhook request and response, Implementing the webhook)
3. Confirmation methods (Non-verbal confirmation, Generic acknowledgment, Implicit and Explicit confirmations)
4. Confirmation placement - confirming slots versus intents
5. Disconfirmation: dealing with no
6. Additional reassurance techniques and pitfalls (System pronunciation, Backchannels, Discourse markers, VUI architecture)
7. Choosing the right reassurance method
Chapter 9: Helping Users Succeed Through Consistency
Chapter goal: Explore how to navigate an audio interaction that is by nature fleeting and sequential. Provide design and implementation that incorporates consistency through correctly scoped global commands, landmarks, non-verbal audio.
No of pages - 20
Sub-topics
1. Universals (Uses: clarification and additional information, allow a do-over, provide an exit)
2. Navigation (Landmarks, Non-verbal audio, Content playback navigation, List navigation)
3. Consistency, variation and randomization (built-in global intents, consistency across VUIs and frameworks)
Chapter 10: Creating Robust Coverage for Speech-to-Text Resolution
Chapter goal: Teach the nuts and bolts of the computer-side of listening, starting with the mapping of sounds to words and how to create solid synonym coverage. Topics include different approaches to recognition, including regular expressions and statistical models, dictionaries, domain knowledge, normalizing, and bootstrapping.
No of pages - 25
Sub-topics
1. Recognition is speech-to-text interpretation
2. Recognition engines
3. Grammar concepts (Coverage, Recognition space, Static or dynamic, End-pointing, Multiple hypotheses)
4. Types of grammars (Rule-based grammars, Statistical models, Hot words, Wake words and invocation names)
5. Working with grammars (Writing rule-based regular expressions)
6. How to succeed with grammars (Bootstrapping, Normalizing punctuation and spellings, Handling unusual pronunciations, Using domain knowledge, the strengths and limitations of STT)
7. A simple example (Sample phrases in Dialogflow, Regular expressions in the webhook)
8. Limitations on grammar creation and use
Chapter 11: Reaching Understanding Through Parsing and Intent Resolution
Chapter goal: Explore the second part of computer listening: interpreting the meaning. Topics cover intent resolution, parsing and multiple passes, the use of tagging guides and middle layers.
No of pages - 20
Sub-topics
1. From words to meaning (NLP, NLU)
2. Parsing
3. Machine learning and NLU
4. Ontologies, knowledge bases and content databases
5. Intents (Intent tagging and tagging guides, Middle layers: semantic tags versus system endpoints)
6. Putting it all together (Matching wide or narrow, Multiple grammars, multiple passes)
7. A simple example (The Stanford Parser revisited, Determining intent, Machine learning and using knowledge)
Chapter 12: Applying Accuracy Strategies to Avoid Misunderstandings
Chapter goal: Explain how misunderstandings happen and how to avoid them through techniques that minimize errors and the need to start over. Topics include design and implementation of a wide set of robustness techniques, including powerful advanced techniques.
No of pages - 25
Sub-topics
1. Accuracy robustness underlying concepts
2. Accuracy robustness strategies (Examples, Providing help, Just-in-time information, Hidden options and none of those, Recognize-and-reject, One-step-correction, Tutorials, Spelling, Narrowing recognition space)
3. Advanced techniques (Multi-tiered behavior and confidence scores, N-best and skip lists, Probabilities, Contextual latency)
Chapter 13: Choosing Strategies to Recover from Miscommunication
Chapter goal: Explore how to recover when miscommunication happens. Show how to recover and get users back on track quickly, and when to stop trying. Topics include design and implementation of several recovery strategies.
No of pages - 15
Sub-topics
1. Recovery from what?
2. Recovery strategies (Meaningful contextual prompts, Escalating prompts, Tapered prompts, Rapid reprompt, Backoff strategies)
3. When to stop trying (Max error counts, Transfers)
4. Choosing recovery strategy (Recognition, intent, or fulfillment errors)
Chapter 14: Using Context and Data to Create Smarter Conversations
Chapter goal: Explain why context is king in spoken conversation. Show how to access and update data from various sources, and how to use that data within and across dialogs to create smarter interactions. Topics focus on how to design and implement context-aware dialogs using anaphora, proactive behaviors, proximity, geo-location, domain knowledge, and other powerful methods.
No of pages - 25
Sub-topics
1. Why there's no conversation without context
2. Reading and writing data (External accounts and services)
3. Persistence within and across conversations
4. Context-aware and context-dependent dialogs (Discourse markers and acknowledgments, Anaphora resolution, Follow-up dialogs and linked requests, Proactive behaviors, Topic, domain and world knowledge, Geo location-based behavior, Proximity and relevance, Number and type of devices, Time and day, User identity, preferences and account types, User utterance wording, System conditions)
5. Tracking context in modular and multiturn dialogs
Chapter 15: Creating Secure Personalized Experiences
Chapter goal: Cover personalization and customization. Topics include identification, authentication, privacy and security concerns, system persona audio, and working with TTS versus recorded prompts.
No of pages - 25
Sub-topics
1. The importance of knowing who's talking
2. Individualized targeted behaviors (Concepts in personalization and customization, Implementing individualized experiences)
3. Authorized secure access
4. Approaches to identification and authentication (Implementing secure gated access)
5. Privacy and security concerns
6. System persona (Defining and implementing a system persona, How persona affects dialogs)
7. System voice audio (TTS or voice talent, generated or recorded, Finding and working with voice talents, One or several voices, Prompt management)
8. Emotion and style
9. Voice for specific user groups
PART 4 - Verifying and Deploying Voice System Interactions
Chapter 16: Testing and Measuring Performance in Voice Systems
Chapter goal: Explain the do's and don'ts of QA testing a voice system. Topics include user testing methods that work best for voice, the code needed to support them, and how to improve system performance based on findings.
No of pages - 20
Sub-topics
1. Testing voice system performance (Recognition testing, Dialog traversal: functional end-to-end testing, Wake word and speech detection testing, Additional system integration testing)
2. Testing usability and task completion (Voice usability testing concepts, Wizard of Oz studies)
3. Tracking and measuring performance (Recognition performance metrics, Task completion metrics, User satisfaction metrics)
Chapter 17: Tuning and Deploying Voice Systems
Chapter goal: Show how to improve, or tune, voice solutions before and after deploying a voice system. Teach what real user data says about the system performance, what to log and track, how to measure accuracy, and how to interpret the data.
No of pages - 25
Sub-topics
1. Tuning: what is it and why do you do it? (Why recognition accuracy isn't enough, Analyzing causes of poor system performance)
2. Tuning types and approaches (Log-based versus transcription-based tuning, Coverage tuning, Recognition accuracy tuning, Finding and using recognition accuracy data, Task completion tuning, Dialog tuning, Prioritizing tuning efforts)
3. Mapping observations to the right remedy (Reporting and using tuning results)
4. How to maximize deployment success (Know when to tune, Understand tuning complexities to avoid pitfalls)