{"id":11895,"date":"2020-11-02T15:24:36","date_gmt":"2020-11-02T14:24:36","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=11895"},"modified":"2022-01-05T20:06:14","modified_gmt":"2022-01-05T19:06:14","slug":"voice-control","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/","title":{"rendered":"Voice Control: Building Your Voice Assistant"},"content":{"rendered":"\n\n\t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-e52a53a3      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tTable Of Contents\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#speech-recognition-in-the-real-world\" class=\"uagb-toc-link__trigger\">Speech recognition in the real world<\/a><li class=\"uagb-toc__list\"><a href=\"#the-search-for-voice-control\" class=\"uagb-toc-link__trigger\">The search for voice control<\/a><li class=\"uagb-toc__list\"><a href=\"#a-practical-voice-control-implementation\" class=\"uagb-toc-link__trigger\">A practical voice control implementation<\/a><li class=\"uagb-toc__list\"><a href=\"#going-beyond-simple-voice-recognition\" class=\"uagb-toc-link__trigger\">Going beyond simple voice recognition<\/a><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\n\n\n<p><span id=\"urn:enhancement-47c3b3a\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speaker_recognition\">Voice control<\/span> was the stuff of science fiction throughout the 20th Century. But in the last two decades, voice control has entered the mainstream. 
Voice assistants like <span id=\"urn:enhancement-5ef88194\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/siri\">Siri<\/span> and <a href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/machine-learning-dev\/how-to-teach-alexa-to-pronounce-your-name\/\" class=\"ek-link\">Alexa<\/a> are embedded in home devices, headphones, and even cars.&nbsp;<\/p>\n\n\n\n<p>But, how did we get to this point? What is the connection with <strong>machine <span id=\"urn:enhancement-6860b6a0\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span><\/strong> at the network edge? And how can you create your own voice-activated <span id=\"urn:enhancement-2ab67dc9\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/edge_device\">edge device<\/span>? This article, part of a series on machine <span id=\"urn:enhancement-19193f45\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span> at the edge, answers all these questions and more.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-speech-recognition-in-the-real-world\">Speech recognition in the real world<\/h2>\n\n\n\n<p><strong><span id=\"urn:enhancement-bda4862f\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speaker_recognition\">Voice control<\/span><\/strong> always fascinated futurologists and Sci-Fi authors alike. Back when it was first proposed, it must have seemed like a distant dream. But over the past decade, voice control has become routine and mainstream. 
This is thanks to a combination of <span id=\"urn:enhancement-2bb28cde\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/risk_factor_computing\">factors<\/span>.&nbsp;<\/p>\n\n\n\n<p>These include advances in <em><span id=\"urn:enhancement-dcdb9fe5\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span><\/em> and <em><span id=\"urn:enhancement-7990bbb4\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">natural language processing<\/span><\/em>, the availability of powerful <span id=\"urn:enhancement-2063998d\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computers<\/span> for <span id=\"urn:enhancement-2baee8c4\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/machine_learning\">machine learning<\/span>, and the growth in high-power edge devices. 
Nowadays, we can see examples of voice control all around us.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-virtual-voice-assistants\">Virtual voice assistants<\/h3>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img decoding=\"async\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/foto_blog_69-1024x725.jpg\" alt=\"virtual assistant: chatbot\" class=\"wp-image-8070\"\/><\/figure><\/div>\n\n\n\n<p><strong>Virtual assistants<\/strong>, such as <span id=\"urn:enhancement-45dd0c65\" class=\"textannotation disambiguated wl-organization\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/alexa_internet\">Alexa<\/span>, <span id=\"urn:enhancement-3bb19357\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/siri\">Siri<\/span> and <span id=\"urn:enhancement-ff08abfe\" class=\"textannotation disambiguated wl-organization\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/google\">Google<\/span> <span id=\"urn:enhancement-6e7c0685\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/google_now\">Assistant<\/span>, have driven a huge take-up in voice <span id=\"urn:enhancement-de0cb81b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/control_system\">control systems<\/span>. In essence, a virtual assistant listens to your instructions and acts on these. For <span id=\"urn:enhancement-88cfe33d\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/instance_computer_science\">instance<\/span>, you can ask it to play music, to tell you the weather, or to navigate you to your destination.&nbsp;<\/p>\n\n\n\n<p>In general, these virtual assistants all work similarly. 
They require a suitable <span id=\"urn:enhancement-9248401\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/edge_device\">edge device<\/span> with network connectivity and a powerful <span id=\"urn:enhancement-2d03c834\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/front_and_back_ends\">backend<\/span>. Typically, the <span id=\"urn:enhancement-576a62ec\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/edge_device\">edge device<\/span> may be a smartphone, a smart speaker, or, increasingly, some other <span id=\"urn:enhancement-d0f3ad67\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/information_appliance\">device<\/span> like a TV or pair of headphones.&nbsp;<\/p>\n\n\n\n<p>The <span id=\"urn:enhancement-e05f06b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/edge_device\">edge device<\/span> just tries to detect a \u201cwake word\u201d. It sends the <span id=\"urn:enhancement-5d714ea2\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/voice_message\">voice message<\/span> to the <span id=\"urn:enhancement-dd9e077c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/front_and_back_ends\">backend<\/span> for <span id=\"urn:enhancement-8bec8b82\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_processing\">processing<\/span> and then handles the result that gets returned. This process clearly depends on a good <span id=\"urn:enhancement-e625b57b\" class=\"textannotation disambiguated wl-organization\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/internet_access\">Internet connection<\/span>. 
However, recent improvements in edge technology mean more and more <span id=\"urn:enhancement-e4ff584c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/function_engineering\">functionality<\/span> can be kept on the <span id=\"urn:enhancement-7760fc4d\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/information_appliance\">device<\/span>.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-cars\">Cars<\/h3>\n\n\n\n<p>Driver distraction is one of the biggest causes of deaths and injuries on our roads. As a result, car manufacturers have invested billions into driver aids designed to help reduce distractions. One of the most powerful is adding voice control to cars. This allows the driver to interact with the infotainment system in a totally <span id=\"urn:enhancement-162e3edc\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/handsfree\">hands-free<\/span> manner.&nbsp;<\/p>\n\n\n\n<p>Unlike the voice assistants above, such <span id=\"urn:enhancement-fb50c000\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> cannot rely on having network connectivity. 
As a result, all the <span id=\"urn:enhancement-7955eede\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">voice recognition<\/span> and processing must be done within the <span id=\"urn:enhancement-a64efc9e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/edge_device\">edge device<\/span>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-the-search-for-voice-control\">The search for voice control<\/h2>\n\n\n\n<p>As mentioned above, voice control grew out of advances in <span id=\"urn:enhancement-941fa029\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> and <span id=\"urn:enhancement-26caf14e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">NLP<\/span> along with increased <span id=\"urn:enhancement-d94182d8\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer_performance\">computing power<\/span>. Creating functional voice control required <span id=\"urn:enhancement-b87ab50e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer_science\">computer scientists<\/span> to solve a number of problems.&nbsp;<\/p>\n\n\n\n<p>Firstly, how do you record a person speaking and convert this into text? Next, how do you parse that text to extract the meaning? 
Finally, how do you work out the correct response?&nbsp;<\/p>\n\n\n\n<p>These problems have interested <span id=\"urn:enhancement-f46b6b6a\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer_science\">computer scientists<\/span> since long before the invention of the modern <span id=\"urn:enhancement-c5d8c949\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span>. Indeed, the idea of teaching <span id=\"urn:enhancement-4604d156\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computers<\/span> to understand humans dates back to the earliest days of <span id=\"urn:enhancement-c9b9113b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computing\">computing<\/span>. Each of these problems required a different solution.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-speech-recognition\">Speech recognition<\/h3>\n\n\n\n<p>In the 1990s, more and more people got access to <span id=\"urn:enhancement-bfd4c89b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computers<\/span> at work and at home. Few of these people were able to type, though. 
So, a lot of effort was invested in creating <span id=\"urn:enhancement-65231dd\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> that would allow a human to dictate to a <span id=\"urn:enhancement-c22c1d2b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span>.&nbsp;<\/p>\n\n\n\n<p>This process of converting your voice into <span id=\"urn:enhancement-3fde9c3a\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/word_computer_architecture\">words<\/span> on the screen is known as speech-to-text or <span id=\"urn:enhancement-f584dbb\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span>.&nbsp;<\/p>\n\n\n\n<p>The earliest <span id=\"urn:enhancement-70d928e7\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> systems were created in the 1950s. They were able to distinguish single spoken digits. However, it was not until the late 1960s that <span id=\"urn:enhancement-5603090e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> became a serious research area. 
By the 1970s, <span id=\"urn:enhancement-938cba48\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> were being developed that could recognise longer <span id=\"urn:enhancement-57798ffa\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/word_computer_architecture\">words<\/span> and even phrases.<\/p>\n\n\n\n<p>The real breakthrough came with the <span id=\"urn:enhancement-ca6ddfc0\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/application_software\">application<\/span> of hidden Markov models to the problem. By 1987, statistical language modelling had added the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Katz%27s_back-off_model\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Katz back-off model<\/a>, an n-gram technique that helped make practical <span id=\"urn:enhancement-6b55a1d2\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> possible on <span id=\"urn:enhancement-9a9d3026\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computers<\/span> or specialised processors.<\/p>\n\n\n\n<p>Throughout the next decade, <span id=\"urn:enhancement-b029eafc\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> got better and better at <span id=\"urn:enhancement-58f524ec\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> across multiple languages. 
By the mid-1990s, the <span id=\"urn:enhancement-2aa9413d\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/technology\">technology<\/span> had advanced sufficiently for companies to start selling commercial <span id=\"urn:enhancement-287c4680\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> systems.&nbsp;<\/p>\n\n\n\n<p>Early <span id=\"urn:enhancement-ba7d234f\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> often relied on training the <span id=\"urn:enhancement-ffcaa72e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">system<\/span> to recognise a single voice. This was achieved by asking the <span id=\"urn:enhancement-3c2a8047\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/user_computing\">user<\/span> to read out a specific passage of text. This text included all the possible phonemes, <span id=\"urn:enhancement-5da9eade\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/part_of_speech\">parts of speech<\/span>, etc. 
to allow the <span id=\"urn:enhancement-7803c163\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">system<\/span> to <span id=\"urn:enhancement-c50dc7ee\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learn<\/span> that <span id=\"urn:enhancement-1ac4adba\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/user_computing\">user<\/span>\u2019s voice.&nbsp;<\/p>\n\n\n\n<p>More recently, we have seen <span id=\"urn:enhancement-930aa821\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/speech_recognition\">speech recognition<\/span> applying machine <span id=\"urn:enhancement-4158354c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span> approaches to <span id=\"urn:enhancement-e68c89c4\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learn<\/span> to understand different accents. This avoids the classic problem where the <span id=\"urn:enhancement-d11f8eaa\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span> is unable to understand someone with a strong accent. Modern <span id=\"urn:enhancement-78f04a0c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> are now able to understand multiple regional and national accents.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-natural-language-processing\">Natural Language Processing<\/h3>\n\n\n\n<p>Of course, just being able to write down what a person says is not enough. 
Voice control also requires the <span id=\"urn:enhancement-30468a28\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span> to understand what is said. This is a much harder problem known as <span id=\"urn:enhancement-177b6523\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">natural language processing<\/span> or <span id=\"urn:enhancement-75499a6f\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">NLP<\/span> for short. Here, the <span id=\"urn:enhancement-daf6272b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span> must learn what you actually meant. If you have learned a foreign language, you will know how hard this can be.&nbsp;<\/p>\n\n\n\n<p>The problem is, human language depends on all sorts of <span id=\"urn:enhancement-4a072105\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/risk_factor_computing\">factors<\/span>. Context, emotion, knowledge, and idiom all change the meaning of a sentence. Often, there are many ways to say the same thing, even for something as simple as making a phone call. \u201cI\u2019m phoning my parents\u201d, \u201cI\u2019m calling home\u201d, \u201cI must give my dad a call\u201d, etc. 
<span id=\"urn:enhancement-21dc5c3c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">NLP<\/span> is the process of teaching a <span id=\"urn:enhancement-2a7d7347\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computer<\/span> about the structure and meaning of human language.<\/p>\n\n\n\n<p>For decades, <span id=\"urn:enhancement-81b97b6e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/natural_language_processing\">NLP<\/span> was a theoretical field. <span id=\"urn:enhancement-6721dd43\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">Computers<\/span> simply weren\u2019t powerful enough to solve the problem. Nowadays, <span id=\"urn:enhancement-b28ac3ea\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/computer\">computers<\/span> are getting better and better at it. This is largely down to improvements in machine <span id=\"urn:enhancement-ce1a47e3\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span>, especially <a href=\"https:\/\/www.codemotion.com\/magazine\/tag\/deep-learning\/\" class=\"ek-link\">deep learning<\/a>. 
The latest approaches combine several different machine <span id=\"urn:enhancement-13932134\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span> <span id=\"urn:enhancement-c621e89c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/technology\">technologies<\/span>.&nbsp;<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Supervised learning<\/strong> from large corpora of recorded and annotated speech.&nbsp;<\/li><li><strong>Reinforcement learning<\/strong> to improve performance based on human feedback.&nbsp;<\/li><li><strong>Transfer learning<\/strong> to allow data scientists to finetune existing models, such as <a href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/machine-learning-dev\/bert-how-google-changed-nlp-and-how-to-benefit-from-this\/\" class=\"ek-link\">BERT<\/a>, <a href=\"https:\/\/allennlp.org\/elmo\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">ELMO<\/a>, or <a href=\"https:\/\/openai.com\/blog\/better-language-models\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">GPT-2<\/a>.<\/li><\/ul>\n\n\n\n<p>The resulting <span id=\"urn:enhancement-34ec7b89\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> can understand more and more human language.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-decision-making\">Decision making<\/h3>\n\n\n\n<p>The final requirement for a voice <span id=\"urn:enhancement-6718ed10\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/control_system\">control system<\/span> is deciding how to respond to the <span id=\"urn:enhancement-27925ff0\" class=\"textannotation disambiguated wl-thing\" 
itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/user_computing\">user<\/span>. In other <span id=\"urn:enhancement-6e5a04ec\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/word_computer_architecture\">words<\/span>, what action should the <span id=\"urn:enhancement-a1f771f8\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">system<\/span> actually take? There are many approaches for this. In simple <span id=\"urn:enhancement-6a69085f\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span>, you could use a rules engine. That is simply a list of actions to take given a set of input conditions.&nbsp;<\/p>\n\n\n\n<p>Many voice assistants use a variation of this. For <span id=\"urn:enhancement-eda73ea0\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/instance_computer_science\">instance<\/span>, <span id=\"urn:enhancement-47a2bce9\" class=\"textannotation disambiguated wl-organization\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/amazon-com\">Amazon<\/span> <span id=\"urn:enhancement-54ae910\" class=\"textannotation disambiguated wl-organization\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/alexa_internet\">Alexa<\/span> allows you to write your own Skills. Here, you are able to specify what you expect a <span id=\"urn:enhancement-fc47fb64\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/user_computing\">user<\/span> to say and provide the appropriate response.<\/p>\n\n\n\n<p>However, for voice assistants the range of possible instructions is completely open-ended. 
So, increasingly reinforcement learning and unsupervised <span id=\"urn:enhancement-d7751b19\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/learning\">learning<\/span> are used to allow the <span id=\"urn:enhancement-270c02c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">system<\/span> to react to the instructions it hears.&nbsp;<\/p>\n\n\n\n<p>So, now you understand a bit about how voice recognition and voice control work. But what about applying this in practice? For the rest of this article, I will explain how you can actually implement a simple voice controller.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-a-practical-voice-control-implementation\">A practical voice control implementation<\/h2>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"575\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-1024x575.jpg\" alt=\"voice control\" class=\"wp-image-11898\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-1024x575.jpg 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-300x169.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-768x431.jpg 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-896x504.jpg 896w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control-400x225.jpg 400w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/voice-control.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>This example is based on a simple voice controller model that can recognise the words \u2018yes\u2019 and \u2018no\u2019. 
The model is created in <span id=\"urn:local-annotation-835135\" class=\"textannotation disambiguated\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/tensorflow\"><a href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/machine-learning-dev\/tensorflow-furthers-the-development-of-machine-learning\/\" class=\"ek-link\">TensorFlow<\/a><\/span> and ported to <a href=\"https:\/\/www.codemotion.com\/magazine\/dev-hub\/machine-learning-dev\/interview-simone-scardapane-lets-all-discover-tensorflow-eager-and-tensorflow-lite\/\" class=\"ek-link\">TensorFlow Lite<\/a>, allowing it to run on low-power edge devices.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-hardware-requirements\">Hardware requirements<\/h3>\n\n\n\n<p>We are going to use an <a href=\"https:\/\/www.mouser.com\/new\/infineon\/infineon-xmc4700-eval-kits\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Infineon XMC4700<\/a> Relax development kit for the implementation. This kit is based on an ARM\u00ae Cortex\u00ae-M4 core running at 144MHz, with 352KB RAM and 2MB of flash memory. The board provides an Arduino shield header, making it easy to add peripheral devices once you have soldered headers into place.<\/p>\n\n\n\n<p>Obviously, since this is a voice controller, the first requirement is to add a microphone to the board. You can choose pretty much any Arduino shield with a microphone. I am going with the Infineon <a href=\"https:\/\/www.mouser.com\/new\/infineon\/infineon-s2go-memsmic-im69d\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">S2GO MEMSMIC IM69D Shield2Go<\/a>, which provides two MEMS microphones on an Arduino Uno shield. 
Before you can use the shield, you will need to solder on headers.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-software-requirements\">Software requirements<\/h3>\n\n\n\n<p>The software for this project can be found on <a href=\"https:\/\/github.com\/Mouser-Electronics\/TensorFlowLite-Infineon\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Mouser\u2019s GitHub<\/a>. Clone the repository and open the Software folder. Here, you will find all the files you need for the project. This includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The sample data for the commands \u2018yes\u2019 and \u2018no\u2019 (based on fast fourier transforms of captured speech samples)<\/li><li>The actual model for recognising the commands&nbsp;<\/li><li>A responder to handle the resulting decision<\/li><li>A main file to link all the pieces.<\/li><\/ul>\n\n\n\n<p>In addition, you will need a suitable development toolchain, such as the <a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.arduino.cc\/en\/main\/software\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"><span id=\"urn:local-annotation-408377\" class=\"textannotation disambiguated\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/arduino_2\">Arduino<\/span> IDE<\/a> or <a aria-label=\" (opens in a new tab)\" href=\"https:\/\/infineoncommunity.com\/dave-download_ID645\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">Infineon DAVE IDE<\/a>. I\u2019m going with the Arduino IDE. This means I need to add the correct Infineon XMC Library.&nbsp;<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Go to <strong>Preferences<\/strong> and find the entry for \u201cAdditional Boards Manager URLs\u201d. 
Paste <em>https:\/\/github.com\/Infineon\/XMC-for-Arduino\/releases\/latest\/download\/package_infineon_index.json<\/em> and click OK.&nbsp;<\/li><li>Click <strong>Tools<\/strong> &gt; <strong>Board: \u201cArduino Uno\u201d<\/strong> &gt; <strong>Boards Manager<\/strong>. Note: by default you will see \u2018Board: \u201cArduino Uno\u201d\u2019, but if you have already used the IDE, you will see the last board family you selected.<\/li><li>Enter XMC in the search box and press Enter.&nbsp;<\/li><li>You should see an entry for \u201cInfineon\u2019s XMC Microcontroller\u201d. Click <strong>Install<\/strong>.<\/li><li>Once the install is completed, click <strong>Close<\/strong>.<\/li><\/ol>\n\n\n\n<p>Now you are able to set the correct board in the IDE. To do this, go to <strong>Tools<\/strong> &gt; <strong>Board: \u201cArduino Uno\u201d<\/strong> &gt; <strong>XMC Family<\/strong> &gt; <strong>XMC4700 Relax Kit<\/strong>.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"424\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1-1024x424.png\" alt=\"Implementing voice control on Arduino\" class=\"wp-image-11897\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1-1024x424.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1-300x124.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1-768x318.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1-1536x635.png 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/image1.png 1944w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>The last thing you need is the SEGGER J-Link software. This will allow you to access the onboard debugger and programmer on the XMC4700 Relax Kit. 
The software can be found <a href=\"https:\/\/www.segger.com\/downloads\/jlink\/#J-LinkSoftwareAndDocumentationPack\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">here<\/a>.&nbsp;&nbsp;<\/p>\n\n\n\n<p>With all that done, you are ready to build the project.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Building the project<\/h3>\n\n\n\n<p>Fortunately, the software you cloned from the Mouser GitHub repository has been written specifically for this board. This means compiling the software is pretty simple. First, connect the board to your computer using the debug Micro-USB port near the RJ45 jack. Go to <strong>Tools<\/strong> &gt; <strong>Port<\/strong> and make sure the correct port is selected.&nbsp;<\/p>\n\n\n\n<p><em>Note:&nbsp; You may need to identify which serial port the board is connected to. On Windows, look for the COM port in the Device Manager. On macOS, type <\/em><em>ls \/dev\/tty.*<\/em><em> in the terminal and look for the correct port in the list.<\/em><\/p>\n\n\n\n<p>Second, you need to rename the main-functions.cc file to voice-control.ino. You also need to rename the Software folder to voice-control. The Arduino IDE expects a sketch\u2019s .ino file to sit in a folder of the same name, so these two renames allow it to recognise the folder as a project.<\/p>\n\n\n\n<p>Third, open the renamed voice-control project in the Arduino IDE. Go to <strong>File<\/strong> &gt; <strong>Open<\/strong> and browse to the correct folder. Select voice-control.ino and click <strong>Open<\/strong>.<\/p>\n\n\n\n<p>The final step is compiling and uploading the software. This is really simple: just go to <strong>Sketch<\/strong> &gt; <strong>Upload<\/strong> and (all being well) the software will be compiled and flashed to your board.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Testing voice control<\/h3>\n\n\n\n<p>Make sure the microphone shield is connected to the XMC4700 board. Plug the board into a USB power source (or your computer). Allow the board to fully power up. 
Now say the word \u2018yes\u2019 into the microphone. LED1 on the board should light up for 3 seconds. Then say the word \u2018no\u2019. This time, LED2 should light up for 3 seconds. If nothing is heard, or the system can\u2019t recognise what is said, no LEDs will light up. If you want to use this as a real controller, you can easily modify command-responder.cc.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Going beyond simple voice recognition<\/h2>\n\n\n\n<p>All the practical examples we have looked at so far in this series have been relatively simple. In the next article, I will look at more powerful hardware platforms that take edge ML to the next level. These platforms are specifically designed for running AI applications at the edge. They include:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Google\u2019s <a href=\"https:\/\/coral.ai\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Coral TPU<\/a>, a platform and ecosystem for creating privacy-preserving AI.<\/li><li>Intel\u2019s <a href=\"https:\/\/ark.intel.com\/content\/www\/us\/en\/ark\/products\/140109\/intel-neural-compute-stick-2.html\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">NCS2<\/a>, or Neural Compute Stick 2, a plug-and-play USB stick specifically designed to bring deep learning to the edge.<\/li><li>ST\u2019s <a href=\"https:\/\/www.st.com\/content\/st_com\/en\/stm32-ann.html\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">STM32 Cube.AI<\/a>, a package that brings artificial neural networks to ST\u2019s Cortex-based microcontrollers.<\/li><\/ul>\n\n\n\n<p>We will see how these platforms go beyond simple pre-trained ML models, enabling deep learning and unsupervised learning at the edge.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Voice control was the stuff of science fiction throughout the 20th Century. 
But in the last two decades, voice control has entered the mainstream. Voice assistants like Siri and Alexa are embedded in home devices, headphones, and even cars.&nbsp; But, how did we get to this point? What is the connection with machine learning at&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/\">Read more<\/a><\/p>\n","protected":false},"author":83,"featured_media":11896,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[9904],"tags":[6253],"collections":[],"class_list":{"0":"post-11895","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-voice-digital-assistants","8":"tag-chatbot","9":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Voice Control: Building Your Voice Assistant - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"Machine Learning at the edge has made voice control an everyday reality. 
Here, you will learn how to create a simple voice controller.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Voice Control: Building Your Voice Assistant\" \/>\n<meta property=\"og:description\" content=\"Machine Learning at the edge has made voice control an everyday reality. Here, you will learn how to create a simple voice controller.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-11-02T14:24:36+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-01-05T19:06:14+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"675\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Mark Patrick, Mouser Electronics\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mark Patrick, Mouser Electronics\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/\"},\"author\":{\"name\":\"Mark Patrick, Mouser Electronics\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/person\\\/664e4da6990fc1344a2299435a542654\"},\"headline\":\"Voice Control: Building Your Voice Assistant\",\"datePublished\":\"2020-11-02T14:24:36+00:00\",\"dateModified\":\"2022-01-05T19:06:14+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/\"},\"wordCount\":2109,\"publisher\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/Voice-assistant-voice-test.jpg\",\"keywords\":[\"Chatbot\"],\"articleSection\":[\"Voice &amp; Digital Assistants\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/\",\"name\":\"Voice Control: Building Your Voice Assistant - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/Voice-assistant-voice-test.jpg\",\"datePublished\":\"2020-11-02T14:24:36+00:00\",\"dateModified\":\"2022-01-05T19:06:14+00:00\",\"description\":\"Machine Learning at the edge has made voice control an everyday reality. Here, you will learn how to create a simple voice controller.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/Voice-assistant-voice-test.jpg\",\"contentUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2020\\\/11\\\/Voice-assistant-voice-test.jpg\",\"width\":1200,\"height\":675,\"caption\":\"mouser voice 
control\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/voice-digital-assistants\\\/voice-control\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\\\/ML\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/ai-ml\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Machine Learning\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/ai-ml\\\/machine-learning\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Voice Control: Building Your Voice Assistant\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#website\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2019\\\/11\\\/codemotionlogo.png\",\"contentUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2019\\\/11\\\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@i
d\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/Codemotion.Italy\\\/\",\"https:\\\/\\\/x.com\\\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/person\\\/664e4da6990fc1344a2299435a542654\",\"name\":\"Mark Patrick, Mouser Electronics\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g\",\"caption\":\"Mark Patrick, Mouser Electronics\"},\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/author\\\/mark-patrick\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Voice Control: Building Your Voice Assistant - Codemotion Magazine","description":"Machine Learning at the edge has made voice control an everyday reality. Here, you will learn how to create a simple voice controller.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/","og_locale":"en_US","og_type":"article","og_title":"Voice Control: Building Your Voice Assistant","og_description":"Machine Learning at the edge has made voice control an everyday reality. 
Here, you will learn how to create a simple voice controller.","og_url":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2020-11-02T14:24:36+00:00","article_modified_time":"2022-01-05T19:06:14+00:00","og_image":[{"width":1200,"height":675,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg","type":"image\/jpeg"}],"author":"Mark Patrick, Mouser Electronics","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Mark Patrick, Mouser Electronics","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/"},"author":{"name":"Mark Patrick, Mouser Electronics","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/664e4da6990fc1344a2299435a542654"},"headline":"Voice Control: Building Your Voice Assistant","datePublished":"2020-11-02T14:24:36+00:00","dateModified":"2022-01-05T19:06:14+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/"},"wordCount":2109,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg","keywords":["Chatbot"],"articleSection":["Voice &amp; Digital 
Assistants"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/","url":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/","name":"Voice Control: Building Your Voice Assistant - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg","datePublished":"2020-11-02T14:24:36+00:00","dateModified":"2022-01-05T19:06:14+00:00","description":"Machine Learning at the edge has made voice control an everyday reality. Here, you will learn how to create a simple voice controller.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg","width":1200,"height":675,"caption":"mouser voice 
control"},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/voice-digital-assistants\/voice-control\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Machine Learning","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/machine-learning\/"},{"@type":"ListItem","position":4,"name":"Voice Control: Building Your Voice Assistant"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/664e4da6990fc1344a2299435a542654","name":"Mark Patrick, 
Mouser Electronics","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/0d35fad9fee01e991637b67f54ae7cb8b001b5d2c1e4f7c1942b2105dad5a9bf?s=96&d=mm&r=g","caption":"Mark Patrick, Mouser Electronics"},"url":"https:\/\/www.codemotion.com\/magazine\/author\/mark-patrick\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-600x400.jpg","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-600x600.jpg","author_info":{"display_name":"Mark Patrick, Mouser Electronics","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/mark-patrick\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg",1200,675,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-150x150.jpg",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-300x169.jpg",300,169,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-768x432.jpg",768,432,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-1024x576.jpg",1024,576,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg",1200,675,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg",1200,675,false],"small-home-featured":["https
:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test.jpg",100,56,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-180x128.jpg",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-896x504.jpg",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-400x225.jpg",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-600x400.jpg",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/11\/Voice-assistant-voice-test-600x600.jpg",600,600,true]},"uagb_author_info":{"display_name":"Mark Patrick, Mouser Electronics","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/mark-patrick\/"},"uagb_comment_info":0,"uagb_excerpt":"Voice control was the stuff of science fiction throughout the 20th Century. But in the last two decades, voice control has entered the mainstream. Voice assistants like Siri and Alexa are embedded in home devices, headphones, and even cars.&nbsp; But, how did we get to this point? 
What is the connection with machine learning at&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/11895","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/83"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=11895"}],"version-history":[{"count":3,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/11895\/revisions"}],"predecessor-version":[{"id":11993,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/11895\/revisions\/11993"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/11896"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=11895"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=11895"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=11895"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=11895"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}