

C#

How to add text-to-speech and speech-to-text features to your SIP software by using Microsoft Speech Platform in C#?

using System;
using System.Threading;
using Ozeki.Media.MediaHandlers;
using Ozeki.Media.MediaHandlers.Speech;
 
namespace Microsoft_Speech_Platform
{
    class Program
    {
        static Speaker _speaker;
        static Microphone _microphone;
        static MediaConnector _connector;
        static TextToSpeech _tts;
        static SpeechToText _stt;
 
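        static void Main(string[] args)
        {
            _microphone = Microphone.GetDefaultDevice();
            _speaker = Speaker.GetDefaultDevice();
            _connector = new MediaConnector();

            // Both devices are needed: the speaker plays the TTS output,
            // the microphone feeds the speech recognizer.
            if (_microphone == null || _speaker == null)
            {
                Console.WriteLine("No default microphone or speaker was found.");
                return;
            }

            SetupTextToSpeech();

            SetupSpeechToText();

            // Keep the process alive while audio is played and recognized,
            // instead of spinning in a busy-wait loop.
            Console.WriteLine("Press Enter to exit...");
            Console.ReadLine();

            _microphone.Stop();
            _speaker.Stop();
        }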
 
        static void SetupTextToSpeech()
        {
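            // Create the TTS media handler and plug in the Microsoft Speech Platform engine.
            _tts = new TextToSpeech();
            _tts.AddTTSEngine(new MSSpeechPlatformTTS());

            // Use the first installed British English (en-GB) voice, if there is one.
            var voices = _tts.GetAvailableVoices();
            foreach (var voice in voices)
            {
                if (voice.Language.Equals("en-GB"))
                {
                    _tts.ChangeLanguage(voice.Language, voice.Name);
                    break; // one matching voice is enough
                }
            }

            // Route the synthesized audio to the speaker, then speak a test sentence.
            _speaker.Start();
            _connector.Connect(_tts, _speaker);
            _tts.AddAndStartText("Hello World!");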
        }
 
 
        static void SetupSpeechToText()
        {
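            // The words the recognizer should listen for.
            string[] words = { "Hello", "Welcome" };
            _stt = SpeechToText.CreateInstance(words);
            _stt.WordRecognized += stt_WordRecognized;
            _stt.ChangeSTTEngine(new MSSpeechPlatformSTT());

            // Use the first installed en-GB recognizer, if there is one.
            var recognizers = _stt.GetRecognizers();
            foreach (var recognizer in recognizers)
            {
                if (recognizer.Culture.Name == "en-GB")
                {
                    _stt.ChangeRecognizer(recognizer.ID);
                    break; // one matching recognizer is enough
                }
            }

            // Feed the microphone audio into the recognizer, then start capturing.
            _connector.Connect(_microphone, _stt);
            _microphone.Start();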
        }
 
        static void stt_WordRecognized(object sender, SpeechDetectionEventArgs e)
        {
            Console.WriteLine("Word recognized: {0}", e.Word);
        }
    }
}

Description

In my previous snippet I wrote about converting text to speech in C#. This snippet goes further: besides letting your computer read text aloud, it also performs speech recognition. To implement this, I used Microsoft Speech Platform 11 together with Ozeki VoIP SIP SDK. Microsoft Speech Platform supplies the speech engines, which are used through two classes (MSSpeechPlatformTTS for text-to-speech and MSSpeechPlatformSTT for speech-to-text), while the VoIP SDK provides the necessary VoIP media components (microphone, speaker and media connector). The source code above is ready to use: copy and paste it into Visual Studio, then adjust the settings you need, such as the language ("en-GB"), the text to be spoken and the list of words to recognize. (Do not forget to add the necessary DLL files to your project references: http://www.voip-sip-sdk.com, http://www.microsoft.com/en-us/download/details.aspx?id=27226 )
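In the snippet, the media connector wires a source handler (the microphone or the TTS handler) to a sink (the speaker or the recognizer). The following is a hypothetical, SDK-independent sketch of that source-to-sink pattern in plain C#, only to illustrate the idea; AudioSource, IAudioSink, CountingSink and SimpleConnector are invented names, not Ozeki classes, and the real MediaConnector may work quite differently internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sink: anything that consumes audio frames (speaker, recognizer).
// This is an illustration, NOT an Ozeki SDK type.
interface IAudioSink
{
    void Receive(byte[] frame);
}

// Hypothetical source: raises an event for every frame it produces
// (microphone, text-to-speech handler).
class AudioSource
{
    public event Action<byte[]> FrameReady;

    // Push one frame of audio data to every connected sink.
    public void Emit(byte[] frame) => FrameReady?.Invoke(frame);
}

// Test sink that only counts how many bytes it has received.
class CountingSink : IAudioSink
{
    public int BytesReceived { get; private set; }

    public void Receive(byte[] frame) => BytesReceived += frame.Length;
}

// Minimal connector: remembers each (source, sink) link so it can be undone.
class SimpleConnector
{
    private readonly Dictionary<(AudioSource, IAudioSink), Action<byte[]>> _links =
        new Dictionary<(AudioSource, IAudioSink), Action<byte[]>>();

    public void Connect(AudioSource source, IAudioSink sink)
    {
        if (_links.ContainsKey((source, sink)))
            return; // already wired; avoid delivering every frame twice

        Action<byte[]> forward = sink.Receive;
        source.FrameReady += forward;
        _links[(source, sink)] = forward;
    }

    public void Disconnect(AudioSource source, IAudioSink sink)
    {
        if (_links.TryGetValue((source, sink), out var forward))
        {
            source.FrameReady -= forward;
            _links.Remove((source, sink));
        }
    }
}

class ConnectorDemo
{
    static void Main()
    {
        var microphone = new AudioSource();   // plays the role of _microphone
        var recognizer = new CountingSink();  // plays the role of _stt
        var connector = new SimpleConnector();

        connector.Connect(microphone, recognizer);
        microphone.Emit(new byte[160]);       // delivered to the recognizer
        connector.Disconnect(microphone, recognizer);
        microphone.Emit(new byte[160]);       // no longer delivered

        Console.WriteLine(recognizer.BytesReceived); // 160
    }
}
```

Keeping the delegate per link is what makes Disconnect possible: the same delegate instance that was subscribed must be passed to the `-=` operator.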

After adding the using directives and creating the media handler objects (microphone, speaker and media connector), the text-to-speech and speech recognition features are set up by the SetupTextToSpeech() and SetupSpeechToText() methods.
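Both setup methods choose their language the same way: they loop over the installed voices or recognizers and select the one whose culture is "en-GB". If no en-GB voice or recognizer is installed, ChangeLanguage() or ChangeRecognizer() is simply never called. A minimal sketch of a more forgiving selection rule follows, assuming a hypothetical SpeechOption type that stands in for the SDK's voice and recognizer objects (it is not an SDK class): prefer an exact culture match, otherwise fall back to another region of the same language.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one installed voice or recognizer: only the
// culture name (e.g. "en-GB") and an identifier are needed for selection.
class SpeechOption
{
    public SpeechOption(string culture, string id)
    {
        Culture = culture;
        Id = id;
    }

    public string Culture { get; }
    public string Id { get; }
}

static class CulturePicker
{
    // Prefer an exact culture match ("en-GB"); otherwise fall back to the first
    // option with the same language subtag ("en-US" shares "en"); else null.
    public static SpeechOption Pick(IEnumerable<SpeechOption> options, string preferred)
    {
        var list = options.ToList();

        var exact = list.FirstOrDefault(o =>
            string.Equals(o.Culture, preferred, StringComparison.OrdinalIgnoreCase));
        if (exact != null)
            return exact;

        string language = LanguageOf(preferred);
        return list.FirstOrDefault(o =>
            string.Equals(LanguageOf(o.Culture), language, StringComparison.OrdinalIgnoreCase));
    }

    // "en-GB" -> "en"; a culture name without a region is returned unchanged.
    static string LanguageOf(string culture) => culture.Split('-')[0];
}

class CulturePickerDemo
{
    static void Main()
    {
        var voices = new List<SpeechOption>
        {
            new SpeechOption("en-US", "Voice A"),
            new SpeechOption("de-DE", "Voice B"),
        };

        // No en-GB voice is available here, so the en-US voice is chosen.
        SpeechOption chosen = CulturePicker.Pick(voices, "en-GB");
        Console.WriteLine(chosen?.Id ?? "no matching voice"); // Voice A
    }
}
```

In the snippet you would then pass the chosen entry's language and name to ChangeLanguage() (or its ID to ChangeRecognizer()) only when Pick returns a non-null result.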

Have a good time!
Mar 06, 2015 at 09:06 AM
"microsoft speech platform", audio, autodialer, c#, call, convert, csharp, ivr, pbx, phone, recognition, sip, speech, text, text-to-speech, voice, voip