No, I will not review Radiohead's 1997 album. I really like that album, but you are not coming to this blog for a review of a 22-year-old record!
Instead, I will show you how you can program your .Net application to react to voice commands in a really easy way.
Even though Microsoft just made some big announcements about Cortana, it does not mean that you cannot benefit from voice commands. Voice commands to your phone and other devices (Alexa, Google Home, …) are here to stay. So why not in your very own .Net application?
Based on my tests, we are still far from being able to dictate full text to the computer and have it transcribed correctly. But for a limited set of commands, accuracy is quite good.
The downloadable demo
This month’s downloadable demo solution contains both VB and C# projects. The solution was created using Visual Studio 2017 but should also work in previous versions.
Figure 1: The demo application in action
Required reference
To use the code from this article, you will need to add a reference to System.Speech. It is part of the .Net Framework, so there is nothing more to install or deploy.
Figure 2: Adding your reference
Building the UI
As you can see from figure 1, the UI is simple. The main controls are a few buttons to trigger some actions and a multiline textbox to report the status of the actions.
The code
The code you need to add to your application to benefit from this feature is not complex.
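All the snippets below assume that the following namespaces are imported (shown here in C#; the VB project uses the equivalent Imports statements):

using System;
using System.Speech.Recognition;  // SpeechRecognitionEngine, GrammarBuilder, Choices, Grammar
using System.Threading;           // Thread
using System.Windows.Forms;       // Application and the form controls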
First, here is the code to display the list of currently supported cultures. We simply loop through the collection of InstalledRecognizers and output each recognizer's culture to the textbox:
txtResults.AppendText(Environment.NewLine)
txtResults.AppendText("Installed recognizers are:" & Environment.NewLine)
For Each ri As RecognizerInfo In SpeechRecognitionEngine.InstalledRecognizers()
    txtResults.AppendText($"{ri.Culture.DisplayName} (Code={ri.Culture.Name})" & Environment.NewLine)
Next
txtResults.AppendText(Environment.NewLine);
txtResults.AppendText("Installed recognizers are:" + Environment.NewLine);
foreach (RecognizerInfo ri in SpeechRecognitionEngine.InstalledRecognizers())
{
    txtResults.AppendText($"{ri.Culture.DisplayName} (Code={ri.Culture.Name})" + Environment.NewLine);
}
The following code runs when one of the two “Start Capture” buttons is clicked. Following the DRY principle, both buttons share the same handler; they do the same thing except that one provides a list of commands to the grammar builder while the other enables free dictation. You will also notice that I am setting the culture, otherwise the recognition results could be fun:
Dim strListeningMode As String
Thread.CurrentThread.CurrentCulture = New Globalization.CultureInfo(txtCulture.Text)
_listener = New SpeechRecognitionEngine(New Globalization.CultureInfo(txtCulture.Text))

Dim builder As GrammarBuilder = New GrammarBuilder()
builder.Culture = _listener.RecognizerInfo.Culture
If sender Is btnStartCaptureCommands Then
    strListeningMode = "commands"
    builder.Append(New Choices(New String() {"time", "heure", "date", "quit", "quitter", "clear", "vider", "snow", "neige", "culture"}))
Else
    strListeningMode = "free text"
    builder.AppendDictation()
End If

Dim grammar As Grammar = New Grammar(builder)
_listener.LoadGrammar(grammar)
_listener.SetInputToDefaultAudioDevice()
AddHandler _listener.SpeechRecognized, AddressOf _listener_SpeechRecognized
_listener.RecognizeAsync(RecognizeMode.Multiple)

txtResults.AppendText(Environment.NewLine)
txtResults.AppendText($"Computer is now listening for {strListeningMode}..." & Environment.NewLine)
txtResults.AppendText($"Thread.CurrentThread.CurrentCulture = {Thread.CurrentThread.CurrentCulture}" & Environment.NewLine)
txtResults.AppendText($"_listener.RecognizerInfo.Culture = {_listener.RecognizerInfo.Culture}" & Environment.NewLine)
txtResults.AppendText($"builder.Culture = {builder.Culture}" & Environment.NewLine)
string strListeningMode;
Thread.CurrentThread.CurrentCulture = new System.Globalization.CultureInfo(txtCulture.Text);
_listener = new SpeechRecognitionEngine(new System.Globalization.CultureInfo(txtCulture.Text));

GrammarBuilder builder = new GrammarBuilder();
builder.Culture = _listener.RecognizerInfo.Culture;
if (sender == btnStartCaptureCommands)
{
    strListeningMode = "commands";
    builder.Append(new Choices(new string[] { "time", "heure", "date", "quit", "quitter", "clear", "vider", "snow", "neige", "culture" }));
}
else
{
    strListeningMode = "free text";
    builder.AppendDictation();
}

Grammar grammar = new Grammar(builder);
_listener.LoadGrammar(grammar);
_listener.SetInputToDefaultAudioDevice();
_listener.SpeechRecognized += _listener_SpeechRecognized;
_listener.RecognizeAsync(RecognizeMode.Multiple);

txtResults.AppendText(Environment.NewLine);
txtResults.AppendText($"Computer is now listening for {strListeningMode}..." + Environment.NewLine);
txtResults.AppendText($"Thread.CurrentThread.CurrentCulture = {Thread.CurrentThread.CurrentCulture}" + Environment.NewLine);
txtResults.AppendText($"_listener.RecognizerInfo.Culture = {_listener.RecognizerInfo.Culture}" + Environment.NewLine);
txtResults.AppendText($"builder.Culture = {builder.Culture}" + Environment.NewLine);
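Both snippets assume a class-level _listener field and a handler with the standard event signature. Here is a minimal C# sketch of those members (the VB version is analogous):

// Class-level field holding the engine between button clicks.
private SpeechRecognitionEngine _listener;

// Signature expected by the SpeechRecognized event.
private void _listener_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Command handling shown in the next listing.
}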
In the capture code above, we bind the SpeechRecognized event to a method of our own. This method must handle the various commands we added to our grammar; otherwise it simply spits out whatever was recognized:
Dim result As String = e.Result.Text
Select Case result.ToLower()
    Case "quit", "quitter"
        txtResults.AppendText("You want to quit? Bye." & Environment.NewLine)
        Thread.Sleep(2000)
        Application.Exit()
    Case "clear", "vider"
        txtResults.Clear()
    Case "culture"
        btnListCultures.PerformClick()
    Case "date"
        txtResults.AppendText("You asked me for the date?" & Environment.NewLine)
        txtResults.AppendText("   It is actually " & DateTime.Now.ToLongDateString() & Environment.NewLine)
    Case "time", "heure"
        txtResults.AppendText("You asked me for the time?" & Environment.NewLine)
        txtResults.AppendText("   It is actually " & DateTime.Now.ToLongTimeString() & Environment.NewLine)
    Case Else
        txtResults.AppendText("Command not recognized. What I heard is this:" & Environment.NewLine)
        txtResults.AppendText("   " & result & Environment.NewLine)
End Select
string result = e.Result.Text;
switch (result.ToLower())
{
    case "quit":
    case "quitter":
        txtResults.AppendText("You want to quit? Bye." + Environment.NewLine);
        Thread.Sleep(2000);
        Application.Exit();
        break;
    case "clear":
    case "vider":
        txtResults.Clear();
        break;
    case "culture":
        btnListCultures.PerformClick();
        break;
    case "date":
        txtResults.AppendText("You asked me for the date?" + Environment.NewLine);
        txtResults.AppendText("   It is actually " + DateTime.Now.ToLongDateString() + Environment.NewLine);
        break;
    case "time":
    case "heure":
        txtResults.AppendText("You asked me for the time?" + Environment.NewLine);
        txtResults.AppendText("   It is actually " + DateTime.Now.ToLongTimeString() + Environment.NewLine);
        break;
    default:
        txtResults.AppendText("Command not recognized. What I heard is this:" + Environment.NewLine);
        txtResults.AppendText("   " + result + Environment.NewLine);
        break;
}
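If you get spurious matches, the recognition result also exposes a Confidence score between 0 and 1. A small guard at the top of the handler can filter them out; the 0.6 threshold below is an arbitrary value to tune, not something from the demo:

// Ignore results the engine itself is not sure about.
if (e.Result.Confidence < 0.6f)
{
    txtResults.AppendText("Low confidence, ignoring: " + e.Result.Text + Environment.NewLine);
    return;
}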
Finally, a bit of cleanup is always a good idea:
txtResults.AppendText(Environment.NewLine)
txtResults.AppendText("Not listening anymore" & Environment.NewLine)
_listener.RecognizeAsyncStop()
_listener.Dispose()
txtResults.AppendText(Environment.NewLine);
txtResults.AppendText("Not listening anymore" + Environment.NewLine);
_listener.RecognizeAsyncStop();
_listener.Dispose();
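Wherever you trigger this cleanup from, it is also worth releasing the engine when the form closes, so it does not keep the microphone after the window is gone. A minimal sketch (the form and handler names here are assumptions, not from the demo):

private void MainForm_FormClosing(object sender, FormClosingEventArgs e)
{
    if (_listener != null)
    {
        _listener.RecognizeAsyncStop();  // let the current phrase finish, then stop
        _listener.Dispose();
        _listener = null;
    }
}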
Command mode versus Free text mode
It might be my strong French Canadian accent, but I found that the free text mode is not very accurate, even when switching to French (or maybe I need to talk like a Frenchman from France).
But when you build the list of commands you are expecting, the accuracy is very close to 100%.
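Commands are not limited to single words either; the same Choices class accepts short phrases. A sketch (the phrase texts are made-up examples, not part of the demo):

// Load an additional grammar made of short phrases instead of single words.
GrammarBuilder phraseBuilder = new GrammarBuilder();
phraseBuilder.Culture = _listener.RecognizerInfo.Culture;
phraseBuilder.Append(new Choices("what time is it", "clear the screen"));
_listener.LoadGrammar(new Grammar(phraseBuilder));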
Accuracy would probably improve if I spent the time training the computer with my own voice. There is a voice-training feature in Windows that allows you to do this.
Equipped to capture voice
Of course, your computer will need an audio input device: a built-in microphone, a standalone microphone, or a headset. If you don't have such a device, your application will fail with an InvalidOperationException, which you had better catch and handle properly.
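A minimal sketch of guarding the calls that touch the audio device (reusing the names from the demo):

try
{
    _listener.SetInputToDefaultAudioDevice();
    _listener.RecognizeAsync(RecognizeMode.Multiple);
}
catch (InvalidOperationException ex)
{
    txtResults.AppendText("No audio input device found: " + ex.Message + Environment.NewLine);
}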
Other cultures
Apparently, not all cultures are supported.
This URL shows the list of the 8 supported cultures as of the time of writing: https://docs.microsoft.com/en-us/dotnet/api/system.speech.recognition.speechrecognitionengine.-ctor?view=netframework-4.7.2.
Also, if you try to load a culture that is not properly installed and supported by your OS, you will get an “ArgumentException: No recognizer of the required ID found”.
To fix that issue, open the “Region & language” dialog of Windows and add one of the 8 officially supported languages (see the link above). Note that you might have to reboot the computer before the change takes effect.
Figure 3: Adding a language in Windows 10
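If you prefer to fail gracefully instead of letting the ArgumentException surface, you can check for the culture up front by reusing the InstalledRecognizers loop from earlier. A sketch:

private bool IsRecognizerInstalled(string cultureName)
{
    // True when an installed recognizer matches the requested culture code.
    foreach (RecognizerInfo ri in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (ri.Culture.Name.Equals(cultureName, StringComparison.OrdinalIgnoreCase))
            return true;
    }
    return false;
}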
Conclusion
With a few lines of code, you can make your application react to voice commands.
Think twice about implementing that kind of feature if you are working in an open-space area!