Speech

Further information about this may be available (now, or perhaps quite a bit into the future) in the Human Interaction: speech section.

[#txtospch]: Text-to-speech (“TTS”)
Microsoft's Speech Application Programmer's Interface (“Speech API”, “SAPI”)
Microsoft's site on speech describes some technologies from Microsoft. TOOGAM's tentative page about SAPI has some information. To summarize, SAPI 5.x was downloadable for WinNT4, Win98, 2K, and ME. WinXP included this technology. Win95 and earlier could download SAPI 4.0. There were various ways to use this, including downloadable software that uses this technology and, in some cases, technology that is bundled with the operating system and so may already be installed.
Voice recognition
Computer interaction

Perhaps see also: Wikipedia's list of speech recognition software: “Open Source” section, Wikipedia's list of speech recognition software: Unix-like x86/x86-64 Speech Transcription Software

Perhaps see also: Wikipedia's article on: Transcription software, User speech

Web-based solution(s)

Note that if a website starts to support a microphone, the user may be prompted by the web browser. This may annoy some users if they were not expecting microphone interaction.

JuliusJS

Based on the home page's documentation, JuliusJS looks rather simple to install for a webmaster.

However, there is a reason why this technology hasn't been incorporated into this website (yet)...

Review

Unfortunately, it doesn't seem to work really well (based on a test in September 2015). The JuliusJS Live Demo notes, “Note that my vocabulary is limited for this demo.” Although the JuliusJS home page seemed to indicate that saying “Hello, world!” might work, those words did not seem to work. Numbers did seem to work rather okay, although that wasn't entirely accurate. A lot of phrases seemed to prepend the word “DIAL” before whatever else was recognized. “CALL” also appeared as a first word, so the demo might have been trying to be a bit specialized to mimic interactions with a telephone.

Solutions for Microsoft Windows
Hey Athena

On the downside, this isn't bundled with the operating system. On the plus side, it is open source, and cross-platform. See: Hey Athena info.

Microsoft Cortana

Bundled with Windows 10

Using Cortana allows a person to search the web (using Bing), perform some tasks (like scheduling an appointment), and get some answers to questions.

At the time of this writing, getting direct answers seems rather rare, except for specific known questions which often provide entertaining answers. Instead, most results will have the user's web browser check out Bing's results. Over time, it seems Microsoft has been expanding Cortana's ability to provide direct answers without needing to open up a web browser and use Bing.

Requiring a Microsoft Account

Microsoft Cortana requires that the user be signed into a Microsoft account.

Based on DigitalCitizen.life's guide: “How to use Cortana with a local user account in Windows 10”, it seems that Cortana's ability to work with a local user account was new in the Windows 10 update released in November 2015. In the earlier release of Windows 10 (build 10240, the “RTM” build which was the original “production” release), Cortana required that the user be logged into Windows 10 with a Microsoft account.

Here is a first-hand account where the requirement of the Microsoft account was known to be particularly problematic. (This was Windows 10 version 1511, OS Build 10586.420, or earlier.) This surprised a user when enabling the Microsoft account overrode the decision to not require a login. The user had memorized a complex password just long enough to be able to sign in, and then did not retain the password. The computer was left on overnight and hibernated, effectively locking the user out of the computer until the user was able to find another computer to look up the Microsoft password.

Win7
Official documentation

In addition to this guide, there are some documents that Microsoft has produced. Here are some hyperlinks:

Windows 7: setting up speech recognition, Speech Recognition Capabilities, Common commands with Speech Recognition, Dictating text

Setup
[#w7pmicph]: Microphone properties

First, it makes sense to make sure that the Microphone is working. Visit Microsoft Windows Control Panel: “Sound” applet. (Perhaps use: “ RunDLL32 shell32.dll,Control_RunDLL mmsys.cpl,,1 ”. Another method is to right-click on the “Volume Control” icon in the “system tray”/“message notification area”, and then choose “Recording Devices”.) Make sure the microphone jack is showing that a device is plugged in. (If it has a white check mark on a green circle in the lower-right corner, and says “Default Device”, that is good. If the icon's lower-right corner is a red down arrow in a white circle, and says “Not plugged in”, then that probably needs to be fixed first.)
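The RunDLL32 command quoted above could also be started from a script. Here is a minimal Python sketch of that idea. The function names are made up for illustration, and the assumption that the trailing “1” selects the “Recording” tab is based only on the command shown above. On anything other than Windows, the sketch does nothing.

```python
import subprocess
import sys


def sound_applet_command(tab_index=1):
    """Build the RunDLL32 command quoted in this guide as an argument list.

    tab_index=1 reproduces the ",,1" suffix of the quoted command
    (assumed here to select the "Recording" tab).
    """
    return ["RunDLL32", "shell32.dll,Control_RunDLL", f"mmsys.cpl,,{tab_index}"]


def open_sound_applet(tab_index=1):
    """Start the applet on Windows; return False elsewhere without running anything."""
    if sys.platform != "win32":
        return False
    subprocess.Popen(sound_applet_command(tab_index))
    return True


if __name__ == "__main__":
    # Show the exact command line that would be launched.
    print(" ".join(sound_applet_command()))
```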

In practice, Microphones are often too quiet. Press the Properties button for the sound port that the microphone is plugged into. If the microphone cannot hear the computer's sound output (because headphones are being used), the “Listen” tab may show a “Listen to this device” checkbox (defaulting to off), which allows the user to hear what the microphone picks up. With that checked, the “Levels” tab may be able to increase volume. Setting the volume to 100 is probably a good idea. Also, on the “Levels” tab, there might be a slider bar called “Microphone Boost”. (That slider bar seems to exist for a (red) sound port that Microsoft Windows identifies as a Microphone port, but that slider bar simply did not exist for a (blue) sound port that Microsoft Windows identified as a “Line In” sound port.) If that “Microphone boost” slider bar exists, go ahead and increase it to “+30.0 db”.
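To give a rough sense of what a “+30.0 db” boost means: decibels are logarithmic. If the boost is treated as an amplitude gain (an assumption made here, not something confirmed for this particular slider bar), the ratio works out to 10^(dB/20). A small Python sketch of that arithmetic, using a made-up function name:

```python
def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to an amplitude ratio.

    Assumes the 20*log10 (amplitude) convention, which may or may not be
    what the "Microphone Boost" slider actually uses.
    """
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    for boost in (0.0, 10.0, 20.0, 30.0):
        ratio = db_to_amplitude_ratio(boost)
        print(f"+{boost:.1f} dB -> about {ratio:.1f}x amplitude")
```

Under that assumption, +30 dB is roughly a 32-fold increase in amplitude. Any background noise picked up by the microphone would be amplified by the same factor.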

Initial test

With the Microphone working, go to the Microsoft Windows Control Panel: “Speech” applet. Choose “Set up microphone”. (Doing this via the GUI really may be best. The program may be "%windir%\system32\Speech\SpeechUX\SpeechUXWiz.exe" and the program has been seen run with a command line parameter of “MicTraining,en-US,HKEY_CURRENT_USER\SOFTWARE\Microsoft\Speech\RecoProfiles\Tokens\{B41E60E3-63A0-4A0D-B164-563C2BF0A52A},363926068,0,""”. However, running the program directly seemed to result in an error message stating, “Speech recognition is not supported in the current language.”)

After specifying the microphone type, read the sentence: “Peter dictates to his computer. He prefers it to typing, and particularly prefers it to pen and paper.”

If the microphone is not working very well, reading the sentence multiple times may be needed. If the microphone is picking up anything at all, Microsoft Windows might decide to move the dialog box to the next screen.

Results

Possible results screens include:

Results: Screen remains

One possible result is that the screen does not automatically advance; the screen continues to show the sound volume level and the sentence that the user is expected to read.

This just seems to happen sometimes. The best thing to do is usually to re-read the sentence. If nothing happens after reading the sentence a second time, that's usually an indication that things are not working ideally. After reading the text about Peter four or five times, it may just be best to go ahead and press the “Next” button.

Results: “Your microphone is now set up”

The screen may show the following text: “The microphone is ready to use with this computer.” “Click Finish to complete the wizard.”

If this showed automatically after reading the text

If the screen came up automatically, this screen indicates that the microphone has worked suitably enough to complete this test. The microphone might, or might not, work suitably when trying to actually use it. (Actually using the microphone may require higher quality than what this test is able to determine.) However, the best possible results of this particular test seem to have been achieved. (Upon reaching this point, move onto the next segment of setting up speech recognition.)

If this showed up in other circumstances

If this came up while in the middle of reading the sentence, or particularly if it came up while in the middle of re-reading the sentence, that might not be a very good sign. Re-running this test might be worthwhile. (If the test works better the next time, then things probably really are fine, so just go with it.)

If this screen is shown after one of the other results screens (which are screens that indicate a problem), that is very disheartening... it probably is best to re-run the test to see if some better results can be obtained.

Results: “Is your microphone muted?”

The screen may say: “The computer did not hear anything.” “Make sure the Microphone/Recording Device is not muted (check for a mute button on the microphone).” “Also make sure that you're speaking into the correct microphone (if there are multiple microphones) and that it is connected to your computer.” “Click Next to try again.” “Note: If you see this error repeatedly, your microphone may not be ideal for Speech Recognition. Consider trying a different microphone.”

This can be a particularly frustrating error message when the previous screen was showing the bar.

If this screen comes up, then see the following section on “Troubleshooting a microphone that is not picking up sound well”.

Results: “Is the microphone positioned correctly?”

Results may say: “The computer did not hear you very clearly.” “For better speech recognition, make sure that the microphone is positioned correctly, that you are speaking in a quiet environment, and that you are speaking clearly and not rushing.” “Click the back button in the upper left corner to try again, otherwise, click Next to continue.”

The “back button” it refers to is a blue circle with a left arrow.

Results: “Set up your microphone”

Results may say: “Proper microphone placement”. Then a bulleted list: “Position the microphone about an inch from your mouth, off to one side”, “Do not breathe directly into the microphone”, “Make sure the mute button is not set to mute”.

If this error message comes up, then see the following section on “Troubleshooting a microphone that is not picking up sound well”.

[#spchbdmc]: Troubleshooting a microphone that is not picking up sound well

The volume bar needs to be increased above the first/left/yellow section, and get into the middle/green section.

If the sound volume does not change at all when text is spoken

If that is not happening, then the computer might not be successfully receiving sound from the microphone. Even if the sound volume is showing a very small green bar, this does not really indicate that the microphone is working as expected.

This will need to be fixed before any further progress can be made.

  • Visit Microsoft Windows Control Panel: “Sound” applet. (Perhaps use: “ RunDLL32 shell32.dll,Control_RunDLL mmsys.cpl,,1 ”. Another method is to right-click on the “Volume Control” icon in the “system tray”/“message notification area”, and then choose “Recording Devices”.) Next to the name of the device, the right side should show a volume bar. See if that is rising when you speak, or “clear your throat”/cough, or tap the microphone.
  • Make sure that the device is not muted (using a hardware mute button). If the microphone is powered, and if the microphone has a light to indicate that it is turned on, make sure it is on.
  • Trying a different sound port may be useful.
  • Unlike prior versions of Microsoft Windows, the “Volume Control” applet does not seem to have any useful interface for microphones, except for the “Recording Devices” menu option which simply shows the same screen as the “Recording” tab of the Control Panel's “Sound” applet.
If the sound volume changes when text is spoken

The microphone is working... it is simply not hearing the text well enough.

From some experience, this is almost always caused by the microphone just not picking up the sound loudly enough. When reading the text about Peter, nearly every single word should be showing the volume bar in the green section of the sound volume indicator. If the volume is remaining in the yellow, this is almost certainly the problem.

Even if the volume is reaching the green level, the problem might still be that the microphone is not picking up the voice well enough. There are some things that can be done to help test this theory:

Increase volume through software

Maximize the microphone volume levels. Details are provided in the section about Windows 7: Microphone properties.

If this works, that is often the most pleasant method of resolving the issue.

The “Volume Mixer” may show various “Applications”, including one called “Speech Recognition”. There may also be a slider bar named after the microphone device. Try making sure that any such slider bar is set to the maximum level. (Note that adjusting a slider bar might affect the “master” slider bar, by raising the global volume. Then, reducing the global volume back to its previous level might affect other slider bars.)

Software-based mute does not generally seem to be a problem: no such “mute” checkbox seems to be readily apparent in Microsoft Windows 7 (unlike, say, Windows XP).

Try varying the microphone placement

If the microphone is close to the user's mouth, make sure that the microphone is off to the side of the user's mouth, and not directly in front of the user's mouth. The reason is simply to have the microphone avoid the airflow from exhaling (or inhaling).

Hold the microphone right next to the speaker's mouth. Try not to slobber on the microphone, but other than that, get the microphone as close as possible to the speaker's mouth without excessive risk of salival contact. If this requires that someone holds the microphone to the speaker's mouth, then do that. This might not be a very ideal setup for a long-term scenario, but sometimes is useful for troubleshooting purposes. This may really help to confirm the suspicion that volume levels are indeed the only real problem, and that troubleshooting should be focused only on resolving that issue.

If at all possible, try placing the microphone closer to the person's mouth (but not directly in front of the user's mouth).

Port/Jack

Using a different sound recording jack/port might be useful. (This approach probably will not be useful, but if this possible solution does work, that may be much preferred over the next possible solution.)

Hardware selection

One item to note is that, as strange as this may be, the speech recognition software is often more picky than other programs that simply record audio input. A microphone that works just fine for other scenarios might lead to quite a bit of trouble when trying to use it with Microsoft's Speech API.

Quite often, and understandably so, people are very reluctant to try using a different microphone. (Many people may not have another microphone, and even if they do, there was often a reason that people selected the available microphone that they did select.) This may especially be resisted if there is reason to believe that the microphone should work fine, such as if the microphone is believed to have worked fine in some other circumstances.

Despite that very natural reluctance, actually trying a different microphone really, really, really does end up “magically” resolving issues in very, very, very many cases. In too many cases to be counted, the results end up being as different as night is from day. As unpleasant as this step may seem, this is very often a step that should not be avoided if all of the following are true:

  • the sound indicator is showing some activity whatsoever
  • people have tried the other troubleshooting steps, such as having the software-based volume maximized
  • After making the last troubleshooting adjustment, four or five attempts have been made at getting the computer to understand the text about Peter. Also, all of the other recommended troubleshooting steps have been attempted (as much as reasonable).

If each of those things is true, then in most cases the result is that nothing seems to improve the situation until a better microphone is tried, and then that ends up just “magically” fixing things. This is true regardless of whether people spend another small number of minutes delaying this troubleshooting step, or whether someone spends dozens or hundreds of additional minutes delaying this troubleshooting step. Simply, if the troubleshooting has reached this point, the most rewarding method is very often to stop spending further active time until the results of this troubleshooting step can be witnessed.
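The checklist above can be written down as a small decision function. This is only a Python sketch of the reasoning in this section: the function and parameter names are invented for illustration, and the threshold of four attempts is taken from the “four or five attempts” wording above.

```python
def should_try_different_microphone(indicator_shows_activity,
                                    software_volume_maximized,
                                    other_steps_attempted,
                                    attempts_since_last_adjustment):
    """Return True when every condition from the checklist holds.

    attempts_since_last_adjustment counts readings of the text about Peter
    made after the most recent troubleshooting adjustment.
    """
    return (indicator_shows_activity
            and software_volume_maximized
            and other_steps_attempted
            and attempts_since_last_adjustment >= 4)


if __name__ == "__main__":
    # Everything has been tried and the sentence was read five times.
    print(should_try_different_microphone(True, True, True, 5))
```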

Superior results have often been noticed by trying to use a headset with an attached microphone “boom” stick that places the microphone only an inch away from the user's mouth. If using a different style of microphone, placing the microphone closer to the speaker's mouth may be helpful. Despite the fact that Microsoft's screen indicates that the distance should be two feet or less, even shorter distances may work better. In some tests, a distance of about six inches worked better than eight. The audio that was actually picked up by the microphone was also noticeably quieter from just the distance of those couple of inches. This is actually detected fairly easily after checking the “Listen to this device” checkbox on the microphone port's properties in the “Sound” Control Panel's “Recording” tab.
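A rough way to see why a couple of inches can matter is the idealized rule that sound pressure from a small source falls off in proportion to 1/distance. That rule is an assumption here (real rooms and real microphones behave less tidily), and the function name below is made up for illustration. A Python sketch of the arithmetic:

```python
import math


def level_change_db(old_distance, new_distance):
    """Approximate change in sound level (dB) when the microphone is moved.

    Assumes an idealized point source in open space, where sound pressure
    is proportional to 1/distance. Both distances use the same unit.
    """
    return 20 * math.log10(old_distance / new_distance)


if __name__ == "__main__":
    print(f"8 in -> 6 in: {level_change_db(8, 6):+.1f} dB")
    print(f"6 in -> 1 in: {level_change_db(6, 1):+.1f} dB")
```

Under that assumption, moving from eight inches to six gains about 2.5 dB, while a headset microphone at one inch gains about 15.6 dB over a microphone at six inches.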

If that approach does actually fix the issue, then that confirms that the sound port/jack/card is functioning, and people are often far less prone to spend a ton of time on other approaches like exploring for more options to click on. Even if using such a headset does not seem like the most preferred long-term solution, just trying it out has often ended up being a very re-assuring troubleshooting step.

Some people have made the claim that USB-based microphones are known to provide superior results. That seems less likely, but has not been confirmed to be false.

A better solution?

Some people may be very unhappy when their preferred microphone does not seem to work well. They may feel there is compelling reason for them to believe that a different microphone should work just fine. That may often be a reason why people are so reluctant to try a headset. Even after a headset seems to be working, people still seem to want to switch back to the microphone that they would prefer.

Umm... sorry... but, tough luck.

One would think that a slider bar could just be adjusted to being above 100%. Well, it does make some sense that the Microphone cannot hear more than 100% of what the Microphone is physically able to hear. (What might make a bit less sense is why the “Microphone boost” seems to maximize at “+30.0 db”.) People may often feel like there must be some way in software to treat the volume as being even louder.

Unfortunately, at the time of this writing, no such solution seems to be readily available. (As an example: SuperUser article on microphone's maximum volume being too quiet shows multiple non-solutions.) The best solution may really be to use different hardware. Until that better hardware is obtained, realize that many people have struggled with this issue, without finding any other better solutions.

Tutorial

The tutorial is part of a training system where the computer learns about the speech expected from an end user. It also provides details about certain commands, including many of the commands included in the relevant section of this documentation. (The “Mousegrid” may be a very notable exception: the tutorial didn't mention that interesting feature.)

Further training

Say “Show Speech Recognition”, and then “Show Speech Options”. Choose “Configuration”. Then choose “Improve voice recognition.”

Also, when the phrase “Correct that” is used, the speech recognition tries to identify what was intended. This behavior may lead to the speech recognition ending up being more accurate over time.

Recognized voice commands

Many of these may be provided by Microsoft's tutorial, which is good to go through since the tutorial will also be helping to train the computer to better recognize the user. Here is a reference, which largely includes commands taught by the tutorial from Microsoft Windows 7.

Some basic interaction with the speech recognition program
“Start Listening”
This wakes up the system when the voice activation is running, but is in “Sleep” mode.
“Stop Listening”
This places the computer in sleep mode. The computer might still be listening, checking to see if it hears the “Start Listening” command. However, other than that, it will not respond to anything else that is said.
“Hide”/“Show” “Speech Recognition”

Use “Hide Speech Recognition” to minimize the program. To restore it, use “Show Speech Recognition”.

“Show Speech Options”

This shows a menu, with options including turning off, or “Start Speech Tutorial”

Turning listening off

First, say “Show Speech Options”.

Then, one option is to read the entire option: “Off: Do not listen to anything I say”. A different option might be to say, “Press Down Arrow three times”. Then, if the correct option is highlighted, say “Enter”.

“What can I say?”

May pop up a window showing some available options.

If you're looking for more options, look throughout this section of documentation. Another source of information is Microsoft's “Common Speech Recognition Commands” for Windows.

“Refresh Speech Commands” / “Cancel”

Probably not needed, but might be useful? This will “Update the list of speech commands that are currently available” (according to the “What can I say?” help).

Perhaps similarly, saying “Cancel” may help the speech menu to back out of an undesired menu/mode.

“Mousegrid”

(This isn't shown in the tutorial, but is part of the help that is shown by saying “What can I say?”, under the section called “Click anywhere on the screen”.)

Saying “Mousegrid” draws a grid on the screen, dividing the screen into nine segments. The user may say a number related to one of the grid sections. Then, the mousegrid will divide that segment of the screen into nine segments. The user may keep narrowing this down; on a 1980x1200 display this may be done five times. If the user keeps saying numbers after that, and if the speech program's interface is visible, the program will show a message: “Please say click, mark, move mouse, or cancel”. There may be some other options beyond that, including “double-click”.

At some probable cost of efficiency, but perhaps excellence in coolness factor, the mouse doesn't just teleport to the desired location. Instead, it is virtually dragged to the needed location. Visually, the mouse can be seen moving to the desired location before it performs an action like “Click”.

After selecting a location with mousegrid, another option is “Mark”. That will place a mark on the screen and show another mouse grid. The user may then use the second mousegrid to specify a second location, and then say “click”. The result will be that the mouse cursor will move to the first location, and hold down the mouse button, and move to the new location. This effectively drags the item to the new location.

If the user does not zoom in as much as possible, that is fine. The mouse cursor will simply use the center of the last mousegrid that was shown.
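The way the mousegrid narrows down to a point can be sketched in Python. This is an illustration, not Microsoft's implementation: the function names are invented, and the numbering of cells (1 to 9, left to right, then top to bottom) is an assumption.

```python
def mousegrid_target(width, height, spoken_digits):
    """Return the (x, y) center of the cell reached by a sequence of digits.

    Assumes cells are numbered 1-9, left to right and then top to bottom.
    With no digits, the result is the center of the whole screen.
    """
    left, top = 0.0, 0.0
    cell_w, cell_h = float(width), float(height)
    for digit in spoken_digits:
        if not 1 <= digit <= 9:
            raise ValueError("mousegrid digits must be between 1 and 9")
        row, col = divmod(digit - 1, 3)
        cell_w /= 3
        cell_h /= 3
        left += col * cell_w
        top += row * cell_h
    return (left + cell_w / 2, top + cell_h / 2)


def cell_size(width, height, levels):
    """Size of one grid cell after narrowing down `levels` times."""
    return (width / 3 ** levels, height / 3 ** levels)


if __name__ == "__main__":
    print(mousegrid_target(1980, 1200, [5]))
    print(cell_size(1980, 1200, 5))
```

On the 1980x1200 display mentioned above, five levels of narrowing leave cells of roughly 8 by 5 pixels, which may be why further narrowing stops being offered at about that point.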

Text handling
“Undo”
or “Undo that” or “Delete that”
“Correct (word/phrase)”

The computer may show some alternatives on what it thinks might have been said.

  • If none of those are correct, try again to say the desired word. The computer may change the alternatives.
  • If an alternative looks good, say the number next to the desired alternative, and then say OK.
  • If none of the alternatives look close to right, another option is to say “Spell it”.
Select

e.g., “Select first through last”. (The previous text will be searched for the word “first” and then the word “last”, and select those words as well as all text between those words. It likely will not automatically select punctuation after the word.)

Or, “Select previous sentence”. Or, “Select next two sentences.” Or, “Select previous 3 words.” (These examples came from the tutorial.)

Once the text is selected, it may be unselected (especially likely to be useful when the text selected was not what was expected) with “Clear selection”. The selected text may then be deleted (“Delete that”).

Another option may be “Select all”. For many document editing programs, this is probably the same as pressing Ctrl-A.

“Select previous 5 words”
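The “Select first through last” behavior described above can be sketched as a simple text search. This Python sketch is only an illustration: the function name is invented, and plain case-insensitive substring matching is an assumption rather than the recognizer's actual rule.

```python
def select_through(text, first_word, last_word):
    """Return the span from first_word through last_word, inclusive.

    Returns None when either word is missing, or when last_word does not
    appear after first_word. Punctuation after last_word is not included.
    """
    lowered = text.lower()
    start = lowered.find(first_word.lower())
    if start == -1:
        return None
    end = lowered.find(last_word.lower(), start + len(first_word))
    if end == -1:
        return None
    return text[start:end + len(last_word)]


if __name__ == "__main__":
    sentence = "Peter dictates to his computer. He prefers it to typing."
    print(select_through(sentence, "dictates", "computer"))
```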

“Delete (word)”

The phrase “Delete that” may have a special meaning. Otherwise, saying “Delete” followed by a word, or a series of words, may delete the specified words.

Go
“Go to (word)”

Tries to find the word (within the text of a document that is being edited), and then moves the mouse cursor to that word. If the phrase is found multiple times, the Speech recognition may draw some numbers, allowing the user to specify which location is desired.

“Go after (word)”

Tries to find the word, and moves the mouse cursor after that word.

Some others:

  • “Go to the end of the document”
  • “Go to the start of the document”
  • Or, instead of “document”, try “sentence”, “paragraph”.
Go to (fieldName)

(e.g. “Go to Address” for a web browser's address bar, or “Go to Subject” to hop to a “Subject” field of an E-Mail program)

Specific input

Common punctuation, and keys related to common punctuation, include “Period”, “dot” (like period, at least when specifying a URL in a web browser's address bar), “Question mark”, “Space”, “Backspace”, “Exclamation Mark”, “New Line”, “New Paragraph”, “Enter”, “Home”, “End”, “Tab”, “Delete”.

The word “Literal” may be used as an “escape” to specify that the next word should be used literally. For example, saying “Literal period” will cause the word “period” to be typed, instead of just pressing the period key.

Another option, which may be needed for most keys (but not super-common keys that were just mentioned), is to say “Press” followed by the key's name. Capital letters may be entered using something like “Capital B”, or “Shift X”. If that is not working well because the computer does not seem to be understanding a letter, try saying “Press X as in Xylophone”, or any other word that starts with the desired letter.

The help states, “You can also use the ICAO\NATO phonetic alphabet to say the keyboard keys to press.” (See: “International Civil Aviation Organization” alphabet (“International Civil Aviation Organization” alphabet, archived by the Wayback Machine @ Archive.org), Wikipedia's article on “North Atlantic Treaty Organization” phonetic alphabet. It is probably best to avoid things like Alternative Phonetic Alphabet, where words like Aye or Eye are used for humorous confusion.) “Using Speech Recognition to press keyboard keys will only work with Latin alphabets.”
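As a memory aid, the “Press X as in ...” phrasing can be generated for each letter of a word. The Python sketch below uses the standard ICAO spelling words; the function name is made up for illustration, and whether the recognizer accepts every one of these exact phrases has not been verified here.

```python
# ICAO spelling alphabet ("Alfa" and "Juliett" are the official spellings).
ICAO_WORDS = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}


def spelling_phrases(word):
    """Build one spoken phrase per letter, such as "Press B as in Bravo"."""
    phrases = []
    for ch in word:
        upper = ch.upper()
        if upper not in ICAO_WORDS:
            raise ValueError(f"no phonetic word for {ch!r}")
        phrases.append(f"Press {upper} as in {ICAO_WORDS[upper]}")
    return phrases


if __name__ == "__main__":
    for phrase in spelling_phrases("sapi"):
        print(phrase)
```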

“Spell it”

Microsoft's Tutorial has had users say:

  • “C as in close”
  • “d as in daughter”
  • “E as in Edward”
  • “M as in Mary”

Some examples: “Press Control Home” or “Press Control Plus Home” (to input Ctrl-Home), or “Press down arrow”.

As a variation: “Press Q five times.”

Computer interaction
Button names

The tutorial refers to “Say what you see” commands. If there is a traditional button, a button on a ribbon bar, or a tab, saying the text on the button may select the button. If there are multiple buttons with the same name, the buttons may then be covered up by a number: say the appropriate number to clarify which button was intended.

Clickable objects

When there is text related to a clickable hyperlink, including tree items (in the left frame of some software) or labelled icons, using “Click name” or “Double-click name” or “Right click name” may work.

“Show Numbers”

This may overlay clickable items with numbers. (In other words, numbers will appear on top of clickable items.) This may be useful for programs with buttons that are identified with pictorial icons, but not text. If the text for a button is not completely evident, for the short term (until the button's proper name is memorized), this may be a decent-ish way to specify which button to press.

It may be worthwhile to have the speech interface visible (“Show Speech Recognition” may help with that). After selecting a number (and then saying OK), the speech recognition may select what was specified. Also, the speech recognition window may briefly display a message such as “OK. Next time say Text highlight” “color” or “OK. Next time say Rich Text” “Window”. This reveals additional shortcuts that the speech recognition tool would have been able to recognize. For tasks that are likely to be repeated, knowing those commands may offer a faster solution than going through “Show Numbers” again.

Scrolling
“Scroll up” will go up. “Scroll up optionalDistance” may help to specify a distance, an amount, to scroll. For instance, “Scroll down 20” may scroll down a bit, while “Scroll up 10” may scroll in the opposite direction a bit.
Window handling
“Close that”, “Minimize that”, “Restore that”, “Maximize that” may affect a current foreground window.
“Switch” “application”/“to programName”

“Switch application” may be one method.

Specifying a program name, by saying “Switch to programName”, may be more efficient. This will cause the speech recognition software to attempt to find a running program that matches the name provided by the user. If there are multiple such windows, the speech recognition software prompts the user for clarification, so the user may then switch to the desired window. If there are multiple copies of the program running, and each program's title bar shows the name of a document, then specifying the document name might successfully work to quickly choose which instance to switch to. (Notepad is an example of this.)
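The matching behavior described above can be sketched in Python. This is a guess at the general idea, not the actual algorithm: the function names are invented, and case-insensitive substring matching against window titles is an assumption.

```python
def match_windows(window_titles, spoken_name):
    """Return the window titles that contain the spoken name (case-insensitive)."""
    needle = spoken_name.lower()
    return [title for title in window_titles if needle in title.lower()]


def resolve_switch(window_titles, spoken_name):
    """Decide what a "Switch to ..." command would do.

    Returns ("switch", title) for a single match, ("ask", numbered_choices)
    when the user would need to clarify, or ("none", []) with no match.
    """
    matches = match_windows(window_titles, spoken_name)
    if len(matches) == 1:
        return ("switch", matches[0])
    if matches:
        return ("ask", list(enumerate(matches, start=1)))
    return ("none", [])


if __name__ == "__main__":
    titles = ["report.txt - Notepad", "notes.txt - Notepad", "Calculator"]
    print(resolve_switch(titles, "Notepad"))
    print(resolve_switch(titles, "notes"))
```

With two copies of Notepad open, saying the program name leads to a numbered prompt, while saying a document name picks one window directly, which matches the Notepad behavior noted above.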

“Start”

After “Start”, the next desired step (on a machine using Microsoft Windows 7) will likely be “All Programs”. Another option may be “Show Numbers”, which may allow icons in the Quick Launch bar to be selected.

Another option that might work is “Start programName”.

  • Examples of programs that are likely to be recognized may include:
    • Notepad
    • Wordpad
    • Calculator
“Show Desktop”

Windows are effectively minimized so that the desktop may be seen.

Holding down the mouse button

According to a PDF file,

There doesn't seem to be a way to tell Windows Speech Recognition Macros to perform a “hold down the button” or “let go of the button”. If you want to do this, you need another program...

and the PDF file refers to how AutoHotkey may be used (in addition to Microsoft's speech recognition program).
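The PDF's approach uses AutoHotkey. As a different illustration of the same “hold down” and “let go” idea (this is not the PDF's method), here is a Python sketch that calls the Win32 mouse_event function through ctypes. The two flag values come from the Win32 API; the function names are made up, and how such a script would be triggered by a speech macro is not covered here. On anything other than Windows, the functions do nothing.

```python
import ctypes
import sys

MOUSEEVENTF_LEFTDOWN = 0x0002  # Win32 flag: left button pressed
MOUSEEVENTF_LEFTUP = 0x0004    # Win32 flag: left button released


def _send_mouse_flag(flag):
    """Send one button event on Windows; return False elsewhere."""
    if sys.platform != "win32":
        return False
    # mouse_event(dwFlags, dx, dy, dwData, dwExtraInfo)
    ctypes.windll.user32.mouse_event(flag, 0, 0, 0, 0)
    return True


def hold_left_button():
    """Press the left mouse button and leave it held down."""
    return _send_mouse_flag(MOUSEEVENTF_LEFTDOWN)


def release_left_button():
    """Let go of the left mouse button."""
    return _send_mouse_flag(MOUSEEVENTF_LEFTUP)
```

A drag would then be a call to hold_left_button(), some mouse movement (perhaps by voice), and a call to release_left_button().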

Microsoft's tutorial did include a command called “Mark” to drag an icon from the desktop to the Recycle Bin. Mark was used to select the icon, and then “Move to” moved the item to the recycle bin. Sometimes the speech recognition program seemed to recognize the word “mark” (because the interface responded by saying “Marking”), but the effect was not apparent. Sometimes, when dictating text, the program just inserted the word “mark”.

The tutorial (from Microsoft Windows 7) includes an example of a program called “Contoso v2.0”. Usually Microsoft uses “Contoso” as a reference to a fictional company (used by Microsoft for documentation/training purposes). Many people who have read books printed by Microsoft will have seen the name “Contoso” before. (The Contoso program, in this tutorial, appeared to be a spreadsheet.)

Troubleshooting

Many people have felt like the results of voice recognition have not been accurate enough to be as satisfying as desired.

One task that may be worth performing is to try opening up a text editor (or, if using Microsoft Windows XP, perhaps a word processor. MS KB 306901: How To Use Speech Recognition in Windows XP has stated, “You cannot dictate text in Microsoft Notepad at this time.”) See if the words being picked up are the words that are expected. If so, the issue is likely just software configuration. Otherwise, physical issues are often the cause.

To troubleshoot physical issues, see the section on: Troubleshooting a microphone that is not picking up sound well.

In some cases, the most effective and impactful solution may be to try different hardware.

Another solution is to try having the user go through training. Manufacturers generally highly recommend the procedure; how much it really affects things may be unknown.

Try using a different speaker. To clarify, this means: try using a different person. See if a different person's voice works better. If a person has an easy time with one computer, but a hard time with another computer, that may be an indication that the problem is with the computer that is giving trouble. If one person has a much easier time than another person on the same computer, this may be a result of the user's accent. If that is the case, then trying different hardware might not be very worthwhile.

Trying different software may have an impact. SAPI 5 is better than SAPI 4. However, even SAPI 5.1 is notably older than some newer solutions. Google's solutions (seen by “Google Voice” voicemail sound-to-text, or “Google Now”) have been quite notable. Nuance Communications has also provided some results that have impressed many people. United Parcel Service has implemented a solution that has worked remarkably well: UPS Pressroom article on voice recognition has stated, “Both the UPS Automated Pickup and Tracking systems are built on a suite of hardware and software products provided by Periphonics (www.peri.com) and Nuance Communications (www.nuance.com).” (Hyperlinks added.) One thing that may have benefited UPS is that the voice recognition is usually only looking for a small set of possible words. Also, technology from Nuance Communications has been used with Apple's Siri.
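
To illustrate why a small set of possible words can help, here is a minimal sketch that snaps an imperfectly recognized word to the closest entry in a short vocabulary, using Python's standard difflib module. This is not a description of how the UPS or Nuance systems work. The vocabulary, the function name, and the cutoff value are all assumptions chosen for the example.

```python
import difflib

# Hypothetical small vocabulary, loosely modeled on a phone menu.
VOCABULARY = ["track", "pickup", "yes", "no", "operator"]


def snap_to_vocabulary(heard, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the vocabulary word closest to `heard`, or None if none is close."""
    matches = difflib.get_close_matches(
        heard.lower(), vocabulary, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

With only a handful of candidates, a slightly wrong guess such as “trak” still has just one plausible match. With a vocabulary the size of a full dictionary, the same guess could be near many different words.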

Vista version of speech recognition
Vista's speech recognition capabilities
Earlier operating systems

See information about various versions of SAPI (the Speech API by Microsoft), which is mentioned in the text to speech section.

Windows XP came with an upgrade to SAPI. MS KB 306537: How to install and configure speech recognition in Windows XP provides some setup options, and has a “How to use speech recognition” section with hyperlinks to other MS KB documents.

Apparently there are some compatibility limits: MS KB 306901: How To Use Speech Recognition in Windows XP states, “The Microsoft speech recognition engine enables you to insert text into a document using specific programs. You can dictate text in any Office XP program, in Internet Explorer, and in Microsoft Outlook Express (versions 5.0 or later). Other software programs may eventually support the Microsoft speech recognition engine. You cannot dictate text in Microsoft Notepad at this time.” Furthermore, it seems some support was dropped in Office after the 2003 version: MS KB 306537: How to install and configure speech recognition in Windows XP states, “You can use speech recognition in Microsoft Office 2003 and 2002 programs in Windows XP. Versions of Microsoft Office programs that are earlier than Office 2002 do not support speech recognition. Windows XP does not support speech recognition in 2007 Microsoft Office programs.”

Word XP/2002 (part of Office XP/2002) may have had some support for speech recognition that was usable in Windows 98, ME, and NT4. See: Q278927: Description of the speech recognition and handwriting recognition methods in Word 2002.

A third-party solution (which might, or might not, have been worthwhile) is Nitrous Voice Flux (archived by the Wayback Machine @ Archive.org).

Text To Speech
In addition to recognizing words, the intent here is to be able to write out grammar, punctuation, etc. See: text to speech section.