Experimenting with Speech Recognition
I've long been a proponent of speech recognition software. While at Apple Computer's Advanced Technology Group, I started a special interest group to study the possibility of adding speech recognition to every Macintosh. It seemed to me that speech recognition, even if used only for limited user-interface menus, would truly make the Macintosh "the computer for the rest of us". The marketing buzz alone would have been worth the price, and Apple would have become known as the company that invented it. However, I was not successful in convincing management.
Over the years since, I've watched from a distance the painfully slow progress of speech recognition technology. Every few years I give it another try to see if it has progressed far enough to become truly useful. Just this last weekend I decided to try again while chatting in the REBOL3 AltME world on a wireless XP tablet (where I had no other easy way of entering text). It was an interesting experience. After a few hours, my conclusion was that the general technology of speech recognition had indeed improved. It is several times better than it was 20 years ago, but unfortunately it is not "amazingly" better (not orders of magnitude better).
Over the weekend I tried both the Microsoft speech recognition software and Dragon NaturallySpeaking (from ScanSoft). Ironically, the problems with this technology lie not so much in the accuracy of the recognition as in the integration of the software with the environment you're using. Although the software was easy to install and only a small amount of training was required to become moderately productive, both systems suffered from odd problems when used in environments other than Microsoft Word and similar standard tools. For example, when I used Dragon to write this blog, it had a serious problem with the Web form I use for text input (a standard Web browser textarea gadget).
The software could not keep track of the edit point as I moved the cursor around to edit different parts of my text. It became so frustrating that I had to enter my text into another application and then cut and paste it into the browser window. This was not a major problem, because you can use the "Dragon Pad" application that comes with the Dragon software. However, for a blog, that additional step somewhat defeats the purpose. One of the main ideas of a blog is to be able to publish your ideas rapidly by typing them right into the browser. If extra steps are necessary, you become less productive.
Another problem that often occurs, such as when using AltME or a plain text editor, is that the software cannot do obvious things like capitalize the first word of a sentence. I realize that this specific problem comes from the fact that the recognition software has no idea of my editing context. It does not know whether I am at the beginning of a sentence or somewhere within it. However, it seems that a very simple algorithm could solve the problem in most cases. For instance, if my prior text ended with a period, it could capitalize the next word. That would work well for 90% of the text I enter, and the software would not require deeper integration with my tools. Better yet, the software could provide a short keyword to indicate a new sentence. I would not mind that. It could add the ending period to the previous sentence and start the next sentence with a capital letter.
There are a lot of these little annoyances in the speech recognition software. The net result is that I am perhaps only twice as fast as when typing, rather than 10 times as fast or more. That is unfortunate. However, as with any new technology, I'm sure it will improve within the next few years. I plan to continue using the software for the rest of the week while I write more of the REBOL services documentation, just to see how it works out.
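The capitalization rule suggested earlier (capitalize after a period, plus a spoken keyword that ends one sentence and starts the next) is simple enough to sketch in a few lines. The following Python is a minimal illustration only, not how the Microsoft or Dragon software actually works. It assumes the tool can read the text just before the insertion point, and the function name `apply_dictation` and the single-token keyword `new-sentence` are hypothetical.

```python
# Characters treated as ending a sentence (assumption: "!" and "?" count like ".").
SENTENCE_END = (".", "!", "?")

# Hypothetical spoken keyword meaning "end this sentence, start a new one".
NEW_SENTENCE_WORD = "new-sentence"


def apply_dictation(prior_text: str, dictated: str) -> str:
    """Append dictated words to prior_text, capitalizing sentence starts.

    A word is capitalized when the text before it is empty or ends with
    sentence-ending punctuation. The keyword adds a period to the previous
    sentence (if it lacks one) and capitalizes the next word.
    """
    text = prior_text.rstrip()
    at_start = text == "" or text.endswith(SENTENCE_END)

    for word in dictated.split():
        if word.lower() == NEW_SENTENCE_WORD:
            # Close the previous sentence if it has no ending punctuation.
            if text and not text.endswith(SENTENCE_END):
                text += "."
            at_start = True
            continue

        if at_start:
            word = word[:1].upper() + word[1:]

        text = f"{text} {word}" if text else word
        # A dictated word that itself ends a sentence starts a new one.
        at_start = word.endswith(SENTENCE_END)

    return text


print(apply_dictation("I tried it.", "it works well"))
print(apply_dictation("It works", "new-sentence the end"))
```

The rule needs only the characters immediately before the cursor, which is why it would not require deep integration with the host application. It is not perfect: abbreviations such as "e.g." end with a period and would wrongly trigger capitalization, so a rule like this covers the common case rather than every case.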
In the end it may not make me that much more productive, but it might well help with my carpal tunnel issues (from typing 10 hours every day).
Updated 19-Nov-2024 - Copyright Carl Sassenrath - WWW.REBOL.COM