REBOL Technologies

A Speech Recognition Test - Dictating a Tutorial

Carl Sassenrath, CTO
REBOL Technologies
25-Jan-2005 18:13 GMT

Article #0115

Back in November, as a community, we started a CGI tutorial project to describe how to write a REBOL-based web bulletin board system (BBS). The purpose was to show a more substantial example of using REBOL for creating CGI programs.

Unfortunately, the project pretty much got lost in the holiday shuffle. So today I decided to spend a few hours wrapping it up while evaluating speech recognition technology at the same time. The ultimate question: would I become more productive?

For my speech recognition tool, I selected the Dragon NaturallySpeaking software from ScanSoft because it is widely considered one of the best programs on the market. (In fact, I'm using it to dictate this very blog entry.)

I must say that I have had only limited success with Dragon. Several people have written to tell me that Dragon works very well for them, but I have yet to duplicate that experience. Although the software does a fairly good job in the actual speech-to-text conversion, there are a few "workflow" problems that really slow me down and almost negate the benefit of the software for real projects.

For example, here is a small problem that really gets in the way. The Dragon software absolutely cannot remember the word REBOL. I have trained and retrained the software over and over on this simple word. I have deleted similar words (like "rebel") from the vocabulary - just so it won't get confused. And, as soon as I think I have it trained, I return a few hours later or the next day, and it has reverted to its old behavior.

The big puzzle for me as a language designer is why speech recognition software can't be more tuned to the context of my writing. I typically use the word REBOL every few sentences. In a RISC way of thinking (as I always think), I would call REBOL a high-frequency word within the context of my writing. The software should recognize that and not continually give me the words Roybal, Preble, rabble, and others. I have never used those words in any of my documents. They are low-frequency words. The software should not be suggesting them. In fact, when I tell the software to correct the word, the menu of choices usually does not even include the word REBOL! In other words, the software doesn't even have a clue.
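To make the frequency argument concrete, here is a minimal sketch in Python of what "tuning to the context of my writing" could look like. It is not how Dragon works internally (I have no knowledge of that); the function names, the made-up acoustic scores, and the simple log-frequency bonus are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def build_user_frequencies(documents):
    """Count how often each word appears in the user's own documents."""
    counts = Counter()
    for text in documents:
        counts.update(word.lower() for word in re.findall(r"[A-Za-z]+", text))
    return counts


def rerank(candidates, user_counts, weight=1.0):
    """Re-order (word, acoustic_score) pairs using the user's word frequencies.

    Hypothetical scoring: acoustic_score + weight * log(1 + count).
    Words the user has never written get no bonus at all.
    """
    scored = [
        (acoustic + weight * math.log1p(user_counts[word.lower()]), word)
        for word, acoustic in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored]


# Example: a tiny corpus of the user's writing (illustrative text only).
documents = [
    "REBOL is a language.",
    "I wrote the CGI script in REBOL.",
    "REBOL makes CGI easy.",
]
user_counts = build_user_frequencies(documents)

# Invented acoustic scores: the recognizer slightly prefers the wrong words.
candidates = [("rebel", 0.9), ("Roybal", 0.8), ("REBOL", 0.7), ("rabble", 0.6)]
print(rerank(candidates, user_counts))
```

With a frequency bonus like this, a word I use every few sentences rises to the top of the correction menu even when the raw sound match slightly favors a word I have never written.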

I'm sure that some of you will write to me and tell me that I'm not using the software correctly, that I'm not speaking properly, that I have a bad microphone, or that my environment is too noisy, etc. I've been pretty careful about all of those, and as I said before, the speech-to-text conversion itself works quite well.

The problem is in the overall design of the product. This is just not a great implementation of a speech recognition tool. Good software designs need to consider the most frequent workflow of their users and optimize for that. Otherwise, they are just wasting our time.

Updated 19-Nov-2024   -   Copyright Carl Sassenrath   -   WWW.REBOL.COM