In the March issue of PC Magazine, John Dvorak comments on four areas of computer technology in his column entitled “Computing’s Final Frontier”: voice recognition; machine translation (MT); optical character recognition (OCR); and spell-checkers. Basically he’s decrying how little progress has been made on these in recent years relative to the vast improvements in computer capacities.
I’d like to comment briefly on all four. Two of those – voice recognition, or actually speech recognition, and MT – are areas that I think have particular importance and potential for non-dominant languages (what I’ve referred to elsewhere as “MINELs,” for minority, indigenous, national, endangered or ethnic, and local languages) including African languages on which I’ve been focusing. OCR is key to such work as getting out-of-print books in MINELs online. And spell-checkers are fundamental.
Voice recognition. Dvorak seems to see the glass half empty. I can’t claim to know the technology as he does, and maybe my expectations are too low, but from what I’ve seen of Dragon NaturallySpeaking, the accuracy of speech recognition in that specific task environment is excellent. We may do well to separate out two kinds of expectations: one, that software can act as an accurate and dutiful (though at times perhaps a bit dense) scribe, and the other, that it can really analyze the language. For some kinds of production, the former is already useful. I’ll come back to the topic of software and language analysis towards the end of this post.
Machine translation. I’ve had a lot of conversations with people about MT, and a fair amount of experience with some uses of it. I’m convinced of its utility even today with its imperfections. It’s all too easy, however, to point out the flaws and express skepticism. Of course anyone who has used MT even moderately has encountered some hilarious results (mine include English to Portuguese “discussion on fonts” becoming the equivalent of “quarrels in baptismal sinks,” and the only Dutch to English MT I ever did which yielded “butt zen” from what I think was a town name). But apart from such absurdities, MT can do a lot – I’ll enjoy the laughs MT occasionally provides and take advantage of the glass half full here too.
But some problems with MT results are not just inadequacies of the programs. From my experience using MT, I’ve come to appreciate the fact that the quality of writing actually makes a huge difference in MT output. Run-on sentences, awkward phrasing, poor punctuation and simple spelling errors can confuse people, so how can MT be expected to do better?
Dvorak also takes a cheap shot when he considers it a “good gag” to translate with MT through a bunch of languages back to the original. Well, you can get the same effect with the old grapevine game of whispering a message through a line of people and seeing what you get at the end – in the same language! At my son’s school they did a variant of this with a simple drawing seen and resketched one student at a time until it got through the class. Even if MT got closer to human accuracy, you’d still have such corruption of information, because small losses at each step compound along the chain.
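A rough back-of-the-envelope sketch of that compounding, in Python. The 95% per-step figure and the assumption that each step loses information independently are mine, purely for illustration; they are not measurements of any MT system or of the grapevine game.

```python
def retained_after_chain(per_step_fidelity: float, steps: int) -> float:
    """Fraction of the original message surviving a chain of retellings.

    Assumes (hypothetically) that every step independently preserves the
    same fraction of what it receives, so the fractions multiply.
    """
    return per_step_fidelity ** steps


if __name__ == "__main__":
    # Illustrative numbers only: ten hops at 95% fidelity each.
    print(round(retained_after_chain(0.95, 10), 2))
```

Under those assumptions, ten hops at 95% each leave roughly 60% of the original intact, whether the links in the chain are programs or people.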
A particularly critical role I see for MT is in streamlining the translation of various materials into MINELs and among related MINELs, using work systems that involve perhaps different kinds of MT software as well as people to refine the products and feed back into improvements. In my book, “smart money” would take this approach. MT may never replace the human translator, but it can do a lot that people can’t.
Optical character recognition. Dvorak finds fault with OCR, but I have to say that I’ve been quite impressed with what I’ve seen. The main problems I’ve had have been with extended Latin characters and limited dictionaries – and both of those are because I’m using scanners at commercial locations, not machines where I can make modifications. In other words, I’d be doing better than 99% accuracy for a lot of material if I had my own scanners.
On the other hand, when there are extraneous marks – even minor ones – in the text, the OCR might come up with the kind of example Dvorak gives of symbols mixed up with letters. If you look at the amazing work that has been done with Google Patent Search, you’ll notice on older patents a fair amount of misrecognized character strings (words). So I’d agree that one ought to be able to program the software to sort out characters from extraneous marks through some systematic analysis (a series of algorithms?) – picking form out of noise, referencing memory of texts in the language, etc.
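To make the idea of “referencing memory of texts in the language” a little more concrete, here is a minimal sketch in Python using the standard library’s difflib. It is not how any actual OCR package works; the word list, the function name repair_token and the 0.8 similarity cutoff are all hypothetical choices for illustration.

```python
import difflib

# Stand-in for a real dictionary of the language being scanned (hypothetical).
WORD_LIST = ["language", "patent", "machine", "character"]


def repair_token(token: str, word_list=WORD_LIST, cutoff: float = 0.8) -> str:
    """Replace an OCR token with its closest dictionary word, if one is close enough.

    Tokens already in the word list are returned unchanged; tokens with no
    sufficiently similar entry are also left alone rather than guessed at.
    """
    lowered = token.lower()
    if lowered in word_list:
        return token
    matches = difflib.get_close_matches(lowered, word_list, n=1, cutoff=cutoff)
    return matches[0] if matches else token


if __name__ == "__main__":
    # A stray mark read as "#" in the middle of a word.
    print(repair_token("lang#age"))
```

The catch for MINELs is visible right in the sketch: it is only as good as the word list behind it, which is exactly the limited-dictionary problem mentioned above.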
In any event, enhancing OCR would help considerably with more digitization, especially as we get to digitizing publications in extended Latin scripts on stenciled pages and poor quality print of various sorts too often used for materials in MINELs.
Spell-checkers. For someone like me concerned with less-resourced languages, the issues with spell-checkers are different and more basic – so let me get that out of the way first. For many languages it is necessary to get a dictionary together first, and that may have complications like issues of standard orthographies and spellings, variant forms, and even dictionary resources being copyrighted.
In the context of a super-resourced language like English, Dvorak raises a very valid criticism here regarding how the wrong word correctly spelled is not caught by the checker. However, it seems to me that the problem would be appropriately addressed by a grammar-checker, which should spot words out of context.
This leads to the question of why we don’t have better grammar-checkers. I recall colleagues raving in the mid-90s about the then new WordPerfect Grammatik, but it didn’t impress me then (nevertheless, one article in 2005 found it was further along than Word’s grammar-checker). The difference is more than semantic – grammar-checkers rely on analysis of language, which is a different matter from checking character strings against dictionary entries (i.e., spell-checkers).
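The distinction can be shown with a toy sketch in Python. The tiny dictionary and the set of “seen” word pairs below are made up for the example, and checking adjacent word pairs is only a crude stand-in for real grammatical analysis, not a description of how Grammatik or Word actually work.

```python
# Toy resources, both hypothetical: a word list and word pairs "seen" in good text.
DICTIONARY = {"i", "went", "to", "their", "there", "house", "over"}
SEEN_PAIRS = {
    ("i", "went"), ("went", "to"), ("to", "their"),
    ("their", "house"), ("went", "over"), ("over", "there"),
}


def misspelled(words):
    """Spell-check: flag any string that is not a dictionary entry."""
    return [w for w in words if w not in DICTIONARY]


def unseen_pairs(words):
    """Crude context check: flag adjacent word pairs never seen together."""
    return [pair for pair in zip(words, words[1:]) if pair not in SEEN_PAIRS]


if __name__ == "__main__":
    sentence = "i went to there house".split()
    print(misspelled(sentence))    # every word is a valid dictionary entry
    print(unseen_pairs(sentence))  # but the pairs around "there" are suspicious
```

The spell-check passes “i went to there house” without complaint because every string is a valid entry; only the context check notices that “there” is sitting where “their” belongs.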
Although this is not my area of expertise, it seems that the real issue beneath all of the shortcomings Dvorak discusses is the application of language analysis in computing (human language technology). Thus some of the solutions could be related – algorithms for grammar checking could spot properly spelled words out of place and also be used in OCR to analyze a sentence with an ambiguous word/character string. These may in turn relate to the quality of speech recognition. The problems in MT are more daunting but in some ways related. So a question is: are the experts in each area approaching these with reference to the others, or as discrete and separate problems?
A final thought is that this “final frontier” – what I have sometimes referred to as “cutting edge” technologies – is particularly important for speakers of less-resourced languages in multilingual societies. MT can save costs and make people laugh in the North, but it has the potential to help save languages and make various kinds of information available to people who wouldn’t have it otherwise. Speech recognition is useful in the North, but in theory could facilitate the production of a lot of material in diverse languages that might not happen otherwise (it’s a bit more complex than that, but I’ll come back to it another time). OCR adds increments to what is available in well-resourced languages, but can make a huge difference in available materials for some less-resourced languages, for which older publications are otherwise locked away in distant libraries.
So, improvement and application of these cutting-edge technologies are vitally important for people and markets not even addressed by PC Magazine. I took issue with some of what Dvorak wrote in this column, but ultimately his main point is spot on in ways he might not have been thinking of.