Sunday 23 March 2014

Machine Translation Will Not Be Sending Us Packing

You can achieve amazing results if you commit a lot of computing horsepower and countless man-hours of software engineers, linguists, coders and consultants with hybrid backgrounds, and supporting personnel. It is even possible to achieve some semblance of 'recognising' linguistic and communicative contexts. Still, on the technical side, what the machine does is parse the original text against an automated checklist and perhaps subsequently validate the results in some sort of a QA procedure (e.g. another checklist, back-translation and so on). When we get to the bottom of it, it comes down to two things: 1) instructions, 2) data.

All that a computer does is execute logical operations on data. Computers can appear to 'design' their further operations on their own, or to 'learn', except that they don't really have a creative or learning process. They still execute a program which – as a result of going through a flow chart of conditions and commands – generates another program with its own set of conditions and commands. At the end of the day all of it rests on some core instructions implanted by a human engineer. Any creative spark came from that engineer.

A machine does not 'know' anything which you haven't previously either: 1) outright handed to it as data, or 2) equipped the machine to generate as the product of its procedures. Even those machines which supposedly 'learn' still operate under these same natural constraints. Try subjecting language to a mathematical analysis and you'll see how limiting it is when logical operations and input data are your sole guides to language. Or look at what happens when humans attempt to reduce the law to a set of syllogistic rules. Computers do 'think' much faster than humans do, except they don't think but compute, or calculate, which is the meaning of computing. On a conceptual level they are a glorified calculator. A tool. The rest is magical thinking on the part of their users and the lay public in general.

Thus, you can certainly take not only dictionaries but also grammar references and convert them into flow charts (algorithms) and ultimately program code. And data banks. You can certainly process the Bescherelle or Murphy in this fashion, and dictionaries have long been transcribed at the expense of many, many hours of work – work which could have been used elsewhere, just like the money paid for that port could have been spent in other ways.

Perhaps it's worth noting here that machine translation may be cheaper than a human translator once you have it, but it takes resources to get to the point where a machine can even begin to offer a very imperfect alternative to human translation, and one that still needs supervision. Many times that time and those other resources – with geometrically diminishing returns the further you pursue perfection – would be required to get it to the point where it can pose a serious challenge in terms of quality.

Those were hours of either menial typing or inventive designing with the goal of avoiding menial labour (where an engineer who designs ways of reducing the labour also charges more for his services than a humbler labourer's wage), but in any case it takes man-hours and other resources. No matter whether you type, scan manually, construct a scanner with a conveyor belt and huge throughput, or improve the OCR software, it always comes down to just having to commit the resources. MT doesn't grow on trees. Plus, apart from the cost of any investment that does eventually yield a profitable return, there is also the risk of a negative return when you get side-tracked.

So, with all that huge expenditure and risk, after being done with dictionaries and simpler grammar books you could then proceed with more advanced, scholarly works on grammar, syntax, punctuation and other aspects of either language or translation. In practice that means more complicated flow charts for you and your machines and more nuanced rules, with even rules for rule conflict resolution (meta rules), exceptions, overrides and what have you. All of this is in order to avoid blunders (perhaps similarly to how a non-native speaker or writer tries to avoid them) and to simulate the 'recognition' of non-standard usage and common mistakes. A human would probably not be easily fooled by those, whereas a machine – being a strict prescriptivist by nature, as all it does is execute instructions – could totally misinterpret them by applying correct grammatical rules in an unbending fashion, failing to decode even the most obvious substance where the form fails to comply.
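To make the idea of rules, exceptions and meta rules a bit more tangible, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration (the `Rule` class, the priorities, the tiny exception list); it is not how any actual MT system works. It pluralises English nouns with three rules and one meta rule: when several rules apply, the one with the highest priority wins.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical grammar rule: a condition plus a command."""
    name: str
    priority: int                     # used by the meta rule to settle conflicts
    applies: Callable[[str], bool]    # IF condition ...
    transform: Callable[[str], str]   # ... THEN command


# Exceptions have to be handed to the machine one by one (illustrative list).
EXCEPTIONS = {"child": "children", "foot": "feet"}

RULES = [
    Rule("default -s", 0, lambda w: True, lambda w: w + "s"),
    Rule("sibilant -es", 1,
         lambda w: w.endswith(("s", "x", "ch", "sh")),
         lambda w: w + "es"),
    Rule("listed exception", 2,
         lambda w: w in EXCEPTIONS,
         lambda w: EXCEPTIONS[w]),
]


def pluralise(word: str) -> str:
    """Apply the highest-priority rule whose condition matches the word."""
    candidates = [rule for rule in RULES if rule.applies(word)]
    winner = max(candidates, key=lambda rule: rule.priority)  # the meta rule
    return winner.transform(word)


for noun in ("book", "box", "child", "mouse"):
    print(noun, "->", pluralise(noun))
```

The last noun is the interesting one. Nobody listed 'mouse' among the exceptions, so the sketch applies the default rule in an unbending fashion and produces 'mouses'. The rule was followed correctly; the result is still wrong.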

As long as the rules of a given language or translation method or procedure are logical, and a reasonable and lucid explanation exists for them rather than a fuzzy, arbitrary and whimsical one, you should be all right most of the time (well, what if you need to be safe all of the time?). This is all the easier now that you have multi-core processors clocked in the 2–4 GHz range, working with operational memory in the teens of gigabytes and cheap terabytes of storage even for home computers. Still, even if the software hypothetically were to stop needing to be baby-sat by a human, the fire would still burn on the root spark of creativity left by a human engineer. Whole teams of such engineers, to be precise.

Back to language, though. A computer does not get to work with the benefit of social, cultural and other experience from the start, which a human being starts acquiring even before birth and keeps acquiring until death. A computer can operate a whole laboratory to analyse samples in compliance with predefined procedures and in this way expand its data banks and fill its memory. It certainly can 'analyse' faster than a human, often more reliably, without suffering from exhaustion or a limited attention span, but intuition, the subconscious, nope, it will have none of those.

You could make a computer simulate the thought process of a person with a specific personality, education, beliefs, mental disorders, but it will be… a simulation. Which it was supposed to be from the beginning. Bottom line, a machine, for all its tremendous computing power, still runs on 1) algorithms written by humans (or derived from such algorithms), or 2) data either filled in by humans outright or arrived at by the machine using instructions left by humans. There is no 'awakening' point at which a computer takes full possession of its enlightened self previously dormant, i.e. taking a nap in its printed circuits.

Executing logical operations on data is actually not so far from how language works in a human brain – perhaps even more similar to a non-native speaker's reliance on interlanguage – but it's still just like Deep Blue's game of chess in the best cases. A chess program will typically beat its own creator at the game (due to the calculating advantage), but a computer still can't play chess unless you – the human – 'teach' it the rules, supply it with the data to crunch. You can even equip the machine with a checklist to analyse its records of games played, so that it will expand its repertoire with time and become better prepared to handle what human opponents can throw at it.

If the machine brute-forces its way through the 'analysis' of billions of possibilities, validating hypothetical choices with further analysis before committing – something which a human brain could possibly require millions of years to process – then yes, the machine can do marvellous things. It can already translate some easy texts on par with human translators, even translate them more reliably than a messy or underqualified translator, but this is all the product of countless hours invested by humans. And there are still limitations anyway because, as we've already noted, a computer is a tool and not a master, a glorified calculator.

To draw a different parallel, yes, you could construct androids capable of typing on a keyboard. But it would still be cheaper to hire even multiple typists, there are obviously better ways of making two machines communicate, and you don't need an android for what a bot can do anyway. And it doesn't even pay to get a bot to do something which is not repeatable. Research and development costs money, and the investment must bring a sizeable return. Otherwise it's a waste and it doesn't pay. This applies to machine translation as well. Even if there were people who really wanted to put human translators out of business (as opposed to making translation cheaper, which is another matter), those guys wouldn't be working with unlimited resources. Their creative staff, other workers and equipment would still cost them.

Back to the chess parallel again, language is not a board of 64 squares populated by 32 pieces and governed by a relatively short list of consistent and transparent rules with great potential for analysis. Chess was designed for that sort of nutcracking; language was not.

Now, to deal with the fallibility of human translators, that fallibility often results from thinking just like a computer 'would'. Their blunders are not unlike a human player fooling the AI in a war game when knowing its routines. For example, peppering a line of infantry with some arrows from archers to bait them into loosening their formation, so as to break them up with a quick charge of cavalry in a wedge formation. When a human gets fooled into the wrong choice it isn't worlds apart from tricking the AI's check: IF condition THEN command. A case-insensitive computer check works just the same as a translator who fails to recognise proper capital letters in a text printed all in uppercase: he will not know the VAT on wine from a wine vat, at least not without parsing the text for clues elsewhere.
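The VAT-versus-vat problem is easy to reproduce. Below is a rough Python sketch, with a made-up two-entry glossary and a made-up list of context clues (both are assumptions for illustration only). A case-sensitive lookup tells the two senses apart; a case-insensitive one, which is all you can do with an all-caps text, returns both and has to go looking for clues elsewhere in the sentence.

```python
# Hypothetical glossary: the two entries differ only in capitalisation.
GLOSSARY = {
    "VAT": "value-added tax",
    "vat": "large container for liquids",
}

# Hypothetical context clues that point towards the tax sense.
TAX_CLUES = {"rate", "invoice", "tax", "excise", "refund"}


def lookup_case_sensitive(word):
    """Return the sense whose spelling matches exactly, or None."""
    return GLOSSARY.get(word)


def lookup_case_insensitive(word):
    """Return every sense whose spelling matches once case is ignored."""
    return [sense for key, sense in GLOSSARY.items()
            if key.lower() == word.lower()]


def disambiguate(word, sentence):
    """Choose a sense by parsing the rest of the sentence for clues."""
    senses = lookup_case_insensitive(word)
    if len(senses) <= 1:
        return senses[0] if senses else None
    tokens = {token.strip(".,;:").lower() for token in sentence.split()}
    # IF a tax-related clue is present THEN pick the tax sense.
    if tokens & TAX_CLUES:
        return GLOSSARY["VAT"]
    return GLOSSARY["vat"]


print(lookup_case_insensitive("VAT"))
print(disambiguate("VAT", "THE VAT RATE ON WINE WENT UP"))
print(disambiguate("VAT", "THE WINE IS FERMENTED IN AN OAK VAT."))
```

Note that the clue list does the real work here, and a human had to write it. A sentence with no listed clue falls through to the container sense whether or not that is right.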

In fact, such checks are not unlike QA procedures already used in the translation 'industry'. Human proofreaders also effectively parse their entrusted texts for errors, occasionally producing false positives or false negatives when something pops up which isn't covered by their procedures or doesn't occur in their data banks, for example when their knowledge of the applicable rules is not complete or when they fail to execute the call to a reference book, as a programmer would say.

So, yeah, there are similarities. This is definitely true. However, not unlike amateurs versus professionals or junior professionals versus senior professionals (with individual exceptions), computers are always one step behind humans. They don't have anything they didn't first receive from humans or produce with the procedures humans gave them. They can't fully simulate a human being. And unlike the gap between human students and their teachers, the divide between humans and their computers is fundamental. Guns send bullets, but guns don't kill people.

For all the human frailty and computing horsepower, MT will always need at least some PEMT-ing (post-editing of machine translation), even if we give it thousands of years. Sure, there will be fewer traditional human translators and more PEMTors. Perhaps there will be a preference to use linguists and their skills in MT design work and not in actual translation assignments, other things being equal. Perhaps some companies interested in that kind of thing will stubbornly overinvest in the pursuit of making translators no longer needed, hoping for a huge long-term ROI in spite of the continual loss in the short term…

But no one in his right mind is going to invest hundreds of hours of R&D in a one-off bespoke job, and the ROI curve is nasty enough even on somewhat repeatable or commonplace assignments in specialist translation and copywriting. Ironically, we are simply cheaper than machine translation, as long as more than an appearance of translation, or a gist of meaning, is sought. The only sad part is that machine translation plus light post-editing may become the standard for easier tasks in popular languages within a couple of decades, perhaps sooner, and obviously not all of us would welcome a role change or the need to work solely on the harder cases, with an increased mental strain.
