GosTalks: Machine Translation Will Not Be Sending Us Packing

Sunday, 23 March 2014

You can achieve amazing results if you commit a lot of computing horsepower and countless man-hours of software engineers, linguists, coders and consultants with hybrid backgrounds, and supporting personnel. It is even possible to achieve some semblance of 'recognising' linguistic and communicative contexts. Still, on the technical side, what the machine does is parse the original text against an automated checklist and perhaps subsequently validate the results in some sort of a QA procedure (e.g. another checklist, back-translation and so on). When we get to the bottom of it, it comes down to two things: 1) instructions, 2) data.
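To make the "instructions plus data" point concrete, here is a minimal sketch of a lookup-based translator with a back-translation check. Everything in it is an assumption made up for illustration: the tiny glossaries, the function names and the word-for-word method. It is a toy, not how any real MT system works.

```python
# Data: toy word-for-word glossaries (illustrative entries only).
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def translate(text, glossary):
    """Instructions: look each word up and mark the ones that are missing."""
    words = text.lower().split()
    return " ".join(glossary.get(word, f"<{word}?>") for word in words)


def back_translation_ok(source):
    """QA checklist item: does the round trip reproduce the source?"""
    forward = translate(source, EN_TO_FR)
    return translate(forward, FR_TO_EN) == source.lower()


if __name__ == "__main__":
    print(translate("The cat sleeps", EN_TO_FR))
    print(back_translation_ok("The cat sleeps"))
    print(back_translation_ok("The dog sleeps"))
```

The machine passes the check only for words somebody typed into the glossary beforehand; "dog" fails because nobody handed it over as data.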

All that a computer does is execute logical operations on data. Computers can appear to 'design' their further operations on their own, or to 'learn', except that they don't really have a creative or learning process. They still execute a program which, as a result of going through a flow chart of conditions and commands, generates another program with its own set of conditions and commands. At the end of the day all of it rests on some core instructions implanted by a human engineer. Any creative spark came from that engineer.

A machine does not 'know' anything which you haven't previously either 1) outright handed to it as data, or 2) equipped it to generate as the product of its procedures. Even those machines which supposedly 'learn' still operate under these same natural constraints. Try subjecting language to a mathematical analysis and you'll see how limiting it is when logical operations and input data are your sole guides to language. Or look at what happens when humans attempt to reduce the law to a set of syllogistic rules. Computers do 'think' much faster than humans do, except they don't think but compute, or calculate, which is the meaning of computing. On a conceptual level they are a glorified calculator. A tool. The rest is magical thinking on the part of their users and the lay public in general.

Thus, you can certainly take not only dictionaries but also grammar references and convert them into flow charts (algorithms) and ultimately program code. And data banks. You can certainly process the Bescherelle or Murphy in this fashion, and dictionaries have long been transcribed at the expense of many, many hours of work: work which could have been used elsewhere, just like the money paid for that conversion could have been spent in other ways.

Perhaps it's worth noting here that machine translation may be cheaper than a human translator once you have it, but it takes resources to get to the point where a machine can even begin to offer a very imperfect alternative to human translation, one that still needs supervision. Many times more time and other resources, with a geometrically diminishing return the further you pursue perfection, would be required to get it to the point where it can pose a serious challenge in terms of quality.

Those were hours of either menial typing or inventive designing with the goal of avoiding menial labour (where an engineer who designs ways of reducing the labour also charges more for his services than a humbler labourer's wage), but in any case it takes man-hours and other resources. Whether you type, scan manually, construct a scanner with a conveyor belt and huge throughput, or improve the OCR software, it always comes down to having to commit the resources. MT doesn't grow on trees. Plus, apart from the cost of any investment that does eventually yield a profitable return, there is also the risk of a negative return when you get side-tracked.

So, with all that huge expenditure and risk, after being done with dictionaries and simpler grammar books you could then proceed with more advanced, scholarly works on grammar, syntax, punctuation and other aspects of either language or translation. In practice that means more complicated flow charts for you and your machines, more nuanced rules, even rules for resolving conflicts between rules (meta-rules), exceptions, overrides and what have you. All of this serves to avoid blunders (perhaps similarly to how a non-native speaker or writer tries to avoid them) and to simulate the 'recognition' of non-standard usage and common mistakes. A human would probably not be fooled easily by those, whereas a machine, being a strict prescriptivist by nature since all it does is execute instructions, could totally misinterpret them by applying correct grammatical rules in an unbending fashion, failing to decode even the most obvious substance where the form fails to comply.
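The layering of general rules, exceptions and a meta-rule described above can be sketched in a few lines. The example below uses English plurals only because they are compact; the rule set, the exception table and the function name are all illustrative assumptions, and a real grammar would need vastly more of each.

```python
# Exceptions (overrides): an illustrative, deliberately incomplete table.
IRREGULAR = {"child": "children", "mouse": "mice"}


def pluralize(noun):
    """Apply pluralisation rules in a fixed order of priority."""
    # Meta-rule: a listed exception always beats the general rules.
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # General rule 1: sibilant endings take "es".
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    # General rule 2: consonant + "y" becomes "ies".
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    # Default rule: add "s".
    return noun + "s"


if __name__ == "__main__":
    for word in ("box", "city", "day", "child", "sheep"):
        print(word, "->", pluralize(word))
```

The last word shows the unbending prescriptivist at work: nobody listed "sheep" as an exception, so the default rule happily produces "sheeps".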

As long as the rules of a given language or translation method or procedure are logical, and a reasonable and lucid explanation exists for them rather than a fuzzy, arbitrary and whimsical one, you should be all right most of the time (well, what if you need to be safe all of the time?). This is all the easier now that you have multi-core processors clocked in the 2–4 GHz range, working with RAM in the teens of gigabytes and cheap terabytes of storage even for home computers. Still, even if the software hypothetically were to stop needing to be baby-sat by a human, the fire would still burn on the root spark of creativity left by a human engineer. Whole teams of such engineers, to be precise.

Back to language, though. A computer does not get to work with the benefit of social, cultural and other experience from the start, which a human being starts acquiring even before birth and continues to acquire until death. A computer can operate a whole laboratory to analyse samples in compliance with predefined procedures and in this way expand its data banks and fill its memory. It certainly can 'analyse' faster than a human, often more reliably, without suffering from exhaustion or a limited attention span, but intuition, the subconscious, nope, it will have none of those.

You could make a computer simulate the thought process of a person with a specific personality, education, beliefs, mental disorders, but it will be… a simulation. Which it was supposed to be from the beginning. Bottom line, a machine, for all its tremendous computing power, still runs on 1) algorithms written by humans (or derived from such algorithms), and 2) data either filled in by humans outright or arrived at by the machine using instructions left by humans. There is no 'awakening' point at which a computer takes full possession of its enlightened self previously dormant, i.e. taking a nap in its printed circuits.

Executing logical operations on data is actually not so far from how language works in a human brain – perhaps even more similar to a non-native speaker's reliance on interlanguage – but it's still just like Deep Blue's game of chess in the best cases. A chess program will typically beat its own creator at the game (due to the calculating advantage), but a computer still can't play chess unless you – the human – 'teach' it the rules, supply it with the data to crunch. You can even equip the machine with a checklist to analyse its records of games played, so that it will expand its repertoire with time and become better prepared to handle what human opponents can throw at it.

If the machine brute-forces its way through the 'analysis' of billions of possibilities, validating hypothetical choices with further analysis before committing (something which a human brain could possibly require millions of years to process), then yes, the machine can do marvellous things. It can already translate some easy texts on par with human translators, even translate them more reliably than a messy or underqualified translator, but this is all the product of countless hours invested by humans. And there are still limitations anyway because, as we've already noted, a computer is a tool and not a master, a glorified calculator.

To draw a different parallel, yes, you could construct androids capable of typing on a keyboard. But it would still be cheaper to hire even multiple typists, there are obviously better ways of making two machines communicate, and you don't need an android for what a bot can do anyway. And it doesn't even pay to get a bot to do something which is not repeatable. Research and development costs money, and the investment must bring a sizeable return. Otherwise it's a waste and it doesn't pay. This applies to machine translation as well. Even if there were people who really wanted to put human translators out of business (as opposed to making translation cheaper, which is another matter), those guys wouldn't be working with unlimited resources; their creative staff, other workers and equipment would still cost them.

Back to the chess parallel again, language is not a board of 64 squares populated by 32 pieces and governed by a relatively short list of consistent and transparent rules with great potential for analysis. Chess was designed for that sort of nutcracking; language was not.

Now, to deal with the fallibility of human translators, that fallibility often results from thinking just like a computer 'would'. Their blunders are not unlike those of a war-game AI fooled by a human player who knows its routines. For example, peppering a line of infantry with some arrows from archers to bait them into loosening their formation so as to break them up with a quick charge of cavalry in a wedge formation. When a human gets fooled into the wrong choice it isn't worlds apart from tricking the AI's check: IF condition THEN command. A case-insensitive computer check works just the same as a translator who fails to recognise proper capital letters in a text printed all in uppercase: he will not know the VAT on wine from a wine vat, at least not without parsing the text for clues elsewhere.
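The VAT-versus-vat problem above is easy to reproduce. The following is a minimal sketch with a made-up two-entry glossary (the entries and function names are assumptions for illustration): a case-sensitive lookup keeps the two senses apart, while a case-insensitive one, which is all you can do with text printed in capitals, returns both and leaves the choice to context.

```python
# Hypothetical glossary: keys and glosses are illustrative assumptions.
GLOSSARY = {
    "VAT": "value-added tax",
    "vat": "large container for liquids",
}


def lookup_case_sensitive(token):
    """Exact match: capitalisation tells the two senses apart."""
    return GLOSSARY.get(token)


def lookup_case_insensitive(token):
    """Return every sense whose key matches once case is ignored."""
    folded = token.casefold()
    return sorted(
        sense for key, sense in GLOSSARY.items() if key.casefold() == folded
    )


if __name__ == "__main__":
    print(lookup_case_sensitive("VAT"))
    print(lookup_case_sensitive("vat"))
    # In an all-caps text every token looks like "VAT", so both senses survive.
    print(lookup_case_insensitive("VAT"))
```

Whoever runs the second check, human or machine, has to look elsewhere in the text to decide between the tax and the container.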

In fact, such checks are not unlike QA procedures already used in the translation 'industry'. Human proofreaders also effectively parse their entrusted texts for errors, occasionally producing false positives or false negatives when something pops up which isn't covered by their procedures or doesn't occur in their data banks, for example when their knowledge of the applicable rules is not complete or when they fail to execute the call to a reference book, as a programmer would say.

So, yeah, there are similarities. This is definitely true. However, not unlike amateurs versus professionals or junior professionals versus senior professionals (with individual exceptions), computers are always one step behind humans. They don't have anything they didn't first receive from humans or produce with the procedures humans gave them. They can't fully simulate a human being. Unlike the divide between human students and their teachers, the divide between humans and their computers is fundamental. Guns send bullets, but guns don't kill people.

For all the human frailty and computing horsepower, MT will always need at least some PEMT-ing (post-editing of machine translation), even if we give it thousands of years. Sure, there will be fewer traditional human translators and more PEMTors. Perhaps there will be a preference to use linguists and their skills in MT design work and not in actual translation assignments, other things being equal. Perhaps some companies interested in that kind of thing will stubbornly overinvest in the pursuit of the goal of making translators no longer needed, hoping for a huge long-term ROI in spite of the continual loss in the short term…

… But no one in his right mind is going to invest hundreds of hours of R&D in a one-off bespoke job, and the ROI curve is nasty enough even on somewhat repeatable or commonplace assignments in specialist translation and copywriting. Ironically, we are simply cheaper than machine translation, as long as more than an appearance of translation, or a gist of meaning, is sought. The sad part is only that machine translation plus light post-editing may become the standard for easier tasks in popular languages within a couple of decades, perhaps sooner, and obviously not all of us would welcome a role change or the need to work solely on the harder cases, with an increased mental strain.


Disclaimers and all that jazz

REPOSTING AND RIGHTS: Don't worry about copyrights and stuff when printing or sharing these posts for your own benefit or that of your friends. In fact, the point of them is to be read. You can also repost them whole if you think that will reach the audience better than linking to my blog here, or if you want to host a vigorous discussion on the subject among your commenters; it's all fine as long as you don't make a blog that's simply a copy of my blog. Feel free to translate the posts, too, or adapt them, just please mention the fact you did so on your own. Understand, however, that by reposting my posts in any way or form, including paper, you will not gain any sort of copyright or interest in my own copyright, let alone moral rights, other than limited protection of any significant value you actually add, where any such rights such as you may so acquire shall not restrict my own rights and freedom as the author or in any way impede the free, unpaid circulation of the material I created. Always credit me as the author.

NOT PROOFREAD OR EDITED: The posts are not always edited and proofread before posting. They are intended to represent the typical standard of blog posts rather than printed works; I apologize for any inconvenience you may experience as a result. I simply prefer to write another post or five rather than either go back and reread old stuff or complete five rounds of editing before hitting the send button; this is more or less the point of having a blog. You're still welcome to let me know if you find a typo or if anything's unclear, which may very well occasionally be the case due to the timing, such as late at night or in between urgent projects. Hence, before reposting any of this in a more serious setting than a blog or forum post, please drop me a line to give me the chance to apply some finishing touch or even ask me to write a proper article, which I'll be happy to do subject to time constraints.

NO ADVICE, NO CLIENT RELATIONSHIP: Nothing here constitutes legal, business, marketing, career or any other advice, at least not in the sense of establishing a lawyer-client or consultant-client relationship, not least because I didn't get a dime and because any implied contract is (hereby, herewith, etc.) expressly disclaimed. If what you find here is useful and informs your course of action, you still need to get proper legal advice appropriate to your jurisdiction and your unique factual situation, without exception. Likewise, consult such other advisors as are appropriate for proper business, marketing, career or other advice relevant to your situation. Whether or not you consult them, you agree to assume and do assume the responsibility and risk, with the consequence being, without limitation, that you can't sue me (and agree not to try) on the grounds of having relied on anything whatsoever on this blog.

NO RESPONSIBILITY FOR THIRD-PARTY SITES: Any links on this blog that don't expressly lead to other posts here should be presumed to lead to external sites that are outside my control. I am not responsible for the contents of such sites or their availability. Linking does not imply endorsement. By clicking on a link you expressly and enthusiastically declare for the benefit of any pesky government official anywhere that you did so out of the pure, unadulterated, spontaneous desire of your heart miraculously coupled with the informed and deliberate consent of your conscious, sovereign, unbroken will.

NO RESPONSIBILITY FOR COMMENTS YOU READ: If you find something illegal, obscene, hateful, etc., in the comments, let me know and, if I agree with your assessment, I'll axe the comment. You are hereby warned that I don't control the comments otherwise, they are the product of their own authors' expression. If you don't accept the risk that you might see something objectionable, don't read the comments.

YOUR RESPONSIBILITY FOR COMMENTS YOU POST: By posting a comment you agree to assume and do assume full and sole responsibility at law for posting whatever you post, even if I saw it and did not delete it, and even though you also agree that I may delete any comment at ultimately my sole discretion, with no liability to you, though I'll normally do so only if I find it to be spammy or more than just a little offensive. Further, you agree that if you subsequently decide to delete or modify your comment, you will need to do so on your own, and agree to do so on your own without involving me in the process. You agree not to post anything which is illegal, hateful, defamatory, spammy, in breach of a confidentiality obligation (or otherwise contains anything you are not free to disclose, including without limitation personal data) or infringes on anyone's legally protected rights. Advertising is forbidden.
