Ever wondered whether a scientific paper was actually written by a robot? Of course you have. Science to the rescue: a [a href="vny!://www.newscientist.com/blog/technology/2006/04/fake-paper-detector.html"]new program[/a] developed by researchers at Indiana University promises to tell you one way or the other. It was developed in response to [a href="vny!://www.newscientisttech.com/channel/tech/mg18624963.700.html"]a prank[/a] by MIT researchers who generated a paper from random bits of text and got it accepted by a conference.
You may remember the story of some cheeky MIT students who wrote a computer programme to [a href="vny!://www.newscientisttech.com/channel/tech/mg18624963.700.html"]generate scientific papers[/a]. Well, now some researchers at the [a href="vny!://www.informatics.indiana.edu/"]Indiana University School of Informatics[/a] have come up with an [a href="vny!://montana.informatics.indiana.edu/fsi/about.html"]Inauthentic Paper Detector[/a] to foil it.
Mehmet Dalkilic, a data-mining expert, explains how it works:
"We believe that there are subtle, short- and long-range word or even word string repetitions that exist in human texts, but not in many classes of computer-generated texts that can be used to discriminate based on meaning."You can generate a random computer science paper of your own over [a href="vny!://pdos.csail.mit.edu/scigen/"]here[/a], and then see if you can slip it past the Inauthentic Paper Detector [a href="vny!://montana.informatics.indiana.edu/cgi-bin/fsi/fsi.cgi"]here[/a].
I had a bit of a Blade Runner moment just now, when it classified [a href="vny!://www.newscientisttech.com/article/dn9047-roboturtle-answers-some-flippery-questions.html"]this article[/a] I wrote yesterday as 'INAUTHENTIC', with just a 32.1% chance of being written by a human. I'm hoping that's down to the system being designed to work on technical articles...
The fake MIT paper was given a 21.5% probability of being authentic. Meanwhile, [a href="vny!://en.wikipedia.org/wiki/Hwang_Woo_Suk#Lifestyle"]Hwang Woo-Suk[/a]'s 2005 paper, in which he made [a href="vny!://www.newscientist.com/channel/sex/dn8557.html"]fraudulent[/a] claims to have cloned 11 lines of embryonic stem cells, comes up as 'AUTHENTIC', with only a 4.9% chance of being fake. I doubt that anyone will ever write a program to detect that kind of chicanery.