More regulation could make it easier to detect whether academic writing has been generated by artificial intelligence, amid concerns that tools created for this purpose are suffering from low accuracy rates and inbuilt biases.
Universities worldwide have embraced the use of AI detectors to combat the rising concern that the likes of ChatGPT and its successor GPT-4 can help students cheat on assignments, although many remain wary as an increasing body of evidence shows that they struggle in real-world scenarios.
In a paper, researchers based across European universities concluded that “the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text”. This followed another paper that showed that students whose second language was English were being disproportionately penalised because their vocabularies were more limited than native English speakers’.
A third study, from academics at the University of Maryland, confirmed inaccuracy concerns and found that detectors could be easily outwitted by students using paraphrasing tools to rewrite text initially generated by large language models (LLMs).
One of that study’s authors, Soheil Feizi, assistant professor of computer science, said the flaws in the tools had already had a “real-world impact”, with many cases of students suffering “trauma” after being falsely accused of misconduct.
“The issue is that the ‘AI detection camp’ is quite powerful and is successful in muddying the water: they often evaluate their detection accuracy under unrealistic or very specific scenarios and don’t report the full spectrum of false positive and detection rates,” he added.
One of the detectors Dr Feizi tested was the model created by OpenAI, the company behind ChatGPT, which was recently shelved in a move that many viewed as evidence that detection could not be done.
Turnitin – whose detector generally scored higher than most in the studies but did not prove infallible – recently revealed that its tool has already been used 65 million times.
Annie Chechitelli, the company’s chief product officer, said the product was helping maintain “fairness and consistency in classrooms” but was also still “evolving” and the next step was to help educators better understand the numbers the detector produces and what this might indicate.
Swansea University was not yet using Turnitin, according to Michael Draper, a professor of legal education who also serves as the university’s academic integrity director.
He said he had “mixed feelings” about detection. “If you use a detection tool as a primary means of evidence when accusing a student of committing misconduct, then you are on a hiding to nothing,” he said.
“But I think using it as a first step is legitimate. You can then have an exploratory conversation with a student in relation to their submission. Some may volunteer they have used AI, or it will become clear they can’t adequately explain how they have arrived at their answer.”
Professor Draper said universities should consider asking students to submit a “research trail” alongside their final draft to show their workings, which could form part of the assessment.
“These things can also be fabricated, but it is still a useful extra step in detection,” he said. “Anyway, it would be beneficial for students to develop this skill.”
AI detection was not going to go away, however, according to Professor Draper, who pointed to a recent voluntary commitment made in the US by many of the major companies creating LLMs to develop “robust technical mechanisms to ensure that users know when content is AI generated, such as a watermarking system”.
This, he said, would likely be followed by regulation if adequate detection methods were not produced voluntarily, in a “turning of the tide” against companies that “have a vested commercial interest in not having detection”.
“There is increasing recognition that we need to have the ability to differentiate between AI- and human-written text for a number of ethical and legal reasons. It is in everyone’s interest long term to know if something is AI generated or not,” Professor Draper said.
“Some people say detection will never keep up. That’s true when it’s an independent company trying to second-guess what will happen next, but when you have a commitment from the AI companies themselves to create a means of detection, you are on a much stronger wicket.”
Savvy and determined students would find ways around watermarking, but another issue was the blurring of the lines between AI and human writing as chatbots become embedded into everyday programs, according to Mike Sharples, emeritus professor at the Open University's Institute of Educational Technology.
For example, “Copilot” – Microsoft’s soon-to-launch AI assistant – promises to be able to “shorten, rewrite or give feedback” on a user’s written work.
“Rather than generating an entire essay with AI, students will just press the ‘continue’ button or equivalent when they get stuck,” said Professor Sharples.
“Or use it to rewrite a section, or to suggest references. AI will become part of the workflow. It will become increasingly difficult for AI detectors to call out these ‘AI-assisted’ student assignments.”