Protecting Young Minds with AI
Operating like a spell checker for toxic language, a new tool flags potentially harmful messages before they are sent, encouraging users to reconsider their words.
‘Serenity’, a new AI-driven chat application designed by University of Auckland researchers, can detect and mitigate cyberbullying in real time.
The prototype harnesses artificial intelligence to analyse the level of toxicity as a user types their message. If a message is flagged as potentially harmful, Serenity ‘nudges’ the person who wrote it.
Lead researcher Dr Johnny Chan from the University’s Business School says receiving the alert before sending a flagged message encourages people to pause and reconsider their wording, or whether to send the message at all.
“This aligns with prior research suggesting that nudges can effectively reduce unwanted behaviours,” he says.
Serenity’s toxicity detection is powered by Google’s Perspective API, which assigns a score to each typed message based on its potential for harm. If the toxicity score exceeds a threshold set by the user, the message is flagged.
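The threshold check described above can be illustrated with a minimal Python sketch. It is not Serenity's actual code: the function names, the 0.7 threshold and the hand-written sample response are assumptions made for illustration. The nested field layout follows the shape the Perspective API documents for its responses, in which the summary score is a value between 0 and 1.

```python
def extract_toxicity(response: dict) -> float:
    """Return the summary TOXICITY score (0.0 to 1.0) from a Perspective-style response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def exceeds_threshold(score: float, threshold: float) -> bool:
    """Flag a message only when its score is strictly above the user's threshold."""
    return score > threshold


# Hand-written sample response for illustration, not real API output.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.82, "type": "PROBABILITY"}}
    }
}

score = extract_toxicity(sample_response)
flagged = exceeds_threshold(score, threshold=0.7)  # 0.7 is an assumed user setting
```

Whether a score exactly equal to the threshold should trigger a nudge is a design choice; this sketch treats "exceeds" as strictly greater than.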
In addition to the Perspective API, Serenity lets users create their own lists of potentially harmful words or phrases. This ensures that terms the API does not recognise, but which the user finds offensive or triggering, are also flagged.
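One way to picture how a personal word list could sit alongside the API score is the Python sketch below. It is an assumption-laden illustration rather than Serenity's implementation: the function names, the example terms and the case-insensitive whole-word matching are all hypothetical choices.

```python
import re


def find_custom_terms(message: str, custom_terms: list[str]) -> list[str]:
    """Return the user-defined terms that appear in the message as whole words or phrases."""
    found = []
    for term in custom_terms:
        # Lookarounds stop a term from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, message, flags=re.IGNORECASE):
            found.append(term)
    return found


def should_flag(message: str, api_score: float, threshold: float,
                custom_terms: list[str]) -> bool:
    """Flag when the API score is above the threshold or any custom term matches."""
    return api_score > threshold or bool(find_custom_terms(message, custom_terms))


# Hypothetical user list and a message with a low (assumed) API score.
my_terms = ["noob", "go away"]
result = should_flag("You are such a Noob lol", api_score=0.2,
                     threshold=0.7, custom_terms=my_terms)
```

In this sketch a custom term flags the message even when the API score is low, which matches the article's description that user-listed terms "will also be flagged".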
"As a father, I know my kids will soon be stepping into online spaces and we need tools that give young people and families a safer experience,” says Chan.
“For parents, Serenity offers a ‘guardian account’ feature, allowing them to monitor overall toxicity scores without accessing message content. This respects young people’s privacy while empowering early intervention when needed.”
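The idea of sharing scores without sharing content can be sketched in Python as follows. This is a hypothetical illustration, not the guardian account's real design: the `GuardianSummary` type, its fields and the choice of statistics are assumptions. The point it shows is that the summary is computed from scores alone, so no message text needs to reach the guardian.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GuardianSummary:
    """Aggregate view for a guardian; holds no message text."""
    message_count: int
    mean_score: float
    max_score: float
    flagged_count: int


def summarise_scores(scores: list[float], threshold: float) -> GuardianSummary:
    """Build a summary from toxicity scores only; message content is never passed in."""
    if not scores:
        return GuardianSummary(0, 0.0, 0.0, 0)
    return GuardianSummary(
        message_count=len(scores),
        mean_score=mean(scores),
        max_score=max(scores),
        flagged_count=sum(1 for s in scores if s > threshold),
    )


# Assumed example scores for three messages.
summary = summarise_scores([0.1, 0.9, 0.5], threshold=0.7)
```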
Adaptable to various online platforms, Serenity could be used for social media, online gaming, chat applications, comment sections and forums.
“Cyberbullying is a huge issue for young people and their parents, and despite our awareness of the harm it causes, we don’t have many tools to combat it,” says Chan.
He says traditional approaches, such as lexicon-based detection systems, rely on pre-defined lists of harmful words and phrases. While these systems can flag some instances of cyberbullying, he says, they often fail to capture the nuanced and evolving nature of language used in online interactions.
“They also lack the ability to provide timely interventions that could prevent the harm from occurring in the first place. This is where we hope Serenity can make a difference.”
Developed with funding from Netsafe New Zealand, Serenity’s source code is available as open-source software on GitHub, allowing other researchers and developers to contribute to its development. Read the full paper.