One of the biggest challenges facing online platforms today – and especially those with UGC (user-generated content) – is detecting harmful content and malicious behavior. Platform abuse poses brand and legal risks, degrades the user experience, erodes user trust, and often blurs the line between online and offline harm.
One of the reasons harmful content detection is so challenging is that it is a multimodal problem. Items can be in any number of formats (video, text, image, and audio), any language, and violative in any number of ways. For example, white supremacy can be embedded in a first-person shooter game that enables the user to reenact the mass shootings from the shooter’s perspective, with a benign pop music soundtrack, harmful text overlaid on the images, and an innocuous title. Furthermore, content cannot be analyzed in isolation. Analyzing a single media type alone may misinform the overall assessment of risk, whether by incorrectly flagging benign material or by missing harmful content. In the shooting game example above, the soundtrack or game title taken in isolation may each individually be benign, but when combined, may be indicative of extremism or racism. So how can online platforms tackle abuse in a world where bad actors are continuously changing their tactics and developing new ways to avoid detection?
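The co-occurrence problem described above can be made concrete with a toy sketch. This is not ActiveFence's actual model; the scores, threshold, and combination rule below are invented purely to illustrate why scoring each modality in isolation can miss cross-modal risk.

```python
# Hypothetical illustration: individually benign modality signals that
# co-occur can indicate a harmful item. All values here are invented.

def per_modality_max(scores):
    """Naive policy: flag only if a single modality looks risky on its own."""
    return max(scores.values())

def contextual_score(scores, cooccurrence_boost=0.25):
    """Toy contextual policy: several mildly suspicious signals across
    different modalities raise the overall risk score."""
    mild_signals = [s for s in scores.values() if s >= 0.3]
    base = max(scores.values())
    if len(mild_signals) >= 2:  # weak signals reinforcing each other
        base = min(1.0, base + cooccurrence_boost)
    return base

# The shooter-game example: each modality alone is only mildly suspicious.
item = {"video": 0.5, "audio": 0.05, "text_overlay": 0.4, "title": 0.1}

print(per_modality_max(item))   # 0.5 -> below a hypothetical 0.7 flag threshold
print(contextual_score(item))   # 0.75 -> flagged for review
```

The point of the sketch is only the shape of the problem: any per-modality maximum (or average) is blind to the interaction between a benign soundtrack and harmful overlaid text, whereas a contextual score can weight their combination.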
In this talk, I will discuss how ActiveFence's AI uses language models, embeddings, object detection, unsupervised learning, and hashing algorithms to build a contextual understanding that generates the most informative risk scores at scale – regardless of content format or type of harm. I will discuss different levels of contextual understanding, from local context investigated with NLP or computer vision to context that crosses modalities. I will share concrete examples where contextual understanding is key to accurately determining risk, and explain how our data model enables AI at scale. Finally, I will discuss harmful content detection as an adversarial challenge and share how we keep our algorithms current as bad actors' tactics evolve.
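One of the techniques named above, hash-based matching, can be sketched in a few lines. Production systems use robust perceptual hashes such as PDQ or PhotoDNA; the simplified difference hash ("dHash") over a grayscale pixel grid below is an assumption-laden toy, shown only to convey the idea of matching new uploads against previously confirmed harmful items at scale.

```python
# Toy sketch of hash-based known-content matching. The dhash here is a
# drastically simplified stand-in for real perceptual hashes (PDQ, PhotoDNA).

def dhash(pixels):
    """Hash a 2D grayscale grid: one bit per horizontally adjacent pixel
    pair, set when the left pixel is brighter than the right."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def matches_known(h, known_hashes, max_distance=2):
    """Near-duplicate lookup: tolerating a few flipped bits means light
    re-encoding or watermarking does not trivially evade detection."""
    return any(hamming(h, k) <= max_distance for k in known_hashes)

known = {dhash([[9, 3, 7], [2, 8, 1]])}   # hash of a previously confirmed item
altered = [[9, 3, 7], [2, 8, 3]]          # slightly re-encoded copy
print(matches_known(dhash(altered), known))  # True
```

Because the hash captures relative brightness rather than exact pixel values, the altered copy still lands within the match distance, which is the property that makes hash lookups useful against adversaries who make small edits to evade exact-match detection.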