In 2023, discussions of ethics in machine learning went mainstream. “Responsible AI” became one of the hottest buzzwords in tech, and debates about how to rein in the risks of AI without quashing innovation led to a proliferation of AI ethics, governance, and risk frameworks. There are substantial gaps, however, between ethical principles, the governance frameworks intended to implement them, and the practical guidance needed to achieve these aims. At IQT, a global investment platform that fosters emerging technologies with applications to national security, we view this as a missing translation layer between policy and practice.
One important step is to ensure that the teams who develop, deploy, and use AI/ML tools are clear-eyed about their limitations and aware of potential risks. Red Teaming, the practice of stress-testing tools prior to deployment, is commonly viewed as a means of uncovering security vulnerabilities. By broadening the purview of AI Red Teams, however, we can help organizations identify a much wider variety of concerns and respond to their ethical implications.
In 2021, I stood up a multi-disciplinary AI Red Team within IQT to help our partners across the U.S. Intelligence and National Security Community assess the risks of deploying open source AI/ML tools in high-stakes situations. We examined several different types of tools: a deepfake detection tool called FakeFinder, a pretrained Large Language Model (LLM) called RoBERTa, and SkyScan, which collects and automatically labels images of aircraft. Using quantitative and qualitative methods, we assessed each tool from four perspectives: ethics, bias, security, and the user experience.
In this talk, I will share five important lessons that we learned along the way. These lessons aren’t exhaustive; I’ve selected them because they are grounded in tangible examples from our work and because they highlight critical aspects of Red Teaming that ML practitioners often overlook. For example, whereas many discussions about staffing Red Teams concentrate on technical skills, I will discuss the equally important need to cultivate an adversarial mindset and avoid “Model Groupthink.” While many people use the term “AI security” to refer only to adversarial attacks on models, I’ll give examples of vulnerabilities we’ve found in the seams between AI models and their supporting infrastructure and share tips on how to look for vulnerabilities across the ML stack. And to address an extremely common and perennially ignored problem, the fact that people using AI tools are often unaware of their limitations, I’ll discuss the steps we take to ensure that we are not blind to model blind spots.
While this topic may seem, at first glance, most directly relevant to others charged with Red Teaming AI, I believe our lessons can benefit a much broader community. Learning to think like a Red Team can help any ML practitioner uncover hidden problems and gain a better understanding of the tools they touch.