Event Round-Up: Red Teaming Techniques for Responsible AI
On the 16 July a range of cybersecurity specialists and digital ethicists gathered to have a deep dive discussion on the role of red teaming as a key technique in supporting the development of responsible AI.
This webinar was part of our Decoding Digital Ethics Webinar Series and in collaboration with the techUK cybersecurity programme. This dialogue explored the multifaceted role of red teaming in AI development and security, definition and types of red teaming, its function as an assurance mechanism, and its practical applications across industries.
Our expert panelist's insights allowed audience members to walk away with a clear understanding of the best practices in red teaming for AI, real-world applications, and the next steps for integrating these practices into their organisation.
The webinar heard from speakers including:
-
Tessa Darbyshire, Responsible AI Manager, Accenture
-
Steve Eyre, Cyber Technical Director, Alchemmy
-
Nicolas Krassas, Red Team Member, Synack Red Team
-
Chris Jefferson, CTO and Co-Founder, Advai
The discussion delved into common challenges encountered during red teaming exercises, examined recommendations from The Interim International Report of Safety of Advanced AI Systems and considered how AI has altered the landscape for cyber security, making attacks easier for adversaries. The session also addressed the resources available for organisations looking to implement or enhance their red teaming efforts. The conversation concluded by considering how to effectively translate these insights into organisational practices, ensuring that key stakeholders are engaged in the process of improving AI safety and security through red teaming. The recording from the session can be found below alongside timestamped takeaways:
~5:48 Getting on the Same Page: Can you explain what red teaming is and how it supports the responsible development of emerging technologies, particularly AI?
Red teaming in AI serves multiple purposes, extending beyond its traditional cybersecurity roots. It's used to identify acute failure modes within AI models and explore vulnerabilities in the broader AI system pipeline. The scope has expanded to include ethical considerations, fairness, bias, and output quality. This evolution presents challenges in satisfying regulatory requirements and implementing effective controls. The field now encompasses both "white box" approaches, examining the holistic AI model and infrastructure, and "black box" methods focusing on social engineering and ethical aspects. This dual nature creates distinct threat models that need to be integrated for comprehensive assessment.
~10:02 Types of Red Teaming: What are the different types of red teaming, and can you provide examples of scenarios where each type would be most effective?
Red teaming encompasses various approaches, with network and application testing being the most common, especially for large organisations. Social engineering is typically integrated into broader red team operations, starting with external network reconnaissance and progressing to targeted tactics using gathered information. This may include email phishing and link-based attacks to assess employee vulnerability. While rare, some organisations conduct highly specific red teaming operations focused on critical infrastructure or particular business units, such as AI technologies for companies heavily reliant on them. The overall goal is to identify and exploit potential security weaknesses across different aspects of an organisation's infrastructure and human elements.
~12:46 Red Teaming as a Form of AI Assurance: How does red teaming serve as a form of assurance in the context of AI development and deployment?
By actively probing for weaknesses, red teaming offers a higher level of confidence in a system's resilience against potential threats, thus serving as a valuable assurance mechanism. However, there's currently a lack of clarity and consensus regarding the scope and definition of AI red teaming. While traditionally focused on security, safety, and privacy vulnerabilities, regulatory requirements now include broader adversarial testing. This has led to confusion between red teaming and other forms of adversarial testing, such as evaluating chatbot outputs for toxicity. The lack of specific guidance in regulations like the EU AI Act, NIST framework, and UK framework compounds this issue. Without established best practices for AI red teaming, organisations, especially large tech companies, are developing their own approaches. This situation poses challenges for stakeholders across the spectrum, from major AI developers to end-users, in terms of implementing and assuring AI systems effectively.
~17:25 Reocurring Issues to Tackle: What are the key reoccurring issues you encounter during red teaming exercises, and how can organisations proactively address and avoid these pitfalls?
Red teaming AI systems encompasses a range of issues, from data privacy and leakage to operational costs and denial of service risks. The approach to testing should be tailored to the specific use case and business context of the AI system, with a clear understanding of its intended purpose and acceptable use boundaries. This allows for more targeted testing strategies. There's a distinction between general adversarial testing, which focuses on manipulating data to test AI systems, and red teaming, which simulates real-world attacks. The choice between these approaches depends on whether the focus is on potential harms, system performance, or security vulnerabilities. Effective red teaming requires a comprehensive understanding of the risks associated with deploying AI models in a specific business context, allowing for a more strategic and relevant testing approach.
~20:47 Red Teaming in the Interim International Report on Safety of Advanced AI Systems: In the interim international report on the safety of advanced AI systems there is a section on red teaming, what are your thoughts on this area and how can these recommendations be practically implemented?
Red teaming and AI evaluation are important but limited approaches within a broader AI assessment toolkit, particularly in detecting downstream harms and unknown issues post-deployment. The proprietary nature of AI models complicates ongoing evaluation, prompting discussions about solutions like digital twins. Organisations are encouraged to leverage existing security, privacy, and ethical frameworks when assessing AI systems. The distinction between evaluating general-purpose and specific-use AI systems is emphasised, requiring different strategies. Critically, organisations should assess whether AI is truly necessary and if it will effectively solve their key problems before implementation. The report highlights the challenges in achieving transparency and understanding of AI systems, noting that even within the academic community, this represents the cutting edge of what's possible with current AI technology. This underscores the complexity and evolving nature of AI evaluation and implementation strategies.
~26:51 AI used by adversaries: How has AI changed the landscape for our adversaries, enabling them to attack with greater ease and a broader surface area?
AI is revolutionising attack strategies for cybercriminals and penetration testers alike. It enables rapid code development, automation of complex tasks, and creation of more convincing phishing campaigns by overcoming language barriers. AI is being used to develop malware evasion techniques, including on-the-fly code regeneration to avoid detection. The technology also facilitates the creation of highly convincing deepfakes, which have already been used in high-profile scams. Additionally, AI can assist in recreating zero-day exploits by analysing existing reports and data. While these advancements make the job easier for attackers and security professionals testing systems, they pose significant challenges for defensive cybersecurity measures.
~31:20 Current Applications in Industry: Can you share some examples of how red teaming is currently being applied in various industries to enhance AI safety and security?
Organisations are grappling with the cost-benefit analysis of implementing AI red teaming and related security measures. The decision to invest in internal teams or external providers is complicated by a lack of clear guidance on best practices. High-risk sectors, such as critical infrastructure, are more likely to adopt red teaming as an extension of existing cybersecurity practices. However, the process of identifying and mitigating risks through red teaming can lead to costly implementations of guardrails and restrictions. The value proposition of AI implementation must be weighed against these security costs and the potential risks. Factors to consider include the speed and efficiency gains AI can provide, the risk level of specific use cases, the cost of necessary infrastructure and data management, and the potential competitive disadvantage of not adopting AI. Ultimately, organisations must balance the risks, costs, and benefits while ensuring at minimum a consistent logging strategy for retrospective analysis.
~36:12 Resources for Industry to Start or Continue Red Teaming: What resources (e.g., tools, frameworks, guidelines) are available for organisations looking to start or enhance their red teaming efforts?
The open-source community offers various tools for testing individual AI models. However, testing larger generative or foundational models is more challenging and often requires implementing agent architectures. Resources like Hugging Face provide valuable open-source projects and datasets for developing testing strategies. Cloud providers offer on-demand compute resources, while organisations like AI Commons provide benchmarks for evaluating AI systems and potential harms. Some security-conscious users prefer using closed, locally-run systems to maintain control over their environment and prevent potential data exposure to public models. This approach allows for greater security but may limit access to some broader resources and tools available in the open-source community.
~39:05 Key takeaway/next steps: How do we take it back to organisations? How do we make sure voices are heard in the rooms which attendees occupy?
When implementing AI systems and considering red teaming, organisations should focus on generating evidence to measure risks, understand system limits, prepare for upcoming regulations, and demonstrate value. Continuous testing and validation are crucial, as vulnerabilities and attack surfaces evolve over time. It's important to involve a diverse group of stakeholders, including cybersecurity and data privacy experts, to align with existing organisational practices and specific regulatory requirements. Threat modeling is recommended as an iterative approach in the early design and adoption phases, allowing for risk assessment and potential reconsideration of AI implementation if necessary. This comprehensive strategy helps organisations make informed decisions about AI adoption, balancing business objectives with potential risks and regulatory compliance.
Tess Buckley
Tess is the Programme Manager for Digital Ethics and AI Safety at techUK.