Securing AI Systems (TryHackMe)

Introduction

TryTrainMe's engineering team has built TryAssist, an AI-powered code review assistant that analyses pull requests, queries internal documentation, and connects to the CI/CD pipeline. Before AI, TryTrainMe's attack surface was well understood: web application, API endpoints, database, and authentication layer. TryAssist changed all of that.

The assistant accepts natural language from developers and converts it into actions. It reads documents from shared storage and summarises them. It calls internal APIs to retrieve repository data and pipeline status. It logs every conversation, including those in which engineers paste credentials, source code, or internal architecture details into the chat window.

In 2023, Samsung engineers pasted proprietary semiconductor source code (opens in new tab)and internal meeting notes directly into ChatGPT. The code left Samsung's control entirely. No exploit was needed. No vulnerability was present. The system worked exactly as designed, and confidential data still leaked, because nobody had mapped the architectural risks before deployment.

How many new attack surfaces did TryTrainMe just introduce?

This room answers that question. As the security architect reviewing TryAssist before it goes live, your job is to map every attack surface, identify the trust boundaries, and recommend defences for each one. If you completed the AI Fundamentals module, you already understand what AI is and how adversarial attacks work at the model level. This room moves up one layer: what does TryAssist's production architecture look like, where are the trust boundaries, and which components create risks that traditional security frameworks were never designed to handle?.

Learning Objectives

By completing this room, you will be able to:

Identify the core components of a production AI system and the data flows between them
Identify the OWASP LLM Top 10 (2025) and MITRE ATLAS as the primary frameworks for AI threat classification
Explain five system-level threat categories: improper output handling, excessive agency, system prompt leakage, unbounded consumption, and sensitive information disclosure
Apply secure design patterns, including defence in depth, least privilege, and monitoring to AI system architectures

Prerequisites

Before starting this room, ensure you have:

Familiarity with AI/ML concepts: what a language model is, how it generates output. The AI fundamentals module covers this; equivalent background works too.
Basic web application security literacy: APIs, input validation, authentication, and authorisation
A working understanding of attack surfaces: what they are and why they matter

Anatomy of an AI System

From Traditional to AI-Augmented

Traditional web applications have well-understood architectures: requests flow from the UI to the API to the database and back, and security teams know exactly where to place controls. When an AI component enters, the picture changes fundamentally: new components appear, and data flows through paths that existing security controls were never designed to monitor.

Component	Traditional App	AI-Augmented App
User input	Structured forms, API parameters	Free-form natural language
Processing	Deterministic code	Probabilistic model inference
Data access	Direct database queries	Model-mediated retrieval (RAG)
Output	Template-rendered responses	Generated natural language
Dependencies	Libraries, frameworks	Libraries + pre-trained models + embeddings

The shift from structured to unstructured input is the most consequential change. A traditional input field expects a date, a number, or a selection from a dropdown. An AI system accepts any text the user chooses to type. That single change invalidates most existing input validation strategies.

The TryAssist Architecture

TryTrainMe's TryAssist system has nine components. Each one processes data differently, and each creates a potential point of failure.

The user sees a chat box at the front door. The security architect sees the foundation, the wiring, and everything behind.

Component	Function
User Interface	Developer-facing chat widget embedded in the code review platform
API Gateway	Authentication, rate limiting, request routing
Orchestration Layer	Manages conversation state, routes requests, coordinates components
Prompt Construction	Combines the system prompt, user query, and retrieved context into the final prompt sent to the model
LLM	The language model (hosted internally or accessed via API) that generates responses
Tool Layer	Functions the LLM can invoke: database queries, documentation search, CI/CD status checks
Output Processing	Response formatting, content filtering, length enforcement
Logging and Monitoring	Conversation storage, usage analytics, audit trail
Vector Store	Embedded representations of internal documentation for retrieval-augmented generation (RAG)

Trust Boundaries

A trust boundary is where data moves from one security context to another, and every one is a potential attack surface. TryAssist has five:

Boundary	Data Crossing
User-to-system	Untrusted natural language enters the system
System-to-LLM	Constructed prompt (system instructions + user input + context) sent to the model
LLM-to-tools	Model output triggers database queries, API calls, or file operations
System-to-external-data	Retrieved documents from vector store or external sources enter the prompt
System-to-user	Generated response delivered to the user

Data Flow: A Single Request

Let us trace a single request through TryAssist to see every boundary in action:

A developer types: "Does this pull request handle authentication correctly?"
The API gateway authenticates the request and applies rate limits
The orchestration layer retrieves conversation history and routes the request
The prompt construction layer combines the system prompt ("You are a secure code review assistant..."), the user's question, and relevant documentation retrieved from the vector store
The assembled prompt is sent to the LLM, which generates a response
The LLM's response may include a request to invoke a tool (e.g., "fetch the latest CI pipeline status for this PR")
The tool layer executes the action and returns the result to the LLM
The LLM generates a final response incorporating the tool result
Output processing applies content filters and formats the response
The response is delivered to the developer and the entire exchange is written to the logging system

Every numbered step crosses at least one trust boundary. The question is: which boundaries have security controls, and which are unprotected?

Answer the questions below

What layer in an AI system is responsible for combining the system prompt, user input, and retrieved context before sending it to the model? Prompt Construction

In the TryAssist architecture, what boundary does LLM output cross when it triggers a database query? LLM-to-tools

The AI Attack Surface

You have mapped TryAssist from the inside. An attacker looking at the same diagram sees something different: entry points, weak boundaries, and paths to data. Three frameworks exist to name what they see and provide defenders with a shared language for responding.

OWASP LLM Top 10 (2025)

The OWASP LLM Top 10 (2025) classifies the ten most critical vulnerabilities in LLM applications. Not all ten are equally relevant to a pre-deployment architecture review. Five of the ten operate at the system architecture level: they emerge from how an AI system is built and integrated, not from the model's internal behaviour. Those five are the focus of this room. The remaining five require dedicated treatment and appear in later modules.

Risk	Category	Description	Covered In
LLM01	Prompt Injection	Manipulating LLM behaviour through crafted inputs	Prompt Security Module
LLM02	Sensitive Information Disclosure	Leaking confidential data, PII, or system details through responses	This room + Data Poisoning Module
LLM03	Supply Chain	Compromised pre-trained models, datasets, and third-party dependencies introduced before deployment	AI Supply Chain Security Module
LLM04	Data and Model Poisoning	Corrupting training data or model weights to alter behaviour	Data Poisoning Module
LLM05	Improper Output Handling	LLM output is causing injection in the downstream systems	This room
LLM06	Excessive Agency	AI components with more privilege or autonomy than necessary	This room
LLM07	System Prompt Leakage	Exposure of system-level instructions and internal configuration	This room
LLM08	Vector and Embedding Weaknesses	Exploiting retrieval mechanisms and embedding pipelines	Data Poisoning Module
LLM09	Misinformation	LLM generating false or misleading content	LLM Security Room in this module
LLM10	Unbounded Consumption	Resource exhaustion, cost explosion, denial of service	This room

The five categories marked This room all trace back to architectural decisions made when TryAssist was designed. That is exactly what a pre-deployment security review examines.

MITRE ATLAS

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) is a knowledge base of adversary tactics, techniques, and case studies for AI systems, structured as a counterpart to MITRE ATT&CK. OWASP classifies what the vulnerabilities are. ATLAS documents how adversaries exploit them.

ATLAS follows the adversary's progression through a target. An attacker begins with reconnaissance, learning what model the system uses and how it is exposed. They gain initial access by compromising a supply chain component or exploiting an input vector. They achieve execution through techniques like prompt injection, adversarial inputs, or model tampering. Where persistence is needed, they implant backdoors in model weights. The end goal is impact: data exfiltration, service disruption, or silent manipulation of model outputs. For TryAssist, the most relevant part of this arc runs from Execution through Impact, tracing how an attacker who reaches the chat interface can move through the system and cause real damage.

ATLAS covers over 50 techniques across more than a dozen tactics, each with real-world case studies, and is updated as new attack patterns emerge.

NIST AI Risk Management Framework

The NIST AI RMF approaches the problem from an organisational perspective. Its four functions describe how an organisation manages AI risk systematically: Govern (setting policies and accountability structures), Map (identifying AI systems and their risk contexts), Measure (assessing and monitoring risk levels), and Manage (responding to and mitigating identified risks). Where OWASP names the vulnerabilities, and ATLAS describes how adversaries exploit them, the NIST AI RMF asks whether the organisation has a repeatable process for addressing them. Its companion, NIST AI 100-2 (published January 2025), provides a technical catalogue of adversarial ML techniques and mitigations across the full model lifecycle.

The Threat Modelling room (later in this module) delves into how the NIST AI RMF integrates with STRIDE and PASTA for structured AI risk governance.

Answer the questions below

Which OWASP LLM Top 10 (2025) category covers the risk of LLM output being used to execute SQL injection against a backend database? LLM05

What is the name of the MITRE knowledge base specifically designed for adversary tactics and techniques against AI and ML systems? ATLAS

Securing AI Systems (TryHackMe)

Introduction

Learning Objectives

Prerequisites

Anatomy of an AI System

From Traditional to AI-Augmented

The TryAssist Architecture

Trust Boundaries

Data Flow: A Single Request

Answer the questions below

The AI Attack Surface

OWASP LLM Top 10 (2025)

MITRE ATLAS

NIST AI Risk Management Framework

Answer the questions below

System-Level Threats

Secure Design Patterns

Auditing TryAssist: A Conversation with the System

Conclusion

Comments

More from this blog

Sensitive Information Disclosure (TryHackMe)

Understanding AI Supply Chains (TryHackMe)

Jailbreaking (TryHackm)

RAG Security Fundamentals (TryHackMe)

Prompt Injection (TryHackMe)

Command Palette

Introduction

Learning Objectives

Prerequisites

Anatomy of an AI System

From Traditional to AI-Augmented

The TryAssist Architecture

Trust Boundaries

Data Flow: A Single Request

Answer the questions below

The AI Attack Surface

OWASP LLM Top 10 (2025)

MITRE ATLAS

NIST AI Risk Management Framework

Answer the questions below

System-Level Threats

Secure Design Patterns

Auditing TryAssist: A Conversation with the System

Conclusion

Comments

More from this blog