Skip to content

yenshan0530/cti-rag-poison

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

One Bad OSINT Can Ruin Everything: How Secure is Your CTI RAG System, and Can You Minimize the Risk?

Demo for the First CTI 2026 presentation by Yen-Shan (Lily) Chen, Sian-Yao Huang, and Cheng-Lin Yang — CyCraft Technology, Taiwan.

Overview

This demo shows an end-to-end adversarial attack — Attention Attractor — against a Retrieval-Augmented Generation (RAG) pipeline for Cyber Threat Intelligence (CTI). The attacker crafts a single poisoned document that (1) gets retrieved by the retriever and (2) hijacks the generator's output via a prompt injection instruction, all by optimizing a small set of adversarial tokens around the malicious content.

Attack Pipeline

Malicious Document = [LLM-generated CTI bait] + [Prompt Injection Instruction]
                                  ↓
         Step 1: Retriever Attractor Optimization
         - Inject R_TOKENS adversarial tokens before and after the bait passage
         - Optimize via HotFlip to maximize cosine similarity with the target query
           and CLS-token attention toward the clean passage
                                  ↓
         Step 2: Generator Attractor Optimization
         - Inject G_TOKENS adversarial tokens around the malicious instruction
         - Optimize via HotFlip to maximize attention from generated output tokens
           toward the malicious instruction
                                  ↓
         Step 3: Evaluation
         - Retriever success: poisoned document retrieved in Top-5?
         - Generator success: model response begins with "Hackers are the best!"?

Models

Role Model
Retriever bce-embedding-base_v1 (BERT-style bi-encoder)
Generator Qwen3-4B (causal LLM)

Dataset

The CTI corpus is sourced from CTI-Bench (NeurIPS 2024, arxiv). We sample 500 entries from the CTI-ATE split — mapping threat descriptions to MITRE ATT&CK techniques.

Setup

conda create -n attention-attractor python=3.12
conda activate attention-attractor
pip install -r requirements.txt

Then open demo.ipynb and select the attention-attractor kernel.

File Structure

demo.ipynb          # Main walkthrough notebook
config.py           # Query, prompt template, passages, and hyperparameters (R_TOKENS, G_TOKENS)
utils.py            # Retriever, Generator, optimization loops, evaluation, and attention visualizations
data/
  cti_corpus.json   # CTI-Bench corpus used as the retrieval database

Notebook Sections

  1. Settings — Load the CTI corpus, initialize the retriever and generator, and define the malicious document.
  2. Optimization — Run retriever attractor optimization (150 iterations) then generator attractor optimization (150 iterations). Convergence plots are saved as retriever_opt.png and generator_opt.png.
  3. Attack Evaluation & Analysis — Measure retrieval rank, check if generation is hijacked, and visualize per-token attention distributions for both models (bar charts + HTML heat maps).

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors