Official code for the ACL 2026 Main Track paper:
ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation
Hyeong Kyu Choi and Sharon Li
arXiv:2601.02535
ModeX is an evaluator-free framework for selecting the best output from a set of N independently sampled LLM responses. Instead of relying on a reward model or external judge, ModeX builds a semantic similarity graph over the candidates and identifies the modal output — the centroid of the dominant cluster — through recursive spectral graph partitioning.
ModeX-Lite is an efficient variant that integrates the same pruning logic directly into the token-by-token decoding loop, eliminating the need to generate all N responses to completion before selection.
Both methods are entirely evaluator-free, requiring no auxiliary model or additional inference beyond the N forward passes used to generate the candidates.
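To make the selection idea concrete, here is a minimal, illustrative sketch of modal selection over a candidate set. This is not the repository's implementation (the actual code exposes the adjacency and cut options documented below); the unigram-Jaccard similarity and sign-based spectral bipartition here are simplifying assumptions:

```python
import numpy as np

def jaccard(a: str, b: str) -> float:
    """Unigram Jaccard similarity between two texts (illustrative)."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def modal_select(candidates: list[str]) -> str:
    """Pick the 'modal' candidate: spectrally bipartition the
    similarity graph, keep the larger cluster, and return its
    centroid (member with highest total within-cluster similarity)."""
    n = len(candidates)
    A = np.array([[jaccard(candidates[i], candidates[j])
                   for j in range(n)] for i in range(n)])
    np.fill_diagonal(A, 0.0)
    L = np.diag(A.sum(axis=1)) - A       # graph Laplacian
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]                 # second-smallest eigenvector
    side = fiedler >= 0                  # bipartition by sign
    major = side if side.sum() >= (~side).sum() else ~side
    idx = np.where(major)[0]
    best = idx[np.argmax(A[np.ix_(idx, idx)].sum(axis=1))]
    return candidates[best]
```

In the full method this partitioning step is applied recursively until a cut-quality criterion (see `--goodness_of_cut` below) says the dominant cluster should not be split further.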
```
ModeX/
├── modex/                    # ModeX: post-hoc selection via spectral graph clustering
│   ├── main.py               # Entry point and core algorithm
│   ├── utils.py              # Batched generation engine
│   ├── evaluator.py          # Task-specific answer extraction and scoring
│   ├── prompts.py            # Prompt templates
│   ├── dashboard.py          # Logging and result visualization
│   ├── model/                # Model wrappers (Qwen, Llama, CodeLlama)
│   └── data/                 # Dataset loaders (CNN/DM, HumanEval, MATH-500, …)
│
├── modex-lite/               # ModeX-Lite: online pruning during decoding
│   ├── main.py               # Entry point (adds --new_decode, --prune_frequency)
│   ├── utils.py              # Generation engine with ModeX-Lite hook
│   ├── model/
│   │   └── ma_decoder.py     # Online similarity-based batch pruning
│   └── ...                   # (same structure as modex/)
│
├── scripts/
│   ├── run_modex.sh          # Example commands for ModeX
│   └── run_modex_lite.sh     # Example commands for ModeX-Lite
│
├── environment.yml           # Conda environment
└── README.md
```
```bash
git clone https://github.com/deeplearning-wisc/ModeX.git
cd ModeX
conda env create -f environment.yml
conda activate modex
```

For gated models (e.g., Llama), log in to HuggingFace:

```bash
huggingface-cli login
```

or place your access token in a file named `token` inside the `modex/` (or `modex-lite/`) directory.
```bash
cd modex/

# Summarization - Qwen2.5-7B, N=8
python main.py \
    --model qwen2.5-7b \
    --num_agents 8 \
    --data cnn_daily \
    --data_size 300 \
    --tau 0.8 \
    --adjacency text \
    --goodness_of_cut conductance

# Math reasoning - Llama3.1-8B, N=8
python main.py \
    --model llama3.1-8b \
    --num_agents 8 \
    --data math500 \
    --data_size 300 \
    --tau 0.8 \
    --adjacency text \
    --goodness_of_cut conductance
```

```bash
cd modex-lite/

# Code generation - Qwen2.5-7B, N=8, prune every 300 tokens
python main.py \
    --model qwen2.5-7b \
    --num_agents 8 \
    --data humaneval \
    --data_size 164 \
    --tau 0.8 \
    --adjacency text \
    --goodness_of_cut conductance \
    --new_decode \
    --prune_frequency 300
```

See `scripts/` for more examples.
| Argument | Default | Description |
|---|---|---|
| `--model` | `qwen2.5-7b` | Model name (see supported models below) |
| `--num_agents` | `4` | Number of parallel samples N |
| `--data` | `math500` | Dataset (see supported datasets below) |
| `--data_size` | `300` | Number of test samples to evaluate |
| `--tau` | `0.8` | Early-stopping threshold (higher = more aggressive pruning) |
| `--goodness_of_cut` | `conductance` | Cut quality metric: `conductance`, `cutratio`, or `ngc` |
| `--adjacency` | `text` | Similarity type: `text` (n-gram Jaccard), `semantics` (sentence-transformers MiniLM), or `both` |
| `--multi_persona` | off | Assign diverse system prompts to agents (from DyLAN) |
| `--bae` | off | Use base answer extractor for evaluation |
| `--model_dir` | `None` | Local path to model weights (default: HuggingFace Hub) |
| `--out_dir` | `out/` | Directory for logs and plots |
ModeX-Lite only:
| Argument | Default | Description |
|---|---|---|
| `--new_decode` | off | Enable online pruning during generation |
| `--prune_frequency` | `100` | Token interval between pruning steps |
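Conceptually, the online hook can be pictured as scoring partial generations every `--prune_frequency` tokens and discarding the least typical ones. The sketch below is a simplification, not the logic in `ma_decoder.py`: the unigram-overlap similarity and the `keep_ratio` parameter are assumptions made for illustration:

```python
import numpy as np

def prune_batch(partials: list[str], keep_ratio: float = 0.5) -> list[int]:
    """Return indices of partial generations to keep: those with the
    highest mean unigram-overlap similarity to the rest of the batch."""
    n = len(partials)
    sets = [set(p.split()) for p in partials]
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            u = sets[i] | sets[j]
            sim[i, j] = sim[j, i] = len(sets[i] & sets[j]) / len(u) if u else 0.0
    typicality = sim.mean(axis=1)          # how close each partial is to the batch
    k = max(1, int(n * keep_ratio))
    return sorted(np.argsort(typicality)[::-1][:k].tolist())
```

Pruning outliers early frees batch slots before the divergent generations run to completion, which is where the efficiency gain over post-hoc selection comes from.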
| Short name | HuggingFace ID |
|---|---|
| `qwen2.5-1.5b` | `Qwen/Qwen2.5-1.5B-Instruct` |
| `qwen2.5-7b` | `Qwen/Qwen2.5-7B-Instruct` |
| `qwen2.5-14b` | `Qwen/Qwen2.5-14B-Instruct` |
| `qwen2.5-32b` | `Qwen/Qwen2.5-32B-Instruct` |
| `llama3.2-1b` | `meta-llama/Llama-3.2-1B-Instruct` |
| `llama3.2-3b` | `meta-llama/Llama-3.2-3B-Instruct` |
| `llama3.1-8b` | `meta-llama/Meta-Llama-3.1-8B-Instruct` |
| `llama3.3-70b` | `meta-llama/Llama-3.3-70B-Instruct` |
| `llama2-7b-chat` | `meta-llama/Llama-2-7b-chat-hf` |
| `llama2-13b-chat` | `meta-llama/Llama-2-13b-chat-hf` |
| `llama2-70b-chat` | `meta-llama/Llama-2-70b-chat-hf` |
| `codellama` | `meta-llama/CodeLlama-7b-Instruct-hf` |
| Category | Dataset key |
|---|---|
| Math reasoning | `math500`, `gsm8k`, `arithmetics` |
| Multiple choice | `gpqa` |
| Summarization | `cnn_daily` |
| Code generation | `humaneval` |
```bibtex
@inproceedings{choi2026modex,
  title     = {ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation},
  author    = {Choi, Hyeong Kyu and Li, Sharon},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},
  year      = {2026},
}
```