# SocialVeil

A research framework for evaluating social intelligence in LLM agents through communication barriers.

SocialVeil simulates realistic social interactions in which agents must navigate three kinds of communication barriers:
- 🗣️ Semantic Barriers: Ambiguous language and unclear expressions
- 🌍 Cultural Barriers: Different communication styles and norms
- 💭 Emotional Barriers: Emotional states affecting communication clarity
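When scripting against the framework, the three barrier categories above could be modeled as a simple enum. This is an illustrative sketch only — the package's actual types live in `socialveil/` and are not shown here:

```python
from enum import Enum

class BarrierType(Enum):
    """Illustrative only: the three barrier categories described above."""
    SEMANTIC = "semantic"    # ambiguous language, unclear expressions
    CULTURAL = "cultural"    # differing communication styles and norms
    EMOTIONAL = "emotional"  # emotional states degrading clarity

# e.g. enumerate all barriers for a multi-barrier run
all_barriers = [b.value for b in BarrierType]
print(all_barriers)  # -> ['semantic', 'cultural', 'emotional']
```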
## Installation

```bash
# Clone the repository
git clone https://github.com/ulab-uiuc/socialveil.git
cd socialveil

# Create environment
conda create -n socialveil python=3.11
conda activate socialveil

# Install dependencies
pip install poetry
poetry install

# Install yq (for config parsing)
pip install yq  # or: brew install yq (macOS)
```

## Configuration

Edit `configs/config.yaml`:
```yaml
models:
  model_a: "gpt-4o-mini"               # Barrier agent
  model_b: "Qwen/Qwen2.5-7B-Instruct"  # Partner agent

vllm_port: 7900
gpu: "0,1,2,3"

AGENT_OPENAI_API_KEY: "your-key-here"
EVALUATOR_OPENAI_API_KEY: "your-key-here"
```
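The two `*_OPENAI_API_KEY` entries suggest separate credentials for the acting agent and the evaluator. A minimal stdlib-only sketch of how such a key might be resolved — the helper name and fallback order are assumptions, not the framework's actual logic:

```python
import os

def resolve_api_key(role: str, config: dict) -> str:
    """Hypothetical helper: prefer an environment variable such as
    AGENT_OPENAI_API_KEY, then the config-file entry, then a generic key.
    (Illustrative only; not the framework's real resolution logic.)"""
    name = f"{role.upper()}_OPENAI_API_KEY"
    return (
        os.environ.get(name)
        or config.get(name)
        or os.environ.get("OPENAI_API_KEY", "")
    )

config = {"AGENT_OPENAI_API_KEY": "your-key-here"}
print(resolve_api_key("agent", config))
```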
## Quick Start

```bash
# Start vLLM server (for local models)
bash scripts/start_vllm_server.sh

# Run evaluation
bash scripts/run.sh

# With custom settings
CONCURRENCY=16 bash scripts/run.sh
PARTNER_REPAIR_MODE=true bash scripts/run.sh
```

Compare results across modes:

```bash
python results/compare_modes.py \
    --base_dir results/exp_qwen2.5-7b-instruct_episode_all_neutralized \
    --out_csv results/comparison.csv
```

## Project Structure

```
socialveil/
├── configs/          # Configuration files
├── data/             # Episode datasets
├── scripts/          # Experiment runners
├── socialveil/       # Core package
│   ├── agent/        # Agent implementations
│   ├── environment/  # Scenario management
│   └── evaluate.py   # Evaluation logic
├── results/          # Experiment outputs
└── analysis/         # Analysis tools
```
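The `data/` directory holds the episode datasets. The on-disk schema is not documented in this README, so the record below is purely a guess intended as a mental model — every field name is an assumption:

```python
import json

# Purely illustrative episode record; field names are assumptions,
# NOT the framework's actual schema.
episode = {
    "episode_id": "ep_0001",
    "barrier_type": "semantic",  # one of: semantic, cultural, emotional
    "scenario": "Two colleagues schedule a meeting.",
    "agent_goal": "Agree on a time despite vague phrasing.",
}

# Round-trip through JSON, since episode files are plausibly serialized this way
serialized = json.dumps(episode)
restored = json.loads(serialized)
print(restored["barrier_type"])  # -> semantic
```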
## Advanced Usage

```bash
# High concurrency
CONCURRENCY=32 bash scripts/run.sh

# Chain-of-Thought prompting
PARTNER_COT_MODE=true bash scripts/run.sh

# Custom results directory
RESULTS_DIR="results/my_exp" bash scripts/run.sh
```

Compare two evaluator models on existing results:

```bash
CONCURRENCY=32 python analysis/compare_evaluators.py \
    --results_dir results/exp_qwen2.5-7b-instruct_episode_all_neutralized \
    --evaluator1 gpt-4o \
    --evaluator2 qwen2.5-7b-instruct \
    --use_vllm_for_evaluator2 \
    --output results/evaluator_comparison.csv
```

## Features

- ✅ Multi-Barrier Evaluation: Test agents across semantic, cultural, and emotional barriers
- ✅ Flexible Model Support: OpenAI API or local models via vLLM
- ✅ High Concurrency: Parallel scenario execution for faster experiments
- ✅ Statistical Analysis: Built-in significance testing and correlation analysis
- ✅ Extensible Framework: Easy to add new barrier types or evaluation metrics
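As a rough illustration of the extensibility claim, a new barrier type could be wired in through a registry pattern like the one below. The decorator and dictionary here are hypothetical, not the package's actual extension API:

```python
from typing import Callable, Dict

# Hypothetical registry mapping barrier names to utterance transforms;
# the real extension mechanism in socialveil/ is not shown in this README.
BARRIER_TRANSFORMS: Dict[str, Callable[[str], str]] = {}

def register_barrier(name: str):
    """Decorator registering a transform under a barrier name (illustrative)."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        BARRIER_TRANSFORMS[name] = fn
        return fn
    return decorator

@register_barrier("semantic")
def add_ambiguity(utterance: str) -> str:
    # Illustrative: swap a concrete reference for a vague one
    return utterance.replace("the report", "that thing")

print(BARRIER_TRANSFORMS["semantic"]("Please send the report today."))
# -> Please send that thing today.
```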
## Development

```bash
# Run tests
poetry run pytest

# Type checking
poetry run mypy --config-file pyproject.toml .

# Install pre-commit hooks
pre-commit install

# Run all checks
pre-commit run --all-files
```

## Citation

If you use this code in your research, please cite:
```bibtex
@article{xuan2026socialveil,
  title={SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers},
  author={Xuan, Keyang and Wang, Pengda and Ye, Chongrui and Yu, Haofei and August, Tal and You, Jiaxuan},
  journal={arXiv preprint arXiv:2602.05115},
  year={2026}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
