Leotaby/up-ml-prediction

Predicting the Lifecycle of Unobserved Performance

Hedge funds generate returns through activities invisible in 13F filings: derivatives, short selling, confidential trades, and active timing. Agarwal, Ruenzi & Weigert (2024, Journal of Finance) call the gap between reported and holdings-implied returns "Unobserved Performance" (UP), and show that high-UP funds outperform by ~8%/year.
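As a rough illustration of the definition above, the sketch below computes UP as the reported return minus the holdings-implied return for each fund-month. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from `up_measure.py`. How the holdings-implied return is built from 13F positions is not shown here.

```python
def unobserved_performance(reported, holdings_implied):
    """Return UP = reported return - holdings-implied return per (fund, month) key.

    Both inputs are dicts keyed by (fund_id, month); only keys present in both are used.
    """
    common = reported.keys() & holdings_implied.keys()
    return {key: reported[key] - holdings_implied[key] for key in sorted(common)}


# Made-up monthly returns in decimal form (0.012 = 1.2%); not data from the repository.
reported = {("F1", "2020-01"): 0.012, ("F1", "2020-02"): -0.004, ("F2", "2020-01"): 0.020}
implied = {("F1", "2020-01"): 0.007, ("F1", "2020-02"): -0.001, ("F2", "2020-01"): 0.021}

up = unobserved_performance(reported, implied)
for (fund, month), value in up.items():
    print(fund, month, round(value, 4))
```

A positive value means the fund's reported return exceeded what its disclosed long positions alone would have earned that month.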

This project asks two questions that, to our knowledge, have not been addressed:

  1. Does UP follow a lifecycle? The hypothesis is that young funds generate more UP because their strategies face fewer capacity constraints, and that the edge decays as AUM grows.

  2. Can we predict UP transitions using ML and NLP? Beyond standard fund characteristics, hedge fund investor letters and SEC filings contain textual signals about whether a manager's edge is growing or fading. We use a multi-agent LLM pipeline to extract these signals and feed them into the ML prediction framework of Bali, Beckmeyer, Moerke & Weigert (2023, RFS).

Live dashboard: leotaby.github.io/up-ml-prediction

Figure (UP Lifecycle): UP peaks around years 3-6 and decays as AUM grows.

Figure (Heatmap): 25 portfolios sorted on idiosyncratic volatility (IVOL) and UP. Alpha is highest when both are high.
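The expanding-window idea behind the second question can be sketched as follows: at each month, a model is refit on all earlier observations only and then used to forecast next-month UP. This is a minimal sketch under loud assumptions: a single-predictor least-squares fit stands in for the ML models, the data are simulated, and every name (`fit_ols`, `expanding_window_forecasts`, `min_train`) is hypothetical rather than taken from `predict_up.py`.

```python
import random
from statistics import mean


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def expanding_window_forecasts(feature, target_next, min_train=24):
    """Forecast target_next[t] with a model fit only on observations before t.

    target_next[s] is the UP realized one month after feature[s], so every
    training pair used at month t is already observable at month t.
    """
    forecasts = {}
    for t in range(min_train, len(feature)):
        intercept, slope = fit_ols(feature[:t], target_next[:t])
        forecasts[t] = intercept + slope * feature[t]
    return forecasts


def out_of_sample_r2(target_next, forecasts):
    """1 - SSE(model) / SSE(expanding historical mean) over the forecast months."""
    sse_model = sum((target_next[t] - f) ** 2 for t, f in forecasts.items())
    sse_mean = sum((target_next[t] - mean(target_next[:t])) ** 2 for t in forecasts)
    return 1.0 - sse_model / sse_mean


# Simulated series (assumption): next-month UP loads on one lagged characteristic.
rng = random.Random(0)
n_months = 120
feature = [rng.gauss(0.0, 1.0) for _ in range(n_months)]
target_next = [0.5 * x + rng.gauss(0.0, 0.3) for x in feature]

forecasts = expanding_window_forecasts(feature, target_next)
r2_oos = out_of_sample_r2(target_next, forecasts)
print(f"{len(forecasts)} forecasts, out-of-sample R^2 = {r2_oos:.2f}")
```

Benchmarking against the expanding historical mean keeps the comparison free of look-ahead, since both the model and the benchmark see only past data at each forecast date.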

Structure

python/
  predict_up.py         expanding-window ML prediction of next-month UP
  up_measure.py         UP = reported return - holdings-implied return
  nlp/
    agent_pipeline.py   multi-agent LLM: letter, filing, social, synthesizer
    extract_features.py HuggingFace text features from fund documents
R/
  cross_section.R       Fama-MacBeth with Newey-West, quintile sorts
stata/
  panel_up.do           two-way FE, System-GMM, double sorts
cpp/
  mc_copula.cpp         Clayton copula tail dependence, Monte Carlo
  CMakeLists.txt
dashboard/
  dashboard.jsx         React + Recharts interactive visualization
data/
  funds.csv             simulated panel (200 funds, 2005-2022)
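For readers unfamiliar with the Clayton copula listed under `cpp/`, the sketch below shows one way a Monte Carlo check of lower-tail dependence might look. It is written in Python for brevity and is not a port of `mc_copula.cpp`. The sampler uses standard conditional inversion, and the parameter values (`theta = 2`, threshold `q = 0.01`, 200,000 draws) are arbitrary assumptions.

```python
import random


def sample_clayton(theta, n, rng):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) via conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids raising 0 to a negative power
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def lower_tail_dependence(pairs, q):
    """Estimate P(V <= q | U <= q), which approaches lambda_L as q goes to 0."""
    in_tail = [v for u, v in pairs if u <= q]
    return sum(v <= q for v in in_tail) / len(in_tail)


theta = 2.0
pairs = sample_clayton(theta, 200_000, random.Random(42))
estimate = lower_tail_dependence(pairs, q=0.01)
theoretical = 2.0 ** (-1.0 / theta)  # closed-form lambda_L for the Clayton copula
print(f"Monte Carlo: {estimate:.3f}, closed form: {theoretical:.3f}")
```

The closed form `2 ** (-1 / theta)` gives a quick sanity check: the simulated conditional probability should sit near it once `q` is small and the sample is large.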

Quick start

pip install -r python/requirements.txt
python python/predict_up.py --data data/funds.csv
python python/nlp/agent_pipeline.py --letters data/sample_letters/
Rscript R/cross_section.R
cd cpp && mkdir build && cd build && cmake .. && make && ./mc_sim

References

  • Agarwal, Ruenzi & Weigert (2024). Unobserved performance of hedge funds. JF 79, 3203-3259.
  • Bali, Beckmeyer, Moerke & Weigert (2023). Option return predictability with ML. RFS 36, 3548-3602.
  • Bali & Weigert (2024). Hedge funds and the positive idiosyncratic volatility effect. RoF 28, 1611-1661.
  • Chabi-Yo, Huggenberger & Weigert (2022). Multivariate crash risk. JFE 145, 129-153.
  • Maitre, Pugachyov & Weigert (2025). Social media attention and crypto returns. JBF, forthcoming.
