Hi, I’m Yougen Yuan, and welcome to my personal homepage. This repository contains links to my projects, papers, and contact information. My research sits at the intersection of LLMs/VLLMs, agentic RAG, speech processing, multimodal learning, and clustering. Specific areas include:
- Models: Qwen, InternVL, DeepSeek, Kimi, GLM, MiniMax
- Methods: SFT, LoRA, DPO, MPO, CoT, GRPO
- Applications: RAG, Dify
- Speech Retrieval: spoken term detection, wake-up word detection, keyword spotting
- TTS/VC: zero-shot text-to-speech, voice conversion, end-to-end spoken interaction
- ASR: automatic speech recognition
- Audio classification: audio scene classification, spoken language identification
- CLIP-based frameworks: weclip, youclip
- Multimodal learning: language–image recognition, LLM-driven multimodal representation learning
- Cross-modal similarity: similarity representations for audio, visual, and text data
- Clustering: large-scale clustering (SinglePass and HDBSCAN variants)

This repository also includes:
- Code and demo projects related to my research
- Selected models and experiments for keyword retrieval, TTS/VC, and multimodal fusion
- Notes and resources on clustering and large-scale similarity search

If you’re interested in my work or open to collaboration, feel free to reach out: yougenyuan@gmail.com
