Yougen Yuan ygyuan

👋 Yougen Yuan

Hi, I’m Yougen Yuan. Welcome to my personal homepage. My research spans LLMs/VLLMs, agentic RAG, speech processing, multimodal learning, and clustering. This repository links to my projects, papers, and contact information.

🔍 Research Interests

LLMs/VLLMs

Models: Qwen, InternVL, DeepSeek, Kimi, GLM, MiniMax
Methods: SFT, LoRA, DPO, MPO, CoT, GRPO
Applications: RAG, Dify

Speech Processing

Speech Retrieval: spoken term detection, wake-up word detection, keyword spotting
TTS: zero-shot text-to-speech, voice conversion, end-to-end spoken interaction
ASR: automatic speech recognition
Audio Classification: audio scene classification, spoken language identification

Multimodal Learning

CLIP-based frameworks: weclip, youclip
Language–image recognition, LLM-driven multimodal representation learning

Clustering & Retrieval

Audio, visual, and text similarity representation
Large-scale clustering (SinglePass, HDBSCAN variants)

📂 What you’ll find here

Code and demo projects related to my research
Selected models and experiments for keyword retrieval, TTS/VC, and multimodal fusion
Notes and resources about clustering and large-scale similarity search

📫 Contact

If you’re interested in my work or open to collaboration, feel free to reach out: yougenyuan@gmail.com
