ygyuan/README.md

👋 Yougen Yuan

Hi — I’m Yougen Yuan. Welcome to my personal homepage. I conduct research at the intersection of LLMs/VLLMs, Agentic RAG, speech processing, multimodal learning, and clustering. This repository contains links to my projects, papers, and contact information.

🔍 Research Interests

LLMs/VLLMs

  • Models: Qwen, InternVL, DeepSeek, Kimi, GLM, MiniMax
  • Methods: SFT, LoRA, DPO, MPO, CoT, GRPO
  • Applications: RAG, Dify

Speech Processing

  • Speech Retrieval: spoken term detection, wake-up word detection, keyword spotting
  • TTS: zero-shot text-to-speech, voice conversion, end-to-end spoken interaction
  • ASR: automatic speech recognition
  • Audio Classification: audio scene classification, speech language identification

Multimodal Learning

  • CLIP-based frameworks: weclip, youclip
  • Language–image recognition, LLM-driven multimodal representation learning

Clustering & Retrieval

  • Audio, visual, and text similarity representation
  • Large-scale clustering (SinglePass, HDBSCAN variants)

📂 What you’ll find here

  • Code and demo projects related to my research
  • Selected models and experiments for keyword retrieval, TTS/VC, and multimodal fusion
  • Notes and resources about clustering and large-scale similarity search

📫 Contact

If you’re interested in my work or open to collaboration, feel free to reach out: yougenyuan@gmail.com


Pinned

  1. ygyuan.github.io Public

    GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.

    SCSS 1

  2. UltraEval-Audio Public

    Forked from OpenBMB/UltraEval-Audio

    Your faithful, impartial partner for authentic audio evaluation: know yourself, know your rivals.

    Python 1

  3. GPT-SoVITS Public

    Python

  4. LlamaFactory Public

    Forked from hiyouga/LlamaFactory

    Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

    Python 1