feat: Add language selection to CLI and Gradio UI #15
apolmig wants to merge 1 commit into Kugelaudio:main from
Hi @kajode, PR submitted with the help of Opus 4.6 extended. Congrats on the awesome model!
Pull request overview
This pull request adds language metadata infrastructure for KugelAudio's 23 supported European languages, but does not include the CLI and Gradio UI integration promised in the PR title and description. The changes provide a foundation for explicit language selection to address auto-detection issues mentioned in issues #10, #9, and #2.
Changes:
- New languages.py module with structured metadata for 23 languages, including ISO codes, names, flags, and quality tiers
- Comprehensive test suite covering language data validation, lookup functions, and Gradio formatting helpers
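For readers without the diff at hand, here is a minimal sketch of what such a metadata module could look like. The field names (code, name, flag, tier), the helpers get and format_gradio_choice, and every tier value are assumptions for illustration, not the PR's actual data; only three of the 23 languages are shown.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical sketch: field names and tier values are placeholders,
# not the data defined in the PR's languages.py.
@dataclass(frozen=True)
class Language:
    code: str  # ISO 639-1 code, e.g. "de"
    name: str  # English display name
    flag: str  # emoji flag shown in the Gradio dropdown
    tier: str  # "high" | "medium" | "limited" (placeholder values below)


# Three illustrative entries; the real module defines 23.
LANGUAGES = (
    Language("de", "German", "🇩🇪", "high"),
    Language("fr", "French", "🇫🇷", "high"),
    Language("da", "Danish", "🇩🇰", "medium"),
)

# Index by code so lookups are a single dict access.
_BY_CODE = {lang.code: lang for lang in LANGUAGES}


def get(code: str) -> Optional[Language]:
    """Return the Language for an ISO 639-1 code, or None if it is unknown."""
    return _BY_CODE.get(code.strip().lower())


def format_gradio_choice(lang: Language) -> str:
    """Render a dropdown label such as '🇩🇪 German (de)'."""
    return f"{lang.flag} {lang.name} ({lang.code})"
```

Keeping the code in parentheses at the end of the label is what lets a parser like parse_gradio_choice recover it from the dropdown value.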
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/kugelaudio_open/languages.py | New module defining Language dataclass, metadata for 23 languages with quality tiers, and helper functions for validation and Gradio UI formatting |
| tests/test_languages.py | Comprehensive test suite covering language data integrity, lookup functions, validation, quality warnings, and Gradio integration |
```python
"""
Language configuration for KugelAudio Open.

Structured metadata for the 24 supported European languages with quality
```
The docstring claims "24 supported European languages", but the code actually defines 23 languages. This is inconsistent with the test file, which correctly asserts 23 languages (test_languages.py line 13). The docstring should be updated to say "23 supported European languages" to match the actual implementation.
```diff
- Structured metadata for the 24 supported European languages with quality
+ Structured metadata for the 23 supported European languages with quality
```
```python
def parse_gradio_choice(choice: str) -> str:
    """Extract language code from Gradio dropdown value."""
    try:
        return choice.split("(")[-1].split(")")[0].strip()
```
The parse_gradio_choice function doesn't validate that the extracted code is a valid language. If a malformed input contains parentheses with an invalid code like "Test (zz)", it will return "zz" instead of falling back to DEFAULT_LANG. Consider validating the extracted code with get(code) and returning DEFAULT_LANG if it's None.
```diff
-        return choice.split("(")[-1].split(")")[0].strip()
+        # Ensure we are working with a string
+        if not isinstance(choice, str):
+            return DEFAULT_LANG
+        # Extract the text between the last '(' and the following ')'
+        code = choice.split("(")[-1].split(")")[0].strip()
+        # Validate the extracted code against known languages
+        if get(code) is not None:
+            return code
+        return DEFAULT_LANG
```
```python
def test_parse_fallback(self):
    assert parse_gradio_choice("garbage") == DEFAULT_LANG
```
The test_parse_fallback test only checks handling of completely malformed input ("garbage"), but doesn't test the case where input contains parentheses with an invalid language code like "Test (zz)". This edge case should be tested to ensure the function properly falls back to DEFAULT_LANG for invalid codes.
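A sketch of the suggested validation together with the missing edge-case test. To keep it runnable on its own, DEFAULT_LANG = "en", the known-code set, and this simplified get are hypothetical stand-ins for what kugelaudio_open.languages defines; in tests/test_languages.py you would import the real ones instead.

```python
from typing import Optional

# Hypothetical stand-ins for kugelaudio_open.languages (values are assumptions).
DEFAULT_LANG = "en"
_KNOWN_CODES = {"en", "de", "fr", "da"}


def get(code: str) -> Optional[str]:
    """Simplified lookup: return the code if known, else None."""
    return code if code in _KNOWN_CODES else None


def parse_gradio_choice(choice: object) -> str:
    """Extract the code from a label like '🇩🇪 German (de)', else DEFAULT_LANG."""
    if not isinstance(choice, str):
        return DEFAULT_LANG
    # Text between the last '(' and the following ')'
    code = choice.split("(")[-1].split(")")[0].strip()
    # Reject anything that is not a known language code
    return code if get(code) is not None else DEFAULT_LANG


# pytest-style tests, including the "Test (zz)" case raised in the review.
def test_parse_valid():
    assert parse_gradio_choice("🇩🇪 German (de)") == "de"


def test_parse_fallback():
    assert parse_gradio_choice("garbage") == DEFAULT_LANG


def test_parse_invalid_code_in_parens():
    assert parse_gradio_choice("Test (zz)") == DEFAULT_LANG


def test_parse_non_string():
    assert parse_gradio_choice(None) == DEFAULT_LANG
```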
```diff
@@ -0,0 +1,107 @@
+"""
```
The PR title and description claim to "Add language selection to CLI and Gradio UI" with examples showing CLI flags like -l de and --language fr, but no changes to start.py, cli.py, or the Gradio UI files are present in this diff. Only the language metadata module and tests are included. Either the PR description is inaccurate, or the integration code is missing from this pull request.
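For reference, the CLI side described in the PR (-l de, --language fr) could be wired with the standard argparse module roughly as below. This is a sketch under assumptions: the actual structure of start.py is not in this diff, and SUPPORTED_CODES is a placeholder subset rather than the module's 23 codes.

```python
import argparse
from typing import Optional, Sequence

# Placeholder subset; a real start.py would derive this from languages.py.
SUPPORTED_CODES = ("da", "de", "en", "fr")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing only the language flag."""
    parser = argparse.ArgumentParser(prog="start.py")
    parser.add_argument(
        "-l",
        "--language",
        choices=SUPPORTED_CODES,  # unknown codes are rejected with a usage error
        default=None,  # optional: omit the flag to keep auto-detection
        help="ISO 639-1 code of the target language",
    )
    return parser


def parse_language(argv: Optional[Sequence[str]] = None) -> Optional[str]:
    """Return the selected code, or None when the flag is not given."""
    return build_parser().parse_args(argv).language
```

Using choices= makes argparse exit with a usage error for an unknown code such as zz, while default=None keeps the flag optional so auto-detection still applies when it is omitted.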
Updated from e8cbf0b to 369c70e
feat: Add language selection to CLI and Gradio UI
Closes #10
What
Adds explicit language selection across all interfaces: a CLI flag, a Gradio dropdown, and a Python API param. Currently there's no way to tell the model which language you're targeting, so it relies on auto-detection, which breaks on short texts and ambiguous input.
Changes
New:
- src/kugelaudio_open/languages.py: Language dataclass with ISO 639-1 code, name, flag, and quality tier (high/medium/limited, based on YODAS2 coverage)
CLI (start.py)
Gradio UI
Tests:
- tests/test_languages.py
Side effects
Also helps with:
- da hint should improve pronunciation
Notes
- --language is optional everywhere
- The language param needs to be wired into KugelAudioProcessor.__call__(). I've provided the integration patches but couldn't verify them against the full model without a GPU.
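Since the flag is optional everywhere, one way to thread it into the processor is to normalize it once and pass None through unchanged. The sketch below assumes None means "auto-detect"; resolve_language and the TIERS mapping (with placeholder tier values) are hypothetical and not part of KugelAudioProcessor's actual API.

```python
import warnings
from typing import Optional

# Placeholder tiers for illustration only; not the PR's real assignments.
TIERS = {"de": "high", "fr": "medium", "da": "limited"}


def resolve_language(language: Optional[str]) -> Optional[str]:
    """Normalize an optional language code before handing it to the processor.

    None means "not specified", so the caller keeps the model's auto-detection.
    """
    if language is None:
        return None
    code = language.strip().lower()
    tier = TIERS.get(code)
    if tier is None:
        raise ValueError(f"Unsupported language code: {language!r}")
    if tier == "limited":
        # Mirrors the "quality warnings" the test suite covers (hypothetical message).
        warnings.warn(f"'{code}' has limited training coverage; output quality may vary")
    return code
```

Raising on an unknown code rather than silently falling back keeps API callers from believing a language was applied when it was not; the Gradio path can still fall back to DEFAULT_LANG because its values come from a fixed dropdown.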