Added dutch language #47

Open
roelti wants to merge 2 commits into Blaspsoft:main from roelti:feature/dutch

roelti commented Mar 19, 2026

Hi!
Thanks for creating this package! I was looking for a profanity filter package and came across your package. I needed the Dutch language so I've added it. Perhaps you will also find it useful.

Thanks!

Summary by CodeRabbit

  • New Features

    • Added Dutch language support for profanity detection with severity tiers, false-positive handling, and obfuscated/diacritic-aware matching.
    • Added a fluent shorthand to select Dutch checks easily.
  • Tests

    • Added comprehensive tests covering Dutch detection, severity tiers, obfuscation/diacritics normalization, false-positive cases, shorthand behavior, and availability in the language set.

coderabbitai bot commented Mar 19, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2587a3e5-daba-49de-a45c-0bb8af54fd50

📥 Commits

Reviewing files that changed from the base of the PR and between fc54f3e and 9b9c8cf.

📒 Files selected for processing (1)
  • config/languages/dutch.php
✅ Files skipped from review due to trivial changes (1)
  • config/languages/dutch.php

📝 Walkthrough

Adds Dutch language support: new config/languages/dutch.php, a DutchNormalizer, Dictionary mapping for 'dutch', PendingCheck::dutch() shortcut, and comprehensive PHPUnit tests for detection, normalization, false positives, and obfuscation handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Dutch Language Configuration**<br>`config/languages/dutch.php` | New Dutch profanity config: severity buckets (mild/moderate/high/extreme), profanities, false_positives, and substitutions for obfuscated matching. |
| **Core Normalization & Routing**<br>`src/Core/Normalizers/DutchNormalizer.php`, `src/Core/Dictionary.php` | Added `DutchNormalizer` implementing diacritic-to-base character mapping; `Dictionary::getNormalizerForLanguage()` now returns a `DutchNormalizer` for `'dutch'`. |
| **API Shortcut**<br>`src/PendingCheck.php` | Added `public function dutch(): self` as a fluent shorthand for `in('dutch')`. |
| **Tests**<br>`tests/DutchLanguageDetectionTest.php`, `tests/DutchStringNormalizerTest.php`, `tests/AllLanguagesApiTest.php` | Added extensive PHPUnit tests for Dutch detection (multiple severities, case/diacritics/obfuscation, false positives), normalizer unit tests, and updated mixed-language expectations. |
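
The diacritic-to-base mapping described for `DutchNormalizer` could look roughly like the sketch below. The class name `SketchDutchNormalizer` and its character table are assumptions made for illustration; the PR's actual normalizer and the interface it implements are not shown in this conversation.

```php
<?php

// Hypothetical stand-in for the PR's DutchNormalizer; not the package's code.
final class SketchDutchNormalizer
{
    /** Assumed mapping table; the real list of characters may differ. */
    private const MAP = [
        'ë' => 'e', 'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'ï' => 'i', 'í' => 'i',
        'ö' => 'o', 'ó' => 'o',
        'ü' => 'u', 'ú' => 'u',
        'á' => 'a', 'à' => 'a',
        'Ë' => 'E', 'É' => 'E',
        'Ï' => 'I', 'Ö' => 'O', 'Ü' => 'U', 'Á' => 'A',
    ];

    public function normalize(string $text): string
    {
        // strtr() with an array replaces whole byte sequences, so multi-byte
        // UTF-8 characters are swapped without needing mbstring.
        return strtr($text, self::MAP);
    }
}

$normalizer = new SketchDutchNormalizer();
echo $normalizer->normalize('reünie'), PHP_EOL; // prints "reunie"
```

Normalizing before matching means the word list only needs the unaccented spelling of each entry, which is presumably why the walkthrough runs the normalizer ahead of the dictionary lookup.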

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant PendingCheck as "PendingCheck"
    participant Detector as "Profanity Detector"
    participant Dictionary as "Dictionary"
    participant DutchNormalizer as "DutchNormalizer"
    participant DutchConfig as "Dutch Config"

    User->>PendingCheck: dutch() or in('dutch')
    PendingCheck->>Detector: set language = 'dutch'
    User->>Detector: check(text)
    Detector->>Dictionary: getNormalizerForLanguage('dutch')
    Dictionary->>DutchNormalizer: instantiate / provide instance
    Detector->>DutchNormalizer: normalize(text)
    DutchNormalizer->>DutchNormalizer: Apply mappings (ë→e, Ü→U, etc.)
    DutchNormalizer->>Detector: return normalized text
    Detector->>DutchConfig: load profanities & substitutions
    Detector->>Detector: match normalized text using substitutions & severities
    Detector->>User: return results (offensive?, matches, censored text)
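
The first step of the diagram, where `dutch()` and `in('dutch')` lead to the same state, can be sketched as follows. `SketchPendingCheck`, its `language()` getter, and the `'english'` default are hypothetical and exist only for this illustration; the real `PendingCheck` class carries more behaviour than language selection.

```php
<?php

// Hypothetical, simplified stand-in for PendingCheck; only language selection is modelled.
final class SketchPendingCheck
{
    private string $language = 'english'; // assumed default, for illustration only

    public function in(string $language): self
    {
        $this->language = $language;

        return $this; // returning $this keeps the call chain fluent
    }

    public function dutch(): self
    {
        // Shorthand: equivalent to calling in('dutch').
        return $this->in('dutch');
    }

    public function language(): string
    {
        return $this->language;
    }
}

$viaShorthand = (new SketchPendingCheck())->dutch()->language();
$viaExplicit  = (new SketchPendingCheck())->in('dutch')->language();
var_dump($viaShorthand === $viaExplicit); // bool(true)
```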

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 I hopped through accents, small and bright,
ë, ü, á—now stripped to light.
From kut to godverdomme I spy,
Masked and counted, none slip by.
A little hop for language cheer!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Added dutch language' directly describes the main change: adding Dutch language support to the package. |


coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/languages/dutch.php`:
- Line 90: Remove the incorrect entry 'neef' from the Dutch profanity list (the
string literal 'neef' shown in the diff) to avoid false positives; either delete
that line from the array or move it into your project's allowed/whitelist
collection for Dutch words and ensure any tests or lookup logic (the profanity
array used by your profanity-check routine) reference the updated list.
- Around line 63-64: The language entries array contains a duplicated string
'klerelijer'; remove the redundant occurrence so the array only contains a
single 'klerelijer' entry (locate the duplicate within the Dutch language
entries list where 'klerelijer' appears twice and delete one of them).
- Line 130: Remove the incorrect entry 'teer' from the Dutch profanity list (the
literal string 'teer' present in the array) so only actual profanities like
'tering' remain; edit the array in config/languages/dutch.php to delete the
'teer' element and ensure surrounding commas/array formatting stay valid.
- Line 109: The array entry 'rotterd' is truncated/incorrect; locate the string
'rotterd' in config/languages/dutch.php and either remove that array element if
it was added accidentally or replace it with the intended term (e.g., the
correct word like 'rotterdam' or the proper Dutch word you meant) so the
language list contains valid entries only.
- Line 35: The entry 'deboer' in the Dutch profanity list is a common surname
and should not be treated as profanity; remove the string 'deboer' from the
offensive-words array (or relocate it into a legitimate-names/whitelist array if
you maintain one) so it no longer triggers false positives, and add a short
comment noting why it was removed to prevent reintroduction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9142c153-a410-4366-af68-1b740204bb66

📥 Commits

Reviewing files that changed from the base of the PR and between b863354 and fc54f3e.

📒 Files selected for processing (7)
  • config/languages/dutch.php
  • src/Core/Dictionary.php
  • src/Core/Normalizers/DutchNormalizer.php
  • src/PendingCheck.php
  • tests/AllLanguagesApiTest.php
  • tests/DutchLanguageDetectionTest.php
  • tests/DutchStringNormalizerTest.php

deemonic (Collaborator) left a comment

Thanks for contributing Dutch language support, @roeltinkhof! This is a great addition and follows the existing patterns well. The test coverage is solid too.

Before merging, there are a few issues in config/languages/dutch.php that need to be addressed:

Incorrect entries in the profanity list

These should be removed as they are not profanities and will cause false positives:

  • 'deboer' (line 35) — Common Dutch surname (de Boer)
  • 'neef' (line 90) — Means "cousin/nephew", a normal Dutch word
  • 'rotterd' (line 109) — Appears to be a truncated word (Rotterdam?), not a profanity
  • 'teer' (line 130) — Means "tar" in Dutch. The actual profanity 'tering' is already listed separately

Duplicate entry

  • 'klerelijer' appears twice (lines ~63-64). One occurrence should be removed.

Misspelled false positive

  • 'roterdam' in the false positives list should be 'rotterdam' (double t).

Once these are fixed, this should be good to go!
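
Duplicates and list overlaps of the kind flagged above can be caught mechanically before review. The following is a minimal sketch, not part of the PR: the helper names `findDuplicates` and `findOverlap` are hypothetical, and the sample arrays are illustrative rather than the actual contents of `config/languages/dutch.php`.

```php
<?php

/**
 * Returns entries that occur more than once in a word list.
 *
 * @param list<string> $words
 * @return list<string>
 */
function findDuplicates(array $words): array
{
    // Count occurrences of each word, then keep those seen more than once.
    $counts = array_count_values($words);

    return array_keys(array_filter($counts, static fn (int $n): bool => $n > 1));
}

/**
 * Returns entries present in both the profanity list and the
 * false-positive list, which usually signals a mistake in one of them.
 *
 * @param list<string> $profanities
 * @param list<string> $falsePositives
 * @return list<string>
 */
function findOverlap(array $profanities, array $falsePositives): array
{
    return array_values(array_intersect($profanities, $falsePositives));
}

// Illustrative data only; not the real config contents.
$profanities    = ['klerelijer', 'tering', 'klerelijer'];
$falsePositives = ['rotterdam'];

print_r(findDuplicates($profanities));               // lists 'klerelijer'
print_r(findOverlap($profanities, $falsePositives)); // empty array
```

Both helpers compare strings case-sensitively, so a list with mixed casing would need lowercasing first. A check like this cannot tell whether a word is actually a profanity, so entries such as surnames or everyday words still need a human reviewer.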

roelti (Author) commented Mar 25, 2026

Thanks for your review! I fixed the issues :)
