Added dutch language by roelti · Pull Request #47 · Blaspsoft/blasp

roelti · 2026-03-19T12:54:33Z

Hi!
Thanks for creating this package! I was looking for a profanity filter package and came across your package. I needed the Dutch language so I've added it. Perhaps you will also find it useful.

Thanks!

Summary by CodeRabbit

New Features
- Added Dutch language support for profanity detection with severity tiers, false-positive handling, and obfuscated/diacritic-aware matching.
- Added a fluent shorthand to select Dutch checks easily.
Tests
- Added comprehensive tests covering Dutch detection, severity tiers, obfuscation/diacritics normalization, false-positive cases, shorthand behavior, and availability in the language set.

coderabbitai · 2026-03-19T12:54:54Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2587a3e5-daba-49de-a45c-0bb8af54fd50

📥 Commits

Reviewing files that changed from the base of the PR and between fc54f3e and 9b9c8cf.

📒 Files selected for processing (1)

config/languages/dutch.php

✅ Files skipped from review due to trivial changes (1)

config/languages/dutch.php

📝 Walkthrough

Adds Dutch language support: new config/languages/dutch.php, a DutchNormalizer, Dictionary mapping for 'dutch', PendingCheck::dutch() shortcut, and comprehensive PHPUnit tests for detection, normalization, false positives, and obfuscation handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dutch Language Configuration `config/languages/dutch.php` | New Dutch profanity config: severity buckets (mild/moderate/high/extreme), `profanities`, `false_positives`, and `substitutions` for obfuscated matching. |
| Core Normalization & Routing `src/Core/Normalizers/DutchNormalizer.php`, `src/Core/Dictionary.php` | Added `DutchNormalizer` implementing diacritic-to-base character mapping; `Dictionary::getNormalizerForLanguage()` now returns a `DutchNormalizer` for `'dutch'`. |
| API Shortcut `src/PendingCheck.php` | Added `public function dutch(): self` as a fluent shorthand for `in('dutch')`. |
| Tests `tests/DutchLanguageDetectionTest.php`, `tests/DutchStringNormalizerTest.php`, `tests/AllLanguagesApiTest.php` | Added extensive PHPUnit tests for Dutch detection (multiple severities, case/diacritics/obfuscation, false positives), normalizer unit tests, and updated mixed-language expectations. |
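
For readers unfamiliar with diacritic-to-base mapping, the idea can be sketched as below. This is a minimal illustration only: the function name `stripDutchDiacritics` and the character map are assumptions made for this example and are not copied from the package's `DutchNormalizer`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for a diacritic-to-base mapping. The map is an
 * assumed subset of accented letters, not the package's actual table.
 */
function stripDutchDiacritics(string $text): string
{
    $map = [
        'á' => 'a', 'à' => 'a', 'ä' => 'a',
        'é' => 'e', 'è' => 'e', 'ë' => 'e', 'ê' => 'e',
        'í' => 'i', 'ï' => 'i',
        'ó' => 'o', 'ö' => 'o',
        'ú' => 'u', 'ü' => 'u',
        'Á' => 'A', 'É' => 'E', 'Ë' => 'E',
        'Ï' => 'I', 'Ö' => 'O', 'Ü' => 'U',
    ];

    // strtr() with an array replaces every key with its value in a single
    // pass; the keys are byte strings, so multi-byte UTF-8 letters work.
    return strtr($text, $map);
}

echo stripDutchDiacritics('Ruïne, één, Ümlaut'), PHP_EOL; // prints "Ruine, een, Umlaut"
```

Folding accents before matching means the word list only needs the unaccented spelling of each entry.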

Sequence Diagram(s)

sequenceDiagram
    participant User as "User"
    participant PendingCheck as "PendingCheck"
    participant Detector as "Profanity Detector"
    participant Dictionary as "Dictionary"
    participant DutchNormalizer as "DutchNormalizer"
    participant DutchConfig as "Dutch Config"

    User->>PendingCheck: dutch() or in('dutch')
    PendingCheck->>Detector: set language = 'dutch'
    User->>Detector: check(text)
    Detector->>Dictionary: getNormalizerForLanguage('dutch')
    Dictionary->>DutchNormalizer: instantiate / provide instance
    Detector->>DutchNormalizer: normalize(text)
    DutchNormalizer->>DutchNormalizer: Apply mappings (ë→e, Ü→U, etc.)
    DutchNormalizer->>Detector: return normalized text
    Detector->>DutchConfig: load profanities & substitutions
    Detector->>Detector: match normalized text using substitutions & severities
    Detector->>User: return results (offensive?, matches, censored text)
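
The order of steps in the diagram can be sketched roughly as follows. This is not Blasp's implementation: the function `findProfanities`, the flat config shape, and the way false positives are masked are all assumptions for illustration, and the words are neutral placeholders. It assumes PHP 8.0+ with the mbstring extension.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative detector that mirrors the diagram's order of steps.
 * The config shape here is an assumption, not the package's format.
 *
 * @param array{profanities: list<string>, false_positives: list<string>, substitutions: array<string, string>} $config
 * @return list<string> listed words found in the text
 */
function findProfanities(string $text, array $config): array
{
    // 1. Normalize: lowercase, then fold a few diacritics (assumed subset).
    $normalized = strtr(
        mb_strtolower($text, 'UTF-8'),
        ['ë' => 'e', 'ü' => 'u', 'á' => 'a', 'é' => 'e']
    );

    // 2. Undo simple character substitutions such as "@" for "a".
    $normalized = strtr($normalized, $config['substitutions']);

    // 3. Blank out known-safe words so they cannot produce a match.
    foreach ($config['false_positives'] as $safe) {
        $normalized = str_replace($safe, str_repeat(' ', strlen($safe)), $normalized);
    }

    // 4. Report every listed word that still occurs in the text.
    $matches = [];
    foreach ($config['profanities'] as $word) {
        if (str_contains($normalized, $word)) {
            $matches[] = $word;
        }
    }

    return $matches;
}

// Placeholder vocabulary: "bad" is flagged, "badge" is a known-safe word.
$config = [
    'profanities' => ['bad'],
    'false_positives' => ['badge'],
    'substitutions' => ['@' => 'a', '$' => 's'],
];

print_r(findProfanities('B@dge and b@d', $config)); // only the standalone "bad" is reported
```

Masking safe words before matching is one simple way to keep a legitimate word that contains a listed entry from being flagged; the package may handle false positives differently.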

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: load and merge language-specific substitutions (#35) #39: Implements merging of language-specific substitution patterns into global configuration, which complements the Dutch-specific substitution patterns added here.

Poem

🐰 I hopped through accents, small and bright,
ë, ü, á—now stripped to light.
From kut to godverdomme I spy,
Masked and counted, none slip by.
A little hop for language cheer!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Added dutch language' directly describes the main change: adding Dutch language support to the package. |

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@config/languages/dutch.php`:
- Line 90: Remove the incorrect entry 'neef' from the Dutch profanity list (the
string literal 'neef' shown in the diff) to avoid false positives; either delete
that line from the array or move it into your project's allowed/whitelist
collection for Dutch words and ensure any tests or lookup logic (the profanity
array used by your profanity-check routine) reference the updated list.
- Around line 63-64: The language entries array contains a duplicated string
'klerelijer'; remove the redundant occurrence so the array only contains a
single 'klerelijer' entry (locate the duplicate within the Dutch language
entries list where 'klerelijer' appears twice and delete one of them).
- Line 130: Remove the incorrect entry 'teer' from the Dutch profanity list (the
literal string 'teer' present in the array) so only actual profanities like
'tering' remain; edit the array in config/languages/dutch.php to delete the
'teer' element and ensure surrounding commas/array formatting stay valid.
- Line 109: The array entry 'rotterd' is truncated/incorrect; locate the string
'rotterd' in config/languages/dutch.php and either remove that array element if
it was added accidentally or replace it with the intended term (e.g., the
correct word like 'rotterdam' or the proper Dutch word you meant) so the
language list contains valid entries only.
- Line 35: The entry 'deboer' in the Dutch profanity list is a common surname
and should not be treated as profanity; remove the string 'deboer' from the
offensive-words array (or relocate it into a legitimate-names/whitelist array if
you maintain one) so it no longer triggers false positives, and add a short
comment noting why it was removed to prevent reintroduction.
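
Duplicates like the one flagged above are easy to miss by eye in a long list. A small check along these lines could catch them; `findDuplicateEntries` is a hypothetical helper written for this example, not part of the package, and the words are placeholders.

```php
<?php

declare(strict_types=1);

/**
 * Returns the entries that occur more than once in a word list.
 * Hypothetical helper for illustration; assumes non-numeric string entries.
 *
 * @param list<string> $words
 * @return list<string>
 */
function findDuplicateEntries(array $words): array
{
    // array_count_values() maps each distinct value to its occurrence count.
    $counts = array_count_values($words);

    // Keep only the values seen more than once, then return them as a list.
    return array_keys(array_filter($counts, fn (int $n): bool => $n > 1));
}

// Placeholder words stand in for the real list.
print_r(findDuplicateEntries(['alpha', 'beta', 'beta', 'gamma'])); // lists "beta"
```

Assuming the language file returns an array with a `profanities` key, as the summary of `config/languages/dutch.php` suggests, the same function could be run over that array in a test.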

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9142c153-a410-4366-af68-1b740204bb66

📥 Commits

Reviewing files that changed from the base of the PR and between b863354 and fc54f3e.

📒 Files selected for processing (7)

config/languages/dutch.php
src/Core/Dictionary.php
src/Core/Normalizers/DutchNormalizer.php
src/PendingCheck.php
tests/AllLanguagesApiTest.php
tests/DutchLanguageDetectionTest.php
tests/DutchStringNormalizerTest.php

config/languages/dutch.php

deemonic

Thanks for contributing Dutch language support, @roeltinkhof! This is a great addition and follows the existing patterns well. The test coverage is solid too.

Before merging, there are a few issues in config/languages/dutch.php that need to be addressed:

Incorrect entries in the profanity list

These should be removed as they are not profanities and will cause false positives:

- 'deboer' (line 35): Common Dutch surname (de Boer)
- 'neef' (line 90): Means "cousin/nephew", a normal Dutch word
- 'rotterd' (line 109): Appears to be a truncated word (Rotterdam?), not a profanity
- 'teer' (line 130): Means "tar" in Dutch. The actual profanity 'tering' is already listed separately

Duplicate entry

'klerelijer' appears twice (lines ~63-64). One occurrence should be removed.

Misspelled false positive

'roterdam' in the false positives list should be 'rotterdam' (double t).

Once these are fixed, this should be good to go!

roelti · 2026-03-25T14:17:37Z

Thanks for your review! I fixed the issues :)

Added dutch language

fc54f3e

coderabbitai bot reviewed Mar 19, 2026

deemonic reviewed Mar 25, 2026

Removed possible false positives, duplicated and 1 misspelling

9b9c8cf

roelti wants to merge 2 commits into Blaspsoft:main from roelti:feature/dutch
