fix(block-editor): support spaces in contentlet search (#34416) #35510

Open
oidacra wants to merge 4 commits into main from issue-34416

@oidacra oidacra commented Apr 29, 2026

Summary

Fixes the Block Editor contentlet search to support multi-word queries with spaces.

When typing inside the Block Editor's contentlet picker (after / → Contentlets → <ContentType>), spaces appear in the input but the result list does not refine. Root cause: SuggestionsService.getContentlets interpolated the user filter into +catchall:*${filter}* title:'${filter}'^15. With a multi-word filter like White Water, Lucene parses +catchall:*White Water* as two terms — +catchall:*White (mandatory) plus Water* (optional) — so the query degrades to a single-word filter on the first token only. The single-quoted title clause is also broken (Lucene phrase queries require double quotes).
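The root cause above can be illustrated with a small sketch. It assumes the pre-fix query looked roughly like the template quoted in the summary; `buildLegacyQuery` and `splitClauses` are hypothetical names, and the whitespace split is only a crude stand-in for how the Lucene query parser separates top-level clauses, not the parser itself.

```typescript
// Hypothetical reconstruction of the pre-fix interpolation (clause layout is an assumption).
function buildLegacyQuery(contentType: string, filter: string): string {
    return `+contentType:${contentType} +catchall:*${filter}* title:'${filter}'^15`;
}

// Crude model: unquoted whitespace separates top-level clauses.
// Single quotes are not phrase delimiters in Lucene, so they do not protect the space.
function splitClauses(query: string): string[] {
    return query.split(/\s+/).filter((clause) => clause.length > 0);
}

const legacyQuery = buildLegacyQuery('Activity', 'White Water');
const legacyClauses = splitClauses(legacyQuery);

// The mandatory "+" prefix only attaches to the first fragment of the filter.
console.log(legacyClauses);
```

Under this model the filter breaks into `+catchall:*White` and `Water*`, which matches the behaviour described in the summary: only the first token is mandatory.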

CleanShot.2026-04-29.at.17.32.41.mp4

Fix

In suggestions.service.ts (getContentlets):

  • Tokenize the filter on whitespace and emit one mandatory +catchall:*token* clause per token, joined with spaces. Multi-word queries now require ALL tokens to match.
  • Switch the title boost from title:'${filter}'^15 to title:"${filter}"^15 for proper Lucene phrase semantics.
  • Compose the final query via array + filter + join so empty parts (e.g. missing identifierQuery) don't introduce double spaces.
  • Empty / whitespace-only filters omit the catchall and title clauses entirely.
  • Hyphen-bearing filters keep the existing identifier branch unchanged (+catchall:${filter}, no wildcards).
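The bullets above could be sketched roughly as follows. This is not the code from suggestions.service.ts: `buildContentletQuery` and `QueryParts` are hypothetical names, the `+contentType` clause layout is assumed, whether the hyphen branch also emits the title clause is a guess, and escaping of user input is deliberately left out.

```typescript
interface QueryParts {
    contentType: string;
    filter: string;
    contentletIdentifier?: string;
}

// Illustrative sketch only; names and clause order are assumptions.
function buildContentletQuery({ contentType, filter, contentletIdentifier }: QueryParts): string {
    const trimmed = filter.trim();
    const identifierQuery = contentletIdentifier ? `-identifier:${contentletIdentifier}` : '';

    let catchallQuery = '';
    let titleQuery = '';

    if (trimmed.length > 0) {
        catchallQuery = trimmed.includes('-')
            ? `+catchall:${trimmed}` // identifier branch: no wildcards
            : trimmed
                  .split(/\s+/)
                  .map((token) => `+catchall:*${token}*`) // one mandatory clause per token
                  .join(' ');
        titleQuery = `title:"${trimmed}"^15`; // double quotes give a Lucene phrase query
    }

    // array + filter + join: empty parts never produce double spaces
    return [`+contentType:${contentType}`, identifierQuery, catchallQuery, titleQuery]
        .filter((part) => part.length > 0)
        .join(' ');
}

console.log(buildContentletQuery({ contentType: 'Activity', filter: 'White Water' }));
// +contentType:Activity +catchall:*White* +catchall:*Water* title:"White Water"^15
console.log(buildContentletQuery({ contentType: 'Activity', filter: '   ' }));
// +contentType:Activity
```

Building the query from an array keeps each clause optional without string-concatenation edge cases, which is the point of the third bullet.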

Closes

Closes #34416

Acceptance Criteria

  • Search input accepts space characters (was already preserved by TipTap; the fix targets the resulting query).
  • Multi-word query (e.g., White Water) narrows results to contentlets matching ALL tokens.
  • Single-word query still works — no regression.
  • Empty filter returns the default contentlet list (no catchall/title clauses emitted).
  • Filter containing - preserves the identifier branch.

Test Plan

Automated tests added in suggestions.service.spec.ts (now 8 tests, was 1) using HttpTestingController to assert the exact query string sent to /api/content/_search for: multi-word, single-word, empty, whitespace-only, hyphen, and contentletIdentifier exclusion cases, plus response mapping.

Manual verification:

  • Open a content type with a Block Editor field, type /, choose Contentlets → Activity.
  • Type White Water — list filters to contentlets containing both words (e.g., White Water Rafting).
  • Type Water — single-word search still works.
  • Clear the filter — full default list returns.

Changed Files

  • core-web/libs/block-editor/src/lib/shared/services/suggestions/suggestions.service.ts
  • core-web/libs/block-editor/src/lib/shared/services/suggestions/suggestions.service.spec.ts
  • core-web/libs/block-editor/src/lib/elements/dot-bubble-menu/dot-bubble-menu.component.spec.ts (drive-by: 1-line import order fix to satisfy nx affected:lint)

Visual Changes

User-facing behavior change in the Block Editor contentlet search popup; the visual UI itself is unchanged. A before/after screen recording can be shared if needed.

oidacra added 2 commits April 29, 2026 16:51
Tokenize the search filter on whitespace and emit one mandatory
+catchall:*token* clause per token so multi-word queries (e.g.
"White Water") narrow results to contentlets matching ALL tokens.
Previously the unquoted space caused Lucene to parse the catchall as
a single mandatory term + remaining optional terms, so multi-word
filters degraded to filtering by only the first word.

Also switch the title relevance boost from single quotes (literal,
non-functional) to double quotes (Lucene phrase syntax). Empty filter
now omits the catchall/title clauses entirely; hyphen-bearing filters
preserve the existing identifier/UUID branch unchanged.

Closes #34416

Move the rxjs import above @angular imports to satisfy the existing
import/order eslint rule. Pre-existing failure surfaced when running
nx affected:lint for #34416; fixed here to keep the affected lint
target green.

claude Bot commented Apr 29, 2026

Claude finished @oidacra's task in 2m 38s

PR Review

  • Gather context and read changed files
  • Review suggestions.service.ts — core logic change
  • Review suggestions.service.spec.ts — test coverage
  • Review dot-bubble-menu.component.spec.ts — drive-by fix
  • Post findings

Issues

1. UUID_LIKE regex silently kills searches for hex English words (suggestions.service.ts:15)

const UUID_LIKE = /^[0-9a-f]+(-[0-9a-f]+)+$/i;

The comment claims it's "narrow enough to skip ordinary hyphenated English titles," but it matches any hyphen-separated sequence where every segment is all-hex. Short, real English words that are also valid hex: ace, cafe, bead, dead, beef, face, babe, fade. A user searching for content titled ace-cafe or dead-beef silently hits the no-wildcard exact-match path and gets nothing. This is user-invisible: no error, just empty results.

A proper UUID pattern (/^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i) would be narrow enough to cover the actual identifier use case without these false positives. The test at spec:112 uses abc-def as the UUID case — that's also not a UUID and would be better served by a proper UUID regex.
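The difference between the two patterns can be checked directly. The sketch below uses the two regexes quoted in this review; `STRICT_UUID`, the sample strings, and the example UUID value are chosen here for illustration only.

```typescript
// Pattern quoted from the PR (suggestions.service.ts:15).
const UUID_LIKE = /^[0-9a-f]+(-[0-9a-f]+)+$/i;
// Stricter pattern proposed in this review; the constant name is hypothetical.
const STRICT_UUID = /^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$/i;

const uuidSamples = [
    'ace-cafe', // hex-only English words
    'dead-beef',
    'abc-def', // the value used as the "UUID" case in the spec
    'self-care', // contains non-hex letters
    '123e4567-e89b-12d3-a456-426614174000' // illustrative 8-4-4-4-12 identifier
];

for (const sample of uuidSamples) {
    console.log(sample, 'UUID_LIKE:', UUID_LIKE.test(sample), 'STRICT_UUID:', STRICT_UUID.test(sample));
}
```

Only the last sample matches `STRICT_UUID`, while `UUID_LIKE` also matches the first three, which is the false-positive set described above.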

2. Hyphenated-word tokens won't match via wildcard (suggestions.service.ts:63)

When filter = 'White-Water Falls', whitespace-split produces ['White-Water', 'Falls'], and the query includes +catchall:*White\-Water*. Standard Lucene text analysis would index White-Water as two terms (white, water), so a single wildcard over White\-Water finds nothing. The +catchall:*Falls* clause still narrows results, but the hyphenated token is effectively dead weight.

This existed before (the old - branch also used no wildcards), so it's not a regression introduced here — but the PR now routes this through the "works correctly" path without addressing the actual matching issue. Worth a comment acknowledging the limitation, or splitting tokens on [-\s]+ instead of just \s+.
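A minimal sketch of the `[-\s]+` alternative, assuming the tokens are then wrapped in the same wildcard catchall clauses as in the PR. `tokenizeFilter` and `toCatchallClauses` are hypothetical helpers, and escaping is omitted to keep the comparison readable.

```typescript
// Hypothetical helper: split on whitespace only, or on whitespace and hyphens.
function tokenizeFilter(filter: string, splitHyphens: boolean): string[] {
    const separator = splitHyphens ? /[-\s]+/ : /\s+/;
    // A leading or trailing hyphen would yield an empty token, so drop empties here.
    return filter
        .trim()
        .split(separator)
        .filter((token) => token.length > 0);
}

function toCatchallClauses(tokens: string[]): string {
    return tokens.map((token) => `+catchall:*${token}*`).join(' ');
}

console.log(toCatchallClauses(tokenizeFilter('White-Water Falls', false)));
// +catchall:*White-Water* +catchall:*Falls*
console.log(toCatchallClauses(tokenizeFilter('White-Water Falls', true)));
// +catchall:*White* +catchall:*Water* +catchall:*Falls*
```

With hyphen splitting, each clause targets a term the analyzer is likely to have indexed on its own, at the cost of treating hyphens in every non-identifier filter as separators.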

3. contentletIdentifier is interpolated without escaping (suggestions.service.ts:52)

const identifierQuery = contentletIdentifier ? `-identifier:${contentletIdentifier}` : '';

The PR adds escapeLucene for user-typed input but leaves this call site unescaped. In practice, dotCMS identifiers are UUIDs and safe, but it's an inconsistency: if someone passes an identifier with a Lucene special char, it bypasses the new escaping logic. One-liner fix: escapeLucene(contentletIdentifier).
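For illustration, the one-liner might look like the sketch below. The PR's actual `escapeLucene` is not shown in this thread, so the version here is a hypothetical stand-in that backslash-escapes each Lucene special character individually; `LUCENE_SPECIAL` and `buildIdentifierExclusion` are made-up names, and whether dotCMS accepts escaped hyphens in the identifier clause is not verified here.

```typescript
// Hypothetical stand-in for the PR's escapeLucene helper.
// Escapes each of: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
const LUCENE_SPECIAL = /[+\-&|!(){}\[\]^"~*?:\\\/]/g;

function escapeLucene(value: string): string {
    return value.replace(LUCENE_SPECIAL, '\\$&');
}

// Same shape as the line quoted above, with the identifier escaped before interpolation.
function buildIdentifierExclusion(contentletIdentifier?: string): string {
    return contentletIdentifier ? `-identifier:${escapeLucene(contentletIdentifier)}` : '';
}

console.log(buildIdentifierExclusion('123e4567-e89b-12d3-a456-426614174000'));
// -identifier:123e4567\-e89b\-12d3\-a456\-426614174000
console.log(buildIdentifierExclusion('abc OR contentType:*'));
// -identifier:abc OR contentType\:\*
console.log(buildIdentifierExclusion(undefined) === ''); // true
```

Escaping character by character neutralises field separators and wildcards, but it does not touch whitespace or keyword operators such as OR, so an identifier from an untrusted source would still need validation.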


Minor

  • The getContentletsByLink method (line 115) interpolates link unescaped — pre-existing, not this PR's problem, but the new escapeLucene helper is sitting right there.
  • &&/|| multi-char operator handling in LUCENE_SPECIAL_CHARS produces \&& / \|| (escapes only the first char). This neutralises the operator in practice but isn't the canonical form (\&\& / \|\|). Harmless given typical content search input.
  • Tests are thorough: 8 cases covering multi-word, single-word, empty, whitespace-only, UUID, injection, identifier exclusion, and response mapping. The flushEmpty helper and httpMock.verify() in afterEach are clean. The injection test (spec:143) is a good addition.
  • dot-bubble-menu.component.spec.ts drive-by: correct, no concerns.

@github-actions github-actions Bot added the Area : Frontend label (PR changes Angular/TypeScript frontend code) Apr 29, 2026
oidacra added 2 commits April 29, 2026 17:25
Prettier flagged a double blank line left over from the previous
import/order fix. nx format:check is now clean.

Address PR review on #34416:

- Narrow the identifier/UUID branch from filter.includes('-') to a hex-only
  segmented pattern (UUID_LIKE). Hyphenated English titles like
  "self-care", "follow-up", or "White-Water Falls" now go through the
  regular tokenized search path instead of degrading to a non-wildcard
  exact-match clause.
- Escape Lucene query-syntax characters (+ - && || ! ( ) { } [ ] ^ " ~
  * ? : \ /) before interpolating user input into the catchall and
  title clauses, preventing a user from injecting arbitrary clauses
  that would bypass the +contentType restriction.
- Drop the redundant .filter(token => token.length > 0) after
  .trim().split(/\s+/) — the regex already collapses interior runs and
  trim removes leading/trailing whitespace, so empty tokens are
  impossible.

Adds tests for the hyphenated-title path, the injection-escaping
behavior, and updates the existing UUID-branch assertion to expect
the escaped hyphen.
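The claim in the third bullet about `.trim().split(/\s+/)` can be checked in isolation. In the sketch below, `splitOnWhitespace` and `tokensOrNone` are hypothetical helpers. One caveat worth noting: an empty or whitespace-only input still yields a single empty token, so the claim holds only if the empty-filter case is handled before tokenizing, as the PR description says it is.

```typescript
// Hypothetical helper mirroring the expression named in the commit message.
function splitOnWhitespace(filter: string): string[] {
    return filter.trim().split(/\s+/);
}

console.log(splitOnWhitespace('  White   Water  ')); // [ 'White', 'Water' ]
console.log(splitOnWhitespace('Water')); // [ 'Water' ]

// ''.split(/\s+/) returns [''] because the pattern never matches an empty string.
console.log(splitOnWhitespace('   ')); // [ '' ]

// Sketch of a guard that makes dropping the extra .filter() safe.
function tokensOrNone(filter: string): string[] {
    const trimmed = filter.trim();
    return trimmed.length === 0 ? [] : trimmed.split(/\s+/);
}

console.log(tokensOrNone('   ')); // []
```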
@oidacra oidacra marked this pull request as ready for review April 29, 2026 21:32
@oidacra oidacra enabled auto-merge April 29, 2026 21:34
Labels

AI: Safe To Rollback
Area : Frontend (PR changes Angular/TypeScript frontend code)

Development

Successfully merging this pull request may close these issues.

[DEFECT] Block Editor search not allowing spaces.