Skip to content

task(Migrate ESMappingAPIImpl) Refs #34933#35289

Draft
fabrizzio-dotCMS wants to merge 6 commits intomainfrom
issue-34933-Mapping-Layer
Draft

task(Migrate ESMappingAPIImpl) Refs #34933#35289
fabrizzio-dotCMS wants to merge 6 commits intomainfrom
issue-34933-Mapping-Layer

Conversation

@fabrizzio-dotCMS
Copy link
Copy Markdown
Member

@fabrizzio-dotCMS fabrizzio-dotCMS commented Apr 10, 2026

Summary

Migrates ContentletIndexAPIImpl — the central content indexing implementation — to the vendor-neutral Phase-aware router pattern. All Elasticsearch-specific types (BulkRequest, BulkProcessor, ActionListener) are replaced with domain-layer abstractions (IndexBulkRequest, IndexBulkProcessor, IndexBulkListener), enabling dual-write routing to both ES and OS backends during the ES→OS migration without leaking vendor types to callers.

Changes

Backend — Core Indexing

  • ContentletIndexAPI (interface): Removed all org.elasticsearch.* imports from method signatures; replaced with vendor-neutral IndexBulkRequest, IndexBulkProcessor, and IndexBulkListener domain types. Added @IndexLibraryIndependent annotation. Changed fullReindexStart() return type from String to IndexStartResult. Added thread-safe DateTimeFormatter alongside deprecated SimpleDateFormat.
  • ContentletIndexAPIImpl: Full rewrite to a phase-aware router:
    • Holds two ContentletIndexOperations instances (operationsES, operationsOS) and delegates via PhaseRouter<ContentletIndexOperations>
    • New DualIndexBulkRequest inner class fans out synchronous bulk batches to both ES and OS in dual-write phases
    • New CompositeBulkProcessor inner class fans out async bulk processing; OS failures are fire-and-forget (shadow) in phases 1/2, propagating only in phase 3
    • ProviderIndices inner class resolves working/live/reindex index names per provider, stripping os:: vendor tags before passing to the OS client
    • Lazy-initialized ContentMappingAPI via AtomicReference to break the circular dependency chain
    • Dropped all direct org.elasticsearch.* imports from the implementation
  • IndexTag: New utility for stripping vendor-specific index name prefixes (os::)
  • IndexStartResult: New domain value object replacing the raw String returned by fullReindexStart()
  • BulkProcessorListener: Updated to implement IndexBulkListener instead of the ES-specific listener
  • ReindexThread: Minor updates to use IndexBulkProcessor/IndexBulkRequest instead of ES types
  • MappingHelper: Replaces ESMappingUtilHelper as the vendor-neutral mapping utility
  • ContentletIndexOperationsES / ContentletIndexOperationsOS: Updated to expose the new domain type APIs

Testing

  • ContentletIndexAPIImplTest: Updated to use the new domain types and refactored test constructors
  • ESSiteSearchAPITest, EMAWebInterceptorTest, CleanUpFieldReferencesJobTest: Minor compile fixes following the API changes

Testing

  1. Start the integration test environment: just test-integration-ide
  2. Run the primary integration test:
    ./mvnw verify -pl :dotcms-integration -Dcoreit.test.skip=false -Dit.test=ContentletIndexAPIImplTest
    
  3. Verify reindexing still works end-to-end by triggering a full reindex from the dotCMS Admin → System → Reindex
  4. Check that ReindexThread completes without errors in both Phase 0 (ES only) and Phase 1 (dual-write) configurations

Breaking Changes

  • ContentletIndexAPI.fullReindexStart() now returns IndexStartResult instead of String. Callers previously using the raw timestamp string must call .timestampSuffix() on the result.
  • BulkRequest, BulkProcessor, and ActionListener<BulkResponse> removed from the ContentletIndexAPI interface. Any code directly casting or importing these types against the interface will not compile.

This PR fixes: #34933

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 10, 2026

Claude finished @fabrizzio-dotCMS's task in 3m 32s —— View job


PR Review — task(Migrate ESMappingAPIImpl) Refs #34933

  • Gather context
  • Read core changed files
  • Review ContentletIndexAPIImpl (main change)
  • Review domain abstractions and routing
  • Review BulkProcessorListener and ReindexThread
  • Synthesize findings and post review

Summary

The architecture is sound and the phase-routing abstraction is clean. A few concrete issues worth fixing before merge.


Issues

1. putToIndex skips OS write when ES throws — violates stated invariant
ContentletIndexAPIImpl:1439

operationsES.putToIndex(dual.esReq);  // throws → OS is never called
try {
    operationsOS.putToIndex(dual.osReq);
} catch ...

If ES throws, the OS shadow write is silently skipped. PhaseRouter.writeChecked (and the class-level Javadoc) explicitly guarantees "all providers are always called regardless of failures." This method doesn't follow that contract. An ES failure now causes OS divergence with no record of it being skipped. The fix is wrapping the ES call in a try-catch that defers the exception until after OS is called, matching PhaseRouter semantics.

Fix this →


2. createContentIndex(String, int) silently returns true on OS IOException
ContentletIndexAPIImpl:551–563

boolean result = true;
for (final ContentletIndexOperations ops : router.writeProviders()) {
    try {
        result &= ops.createContentIndex(physicalName, shards);  // throws IOException
    } catch (Exception e) {
        Logger.error(...);   // caught — result stays true
    }
}
return result;  // true, even if OS creation failed

When ops.createContentIndex() throws, the assignment is never completed and result stays true. The method returns true to the caller. initAndPointReindex (called by fullReindexStart) wraps the call in a try-catch(IOException) expecting propagation, but the exception was already swallowed. Result: OS reindex-slot pointers get updated to an index that doesn't exist.

The private 3-arg version (createContentIndex(String, int, IndexTag)) correctly propagates exceptions. The 2-arg public version should either rethrow after logging, or aggregate and throw a single exception after iterating.

Fix this →


3. loadProviderIndicesQuietly swallows all exceptions without any logging
ContentletIndexAPIImpl:275–277

private ProviderIndices loadProviderIndicesQuietly(final ContentletIndexOperations ops) {
    return Try.of(() -> loadProviderIndices(ops)).getOrElse((ProviderIndices) null);
}

A DotDataException from legacyIndiciesAPI.loadIndicies() (e.g. DB connectivity issue) is completely swallowed. Callers treat null as "OS not initialized yet" and silently skip the write — so documents won't be indexed with no observable signal. At minimum this should log a WARN with the exception message before returning null.


4. OSGi breaking changes without backward-compatible shims (already flagged in prior review)

The previous analysis is accurate and still applies. To summarize: checkAndInitialiazeIndex renamed (fixing spelling), fullReindexStart() return type changed to IndexStartResult, and putToIndex(BulkRequest, ActionListener) / bulk-typed methods removed — all without @Deprecated default-method shims. Both affected packages (com.dotcms.content.elasticsearch.business, com.dotmarketing.common.reindex) are confirmed in osgi-extra.conf. External plugins will fail at activation.


5. AtomicReference.updateAndGet comment overstates thread-safety guarantee
ContentletIndexAPIImpl:201–203

"Thread-safe: uses AtomicReference.updateAndGet to ensure exactly one instance is published without synchronization overhead."

updateAndGet does not guarantee the function is called exactly once — under contention it may call the lambda multiple times and discard extra results. The implementation is correct (same singleton returned), but the comment is misleading. Should say "at most one instance is visible" or "idempotent initialization."


6. DualIndexBulkRequest.size() logic contradicts its own comment
ContentletIndexAPIImpl:307–311

public int size() {
    if(isMigrationComplete() || isReadEnabled()) {
        return osReq.size();
    }
    return esReq.size();
}

The comment says "ES is always the authoritative source in dual-write phases," but isReadEnabled() is true in Phase 2 (dual-write + OS reads), so Phase 2 returns osReq.size(). Since both sub-requests track the same logical documents the count should always match, making this functionally harmless — but the conditional makes the comment false and the intent ambiguous.


7. PR description references non-existent method

The "Breaking Changes" section says: "Callers previously using the raw timestamp string must call .timestampSuffix() on the result." IndexStartResult has no timestampSuffix() method — the actual methods are timeStampES() and timeStampOS(). This should be corrected in the PR description to avoid confusing maintainers making the migration.


8. Redundant boolean conditions in checkAndInitializeIndex
ContentletIndexAPIImpl:481

if (isMigrationStarted() || isReadEnabled() || isMigrationComplete()) {

isMigrationStarted() == !isMigrationNotStarted() covers phases 1–3. isReadEnabled() (phases 2–3) and isMigrationComplete() (phase 3) are already subsets of that. The condition simplifies to isMigrationStarted(). Minor, but it obscures intent.


Items that look fine

  • CompositeBulkProcessor.close() correctly defers the primary exception until after all entries are flushed, and adds further failures as suppressed.
  • fullReindexSwitchoverOS correctly keeps OS-only promotion isolated from legacyIndiciesAPI in Phase 3.
  • The lazy mappingAPI AtomicReference correctly breaks the circular dependency cycle, even if the comment is overstated.
  • BulkProcessorListener shadow semantics are clean: shadow instances never touch the reindex queue or trigger a rebuild.
  • PhaseRouter itself is well-structured and consistent.

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 10, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-4 — OSGi Plugin API Breakage

  • Risk Level: 🟡 MEDIUM

  • Why it's unsafe: The ContentletIndexAPI public interface had multiple breaking method signature changes. Any OSGi plugin compiled against the previous version of this interface will fail at activation time with NoSuchMethodError or IncompatibleClassChangeError.

    Breaking changes introduced in this PR:

    1. Method renamed: checkAndInitialiazeIndex()checkAndInitializeIndex() in ContentletIndexAPI. Any plugin calling the old (misspelled) method receives a NoSuchMethodError.
    2. Return type changed: fullReindexStart() now returns IndexStartResult instead of String. Any plugin assigning the result to a String variable will get a ClassCastException at runtime; any plugin that compiled against the old signature will get NoSuchMethodError.
    3. Method removed: putToIndex(BulkRequest, ActionListener<BulkResponse>) was removed from the ContentletIndexAPI interface entirely and replaced with putToIndex(IndexBulkRequest). Any plugin calling the old two-argument form gets NoSuchMethodError.
    4. Listener contract changed: BulkProcessorListener changed from implementing BulkProcessor.Listener (Elasticsearch type) to implementing IndexBulkListener (domain type). Plugins that registered, subclassed, or cast to the old Elasticsearch-typed listener will fail.
  • Code that makes it unsafe:

    • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPI.java — method checkAndInitializeIndex() (line ~35, was checkAndInitialiazeIndex); method fullReindexStart() return type change (line ~52); removal of putToIndex(BulkRequest, ActionListener<BulkResponse>)
    • dotCMS/src/main/java/com/dotmarketing/common/reindex/BulkProcessorListener.java — class declaration changed from implements BulkProcessor.Listener to implements IndexBulkListener
  • Alternative (if possible): Follow the two-phase OSGi migration pattern:

    • Release N (this PR): Keep the old method signatures as @Deprecated overloads that delegate to the new signatures, so N-1 plugins continue to activate cleanly.
    • Release N+1: Remove the deprecated shims once N-2 is outside the rollback window.

@github-actions github-actions bot added the Area : Documentation PR changes documentation files label Apr 11, 2026
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 11, 2026

Pull Request Unsafe to Rollback!!!

  • Category: M-4 — OSGi Plugin API Breakage

  • Risk Level: 🟡 MEDIUM

  • Why it's unsafe: The ContentletIndexAPI public interface and BulkProcessorListener class both reside in OSGi-exported packages (com.dotcms.content.elasticsearch.business and com.dotmarketing.common.reindex, confirmed in osgi-extra.conf lines 92 and 573). Multiple breaking changes were made without backward-compatible deprecated shims:

    1. Method removed (no shim): checkAndInitialiazeIndex() removed from ContentletIndexAPI and replaced by checkAndInitializeIndex() (spelling fix). Any OSGi plugin calling the old spelling receives NoSuchMethodError. The only in-tree caller — InitServlet.java:94 — was updated, but external plugin callers were not.

    2. Return type changed: fullReindexStart() return changed from String to IndexStartResult. Any plugin that assigns the result to a String variable throws ClassCastException; any plugin compiled against the old signature throws NoSuchMethodError.

    3. Methods removed: putToIndex(BulkRequest, ActionListener<BulkResponse>) and the BulkRequest-typed overloads of createBulkRequest(), appendBulkRequest(), appendBulkRemoveRequest() removed from ContentletIndexAPI. Replaced with IndexBulkRequest-typed equivalents. Plugins calling any of these forms get NoSuchMethodError.

    4. Processor API changed: createBulkProcessor(BulkProcessorListener) replaced by createBulkProcessor(IndexBulkListener); appendToBulkProcessor(BulkProcessor, ...) replaced by appendToBulkProcessor(IndexBulkProcessor, ...). Plugins using async bulk processing cannot compile or activate.

    5. Listener contract changed: BulkProcessorListener changed from implements BulkProcessor.Listener (Elasticsearch type) to implements IndexBulkListener (domain type). The beforeBulk and afterBulk method signatures changed (e.g. BulkRequestint actionCount; BulkResponseList<IndexBulkItemResult>). Plugins that subclass, cast, or register BulkProcessorListener as a BulkProcessor.Listener fail at activation.

  • Code that makes it unsafe:

    • dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPI.java — all lines; checkAndInitializeIndex at line 38 (was checkAndInitialiazeIndex); fullReindexStart return at line 52; all BulkRequest/BulkProcessor/ActionListener method signatures replaced throughout
    • dotCMS/src/main/java/com/dotmarketing/common/reindex/BulkProcessorListener.java — class declaration line 34 (implements IndexBulkListener replacing BulkProcessor.Listener); beforeBulk and afterBulk method signatures
    • dotCMS/src/main/resources/osgi/osgi-extra.conf lines 92, 573 — confirm both affected packages are exported to the OSGi container
  • Alternative (if possible): Follow the two-phase OSGi migration pattern:

    • Release N (this PR): Keep each removed/changed method as a @Deprecated overload that delegates to the new signature. For example, add @Deprecated default void checkAndInitialiazeIndex() { checkAndInitializeIndex(); } as an interface default method; add a @Deprecated default String fullReindexStart() { return fullReindexStart_new().timestampSuffix(); } shim (renaming the new method accordingly). N-1 plugins continue to activate cleanly against these shims.
    • Release N+1: Remove the deprecated shims once N-2 is outside the rollback window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

[TASK] Migrate ESMappingAPIImpl

1 participant