Author matching: token-overlap false positives pull in books from same-named co-authors #563

@vavallee

Reported behaviour

Items are being pulled in for an author when a different multi-author work happens to contain the right name tokens. Example: monitored author "Rachel Reid" gets a book imported whose author field is "Rachel Larsen, Adam Reid, and Ozi Akturk". The tokens "Rachel" and "Reid" both appear, but they belong to different people.

Suspect code paths (in descending likelihood)

Reading the code, I found three sites that do author matching and could be involved. Without a specific imported example I can't say which one fired here:

1. Indexer release filter — internal/indexer/searcher.go:filterRelevant

The release-relevance check only requires the release to contain the surname at a word boundary (line 304: surname := AuthorSurname(author); lines 320-323 call titleMatchesResult(... sn ...)). A release titled Adam.Reid.Some.Book.epub would therefore pass the gate when Author = "Rachel Reid", because "reid" appears in it as a whole word. The searcher would then auto-grab and ingest it.

This is the most likely source if the user has indexer auto-grab on.
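
To make the failure mode concrete, here is a minimal, self-contained sketch of a surname-only word-boundary gate. surnameOnlyMatch is a hypothetical stand-in written for this issue, not the project's titleMatchesResult or AuthorSurname, whose exact behaviour I have not reproduced; it only assumes "last token of the author name, matched at a word boundary in the release title".

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// surnameOnlyMatch is a HYPOTHETICAL stand-in for the gate described above.
// It reports whether the last token of author appears in releaseTitle at an
// ASCII word boundary. The first name is never consulted.
func surnameOnlyMatch(author, releaseTitle string) bool {
	parts := strings.Fields(strings.ToLower(author))
	if len(parts) == 0 {
		return false
	}
	surname := parts[len(parts)-1]
	// \b is an ASCII word boundary in Go's regexp syntax; "." counts as a
	// non-word character, so dotted release names split cleanly.
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(surname) + `\b`)
	return re.MatchString(strings.ToLower(releaseTitle))
}

func main() {
	// Wrong person, same surname: still passes the gate.
	fmt.Println(surnameOnlyMatch("Rachel Reid", "Adam.Reid.Some.Book.epub"))
	// Intended match.
	fmt.Println(surnameOnlyMatch("Rachel Reid", "Rachel.Reid.Some.Book.epub"))
	// The word boundary does reject longer surnames that merely contain "reid".
	fmt.Println(surnameOnlyMatch("Rachel Reid", "Anna.Schreider.Some.Book.epub"))
}
```

Under these assumptions the first two calls both return true and only the third returns false, which is the behaviour the report would need from path (1).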

2. Library scanner — internal/importer/scanner.go:authorMatch (lines 981-991)

    parts := strings.Fields(strings.ToLower(parsedAuthor))
    lastName := parts[len(parts)-1]
    return len(lastName) >= 3 && strings.Contains(strings.ToLower(bookAuthor), lastName)

strings.Contains is a substring match, not a word-boundary match, and only the last name is checked. So parsedAuthor = "Rachel Reid" resolves to lastName = "reid" and matches any bookAuthor containing "reid" anywhere, including "Adam Reid", "Reid Hoffman", and, since there is no boundary check, longer surnames such as "Reidel".

This is the most likely source if the user is scanning a local library.
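
The following standalone reproduction wraps the three quoted lines in a function so the false positives can be checked directly. The function name lastNameContains and the empty-input guard are mine, added only so the sketch runs on its own; the surrounding code in scanner.go may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// lastNameContains mirrors the quoted authorMatch logic. The name and the
// empty-input guard are assumptions added for this sketch.
func lastNameContains(parsedAuthor, bookAuthor string) bool {
	parts := strings.Fields(strings.ToLower(parsedAuthor))
	if len(parts) == 0 {
		return false // guard added here; not part of the quoted lines
	}
	lastName := parts[len(parts)-1]
	return len(lastName) >= 3 && strings.Contains(strings.ToLower(bookAuthor), lastName)
}

func main() {
	candidates := []string{
		"Rachel Larsen, Adam Reid, and Ozi Akturk", // the reported case
		"Reid Hoffman",                             // "reid" as a first name
		"Anna Schreider",                           // "reid" inside another surname
		"Jane Austen",                              // unrelated
	}
	for _, c := range candidates {
		fmt.Printf("%-45q %v\n", c, lastNameContains("Rachel Reid", c))
	}
}
```

Only the unrelated author is rejected; the other three are accepted even though none of them is Rachel Reid.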

3. ABS importer — internal/abs/import_author_matcher.go

This one uses full-name comparisons via textutil.MatchAuthorName (Jaro-Winkler whole-string), so it should NOT false-positive on "Rachel Reid" vs "Rachel Larsen, Adam Reid, and Ozi Akturk": those whole strings are very different. I'm listing it here for completeness, so that a fix doesn't accidentally tighten the ABS path.

Likely fix shape (pending diagnosis)

Once we know which path fired:

  • For (1): augment titleMatchesResult to also require the first name (or a reasonable initial form) at a word boundary when the requested author has more than one token. Both "Rachel" and "Reid" must match, preferably contiguously.
  • For (2): replace strings.Contains(bookAuthor, lastName) with a stricter check — require the parsed name to appear as a contiguous "first last" phrase in the book author string, OR require ALL of the parsed name's significant tokens to appear at word boundaries.
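
As a rough sketch of the two stricter checks proposed for (2), assuming a simple tokenizer that splits on anything that is not a letter or digit. All three names (tokens, containsPhrase, containsAllTokens) are hypothetical, and the sketch treats every token as significant rather than filtering initials or particles.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokens lowercases s and splits it on every rune that is not a letter or
// digit, so commas, dots and spaces all act as separators.
func tokens(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// containsPhrase reports whether the tokens of parsedAuthor appear in
// bookAuthor as one contiguous, ordered run ("first last").
func containsPhrase(parsedAuthor, bookAuthor string) bool {
	want, have := tokens(parsedAuthor), tokens(bookAuthor)
	if len(want) == 0 {
		return false
	}
	for i := 0; i+len(want) <= len(have); i++ {
		match := true
		for j := range want {
			if have[i+j] != want[j] {
				match = false
				break
			}
		}
		if match {
			return true
		}
	}
	return false
}

// containsAllTokens reports whether every token of parsedAuthor appears
// somewhere in bookAuthor as a whole token, in any order.
func containsAllTokens(parsedAuthor, bookAuthor string) bool {
	want := tokens(parsedAuthor)
	if len(want) == 0 {
		return false
	}
	seen := make(map[string]bool)
	for _, tok := range tokens(bookAuthor) {
		seen[tok] = true
	}
	for _, tok := range want {
		if !seen[tok] {
			return false
		}
	}
	return true
}

func main() {
	reported := "Rachel Larsen, Adam Reid, and Ozi Akturk"
	fmt.Println(containsPhrase("Rachel Reid", reported))
	fmt.Println(containsAllTokens("Rachel Reid", reported))
	fmt.Println(containsPhrase("Rachel Reid", "Rachel Reid and Adam Smith"))
}
```

With this tokenizer, the contiguous-phrase check rejects the reported author string, while the all-tokens check still accepts it, because "rachel" and "reid" each appear as whole tokens there. That suggests the all-tokens option on its own may not be enough for the reported case. Because the tokenizer also splits on dots, the same phrase check could in principle be applied to release titles for (1).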

Need from reporter

Please share one specific imported item that shouldn't have been (book/audiobook title + the author field as it appeared in your library) so I can pinpoint which path is firing. A line from the bindery logs around the time of import would also help.

Labels: bug