Detect duplicate recipients accross to,cc,bcc by Rashmi1613 · Pull Request #1543 · jenkinsci/email-ext-plugin

Rashmi1613 · 2026-03-28T16:07:00Z

Duplicate recipients within each list i.e. to,cc,bcc are already handles using sets, but the same email email address can still appear accross different recipient types.
Changes done here,

Detection of duplicate recipients accross TO, CC, BCC.
Logs the duplicates along with its location- TO,CC,BCC

This improves observability and helps in identification of any type of misconfiguration.

…ging too

Vinamra-tech · 2026-03-28T18:28:25Z

Hi @Rashmi1613 I was wondering if there's a bettter approach to it like first creating a set of seen emails and then applying removeIf() to each of the to , cc , bcc list like this :

Set seenEmails = new HashSet<>();

to.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase()));

cc.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase()));

bcc.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase()));

I would be happy to know if you justify my approach misses something.

Rashmi1613 · 2026-03-29T07:05:20Z

Thanks for the suggestion, that approach is definitely clean and enforces a clear TO > CC > BCC priority. One thing I was considering is whether duplicates across TO/CC/BCC might sometimes be an intentional configuration and if removed it might break user intent and change behaviour. That’s why I initially did detection + logging to preserve user intent. Happy to switch to removal if we want to enforce removal+logging. Let me know what direction you’d prefer

ArpanC6 · 2026-03-29T08:02:49Z

Great work on the detection and logging approach.. Rashmi’s point about preserving user intent makes a lot of sense silently removing duplicates could mess with configurations where the same address needs to receive both a CC and a BCC copy.

A few technical things to keep in mind -

1) Missing Unit Tests: It looks like the new duplicate detection logic in ExtendedEmailPublisher.java doesn’t have any test coverage. It’d be good to add a test to ensure that when an address appears in both the TO and CC fields the log shows the expected warning, and the location is correctly set.

2) CI Failure on Windows-21: The build on windows-21 is failing due to an error in the 'bat' step. This needs a bit of investigation to figure out what’s going wrong and it should be fixed before the PR can be merged.

3) Non-Deterministic Log Output: Since emailLocations uses a HashMap and the location set is a HashSet the output order of entry.getValue() isn't guaranteed. This could result in inconsistent logging like [CC, TO] or [TO, CC]. Switching to a LinkedHashSet or sorting the set before logging will give you a predictable order every time.

4) Typo: In a few places (like the PR title description and commit message) “accross” should be spelled as “across.”

Vinamra-tech · 2026-03-29T12:04:15Z

Hey @Rashmi1613 and @ArpanC6, thanks for the responses!

@Rashmi1613 that's a fair point about user intent, but I'd push back a little. In practice, having the same address in both CC and BCC is almost always a misconfiguration rather than intentional — most SMTP servers deduplicate at delivery anyway, so the "intentional duplicate" scenario is largely theoretical. Can you point to a concrete real-world case where the same address genuinely needs to be in both fields simultaneously?

That said, I do see one valid concern with my removeIf() approach — the implicit TO > CC > BCC priority. If an address exists in both CC and BCC, my approach retains it in CC and drops it from BCC, which could expose a BCC recipient to others — a potential privacy issue. That part I'll concede.

So here's what I'd propose as a middle ground:

Keep Rashmi's detection + logging for the CC/BCC overlap case specifically, to avoid the privacy risk.
Use removeIf() for TO/CC and TO/BCC overlaps, where there's no privacy concern and silent removal is safe and clean.
This way we get the best of both — clean deduplication where it's safe, and cautious logging where it isn't.

@ArpanC6 on your points — unit tests and the typo fix are fair, agreed on those. On the Windows-21 failure, worth checking the actual bat step error log first — it could easily be an infra flake rather than anything in this PR's code, so I wouldn't treat it as a hard blocker just yet.

On the LinkedHashMap/LinkedHashSet suggestion though — I'd actually push back here. Swapping the data structures just to guarantee log order adds accidental complexity without any functional benefit. Log output order is a debugging convenience, not a correctness requirement. A simpler alternative would be to just sort the location list inline before logging — one line, no structural changes, achieves the same result without introducing unfamiliar collection types that future contributors need to reason about.

Happy to discuss further!

ArpanC6 · 2026-03-29T12:45:06Z

That's a fair pushback sorting inline before logging is indeed simpler and avoids introducing unfamiliar collection types. The middle ground approach makes sense. Happy to see it implemented that way.

…t test

Rashmi1613 · 2026-03-29T14:53:18Z

I agree with the middle-ground approach. Removing duplicates for TO/CC and TO/BCC makes sense since the recipient is already visible, so there’s no change in behavior. For CC/BCC overlaps, I’ll keep it as detection + logging only, since it’s ambiguous and we shouldn’t risk changing visibility.

A real-world case I had in mind is when recipients come from multiple sources , for example, Jenkins might automatically add commit authors to CC,while a user manually adds someone to BCC to keep them hidden. In that situation, the same email can appear in both CC and BCC with different intent, and removing it from BCC could unintentionally expose that recipient and potentially rase privacy concerns.
Also, TO feels like a strong signal that the recipient is meant to be visible, so removing duplicates from CC/BCC when the same email is already in TO is safe. But between CC and BCC, it’s not clear which one was intended, so better not to modify it.

For the ordering part, I’ll keep the change minimal and use inline sorting with a fixed order (TO → CC → BCC) just for logging, instead of changing the data structure.Although LinkedHashSet has better time complexity of O(1)(insertion order) as comapred to sorting (O(nlog n)) , since the dataset is small , it will be constant. So sorting would be better than LinkedHashSet which introduces unnecessary complexity.

ArpanC6 · 2026-03-29T14:57:12Z

I agree with the middle-ground approach. Removing duplicates for TO/CC and TO/BCC makes sense since the recipient is already visible, so there’s no change in behavior. For CC/BCC overlaps, I’ll keep it as detection + logging only, since it’s ambiguous and we shouldn’t risk changing visibility.

A real-world case I had in mind is when recipients come from multiple sources , for example, Jenkins might automatically add commit authors to CC,while a user manually adds someone to BCC to keep them hidden. In that situation, the same email can appear in both CC and BCC with different intent, and removing it from BCC could unintentionally expose that recipient and potentially rase privacy concerns. Also, TO feels like a strong signal that the recipient is meant to be visible, so removing duplicates from CC/BCC when the same email is already in TO is safe. But between CC and BCC, it’s not clear which one was intended, so better not to modify it.

For the ordering part, I’ll keep the change minimal and use inline sorting with a fixed order (TO → CC → BCC) just for logging, instead of changing the data structure.Although LinkedHashSet has better time complexity of O(1)(insertion order) as comapred to sorting (O(nlog n)) , since the dataset is small , it will be constant. So sorting would be better than LinkedHashSet which introduces unnecessary complexity.

The new unit test looks good..However the latest commit seems to have accidentally deleted a large portion of ExtendedEmailPublisherTest.java linux-25 is now showing only 301 tests instead of the expected 366. Could you check if the test file was accidentally truncated during the commit??

Vinamra-tech · 2026-03-29T18:48:03Z

I agree with the middle-ground approach. Removing duplicates for TO/CC and TO/BCC makes sense since the recipient is already visible, so there’s no change in behavior. For CC/BCC overlaps, I’ll keep it as detection + logging only, since it’s ambiguous and we shouldn’t risk changing visibility.

A real-world case I had in mind is when recipients come from multiple sources , for example, Jenkins might automatically add commit authors to CC,while a user manually adds someone to BCC to keep them hidden. In that situation, the same email can appear in both CC and BCC with different intent, and removing it from BCC could unintentionally expose that recipient and potentially rase privacy concerns. Also, TO feels like a strong signal that the recipient is meant to be visible, so removing duplicates from CC/BCC when the same email is already in TO is safe. But between CC and BCC, it’s not clear which one was intended, so better not to modify it.

For the ordering part, I’ll keep the change minimal and use inline sorting with a fixed order (TO → CC → BCC) just for logging, instead of changing the data structure.Although LinkedHashSet has better time complexity of O(1)(insertion order) as comapred to sorting (O(nlog n)) , since the dataset is small , it will be constant. So sorting would be better than LinkedHashSet which introduces unnecessary complexity.

Hey @Rashmi1613, that's actually a great real-world example — the Jenkins auto-CC + manual BCC case makes the ambiguity very concrete and I think that fully justifies keeping detection + logging for the CC/BCC overlap. Good call!!!

The final approach looks solid:

removeIf() for TO/CC and TO/BCC — safe, clean deduplication.
Detection + logging only for CC/BCC — preserves intent, avoids privacy risk.
Inline sort for deterministic log output — minimal and clean.

LGTM!!, Nice work!

detect duplicate recipients accross to,cc,bcc along with location log…

6912925

…ging too

Rashmi1613 requested a review from a team as a code owner March 28, 2026 16:07

fixed formatting issues

986bb80

Add duplicate recipient detection with deterministic ordering and uni…

c88cff0

…t test

Rashmi1613 added 4 commits March 29, 2026 20:48

Add selective deduplication and safe CC/BCC logging

5535961

restored old tests back which ight have been deleted accidenatlly

d47e03e

resolved

d846118

fixed formatting

4ffc165

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect duplicate recipients accross to,cc,bcc#1543

Detect duplicate recipients accross to,cc,bcc#1543
Rashmi1613 wants to merge 7 commits intojenkinsci:mainfrom
Rashmi1613:detect-duplicate-recipients

Rashmi1613 commented Mar 28, 2026

Uh oh!

Vinamra-tech commented Mar 28, 2026

Uh oh!

Rashmi1613 commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026 •

edited

Loading

Uh oh!

Vinamra-tech commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026

Uh oh!

Rashmi1613 commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026

Uh oh!

Vinamra-tech commented Mar 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Rashmi1613 commented Mar 28, 2026

Uh oh!

Vinamra-tech commented Mar 28, 2026

Uh oh!

Rashmi1613 commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Great work on the detection and logging approach.. Rashmi’s point about preserving user intent makes a lot of sense silently removing duplicates could mess with configurations where the same address needs to receive both a CC and a BCC copy.

A few technical things to keep in mind -

2) CI Failure on Windows-21: The build on windows-21 is failing due to an error in the 'bat' step. This needs a bit of investigation to figure out what’s going wrong and it should be fixed before the PR can be merged.

4) Typo: In a few places (like the PR title description and commit message) “accross” should be spelled as “across.”

Uh oh!

Vinamra-tech commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026

That's a fair pushback sorting inline before logging is indeed simpler and avoids introducing unfamiliar collection types. The middle ground approach makes sense. Happy to see it implemented that way.

Uh oh!

Rashmi1613 commented Mar 29, 2026

Uh oh!

ArpanC6 commented Mar 29, 2026

The new unit test looks good..However the latest commit seems to have accidentally deleted a large portion of ExtendedEmailPublisherTest.java linux-25 is now showing only 301 tests instead of the expected 366. Could you check if the test file was accidentally truncated during the commit??

Uh oh!

Vinamra-tech commented Mar 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ArpanC6 commented Mar 29, 2026 •

edited

Loading