Detect duplicate recipients accross to,cc,bcc#1543
Detect duplicate recipients accross to,cc,bcc#1543Rashmi1613 wants to merge 7 commits intojenkinsci:mainfrom
Conversation
|
Hi @Rashmi1613 I was wondering if there's a bettter approach to it like first creating a set of seen emails and then applying removeIf() to each of the to , cc , bcc list like this : Set seenEmails = new HashSet<>(); to.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase())); cc.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase())); bcc.removeIf(addr -> !seenEmails.add(addr.getAddress().toLowerCase())); I would be happy to know if you justify my approach misses something. |
|
Thanks for the suggestion, that approach is definitely clean and enforces a clear TO > CC > BCC priority. One thing I was considering is whether duplicates across TO/CC/BCC might sometimes be an intentional configuration and if removed it might break user intent and change behaviour. That’s why I initially did detection + logging to preserve user intent. Happy to switch to removal if we want to enforce removal+logging. Let me know what direction you’d prefer |
Great work on the detection and logging approach.. Rashmi’s point about preserving user intent makes a lot of sense silently removing duplicates could mess with configurations where the same address needs to receive both a CC and a BCC copy.A few technical things to keep in mind -1) Missing Unit Tests: It looks like the new duplicate detection logic in ExtendedEmailPublisher.java doesn’t have any test coverage. It’d be good to add a test to ensure that when an address appears in both the TO and CC fields the log shows the expected warning, and the location is correctly set.2) CI Failure on Windows-21: The build on windows-21 is failing due to an error in the 'bat' step. This needs a bit of investigation to figure out what’s going wrong and it should be fixed before the PR can be merged.3) Non-Deterministic Log Output: Since emailLocations uses a HashMap and the location set is a HashSet the output order of entry.getValue() isn't guaranteed. This could result in inconsistent logging like [CC, TO] or [TO, CC]. Switching to a LinkedHashSet or sorting the set before logging will give you a predictable order every time.4) Typo: In a few places (like the PR title description and commit message) “accross” should be spelled as “across.” |
|
Hey @Rashmi1613 and @ArpanC6, thanks for the responses! @Rashmi1613 that's a fair point about user intent, but I'd push back a little. In practice, having the same address in both CC and BCC is almost always a misconfiguration rather than intentional — most SMTP servers deduplicate at delivery anyway, so the "intentional duplicate" scenario is largely theoretical. Can you point to a concrete real-world case where the same address genuinely needs to be in both fields simultaneously? That said, I do see one valid concern with my So here's what I'd propose as a middle ground:
@ArpanC6 on your points — unit tests and the typo fix are fair, agreed on those. On the Windows-21 failure, worth checking the actual On the Happy to discuss further! |
That's a fair pushback sorting inline before logging is indeed simpler and avoids introducing unfamiliar collection types. The middle ground approach makes sense. Happy to see it implemented that way. |
|
I agree with the middle-ground approach. Removing duplicates for TO/CC and TO/BCC makes sense since the recipient is already visible, so there’s no change in behavior. For CC/BCC overlaps, I’ll keep it as detection + logging only, since it’s ambiguous and we shouldn’t risk changing visibility. A real-world case I had in mind is when recipients come from multiple sources , for example, Jenkins might automatically add commit authors to CC,while a user manually adds someone to BCC to keep them hidden. In that situation, the same email can appear in both CC and BCC with different intent, and removing it from BCC could unintentionally expose that recipient and potentially rase privacy concerns. For the ordering part, I’ll keep the change minimal and use inline sorting with a fixed order (TO → CC → BCC) just for logging, instead of changing the data structure.Although LinkedHashSet has better time complexity of O(1)(insertion order) as comapred to sorting (O(nlog n)) , since the dataset is small , it will be constant. So sorting would be better than LinkedHashSet which introduces unnecessary complexity. |
The new unit test looks good..However the latest commit seems to have accidentally deleted a large portion of ExtendedEmailPublisherTest.java linux-25 is now showing only 301 tests instead of the expected 366. Could you check if the test file was accidentally truncated during the commit?? |
Hey @Rashmi1613, that's actually a great real-world example — the Jenkins auto-CC + manual BCC case makes the ambiguity very concrete and I think that fully justifies keeping detection + logging for the CC/BCC overlap. Good call!!! The final approach looks solid:
LGTM!!, Nice work! |
Duplicate recipients within each list i.e. to,cc,bcc are already handles using sets, but the same email email address can still appear accross different recipient types.
Changes done here,
This improves observability and helps in identification of any type of misconfiguration.