Skip to content

Handle Long Form Names in XLSX#230

Merged
rsh52 merged 6 commits intomainfrom
xlsx-long-form-names
Oct 20, 2025
Merged

Handle Long Form Names in XLSX#230
rsh52 merged 6 commits intomainfrom
xlsx-long-form-names

Conversation

@rsh52
Copy link
Copy Markdown
Collaborator

@rsh52 rsh52 commented Oct 17, 2025

Description

This PR seeks to elegantly handle edge cases where REDCap form names can exceed the 31 character count limit imposed by XLSX sheet names.

Before this we enforced str_trunc() which by default would suffix long names with .... This doesn't work in the case of long names where the first 31 characters are identical.

This PR also considers instances where proposed names given by the new function may already exist, and robustly handles large quantities of long form names. By default the format is to provide a suffix in the form of .(#) within the character limit.

Proposed Changes

List changes below in bullet format:

  • Add new internal sheet_trunc_unique
  • Add tests
  • Add documentation

Issue Addressed

Closes #229

PR Checklist

Before submitting this PR, please check and verify below that the submission meets the below criteria:

  • New/revised functions have associated tests
  • [NA] New/revised functions that update downstream outputs have associated static testing files (.RDS) updated under inst/testdata/create_test_data.R
  • New/revised functions use appropriate naming conventions
  • New/revised functions don't repeat code
  • Code changes are less than 250 lines total
  • Issues linked to the PR using GitHub's list of keywords
  • The appropriate reviewer is assigned to the PR
  • The appropriate developers are assigned to the PR
  • Pre-release package version incremented using usethis::use_version()

Code Review

This section to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

  • I checked that new files follow naming conventions and are in the right place
  • I checked that documentation is complete, clear, and without typos
  • I added/edited comments to explain "why" not "how"
  • I checked that all new variable and function names follow naming conventions
  • I checked that new tests have been written for key business logic and/or bugs that this PR fixes
  • I checked that new tests address important edge cases

rsh52 added 3 commits October 17, 2025 16:05
Introduces the excel_trunc_unique() utility to generate Excel-compliant, unique sheet names with truncation and numeric suffixes. Updates write_redcap_xlsx to use this function, adds documentation, and includes tests to verify correct behavior.
@rsh52 rsh52 self-assigned this Oct 17, 2025
@rsh52 rsh52 added bug Something isn't working enhancement New feature or request labels Oct 17, 2025
Refactored the function excel_trunc_unique to sheet_trunc_unique for clarity and consistency. Updated all references and documentation accordingly, including tests and Rd files.
@rsh52 rsh52 requested a review from ezraporter October 17, 2025 20:22
@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 17, 2025

@ezraporter Leaving this in draft for now but interested in your thoughts. Acknowledging assisted help, there are some environment techniques here that I'm not extremely familiar with but I think the code is relatively clean and works for the issue and associated edge cases I can think of. Not sure if we should be considering opening up additional params and customization to users (like the suffix outputs).

Copy link
Copy Markdown
Collaborator

@ezraporter ezraporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand the idea here for the most part but I would vote against incorporating code this complex to handle such a niche edge case.

Suppose we're committed to solutions other than error and possibly give the user an easier way to specify compliant form names, what's the advantage of this over the simpler suggestion in #229 (comment)?

@ezraporter
Copy link
Copy Markdown
Collaborator

Looking a little closer at the issue I would even suggest truncating at a length less than 30 to handle reasonably large numbers of forms:

sheet_vals <- make.unique(str_trunc(sheet_vals, width = 28), sep = "")

I totally appreciate you diving so deep into this and don't mean to be flippant at all in suggesting a simpler but less robust solution 😅

@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 17, 2025

The suggested make.unique() works but doesn't handle the couple of cases I put in the test suite:

  1. If there are >= 10 similarly named sheets (need a more robust way to assign width)
  2. If a qualifying form name exists mimicking the names we're changing (need a more robust way to preserve name and positional location)
  sheet_vals <- c(
    "A",
    "Really Really Long Form Name 111",
    "Really Really Long Form Name 1110",
    "Really Really Long Form Name 112",
    "Really Really Long Form Nam...2", # Qualifying name
    "Really Really Long Form Name 113",
    "Really Really Long Form Name 114",
    "Really Really Long Form Name 115",
    "Really Really Long Form Name 116",
    "Really Really Long Form Name 117",
    "Really Really Long Form Name 118",
    "Really Really Long Form Name 119" # Greater than 10
  )

> make.unique(str_trunc(sheet_vals, width = 30), sep = "")
 [1] "A"                                "Really Really Long Form Nam..."   "Really Really Long Form Nam...1"  "Really Really Long Form Nam...2" 
 [5] "Really Really Long Form Nam...3"  "Really Really Long Form Nam...4"  "Really Really Long Form Nam...5"  "Really Really Long Form Nam...6" 
 [9] "Really Really Long Form Nam...7"  "Really Really Long Form Nam...8"  "Really Really Long Form Nam...9"  "Really Really Long Form Nam...10"

In this case the last element, "Really Really Long Form Nam...10", is 32 characters and fails. And "Really Really Long Form Nam...2" which was position 5 is now position 4.

Acknowledging these are extremely niche edge cases, but figured if a solution was to be implemented it should handle them.

@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 17, 2025

I totally appreciate you diving so deep into this and don't mean to be flippant at all in suggesting a simpler but less robust solution 😅

Not flippant at all, appreciated. I'm not married to the idea of committing this, but figured before we say no I can at least show what all the bells and whistles require.

@ezraporter
Copy link
Copy Markdown
Collaborator

  1. If there are >= 10 similarly named sheets (need a more robust way to assign width)
  2. If a qualifying form name exists mimicking the names we're changing (need a more robust way to preserve name and positional location)

Got it! I think 1 is reasonably handled by lowering width a little bit and 2 feels niche enough I'd be comfortable not handling it.

@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 20, 2025

Removed function, updated with the simpler make.unique and 28 character limit. Added a new test.

@rsh52 rsh52 marked this pull request as ready for review October 20, 2025 14:35
@pbchase
Copy link
Copy Markdown

pbchase commented Oct 20, 2025

Might I offer a suggestion? When an end user tracks down a bug in your code, writes an issue, offers to submit a PR, answers your questions, provides you with a working patch, reads your feedback, and offers more feedback, maybe you should take the offer of the PR, review that, and let the end user get the commit credit. It's a better way to build a community. Instead, I just feel alienated and ignored. Having talked to Stephan at REDCapCon I had expected more.

@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 20, 2025

Hi @pbchase. I am sorry you feel this way, it was not our intention. As the story of this PR indicates, we had a few things to consider in how robust a solution we wanted to implement that we wanted to discuss internally. The purpose was not to alienate you. Since any PR needs to be considered within the scope of our production infrastructure (CI, linting, test suite coverage, etc.), and this was in the end a relatively small change, we figured it would be easier to push it through ourselves. Whether or not a PR results in credit, we acknowledge contributors through our release documentation and news section.

@ezraporter ezraporter self-requested a review October 20, 2025 16:15
@rsh52 rsh52 merged commit b1189f2 into main Oct 20, 2025
4 checks passed
@rsh52 rsh52 deleted the xlsx-long-form-names branch October 20, 2025 17:28
@pbchase
Copy link
Copy Markdown

pbchase commented Oct 20, 2025

@rsh52, @ezraporter, thanks for pushing this PR through so quickly. I'm sorry for my last comment. It was not constructive. I like REDCapTidier and hope to be able to make contributions in the future.

@rsh52
Copy link
Copy Markdown
Collaborator Author

rsh52 commented Oct 20, 2025

@rsh52, @ezraporter, thanks for pushing this PR through so quickly. I'm sorry for my last comment. It was not constructive. I like REDCapTidier and hope to be able to make contributions in the future.

Thank you and apologies for any mishandling on our part. We have documented the contribution in the NEWS.md and will be sure to include it in the next set of release notes. All the best.

@pbchase
Copy link
Copy Markdown

pbchase commented Oct 20, 2025

@rsh52, oh, please, you didn't mishandle anything. Thanks again :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Very long form names with differing suffixes crash write_redcap_xlsx()

3 participants