`varchar()` measures characters instead of bytes, causing truncation with multibyte UTF-8 strings (SQL Server) #960

I believe there’s a bug in the `varchar()` helper when used with SQL Server.

https://github.com/r-dbi/odbc/blob/c8a2e2622ddd9c3b87a66022e807e9fcbd31edeb/R/aaa-odbc-data-type.R#L119-L128

The SQL Server documentation for [`VARCHAR(n)`](https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver17#varchar---n--max--) defines `n` as the **number of bytes**, not characters.
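Currently, `varchar()` determines `n` by calling `nchar()` with its default `type = "chars"`, which underestimates the required width whenever a string contains multibyte UTF-8 characters. Once the longest value exceeds the 255 minimum, the declared column width can be smaller than the string's byte length, and the data is truncated on insert.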

#### Reprex
``` r
string <- paste0("\xe2\x80\x99", paste(rep("a", 255), collapse = ""))
nchar(string)
#> [1] 256
nchar(string, type = "bytes")
#> [1] 258
odbc:::varchar(string) # not 258
#> [1] "varchar(256)"
```

<sup>Created on 2025-10-29 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>

#### Expected

`varchar()` should use the **byte length** of the input (`nchar(x, type = "bytes")`) rather than character count.
That aligns with SQL Server’s definition and avoids truncation with UTF-8 input. 
Using byte length instead of character count should not adversely affect other database backends.
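As a small illustration of why a byte-based width should never come out narrower than the current character-based one, the sketch below compares the two counts on a few sample strings. The sample vector `x` is made up for this example; it only shows that ASCII-only input is unaffected and that multibyte input gets a larger value.

```r
# Hypothetical sample data: one ASCII-only string, two with multibyte characters
x <- c("plain ascii", "na\u00efve", "\u2019quoted\u2019")

chars <- nchar(x, type = "chars")
bytes <- nchar(x, type = "bytes")

# For ASCII-only strings the two counts are identical
chars[1] == bytes[1]

# For every string the byte count is at least the character count,
# so a byte-based width is never narrower than the character-based one
all(bytes >= chars)
```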

#### Proposed change

I think this would be the cleanest fix, given that `nchar()` can already return the byte count.

```r
# before
max_length <- max(c(255, nchar(as.character(x))), na.rm = TRUE)

# after
max_length <- max(c(255, nchar(as.character(x), type = "bytes")), na.rm = TRUE)
```
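To check the effect of the one-line change without patching the package, here is a minimal standalone sketch. `varchar_bytes()` is a hypothetical name used only for this example; it copies the body of the current helper and applies the proposed `type = "bytes"` change.

```r
# Hypothetical stand-in for odbc's internal varchar(), with the proposed change applied
varchar_bytes <- function(x, type = "varchar") {
  # at least 255, measured in bytes; use max if more than 8000
  max_length <- max(c(255, nchar(as.character(x), type = "bytes")), na.rm = TRUE)

  if (max_length > 8000) {
    max_length <- "max"
  }

  paste0(type, "(", max_length, ")")
}

# U+2019 occupies 3 bytes in UTF-8, so this string is 256 characters but 258 bytes
string <- paste0("\u2019", strrep("a", 255))
varchar_bytes(string)
#> [1] "varchar(258)"

# Short and very long inputs keep the existing behaviour
varchar_bytes("abc")
#> [1] "varchar(255)"
varchar_bytes(strrep("a", 8001))
#> [1] "varchar(max)"
```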

I plan to open a PR with this proposed change.

For reference, the current implementation at the permalink above:

	varchar <- function(x, type = "varchar") {
	  # at least 255 characters, use max if more than 8000:
	  max_length <- max(c(255, nchar(as.character(x))), na.rm = TRUE)

	  if (max_length > 8000) {
	    max_length <- "max"
	  }

	  paste0(type, "(", max_length, ")")
	}
