Error in read_xml.raw: Input is not proper UTF-8, indicate encoding ! #70
Description
Hi,
Running spelling::spell_check_test() fails on the crosstable package with the following error:
spelling::spell_check_package()
#> Error in read_xml.raw(charToRaw(enc2utf8(x)), "UTF-8", ..., as_html = as_html, :
#>   Input is not proper UTF-8, indicate encoding !
#> Bytes: 0x93 0x63 0x79 0x94 [9]

I have no clue where this error comes from, and the error message is unfortunately not very informative.
Would it be possible to terminate early from spelling instead of xml2 so that the path is in the error message?
Of course, if we can also have the line and the specific bad character, it would be even better!
Note that in this case, UTF-8 is the default encoding in the package's DESCRIPTION and in the RStudio settings. R CMD check completes without error, so I guess the encoding problem is not that severe, don't you think?
REPREX
- Download this file and open it in RStudio. https://raw.githubusercontent.com/DanChaltiel/crosstable/dd561f3ef405f6621357912c53ab53a6299b99cd/README.md
- There are non-UTF8 characters on rows 147 and 150
- Try to spell_check() (I used devtools::spell_check())
EDIT
After more debugging, it seems to pertain to this line:
Line 24 in 008417f:
doc <- xml2::xml_ns_strip(xml2::read_xml(md))
In my case, it pointed to my README.md file, which indeed contained special characters. I have no idea how they ended up there though, and they are far too numerous for me to correct manually (a knitting problem from README.Rmd, I guess).
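For what it's worth, the bytes 0x93 and 0x94 in the error message are the curly double quotes in Windows-1252, so a bulk fix may be possible. Below is a minimal sketch, assuming (unverified) that every invalid line was written in that encoding. `fix_non_utf8_lines()` is a hypothetical helper name, not a function from spelling or xml2.

```r
# Hypothetical helper: re-encode only the lines that are not valid UTF-8.
# Assumption: those lines were written in Windows-1252 ("CP1252").
fix_non_utf8_lines <- function(text, from = "CP1252") {
  invalid <- !validUTF8(text)
  # iconv() interprets the raw bytes as `from` and returns UTF-8 strings
  text[invalid] <- iconv(text[invalid], from = from, to = "UTF-8")
  text
}

# The four bytes reported in the error message, rebuilt as a string
bad <- rawToChar(as.raw(c(0x93, 0x63, 0x79, 0x94)))
fixed <- fix_non_utf8_lines(c("plain ascii", bad))
validUTF8(fixed)
```

On a real file, the lines could be read with readLines(path, warn = FALSE, encoding = "UTF-8"), passed through the helper, and written back with writeLines(fixed, path, useBytes = TRUE). If the bytes come from another encoding, the `from` argument would need to change.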
EDIT2
Since this confusing problem is not that rare (#52, #58, #62), a fix would probably be useful.
Here are some proposals:
- simply use a tryCatch() on xml2::xml_ns_strip() so that we can add path to the error message
- add a warning in the specific case of non-UTF8 characters:
text <- readLines(path, warn = FALSE, encoding = "UTF-8")
# TRUE for every line that is not valid UTF-8
invalid <- !validUTF8(text)
if (any(invalid)) {
  warning("The file ", path, " has non-UTF-8 characters on rows: ", paste(which(invalid), collapse = ", "))
}
- use this trick from xfun::read_utf8() to ignore the problem (spell_check_package() will have no error):
opts = options(encoding = "native.enc")
on.exit(options(opts), add = TRUE)
text <- readLines(path, warn = FALSE, encoding = "UTF-8")

We can do all three at the same time. I can make a PR if needed.
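For the first proposal, something along these lines might work. It is only a sketch: `with_path_in_error()` is a hypothetical helper name, not an existing function in spelling, and the wording of the message is an assumption.

```r
# Hypothetical helper: evaluate `expr` and, if it fails, re-raise the error
# with the offending file path prepended to the original message.
with_path_in_error <- function(path, expr) {
  tryCatch(
    expr,  # the lazy argument is forced here, inside tryCatch()
    error = function(e) {
      stop("Failed to parse '", path, "': ", conditionMessage(e), call. = FALSE)
    }
  )
}

# Possible use around the line quoted above (requires xml2; `md` and `path`
# are the variables assumed to be in scope at that point):
# doc <- with_path_in_error(path, xml2::xml_ns_strip(xml2::read_xml(md)))
```

The original xml2 message is kept, so the byte dump is still shown, but the user also sees which file triggered it.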