Maarten Hilferink edited this page Apr 8, 2026 · 1 revision

String functions to_utf

The to_utf function converts strings from the system's local encoding to UTF-8.

syntax

to_utf(strings: E->String) -> E->String

definition

Converts strings from the current system locale encoding (e.g., Windows-1252, Latin-1) to UTF-8 encoding. This is useful when:

  • Reading data from legacy systems with non-UTF-8 encoding
  • Processing files created with Windows code pages
  • Ensuring consistent UTF-8 output

UTF-8 is the standard encoding for:

  • Web content
  • Modern databases
  • Cross-platform data exchange

arguments

argument   description                         type
strings    Strings in local/system encoding    E->String

performance

Time complexity: O(n × L) where n is the number of strings and L is the average string length.

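UTF-8 output may occupy more bytes than the input: ASCII characters stay one byte each, but characters outside the ASCII range take two or more bytes (for example, é is one byte in Windows-1252 and two bytes in UTF-8).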
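
The size growth can be illustrated outside GeoDMS. The sketch below uses Python's standard codecs as a stand-in for the conversion, and assumes Windows-1252 (`cp1252`) as the local encoding; the helper name `byte_lengths` and the sample strings are hypothetical.

```python
# Illustration only: Python's codecs stand in for the conversion; this is not GeoDMS code.
# Assumption: the local encoding is Windows-1252 (Python codec name "cp1252").

def byte_lengths(text: str) -> tuple[int, int]:
    """Return (Windows-1252 byte count, UTF-8 byte count) for the same text."""
    return len(text.encode("cp1252")), len(text.encode("utf-8"))

if __name__ == "__main__":
    for sample in ["abc", "café", "€5"]:
        local_len, utf8_len = byte_lengths(sample)
        print(f"{sample!r}: cp1252={local_len} bytes, utf-8={utf8_len} bytes")
```

Pure ASCII text keeps its size, while each accented letter or symbol adds one or two bytes, so storage estimates for converted attributes should allow for some growth.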

conditions

  • Input is assumed to be in the system's default encoding
  • Invalid byte sequences may produce replacement characters or errors
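  • Strings that are already UTF-8 encoded should not be converted again: their multi-byte sequences would be reinterpreted as local-encoding characters, which typically yields garbled text (for example, é showing up as Ã©)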
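
The conditions above can be reproduced with a small sketch in Python, again assuming Windows-1252 as the local encoding. The functions `local_to_utf8` and `decode_lenient` are hypothetical helpers written for this illustration; they show how such a conversion generally behaves and are not a description of how GeoDMS implements `to_utf`.

```python
# Illustration only: not GeoDMS code. Assumes the local encoding is Windows-1252.

def local_to_utf8(raw: bytes, local_encoding: str = "cp1252") -> bytes:
    """Decode bytes as the assumed local encoding, then re-encode them as UTF-8."""
    return raw.decode(local_encoding).encode("utf-8")

def decode_lenient(raw: bytes, local_encoding: str = "cp1252") -> str:
    """Decode with replacement characters instead of raising on invalid bytes."""
    return raw.decode(local_encoding, errors="replace")

if __name__ == "__main__":
    legacy = "café".encode("cp1252")   # bytes as a legacy file would store them
    once = local_to_utf8(legacy)        # correct: converted exactly once
    twice = local_to_utf8(once)         # mistake: converting UTF-8 bytes again

    print(once.decode("utf-8"))         # text after one conversion
    print(twice.decode("utf-8"))        # garbled text after a second conversion

    # Byte 0x81 is undefined in Python's cp1252 codec: strict decoding raises,
    # lenient decoding substitutes U+FFFD (the replacement character).
    print(repr(decode_lenient(b"\x81")))
```

Converting once gives the expected text, converting twice turns é into Ã©, and an undefined byte either raises an error or becomes a replacement character depending on the error policy. Which of the two outcomes `to_utf` produces for invalid input is not stated on this page.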

example

unit<uint32> LegacyData: nrofrows = 100;
attribute<String> names (LegacyData);  // read from Windows-1252 encoded file

// Convert to UTF-8 for consistent processing
attribute<String> names_utf8 (LegacyData) := to_utf(names);

// Now safe to use with UTF-8 aware operations

see also

since version

7.0
