Available functions & operators

Summary

Operators

Unary operators

!x - boolean negation
-x - numerical negation

Numerical comparison

Warning: these operators always treat their operands as numbers or dates and will try to cast them accordingly. For string/sequence comparison, use the operators in the next section.

x == y - numerical equality
x != y - numerical inequality
x < y  - numerical less than
x <= y - numerical less than or equal
x > y  - numerical greater than
x >= y - numerical greater than or equal

String/sequence comparison

Warning: these operators always treat their operands as strings or sequences and will try to cast them accordingly. For numerical comparison, use the operators in the previous section.

x eq y - string equality
x ne y - string inequality
x lt y - string less than
x le y - string less than or equal
x gt y - string greater than
x ge y - string greater than or equal
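
The difference between the two operator families shows up on values such as "10" and "9". The Python sketch below only mimics the distinction (casting to numbers versus comparing as strings). It is not how the expression engine is implemented, and the helper names numeric_gt and string_gt are made up for illustration.

```python
def numeric_gt(x, y):
    """Mimic `x > y`: both operands are cast to numbers before comparing."""
    return float(x) > float(y)


def string_gt(x, y):
    """Mimic `x gt y`: both operands are compared as strings."""
    return str(x) > str(y)


# As numbers, 10 is greater than 9.
print(numeric_gt("10", "9"))
# As strings, "10" sorts before "9" because "1" < "9".
print(string_gt("10", "9"))
```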

Arithmetic operators

x + y  - numerical addition
x - y  - numerical subtraction
x * y  - numerical multiplication
x / y  - numerical division
x % y  - numerical remainder
x // y - numerical integer division
x ** y - numerical exponentiation

String/sequence operators

x ++ y - string concatenation

Logical operators

x && y     - logical and
x and y
x || y     - logical or
x or y
x in y     - membership test
x not in y - negated membership test

Indexing & slicing operators

Negative indices are accepted and behave as in Python: they count from the end of the sequence.

x[y]         - get y from x (string or list index, map key)
x[start:end] - slice x from start index to end index
x[:end]      - slice x from start to end index
x[start:]    - slice x from start index to end
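
Since negative indices follow Python's convention, plain Python indexing gives a feel for the forms above. This sketch additionally assumes that slices exclude the end index, as they do in Python, which the text above does not state explicitly.

```python
name = "moonblade"
items = ["a", "b", "c", "d"]

last_char = name[-1]   # x[y] with a negative index: counts from the end
first_two = items[:2]  # x[:end]: from the start up to (not including) index 2
tail = items[1:]       # x[start:]: from index 1 to the end
middle = name[1:-1]    # x[start:end]: negative indices also work in slices
```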

Pipeline operator

The pipeline operator "|" feeds its left-hand side into the right-hand side expression, where "_" stands for the piped value.

trim(name) | len(_)         - Same as len(trim(name))
trim(name) | add(1, len(_)) - Can be nested
add(trim(name) | len, 2)    - Can be used anywhere
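
To make the equivalence between piped and nested calls concrete, here is a small Python analogy. The pipe helper is hypothetical and only illustrates the idea of threading a value through successive steps. It says nothing about how the operator is actually evaluated.

```python
def pipe(value, *steps):
    """Feed `value` through each step in turn.

    Each step receives the previous result, which plays the role of `_`.
    """
    for step in steps:
        value = step(value)
    return value


name = "  Ada  "

nested = len(name.strip())                              # len(trim(name))
piped = pipe(name, str.strip, len)                      # trim(name) | len(_)
with_add = pipe(name, str.strip, lambda _: 1 + len(_))  # trim(name) | add(1, len(_))
```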

Boolean operations & branching

  • and(a, b, *n) -> T: Perform boolean AND operation on two or more values.
  • if(cond, then, else?) -> T: Evaluate condition and switch to correct branch.
  • unless(cond, then, else?) -> T: Shorthand for if(not(cond), then, else?)
  • not(a) -> bool: Perform boolean NOT operation.
  • or(a, b, *n) -> T: Perform boolean OR operation on two or more values.
  • try(T) -> T: Attempt to evaluate given expression and return null if it raised an error.

Comparison

  • eq(s1, s2) -> bool: Test string or list equality.
  • ne(s1, s2) -> bool: Test string or list inequality.
  • gt(s1, s2) -> bool: Test string or list s1 > s2.
  • ge(s1, s2) -> bool: Test string or list s1 >= s2.
  • lt(s1, s2) -> bool: Test string or list s1 < s2.
  • le(s1, s2) -> bool: Test string or list s1 <= s2.

Arithmetics

  • abs(x) -> number: Return absolute value of number.
  • add(x, y, *n) -> number: Add two or more numbers.
  • argmax(numbers, labels?) -> any: Return the index or label of the largest number in the list.
  • argmin(numbers, labels?) -> any: Return the index or label of the smallest number in the list.
  • ceil(x, unit?) -> number: Return the smallest integer greater than or equal to x. Optionally ceil to nearest given unit.
  • div(x, y, *n) -> number: Divide two or more numbers.
  • idiv(x, y) -> number: Integer division of two numbers.
  • int(any) -> int: Cast value as int and raise an error if impossible.
  • float(any) -> float: Cast value as float and raise an error if impossible.
  • floor(x, unit?) -> number: Return the largest integer lower than or equal to x. Optionally floor to nearest given unit.
  • log(x, base?) -> number: Return the natural or custom base logarithm of x.
  • log2(x) -> number: Return the base 2 logarithm of x.
  • log10(x) -> number: Return the base 10 logarithm of x.
  • max(x, y, *n) -> number: Return the maximum number.
  • max(list_of_numbers) -> number: Return the maximum number.
  • min(x, y, *n) -> number: Return the minimum number.
  • min(list_of_numbers) -> number: Return the minimum number.
  • mod(x, y) -> number: Return the remainder of x divided by y.
  • mul(x, y, *n) -> number: Multiply two or more numbers.
  • neg(x) -> number: Return -x.
  • pow(x, y) -> number: Raise x to the power of y.
  • round(x, unit?) -> number: Return x rounded to the nearest integer. Optionally round to nearest given unit.
  • sqrt(x) -> number: Return the square root of x.
  • sub(x, y, *n) -> number: Subtract two or more numbers.
  • trunc(x, unit?) -> number: Truncate the number by removing its decimal part. Optionally trunc to nearest given unit.
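
The optional unit argument of ceil, floor, round and trunc can be pictured as snapping to a multiple of that unit. The sketch below assumes that reading. The helper names ceil_to and floor_to are illustrative and the exact behavior of the real functions may differ.

```python
import math


def ceil_to(x, unit=1):
    """Smallest multiple of `unit` that is greater than or equal to x."""
    return math.ceil(x / unit) * unit


def floor_to(x, unit=1):
    """Largest multiple of `unit` that is lower than or equal to x."""
    return math.floor(x / unit) * unit


print(ceil_to(7.3), floor_to(7.3))      # plain ceil and floor
print(ceil_to(12, 5), floor_to(12, 5))  # snapped to multiples of 5
```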

Formatting

  • bytesize(string) -> string: Return a number of bytes in human-readable format (KB, MB, GB, etc.).
  • escape_regex(string) -> string: Escape a string so it can be used safely in a regular expression.
  • fmt(string, *arguments) -> string: Format a string by replacing "{}" occurrences by subsequent arguments.
    Example: fmt("Hello {} {}", name, surname) will replace the first "{}" by the value of the name column, then the second one by the value of the surname column.
  • fmt(string, map) -> string: Format a string by replacing "{key}" occurrences by the matching values of the given substitution map.
    Example: fmt("Hello {name}", {name: "John"}).
  • lower(string) -> string: Lowercase string.
  • pad(string, width, char?) -> string: Pad given string with spaces or given character so that it is at least the given width.
  • lpad(string, width, char?) -> string: Left pad given string with spaces or given character so that it is at least the given width.
  • rpad(string, width, char?) -> string: Right pad given string with spaces or given character so that it is at least the given width.
  • printf(format, *arguments) -> string: Apply printf formatting with given format and arguments. Arguments can also be provided as a list.
    For instance: split('John Landy') | printf('first: %s, last: %s', _)
  • numfmt(number, thousands_sep=",", comma=false, significance=5) -> string: Format a number with thousands separator and proper significance.
  • trim(string, chars?) -> string: Trim string of leading & trailing whitespace or provided characters.
  • to_fixed(number, precision) -> string: Format given number using fixed point notation with specified number of decimal places.
  • ltrim(string, chars?) -> string: Trim string of leading whitespace or provided characters.
  • rtrim(string, chars?) -> string: Trim string of trailing whitespace or provided characters.
  • upper(string) -> string: Uppercase string.
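
The "{}" and "{name}" placeholders of fmt, and the "%s" placeholders of printf, have close counterparts in Python's own string formatting. The snippet below uses those as an analogy only, with made-up values. It does not call the functions listed above.

```python
name, surname = "John", "Landy"

# Positional substitution, like fmt("Hello {} {}", name, surname).
positional = "Hello {} {}".format(name, surname)

# Substitution from a map, like fmt("Hello {name}", {name: "John"}).
by_key = "Hello {name}".format(**{"name": "John"})

# printf-style formatting with arguments provided as a list.
parts = "John Landy".split(" ")
printed = "first: %s, last: %s" % tuple(parts)
```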

Strings

  • count(string, substring) -> int: Count the number of times substring appears in string. Only non-overlapping matches are counted.
  • count(string, regex) -> int: Count the number of times a regex pattern matched the string. Only non-overlapping matches are counted. Remember a regex pattern must be written with slashes e.g. /france|french/i.
  • endswith(string, substring) -> bool: Test if string ends with substring.
  • match(string, regex, group) -> string: Return a regex pattern match on the string. Remember a regex pattern must be written with slashes e.g. /france|french/i.
  • replace(string, substring, replacement) -> string: Replace all non-overlapping occurrences of substring in given string with provided replacement.
  • replace(string, regex, replacement) -> string: Replace all non-overlapping matches of a regex pattern in given string with provided replacement. Remember a regex pattern must be written with slashes e.g. /france|french/i.
    See regex replacement string syntax documentation here:
    https://docs.rs/regex/latest/regex/struct.Regex.html#replacement-string-syntax
  • split(string, substring, max?) -> list: Split a string by a given separator substring.
  • split(string, regex, max?) -> list: Split a string using a regex pattern as separator. Remember a regex pattern must be written with slashes e.g. /france|french/i.
  • startswith(string, substring) -> bool: Test if string starts with substring.
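
"Non-overlapping" means that once a match is consumed, the search resumes after it. Python's str.count and re module follow the same rule, so they can serve as a rough illustration. The sample text is invented, and Python's regex syntax is not identical to the one used by the functions above.

```python
import re

# "aaaa" contains "aa" at offsets 0, 1 and 2, but only offsets 0 and 2
# are counted because matches may not overlap.
substring_hits = "aaaa".count("aa")

text = "France exports wine; french fries are Belgian."
pattern = re.compile(r"france|french", re.IGNORECASE)  # akin to /france|french/i

regex_hits = len(pattern.findall(text))  # count with a regex
replaced = pattern.sub("FR", text)       # replace with a regex
pieces = re.split(r"\s*;\s*", text)      # split with a regex
```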

Strings, lists and maps

  • concat(string, *strings) -> string: Concatenate given strings into a single one.
  • contains(string, substring) -> bool: Return whether substring can be found in string.
  • contains(string, regex) -> bool: Return whether given regular expression matched the string.
  • contains(list, item) -> bool: Return whether given item was found in the list.
  • contains(map, key) -> bool: Return whether given key was found in the map.
  • first(seq) -> T: Get first char of string or first item of list.
  • last(seq) -> T: Get last char of string or last item of list.
  • len(seq) -> int: Get number of chars in string or number of items in list.
  • get(string, index, default?) -> any: Return the nth unicode char of the string. Indices are zero-based and can be negative to access chars in reverse. Can also take a default value when desired char is not found.
  • get(list, index, default?) -> any: Return the nth item of the list. Indices are zero-based and can be negative to access items in reverse. Can also take a default value when desired item is not found.
  • get(map, key, default?) -> any: Return the value associated with given key. Can also take a default value when desired key is not found.
  • slice(seq, start, end?) -> seq: Return slice of string or list.

Lists

  • all(list, lambda) -> bool: Returns whether the given lambda returned true for all elements of the list.
    For instance: all(names, name => name.startswith('A'))
  • any(list, lambda) -> bool: Returns whether the given lambda returned true for any element of the list.
    For instance: any(names, name => name.startswith('A'))
  • compact(list) -> list: Drop all falsey values from given list.
  • filter(list, lambda) -> list: Return a list containing only elements for which given lambda returned true.
    For instance: filter(names, name => name.startswith('A'))
  • find(list, lambda) -> any?: Return the first item of a list for which given lambda returned true.
    For instance: find(names, name => name.startswith('A'))
  • find_index(list, lambda) -> int?: Return the index of the first item of a list for which given lambda returned true.
    For instance: find_index(names, name => name.startswith('A'))
  • index_by(list, key) -> map: Take a list of maps and a key name and return an indexed map from selected keys to the original maps.
  • join(list, sep) -> string: Join list by separator.
  • map(list, lambda) -> list: Return a list with elements transformed by given lambda.
    For instance: map(numbers, n => n + 3)
  • mean(numbers) -> number?: Return the mean of the given numbers.
  • range(stop) -> list[number]: Return the specified range as a list of integers.
  • range(start, stop, step=1) -> list[number]: Return the specified range as a list of integers.
  • repeat(string_or_list, times) -> string_or_list: Repeat target string or list n times.
  • sum(numbers) -> number?: Return the sum of the given numbers, or nothing if the sum overflowed.
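
For readers more familiar with Python, the lambda-taking list functions map onto comprehensions and generator expressions. The following is an analogy over made-up data, not an implementation of the functions above.

```python
names = ["Alice", "Bob", "Anna"]
numbers = [1, 2, 3]

# all(names, name => name.startswith('A'))
all_a = all(name.startswith("A") for name in names)
# any(names, name => name.startswith('A'))
any_a = any(name.startswith("A") for name in names)
# filter(names, name => name.startswith('A'))
filtered = [name for name in names if name.startswith("A")]
# find(names, name => name.startswith('B'))
found = next((name for name in names if name.startswith("B")), None)
# find_index(names, name => name.startswith('B'))
found_index = next((i for i, name in enumerate(names) if name.startswith("B")), None)
# map(numbers, n => n + 3)
mapped = [n + 3 for n in numbers]

# index_by(people, "id"): from a list of maps to a map keyed by "id".
people = [{"id": "a1", "name": "Alice"}, {"id": "b2", "name": "Bob"}]
indexed = {person["id"]: person for person in people}
```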

Maps

  • keys(map) -> [string]: Return a list of the map's keys.
  • values(map) -> [T]: Return a list of the map's values.

Dates & time

  • datetime(string, format=?) -> zoned?_datetime: Attempt to parse a datetime with or without timezone info from given string. If no format is provided, string is parsed using ISO 8601 date format.
    https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications
  • date(string_or_datetime, format=?) -> date: If given a datetime, will return its date component. Else, attempt to parse a date from given string. If no format is provided, string is parsed using ISO 8601 date format.
    https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications
  • time(string_or_datetime, format=?) -> time: If given a datetime, will return its time component. Else, attempt to parse a time from given string. If no format is provided, string is parsed using ISO 8601 time format.
    https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications
  • span(string) -> span: Parse given string as a time span that can be added to or subtracted from temporal elements.
    Format: https://docs.rs/jiff/latest/jiff/struct.Span.html#parsing-and-printing
  • now() -> zoned_datetime: Return current datetime in local timezone.
  • from_timestamp(int_or_float) -> zoned_datetime: Interpret given int as seconds timestamp, or given float as seconds timestamp with fractional subseconds component.
  • from_timestamp_ms(int) -> zoned_datetime: Interpret given int as milliseconds timestamp.
  • to_timestamp(zoned_datetime) -> int_or_float: Convert given datetime to seconds timestamp or seconds with fractional subseconds timestamp if datetime has enough precision. Will error if given datetime has no timezone info.
  • to_timestamp_ms(zoned_datetime) -> int: Convert given datetime to milliseconds timestamp. Will error if given datetime has no timezone info.
  • earliest(t1, t2, *tn) -> temporal: Return the earliest point in time. Expects homogeneous types (all dates, all datetimes etc.).
  • earliest(list_of_temporals) -> temporal: Return the earliest point in time. Expects homogeneous types (all dates, all datetimes etc.).
  • latest(t1, t2, *tn) -> temporal: Return the latest point in time. Expects homogeneous types (all dates, all datetimes etc.).
  • latest(list_of_temporals) -> temporal: Return the latest point in time. Expects homogeneous types (all dates, all datetimes etc.).
  • fractional_days(t1, t2) -> float: Returns number of days between two points in time, as a signed float. Expects homogeneous types (2 dates, 2 datetimes etc.).
  • strftime(target, format) -> string: Format temporal value according to format.
    https://docs.rs/jiff/latest/jiff/fmt/strtime/index.html#conversion-specifications
  • to_timezone(zoned_datetime, timezone) -> zoned_datetime (aliases: to_tz): Convert given datetime to given timezone. Will error if given datetime has no timezone info.
  • to_local_timezone(zoned_datetime) -> zoned_datetime (aliases: to_local_tz): Convert given datetime to local timezone. Will error if given datetime has no timezone info.
  • with_timezone(datetime, timezone) -> zoned_datetime (aliases: with_tz): Arbitrarily indicate that given civil datetime should be understood as being in given timezone. Will error if given datetime already has timezone info.
  • with_local_timezone(datetime) -> zoned_datetime (aliases: with_local_tz): Arbitrarily indicate that given civil datetime should be understood as being in local timezone. Will error if given datetime already has timezone info.
  • without_timezone(zoned_datetime) -> datetime (aliases: without_tz): Return the civil datetime of a datetime with timezone info. Will error if given datetime has no timezone info.
  • year_month_day(target) -> string (aliases: ymd): Extract the year, month and day of a datetime. If the input is a string, first parse it into datetime, and then extract the year, month and day.
    Equivalent to strftime(string, format="%Y-%m-%d").
  • month_day(target) -> string: Extract the month and day of a datetime. If the input is a string, first parse it into datetime, and then extract the month and day.
    Equivalent to strftime(string, format="%m-%d").
  • month(target) -> string: Extract the month of a datetime. If the input is a string, first parse it into datetime, and then extract the month.
    Equivalent to strftime(string, format="%m").
  • year(target) -> string: Extract the year of a datetime. If the input is a string, first parse it into datetime, and then extract the year.
    Equivalent to strftime(string, format="%Y").
  • year_month(target) -> string (aliases: ym): Extract the year and month of a datetime. If the input is a string, first parse it into datetime, and then extract the year and month.
    Equivalent to strftime(string, format="%Y-%m").
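
The distinction between with_timezone (label a civil datetime with a zone) and to_timezone (move an already zoned datetime to another zone) is easy to mix up. Python's datetime module has the same two operations, shown here as an analogy with fixed-offset zones chosen arbitrarily for the example.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
plus_two = timezone(timedelta(hours=2))

civil = datetime(2024, 3, 5, 12, 0, 0)  # civil datetime, no timezone info

# Like with_timezone: attach a zone, the wall-clock time is unchanged.
tagged = civil.replace(tzinfo=utc)

# Like to_timezone: same instant, expressed in another zone.
converted = tagged.astimezone(plus_two)

# Like without_timezone: keep the wall-clock time, drop the zone.
stripped = converted.replace(tzinfo=None)

# Like to_timestamp followed by from_timestamp: a lossless round trip.
seconds = tagged.timestamp()
back = datetime.fromtimestamp(seconds, tz=utc)

# Like year_month_day and year_month.
ymd = tagged.strftime("%Y-%m-%d")
ym = tagged.strftime("%Y-%m")
```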

Urls & web-related

  • html_unescape(string) -> string: Unescape given HTML string by converting HTML entities back to normal text.
  • lru(string) -> string: Convert the given URL to LRU format.
    For more info, read this: https://github.com/medialab/ural#about-lrus
  • mime_ext(string) -> string: Return the extension related to given mime type.
  • parse_dataurl(string) -> [string, bytes]: Parse the given data url and return its mime type and decoded binary data.
  • urljoin(string, string) -> string: Join an url with the given addendum.

Fuzzy matching & information retrieval

  • fingerprint(string) -> string: Fingerprint a string by normalizing characters, re-ordering and deduplicating its word tokens before re-joining them by spaces.
  • soundex(name) -> string: Compute the SOUNDEX code (a phonetic encoding) of given name.
  • refined_soundex(name) -> string: Compute the refined SOUNDEX code (a phonetic encoding) of given name.
  • phonogram(name) -> string: Compute the "phonogram" code (yomguithereal's own phonetic encoding) of given name.
  • carry_stemmer(string) -> string: Apply the "Carry" stemmer targeting the French language.
  • s_stemmer(string) -> string: Apply a very simple stemmer removing common plural inflexions in some languages.
  • unidecode(string) -> string: Convert string to ascii as well as possible.
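
The fingerprint description (normalize, deduplicate, re-order, re-join) can be sketched in a few lines of Python. The normalization chosen here (strip accents, lowercase, drop punctuation) and the alphabetical ordering are assumptions. The real function may normalize and order tokens differently.

```python
import re
import unicodedata


def fingerprint(text):
    """Rough fingerprint: normalized, deduplicated, sorted word tokens."""
    # Assumed normalization: decompose accents, then drop non-ASCII marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase and keep only alphanumeric runs as tokens.
    tokens = re.findall(r"[a-z0-9]+", ascii_only.lower())
    # Deduplicate, sort, and re-join by single spaces.
    return " ".join(sorted(set(tokens)))


print(fingerprint("Tom Cruise"))
print(fingerprint("cruise,  Tom"))
```

Both calls yield the same key, which is what makes such a fingerprint useful for grouping near-duplicate values.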

Utils

  • col() -> bytes: Without argument, return current column's value, if relevant. Else, return value for given column, by name, by position or by name & nth, in case of duplicate header names.
  • col(name_or_pos, nth?) -> bytes: Without argument, return current column's value, if relevant. Else, return value for given column, by name, by position or by name & nth, in case of duplicate header names.
  • col?(name_or_pos, nth?) -> bytes: Return value of cell for given column, by name, by position or by name & nth, in case of duplicate header names. Allow selecting nonexistent columns, in which case it will return null.
  • header() -> bytes: Without argument, return current column's name, if relevant. Else, return header name for given column, by name, by position or by name & nth, in case of duplicate header names.
  • header(name_or_pos, nth?) -> bytes: Without argument, return current column's name, if relevant. Else, return header name for given column, by name, by position or by name & nth, in case of duplicate header names.
  • header?(name_or_pos, nth?) -> bytes: Return header name for given column, by name, by position or by name & nth, in case of duplicate header names. Allow selecting nonexistent columns, in which case it will return null.
  • col_index() -> bytes: Without argument, return current column's zero-based index, if relevant. Else, return zero-based index of given column, by name, by position or by name & nth, in case of duplicate header names.
  • col_index(name_or_pos, nth?) -> bytes: Without argument, return current column's zero-based index, if relevant. Else, return zero-based index of given column, by name, by position or by name & nth, in case of duplicate header names.
  • col_index?(name_or_pos, nth?) -> bytes: Return zero-based index of given column, by name, by position or by name & nth, in case of duplicate header names. Allow selecting nonexistent columns, in which case it will return null.
  • cols(from_name_or_pos?, to_name_or_pos?) -> list[bytes]: Return list of cell values from the given column by name or position to another given column by name or position, inclusive. Can also be called with a single argument to take a slice from the given column to the end, or no argument at all to take all columns.
  • prev_col(offset=1) -> bytes: Return cell value of column just before current column. Take an optional offset if you want a larger stride.
  • next_col(offset=1) -> bytes: Return cell value of column just after current column. Take an optional offset if you want a larger stride.
  • err(msg) -> error: Make the expression return a custom error.
  • headers(from_name_or_pos?, to_name_or_pos?) -> list[string]: Return list of header names from the given column by name or position to another given column by name or position, inclusive. Can also be called with a single argument to take a slice from the given column to the end, or no argument at all to return all headers.
  • row_index() -> int?: Return current row's zero-based index, if relevant.
  • regex(string) -> regex: Parse given string as regex. Useful when your patterns are dynamic, e.g. built from a CSV cell. Else prefer using regex literals e.g. "/test/".
  • typeof(value) -> string: Return type of value.
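
When a pattern is built from data, as regex(string) allows, special characters in that data usually need escaping first (see escape_regex in the Formatting section). The Python sketch below shows why, using re.escape as a stand-in. The cell value is invented for the example.

```python
import re

cell_value = "a.b"  # hypothetical content read from a CSV cell

# Escaped: the dot only matches a literal dot.
literal_pattern = re.compile(re.escape(cell_value))

# Unescaped: the dot matches any character, which is rarely intended.
naive_pattern = re.compile(cell_value)

print(bool(literal_pattern.search("axb")))  # the escaped pattern rejects "axb"
print(bool(naive_pattern.search("axb")))    # the naive pattern accepts it
```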

IO & path wrangling

  • abspath(string) -> string: Return absolute & canonicalized path.
  • basename(path, suffix?) -> string: Return the final component of given path, usually the file name, all while stripping it of an optional suffix.
  • cmd(string, list[string]) -> bytes: Run a command using the provided list of arguments as a subprocess and return the resulting bytes trimmed of trailing whitespace.
  • copy(source_path, target_path) -> string: Copy a source to target path. Will create necessary directories on the way. Returns target path as a convenience.
  • dirname(path) -> string: Return target path without final component if any.
  • ext(path) -> string?: Return the path's extension, if any.
  • filesize(string) -> int: Return the size of given file in bytes.
  • isfile(string) -> bool: Return whether the given path is an existing file on disk.
  • move(source_path, target_path) -> string: Move a source to target path. Will create necessary directories on the way. Returns target path as a convenience.
  • parse_json(string) -> any: Parse the given string as JSON.
  • parse_py_literal(string) -> any: Parse the given string as a python literal.
  • pathjoin(string, *strings) -> string (aliases: pjoin): Join multiple paths correctly.
  • read(path, encoding="utf-8", errors="replace") -> string: Read file at path. Default encoding is "utf-8". Default error handling policy is "replace", and can be one of "replace", "ignore" or "strict".
  • read_csv(path) -> list[map]: Read and parse CSV file at path, returning its rows as a list of maps with headers as keys.
  • read_json(path) -> any: Read and parse JSON file at path.
  • shell(string) -> bytes: Convenience function running cmd("$SHELL -c <command>") on unix-like systems and cmd("cmd /C <command>") on Windows.
  • shlex_split(string) -> list[string]: Split a string of command line arguments into a proper list that can be given to e.g. the cmd function.
  • write(string, path) -> string: Write string to path as utf-8 text. Will create necessary directories recursively before actually writing the file. Return the path that was written.
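
Several of these helpers have familiar standard-library counterparts in Python, which may help picture how shlex_split and cmd fit together. This is an analogy under that assumption. The command run here is simply the current Python interpreter, chosen so the example works on any platform.

```python
import posixpath
import shlex
import subprocess
import sys

# Like shlex_split: quoted arguments stay in one piece.
args = shlex.split("--name 'John Landy' --verbose")

# Like pathjoin and basename (with an optional suffix stripped).
joined = posixpath.join("data", "2024", "report.csv")
base = posixpath.basename(joined)
stem = base.removesuffix(".csv")  # requires Python 3.9+

# Like cmd: run a program with a list of arguments and keep the
# resulting bytes, trimmed of trailing whitespace.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    check=True,
)
output = completed.stdout.rstrip()
```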

Randomness & hashing

  • md5(string) -> string: Return the md5 hash of string in hexadecimal representation.
  • random() -> float: Return a random float between 0 and 1.
  • uuid() -> string: Return a uuid v4.
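
For reference, the same three operations look like this in Python's standard library. The snippet is only meant to show the shape of the results (a 32-character hexadecimal digest, a float between 0 and 1, a version 4 UUID) and assumes the input string is hashed as UTF-8 bytes.

```python
import hashlib
import random
import uuid

# md5 hash in hexadecimal representation.
digest = hashlib.md5("hello".encode("utf-8")).hexdigest()

# Random float in the half-open interval [0, 1).
value = random.random()

# Random (version 4) UUID rendered as a string.
identifier = str(uuid.uuid4())
```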