Module utf8
Basic UTF8 character counting support for Luakit
This module provides a partial implementation of the Lua 5.3 UTF-8 library.
Functions
utf8.len (s, begin, end)
Return the number of characters (not bytes) of a UTF-8-encoded string.
If the optional parameters begin and/or end are given, then characters within s will only be counted if they begin between positions begin and end (both inclusive).
An error is raised if s (or the characters that start in the slice from begin to end) contains invalid UTF-8 characters, or if begin or end point to byte indices not in s.
Parameters
- s (Type: string): The string whose length is to be returned.
- begin (Type: integer, Optional, Default: 1): Only consider s from (1-based byte) index begin onwards. If negative, count from the end of s (with -1 being the last byte).
- end (Type: integer, Optional, Default: -1): Only consider s up to and including (1-based byte) index end. If negative, count from the end of s (with -1 being the last byte).
Return Values
- integer: The length (in UTF-8 characters) of s.
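The following is a minimal sketch of how the counting rules above play out on a string with one multi-byte character. It assumes the module is reachable as the global utf8 (as it is in stock Lua 5.3 and later) or can be loaded with require("utf8"); that loading line is an assumption and may need adjusting for your setup.

```lua
-- Assumption: the module is available as the global `utf8` (Lua 5.3+)
-- or can be loaded with require("utf8"); adjust as needed.
local utf8 = utf8 or require("utf8")

-- "héllo", with é written as its two UTF-8 bytes (195, 169).
-- Byte layout: h=1, é=2..3, l=4, l=5, o=6.
local s = "h\195\169llo"

local bytes    = #s                 -- 6: length in bytes
local chars    = utf8.len(s)        -- 5: length in characters
local tail     = utf8.len(s, 4)     -- 3: characters that begin at bytes 4..6
local head     = utf8.len(s, 1, 2)  -- 2: "h" begins at byte 1, "é" at byte 2
local last_two = utf8.len(s, -2)    -- 2: begin = -2 refers to byte 5
```

Note that head counts "é" even though its second byte lies outside the range, because only the position where a character begins matters.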
utf8.offset (string, woffset, base)
Convert an offset (in UTF8 characters) to a byte offset.
If the optional parameter base is given and positive, characters are counted starting from (byte) index base.
An error is raised if base is smaller than 1 or larger than the (byte) length of string, or if base points to a byte inside string that is not the starting byte of a UTF8 encoding.
Examples
- utf8.offset("abc",2,2) would return 3
- utf8.offset("abc",-3) would return 1
Parameters
- string (Type: string): The string in which offsets should be converted.
- woffset (Type: integer): The offset (1-based, in UTF-8 characters) which should be converted.
- base (Type: integer, Optional): A (1-based byte) index in string. Defaults to 1 if woffset is positive, and to the (byte) length of string if woffset is negative. See the description above.
Return Values
- integer: The (1-based) byte offset of the woffset-th UTF-8 character in string.
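One common use of utf8.offset is slicing a string by character positions rather than byte positions. The sketch below is illustrative only: usub is a hypothetical helper, not part of this module, and the snippet assumes the module is reachable as the global utf8 (as in stock Lua 5.3 and later) or via require("utf8"). It also assumes first and last are valid character positions in s.

```lua
-- Assumption: the module is available as the global `utf8` (Lua 5.3+)
-- or can be loaded with require("utf8"); adjust as needed.
local utf8 = utf8 or require("utf8")

-- Hypothetical helper (not part of the module): return characters
-- first..last (1-based, inclusive, counted in characters) of s.
local function usub(s, first, last)
    -- Byte index at which character number `first` starts.
    local start_byte = utf8.offset(s, first)
    -- Byte index just past character number `last`. When `last` is the
    -- final character, use #s + 1 instead of asking for an offset
    -- beyond the end of the string.
    local next_byte
    if last < utf8.len(s) then
        next_byte = utf8.offset(s, last + 1)
    else
        next_byte = #s + 1
    end
    return s:sub(start_byte, next_byte - 1)
end

-- "héllo", with é written as its two UTF-8 bytes (195, 169).
local s = "h\195\169llo"
local middle = usub(s, 2, 3)  -- "él", i.e. bytes 2..4
local ending = usub(s, 4, 5)  -- "lo", i.e. bytes 5..6
```

A plain s:sub(2, 3) would instead return only the two bytes of "é", which is why the character positions are converted to byte offsets first.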
Attribution
Copyright
- 2017 Dennis Hofheinz