The SUBSTRING signature specifies manipulations on an abstract
representation of a sequence of contiguous characters in a string. A
substring value can be modeled as a triple (s, i, n), where s is the
underlying string, i is the starting index, and n is the size of the
substring, with the constraint that 0 <= i <= i + n <= |s|.
The substring type and its attendant functions provide a convenient
abstraction for performing a variety of common analyses of strings,
such as finding the leftmost occurrence, if any, of a character in a
string. In addition, using the substring functions avoids much of the
copying and bounds checking that occur if similar operations are
implemented solely in terms of strings.
The SUBSTRING signature is matched by two structures, the required
Substring and the optional WideSubstring. The former is a companion
structure to the Char and String structures, which are based on the
extended ASCII 8-bit character set. The structure WideSubstring is
related in the same way to the structures WideChar and WideString,
which are based on characters of some size greater than or equal to 8
bits. In particular, the types Substring.string and Substring.char are
identical to those types in the structure String and, when
WideSubstring is defined, the types WideSubstring.string and
WideSubstring.char are identical to those types in the structure
WideString.
All of these connections are made explicit in the Text and WideText
structures, which match the TEXT signature. In the exposition below,
references to a String structure refers to the substructure of that
name defined in either the Text or the WideText structure, which ever
is appropriate.
returns the i(th) character in the substring, counting
from the beginning of s. It is equivalent to String.sub(string s,
i). The exception Subscript is raised unless 0 <= i < |s|.
returns a triple (s, i, n) giving a concrete representation
of the substring. Here s is the underlying string, i is the starting
index, and n is the size of the substring. It holds that 0 <= i <= i +
n <= |s|.
If opt=NONE, the function returns the substring
of s from the i(th) character to the end of the string, i.e., the
string s[i..|s|-1]. If opt=SOME j, the function returns the substring
of size j starting at index i, i.e., the string s[i..i+j-1]. Raises
Subscript unless 0 <= i <= |s|. The function must perform bounds
checking in such a way that the Overflow exception is not raised.
returns the substring s[i..i+j-1], i.e., the
substring of size j starting at index i. This is equivalent to
extract(s, i, SOME j). We require that base o substring be the
identity function on valid arguments. An implementation must perform
bounds checking in such a way that the Overflow exception is not
raised.
removes k characters from the left of the substring s. If
k is greater than the size of the substring, an empty substring is
returned. Specifically, for substring ss = substring(s, i, j) and k <=
j, we have (triml k ss = substring(s, i+k, j-k)). The exception
Subscript is raised if k < 0. This exception is raised when (triml k)
is evaluated.
removes k characters from the right of the substring s. If
k is greater than the size of the substring, an empty substring is
returned. Specifically, for substring ss = substring(s, i, j) and k <=
j, we have (trimr k ss = substring(s, i, j-k)). The exception
Subscript is raised if k < 0. This exception is raised when (trimr k)
is evaluated.
returns a substring of s starting at the i(th)
character. If opt=SOME m, the size of the resulting substring is
m. Otherwise, the size is |s| - i. To be valid, the arguments in the
first case must satisfy 0 <= i, 0 <= m and i + m <= |s|. In the second
case, the arguments must satisfy 0 <= i <= |s|. If the arguments are
not valid, the exception Subscript is raised.
returns a string that is the concatenation of the
substrings in l. This is equivalent to String.concat o (List.map
string). This raises Size if the sum of all the sizes is greater than
the corresponding maxSize for the string type.
returns the concatenation of the substrings in the
list l using the string s as a separator. Raises Size if the size of
the resulting string would be greater than maxSize for the string
type.
scans s from left to right looking for the first
character that does not satisfy the predicate f. It returns the pair
(ls, rs) giving the split of the substring into the span up to that
character and the rest. ls is the left side of the split, and rs is
the right side.
scans s from right to left looking for the first
character that does not satisfy the predicate f. It returns the pair
(ls, rs) giving the split of the substring into the span up to that
character and the rest. ls is the left side of the split, and rs is
the right side.
returns the pair of substring (ss, ss'), where ss
contains the first i characters of s and ss' contains the rest,
assuming 0 <= i <= size s. Otherwise, it raises Subscript.
scans the substring s, left to right, for the first
character not satisfying the predicate p. The function drops the
maximal substring consisting of characters satisfying the
predicate. Equivalent to #2(splitl p s).
scans the substring s, right to left, for the first
character not satisfying the predicate p. The function drops the
maximal substring consisting of characters satisfying the
predicate. Equivalent to #2(splitr p s).
scans the substring s, left to right, for the first
character not satisfying the predicate p. The function returns the
maximal substring consisting of characters satisfying the
predicate. Equivalent to #1(splitl p s).
scans the substring s, right to left, for the first
character not satisfying the predicate p. The function returns the
maximal substring consisting of characters satisfying the
predicate. Equivalent to #1(splitr p s).
splits the substring ss into a pair (pref, suff) of
substrings, where suff is the longest suffix of ss that has s as a
prefix and pref is the prefix of ss preceding suff. More precisely,
let m be the size of s and let ss correspond to the substring (s', i,
n). If there is a least index k >= i such that s = s'[k..k+m-1], then
suff corresponds to (s', k, n+i-k) and pref corresponds to (s', i,
k-i). If there is no such k, then suff is the empty substring
corresponding to (s', i+n, 0) and pref corresponds to (s', i, n),
i.e., all of ss.
produces a substring composed of a prefix ss, suffix
ss', plus all intermediate characters in the underlying string. It
raises Span if ss and ss' are not substrings of the same underlying
string or if the start of ss is to the right of the end of ss'. More
precisely, if we have ((s, i, n) = base ss) and ((s', i', n') = base
ss'), then span returns substring(s, i, (i'+n')-i) unless s <> s' or
i'+n' < i, in which case it raises Span. Note that this does not
preclude ss' from beginning to the left of ss, or ss from ending to
the right of ss'. This function allows one to scan for a substring
using multiple pieces and then coalescing the pieces. When applied to
substrings derived from the identical base string, the string equality
test should be constant time. This can be achieved by first doing a
pointer test and, only if that fails, then checking the strings
character by character.
applies f to every character of s, from left to right,
and returns the concatenation of the results. This is equivalent to
String.concat(List.map f (explode s)).
decomposes a substring into a list of tokens from left to
right. A token is a non-empty maximal substring not containing any
delimiter. A delimiter is a character satisfying predicate f. Two
tokens may be separated by more than one delimiter. For example, if
the only delimiter is the character #"|", then the substring
"|abc||def" contains two tokens "abc" and "def".
decomposes a substring into a list of fields from left to
right. A field is a (possibly empty) maximal substring of s not
containing any delimiter. A delimiter is a character satisfying
predicate f. Two fields are separated by exactly one delimiter. For
example, if the only delimiter is the character #"|", then the
substring "|abc||def" contains the four fields "", "abc", "" and
"def".