String-like objects in ScriptFu's TinyScheme
This is a language reference for the string-like features of Script-fu’s TinyScheme language.
These features are recent changes to ScriptFu version 3. This document does not describe the version 2 behavior.
These features differ from upstream TinyScheme. ScriptFu's TinyScheme has more than the original because it has been modified to support unichars and bytes. These features are not part of R5RS; they are recent, optional additions.
Regarding string-ports, SRFI-6 is one specification for string-port behavior, but it does not discuss unichars or bytes. So we do not use it, and instead implement string-ports ourselves in C.
Overview
Script-fu’s TinyScheme has these string-like objects:
- string
- string-port (of two kinds: input and output)
Both are:
- sequences of chars
- practically infinite
- UTF-8 encoded, i.e. the chars are unichars of one or more bytes each
- terminated with the NUL character (the 0 byte)
They are related: you can initialize an input string-port from a string, or get a string from an output string-port. See Initialize/get methods.
Differences between string and string-ports are noted in the topics below:
Allocations
A main concern of string-like objects is how they are allocated and their lifetimes.
All string-like objects are allocated from the heap, along with one cell from TinyScheme’s cell pool (which is separately allocated from the heap).
A string-port and any string used with a string-port are separate objects with separate lifetimes. So, for example, a passed string is not owned by a string-port but is a separate object with its own lifetime.
The length of string-like objects is limited by the underlying allocator (malloc) and the OS.
string allocations
Strings and string literals are allocated exactly, i.e. with no reserve of free space.
Any append to a string causes a reallocation.
Any substring of a string causes a new allocation and returns a new string instance.
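The following is a minimal sketch of the two rules above, using the standard R5RS procedures string-append and substring. The variable names are only illustrative. Each call returns a new string and leaves the original unchanged.

```scheme
; A string bound to a variable.
(define original "hello world")

; Appending yields a newly allocated string; original is not modified.
(define longer (string-append original "!"))

; substring copies characters 0 up to (but not including) 5
; into a new string instance.
(define part (substring original 0 5))
```

Here part is "hello", longer is "hello world!", and original is still "hello world".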
String literals are allocated but are immutable.
string-port allocations
String-ports have neither substring methods nor indexing.
- String-ports of kind output have an allocated, internal buffer. A buffer has a “reserve” of free space. The buffer can sometimes accommodate writes without a reallocation. Writes to an output string-port can be less expensive (higher performing) than appends to a string, which always reallocate. But note that writes are not the same as appends, because writing a string to an output string-port also writes escaped quotes into the string-port. The write method can write more than the size that is pre-allocated for the buffer (256 bytes).
- A string-port of kind input is not a buffer. It is allocated once. Its size is fixed when opened.
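As a hedged illustration of why a write is not the same as an append, this sketch writes the same string to two output string-ports. It assumes the constructor is named open-output-string, as in upstream TinyScheme and SRFI-6. It also assumes that display, which this document does not otherwise discuss, behaves as in R5RS. The variable names are hypothetical.

```scheme
; Assumption: open-output-string is the constructor for output string-ports.
(define out-w (open-output-string))
; write emits the external representation of the string,
; so the surrounding quotes are written to the port as well.
(write "foo" out-w)
(define quoted (get-output-string out-w))
(close-output-port out-w)

(define out-d (open-output-string))
; Assumption: display emits only the characters, as in R5RS.
(display "foo" out-d)
(define plain (get-output-string out-d))
(close-output-port out-d)
```

Under those assumptions, quoted holds five characters (the three letters plus two quote characters), while plain holds only the three letters.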
Char, byte and object methods
Terminology. We use “read method” to denote the function whose name is “read*”, which is one of the functions: read, read-byte, or read-char. The same goes for “write method” accordingly.
string methods
Strings are composed of characters so have a char method:
- string-ref accesses a character component
Strings have no byte methods. Characters can be converted to integers and then to bytes. See: Built-in functions on the byte type
Strings have no object methods: read and write.
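A small sketch of the char method and of the conversion from a character to an integer, using the standard procedures string-ref and char->integer. The variable names are only illustrative.

```scheme
(define s "foo")

; string-ref is zero-indexed and returns a character.
(define first-char (string-ref s 0))

; A character can be converted to an integer (its code point).
(define code (char->integer first-char))
```

Here first-char is the character f and code is 102.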
string-port methods
Ports, and thus string-ports, have char methods:
- read-char
- write-char
And also byte methods. See: Port operations by byte: read, write, and peek
Ports also have methods trafficking in objects:
- read
- write
In fact, string-ports implement the full port API (open, read, write, and close).
You should only use the “read” methods on a string-port of kind input. You should only use the “write” methods on a string-port of kind output. A call of a read method on a string-port of kind output returns an error, and vice versa.
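The sketch below uses char methods on each kind of string-port, keeping read methods on the input port and write methods on the output port. It assumes the constructors are named open-input-string and open-output-string, as in upstream TinyScheme and SRFI-6. The variable names are hypothetical.

```scheme
; Read methods belong on a string-port of kind input.
(define char-in (open-input-string "ab"))
(define c1 (read-char char-in))
(define c2 (read-char char-in))
(close-input-port char-in)

; Write methods belong on a string-port of kind output.
(define char-out (open-output-string))
(write-char #\x char-out)
(write-char #\y char-out)
; Get the contents while the port is still open.
(define result (get-output-string char-out))
(close-output-port char-out)
```

Here c1 and c2 are the characters a and b, and result is "xy".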
Mixing char, byte, and object methods
Length methods
string length
The length of a string is in units of characters, not bytes. Remember that each character in UTF-8 encoding may comprise up to four bytes.
(string-length "foo") => 3
string-port length
Ports have no methods for obtaining the length, either in characters or byte units. Some other Schemes have such methods.
Initialize/get methods
The open-input-string method
The method open-input-string takes an initial string.
The initial string can be large, limited only by malloc, or can be empty.
TinyScheme copies the initial string to the port. Subsequently, these have no effect on the port:
- the initial string going out of scope
- an append to the initial string
If the initial string is empty, then the first read will return the EOF object.
If not, subsequent calls to read methods progress along the port contents, until finally EOF is returned.
There are no methods for appending to an input string-port after it is opened.
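The reading behavior described above can be sketched as a loop that calls read-char until the EOF object appears. This assumes the constructor is named open-input-string, as in upstream TinyScheme and SRFI-6. The helper port->char-list is a hypothetical name introduced only for this example.

```scheme
; Hypothetical helper: collect every character from a port into a list.
(define (port->char-list port)
  (let loop ((c (read-char port)) (acc '()))
    (if (eof-object? c)
        (reverse acc)                          ; EOF reached: return chars in order
        (loop (read-char port) (cons c acc)))))

; Reads progress along the port contents until EOF.
(define abc-in (open-input-string "abc"))
(define chars (port->char-list abc-in))
(close-input-port abc-in)

; On an empty initial string, the very first read returns the EOF object.
(define empty-in (open-input-string ""))
(define first-read (read-char empty-in))
(close-input-port empty-in)
```

Here chars is a list of the three characters a, b, and c, and (eof-object? first-read) is true.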
The open-output-string method
The method open-output-string optionally takes an initial string, but it is ignored.
An output string-port is initially empty; it does not contain the initial string.
You can write more to an output port than the length of the initial string.
The get-output-string method
The method get-output-string returns a string that is the accumulation of
all prior writes (chars, bytes, and objects) to the output string-port.
An output string-port can be read only in this way, by getting its entire contents.
The port must be open at the time of the call.
The returned string is a distinct object from the port. These subsequent events have no effect on the returned string:
- writes to the port
- closing the port
- the port subsequently going out of scope
A get-output-string call on a newly opened empty port returns the empty string.
Consecutive calls to get-output-string return two different string objects, but they are equivalent.
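A minimal sketch of these properties follows. It assumes the constructor is named open-output-string, as in upstream TinyScheme and SRFI-6, and that it can be called with no argument. The variable names are only illustrative.

```scheme
(define acc-out (open-output-string))

; A newly opened port yields the empty string.
(define at-start (get-output-string acc-out))

(write-char #\a acc-out)
; Two consecutive calls: distinct string objects with equivalent contents.
(define snap1 (get-output-string acc-out))
(define snap2 (get-output-string acc-out))

; A later write does not change strings that were already returned.
(write-char #\b acc-out)
(define snap3 (get-output-string acc-out))

; Closing the port does not affect the returned strings either.
(close-output-port acc-out)
```

Here at-start is "", snap1 and snap2 are both "a" and satisfy equal?, and snap3 is "ab".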