String-like objects in ScriptFu's TinyScheme

This is a language reference for the string-like features of Script-fu’s TinyScheme language.

These features are recent changes to ScriptFu version 3. This document does not describe the version 2 behavior.

These features differ from upstream TinyScheme. ScriptFu has more than the original TinyScheme because it has been modified to support unichars and bytes. Unichars and bytes are not part of R5RS; they are more recent additions that a Scheme implementation may or may not provide.

SRFI-6 is one specification for string-port behavior, but it does not discuss unichars or bytes, so ScriptFu does not use it. Instead, ScriptFu implements string-ports itself, in C.

Overview

Script-fu’s TinyScheme has these string-like objects:

  • string
  • string-port (of two kinds: input and output)

There is also an input-output kind of string-port in TinyScheme, but this document does not describe it, and its use is not supported in ScriptFu.

Both are:

  • sequences of chars
  • practically infinite
  • UTF-8 encoded, i.e. the chars are unichars, each of which may occupy more than one byte
  • terminated with the NUL character (the 0 byte)

They are related: you can initialize an input string-port from a string, or get a string from an output string-port. See Initialize/get methods.

Symbols also have string representations, but no string-like methods besides conversion to and from strings.
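As a minimal sketch of those two conversions, using the standard procedures symbol->string and string->symbol (the variable names are only illustrative):

```scheme
; Convert a symbol to a new string.
(define name (symbol->string 'foo))   ; the string "foo"

; Convert a string to a symbol.
(define sym (string->symbol "bar"))   ; the symbol bar

; String methods apply to the converted string, not to the symbol.
(define name-length (string-length name))   ; 3
```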

Differences between strings and string-ports are noted in the topics below.

Allocations

A main concern of string-like objects is how they are allocated and their lifetimes.

All string-like objects are allocated from the heap, and each also uses one cell from TinyScheme’s cell pool (which is itself allocated separately from the heap).

A string-port and any string used with it are separate objects with separate lifetimes. For example, a string passed to a string-port is not owned by the string-port; it remains a separate object with its own lifetime.

The length of string-like objects is limited by the underlying allocator (malloc) and the OS.

string allocations

Strings and string literals are allocated exactly, that is, with no spare capacity.

Any append to a string causes a reallocation.

Any substring of a string causes a new allocation and returns a new string instance.

String literals are allocated but are immutable.
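The following sketch illustrates that string-append and substring each return a new string and leave their arguments unchanged. The variable names are illustrative only.

```scheme
(define base "foo")

; string-append allocates a new string; base is not modified.
(define joined (string-append base "bar"))   ; "foobar"

; substring allocates another new string (start inclusive, end exclusive).
(define tail (substring joined 3 6))         ; "bar"

; base still has its original contents.
(define base-unchanged (string=? base "foo"))   ; #t
```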

string-port allocations

String-ports have neither substring methods nor indexing.

  1. String-ports of kind output have an allocated, internal buffer. A buffer has a “reserve” or free space.

    The buffer can sometimes accommodate writes without a reallocation.

    Writes to an output string-port can be less expensive (higher performing) than appends to a string, which always reallocate. But note that a write is not the same as an append: writing a string to an output string-port also writes escaped quotes into the string-port.

    A single write can be larger than the size that is pre-allocated for the buffer (256 bytes).

  2. A string-port of kind input is not a buffer. It is allocated once. Its size is fixed when it is opened.
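To make the comparison concrete, here is a sketch that builds the same text two ways: by repeated string-append, and by repeated writes to one output string-port. The procedure names digits-by-append and digits-by-port are hypothetical, and no timing is claimed; the sketch only shows the two shapes of code.

```scheme
; Build "0123..." by appending: every step allocates a new string.
(define (digits-by-append n)
  (let loop ((i 0) (acc ""))
    (if (= i n)
        acc
        (loop (+ i 1) (string-append acc (number->string i))))))

; Build the same text by writing numbers to an output string-port.
; Writing a number writes its digits; no quotes are involved.
(define (digits-by-port n)
  (let ((port (open-output-string)))
    (let loop ((i 0))
      (if (< i n)
          (begin
            (write i port)
            (loop (+ i 1)))))
    ; Get the contents while the port is still open, then close it.
    (let ((result (get-output-string port)))
      (close-port port)
      result)))

(define by-append (digits-by-append 10))   ; "0123456789"
(define by-port (digits-by-port 10))       ; "0123456789"
```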

Char, byte and object methods

Terminology: we use “read method” to denote any function whose name begins with “read”, that is, one of the functions read, read-byte, or read-char. Likewise, “write method” denotes any function whose name begins with “write”.

You cannot read the NUL character or byte from a string or string-port since the interpreter always sees it as a terminator. And you should not write the NUL character to strings or string-ports. The result can be surprising.
But you can read and write the NUL character to file ports that you are treating as binary files and not text files.

string methods

Strings are composed of characters, so they have a char method:

  • string-ref accesses a character component

Strings have no byte methods. Characters can be converted to integers and then to bytes. See: Built-in functions on the byte type
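A short sketch of the char method and of the first conversion step, from character to integer. It uses only string-ref and char->integer; the byte functions themselves are described in the document referenced above and are not shown here.

```scheme
; string-ref indexes by character, starting at 0.
(define first-char (string-ref "foo" 0))      ; #\f

; char->integer gives the character's code point as an integer.
(define code (char->integer first-char))      ; 102
```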

Strings have no object methods (read and write).

string-port methods

Ports, and thus string-ports, have char methods:

  • read-char
  • write-char

And also byte methods. See: Port operations by byte: read, write, and peek

Ports also have methods that traffic in objects:

  • read
  • write

In fact, string-ports implement the full port API (open, read, write, and close).

You should only use the “read” methods on a string-port of kind input. You should only use the “write” methods on a string-port of kind output. A call of a read method on a string-port of kind output returns an error, and vice versa.
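The sketch below uses read methods only on an input string-port and write methods only on an output string-port. It assumes the TinyScheme procedures open-input-string, open-output-string, and get-output-string; the variable names are illustrative.

```scheme
; An input string-port: use only read methods on it.
(define in-port (open-input-string "(1 2 3) x"))
(define obj (read in-port))           ; reads one object, the list (1 2 3)

; An output string-port: use only write methods on it.
(define out-port (open-output-string))
(write obj out-port)                  ; writes the representation (1 2 3)
(write-char #\! out-port)             ; writes the single character !

(define text (get-output-string out-port))   ; "(1 2 3)!"
```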

Mixing char, byte, and object methods

You should not mix byte methods with char methods unless you are careful. You must understand UTF-8 encoding to do so.

You should not mix char methods with the read and write methods unless you are careful. You must understand the parsing and representation of Scheme objects to do so.
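As an illustration of why care is needed, this sketch mixes write-char with write on one output string-port. The write method writes a string's representation, quotes included, so the result is not a simple concatenation of characters.

```scheme
(define port (open-output-string))

(write-char #\a port)   ; writes the bare character a
(write "b" port)        ; writes the representation of the string: "b" with its quotes

; The contents are the four characters  a " b "
(define mixed (get-output-string port))
(define mixed-length (string-length mixed))   ; 4, not 2
```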

Length methods

string length

The length of a string is in units of characters, not bytes. Remember that each character in UTF-8 encoding may comprise several bytes.

Scheme
(string-length "foo") => 3
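The same holds for text containing multi-byte characters. In this sketch, which assumes the script source is saved as UTF-8, the middle character occupies two bytes but still counts as one character.

```scheme
; Three characters, although the UTF-8 encoding is four bytes long.
(define char-count (string-length "añb"))   ; 3
```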

string-port length

Ports have no methods for obtaining the length, either in characters or byte units. Some other Schemes have such methods.
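For an output string-port, one possible workaround is to get the contents as a string and take the length of that string. The helper name output-port-length is hypothetical, not part of ScriptFu, and each call allocates a new string, so this is a sketch for occasional use only.

```scheme
; Hypothetical helper: length in characters of what has been written so far.
(define (output-port-length port)
  (string-length (get-output-string port)))

(define port (open-output-string))
(write-char #\a port)
(write-char #\b port)

(define written (output-port-length port))   ; 2
```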

Initialize/get methods

The open-input-string method

The method open-input-string takes an initial string.

The initial string can be large, limited only by malloc, or can be empty.

TinyScheme copies the initial string to the port. Subsequently, these have no effect on the port:

  • the initial string going out of scope
  • an append to the initial string

If the initial string is empty, then the first read will return the EOF object.

Otherwise, successive calls to read methods progress through the port contents until EOF is finally returned.

There are no methods for appending to an input string-port after it is opened.
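The reading behavior described above can be sketched as a loop that calls read-char until the EOF object appears. TinyScheme's name for the opening procedure is open-input-string; the helper port->char-list is a hypothetical name used only for this illustration.

```scheme
; Hypothetical helper: collect every character of an input port into a list.
(define (port->char-list port)
  (let loop ((c (read-char port)) (acc '()))
    (if (eof-object? c)
        (reverse acc)
        (loop (read-char port) (cons c acc)))))

; Reads progress along the contents until EOF.
(define chars (port->char-list (open-input-string "abc")))   ; (#\a #\b #\c)

; An empty initial string yields EOF on the first read.
(define none (port->char-list (open-input-string "")))       ; ()
```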

The open-output-string method

The method open-output-string optionally takes an initial string but it is ignored.

An output string-port is initially empty; its contents do not start out as the initial string.

Since the initial string is ignored, it may go out of scope without effect on an output string-port.

You can write more to an output port than the length of the initial string.
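A sketch of this behavior follows. TinyScheme's name for the procedure is open-output-string. The expected values in the comments rely on the behavior described in this section (the initial string is ignored) and may differ in other Schemes.

```scheme
; The optional initial string is ignored: the port starts empty.
(define port (open-output-string "xy"))
(define before (get-output-string port))   ; "" according to this document

; Writes may exceed the length of the ignored initial string.
(write-char #\a port)
(write-char #\b port)
(write-char #\c port)
(define after (get-output-string port))    ; "abc"
```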

The get-output-string method

The method get-output-string returns a string that is the accumulation of all prior writes (chars, bytes, and objects) to the output string-port. An output string-port can be read only in this way, by getting its entire contents.

Again, you should not mix write-byte, write-char, and write on an output string-port without care.

The port must be open at the time of the call.

The returned string is a distinct object from the port. These subsequent events have no effect on the returned string:

  • writes to the port
  • closing the port
  • the port subsequently going out of scope

A get-output-string call on a newly opened empty port returns the empty string.

Consecutive calls to get-output-string, with no writes in between, return two distinct string objects whose contents are equal.
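The independence of the returned strings can be sketched as follows. The variable names are illustrative, and the comments state what this section leads one to expect.

```scheme
(define port (open-output-string))

; A newly opened port yields the empty string.
(define snapshot-0 (get-output-string port))   ; ""

(write-char #\a port)
(define snapshot-1 (get-output-string port))   ; "a"
(define snapshot-2 (get-output-string port))   ; "a", a separate object

; A later write does not change strings that were already returned.
(write-char #\b port)
(define snapshot-3 (get-output-string port))   ; "ab"

; Closing the port does not affect them either.
(close-port port)
```

After this runs, snapshot-1 and snapshot-2 compare equal with string=? even though they are separate objects, and snapshot-1 is still "a".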
