Character strings are mostly associated with dynamic memory and garbage collection. That is overkill for the string handling used in most Forth programs. In particular, we can get by with buffers that are statically allocated using CREATE. It is still useful to lift the manipulation of single characters to the manipulation of strings as a whole.
We define a few words that make string manipulation in Forth a little smoother.
The original idea is by Albert van der Horst. Examples of use are:
- Manipulate files
- Start programs
- Add, delete and use folders/directories
Construction of strings
On the stack, a string in Forth is an address and a length. In memory, the length is stored in front of the characters. Two views are possible. The classic view is to store the length in a byte.
These are the so-called counted strings, as shown in the picture:
However, it is more useful to use a cell containing a byte count, followed by that many bytes. You should not store the count in a character unless there is a dire need to conserve space. Also, a string in Chinese in some UTF representation is a sequence in which the interpretation of a byte depends on the previous bytes. The abstraction presented here works equally well on this kind of data structure.
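The two layouts can be tried out by hand. This is a minimal sketch using only standard words; the names BSTR and CSTR are hypothetical and exist only for this illustration.

```forth
\ Byte-count layout: one count byte, then the characters.
create BSTR   3 c,  char a c,  char b c,  char c c,
BSTR count type                 \ prints abc

\ Cell-count layout: one count cell, then the characters.
create CSTR   3 ,   char a c,  char b c,  char c c,
CSTR dup cell+ swap @ type      \ prints abc
```

In both cases the words that follow only ever see the address of the count; whether a byte or a cell sits there is hidden inside the fetch word.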
Function: $VARIABLE
    reserve a buffer for the count-byte + 'maxlen' characters
    Alternatively: reserve a buffer for the count-cell + 'maxlen' characters
    Define: ( maxlen "name" -- )  Save maxlen & buffer-address
    Action: ( -- s )              Leave address of string variable
Function: $@    ( s -- c )      Read counted string from address
Function: $+!   ( c s -- )      Extend counted string at address
Function: $!    ( c s -- )      Store counted string at address
Function: $.    ( c -- )        Print counted string
Function: $C+!  ( char s -- )   Add one character to counted string at address
The original idea also contains:
$^ $? $/ $\
See the reference in the introduction.
Two tools, an idea of Albert Nijhof:
Function: -HEAD  ( adr len i -- adr' len' )  cut first 'i' characters from string
Function: -TAIL  ( adr len i -- adr len' )   cut last 'i' characters from string
However, that flies in the face of the goals mentioned in the introduction: we promised to get rid of characters, never to count characters, and to concern ourselves only with strings.
A better example in this context is:
Function: -TRAILING  ( c -- c' )  remove trailing blank space from string
Function: -LEADING   ( c -- c' )  remove leading blank space from string
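-TRAILING is part of the standard String word set, but -LEADING is not. The following is one possible sketch of -LEADING, not necessarily the definition the author has in mind; it assumes that /STRING from the String word set is available and that S" works in interpretation state.

```forth
\ Sketch: step past leading blanks; stops at the first
\ non-blank character or when the string is empty.
: -LEADING  ( c-addr u -- c-addr' u' )
    begin  dup while            \ any characters left?
           over c@ bl = while   \ is the first one a blank?
           1 /string            \ drop it
    repeat then ;

s"    padded   " -LEADING -trailing type    \ prints padded
```

Neither word copies or stores anything; both only adjust the address and length on the stack.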
The idea behind these words is that a string variable (s) is in fact a counted string (c) that has been stored. In the stack diagrams, s (c-addr) is the address of the stored string and c (c-addr u) is a constant string given as address and length.
: $VARIABLE     \ Reserve space for a string buffer
    here  swap 1+ allot  align  \ Reserve RAM buffer
    create  ( here) ,           ( +n "name" -- )
    does>  @ ;                  ( -- s )

: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;                \ Incr. byte with n at a
: $@    ( s -- c )      count ;                             \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> c+! ;  \ Extend string
: $!    ( c s -- )      0 over c!  $+! ;                    \ Store string
: $.    ( c -- )        type ;                              \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> c+! ;         \ Add char to string
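A short session shows how the words fit together. This is a sketch: the definitions are copied from the text so that the block loads on its own, the name NAME and the size 32 are hypothetical, and S" is assumed to work in interpretation state. Note that none of these words checks 'maxlen', so the caller must make sure the buffer is large enough.

```forth
\ Byte-count definitions, copied from the text.
: $VARIABLE  here  swap 1+ allot  align  create ,  does> @ ;
: C+!   ( n a -- )      >r  r@ c@ +  r> c! ;
: $@    ( s -- c )      count ;
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> c+! ;
: $!    ( c s -- )      0 over c!  $+! ;
: $.    ( c -- )        type ;
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> c+! ;

\ Usage
32 $VARIABLE NAME           \ buffer for up to 32 characters
s" Forth"    NAME $!        \ store:       Forth
s"  string"  NAME $+!       \ extend:      Forth string
char s       NAME $C+!      \ add a char:  Forth strings
NAME $@ $.                  \ prints Forth strings
```

At no point does the calling code index into the buffer or touch the count itself; that is the lift from characters to whole strings.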
The version where the count is stored in a cell is hardly different, but simpler.
Note that it uses the word @+, which is not part of Generic Forth; you can find an implementation example in the well-known words list.
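For systems that lack @+, a possible definition is sketched here. The stack effect ( a -- a+cell x ) is an assumption inferred from the way $@ uses the word; some systems define @+ with the two results in the opposite order, so check your own system first. The name DEMO is hypothetical.

```forth
\ Assumed stack effect: leave the address after the cell,
\ with the contents of the cell on top.
: @+  ( a -- a+cell x )   dup cell+  swap @ ;

\ A hand-built cell-counted string to try it on.
create DEMO   3 ,  char a c,  char b c,  char c c,
DEMO @+ type        \ prints abc
```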
: $VARIABLE     \ Reserve space for a string buffer
    here  swap CELL+ allot  align   \ Reserve RAM buffer
    create  ( here) ,               ( +n "name" -- )
    does>  @ ;                      ( -- s )

: $@    ( s -- c )      @+ ;                                \ Fetch string
: $+!   ( c s -- )      >r  tuck  r@ $@ +  swap cmove  r> +! ;   \ Extend string
: $!    ( c s -- )      0 over !  $+! ;                     \ Store string
: $.    ( c -- )        type ;                              \ Print string
: $C+!  ( char s -- )   dup >r  $@ + c!  1 r> +! ;          \ Add char to string
Have a look at the subdirectories for implementations for different systems.
- String word sets
Note that Albert Nijhof's string version puts the address of the structure of the $VARIABLE on the stack. The original example puts the address of the string on the stack. Functionally they are equivalent.
| $@ | ( s -- c ) | Read string variable |
| $+! | ( c s -- ) | Add string to string variable |
| $! | ( c s -- ) | Store string in string variable |
| $. | ( c -- ) | Type string |
| $C+! | ( char s -- ) | Add char to string variable |
Two string tools as implemented by Albert Nijhof.
-HEAD cuts the first 'i' characters from the given string.
-TAIL cuts the last 'i' characters from the given string.
\ Extra: cut i characters from a string, with underflow protection
: -TAIL ( adr len i -- adr len' )   0 max  over min  - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r  + r> ;
\ -HEAD and -TAIL do not store anything.
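The effect of the underflow protection is easiest to see in a few interactive lines. This is a sketch: the two definitions are copied from the text so that the block loads on its own, and S" is assumed to work in interpretation state.

```forth
: -TAIL ( adr len i -- adr len' )   0 max  over min  - ;
: -HEAD ( adr len i -- adr' len' )  0 max  over min  tuck - >r  + r> ;

s" Hello, world"  7 -HEAD type      \ prints world
s" Hello, world"  7 -TAIL type      \ prints Hello
s" abc"  99 -TAIL type              \ prints nothing: i is clamped to len
s" abc"  -5 -HEAD type              \ prints abc: negative i is treated as 0
```

Because i is clamped to the range 0..len, the resulting length can never become negative, whatever value the caller passes.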