=== Lesson 3 === \ Lesson 3 - How Forth Works \ The Forth Course \ by Richard E. Haskell \ Dept. of Computer Science and Engineering \ Oakland University, Rochester, MI 48309 comment: Lesson 3 HOW FORTH WORKS 3.1 VARIABLES 3-2 3.2 MORE ABOUT VARIABLES -- FETCH AND STORE 3-3 3.3 CONSTANTS 3-5 3.4 THE FORTH COLON WORD : 3-6 3.5 ARRAYS 3-7 3.6 THE RETURN STACK 3-8 3.7 CODE WORDS 3-9 3.8 THE FORTH DICTIONARY 3-10 3.9 TABLES 3-11 3.10 CHARACTER OR BYTE DATA 3-12 3-11 FINDING DICTIONARY ADDRESSES 3-13 3-12 THE NAME FIELD 3-14 3-13 OPERATION OF THE F-PC INNER INTERPRETER 3-15 EXERCISES 3-17 3.1 VARIABLES The Forth word VARIABLE is a defining word used to define variable names. If you type VARIABLE my.name then Forth will create a new dictionary entry called my.name. All dictionary entries have the same general form, consisting of a header (made up of a view field, a name field and a link field) and a body (made up of a code field and a parameter field). In F-PC the header and the body are physically stored in separate segments. (In the 8086/8088/80286 the 1 Mbyte address space is divided into 64 Kbyte segments. A physical address is specified by a 16-bit segment address, seg, and a 16-bit offset address, off. The total address is written as seg:off. The segment address can begin on any 16-byte boundary, called a paragraph. Therefore, to address any byte in memory, one has to specify both a segment address and an offset address.) For the word MY.NAME the dictionary entry will look like this HEADER ________ view field address ---> VFA | VIEW | |------| link field address ---> LFA | LINK | |------| name field address ---> NFA | 7 M | | Y . | | N A | | M E | |------| pointer to CFA -------> | ^CFA | -------| |------| | Head Segment YSEG | | BODY ________ | code field address ---> CFA | CODE | <------| |------| parameter field address ---> PFA | 0 0 | |------| Code Segment ?CS: The view field contains a character count equal to the offset address of the beginning of the colon definition within the file. It is used to locate the colon definition when you ask to VIEW the definition of a word. The link field contains a pointer to the LFA of a previously defined word. The name field contains the name, beginning with the character count and followed by 1-31 characters. The pointer to CFA contains the offset address of the CFA in the code segment. This segment address is given by the F-PC word ?CS:. The code field contains code that will be executed when the name in the name field is interpreted. This is called direct threaded code and is used by F-PC. Many Forths use indirect threaded code in which the code field contains a pointer to the code that is executed. For a VARIABLE the code field contains the three bytes corresponding to the instruction CALL >NEXT where >NEXT is the F-PC inner interpreter to be described later in this lesson. The CALL instruction automatically puts the address of the next instruction on the stack. In this case it is not the address of an instruction at all but rather the parameter field address. The parameter field contains different things for different kinds of words. For a VARIABLE word, the parameter field contains the 16-bit value of the variable. An initial value of zero is stored in the parameter field when a VARIABLE is defined. When you type the name of a VARIABLE the CALL >NEXT instruction in the code field will be exectuted which has the effect of leaving the parameter field address on the stack. If you type my.name . the PFA of my.name will be printed. Try it. 3.2 MORE ABOUT VARIABLES -- FETCH AND STORE Forth words: ! ( n addr -- ) ( "store" ) stores the value of n at addr 6 my.name ! will store the value 6 at the PFA of my.name. @ ( addr -- n ) ( "fetch" ) reads the value at addr and puts it on the stack. my.name @ . will print the value of my.name. Stack variables The system variable SP0 contains the value of the stack pointer for an empty stack. Thus, SP0 @ will return the address of the stack pointer when nothing is on the stack. The Forth word SP@ returns the address of the last item pushed on the stack. That is, it is the current value of the stack pointer. The Forth word DEPTH returns the number of items on the stack. It is defined as follows: : DEPTH ( -- n ) SP@ SP0 @ SWAP - 2/ ; Note that since each element on the stack is a word that contains two bytes, the number of bytes on the stack must be divided by 2 to give the number of words on the stack. 3.3 CONSTANTS The Forth word CONSTANT is a defining word that lets you define constants. For example, if you type 25 CONSTANT quarter the name quarter will be entered into the dictionary as follows: ________ VFA | VIEW | |------| LFA | LINK | |------| NFA | 7 Q | | U A | | R T | | E R | |------| | ^CFA | -------| |------| | Head Segment YSEG | | ________ | CFA | CODE | <------| |------| PFA | 25 | |------| Code Segment ?CS: The code field contains the three bytes corresponding to the instruction CALL DOCONSTANT. The code at DOCONSTANT pops the PFA from the stack (where it was put by the CALL instruction) and then pushes on the stack the value stored at the PFA. Thus, if you type 25 CONSTANT quarter and then type quarter . the value 25 will be printed. NOTE: The value stored in CONSTANTs are 16-bit signed numbers. 3.4 THE FORTH COLON WORD : The Forth word : ("colon") is also a defining word that lets you define new Forth words. When you type : squared DUP * ; the colon word : is executed. It creates your Forth word squared as a dictionary entry that looks like this: ________ ________ VFA | VIEW | |-------> | DUP | ES:0 |------| | |------| LFA | LINK | | | * | |------| ES = LSO + XSEG |------| NFA | 7 S | | |UNNEST| | Q U | | |------| | A R | | List Segment XSEG | E D | | |------| | | ^CFA | -------| | |------| | | Head Segment YSEG | | | | ________ | | CFA | CODE | <------| | |------| | PFA | LSO | --------------| |------| Code Segment ?CS: The code field contains the three bytes corresponding to the instruction JMP NEST. The code at NEST is part of the inner interpreter whose operation will be described later in this lesson. The parameter field contains a list segment offset, LSO, whose value is added to the base list segment address contained in the variable XSEG. The resulting segment address is stored in the register ES and the list of code field addresses making up the definition of squared is stored in memory starting at address ES:0. UNNEST is the address of another routine that is part of the Forth inner interpreter. 3.5 ARRAYS Suppose you want to create an array that contains five 16-bit values. If you type VARIABLE my.array Forth will create the dictionary entry my.array that contains a single 16-bit value in its parameter field. ________ | CFA | CODE | <------| |------| PFA | 00 | |------| Code Segment ?CS: Here we have not bothered to show the head segment associated with the definition of my.array. Note that the parameter field contains 2 bytes to store the 16-bit value. The Forth word ALLOT will add n bytes to the dictionary in the code segment where n is the value on the stack when ALLOT is executed. Thus, 8 ALLOT will add 8 bytes, or 4 words to the dictionary. The code segment part of the dictionary entry my.array will then look like this. _______________ | CFA | CODE | <------| |-------------| PFA | my.array(0) | |-------------| PFA + 2 | my.array(1) | |-------------| PFA + 4 | my.array(2) | |-------------| PFA + 6 | my.array(3) | |-------------| PFA + 8 | my.array(4) | |-------------| Code Segment ?CS: To print the value of my.array(3) you could type my.array 3 2* + @ . 3.6 THE RETURN STACK When you type a number it is put on the parameter stack. All of the arithmetic operations and words such as DUP, ROT, DROP, SWAP, and OVER operate on numbers that are on the parameter stack. Forth also has a second stack called the return stack. The return stack is used by the Forth inner interpreter to store the return address of the next word in a colon definition. It is also used by certain Forth words such as DO. You can also use the return stack IF YOU ARE CAREFUL. You can move a number from the parameter stack to the return stack temporarily if you are sure to move it back before the end of a colon definition. Otherwise, the proper return address will not be on top of the return stack, and the inner interpreter will not be able to find its way back to the next word to execute. The following Forth words use the return stack, R: >R ( n -- ) ( "to-R" ) pops the top element of the parameter stack and pushes it onto the return stack. Ex. 3 >R moves 3 to the top of the return stack and leaves the parameter stack empty. R> ( -- n ) ( "from-R" ) pops the top element of the return stack and pushes it onto the parameter stack. R@ ( -- n ) ( "R-fetch" ) copies the top element of the return stack to the top of the parameter stack. Here is a possible definition of ROT: : ROT ( n1 n2 n3 -- n2 n3 n1 ) >R \ n1 n2 SWAP \ n2 n1 R> \ n2 n1 n3 SWAP ; \ n2 n3 n1 3.7 CODE WORDS Code words are Forth words that are defined in terms of 8086 machine code rather than in terms of other Forth words. Of course, eventually some real 8086 machine code must be executed! The inner interpreter is written in machine code. In addition, many of the F-PC Forth words are written in machine code to maximize their speed of operation. In Lesson 7 we will show you how you can write your own Forth code words. Since F-PC uses direct-threading, the machine code in a code word is stored directly at the CFA in the code segment. Here are examples of how some F-PC primatives are defined. In each case the header of the word is stored in the head segment in the same way as described previously for VARIABLES, CONSTANTS, and colon definitions. DUP _______________ | CFA | POP AX | <------| |-------------| | PUSH AX | |-------------| | PUSH AX | |-------------| | JMP >NEXT | |-------------| Code Segment ?CS: SWAP _______________ | CFA | POP DX | <------| |-------------| | POP AX | |-------------| | PUSH DX | |-------------| | PUSH AX | |-------------| | JMP >NEXT | |-------------| Code Segment ?CS: @ ("Fetch") _______________ | CFA | POP BX | <------| |-------------| | PUSH [BX] | |-------------| | JMP >NEXT | |-------------| Code Segment ?CS: 3.8 THE FORTH DICTIONARY The Forth dictionary consists of a linked list of all words that have been defined. These words can be variables, constants, colon definitions or code words. The names of all of these words are stored in the head segment and linked by means of the link field pointers. The code field of each word is pointed to by the code field pointer in the header. The code field always contains real executable code and therefore must be in the 8086 code segment. In a colon definition the list of CFA's for each word in the definition is stored in a separate list segment that is pointed to by a pointer at the PFA in the code segment. When you define a new word using the colon definition, the compilation process consists of storing the word in the dictionary. F-PC links the name of your word into one of 64 threads using a hashing scheme to increase the speed at which words can be found when searching the dictionary. The next available address in the code segment of the dictionary is given by HERE. That is, HERE is a Forth word that returns the next available address on the stack. The variable DP, the dictionary pointer, contains this next available dictionary address. The word HERE is just : HERE ( -- n ) DP @ ; The outer interpreter is what is being executed when you boot up a Forth system and the ok prompt is displayed. When you type a word and press the outer interpreter searches the dictionary for the word you typed. If it finds the word, it executes it by executing the code located at the code field address. If it doesn't find the word, it executes a word called NUMBER that tries to convert your word to a valid number. If it succeeds, it pushes the value of the number on the stack; otherwise, it displays the <- What? message that tells you that is doesn't understand your word. We will take a closer look at the operation of the inner interpreter in Section 3.13. 3.9 TABLES A table is like a constant array. You could create an array and then fill the values of the array using the ! store word. Another way to create a table is to use the Forth word CREATE which works the same way as VARIABLE except that no space is reserved in the parameter field. For example, if you type CREATE table you will create the following dictionary entry. table ________ | CFA | CODE | <------| |------| PFA <---- HERE Code Segment ?CS: As with the case of a VARIABLE the code field contains the three bytes corresponding to the instruction CALL >NEXT where >NEXT is the F-PC inner interpreter. The CALL instruction will leave the parameter field address on the stack when the word table is called. At this point the dictionary pointer, DP, contains the value of the PFA of table. The forth word comma (,) will store the value on the stack at the location pointed to by the dictionary pointer; that is, at the next available location in the dictionary. Therefore, if you type CREATE table 5 , 8 , 23 , the following dictionary entry will be created: table ________ | CFA | CODE | <------| |------| PFA | 5 | 0 |------| | 8 | 1 |------| | 23 | 2 |------| Code Segment ?CS: You could now define a new word called @table as follows: : @table ( ix -- n ) 2* table \ 2*ix pfa + @ ; \ @(pfa + 2*ix) For example, 2 @table will return 23 to the top of the stack. 3.10 CHARACTER OR BYTE DATA Character (ASCII) data can be stored in one byte. Data can be stored and retieved from single bytes using the following Forth words: C, ( c -- ) ("C-comma") Store least significant byte (LSB) of value on top of the stack at HERE (next available location in the dictionary). C! ( c addr -- ) ("C-store") Store LSB of value on top of the stack at addr. C@ ( addr -- c ) ("C-fetch") Fetch the byte at addr. Leave as LSB on stack. You can create a table containing byte values rather than word values by typing CREATE table 5 C, 8 C, 23 C, You can then define a word called C@table as follows: : C@table ( ix -- c ) table + C@ ; 2 C@table will then return 23 to the top of the stack. Note the difference between C@table and @table defined in Section 3.9. 3.11 FINDING DICTIONARY ADDRESSES The following words can be used to locate and examine Forth dictionary entries: ' ( -- cfa ) ("tick") The statement ' table will leave the CFA of table on the stack. >NAME ( cfa -- nfa ) ("to-name") Converts the code field address, CFA, (in the code segment) to the name field address, NFA, (in the head segment). >LINK ( cfa -- lfa ) ("to-link") Converts the code field address, CFA, (in the code segment) to the link field address, LFA, (in the head segment). >BODY ( cfa -- pfa ) ("to-body") Converts the code field address, CFA, (in the code segment) to the parameter field address, PFA, (in the code segment). You can go to the code field address by using BODY> ( pfa -- cfa ) ("from-body") NAME> ( nfa -- cfa ) ("from-name") LINK> ( lfa -- cfa ) ("from-link") You can also go from NAME to LINK and from LINK to NAME using N>LINK ( nfa -- lfa ) ("name-to-link") L>NAME ( lfa -- nfa ) ("link-to-name") The Forth word HEX will change the base used to print out values to hexadecimal. The word DECIMAL will change the base back to decimal. You can change to any base by storing the value of the base in the variable BASE. For example, the definition of HEX is : HEX 16 BASE ! ; Note that the base must be DECIMAL when this definition of HEX is loaded. The Forth word U. prints the top value on the stack as an unsigned number between 0 and 65535, or if in the HEX mode, between 0000 and FFFF. As an example, to print the hex address of the name field of the word OVER type HEX ' OVER >NAME U. DECIMAL The Forth word LDUMP ( seg off #bytes -- ) can be used to get a hex dump of #bytes of memory starting at address seg:off. For example, type YSEG @ ' OVER >NAME 20 LDUMP and see if you can see the name field of OVER. 3.12 THE NAME FIELD When you define a new word such as TEST1 using a colon definition the following name field is created: Precedence bit ----| |--- Smudge bit Hex value | | ___________________________________ NFA | 1 | 0 | 0 | Name char count = 5 | 85 |---------------------------------| | 0 | Character #1 T = 54H (hex) | 54 |---------------------------------| | 0 | Character #2 E = 45H | 45 |---------------------------------| | 0 | Character #3 S = 53H | 53 |---------------------------------| | 0 | Character #4 T = 54H | 54 |---------------------------------| | 1 | Character #5 1 = 31H | B1 |---------------------------------| If the precedence bit is set, the word will be executed immediately. Immediate words will be discussed in Lesson 9. If the smudge bit is set, the word will not be able to be found in a dictionary search. This bit is set while a colon definition is being compiled. Type in the following dummy colon definition: : TEST1 ; Then examine the name field by typing YSEG @ ' TEST1 >NAME 10 LDUMP The six hex values shown in the figure above should be displayed. Note that the most significant bit is set to 1 in the first and last byte in the name field. The maximum number of characters actually stored in the name field is the value stored in the variable WIDTH. For example, 10 WIDTH ! will cause a maximum of 10 characters to be stored in the name field. F-PC sets the default value of WIDTH to 31 -- its maximum possible value. 3.13 OPERATION OF THE F-PC INNER INTERPRETER The following diagram illustates the operation of the F-PC inner interpreter. cubed ________ | CFA | CODE | <------| |------| _________ PFA | LSO | -------- +XSEG -------> | DUP | ES:0 |------| |-------| Code Segment ?CS: |<------- |squared| IP = ES:SI | |-------| | | * | | |-------| |<- W = AX = current word pointer <-| |UNNEST | | |-------| | List Segment XSEG | squared | ________ | |-----> CFA | CODE | <------| |------| ________ PFA | LSO | -------- +XSEG -------> | DUP | ES:0 |------| |------| Code Segment ?CS: | * | |------| |UNNEST| |------| List Segment XSEG >NEXT LODSW ES: \ Load AX with CFA at ES:SI & inc SI JMP AX \ Execute the code at the CFA in AX NEST XCHG BP,SP \ Push IP = ES:SI PUSH ES \ on the return stack PUSH SI XCHG BP,SP MOV DI,AX \ AX = CFA of word to execute MOV AX,3[DI] \ Get LSO at PFA ADD AX,XSEG \ and add to XSEG MOV ES,AX \ Put this sum in ES SUB SI,SI \ Make new IP = ES:SI = ES:0 JMP >NEXT \ Go to >NEST UNNEST XCHG BP,SP \ Pop IP = ES:SI POP SI \ from the return stack POP ES XCHG BP,SP JMP >NEXT \ Go to >NEXT The inner interpreter contains the three routines >NEXT, NEST, and UNNEST. An interpreter, or instruction pointer, IP, points to the memory location in the list segment containing the code field address of the next word to be executed. In F-PC this instruction pointer is made up of the two registers ES:SI. Suppose this is pointing to the CFA of squared in the definition of cubed as shown in the previous figure. The routine >NEXT puts this CFA into a word address register, W, (which is AX in F-PC) increments IP (SI) by 2 so that it will point to the next word (*) in the current definition, and then executes the code at the CFA in W. If this is a colon definition as it is in this case then the code at the CFA is a jump to the routine NEST. As shown above NEST will push IP (both ES and SI) on the return stack so that the program can later find its way back to the next word in cubed when UNNEST is executed. NEST then gets the list segment offset, LSO, for the word squared, adds it to the base list segment address in XSEG and stores this value in ES. It then sets SI to zero so that the new IP is given by ES:0, which points to the first word in the definition of squared. It then jumps to >NEXT which will repeat the process, this time executing the first word in squared, namely, DUP. Since DUP is a code word, its actual machine code is located at its CFA. This code will therefore be executed when >NEXT is executed. The last instruction in the definition of DUP is another jump to >NEXT. But now the IP will have been incremented to point to the CFA of *. Again this is a code word which will be executed and then jump to >NEXT again. The last word in a colon definition is UNNEST. The CFA of UNNEST is added to the list segment in the dictionary when the semi-colon in a colon definition is executed. The code field of UNNEST contains the machine code shown above which pops IP (SI and ES) from the return stack and then jumps to >NEXT. Since this is the IP pushed on the stack by NEST when squared was executed, it will point to the word following squared in the definition of cubed. This is the word * which is the next word to be executed. This is how Forth works. Colon definitions are stored as lists of CFA's in the list segment. When the CFA to be executed is that of another colon definition, the IP is incremented and pushed on the stack and then changed to point to the first CFA in the definition of the new word to be executed. When the CFA to be executed is that of a code word, then the actual machine code located at the CFA is executed. This process continues with a jump to >NEXT following the execution of each word. EXERCISE 3.1 Define the colon words : squared DUP * ; and : cubed DUP squared * ; Use the F-PC words ' ("tick"), >LINK ("to-link"), and LDUMP (seg off #bytes -- ) to answer the following questions: 1) What is code segment address ?CS:? 2) What is the head segment address stored in YSEG? 3) What is the list segment address stored in XSEG? 4) What is the CFA of squared? 5) What is the LFA of squared? 6) What is the NFA of squared? 7) What is the PFA of squared? 8) What is the CFA of cubed? 9) What is the LFA of cubed? 10) What is the NFA of cubed? 11) What is the PFA of cubed? 12) Draw a picture of the header of squared showing the hex values in all locations. What is the value stored in the ^CFA location? Draw a picture of CFA and PFA fields as well as the list segment dictionary entries for the word squared. Show all addresses and values stored in the dictionary. 13) What is the LFA of the dictionary entry defined just prior to cubed? What is the name of that word? 14) What is the address of NEST? 15) What is the CFA of DUP? 16) What is the CFA of *? 17) What is the address of UNNEST? comment;