Forth Lesson 5

From OLPC
Revision as of 04:52, 5 January 2007 by Quozl (talk | contribs) (tick lost in double quotes that look the same)
Jump to navigation Jump to search

Review

In the previous lesson, we learned that:

  • New words can be created with ": newname ...  ;"
  • The interpreter has a "compile state" used for making new words
  • New words are not "visible" until they are finished
  • The interpreter finds the most recent definition of a given name
  • Words can call themselves recursively
  • A word can call another word of the same name, creating a "chain"

Introspection

Today's lesson is about introspection - interactive exploration of the Forth system itself. Many Forth systems have some degree of introspection; the facilities in Open Firmware are fairly extensive.

Decompiler

The decompiler "see" displays a pretty-printed representation of the source code for the target word.

 ok see here
 : here
    dp @
 ;

In the example above, the target word was a system-defined colon definition. In addition to the colon definitions which we have already learned about, Forth has some other kinds of words which will learn about them later. The decompiler knows about those other kinds of words:

 ok see dp
 224 user dp  value = -1210197548 
 ok see bl
 32 constant bl 
 ok see +
 code + 
 b7db1430     pop     ebx
 b7db1431     pop     eax
 b7db1432     add     eax,ebx
 b7db1434     push    eax
 b7db1435     jmp     edi

The example above shows a word "+" which is implemented in assembly language. A few words at the core of the system are like this, but most of the several thousand words are implemented as colon definitions.

You can also use the disassembler for assembly language code that is not in the form of Forth code words. For example, if you have loaded a Linux kernel at the address 0x100000, you could inspect it with:

 ok 100000 dis

You can decompile the entire Forth system, down to the very roots. So in some sense, Open Firmware has always been "open source", since the very first Open Firmware system included the full decompiler. (Some will no doubt quibble about this claim; in practice, for all the "free speech" talk, most people really want "free beer".)

Look-ahead Words

Most Forth words take their arguments from the stack, but there are a few that get one or more string arguments by parsing ahead in the input stream. We have already seen some examples of this, notably comments and ":". "see" is another example. The amount of "look ahead" in such cases is typically fixed, not context-dependent. (You could write a context-dependent look-ahead word, but such usage is exceedingly rare in common Forth usage.)

For most "look-ahead" words, there is a variant version with a slightly different name that takes its arguments from the stack instead of from the input stream. The look-ahead versions are convenient for interactive use, whereas the stack-based versions are convenient for use within other words.

Wordfinder

How do you find the word you want among the thousands available?

 ok sifting valid
         In vocabulary  forth 
 (b7dcfc60) bp-address-valid?    (b7dce2d4) state-valid   
 (b7dce248) %state-valid         (b7dbbd04) ?block-valid   

The "sifting" command parses a whitespace-delimited string from the input stream and searches for words whose name have it as a substring.

You may see several lines like "In vocabulary forth", with other names in place of "forth". The system contains several "vocabularies" (lists of words). The "forth" vocabulary is by far the largest, containing all of the core Forth words. There are other vocabularies with words that implement special-purpose functions. There is a search order that controls which vocabularies the interpreter searches at any given time. A future less will describe vocabularies and search orders in more detail.

Open Firmware "device tree nodes" are implemented as vocabularies with a special structure. "sifting" looks only in the ordinary vocabularies, but there is a similar word for looking in device tree nodes:

 ok sift-devs show-b
         In device  /pci/nandflash@c
 (ff88db78) show-bbt

Callfinder

To find where a word is used,

 ok ' save-image .calls
                   Called from save-image                at  b7dca444
                   Called from save-forth                at  b7dca4e8

The first item in the list of usages is often the word itself. That does not necessarily mean that the word is recursive. Rather, it is an artifact of the algorithm that the callfinder uses. So don't believe the list completely. The callfinder will find every compiled-in use of the word, but it will sometimes get spurious "hits" too. The callfinder will *not* find transient usages, i.e. references to the word that are stored in variables.

Execution Tokens and "'"

The stack diagram of ".calls" is "( xt -- )". "xt" is the notation for "execution token", which is a numeric identifier that uniquely specifies a particular instance of a word. Recall that you can reuse the same word name several times, with the older instances remaining present in the system (possibly still called within other words). Each such instance has a distinct execution token. In many implementations, the execution token is the memory address of a data structure representing the word, but portable code must treat execution tokens as opaque identifiers.

"'" (a single quote) is a standard word that parses a whitespace-delimited name from the input stream and pushes the execution token of the most recent definition of that name. Normally, when the Forth interpreter encounters a word, it executes it, but there are times when you want to refer to the word instead of executing it. That is what "'" is for. You can think of it as "quoting" the word. Note that "'" applies to already-defined Forth words, not to arbitrary text strings. If you try to apply "'" to a string that is not a defined word, you will get an error message.

Referring back to the "sifting" example above, the number in parentheses before each word is its execution token. You can use this to get a handle on words whose names have been redefined. "(see)" is a variant of "see" whose argument is an execution token from the stack instead of a string from the input stream:

 ok b7dce2d4 (see)

The number above comes from the "sifting valid" example; it happens to be the execution token of "state-valid" in the system I'm using right now. In general, that number probably won't work on your system. But you could say:

 ok ' here (see)

"sifting <something>" then "<xt> (see)" (where <xt> is a number you type based on looking at the output of "sifting") is a good way to look at the definitions of superseded versions of words.

"'" inside definitions

Inside a colon definition, "'" won't do what you probably expect.

 ok : myword   ' save-image .calls  ;

You might expect your new word "myword" to be equivalent to interactively executing:

 ok ' save-image .calls

but that is not what happens. What happens is that "'" parses the input stream when "myword" is executed, not when "myword" is compiled.

Here is how to write what you probably meant:

 ok : myword  ['] save-image .calls  ;

"[']" is a variant of "'" that does the look-ahead parsing during compilation.

Yes, I realize this is confusing. Just use "'" outside colon definitions and "[']" inside, and the right thing will happen.

Thus endeth the lesson