Back to
Main page for the programming language Joy
by Manfred von Thun
Revised March 2003
Contents
Abstract Joy is a functional programming language which is not based on the application of functions to arguments but on the composition of functions. This paper serves as a manual for the current prototype implementation. Other implementations might differ in some details.
The prototype implementation of Joy is written in unadorned C. It is not the first implementation, earlier ones have been written in Pascal, Prolog and Lisp. A small version in C was written by a group of students. The current prototype bears almost no resemblance to these earlier versions.
This manual includes references to John Cowan's (2001) major extension to the earlier version of the prototype, and also to some additions by Nick Forde. Their help has been invaluable, and it is a pleasure to acknowledge their support. I am also grateful to many other members of the "concatenative" mailing group for finding and fixing numerous bugs.
The source files for the current implementation are freely available, and can be downloaded in various formats by following links from the main page.
In the next three sections the following (extended) BNF notation is used for the grammar: A BNF production consists of a BNF nonterminal followed by ::= (pronounced "can be"), followed by a BNF expression. A BNF expression consists of one or more BNF terms separated by the alternation symbol | pronounced "or". A BNF term consists of one or more BNF factors concatenated. A BNF factor can be a non-terminal, or a BNF terminal enclosed in double quotes. It can also be an English phrase. A BNF factor can also be a BNF expression enclosed in parentheses () to override precedences, or in curly braces {} to indicate zero or more repetitions, or in square brackets [] to indicate options.
The remainder of this paper gives a provisional manual for the prototype implementation. The focus of this manual is on general structure, and it leaves out many details that have been described elsewhere. Also, the manual should be understood to be provisional; other implementations of Joy might differ considerably. The first few sections to follow describe the syntax and the general semantics of this implementation of Joy. The later sections describe some very specific features that have not been covered elsewhere might not be present in other implementations.
This section describes the lexicon of Joy, the individual tokens which make up a program.
A Joy program consists of an ASCII file,
which consists of lines of characters.
Lines beginning with the escape character $
are not processed by the Joy interpreter
but are dispatched to the command shell.
Under VMS
typical uses are the following:
$ show time $ directory $ edit myfile.joyUnder Unix the corresponding uses are:
$ date $ ls $ vi myfile.joyWhen the request has been processed, control returns to the Joy system.
Lines not beginning with the escape character are processed
by the Joy system.
There are two styles of comments:
parenthesis style and end-of-line style.
The first kind of comments are enclosed in starred parentheses,
(* and
(* this is a comment continued on the second line *)The second kind of comments occur after the hash-symbol
# this comment terminates at the end of this line # and this comment terminates hereComments are not further processed but are treated just like blank space.
An input line consists of a number of tokens.
Some of these are single characters,
the reserved characters.
They are the two (square) brackets [ and ] ,
the two (curly) braces { and },
the semicolon ; and the period . :
reserved-character ::= "[" | "]" | "{" | "}" | ";" | "."
Another kind of token is that of integer constant,
a string of one or more decimal digits "0" .. "9"
possibly immediately preceded by a unary minus "-".
integer-constant ::= [ "-" ] ( "0" | "1" .. | "9" ) { "0" | "1" .. | "9" }An integer constant can contain up to 10 digits but must not exceed the platform dependent maximum integer (generally $2^{32}-1$).
Apart from integers, the current implementation of Joy also has real numbers or "floats". In the simplest form a float constant consists of a string of one or more decimal digits, then a decimal point . and then at least one more decimal digit or possibly more. As for integer constants, a float constant can optionally be preceded by a unary minus. For scientific notation, a float constant can be followed by a scale factor, either of the letters "e" or "E" followed by one or more digits.
Instead of writing integers or floats in decimal notation, they can be preceded by "0" to signify octal notation (using digits "0" .. "7"), or preceded by "0x" pr "0X" to signify hexadecimal notation (using digits "0" .. "9" "A" .. "F").
Others tokens are string constants and character constants.
Some character sequences beginning with the backslash "\" character
inside string constants or character constants
are treated as escaped characters.
A character other than the backslash character is an ordinary character.
escaped-character ::= "\n" newline | "\t" tab | "\b" backspace | "\r" carriage return | "\f" formfeed | "\'" single quote | "\"" double quote | "\ddd" ASCII whose value is ddd where ddd are three decimal digits ordinary-character ::= any character except "\"A string constant begins and ends with the double quote character ", enclosing any sequence of zero or more special characters or ordinary characters. A character constant begins with the single quote character ' followed by a special character or an ordinary character; there is no closing single quote.
string-constant ::= '"' { escaped-character | ordinary-character } '"' character-constant ::= "'" ( escaped-character | ordinary-character )
Of the remaining tokens
some are the reserved symbols:
reserved-word ::= "==" | "MODULE" | "PRIVATE" | "PUBLIC" | "END" | "HIDE" | "IN" | "DEFINE" | "LIBRA"The reserved words "HIDE", "IN", "DEFINE" and "LIBRA" are currently synonyms for "PRIVATE" and "PUBLIC", and they may eventually declared obsolescent.
Any other token beginning with a printing character
and possibly continuing with upper or lower case letters,
digits or the three characters "=", "_" or "-"
is an atomic symbol.
atomic-symbol ::= any printing character which is not a reserved character or a digit or the single or double quote character { letter | digit | "=" | "_" | "-" }
To summarise, here is the definition of tokens:
token ::= reserved-character | reserved-word | integer-constant | float-constant | character-constant | string-constant | atomic-symbol
Individual tokens are used to built up two kinds of syntactic entities called factors and terms. Their definitions are mutually recursive.
A factor can be a single token such as
an atom or one of the three constants:
integer constant, character constant or string constant.
A factor can also consists of a (possibly empty)
sequence of integer or character constants
enclosed in the curly brace tokens { and }.
Finally, a factor can also be
a (possibly empty) term enclosed in the square bracket tokens
[ and }.
factor ::= atomic-symbol | integer-constant | float-constant | character-constant | string-constant | "{" { character-constant | integer-constant } "}" | "[" term "]"A term consists of zero or more factors.
term ::= { factor }
For the semantics another classification is essential.
Factors are either literals or operators or combinators.
A literal can be one of the two atomic symbols
"true" or "false" -- these are the logical literals.
A literal can also be one of the four constants
and these are then called
integer literal,
float literal, character literal or string literal.
A literal can also consist of the braces enclosing
characters or integers - these are the set literals.
Finally a literal can consist of the brackets enclosing a term
- these are the quotation literals or simply quotations.
If the term consists just of literals,
the factor is called a list list or just a list.
literal ::= "true" | "false" | integer-constant | float-constant | character-constant | string-constant | "{" { character-constant | integer-constant } "}" | "[" term "]"The other atomic factors are either built in or defined. All built in atomic factors are either operators or combinators. Most defined atomic factors are likely also to be operators or combinators, but it is possible to define atomic symbols which are literals.
factors and terms denote functions which take one argument and yield one value or result. The argument and the result consist principally of a stack of values, but have as minor components several input files and one output file. The stack is normally a sequence of values of the kind denoted by constants and thus it resembles a list. But in (rather rare) cases the stack can contain operators or combinators. So the stack is essentially the same thing as a term. Most of the functions affect only the stack component. An empty term denotes the identity function. A non-empty term, consisting of one or more factors concatenated, denotes the composition of the functions denoted by the factors. The functions denoted by the factors are applied in the order in which the factors are written.
All literal factors denote functions yielding as result a stack component which is like the argument stack except that the value of the literal has been pushed on top of the stack. The operator factors expect as argument a stack which has to contain one or more values. They denote functions yielding as a result another stack which is like the argument stack except that the top few values have been replaced by others. The combinator factors are like the operator factors in that they expect a stack containing certain values. Specifically, the combinator factors expect a stack containing one or more quotations and possibly some other values. Combinators will execute the terms contained in the quotations.
The file Online manual (raw Joy1) gives a very synoptic explanation of the inbuilt primitives of raw Joy1 -- that is, excluding anything defined in one or the other library. The arrow -> indicates that the operator or combinator that is being described expects on top of the stack the items on the left of the arrow and returns a stack on top of which are the items on the right. The content of this file is identical in content with that obtained by the manual command.
Often one needs not just one definition but a sequence
of definitions, and these are then separated by semicolons ";".
definition-sequence ::= definition { ";" definition }
Definitions can be of two kinds:
simple or compound.
A simple definition consists of a user-chosen atomic symbol
followed by the == token followed by a term:
simple-definition ::= atomic-symbol "==" termA simple definition introduces the new atomic symbol into the current working dictionary. The initial dictionary is empty. After the definition the new symbol can be used, and if executed its effect is that of executing the term.
A compound definition consists of
1) optionally the MODULE token followed by
an atomic symbol, then
2) optionally either the PRIVATE token
followed by a definition sequence, then
3) optionally the PUBLIC token
followed by a definition sequence, and
finally either the "." (period) token or the END token.
At least one of the three options must be present:
compound-definition ::= [ "MODULE" atomic-symbol ] [ "PRIVATE" definition-sequence ] [ "PUBLIC" definition sequence ] ( "." | "END" )Instead of PRIVATE...PUBLIC... the combination HIDE...IN... can be used, and instead of PUBLIC... the syntax LIBRA... or DEFINE... can be used. These alternative symbols may become obsolscent in the future, but will be supported for the time being.
In the simplest case only the PUBLIC-part is present.
In that case all definitions in the definition sequence are
entered into the symbol table.
If there is a PRIVATE-part,
then any definitions in its definition sequence are
entered into the symbol tableand are then available
for use by the PUBLIC-part.
However, the PRIVATE definitions are
then obscured at the end of the PUBLIC
definition sequence and are then not available any further.
If there is a MODULE-part,
then all subsequent PRIVATE and PUBLIC
definition are visible as before except
that at the end of the PUBLIC definition sequence
all definitions are obscured except the name of the module.
In that case the definitions of the PUBLIC-part
can only be accessed as factors by what is called
"qualified access", in the form
m . f or m.fwhere m and f are respectively the name of the module and of a function from its PUBLIC-part.
In many uses of Joy the initial input will be from the terminal keyboard
and the output will be to the terminal screen.
Operating systems will generally allow the input and the output
to be redirected,
so that the input comes from a disk file
or the output goes to a disk file, or both.
Whatever the actual input file is,
it will consist of a a number of requests to the Joy system.
A request is either a compound definition or it is a term
followed by either the END token or just a period ".".
cycle == { compound-definition | term ( "END" | "." ) }In the first case the compound definition is processed and control returns to the top level. In the second case the term is executed on the current stack. In the default mode the top value on the stack is then written the output file, normally the screen. If several terms are executed in succession, each term inherits from the preceding term the remainder of the stack. This can be useful when debugging larger programs. On the other hand, every library declaration clears the stack. The inbuilt factor abort returns to the main cycle, but it leaves the stack intact.
$ joyand under VMS it might be
$ RUN JOYor, if "JOY" has been appropriately defined as a symbol,
$ JOYThe Joy interpreter will then announce itself with a brief banner. In the simplest case the Joy interpreter will then use the keyboard as the first input device and start the main cycle by taking input from there. Output will be to the screen.
However, although this is the simplest case, it is not the most common. If the current directory contains a file by the name "usrlib.joy", this will become the first input file, and the main cycle will start reading requests from there. Typically this file will contain some definitions specific to the particular user, or to the particular installation, or to the particular operating system. Typically this file will also contain a directive to continue by reading some of the standard libraries, or to continue with some other files. When these requests have been serviced, the main cycle will continue by reading inputs from the keyboard, unless the main cycle has been terminated before that happens.
Most operating systems allow input to and output from
a program to be "redirected" to come not from the keyboard
or go to the screen, but to come from and go to a file.
In Unix the Joy interpreter can be run with the input
redirected from a file, say "my-input.joy",
like this:
$ joy <my-input.joyIt is also possible to redirect the output to go to, say "my-output.txt", like this:
$ joy <my-input.joy >my-output.txtThe same can also be done in VMS, but here it is necessary that both redirections occur:
$ JOY /INPUT=MY-INPUT.JOY /OUTPUT=MY-OUTPUT.TXtWith redirection of the input the behaviour of the Joy interpreter again depends on whether the current directory contains a file "usrlib.joy". If there is such file, then that is read first, and only then the main cycle reads not from the keyboard but from the redirected file. If there is no file "usrlib.joy", the main cycle will immediately read from the redirected file.
Under some operating systems it is possible to give what are called
command line arguments to a program that is to run.
This is very familiar under Unix,
where the Joy interpreter might be run like this:
$ joy P1 P2 P3with in this case three arguments P1 P2 P3. Under VMS it is also possible to define programs as "foreign commands" by a definition (possibly in login.com):
$ JOY == "$DSK-XYZ[MYNAME.JOYDIR]JOY.EXE"where DSK-XYZ, MYNAME and JOYDIR are respectively the user's disk, the user's name and the directory on which the executable interpreter is located. The joy interpreter is then run with parameters P1 P2 P3 by
$ JOY P1 P2 P3
When the Joy interpreter is run with command line arguments, the first parameter P1 must be a valid file name. The interpreter then attempts to open that file for reading, and if that attempt fails it exits. If the attempt succeeds, all further commands are taken not from standard input (possibly redirected) but from that file. This means that any explicit input from the "get" command can still come from standard input.
When Joy is run with command line parameters, say by
$ joy test.joy mary 42the number of parameters is available by the operator argc which pushes an "argument count", in this case 3. Also, a list of the parameters is available by the operator argv which pushes the "argument vector", in this case the list [ "test.joy" "mary" "42" ].
Under Unix and under VMS one can write files that are to be executed by the command interpreter and whose first line calls Joy and uses the rest of the file s input to Joy. In Unix these are "here documents", for VMS they are ".com" files. With the command line arguments to Joy there is another, more elegant, possibility using the "#!" facility (interpreted by the Unix exec system call). For Joy such a file would have to start with "#!joy" as the first line, and the rest of the file as input to Joy. The file gcd is an example which uses Joy to compute the greatest common divisor. After it has been created, it has to be made executable with
$ chmod +x gcdand thereafter it can be run with command lines such as
$ gcd 45 100which will give the answer 5.
For many applications no explicit output operation is needed because in the default mode the main cycle will always write the value of the top of stack to the output stream. But sometimes explicit output is useful. There are two basic output operators: put will remove a value of any type from the stack and write it to the output stream. putch will remove a value of a numeric type (character or integer) and write it as a single ASCII character to the output stream. Other useful output operations are defined in the library.
In the default mode the Joy system will write the value of the
top of stack item to the output.
The mode is under the control of a flag with value 0 and 1.
If the value is 0,
no automatic output occurs.
If the value is 1, the default,
output of the top value occurs after the evaluation of every term
in the main cycle.
If the value is 2,
the entire stack is output after such an evaluation.
The value of this flag can be changed by the setautoput operator,
typical use is
0 setautoput.
Most Joy programs operate on a stack, several input files and one output file.
The initial input comes either from the keyboard
or it can be redirected to come from a diskfile,
or it can be not from standard input but from
a file specified as a command line argument.
Often it is useful to include other files, typically diskfiles,
in the input.
Joy has an include operator for this purpose.
It takes a string on the stack
as a parameter and opens a file with that name and henceforth accepts
input from there.
A typical use might be
"test.joy" include.Included files can contain library definitions or terms to be executed. They can contain other include operators, up to a level of 10. When the end of an included file is reached, input reverts to the previous file which contained the relevant include operator. The include files therefore form a stack.
Users will often want their own selection of
systems or private libraries read in automatically
when the system starts up.
For this purpose the Joy system always
starts by checking whether a file "usrlib.joy"
exists in the user's current directory.
That file can be empty,
but it can contain user specific definitions
and it can contain particular terms to be executed.
One such term might be a directive to
include yet another file, perhaps
"inilib.joy" include.which transfers input to the initial library, inilib.joy, one of the systems libraries for Joy. These libraries are described in the next section.
If input does not come from the keyboard but from an include file,
it is often useful to see the current input on the screen.
Normally the Joy system does not echo keyboard or other input
on the screen or output file,
but it can be made to do so.
Joy has an internal echoflag
with values 0, 1, 2 or 3..
The initial default value is 0,
and then no echo occurs.
If the value is 1,
then input lines are echoed to the output without change.
If the value is 2,
then each line is preceded by a tab (ASCII 9) --
this can help to distinguish echoed lines in the output from others.
If the value is 3,
then each tab is preceded by a line number specific to the current input file.
When input reverts to an earlier file,
the earlier line numbering is resumed.
The value of the echo flag can be set
by the operator setecho.
Typically the value will be set by something like
2 setecho.
When a Joy program is processed, two kinds of programming errors can occur. The first kind can occur during reading of the input file. Because the syntax of Joy is so simple, there are only few possibilities for error. The second kind can occur at runtime when a term is being executed. These errors concern incorrect parameters for operators and combinators. Because there are many kinds of operators and combinators, the possibilities for runtime errors are great.
Errors of the first kind occur during reading a program. Since the input line is still available at that time, it is possible for the error message to point to the exact place at which the error occurred. If input comes from the initial input file, most likely the terminal, it is not repeated. In other case it is repeated, including its line number. The next line then consists just of an up-arrow "^" pointing to the error position. Following that is a line containing the specific error message.
The following (rare) message can be reported when reading tokens,
it occurs when the second or third digit
in an escaped character is missing.
digit expectedThe following errors can occur when reading factors:
numeric expected in set ']' expected a factor cannot begin with this symbolThe following errors can occur when reading HIDE declarations:
IN expected in HIDE-definition END expected in HIDE-definitionThe following errors can occur when reading definitions:
atom expected at start of definition == expected in definition
All other errors are runtime errors.
These are given in the form:
"run time error: " MESSAGE " needed for " NAMETwo examples are
run time error: fewer include files needed for include run time error: valid file name needed for includeIn general, the NAME is the name of the operator or combinator whose execution was attempted but failed. The MESSAGE is often the same for many operators or combinators. It can be one of the following:
one parameter two parameters three parameters four parameters four parameters quotation as top parameter quotation as second parameter quotation as third parameter quotation as fourth parameter two parameters of the same type integer two integers numeric numeric second parameter non-zero operand list list as second parameter user defined symbol internal list small numeric non-empty set non-empty string non-empty list smaller index aggregate parameter different type
Some apparently inexplicable runtime errors can occur when reading an included library file which is syntactically correct and hence does not produce an error message as it is read in. The program can still contain a logical error which causes something unintended to be executed that produces a run time error. In that case it is helpful to use setecho to set the echoflag to 3, resulting in line numbers to be printed as the included file is read in. There have been two occasions where this trick helped clear up a problem.
The Joy system has a growing number of library files containing definitions which extend the core Joy language. he following is a brief description of the various libraries in the current distribution. The actual content of all the libraries is still subject to minor change, so these descriptions should only be taken to be a rough guide.
The first two libraries below are very atypical: the first because it does not even have to exist at all, the second because it is assumed by just about all other libraries and hence is quite non-specific.
This user library is intended to be just an example of what a personalised file usrlib.joy might look like. When Joy is run with input coming from the standard input stream, on startup Joy will first determine whether there is a file usrlib.joy in the current directory. If there is, it is opened for reading. When the end is reached, or when there is no such file, the terminal or the designated inpu file is used. For any particular user this file might look very different -- it might be empty, or it might might contain just a directive to include the library described next, or it might contain fewer or it might contain more definitions, or it might contain definitions of a very different kind.
This initial library contains a disparate collection of operators and combinators that are widely useful. Most are not dependent on particular types. Some of them might eventually be incorporated as primitives into the Joy language. Currently there are a few simple operators to help with explicit input and output, some operators for dates and times, some operators on predicates, and some useful combinators. Finally there are some utilities to help loading and managing libraries. All other libraries wiil ensure either directly or indirectly that this library is also loaded.
The following basic libraries contain definitions of still rather general operators and combinators but related to specific types or groups of types.
This aggregate library contains definitions of operators and combinators most of which are applicable to all of the three aggregate datatypes such as lists, strings and sets. (Note that the next library seqlib.joy is specifically for sequence datatypes such as lists and strings.)
The agglib.joy library contains definitions of several unary and binary operators for building special cases of these datatypes, and for accessing parts of these datatypes. There are also some ("dipped") variants which operate not on the top of the stack but one below. The combinators in this library are mainly for traversing these datatypes. (At the end there are a few definitions of statistical operators, but in the future these might be moved to a more extensive statistics library.)
This library contains definitions of functions for reversing lists or strings, for flattening lists of sequences, for generating lists of various subsquences of sequences, sorting and merging of sequences and inserting into and deleting from sequences. (At the end there are a few definitions of functions applicable to trees or recursive lists, but in the future these might be moved to a more extensive tree library.)
The followiing special libraries contain definitions of operators and combinators that are very specific to a particular field or problem.
The file allhelp.html is the output of the help command when all libraries have been loaded. It gives some impression of the size of the Joy implementation so far.
The file joylibs.tar.gz is a gzipped tar file of all *.joy libraries, input test files and output files. This file is automatically updated when there is any change to the Joy distribution.
The following libraries are external to the main Joy web pages. They are part of the "files" folder for the "concatenative" yahoo mailing group. To access them you need to join the group.
Joy has a help command to remind users of the names
of currently defined symbols.
The command does not expect any parameters on the stack,
and it does not affect the stack.
The effect is to write all user defined symbols and all
inbuilt symbols to the output which is normally the screen.
The list begins with the most recently defined symbols,
typically those specially defined just for the current run,
then the library symbols and finally the inbuilt symbols.
The help command is just a factor:
help
More details about a particular symbol can be obtained by
the helpdetail operator.
It expects a list of atoms on the stack
and gives a short message for each symbol.
If the symbol is defined,
then the message consists of the definition for the symbol.
If the symbols is inbuilt,
then the message consists of a brief description of its usage.
[ swap step product ] helpdetail.asks for detailed help about the inbuilt operator swap, the inbuilt combinator step and the defined operator product. The message is
swap : X Y -> Y X Interchanges X and Y on top of the stack. step : A [P] -> ... Pushing successive members of aggregate A onto stack, executes P for each. product == 1 swap [*] stepFor the inbuilt primitives the -> arrow indicates the type of the symbol. To the left of the arrow are the required parameters in ascending order. To the right of the arrow are the results, where possible. If that cannot be specified in this simple notation, a dieresis ... is used.
As described in section 2,
Joy already allows escapes to the operating system at the level of input lines.
Another such escape is at the level of Joy operators.
The system operator expects a string as a parameter
and passes that to the operating system.
A typical use might be:
"directory" systemThe string parameter does not have to be pushed as a literal, it can equally well have been constructed. For example, the system operator can be used as in the following. My personal USRLIB.JOY contains the definition
editor == "EDIT/TECO "The general STDLIB.JOY file contains
edit == dup editor swoncat system includeThen, to edit and immediately read in a file, say TEST.JOY, I can use the Joy command
"TEST.JOY" editThis calls the edit operator to make a duplicate of the string "TEST.JOY". One copy is concatenated with the editor string, and the result passed on to the system. That will call my favourite editor to edit the file. When the editing is finished, the original string "TEST.JOY" is used to include the edited file. Note that other users might put a different name of their editor in their own USRLIB.JOY. For example, Unix users would probably put
editor == "vi "The definition of edit in STDLIB.JOY remains unchanged.