Emu.
Embeddable Utility Language
v1.0

by Aaron Kimball


Introduction:

Text files are common, but they are only text: they are inherently inflexible because they cannot be executed. Sometimes, however, such files need to be modified quickly, for example by a computer program.

Emu allows rapid creation of scripts that can be embedded within text files or run as standalone programs.

Emu scripts can:

This short guide will explain the key features of the Emu language and get the aspiring Emu Programmer on his or her feet.

Good luck!

-- Aaron Kimball


Contents:

1. Scripts and syntax
2. Reserved words
3. Variables
4. Functions
5. Function arguments
6. Flow control
7. Function Libraries, and Expansion Library Authoring

Appendices:

A. Built-in Functions
B. Operator Precedence
C. Command-line Switches


1. Scripts and syntax

Any text document can be used as an Emu script. When a document is taken as input, its text is echoed verbatim, without parsing.

The Emu scripts themselves are contained within <$ and $>. When the interpreter sees a <$, it starts parsing the input as a script. When it sees a $>, it switches back into straight-echo mode. Therefore, the following can be considered a script:

hello.emu:
                This is raw text
                <$ print "Hello, world!\n" $>
                This is more raw text

This script, when executed, would produce the following output:

                This is raw text
                Hello, world!
                This is more raw text

The '\n' is called an escape sequence. This tells the interpreter to produce a character which cannot be typed in directly. \n is an escape sequence for 'newline'. Newlines (like when you hit the Enter key) are stripped out of scripts by the interpreter.

Some escape sequences:

Because carriage returns are ignored, function calls do not end at the end of a line (there are a few exceptions, in which a carriage return is needed, but more on that later). Function calls are terminated by a ; (semicolon). Our script above could just as easily have been written:


                This is raw text
                <$ print "Hello, world!\n"; $>
                This is more raw text

However, Emu will allow you to leave off the final semicolon, as long as there is a $> instead. If we needed to perform two function calls, we would have to use the semicolon:

                This is raw text
                <$ print "Hello, world!\n";
                   print "This is more script!\n"; $>
                This is more raw text

Comments are allowed in your code; these will help you remember what you are doing, or what you need to do. A comment is any line that begins with a '#'. Therefore, the following would be the same script as before:

                This is raw text
                <$ print "Hello, world!\n";
                # I am a comment
                   print "This is more script!\n"; $>
                This is more raw text

The output would be the same.


2. Reserved words:

Some keywords are reserved as part of the language, and can not be used to name variables or functions. These are:

and
array
break
if
int
integer
else
elseif
endif
eq
for
func
ge
gt
is
le
local
loop
lt
ne
or
real
retnow
return
uint
uinteger
while

Additionally, some constants and variables are set by the interpreter, and may be read, but not written:

input_available - Is a constant (0 or 1) set by the interpreter, denoting whether Emu may accept user input. (If started as a background process, this is set to 0, and input functions will return 0 or a null string, depending upon variable type.)

ret - Contains the return value of the function just called (set by 'return' or 'retnow').

sockets_available - Is a constant (0 or 1) set by the interpreter, based upon availability of sockets in this implementation of the language.

termfuncs_available - Is a constant (0 or 1) set by the interpreter, based upon availability of terminal function control (such as gotoxy, clrscr, etc).
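
As a rough sketch of how these constants might be used, a script can test termfuncs_available before calling the terminal functions listed in Appendix A. (The if-block syntax is covered in Section 6. The coordinates passed to gotoxy are arbitrary; this manual does not say whether screen coordinates start at 0 or 1.)

		<$
		if termfuncs_available eq 1
			clrscr;
			gotoxy 1, 1;
		endif
		print "Ready.\n";
		$>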


3. Variables

Variables are spaces where data can be stored. Variables are given a type, so that the interpreter knows how to read the data. There are four basic types in Emu: int, uint, real, and str.

These four keywords are used to create variables for later use. A script could thus read:

                <$
                int x;

                x = 5 + 23;
                print x;
                $>

The interpreter would then produce the following output:

                28

Basic algebraic operations can be performed on real, int, or uint variables. Expressions are evaluated left-to-right, with precedence given to order of operations.

The following operators may be used: (listed in order of evaluation)

        =               assignment
        [ ]             array subscript
        ( )             parentheses (force first evaluation) (may be nested)
        * and /         multiplication and division
        + and -         addition and subtraction
        &, |, and ^     binary AND, OR, and XOR

Anywhere one can use a number, one can use a variable of the same type. The opposite is also true. Therefore, one can write the following:

                <$
                real x,y,z;

                x = 4;
                y = 2;
                z = (x + y) * (x*y) - 2/x;
                print z;
                $>

The result is:

                47.5

Sometimes you may need to store a list of numbers or strings. Such lists are called arrays. They are defined by using the 'array' keyword in conjunction with an appropriate type. Thus, if you wanted to store a list of numbers, you might type:

                <$
                real array x;
                # Use x here..
                $>

To access a member of an array, you must give the array a subscript. A subscript is a number which is an index into the array. The first member of an array is element 0, so it is referenced as x[0]. The 50th element is x[49]. (Remember: numbers, variables, and math are interchangeable. You could also write that as x[20+29], or x[y+29], if y=20.)

By defining variables the way we just did, we leave their values a mystery. The interpreter does not automatically initialize their values. However, we can tell the interpreter to initialize them, by using an = sign. Look two examples up. This can be rewritten:

                <$
                real x=4,y=2,z;
                z = (x + y) * (x*y) - 2/x;
                print z;
                $>

Simpler, eh?

Arrays can be initialized too. If you would like to only initialize the first value of an array, you can use the same syntax you just learned. If you are initializing more than one value, however, they must be enclosed in braces.

                <$
                real array n = 4;
                real array x = {5, 8, 245, 23, 66, 90};
                # Use the values here.
                $>

In this example, n[0] = 4, and n[1], etc, are undefined. Also, x[0] = 5, x[1] = 8... up to x[5] = 90. x[6] remains uninitialized.

Memory allocation is done completely on-the-fly. Unlike other languages, such as C, the boundaries of arrays need not be explicitly declared. Because of this, you can use as many array values as you wish. In the above example, we only defined n[0]. Later on in that script, it would be quite valid to have a line such as:

		n[1] = 6;

This would automatically allocate a new slot for n[1] if needed. Additionally, you do not need to initialize elements consecutively. It would also be valid to have a line such as:

		n[24] = n[0] + 1;

This would allocate n[2] through n[24]. (Be careful! Don't use large indices just because you can! The interpreter initializes all values up to and including the one you referenced, so it can use up memory quickly!) If you use an array value that you have not defined in an equation, the interpreter will create it and set it equal to zero. Therefore, you can do the following (but it isn't recommended):

		n[25] = n[0] + n[26];
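
As a small sketch of on-the-fly allocation (the values are arbitrary, and it assumes that an int array behaves like the real arrays shown above), the following declares an array with one initialized element and then assigns to an element well past it:

		<$
		int array n = 4;
		n[3] = n[0] + 1;
		print "n[0]=",n[0]," n[3]=",n[3],"\n";
		$>

If the rules described above hold, this should print n[0]=4 n[3]=5, with n[1] and n[2] allocated along the way.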

Now, all of the variables declared in those examples above are "global" variables. Anywhere in the program, following their declaration, those variables can be used. When using functions, however, you sometimes want a fresh version of the variable for each time the function is called (especially if the function calls itself recursively!). This can be achieved using local variables. By prefacing the variable declaration with the local keyword, the variable is local to the function in which it is defined. (Note: this has no effect if not in a function.) A local variable is only visible to the function that defines it, and the variable vanishes after the function returns.

Important: local variables supersede global variables! If a global variable 'foo' is defined, and a function then declares a local instance of foo, only the local one will be seen by the function. The global one is "blocked out." (Note: some sort of namespace separator may be added in a later version.)
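
The following is a minimal sketch of this rule. The function name show_local and the variable name count are made up for illustration, and the func/return syntax is explained in Section 4:

		<$
		int count = 10;

		func show_local
			local int count = 1;
			print "inside: ",count,"\n";
		return

		show_local;
		print "outside: ",count,"\n";
		$>

Assuming locals behave as described, the function should see only its own count (1), and the global count (10) should be unchanged after the call.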


4. Functions

Functions are how we manipulate data. While math is done without functions, Emu scripts would be pretty boring if all we could do is assign to variables! Functions, such as 'print', allow us to do more things. Emu comes with several functions built in (a list is at the end of this manual), but you can also write your own.

A function is, basically, a command that does something. It may or may not take data; if it does, it uses the data it is given.

Let's look at the print function. The print function takes one or more pieces of data (called arguments) and prints them to the screen.

The following are all legal uses of print:

                <$
                int x=4,y=7,z=12;
                print "Hello!\n";
                print x+y;
                print 26;
                print "x equals: ",x,"\n";
                print x," ",y," ",z,"\n";
                $>

As you can see, it can use variables, numbers, and strings. It can use several of these elements at a time, and it is intelligent enough to figure out what to do with many different types of data.

Beginning in Emu version 0.5.7, the syntax of the print statement has been simplified. Variables and strings need not be comma-delimited arguments. Instead, variables can be referenced inside a quoted string by prefacing the variable name with &. For example, the final line of the previous script: print x," ",y," ",z,"\n";, can also be written: print "&x &y &z\n";. Function calls may not be referenced in this way (they must be passed as comma-delimited elements). If you wish to include an ampersand (&) in a quoted string without it referring to a variable, escape it by prefixing it with a backslash (\) (e.g., "\&"); it will then act as a normal character. This only applies to quoted strings given directly to print, exec, and fprint. Ampersands in string variables do not act in this manner.
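
A short sketch of both forms, using made-up values, may help. The first print uses & references inside the quoted string, and the second escapes the ampersand so that it is printed literally:

		<$
		int x=4,y=7;
		print "x is &x and y is &y\n";
		print "Fish \& chips\n";
		$>

Based on the rules above, this should produce:

		x is 4 and y is 7
		Fish & chips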

Your programs can include your own functions. Functions can be defined using the 'func' keyword. Functions start with 'func' and end with 'return', which instructs the interpreter to go back to where the function was called from.

The syntax of func is: func functionname

'func' is a special keyword, so it stops at the end of a line, not a semicolon. 'return' works the same way. These keywords behave this way because they are "block" keywords - a block of code is isolated between the 'func' and the 'return'. The 'if' and 'endif' keywords are another example of block keywords.

Look at the following program:

                <$

                func foo
                print "I'm skipped the first time..\n";
                print "But I'm here now!\n";
                return

                print "You'll see this first..\n";
                foo;
                print "And this last!\n";

                $>

The output of this looks like:

                You'll see this first..
                I'm skipped the first time..
                But I'm here now!
                And this last!

By typing 'return', one can end the function. However, there are times when you want to exit a function early. The 'retnow' keyword will do just this. Any time a 'retnow' is encountered, it is treated the same way as a 'return' - execution jumps back to the calling function. Note that 'retnow' does not define the block of code that is the function, and therefore a line containing 'retnow' terminates with a semicolon.

Functions are designed to do some task, and then return to the position where the function was called. Often the function's job is to manipulate data in some way, so there needs to be some way to send this data back to where the function was called from. The 'return' and 'retnow' keywords can accept one argument (either a string, or a number), which is sent back to the calling function via the special register variable 'ret'. The argument to return/retnow is taken, and placed into 'ret'. The calling function can then save this data into a different variable.

For instance:

		<$

		func use_number
			argsrequired 1;
			int n = $0;
			n =* 2;
			# perhaps, do other complicated things to n here.
		return n

		use_number 23;

		print "The use_number function returned: ",ret,"\n";
	
		$>

(Note: this example program makes use of argsrequired and function arguments. See the next section for an explanation of how function arguments work. Essentially, they let you pass numbers and strings -to- the function. What we're learning right now is how to pass strings and numbers back -from- the function.)

Assuming that the "other complicated things" actually had no effect on the number, n, (like if you copied this example and ran it directly), the output would be:

		The use_number function returned: 46

The 'retnow' keyword can also return a value. Be sure to follow it with a semicolon!
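
As an illustrative sketch (the function name describe is hypothetical), the following function leaves early with one value through retnow, and otherwise falls through to its closing return with another. It uses an if-block, which is covered in Section 6:

		<$

		func describe
			argsrequired 1;
			int n = $0;
			if n eq 0
				retnow "zero";
			endif
		return "non-zero"

		describe 0;
		print "The number is ",ret,"\n";

		$>

Given the rules above, the call describe 0 should leave "zero" in ret, so the script should print: The number is zero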

Function calls can also be used inside an expression. To do so, surround the function name and all of its arguments with parentheses, e.g., int x; x = (use_number 23);. Wherever one can use a number or a variable, one can also use a function call that returns a number. (Likewise, a function that returns a string can be used in place of a string.)
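
Here is a small sketch of a function call used inside a larger expression. The function double_it is made up for this example:

		<$

		func double_it
			argsrequired 1;
			int n = $0;
			n = n * 2;
		return n

		int total;
		total = (double_it 4) + 1;
		print total,"\n";

		$>

If function calls in expressions work as described, total should end up as 9.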


5. Function arguments:

User-defined functions can receive arguments from other functions. They can access these arguments through the '$' set of pseudovariables. These variables are read-only, but can be assigned to real variables in the function.

They can be used as follows:

		<$

		func foo

		print "I have received ",$#," arguments.\n";
		print "The first of these is: ",$0,"\n";

		# Arguments are numbered from 0, so $2 is the third one.
		int n3;
		n3 = $2;
		print "n3=",n3,"\n";

		return

		foo 5, 3, 121;

		$>

Your function can require a certain number of arguments. Within the function, include the line: argsrequired n, where n is an integer indicating how many arguments the function must receive. If the function receives more or fewer arguments than n, it prints an error message and returns immediately.
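
A minimal sketch of a function that insists on exactly two arguments follows. The name add_two is hypothetical:

		<$

		func add_two
			argsrequired 2;
			int a = $0, b = $1;
			int sum;
			sum = a + b;
		return sum

		add_two 3, 4;
		print "sum: ",ret,"\n";

		$>

Called with two arguments as shown, this should print sum: 7. Calling it as add_two 3; would instead be expected to trigger the error message described above.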


6. Flow control:

The flow of execution - the order in which lines of code are executed - can be changed with flow control elements. These keywords allow you to skip or repeat blocks of code in your script.

One of the most common flow control elements is an if-block. This is a block of code that begins with "if" and a condition, and ends with an "endif". For instance, you may only want to execute some code if a variable equals a certain value. Let's look at this example:

		<$
		int x=5;

		# Some code here...

		if x eq 5
			print "Hey, x equals five!";
		endif

		# More code here...
	 	$>

(Note that if and endif don't end in semicolons! This is because they're "block keywords", like func and return.)

Since x equals five, that line in the middle gets executed. If the first line of the script was "int x=4;", that print statement would be skipped over.

You can test for more conditions than equality.
Here are all conditions recognized by Emu:

		condition		operator
		=================================
		equality		eq (or 'is')
		inequality		ne
		greater-than		gt
		greater-or-equal	ge
		less-than		lt
		less-or-equal		le

The equality operator has two names: eq and is. Both act identically. The is operator is included for legibility, for use with the constants 'true' and 'false', which are set to 1 and 0 by Emu automatically. (e.g., one could test: if result is false, supposing result contains the return value from a function.)
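
As a brief sketch of the 'is' operator, the following tests the result of the built-in instr function (see Appendix A), which reports 1 or 0. It assumes that the result is available in ret, as with other functions:

		<$
		instr "hello world", "world";
		if ret is true
			print "found\n";
		endif
		$>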

These operators all work left-to-right. For instance, to test for greater-than, one would do:

		<$
		if 4 gt 3
			# code...
		endif
		$>

If-blocks can be made more powerful through the addition of other cases. What if the if-block's condition isn't true? This is where else and elseif come in.

Let's go back to that first example. Suppose we changed x's value to 4. We could expand that if-block to:

		<$
		int x=4;

		# some code here...

		if x eq 5
			print "Hey, x equals five!";
		else
			print "Hey, x doesn't equal five! x equals ", x, "\n";
		endif
		$>

In this case, the second print-statement would execute.

Here's another idea: what if we wanted to do a number of things based on the value of x? x might be a switch with multiple settings. The elseif keyword lets us check conditions if the previous one(s) in the if-block weren't true.

Here's that example, expanded further:

		<$
		int x=4;

		# some code here...

		if x eq 5
			print "Hey, x equals five!";
		elseif x eq 1
			print "x equals one!";
		elseif x eq 3 
			print "x equals three!";
		else
			print "we should do something else entirely!\n";
		endif
		$>

This code will do a variety of things based on the value of x. The final 'else' at the end of that catches any conditions which weren't met by the if and the elseif's.

If-blocks can also be nested inside one another, to impose further restrictions on the program's flow as necessary. Additionally, multiple conditions can be checked at the same time. Emu recognizes two basic joining keywords: "and" and "or". Consider the following code:

		<$
		int x=4,y=5;

		if x eq 4
			if y eq 5
				print "hello!\n";
			endif
		endif
		$>

This can be rewritten as follows:

		<$
		int x=4,y=5;

		if x eq 4 and y eq 5
			print "hello!\n";
		endif
		$>

Both keywords operate from right to left. (That is, if more than one 'and' and/or 'or' appears in the same conditional, the rightmost is evaluated first.) To lessen ambiguity, you should use parentheses to force the order of evaluation.

This is an example of such ambiguous code:

		<$
		if x and y or z
			# something....
		endif
		$>

To clarify it for both yourself and for Emu, use parentheses:

		<$
		if (x and y) or z
			# something....
		endif
		$>

A second element of flow control is the ability to simply jump from one line to another. This is done using the 'goto' and 'label' keywords. The label keyword sets an "anchor" that can be referred to by a goto. The syntax is "label labelname". A goto command will then jump to a line that has been named with a label. (The syntax of goto, incidentally, is "goto labelname".)

For instance:

		<$
		goto bob;
		print "This is skipped!\n";
		label bob;
		print "This isn't!\n";
		$>

When Emu sees the goto command, it looks for a label with a corresponding name, and goes to it. The use of a goto will automatically break out of any loops or if-blocks. One can not goto a line that is not in the current function. (Emu will return an error.) A goto and its label must reside in the same function.

If you would like to repeat an action several times, the easiest way to do so would be inside a loop.

Emu supports two types of loops: for loops, and while loops.

A for loop uses an iterator, or "counting variable." It is ideal when a loop should run a set number of times, or when the loop needs to use a series of numbers in sequence.

The syntax for the for keyword is:

		for initialization-statement, end-condition, increment-statement

That isn't very concrete; let me break it down for you.

initialization-statement - is an expression which sets up your counting variable. (e.g., x = 1)

end-condition - is an expression which is checked like an if statement. When it is true, the loop continues to run. When it is false, the loop ends. (e.g., x lt 11)

increment-statement - is a statement which is run at the end of the loop, each time the loop runs through. Generally, it is used to increase the counting variable by one. (e.g., x =+ 1)

(Note: for and loop open and close a block of code, so they don't need terminating semicolons.)

A perfect use of a loop would be to count from 1 to 100. Typing "print 1; print 2; print 3..." would get very tiring, and make the program very long. Using a for loop, however, this can be done much more concisely:

		<$
		int x;
		print "The numbers from 1 to 100:\n";
		for x = 1, x le 100, x =+ 1
			print x,"\n";
		loop
		$>

This program would produce the following output:

		The numbers from 1 to 100:
		1
		2
		3
		.
		. (Skipped)
		.
		99
		100

A while loop should be used when the number of iterations of the loop is not known, and needs to be rechecked each time. (For instance, a loop may continually ask for input until it receives either a "yes" or a "no" from the user.)

The while keyword has a much simpler syntax than the for loop.

		while end-condition

The end-condition uses the same syntax as in a for loop.

The following is an example of a while loop.

		<$
		str inputline;
		while inputline ne "yes" and inputline ne "no"
			GetALineOfInput;
			inputline = ret;
		loop
		$>

This example relies upon a fictitious function, GetALineOfInput, which we can assume polls the user to type "yes" or "no". If the user does not type one of those values, the while loop runs around again, and re-polls the user.
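
For comparison, here is a sketch of the same loop written with the built-in input function from Appendix A in place of the fictitious one. The prompt text is arbitrary, and whether input strips the trailing newline from what the user types is not stated in this manual, so treat this as an outline rather than a tested script:

		<$
		str answer;
		while answer ne "yes" and answer ne "no"
			input "Continue? (yes/no) ";
			answer = ret;
		loop
		$>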

Sometimes, you need to exit a loop from the middle of the loop's body. This can be done via the break keyword. The break keyword will, when reached, automatically jump to the "loop" line that terminates the for or while loop it is in. A good application of the break statement is a loop that has several possible exit conditions, which appear at different times throughout the loop, such as the following:

	<$
	int x = 1;
	str inputline;
	while x ne 0 
		GetALineOfInput;
		inputline = ret;
		if inputline eq ""
			break;
		endif
		UseUserInput;
		x = ret;
	loop
	$>

In that example, a series of lines were looped until some function contingent upon user input returned 0. If the user, however, did not enter any input (leaving inputline blank), the loop was automatically aborted via the break keyword.

Sometimes two or more loops are nested within one another. A break keyword may have to escape out of several levels of loops. In this case, one can pass an optional argument to break, indicating the number of nested loops to escape out of. If the argument is greater than the number of nested loops, then Emu will only escape out of the number of nested loops available.
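
The following sketch shows break with an argument inside two nested for loops. It assumes that break 2 leaves both loops at once, as described above, and it uses the =+ increment spelling from the earlier for-loop example:

	<$
	int x, y;
	for x = 1, x le 3, x =+ 1
		for y = 1, y le 3, y =+ 1
			if y eq 2
				break 2;
			endif
			print x," ",y,"\n";
		loop
	loop
	$>

Under that assumption, only the line "1 1" should be printed, because the first time y reaches 2 both loops are abandoned.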

A final element of loop-based flow control is the continue keyword. Sometimes, instead of needing to break out of a loop, you may need to make the loop skip the rest of the current iteration, but stay in the loop. The continue keyword will do just this. Like the break statement, continue can accept an argument specifying which loop in a series of nested loops it operates on. For example:

	<$
	int x, y;
	for x = 0, x lt 10, x =+ 1
		for y = 0, y lt 10, y =+ 1
			continue 1;  # This will jump back to the "for y..." line.
			continue 2;  # This will jump back to the "for x..." line.
		loop
	loop
	$>


7. Function Libraries, and Expansion Library Authoring

Emu contains a set of basic "intrinsic" functions which allow for basic input, output, text manipulation, etc. Aside from these functions, many others exist in additional libraries. These libraries exist as shared object (.so) or dynamic-link library (.dll) files, depending on whether Emu is running in a Linux or Win32 environment.

Additional library modules can be used by your Emu script via the UseModule function. The syntax is: UseModule "module name". This can either be an exact filename, or just the name of the module (generally part of the filename). Module files are named libemumodname.so (or .dll), and are generally placed in one of the standard lib directories (on a Linux system).

You can create your own expansion libraries, as well. This allows you to use the greater power and versatility of C or C++ code in your Emu scripts. For more information, read the Library SDK Reference.


Appendix A: Built-in Functions

argsrequired int n
n is an integer indicating how many arguments the current function must receive from any calling function. If the function receives more or fewer arguments than n, it prints an error message and returns immediately.

append int handle
Moves the access pointer of a file (indicated by handle) to the end of the file, so that any data is written to the end of the file.

close int handle
Closes a file (identified by handle) opened with open.

clrscr
Clears the screen.

eof int handle
Returns 1 or 0 in ret to indicate an end-of-file condition on the file represented by handle. (1 = eof, 0 = not eof)

erase int handle
Clears the contents of a file (indicated by handle) so that data can be written to a blank file.

error str message
Prints an error message (given in the message string) to the output; the error is handled the same way as a syntax error. The string "Error: line_number: " is added to the beginning, and a newline character (\n) is added to the end automatically.

exec (varying arguments)
Passes an argument string to the shell (/bin/sh) for execution. Arguments are passed in the same way as to the print statement.

fprint int handle, (varying arguments)
Acts like the print command, except that output is directed to the file indicated by handle.

getkey
Returns a string containing the character the user last typed. This function blocks while waiting for input.

gotoxy int x, int y
Moves the output cursor to coordinates (x,y).

include str filename
Inserts the contents of another file verbatim at the current point of execution in the script. File inclusion may occur inside of a function. (If a file is included in the middle of a function, however, unless the included file ends the function with a return statement, it may not contain other functions, as function nesting is illegal.) The included file will, however, be treated as text -- not as script -- unless a new <$ in the included file marks it as executable script.

input [str prompt]
Returns a string containing a line of input from the user's console. If a prompt string is given, it is printed out before polling for input.

instr str major, str minor
Returns 1 if minor is found inside major, or 0 if not.

left str instring, int count
Returns a string consisting of the count left-most characters in instring.

mid str instring, int offset, int count
Returns a string consisting of count characters from instring, beginning at offset offset into the string.

open str filename
Opens a file for reading and writing.
On success, ret will contain an integer file handle number.
On failure, open returns 0.

print (varying arguments)
Prints text and variables to the screen.
The print function can accept any comma-delimited list of variables, numbers, and strings, and will output them all to the screen properly. If a quoted string given to print contains an ampersand followed by a variable name, it will replace the ampersand and the variable name with the variable's contents, unless the ampersand is escaped.

read int handle, int count
Reads count characters from a file (identified by handle) opened with open.
The result is stored as a string in ret.
If fewer than count bytes remain in the file, then as many bytes as remain are read.

readline int handle
Reads a line from a file (identified by handle) opened with open.
The result is stored as a string.
A trailing newline (\n) character is chomped off. If less than an entire line remains in the file, then the remainder of the file is returned.

right str instring, int count
Returns a string consisting of the count right-most characters in instring.

stop
Halts execution of the script. Emu exits gracefully.

tostr numerical expression
Returns a string representation of the numerical expression given.

toval string expression
Returns a real value consisting of the numerical representation of the string. If the string is not numerical in nature (e.g., "toast"), toval returns 0.

UseModule str filename
Includes a library of additional functions. The filename can be an exact filename, or just the name of the module itself (e.g., "mysql" for libemumysql.so). If an exact filename is not given, then Emu will add either .so or .dll (depending on the operating system) to the end, and libemu to the beginning of the filename. It will also search the standard library paths. (On Linux, it will also search any paths indicated by the LD_LIBRARY_PATH environment variable.)
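
To illustrate how the file functions in this appendix might fit together, here is a rough sketch that prints a file line by line. The filename notes.txt and the variable names are placeholders, and the script assumes that readline leaves its result in ret in the same way that read does:

		<$
		int fh, done;
		str line;

		open "notes.txt";
		fh = ret;

		if fh eq 0
			print "Could not open notes.txt\n";
		else
			eof fh;
			done = ret;
			while done eq 0
				readline fh;
				line = ret;
				print line,"\n";
				eof fh;
				done = ret;
			loop
			close fh;
		endif
		$>

The end-of-file result is copied into its own variable before each test, so that other function calls inside the loop cannot disturb the value being checked.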


Appendix B: Operator Precedence

        operator:       function:                       evaluated:
        ==========================================================
        =               assignment                      left-to-right
        [ ]             array subscript                 left-to-right
        ( )             force evaluation order          left-to-right,
                                                          with respect to nesting
        *     /         multiplication and division     left-to-right
        +     -         addition and subtraction        left-to-right
        &     |     ^   binary AND, OR, and XOR         left-to-right
        and   or        logical AND and OR              right-to-left
        eq, is, ne, gt
        ge, le, lt      comparative evaluation          right-to-left

Appendix C: Command-line Switches

        short switch:   long switch:    description:
        =================================================================
        -a              --all           Assumes that all of the file is a
                                        script, so the file does not need
                                        to begin with <$ and close with $>.

        -b              --background    Redirects output to /dev/null
                                        and disables user input.

        -c dir          --curdir dir    Sets current directory to 'dir'.

        -o file         --output file   Redirects all output to 'file'.

        -s              --strict        All warnings are treated as errors,
                                        and Emu exits with code 1 on error.

        -q              --quiet-input   If a script is to be piped into stdin,
                                        this will suppress the user interface
                                        provided by Emu.

        -u              --usage         Displays a list of accepted switches.

        -v              --version       Displays version information.

        -w              --no-warnings   Warnings are disabled. May be used
                                        with -s.

        -                               The script is read from stdin.
                                        If piping the script in, the use of
                                        'input' or 'getkey' may cause unexpected
                                        results.
                                        (If you must poll the user, try opening
                                        /dev/console as a file and using the
                                        read and readline functions.)

                        --safe          File/device i/o is disabled. Console
                                        input is still allowed.