REBOL 3 Concepts: Extensions: Making Extensions

The previous sections defined extensions as a package that can contain both REBOL and C code and data, and showed you how to use them. This section explains how to make your own extensions.

Overview

Example extension

How extensions work

Multi-typed arguments

Accessing strings and blocks

Accessing external APIs

Summary of Prefix Names

Notes

Overview

An extension is a type of module.

Extensions are easy to create. They can contain both C and REBOL code, and in less than one page of C you can create a useful extension.

Before you can use an extension, you must import it. That loads the module's code and data, both REBOL and C.

There are four main concepts that you need to know to write your own extensions:

DLL functions	three standard functions for initializing the extension, dispatching functions, and cleanup.
init block	a text string that defines the extension and its options, variables, exports, and initialization.
commands	the native functions provided by your extension.
reb-lib	a function API for accessing REBOL datatypes, structures, and services.

Each of these will be explained in detail below.

Example extension

To give you a general idea of what an extension looks like, here is an example. It is written in C, but a similar technique can be used in any compiled language.

#include "reb-c.h"
#include "reb-ext.h"

const char *init_block =
    "REBOL [\n"
        "Title: {Example Extension Module}\n"
        "Name: example\n"
        "Type: module\n"
        "Exports: [add-mul]\n"
    "]\n"
    "add-mul: command [{Add and multiply integers.} a b c]\n"
;

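RXIEXT const char *RX_Init(int opts, RL_LIB *lib) {
    RL = lib;                            // save the library pointer for later API calls
    if (!CHECK_STRUCT_ALIGN) exit(100);  // abort if the struct alignment check fails
    return init_block;                   // hand the init block back to REBOL
}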

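RXIEXT int RX_Call(int cmd, RXIFRM *frm, REBCEC *ctx) {
    // Only one command is defined, so cmd is not examined here.
    RXA_INT64(frm, 1) =
        (RXA_INT64(frm, 1) + RXA_INT64(frm, 2)) *
        RXA_INT64(frm, 3);
    return RXR_VALUE;                    // the result is the value in argument slot 1
}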

After compiling that code into a DLL, you can use it in your REBOL code:

import %example.dll

print add-mul 1 2 3
9

The speed of function evaluation is about the same as other REBOL native functions. (Normally within 5%.)

How extensions work

As shown above, an extension is a dynamically loaded library (DLL). When REBOL loads the extension, it expects to find one or more predefined function names.

RX_Init	called when the extension has been loaded. The purpose is to provide any special option flags as well as a pointer to the extension library (RL_LIB).
RX_Quit	called when the extension is no longer needed. This is optional.
RX_Call	dispatches the native command functions defined by the extension. This function is passed the command number and an array that holds the command's arguments (called the command frame.)

After the DLL has been loaded, its RX_Init function will be called. If the RX_Init function cannot be found in the DLL, REBOL will throw an error that the extension is not valid.

The RX_Init function will perform these actions:

set the RL variable to be used to access library functions.
verify that the lib version number is what you expect. If it is not, then your code should not attempt to continue.
return a pointer to a string (ASCII or UTF-8) that provides the extension module identification and initialization code. If an error occurred, a zero is returned. (Later we may allow an error string here.)

The init string is REBOL source similar to that used to define modules. It can define functions (both internal and exported), variables, strings, or other data used by your extension.

In the code example above, the init_block holds this source:

REBOL [
    Title: {Example Extension Module}
    Name: example
    Type: module
    Exports: [add-mul]
]
add-mul: command [{Add and multiply integers.} a b c]

Although we use the quote mechanism of C to embed it, you can use any technique you want, as long as what is returned is a valid ASCII or UTF-8 string.

Command functions

The native functions defined within an extension are called commands. They are similar to the native functions found in REBOL, and evaluate at the full speed of the CPU.

Each command has two parts:

spec	the interface specification (in REBOL format) that provides a help string (title) and lists the arguments for the function.
body	the C code that makes the command do its job.

In the example above, the spec for the add-mul command was defined by this line:

add-mul: command [{Add and multiply integers.} a b c]

You will note that this is identical to the function definition methods used throughout REBOL. And, it should be noted that the command word is a specially defined function itself, similar to func and function used for defining other functions. More information about how command works is described below.

The body of the add-mul function is found in this code:

RXIEXT int RX_Call(int cmd, RXIFRM *frm, REBCEC *ctx) {
    RXA_INT64(frm, 1) =
        (RXA_INT64(frm, 1) + RXA_INT64(frm, 2)) *
        RXA_INT64(frm, 3);
    return RXR_VALUE;
}

The details of the RXIFRM structure will be explained below. Also, this example is a bit simplistic because the extension only handles a single command (add-mul). More examples will be shown below.

Qualifying arguments

In the code above, the add-mul command arguments have no datatype qualifier; however, for most code you will want to provide a list of one or more valid datatypes. This makes it possible for the datatype to be verified prior to calling your native code. It also makes error messages easier to understand.

For example, here is a better definition for the add-mul command:

add-mul: command [
    {Add and multiply integers.}
    a [integer!]
    b [integer!]
    c [integer!]
]

If an attempt is made to pass a datatype other than integer, the normal error message will be thrown.

You can also accept multiple datatypes for the arguments of your function. For example, if you want to accept integer and decimal:

add-mul: command [
    {Add and multiply integers.}
    a [integer! decimal!]
    b [integer! decimal!]
    c [integer! decimal!]
]

Of course, now the C code body of your function will need to check which datatype is being passed.

The datatypes allowed for commands are listed in the Datatypes section below.

Command dispatching

Within the DLL, the RX_Call function dispatches command functions. For extensions with only a few commands, all of the related code can be put into the same RX_Call function. For extensions with many commands, you may want to build a function table and redirect to sub-functions.

In the arguments to RX_Call the cmd arg provides the index number for the command, and you can use if or switch statements to process the correct command. If you only have a few commands, if is probably faster. If you have several commands, switch will be faster.

RXIEXT int RX_Call(int cmd, RXIFRM *frm, REBCEC *ctx) {
    if (cmd == 0) {
    }
    else if (cmd == 1) {
    }
    ...
}

RXIEXT int RX_Call(int cmd, RXIFRM *frm, REBCEC *ctx) {
    switch (cmd) {
    case 0:
        <command code>
        break;
    case 1:
        <command code>
        break;
    case 2:
        ...
    }
}

If you have a larger number of commands, you will want to create an enum to help relate command numbers to their function names.
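
As a rough sketch of the enum and function-table approach described above, the code below dispatches two commands through a table. It is written to compile on its own, so demo_frame, DEMO_R_VALUE, and DEMO_R_NO_COMMAND are simplified stand-ins for RXIFRM, RXR_VALUE, and RXR_NO_COMMAND (assumptions, not the real reb-ext.h definitions), and the negate command is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; a real extension gets these from reb-ext.h. */
typedef struct { int64_t arg[8]; } demo_frame;      /* arg[1..7] hold integer arguments */
enum { DEMO_R_VALUE = 1, DEMO_R_NO_COMMAND = 2 };   /* arbitrary values for the sketch */

/* Command numbers, in the same order as the commands in the init block. */
enum demo_cmd { CMD_ADD_MUL, CMD_NEGATE, CMD_COUNT };

typedef int (*demo_handler)(demo_frame *frm);

/* (a + b) * c, result left in argument slot 1. */
int cmd_add_mul(demo_frame *frm) {
    frm->arg[1] = (frm->arg[1] + frm->arg[2]) * frm->arg[3];
    return DEMO_R_VALUE;
}

/* Hypothetical second command: negate the first argument. */
int cmd_negate(demo_frame *frm) {
    frm->arg[1] = -frm->arg[1];
    return DEMO_R_VALUE;
}

/* Function table indexed by command number. */
const demo_handler demo_table[CMD_COUNT] = {
    [CMD_ADD_MUL] = cmd_add_mul,
    [CMD_NEGATE]  = cmd_negate,
};

/* Plays the role of RX_Call: look up the handler and redirect to it. */
int demo_call(int cmd, demo_frame *frm) {
    if (cmd < 0 || cmd >= CMD_COUNT || demo_table[cmd] == NULL)
        return DEMO_R_NO_COMMAND;
    return demo_table[cmd](frm);
}
```

The enum order has to match the order in which the commands are defined in the init block, because command numbers are assigned in sequence starting at 0.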

Argument access

Command arguments are passed to RX_Call in an argument frame (a structure) accessed via the frm pointer which is of the RXIFRM type.

A frame consists of two parts:

types	a byte array of datatypes. The zeroth byte provides the number of arguments. The size of this array is the number of arguments rounded up to a multiple of eight. Normally, it only occupies 64 bits (enough to support seven function arguments.)
values	64 bit values. The format of each value is dependent on the argument's datatype. For example, if the datatype is an integer, its value is a 64 bit integer. If the datatype is a decimal, the value is a 64 bit IEEE float (double). The RXIARG typedef provides a union to properly access each type of value.

Graphically, a frame looks like this:

Command frame
type array (64 bits)
argument 1 (64 bits)
argument 2 (64 bits)
argument 3 (64 bits)
...

To make it easier to access argument related information, macros are provided:

RXA_COUNT(frm)      returns the arg count
RXA_TYPE(frm,n)     returns the datatype for the n-th arg

To access a specific argument, such as an integer, you write:

RXA_INT64(frm, n)

Here n normally begins at 1, because slot 0 holds the type array.
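
To make the frame layout concrete, here is a small model of it that compiles on its own. This is only a sketch of the layout described above: demo_arg, demo_frame, and the DEMO_ macros are hypothetical stand-ins for RXIARG, RXIFRM, and the RXA_ macros, and the type tag values are arbitrary. The real definitions in reb-ext.h may differ in detail.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for RXIARG: one 64-bit slot, viewed according to its datatype. */
typedef union {
    int64_t int64;
    double  dec64;
    uint8_t bytes[8];   /* used for slot 0: [0] = arg count, [n] = type of arg n */
} demo_arg;

/* Stand-in for RXIFRM: slot 0 is the type array, slots 1..7 are arguments. */
typedef struct {
    demo_arg slot[8];
} demo_frame;

#define DEMO_COUNT(f)     ((f)->slot[0].bytes[0])   /* role of RXA_COUNT */
#define DEMO_TYPE(f, n)   ((f)->slot[0].bytes[n])   /* role of RXA_TYPE  */
#define DEMO_INT64(f, n)  ((f)->slot[n].int64)      /* role of RXA_INT64 */
#define DEMO_DEC64(f, n)  ((f)->slot[n].dec64)      /* role of RXA_DEC64 */

enum { DEMO_T_INTEGER = 1, DEMO_T_DECIMAL = 2 };    /* arbitrary tag values */

/* Build a frame holding three integers, as a caller of a command might. */
demo_frame demo_make_frame3(int64_t a, int64_t b, int64_t c) {
    demo_frame f;
    memset(&f, 0, sizeof f);
    DEMO_COUNT(&f) = 3;
    DEMO_TYPE(&f, 1) = DEMO_T_INTEGER;  DEMO_INT64(&f, 1) = a;
    DEMO_TYPE(&f, 2) = DEMO_T_INTEGER;  DEMO_INT64(&f, 2) = b;
    DEMO_TYPE(&f, 3) = DEMO_T_INTEGER;  DEMO_INT64(&f, 3) = c;
    return f;
}
```

Because the count byte and the seven type bytes share a single 64 bit slot, argument n can be found at slot n without any further arithmetic.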

Here is a list of these datatype specific macros:

RXA_INT64(f,n)      integer!
RXA_DEC64(f,n)      decimal! and percent!
RXA_LOGIC(f,n)      logic!
RXA_CHAR(f,n)       char! (32 bits)
RXA_TIME(f,n)       time!
RXA_DATE(f,n)       date! (encoded)
RXA_WORD(f,n)       word! (all)
RXA_PAIR_X(f,n)     pair!
RXA_PAIR_Y(f,n)     pair!
RXA_TUPLE(f,n)      tuple!
RXA_SERIES(f,n)     series! (reference)
RXA_INDEX(f,n)      series! (index)
RXA_HANDLE(f,n)     any pointer (32 bit address)

In addition, this macro is provided:

RXA_REF(f,n)        refinement flag

Refinements are discussed below.

Multi-typed arguments

Similar to other functions, commands can accept multiple datatypes for a single argument. Within your C code you will need to be able to detect which datatype has been passed, and access its value properly.

Here is an example command that allows either an integer or a decimal for its argument:

cmd: command [n [integer! decimal!]]

The body code would be something like:

RXIEXT int RX_Call(int cmd, RXIFRM *frm, REBCEC *ctx) {
    i64 i;
    d64 d;

    if (cmd == 1) {
        if (RXA_TYPE(frm, 1) == RXT_INTEGER) {
            i = RXA_INT64(frm, 1);
        }
        else {
            d = RXA_DEC64(frm, 1);
        }
        ...
    }
}

Note that the i64 and d64 are general typedefs used to abstract compiler differences (e.g. on older MSVC the use of _int64 for 64 bit integers.)
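
The same idea can be applied to the add-mul command once it accepts both integer! and decimal!. The sketch below is one possible way to do it, not the only one: it returns an integer when all three arguments are integers and a decimal otherwise. The demo_ and DEMO_ names are hypothetical stand-ins for the reb-ext.h definitions so that the code compiles on its own.

```c
#include <stdint.h>

/* Hypothetical stand-ins for RXIARG, RXIFRM, and the RXA_ macros. */
typedef union { int64_t int64; double dec64; uint8_t bytes[8]; } demo_arg;
typedef struct { demo_arg slot[8]; } demo_frame;    /* slot[0] holds the type bytes */

#define DEMO_TYPE(f, n)   ((f)->slot[0].bytes[n])
#define DEMO_INT64(f, n)  ((f)->slot[n].int64)
#define DEMO_DEC64(f, n)  ((f)->slot[n].dec64)

enum { DEMO_T_INTEGER = 1, DEMO_T_DECIMAL = 2 };    /* arbitrary tag values */
enum { DEMO_R_VALUE = 1 };                          /* role of RXR_VALUE */

/* Read argument n as a double, whichever of the two types it holds. */
double demo_as_decimal(demo_frame *frm, int n) {
    if (DEMO_TYPE(frm, n) == DEMO_T_INTEGER)
        return (double)DEMO_INT64(frm, n);
    return DEMO_DEC64(frm, n);
}

/* (a + b) * c: integer result if all arguments are integers, else decimal. */
int demo_add_mul(demo_frame *frm) {
    int all_int = DEMO_TYPE(frm, 1) == DEMO_T_INTEGER
               && DEMO_TYPE(frm, 2) == DEMO_T_INTEGER
               && DEMO_TYPE(frm, 3) == DEMO_T_INTEGER;

    if (all_int) {
        DEMO_INT64(frm, 1) =
            (DEMO_INT64(frm, 1) + DEMO_INT64(frm, 2)) * DEMO_INT64(frm, 3);
    } else {
        /* Compute first, then overwrite slot 1 and retag it as a decimal. */
        double r = (demo_as_decimal(frm, 1) + demo_as_decimal(frm, 2))
                 * demo_as_decimal(frm, 3);
        DEMO_DEC64(frm, 1) = r;
        DEMO_TYPE(frm, 1) = DEMO_T_DECIMAL;
    }
    return DEMO_R_VALUE;
}
```

Note that the result is computed before slot 1 is overwritten, since slot 1 is both an input and the place where the result is returned.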

Many examples are provided in the extensions: example extensions section.

Refinements

As with other functions, commands are allowed to accept refinements as arguments. Such refinements are passed as normal arguments with a value of none or true. A simple test will determine if the refinement has been specified.

For example, if you write a special trigonometric function, you may want to provide a refinement to specify radians rather than degrees:

hyper-sine: command [d [decimal!] /radians]

This code will handle the refinement flag:

d = RXA_DEC64(frm, 1);
if (RXA_REF(frm, 2)) rads = TRUE;
...

Note that you do not need to check the datatype of the argument. The REBOL extension caller will assure that none has a zero 32 bit value, and that true has a non-zero 32 bit value.
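
Here is a slightly fuller sketch of the refinement test, written so that it compiles on its own. It only shows the angle handling: the argument is treated as degrees unless /radians was given. The demo_ and DEMO_ names are hypothetical stand-ins for RXIFRM, RXA_DEC64, RXA_REF, and RXR_VALUE, and the 32 bit flag field is an assumption based on the note above.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the reb-ext.h frame and macros. */
typedef union { int64_t int64; double dec64; int32_t flag; uint8_t bytes[8]; } demo_arg;
typedef struct { demo_arg slot[8]; } demo_frame;

#define DEMO_DEC64(f, n)  ((f)->slot[n].dec64)   /* role of RXA_DEC64 */
#define DEMO_REF(f, n)    ((f)->slot[n].flag)    /* role of RXA_REF   */

enum { DEMO_R_VALUE = 1 };                       /* role of RXR_VALUE */

/* Replace argument 1 with the same angle expressed in radians.
   Slot 2 is the /radians refinement: zero for none, non-zero for true. */
int demo_angle_to_radians(demo_frame *frm) {
    double d = DEMO_DEC64(frm, 1);
    int rads = DEMO_REF(frm, 2) != 0;

    if (!rads)
        d = d * 3.14159265358979323846 / 180.0;  /* degrees to radians */

    DEMO_DEC64(frm, 1) = d;
    return DEMO_R_VALUE;
}
```

A real hyper-sine command would go on to compute its result from the converted angle before returning.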

Command results

The integer return code from RX_Call determines what the command returns. Like other functions, a command can return none, one, or multiple results.

An enum of results is defined. The constants are:

RXR_UNSET	Do not return a value.
RXR_NONE	A shortcut for returning NONE.
RXR_TRUE	A shortcut for returning TRUE.
RXR_FALSE	A shortcut for returning FALSE.
RXR_VALUE	Return a single value (that found in the arg[1] position).
RXR_BLOCK	A shortcut method to return multiple values. See below.
RXR_ERROR	Return an error (special case.)
RXR_BAD_ARGS	Throws the error: Bad command arguments. This is a generic result you can return for errors in simple functions.
RXR_NO_COMMAND	Throws the error: The command at that index is not implemented.

The first few are shortcuts to make your code simpler and smaller for such cases.

RXR_VALUE indicates that you want to return the first argument of the frame as the result using its indicated datatype.

For example, take this code that adds the first and second argument, then returns the first:

RXA_INT64(frm, 1) += RXA_INT64(frm, 2);
return RXR_VALUE;

As a variation, in this code the arguments are integers, but it returns a decimal result:

RXA_DEC64(frm, 1) = (d64)(RXA_INT64(frm, 1) + RXA_INT64(frm, 2));
RXA_TYPE(frm, 1) = RXT_DECIMAL;
return RXR_VALUE;

When multiple results are needed, the command must return a block. Often, your command will only need to return a few values, so a shortcut technique is provided.

If you store your results within the argument slots of the frame, and also set their datatypes within the type array, they will be considered a block if you return the RXR_BLOCK return code. You must also indicate how many values are within the block.

Here's an example that returns three values, an integer, decimal, and a time:

RXA_COUNT(frm) = 3;

RXA_INT64(frm, 1) = 1;
RXA_TYPE(frm, 1) = RXT_INTEGER;

RXA_DEC64(frm, 2) = 2.2;
RXA_TYPE(frm, 2) = RXT_DECIMAL;

RXA_INT64(frm, 3) = 1200000000;
RXA_TYPE(frm, 3) = RXT_TIME;

return RXR_BLOCK;

You can only return up to seven values in this way. Beyond that, you must use the RL_Make_Block function and append each value into the new block.
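
The error return codes work the same way as the value codes. As an illustration, here is a sketch of a hypothetical integer divide command that returns a bad-arguments code instead of dividing by zero. It compiles on its own: demo_frame, DEMO_R_VALUE, and DEMO_R_BAD_ARGS are simplified stand-ins for RXIFRM, RXR_VALUE, and RXR_BAD_ARGS, with arbitrary values.

```c
#include <stdint.h>

/* Hypothetical stand-ins; a real extension gets these from reb-ext.h. */
typedef struct { int64_t arg[8]; } demo_frame;    /* arg[1..7] hold integer arguments */
enum { DEMO_R_VALUE = 1, DEMO_R_BAD_ARGS = 2 };   /* arbitrary values for the sketch */

/* Divide argument 1 by argument 2, leaving the quotient in slot 1. */
int demo_divide(demo_frame *frm) {
    int64_t a = frm->arg[1];
    int64_t b = frm->arg[2];

    if (b == 0)
        return DEMO_R_BAD_ARGS;                   /* division by zero */
    if (a == INT64_MIN && b == -1)
        return DEMO_R_BAD_ARGS;                   /* quotient would overflow */

    frm->arg[1] = a / b;
    return DEMO_R_VALUE;
}
```

When an error code is returned, the frame is left untouched, so no partial result is ever visible to the caller.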

Extended frames

The examples shown above are valid for the most common command frame, those with seven or fewer arguments. It is very rare to require more than seven arguments to a function, and in general programming practice, if you find that necessary, then it may be better to pass your arguments encapsulated within a block.

Although the initial implementation of commands does not support extended frames, we may add it in the future if it seems important for some reason.

For frames larger than seven arguments, the type array is expanded in increments of 8 bytes. This means that argument references would be shifted by the appropriate amount. To better abstract such offsets, new macros would be provided to account for those offsets.
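
The offset arithmetic could look something like the sketch below. Since extended frames are not implemented, this is purely hypothetical: the function names are invented for illustration, and it assumes that the type array holds one count byte plus one type byte per argument, padded up to a multiple of 8 bytes.

```c
#include <stddef.h>

/* Number of 64-bit slots the type array would occupy for nargs arguments,
   assuming one count byte plus one type byte per argument. */
size_t demo_type_slots(size_t nargs) {
    return (nargs + 1 + 7) / 8;
}

/* Slot index of argument n (counting from 1) in a frame with nargs arguments. */
size_t demo_arg_slot(size_t nargs, size_t n) {
    return demo_type_slots(nargs) + (n - 1);
}
```

With seven or fewer arguments the type array fits in one slot and argument n sits at slot n, which matches the normal frame. With eight arguments the type array would need two slots, so the first argument would move to slot 2.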

How COMMAND Works

As described above, the command word is a special function that creates new command functions within an extension module.

Basically, command calls make on the command! datatype, in the general form:

make command! reduce [args module index]

where:

args	is the argument spec for the new function.
module	is the extension module context and is used to reference back to the extension dispatcher.
index	is the dispatch index for a specific command.

You can directly create commands using this make method; however, in addition to the argument spec, you will need to provide the module and correct dispatch index each time.

To make extension modules easier to read, the command function method was created. This function is defined within the context of the extension module allowing the module argument to be implied (with self.) In addition, the command dispatch index can be a module local variable that is auto-incremented for each new command.

This mechanism simplifies command definitions and requires very little code to do so.

Here's the code that the module system automatically inserts into each module:

cmd-index: 0
command: func [
    "Define a new command for an extension."
    args [block!]
][
    make command! reduce [args self ++ cmd-index]
]

To work properly, this code must be bound to the context of the module. That is why it resides within the module itself.

It should be noted that other fields are also inserted into the extension module. The system/standard/extension object defines those fields and is used by the system's load-extension native function.

Datatypes supported

These datatypes are currently supported for commands.

Immediate datatypes

Name	Description
logic	An integer representing TRUE and FALSE.
integer	A 64-bit integer.
decimal	64-bit IEEE floating point (double).
percent	64-bit IEEE floating point (double).
char	A character as a 32 bit code point.
pair	Two 32 bit signed integers for x and y.
tuple	A length byte followed by seven bytes. (Note truncation.)
time	A 64 bit time in nano-seconds.
date	A 32 bit encoded date and time zone.
word	A 32 bit identifier for a word.
set-word	A 32 bit identifier for a word.
get-word	A 32 bit identifier for a word.
lit-word	A 32 bit identifier for a word.
refinement	A 32 bit identifier for a word.

Series datatypes

The series datatypes are indirect datatypes and can be divided into these general groups:

Group	Description
strings	Including: string, file, email, url, tag, and issue.
blocks	Including: block, paren, path, set-path, get-path, and lit-path.
special	Including: binary, bitset, image, and vector.

Special datatypes

A few special datatypes are also allowed:

Name	Description
unset	Means that a variable is not initialized or a function returned no result.
none	No value. (For example, a find found no match.)
handle	A way to store code and data pointers.

Referencing words

Out of date

This section is out of date and needs revision.

Within extensions it can be quite useful to access words as symbols. For example, if you are writing an extension that has its own special control dialect, you will want to easily handle the words that are part of it. (If you were familiar with AREXX in AmigaOS, then you know what can be done with just a little programming effort.)

There are generally two ways to use a word! type:

symbols	words that represent themselves (the word itself is the meaning)
variables	words used to represent storage

In the R3 1.0 extension interface, words are supported as symbols only.

When you specify your extension, within its module initialization, define a block of words. Later within your code, the word will be indicated by its index within that block.

For example, if within your init block you define:

words: [jpeg mpeg gif tiff]
resize-image: command [img [image!] 'action [word!]]

then you can use this C code to determine which word was passed:

switch (RXA_WORD(frm, 2)) {
case 1: // jpeg
    ...
case 2: // mpeg
    ...
case 3: // gif
    ...
case 4: // tiff
    ...
}

This same technique can be used for words found in blocks. (See block value access below.)

Now, writing:

resize-image data 'gif

will enter the case 3 code above. (Of course, this can also be done using refinements, see earlier notes.)
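
To avoid bare numbers in the switch, you might mirror the words block with an enum. The sketch below assumes the word arrives as its position in the words block, counting from 1, as described above. The enum and function names are hypothetical, and because this section is out of date the real interface may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Positions in the init block's  words: [jpeg mpeg gif tiff]  (counting from 1). */
enum demo_word { W_JPEG = 1, W_MPEG, W_GIF, W_TIFF };

/* Map a word index, as RXA_WORD would supply it, to a format name.
   Returns NULL for an index that is not in the words block. */
const char *demo_format_name(uint32_t word_index) {
    switch (word_index) {
    case W_JPEG: return "jpeg";
    case W_MPEG: return "mpeg";
    case W_GIF:  return "gif";
    case W_TIFF: return "tiff";
    default:     return NULL;
    }
}
```

The enum has to be kept in the same order as the words block, so it is worth keeping the two definitions next to each other in your source.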

Accessing strings and blocks

The extension library provides functions for accessing and creating strings and blocks. These functions are accessed via macros that use the library pointer passed in RX_Init.

RL_MAKE_BLOCK	make a new block of given length
RL_MAKE_STRING	make a new string of given length and width
RL_MAP_WORDS	map a block of words to their canonical symbol identifiers
RL_FIND_WORD	find word in an array of symbol identifiers
RL_SERIES_INFO	get series info: length, size, etc.
RL_GET_CHAR	get a char from a string
RL_SET_CHAR	set a char in a string
RL_GET_VALUE	get a value from a block
RL_SET_VALUE	set a value in a block
RL_GET_STRING	get string as an array

It is likely that more functions will be added as needed.

Note: allocation GC concerns

Accessing external APIs

If you write an extension that accesses external APIs, including standard OS libraries, you will need to be careful. R3 in general uses an asynchronous model for I/O. If you call APIs that perform I/O which may block, then your REBOL process will also block during that I/O. This can cause your GUI to block, or cause other pending I/O operations to overflow or fail.

If the external API does not block, then it's probably fine to call it. However, for blocking functions, a better solution is to write them as an asynchronous R3 device. This is a special type of extension. (As of this 1.0 draft release, this is not yet available, but we want to make you aware of it.)

Summary of Prefix Names

RX_	the main functions of the DLL itself (not the API)
RL_	functions in the reb-lib (REBOL library)
RXT_	type (datatype) identifiers for command arguments
RXA_	command argument access macros
RXR_	command return codes

Notes

Editor note: pending

Editor note: mention make-host-ext tool for building the body of the module

dealing with handles

Pending features: codecs, devices

Output a DLL.

equivalent to:

make command! [specs extension-handle func-num]

More information about this will be available in the advanced section.

The method of using a single RX_Call entry point for all command functions was decided because it gives you a central location to setup your code's "environment" variables as well as a place to put debugging breakpoints or your own trace output.
We use this name to differentiate it from function!, native!, action!, and other REBOL function-oriented datatypes.
