REBOL 3 Docs Guide Concepts Functions Datatypes Errors
  TOC < Back Next >   Updated: 26-Oct-2009 Edit History  

REBOL 3 Datatypes: Bitset!

Contents

Concept

Bitsets are bit-based data sets, logic! bitmaps. They map collections of integer and character values to true and false.

For example, bitsets define character classes used with the parse or find functions. They are also commonly used for allocation maps and search hash markers.

More information about bitsets, including details of changes from R2 to R3 can be found on the DocBase Bitsets page.

Creating bitsets

New bitsets are created with make, which accepts several datatypes as arguments, including a bitset specification block as described below. Bitsets can also be created from copy and complement.

Using MAKE

The standard method of creating bitsets is by providing make with a datatype and argument:

bits: make bitset! arg

The argument can be one of several datatypes, providing a range of possible results.

If arg is an integer!, a bitset of at least that size in bits is allocated:

bits: make bitset! 1000

If arg is a binary!, the bitset is initialized according to the bits that are set within its bytes. This is common when mold was used to output the bitset or to-binary was used to convert it to binary.

bits: make bitset! #{0F0016}

In addition, the spec can be a char!, any string!, or a block!.

bits: make bitset! #"A"
bits: make bitset! "ABC"
bits: make bitset! [...]  ; see below
Zero based

Unlike other series datatypes, bitsets are zero-based. This allows the NULL character to be included in the bitset and tested:

if find bits #"^(null)" [...]

Unicode

Bitsets are not limited to just ASCII characters. Unicode characters can also be specified.

You can write them directly within the UTF-8 encoding of REBOL source, or you can write them as:

other: make bitset! [#"^(02FF)" - #"^(0FFF)"]

or even as integers:

other: make bitset! [612 - 990]

Bitsets will only allocate as many bits as necessary to reach the upper limit of the specified range. The bitset example above will require 124 bytes of memory storage.

The CHARSET helper

Because bitsets are so commonly used for character maps, the charset helper function is also provided for creating bitsets.

digits: charset "0123456789"
alpha: charset [#"A" - #"Z" #"a" - #"z"]

Bitset specification block

If a block is provided for make (and also several of the other bitset-related functions) it represents bit settings in the following way:

Accepted datatypes

Datatype Example Description
integer! 123 the bit associated with the integer 123 (or part of a range)
char! #"a" the bit associated with character code "a" (or part of a range)
string! "abc" multiple bits, each character specifies a bit
binary! #{010203} multiple bits, each byte specifies a single bit to be set

Bit ranges

Note that characters and integers can specify inclusive ranges using a dash (-). Do not forget the spaces around it. Spaces are required to delimit it.

[0 - 99]       ; bits 0 to 99 inclusive
[#"A" - #"Z"]  ; bits associated with A to Z, inclusive

More examples are shown below.

Complement (not) bitsets

Bitsets can be complements, meaning that each bit is inverted (a logical not.) In order to specify such bitsets, a specification block can begin with the keyword not.

Here are examples:

no-space: make bitset! [not #" "]
no-whitespace: make bitset! [not "^-^/ "]
no-ctrl-space: make bitset! [not 0 - #" "]

Binaries as bits

It should be noted that these two lines are not equivalent:

make bitset! #{01FF}
make bitset! [#{01FF}]

The first sets bits 7 through 15. The second sets bits 1 and 255.

This difference occurs because the mold function outputs bitsets in a binary form that indicates what specific bits are set (the first line.)

However, within a block, this binary form can also be used if the bits keyword is used. These lines are equivalent:

make bitset! #{0060000080}
make bitset! [bits #{0060000080}]

And for the complement, these lines are also equivalent:

complement make bitset! #{0060000080}
make bitset! [not bits #{0060000080}]
MOLD output

When mold is used on a complemented bitset, in order to preserve the complement, the bitset must be output as a block in the form:

make bitset! [not bits #{...}]

This is necessary in order to include the complement indication.

Examples

Create a bitset with the bit associated with #"a" set:

bits: make bitset! #"a"

Create a bitset large enough to holds bits up to 1000 and set bit 1000:

bits: make bitset! [1000]  ; note that bit 1000 is set

Create a bitset large enough to hold the bits for each character of "AxZ3?" and set each of those:

bits: make bitset! "AxZ3?"

Note that upper and lower case bits are different.

A block allows multiple bits to be specified using one or more combination of the above. For example:

bits: make bitset! [#"?" #"A" - #"Z" "!@#$" 201 - 220]

If the contents of bitset block needs to be computed, use reduce or compose first. For example:

start: 200
len: 50
bits: make bitset! reduce [start '- start + len]

Supported actions

These datatype actions are defined for the bitset! datatype:

Action Description
make creates and returns a new bitset
copy returns a copy of a bitset
complement inverts each bit; returns a new bitset
find test that value is set
append add new bits to the set (set them to true)
poke set one or more bits true or false
remove remove specific bits from the bitset (requires /part refinement)
clear clear entire bitset
length? returns the number of bits used
and bitwise and of two bitsets; returns new bitset
or bitwise or of two bitsets; returns new bitset
xor bitwise xor of two bitsets; returns new bitset

In addition, these actions are identical:

Action Same as
to make
insert append
pick find
negate complement

These comparisons are supported:

Action Description
equal? the bitsets are equal
not-equal? the bitsets are not equal
same? the bitsets use the same memory storage
tail? provided to allow empty? to work for bitsets
zero? always returns false (because bitsets are not scalar values)

Using bitsets

When you use a bitset, you normally want to:

Checking bits

To check for one or more bits in a bitset most of the time you will use the find function.

find the-bitset some-bits

It's quite common for find to be part of a conditional expression, such as:

if find the-bitset some-bits [... do something ...]

Note that while less common, you can also use pick or a path selection in a similar way.

Examples of character bitsets

Examples of character-oriented:

alpha: make bitset! [#"A" - #"Z" #"a" - #"z"]

if find alpha #"a" [...]

if find alpha "abc" [...]

if find alpha [#"a" - #"g"] [...]

if find alpha [":;?" #"a" - #"g" 3 - 20] [...]

Examples using integer bitsets

bits: make bitset! [1 4 9 - 16 25 - 36]

if find bits 42 [...]

if find bits [20 - 29] [...]

if find bits [#"A" - #"Z" ".,?" 20 - 29] [...]

Examples using path selection

if alpha/#"a" [...]

if alpha/"abc" [...]

if bits/12 [...]

Bitsets for character classes

Bitsets are often used to create character classes (e.g. digits, alpha, alpha-numeric, punctuation) to search for multiple characters at the same time:

invalid-chars: charset [0 - #" " ",.;:?"]
if find name invalid-chars [...]

Such character classes are also used in parse rules:

digits: charset "0123456789"
alpha: charset [#"A" - #"Z" #"a" - #"z"]
if parse name [some alpha any digits] [...]

See parse section for details.

Character casing

Editor note: specification pending

Modification

You can modify the bits specified by a bitset in a few ways.

Appending bits

You can add bits to a bitset with the append function:

spaces: make bitset! " "
append spaces "^-^/"

The append function will also accept a bitset specification block:

letters: make bitset! [#"A" - #"Z"]
append letters [#"a" - #"z"]

The insert function is synonymous with append.

Removing bits

You can remove bits from a bitset with the remove function. In order to specify the bits to be removed, the /part refinement is necessary.

spaces: make bitset! "^-^/ "
remove/part spaces #" "

The remove function will also accept a bitset specification block:

letters: make bitset! [#"A" - #"Z" #"a" - #"z"]
remove/part letters [#"a" - #"z"]

If /part is not given, an error will result:

remove spaces #" "
** Script error: missing a required argument or refinement

Also, note that remove only "unsets" bits, it does does not modify the size of the bitset.

Clearing all bits

The clear function will clear all bits for a bitset:

bits: make bitset! "abc"
clear bits
append bits "d"
probe bits
make bitset! #{00000000000000000000000008}

Note that a complemented bitset remains complemented. The clear function does not reset the flag.

Complementing bitsets

As mentioned above, bitsets can be complemented. A complement provides the logical not of each bit.

Complemented bitsets are nothing more than a normal bitset with a complement flag used to indicate that the bits are inverted.

To create a complement bitset, use the complement function. It returns a new bitset with the complement flag set:

no-space: complement make bitset! #" "

if find no-space "a" [print "not a space"]
not a space

Other examples:

no-whitespace: complement make bitset! "^-^/ "
no-ctrl-space: complement make bitset! [0 - #" "]

If you mold a complemented bitset, you will see a not indicator:

print mold no-space
make bitset! [not bits #{0000000080}]

For more information, see the notes in the above sections.

Logical operations

You can use and, or, and xor to create new bitsets.

For example:

b1: make bitset! "abc"
b2: make bitset! "cdef"

probe b1 or b2
make bitset! #{0000000000000000000000007E}
probe b1 and b2
make bitset! #{00000000000000000000000010}

Note that the resultant bitset will always be minimized (trimmed):

b3: make bitset! [0 30 60]
b4: make bitset! [0 1 2]

probe b3 and b4
make bitset! #{80}

Using set-path notation

Bitsets can also be modified using path notation.

bits: make bitset! "abc"

bits/#"d": true
bits/#"c": false
bits/0: true

Variables can also be used:

s: "abc"
bits/:s: false

Special notes

Virtual length

You have always been able to define bitsets of any reasonable length, but now the length is more automatic:

For example, in R2 if you created a bitset:

bits: make bitset! 160

a 160 bit set would be created. However, if you wrote:

space: make bitset! " ^-^/"

a 256 bitset would be created (because it assumed you wanted an 8-bit character-based bitset).

In R3, this is no longer true. A bitset is only as long as needed to hold its maximum bit value. The line:

space: make bitset! " ^-^/"

creates a 40 bit set that looks like this:

make bitset! #{0060000080}

The bitset is only as long as it needs to be.

If you check for a bit outside its range, such a request is valid:

if find space "a" [print "found"]

In R2, this line would have caused an error, but it is fine in R3. All virtual bits are zero.

In addition, a bitset will auto-extend as needed. For example:

append space ".:;"

will expand the space bitset to be:

make bitset! #{006000008002}

Again, it is only as long as needed to hold the largest bit.

The above examples show only string bit values, but these rules apply to bitsets that are accessed with CHAR! and INTEGER! bit values as well.

Binary conversion

The binary representation for bitsets has changed compared to R2.

It is now left-to-right bit continuous (as if bits were written in binary as big-endian values.)

For example, in R3:

>> make bitset! "abcd"
== make bitset! #{00000000000000000000000078}

but, in R2:

>> make bitset! "abcd"
== make bitset! #{
0000000000000000000000001E00000000000000000000000000000000000000
}

Also, complemented bitsets do not use complemented bitmasks.

>> to-binary complement make bitset! " "
== #{0000000080}

For such conversions, the complement state is lost (similar to how width and height are lost for a conversion of an image! to binary!.)

If you want to preserve that information, you must use mold.


  TOC < Back Next > REBOL.com - WIP Wiki Feedback Admin