REBOL 3 Docs | Guide | Concepts | Functions | Datatypes | Errors |
TOC < Back Next > | Updated: 26-Oct-2009 Edit History |
Bitsets are bit-based data sets, logic! bitmaps. They map collections of integer and character values to true and false.
For example, bitsets define character classes used with the parse or find functions. They are also commonly used for allocation maps and search hash markers.
More information about bitsets, including details of changes from R2 to R3 can be found on the DocBase Bitsets page.
New bitsets are created with make, which accepts several datatypes as arguments, including a bitset specification block as described below. Bitsets can also be created from copy and complement.
The standard method of creating bitsets is by providing make with a datatype and argument:
bits: make bitset! arg
The argument can be one of several datatypes, providing a range of possible results.
If arg is an integer!, a bitset of at least that size in bits is allocated:
bits: make bitset! 1000
If arg is a binary!, the bitset is initialized according to the bits that are set within its bytes. This is common when mold was used to output the bitset or to-binary was used to convert it to binary.
bits: make bitset! #{0F0016}
In addition, the spec can be a char!, any string!, or a block!.
bits: make bitset! #"A" bits: make bitset! "ABC" bits: make bitset! [...] ; see below
Bitsets are not limited to just ASCII characters. Unicode characters can also be specified.
You can write them directly within the UTF-8 encoding of REBOL source, or you can write them as:
other: make bitset! [#"^(02FF)" - #"^(0FFF)"]
or even as integers:
other: make bitset! [612 - 990]
Bitsets will only allocate as many bits as necessary to reach the upper limit of the specified range. The bitset example above will require 124 bytes of memory storage.
Because bitsets are so commonly used for character maps, the charset helper function is also provided for creating bitsets.
digits: charset "0123456789" alpha: charset [#"A" - #"Z" #"a" - #"z"]
If a block is provided for make (and also several of the other bitset-related functions) it represents bit settings in the following way:
Datatype | Example | Description |
---|---|---|
integer! | 123 | the bit associated with the integer 123 (or part of a range) |
char! | #"a" | the bit associated with character code "a" (or part of a range) |
string! | "abc" | multiple bits, each character specifies a bit |
binary! | #{010203} | multiple bits, each byte specifies a single bit to be set |
Note that characters and integers can specify inclusive ranges using a dash (-). Do not forget the spaces around it. Spaces are required to delimit it.
[0 - 99] ; bits 0 to 99 inclusive [#"A" - #"Z"] ; bits associated with A to Z, inclusive
More examples are shown below.
Bitsets can be complements, meaning that each bit is inverted (a logical not.) In order to specify such bitsets, a specification block can begin with the keyword not.
Here are examples:
no-space: make bitset! [not #" "] no-whitespace: make bitset! [not "^-^/ "] no-ctrl-space: make bitset! [not 0 - #" "]
It should be noted that these two lines are not equivalent:
make bitset! #{01FF} make bitset! [#{01FF}]
The first sets bits 7 through 15. The second sets bits 1 and 255.
This difference occurs because the mold function outputs bitsets in a binary form that indicates what specific bits are set (the first line.)
However, within a block, this binary form can also be used if the bits keyword is used. These lines are equivalent:
make bitset! #{0060000080} make bitset! [bits #{0060000080}]
And for the complement, these lines are also equivalent:
complement make bitset! #{0060000080} make bitset! [not bits #{0060000080}]
Create a bitset with the bit associated with #"a" set:
bits: make bitset! #"a"
Create a bitset large enough to holds bits up to 1000 and set bit 1000:
bits: make bitset! [1000] ; note that bit 1000 is set
Create a bitset large enough to hold the bits for each character of "AxZ3?" and set each of those:
bits: make bitset! "AxZ3?"
Note that upper and lower case bits are different.
A block allows multiple bits to be specified using one or more combination of the above. For example:
bits: make bitset! [#"?" #"A" - #"Z" "!@#$" 201 - 220]
If the contents of bitset block needs to be computed, use reduce or compose first. For example:
start: 200 len: 50 bits: make bitset! reduce [start '- start + len]
These datatype actions are defined for the bitset! datatype:
Action | Description |
---|---|
make | creates and returns a new bitset |
copy | returns a copy of a bitset |
complement | inverts each bit; returns a new bitset |
find | test that value is set |
append | add new bits to the set (set them to true) |
poke | set one or more bits true or false |
remove | remove specific bits from the bitset (requires /part refinement) |
clear | clear entire bitset |
length? | returns the number of bits used |
and | bitwise and of two bitsets; returns new bitset |
or | bitwise or of two bitsets; returns new bitset |
xor | bitwise xor of two bitsets; returns new bitset |
In addition, these actions are identical:
Action | Same as |
---|---|
to | make |
insert | append |
pick | find |
negate | complement |
These comparisons are supported:
Action | Description |
---|---|
equal? | the bitsets are equal |
not-equal? | the bitsets are not equal |
same? | the bitsets use the same memory storage |
tail? | provided to allow empty? to work for bitsets |
zero? | always returns false (because bitsets are not scalar values) |
When you use a bitset, you normally want to:
To check for one or more bits in a bitset most of the time you will use the find function.
find the-bitset some-bits
It's quite common for find to be part of a conditional expression, such as:
if find the-bitset some-bits [... do something ...]
Note that while less common, you can also use pick or a path selection in a similar way.
Examples of character-oriented:
alpha: make bitset! [#"A" - #"Z" #"a" - #"z"] if find alpha #"a" [...] if find alpha "abc" [...] if find alpha [#"a" - #"g"] [...] if find alpha [":;?" #"a" - #"g" 3 - 20] [...]
bits: make bitset! [1 4 9 - 16 25 - 36] if find bits 42 [...] if find bits [20 - 29] [...] if find bits [#"A" - #"Z" ".,?" 20 - 29] [...]
if alpha/#"a" [...] if alpha/"abc" [...] if bits/12 [...]
Bitsets are often used to create character classes (e.g. digits, alpha, alpha-numeric, punctuation) to search for multiple characters at the same time:
invalid-chars: charset [0 - #" " ",.;:?"] if find name invalid-chars [...]
Such character classes are also used in parse rules:
digits: charset "0123456789" alpha: charset [#"A" - #"Z" #"a" - #"z"] if parse name [some alpha any digits] [...]
See parse section for details.
Editor note: specification pending
You can modify the bits specified by a bitset in a few ways.
You can add bits to a bitset with the append function:
spaces: make bitset! " " append spaces "^-^/"
The append function will also accept a bitset specification block:
letters: make bitset! [#"A" - #"Z"] append letters [#"a" - #"z"]
The insert function is synonymous with append.
You can remove bits from a bitset with the remove function. In order to specify the bits to be removed, the /part refinement is necessary.
spaces: make bitset! "^-^/ " remove/part spaces #" "
The remove function will also accept a bitset specification block:
letters: make bitset! [#"A" - #"Z" #"a" - #"z"] remove/part letters [#"a" - #"z"]
If /part is not given, an error will result:
remove spaces #" " ** Script error: missing a required argument or refinement
Also, note that remove only "unsets" bits, it does does not modify the size of the bitset.
The clear function will clear all bits for a bitset:
bits: make bitset! "abc"
clear bits
append bits "d"
probe bits
make bitset! #{00000000000000000000000008}
Note that a complemented bitset remains complemented. The clear function does not reset the flag.
As mentioned above, bitsets can be complemented. A complement provides the logical not of each bit.
Complemented bitsets are nothing more than a normal bitset with a complement flag used to indicate that the bits are inverted.
To create a complement bitset, use the complement function. It returns a new bitset with the complement flag set:
no-space: complement make bitset! #" "
if find no-space "a" [print "not a space"]
not a space
Other examples:
no-whitespace: complement make bitset! "^-^/ " no-ctrl-space: complement make bitset! [0 - #" "]
If you mold a complemented bitset, you will see a not indicator:
print mold no-space
make bitset! [not bits #{0000000080}]
For more information, see the notes in the above sections.
You can use and, or, and xor to create new bitsets.
For example:
b1: make bitset! "abc"
b2: make bitset! "cdef"
probe b1 or b2
make bitset! #{0000000000000000000000007E}
probe b1 and b2
make bitset! #{00000000000000000000000010}
Note that the resultant bitset will always be minimized (trimmed):
b3: make bitset! [0 30 60]
b4: make bitset! [0 1 2]
probe b3 and b4
make bitset! #{80}
Bitsets can also be modified using path notation.
bits: make bitset! "abc" bits/#"d": true bits/#"c": false bits/0: true
Variables can also be used:
s: "abc" bits/:s: false
You have always been able to define bitsets of any reasonable length, but now the length is more automatic:
For example, in R2 if you created a bitset:
bits: make bitset! 160
a 160 bit set would be created. However, if you wrote:
space: make bitset! " ^-^/"
a 256 bitset would be created (because it assumed you wanted an 8-bit character-based bitset).
In R3, this is no longer true. A bitset is only as long as needed to hold its maximum bit value. The line:
space: make bitset! " ^-^/"
creates a 40 bit set that looks like this:
make bitset! #{0060000080}
The bitset is only as long as it needs to be.
If you check for a bit outside its range, such a request is valid:
if find space "a" [print "found"]
In R2, this line would have caused an error, but it is fine in R3. All virtual bits are zero.
In addition, a bitset will auto-extend as needed. For example:
append space ".:;"
will expand the space bitset to be:
make bitset! #{006000008002}
Again, it is only as long as needed to hold the largest bit.
The above examples show only string bit values, but these rules apply to bitsets that are accessed with CHAR! and INTEGER! bit values as well.
The binary representation for bitsets has changed compared to R2.
It is now left-to-right bit continuous (as if bits were written in binary as big-endian values.)
For example, in R3:
>> make bitset! "abcd" == make bitset! #{00000000000000000000000078}
but, in R2:
>> make bitset! "abcd" == make bitset! #{ 0000000000000000000000001E00000000000000000000000000000000000000 }
Also, complemented bitsets do not use complemented bitmasks.
>> to-binary complement make bitset! " " == #{0000000080}
For such conversions, the complement state is lost (similar to how width and height are lost for a conversion of an image! to binary!.)
If you want to preserve that information, you must use mold.
TOC < Back Next > | REBOL.com - WIP Wiki | Feedback Admin |