REBOL 3 Concepts: Parsing: Evaluation

Pending Revision

This document was written for R2 and has yet to be revised for R3.

Normally, you parse a string to produce some result. You want to do more than just verify that the string is valid; you want to do something as it is parsed. For instance, you may want to pick out substrings from various parts of the string, create blocks of related values, or compute a value.

Return Value

The examples in previous chapters showed how to parse strings, but they produced no results other than the return value. Such a parse only verifies that a string follows the specified grammar; the value returned from parse indicates whether it succeeded. The following examples show this:

probe parse "a b c" ["a" "b" "c"]
true

probe parse "a b" ["a" "c"]
false

The parse function returns true only if the rules match all the way to the end of the input string. An unsuccessful match stops the parse of the series. If parse runs out of rules to match before reaching the end of the series, it does not traverse the rest of the series and returns false:

probe parse "a b c d" ["a" "b" "c"]
false

probe parse "a b c d" [to "b" thru "d"]
true

probe parse "a b c d" [to "b" to end]
true

Expressions in Rules

Within a rule, you can include a REBOL expression to be evaluated when parse reaches that point in the rule. Parentheses are used to indicate such expressions:

string: "there is a phone in this sentence"
probe parse string [
    to "a"
    to "phone" (print "found phone")
    to end
]
found phone
true

The example above parses the string up to "a", then up to "phone", and prints the message found phone once that match is complete. If either "a" or "phone" is missing, the parse cannot proceed and the expression is not evaluated.
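
To see the failing case, here is a minimal sketch that runs the same rule on a sentence without the word phone (the input string is made up for illustration). The rule fails at to "phone", so the expression is never evaluated and only the return value is shown:

probe parse "there is a radio in this sentence" [
    to "a"
    to "phone" (print "found phone")
    to end
]
false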

Expressions can appear anywhere within a rule, and multiple expressions can occur in different parts of a rule. For instance, the following code prints different strings depending on what inputs were found:

parse string [
    "a" | "the"
    to "phone" (print "answer") |
    to "radio" (print "listen") |
    to "tv"    (print "watch")
]
answer

string: "there is the radio on the shelf"

parse string [
    "a" | "the"
    to "phone" (print "answer") |
    to "radio" (print "listen") |
    to "tv"    (print "watch")
]
listen

Here is an example that counts the number of times the HTML pre-format tag appears in a text string:

count: 0
page: read http://www.rebol.com/docs/dictionary.html
parse page [any [thru <pre> (count: count + 1)]]
print count
777
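
Because the count above depends on the live contents of a web page, the same rule can also be tried on a small literal string. This is only a sketch; the words n and sample and the sample text are assumptions chosen for illustration:

n: 0
sample: "<pre>one</pre> some text <pre>two</pre>"
parse sample [any [thru <pre> (n: n + 1)]]
print n
2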

Copying the Input

The most common action done with parse is to pick up parts of the string being parsed. This is done with copy, followed by the name of the variable that receives the copied string. The following example parses the title of a web page:

parse page [thru <title> copy text to </title>]
print text
REBOL/Core Dictionary

The example works by skipping over text until it finds the <title> tag. That's where it starts making a copy of the input stream and setting a variable called text to hold it. The copy operation continues until the closing </title> tag is found.

The copy action also can be used with entire rule blocks. For instance, for the rule:

[copy heading ["H" ["1" | "2" | "3"]]]

the heading string contains the entire H1, H2, or H3 string. This also works for large multi-block rules.
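
As a minimal sketch of a rule of this kind in use (the input "H2" is chosen only for illustration):

parse "H2" [copy heading ["H" ["1" | "2" | "3"]]]
print heading
H2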

Marking the Input

The copy action makes a copy of the substring that it finds, but that is not always desirable. In some cases, it is better to save the current position of the input stream in a variable.

NOTE: The copy word as used in parse is different from the copy function used in REBOL expressions. Parse uses a dialect of REBOL, and copy has a different meaning within that dialect.

In the following example, the begin variable holds a reference to the page input string just after <title>. The ending variable refers to the page string just before </title>. These variables can be used in the same way as any other series.

parse page [
    thru <title> begin: to </title> ending:
    (change/part begin "Word Reference Guide" ending)
]

You can see the above parse expression actually changed the contents of the title:

parse page [thru <title> copy text to </title>]
print text
Word Reference Guide

Here is another example that marks the position of every table tag in an HTML file:

page: read http://www.rebol.com/index.html
tables: make block! 20
parse page [
    any [to "<table" mark: thru ">"
        (append tables index? mark)
    ]
]

The tables block now contains the position of every table tag:

foreach table tables [
    print ["table found at index:" table]
]
table found at index: 836
table found at index: 2076
table found at index: 3747
table found at index: 3815
table found at index: 4027
table found at index: 4415
table found at index: 6050
table found at index: 6556
table found at index: 7229
table found at index: 8268

NOTE: The current position in the input string can also be modified. The next section explains how this is done.

Modifying the String

Now that you know how to obtain the position of the input series, you also can use other series functions on it, including insert, remove, and change. To write a script that replaces all question marks (?) with exclamation marks (!), write:

str: "Where is the turkey? Have you seen the turkey?"
parse str [some [to "?" mark: (change mark "!") skip]]
print str
Where is the turkey! Have you seen the turkey!

The skip at the end of the rule advances the input over the new character. It is not necessary in this case, but it is good practice.

As another example, to insert the current time everywhere the word time appears in some text, write:

str: "at this time, I'd like to see the time change"
parse str [
    some [to "time"
        mark:
        (remove/part mark 4  mark: insert mark now/time)
        :mark
    ]
]
print str
at this 14:42:12, I'd like to see the 14:42:12 change

Notice the :mark word used above. It sets the input to a new position. The insert function returns the position just past the inserted time, and the get-word :mark sets the input to that position.

Using Objects

When parsing a large grammar from a set of rules, variables are used to make the grammar more readable. However, these variables are global and may conflict with variables of the same name elsewhere in the program.

The solution to this problem is to use an object to make all the rule words local to a context. For instance:

tag-parser: make object! [
    tags: make block! 100
    text: make string! 8000
    tag: txt: none  ; declared here so the words set by copy stay local to the object
    html-code: [
        copy tag ["<" thru ">"] (append tags tag) |
        copy txt to "<" (append text txt)
    ]
    parse-tags: func [site [url!]] [
        clear tags clear text
        parse read site [to "<" some html-code]
        foreach tag tags [print tag]
        print text
    ]
]
tag-parser/parse-tags http://www.rebol.com

Debugging

As you write rules, you will sometimes need to debug them. Specifically, you may want to know how far parse got within a rule.

The trace function can be used to watch the parse operation progress, but this can output thousands of lines that are difficult to review.

A better way is to insert debugging expressions into the parse rules. As an example, to debug the rule:

[thru "<IMG" "SRC" "=" filename ">"]

insert the print function after key sections to monitor your progress through the rule:

[thru "<IMG" (print 1) "SRC" "=" (print 2)
    filename (print 3) ">"]

This example prints 1, 2, and 3 as the rule is processed.

Another approach is to print out part of the input string as the parse happens:

[
    thru "<IMG" here: (print here)
    "SRC" "=" here: (print here)
    filename here: (print here) ">"
]

If this is done often, you can create a rule for it:

here: [where: (print where)]

[
    thru "<IMG" here
    "SRC" "=" here
    filename here ">"
]

The copy keyword can also be used to show which substrings were parsed as the rule was handled.
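
For example, the following sketch copies whatever a sub-rule matched and prints it before the parse continues. The input string and the word matched are hypothetical, and thru is used here so that the tag name itself is consumed:

parse {<IMG SRC="logo.gif">} [
    thru "<IMG"
    copy matched ["SRC" "="] (print matched)
    to ">"
]

If nothing is printed, the parse stopped before the sub-rule finished matching.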
