Chapter 13 - Network Protocols
REBOL/Core Users Guide
Contents:
1. Overview
2. REBOL Networking Basics
2.1 Modes of Operation
2.2 Specifying Network Resources
2.3 Schemes, Handlers, and Protocols
2.4 Monitoring Handlers
3. Initial Setup
3.1 Basic Network Settings
3.2 Proxy Settings
3.3 Other Settings
3.4 Access to Settings
4. DNS - Domain Name Service
5. Whois Protocol
6. Finger Protocol
7. Daytime - Network Time Protocol
8. HTTP - Hyper Text Transfer Protocol
8.1 Reading a Web Page
8.2 Scripts on Web Sites
8.3 Loading Markup Pages
8.4 Other Functions
8.5 Acting Like a Browser
8.6 Posting CGI Requests
9. SMTP - Simple Mail Transfer Protocol
9.1 Sending Email
9.2 Multiple Recipients
9.3 Bulk Mail
9.4 Subject Line and Headers
9.5 Debug Your Scripts
10. POP - Post Office Protocol
10.1 Reading Email
10.2 Removing Email
10.3 Handling Email Headers
11. FTP - File Transfer Protocol
11.1 Using FTP
11.2 FTP URLs
11.3 Transferring Text Files
11.4 Transferring Binary Files
11.5 Appending to Files
11.6 Reading Directories
11.7 File Information
11.8 Making Directories
11.9 Deleting Files
11.10 Renaming Files
11.11 About Passwords
11.12 Transferring Large Files
12. NNTP - Network News Transfer Protocol
12.1 Reading the Newsgroup List
12.2 Reading All Messages
12.3 Reading Single Messages
12.4 Handling News Headers
12.5 Sending a News Message
13. CGI - Common Gateway Interface
13.1 CGI Server Setup
13.2 CGI Scripts
13.3 Generating HTML Content
13.4 CGI Environment
13.5 CGI Requests
13.6 Processing HTML Forms
14. TCP - Transmission Control Protocol
14.1 Creating Clients
14.2 Creating Servers
14.3 A Tiny Server
14.4 Testing TCP Code
15. UDP - User Datagram Protocol
1. Overview
REBOL has several of the primary Internet service protocols built in.
These protocols are easy to use within your scripts; they require no extra
libraries or include files, and many useful operations can be done with only a
single line of source code.
The protocols listed in the Network Protocols table are supported:
| Protocol | Description |
| DNS | Domain Name Service: translates computer names into addresses and addresses into names. |
| Finger | Obtains information about a user from their profile. |
| Whois | Obtains information about domain registration. |
| Daytime | Network Time Protocol. Gets the time from a server. |
| HTTP | Hypertext Transfer Protocol. Used for the Web. |
| SMTP | Simple Mail Transfer Protocol. Used for sending email. |
| POP | Post Office Protocol. Used for fetching email. |
| FTP | File Transfer Protocol. Exchanges files with a server. |
| NNTP | Network News Transfer Protocol. Posts or reads Usenet news. |
| TCP | Transmission Control Protocol. Basic Internet protocol. |
| UDP | User Datagram Protocol. Packet-based protocol. |
In addition, you can create handlers for other Internet
protocols or make your own custom protocols.
2. REBOL Networking Basics
2.1 Modes of Operation
There are two basic modes of network operation: atomic and port-based.
Atomic network operations are those that are accomplished with
a single function call. For instance, you can read an entire Web page with a single
call to the read function. There is no need to separately open a connection or
set up the read. All of that is done automatically as part of the read. For
example, you can type:
print read http://www.rebol.com
The host is found and opened, its Web page transferred, and the connection
closed.
The port-based mode of operation is one that uses a more
traditional programming approach. It involves opening a port and performing
various series operations on the port. For instance, if you want to read your
email from a POP server one message at a time, you would use this method. Here
is an example that reads and displays all of your email:
pop: open pop://user:pass@mail.example.com
forall pop [print first pop]
close pop
The atomic method of operation is easier, but it is also more limited. The
port-based method allows more types of operations, but also requires a greater
understanding of networking.
2.2 Specifying Network Resources
REBOL provides two approaches for specifying network resources: URLs and port
specifications.
Uniform Resource Locators (URLs) are used on the Internet to
identify a network resource, such as a Web page, FTP site, email address, file,
or other resource or service. URLs are integral to the operation of REBOL, and
they can be expressed directly in the language.
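As a small illustration (a sketch, not an example from the original guide, assuming REBOL/Core 2.x), a URL can be written as a literal, stored in a word, and extended with join without any quoting. The words site and page are only example variables, and nothing is read from the network:

```rebol
REBOL [Title: "URL literals (illustrative sketch)"]

site: http://www.rebol.com            ; a url! literal, no quotes needed
page: join site %/developer.html      ; join keeps the url! type of its first argument

print type? site                      ; the datatype of the literal
print page                            ; the combined URL
```

Because the result is still a url! value, it can be passed straight to read, exactly like the literal URLs used throughout this chapter.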
The standard notation for URLs consists of a scheme followed
by a specification:
scheme:specification
The scheme is often the name of a protocol, such as HTTP, FTP, SMTP, and POP;
however, that is not a requirement. A scheme can be any name that identifies the
method used to access a resource.
The format of a scheme's specification depends on the scheme; however, most
schemes share a common format for identifying network hosts, user names,
passwords, port numbers, and file paths. Here are a few commonly used
formats:
scheme://host
scheme://host:port
scheme://user@host
scheme://user:pass@host
scheme://user:pass@host:port
scheme://host/path
scheme://host:port/path
scheme://user@host/path
scheme://user:pass@host/path
scheme://user:pass@host:port/path
The Network Resource Specification table lists the fields used in the above formats.
| Field | Description |
| scheme | The name used to identify the type of resource, often the same as the protocol. For example, HTTP, FTP, and POP. |
| host | The network name or address for a machine. For example, www.rebol.com, cnn.com, accounting. |
| port | Port number on the host machine for the scheme being used. Normally there is a default for this, so it is not required most of the time. Examples: 21, 23, 80, 8000. |
| user | A user name to access the resource. |
| pass | A password to verify the user name. |
| path | A file path or some other method for referencing the resource. This is scheme dependent. Some schemes include patterns and script arguments (such as CGI). |
Another way to identify a resource is with a REBOL port
specification. In fact, when a URL is used, it is automatically
converted into a port specification. A port specification can accept many more
arguments than a URL, but it requires multiple lines to express.
A port specification is written as an object block definition that provides
each of the parameters necessary to access the network resource. For instance,
the URL to access a Web site is:
read http://www.rebol.com/developer.html
but, it can also be written as:
read [
scheme: 'HTTP
host: "www.rebol.com"
target: %/developer.html
]
The URL for an FTP read can be:
read ftp://bill:vbs@ftp.example.com:8000/file.txt
but, it can also be written as:
read [
scheme: 'FTP
host: "ftp.example.com"
port-id: 8000
target: %/file.txt
user: "bill"
pass: "vbs"
]
In addition, there are many other port fields that can be specified, such as
timeout, type of access, and security.
2.3 Schemes, Handlers, and Protocols
REBOL networking operates by using schemes to identify
handlers that communicate with protocols.
In REBOL a scheme is used to identify the method of accessing
a resource. That method uses a code object that is called a
handler. Each of the URL schemes that are supported by REBOL
(such as HTTP, FTP) has a handler. The list of schemes can be obtained with:
probe next first system/schemes
[default Finger Whois Daytime SMTP POP HTTP FTP NNTP]
In addition, there are lower level scheme names that are not shown here. For
instance, the TCP and UDP schemes are used for direct, lower level
communication.
New schemes can be added to this list. For instance, you can define your own
scheme, called FTP2, that provides special features for FTP access, such as
automatically supplying your username and password so it does not need to be
included in every FTP URL.
Most handlers are used to provide an interface to a network protocol. A
protocol is used to communicate between various devices,
including clients and servers.
Although each protocol is quite different in how it communicates, it does
have some things in common with other protocols. For instance, most protocols
require a network connection to be opened, read, written, and closed. These
common operations are performed by a default handler in REBOL. This handler
makes protocols like finger, whois, and daytime almost trivial to implement.
Scheme handlers are written as objects. The default handler serves as the
root object for all the other handlers. When a handler requires a particular
field, such as a timeout value to use for reading data, if the value is not
defined in the specific handler, it will be provided by the default handler.
Hence, handlers overlay one another with their fields and values. You can also
create handlers that use other handlers for default values. For instance, you
can create an FTP2 handler that looks for missing fields first in the FTP
handler, then in the default handler.
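The overlay idea can be sketched with ordinary REBOL objects. The names base-handler, ftp-handler, and ftp2-handler below are hypothetical stand-ins, not the real objects in system/schemes, and the field values are only examples. Note that make copies the parent's fields into the new object when it is created, so reassigning a field in base-handler afterwards does not change objects already made from it:

```rebol
REBOL [Title: "Handler overlay sketch (hypothetical objects)"]

; Hypothetical root object playing the role of the default handler.
base-handler: make object! [
    timeout: 0:00:30    ; used by any handler that does not define its own
    port-id: none
]

; Hypothetical FTP handler: overrides only the port number.
ftp-handler: make base-handler [
    port-id: 21
]

; Hypothetical FTP2 handler: adds login fields and inherits everything else.
ftp2-handler: make ftp-handler [
    user: "bill"
    pass: "vbs"
]

print ftp2-handler/port-id    ; taken from ftp-handler
print ftp2-handler/timeout    ; taken from base-handler
print ftp2-handler/user       ; defined in ftp2-handler itself
```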
When a port is used to access network resources, it is linked to a specific
handler. The handler and the port together form the unit that is used to provide
the data, code, and state information to process all protocols.
The source code to handlers can be obtained from the system/schemes object.
This can be useful if you want to modify the behavior of a handler or build your
own handler. For instance, to view the code for the whois handler, type:
probe get in system/schemes 'whois
Note that what you are seeing is a composite of the default handler with the
whois handler. The actual source code that is used to create the whois handler
is only a few lines:
make Root-Protocol [
open-check: [[any [port/user ""]] none]
net-utils/net-install Whois make self [] 43
]
2.4 Monitoring Handlers
For debugging purposes, you can monitor the actions of any handler. Each
handler has its own debugging output to indicate what operations are being
performed. To enable network debugging, turn network tracing on with the
line:
trace/net on
To turn network debugging off, use:
trace/net off
Here is an example:
read pop://carl:poof@zen.example.com
URL Parse: carl poof zen.example.com none none none
Net-log: ["Opening tcp for" POP]
connecting to: zen.example.com
Net-log: [none "+OK"]
Net-log: {+OK QPOP (version 2.53) at zen.example.com starting.}
Net-log: [["USER" port/user] "+OK"]
Net-log: "+OK Password required for carl."
Net-log: [["PASS" port/pass] "+OK"]
** User Error: Server error: tcp -ERR Password supplied for "carl"
is incorrect.
** Where: read pop://carl:poof@zen.example.com
3. Initial Setup
REBOL networking is built-in. To create scripts
that use the network protocols you do not need any special include files or
libraries. The only requirement is that you provide the basic information
necessary to enable protocols to connect to servers or through firewalls and
proxies. For instance, to send an email, the SMTP protocol needs an SMTP server
name and a reply email address.
3.1 Basic Network Settings
When you run REBOL the first time, you are prompted for the necessary network
settings, which are stored in the user.r file. REBOL uses this file to
load the required network settings each time it is started. If a user.r file
is not created and REBOL cannot find an existing user.r file in its
paths, no settings are loaded. See the chapter on Operation
for more information.
To change the network settings, type set-user at the prompt.
This runs the same network configuration script that ran when REBOL first
started. This script is loaded from the rebol.r file. If that file
cannot be found, or if you want to edit the settings directly, you can use a text
editor on the user.r file.
Within the user.r file the network settings are found in a block
that follows the set-net function. At a minimum the block
should contain two items:
- Your email address for use in the from and reply fields of email and for
anonymous FTP login
- Your default server; this is also your primary email server
In addition, you can specify a few other items:
- A different incoming email server (for POP)
- A proxy server (for connecting to the network)
- A proxy port number
- A proxy type (see Proxy Settings below).
You can also add lines after the set-net function to
configure other protocol values. For instance you can set the timeout values for
protocols, set the FTP passive mode, set the HTTP user-agent identifier, set up
separate proxies for different protocols, and more.
An example of set-net is:
set-net [user@domain.dom mail.server.dom]
The first field specifies your email From address, and the second field
indicates your default server (notice that it does not need quotes here). For
most networks, this is enough and no other settings are necessary (unless you
require a proxy). Also, your default server is used whenever a specific server is
not provided.
In addition, if you use a POP server (for incoming email) that is different
from your SMTP server (for outgoing email), you can specify that as well:
set-net [
user@domain.dom
mail.server.dom
pop.server.dom
]
However, if your SMTP and POP servers are the same, then this is not
necessary.
3.2 Proxy Settings
If you use a proxy or firewall, you can provide the set-net function with
your proxy settings. This can include the proxy server name or address, a proxy
port number to access the server, and an optional proxy type. For example:
set-net [
email@addr
mail.example.com
pop.example.com
proxy.example.com
1080
socks
]
This example would use a proxy called proxy.example.com on its TCP
port 1080 with the socks proxy method. To use a socks4 proxy server, use the
word socks4 rather than socks. To use the
generic CERN server, use the word generic.
You can also set the proxy to be different machines for different schemes
(protocols). Each protocol has its own proxy object where you can set the proxy
values for just that scheme. Here is an example of setting a proxy for FTP:
system/schemes/ftp/proxy/host: "proxy2.example.com"
system/schemes/ftp/proxy/port-id: 1080
system/schemes/ftp/proxy/type: 'socks
In this case, only FTP uses a special proxy server. Notice that the machine
name must be a string and the proxy type must be a literal word.
Here are two more examples. The first example sets the proxy for HTTP to be
the generic (CERN) proxy method:
system/schemes/http/proxy/host: "wp.example.com"
system/schemes/http/proxy/port-id: 8080
system/schemes/http/proxy/type: 'generic
In the above example, all HTTP requests go through a generic proxy on
wp.example.com using TCP port 8080.
If you want to disable the proxy settings for a particular scheme, you can
set the proxy fields to false.
system/schemes/smtp/proxy/host: false
system/schemes/smtp/proxy/port-id: false
system/schemes/smtp/proxy/type: false
In the above example, no outgoing email goes through a proxy. The
false value prevents even the default proxy from being used. If you set
these fields to none, then the default proxy is used if it is
configured.
If you want to bypass the proxy settings for particular machines, such as
those on your local network, you can provide a bypass list. Here is a bypass
list for the default proxy:
system/schemes/default/proxy/bypass:
["host.example.net" "*.example.com"]
Note that the asterisk (*) and question mark (?) characters can be used for
pattern matching. The asterisk (*) in the example above bypasses any
machine whose name ends with .example.com.
To set a bypass list for only the HTTP scheme, type:
system/schemes/http/proxy/bypass:
["host.example.net" "*.example.com"]
3.3 Other Settings
In addition to proxy settings, you can set network timeout values for all of
the schemes (in the default) or for specific schemes. For instance, to increase
the timeout for all schemes, you can write:
system/schemes/default/timeout: 0:05
This sets the network timeout to 5 minutes.
If you want to increase the timeout just for SMTP, you would write:
system/schemes/smtp/timeout: 0:10
Some schemes have custom fields. For instance, the FTP scheme allows you to
set passive mode for all transfers:
system/schemes/ftp/passive: on
FTP passive mode is useful behind a firewall: in passive mode the client opens
the data connection itself, so the FTP server does not need to connect back
through your firewall.
When making HTTP accesses to Web sites, you may want to use a different
user-agent field in the HTTP request to get better results on a few sites that
detect the browser type:
system/schemes/http/user-agent: "Mozilla/4.0"
3.4 Access to Settings
Each time REBOL is started, it reads the user.r file to find its
network settings. These settings are made with the set-net
function. Scripts have access to these settings through the
system/user and system/schemes objects:
system/user/email                     ; used for email From and Reply fields
system/schemes/default/host           ; your primary server
system/schemes/pop/host               ; your POP server
system/schemes/default/proxy/host     ; proxy server
system/schemes/default/proxy/port-id  ; proxy port
system/schemes/default/proxy/type     ; proxy type
Below is a function that returns a block containing the network settings in
the same order as set-net accepts them:
get-net: func [][
reduce [
system/user/email
system/schemes/default/host
system/schemes/pop/host
system/schemes/default/proxy/host
system/schemes/default/proxy/port-id
system/schemes/default/proxy/type
]
]
probe get-net
4. DNS - Domain Name Service
DNS is the network service that translates domain names into their associated
IP addresses. You can also use DNS to find the machine and domain name for
an IP address.
The DNS protocol can be used in three ways: you can look up the primary IP
address of a machine name, you can look up the domain name for an IP address, and
you can find the name and IP address of your local system.
To look up the primary IP address of a specific machine within a specific
domain, type:
print read dns://www.rebol.com
207.69.132.8
You can also obtain the domain name that is associated with a particular IP
address:
print read dns://207.69.132.8
rebol.com
Note that it is not unusual for this reverse DNS lookup to return none.
Some machines do not have host names.
print read dns://11.22.33.44
none
To find your system's host name, read an empty DNS URL of the form:
print read dns://
crackerjack
The data returned here depends on the type of machine. It may be the
unqualified host name, as shown above, but it can also be the fully-qualified
host name, crackerjack.example.com. This depends on the operating
system and the network configuration in the operating system.
Here's an example that looks up and prints the IP addresses for a number of
Web sites:
domains: [
www.rebol.com
www.rebol.org
www.mochinet.com
www.sirius.com
]
foreach domain domains [
print ["address for" domain "is:"
read join dns:// domain]
]
address for www.rebol.com is: 207.69.132.8
address for www.rebol.org is: 207.66.107.61
address for www.mochinet.com is: 216.127.92.70
address for www.sirius.com is: 205.134.224.1
5. Whois Protocol
The whois protocol retrieves information about domain names from a central
registry. The whois service is provided by the organizations that run the
Internet. Whois is often used to retrieve registration information about an
Internet domain or server. It can tell you who owns the domain, how their
technical contact can be reached, along with other information.
To obtain information, use the read function with a whois
URL. This URL should contain the domain name and a whois server name separated
by an at sign (@). For example, to obtain information about example.com
from the Internic registry:
print read whois://example.com@rs.internic.net
connecting to: rs.internic.net
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be
registered with many different competing registrars. Go to
http://www.internic.net for detailed information.
Domain Name: EXAMPLE.COM
Registrar: NETWORK SOLUTIONS, INC.
Whois Server: whois.networksolutions.com
Referral URL: www.networksolutions.com
Name Server: NS.ISI.EDU
Name Server: VENERA.ISI.EDU
Updated Date: 17-aug-1999
>>> Last update of whois database: Sun, 16 Jul 00 03:16:34 EDT <<<
The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains
and Registrars.
The above code is only an example. The details of the
information being returned and the servers that support whois
change over time.
If instead of a domain name you provide a word, all entries that match that
word are returned:
print read whois://example@rs.internic.net
connecting to: rs.internic.net
Whois Server Version 1.1
Domain names in the .com, .net, and .org domains can now be
registered with many different competing registrars. Go to
http://www.internic.net for detailed information.
EXAMPLE.512BIT.ORG
EXAMPLE.ORG
EXAMPLE.NET
EXAMPLE.EDU
EXAMPLE.COM
To single out one record, look it up with "xxx", where xxx is one
of the of the records displayed above. If the records are the same, look them
up with "=xxx" to receive a full display for each record.
>>> Last update of whois database: Sun, 16 Jul 00 03:16:34 EDT <<<
The Registry database contains ONLY .COM, .NET, .ORG, .EDU domains
and Registrars.
The whois protocol does not accept URLs, such as
www.example.com, unless the URL is part of the registrant's
company name.
6. Finger Protocol
The finger protocol retrieves user-specific information stored in the user
log file.
To request user information from a server, the server must be running the finger
protocol. The information is requested by reading a finger URL that contains a
username and a domain name in an email style format:
print read finger://username@example.com
The above example retrieves information about the user at
username@example.com. The information returned depends on the
information provided by the user and the settings of the finger server. Also,
the details of the information being returned are up to each server; the
examples below only describe typical servers. Many servers can have non-standard
behaviors on their finger ports.
For instance, the following information may be returned:
Login: username
Name: Firstname Lastname
Directory: /home/user
Shell: /usr/local/bin/tcsh
Office: City, State +1 555 555 5555
Last login Wed Jul 28 01:10 (PDT) on ttyp0 from some.example.com
No Mail.
No Plan.
Notice that finger reports when the user last logged in from a machine, and
whether the user has mail waiting. If the user reads email from this account,
finger sometimes reports when mail was received and when the user last retrieved
email:
New mail received Sun Sep 26 11:39 1999 (PDT)
Unread since Tue Sep 21 04:45 1999 (PDT)
The finger server can also report the contents of a plan file and a
project file if they exist. Users can include any information they want
in a plan or project file.
It is also possible to retrieve information about users using their real
first name or their last name. Some finger servers require that you capitalize
the names exactly as they appear in the login file or in the file used by the
online finger server, to retrieve user information. Other finger servers are
more liberal about capitalization. A finger server will respond to real name
queries by returning all listings that match the query criteria. For instance,
if there are several users on a host that have the first name zaphod,
then entering the query
print read finger://Zaphod@main.example.com
will retrieve all such users whose first or last name is Zaphod.
Some finger servers return a listing of users when the user name is omitted.
For example,
print read finger://main.example.com
retrieves a list of all users who are logged onto the machine, if the finger
service installed on the hosting machine allows it.
Some host machines limit finger services for security reasons. They may
require a valid username and only return information regarding that user. If you
finger such a server without providing user information, the server will report
that it requires specific user information.
If a system does not support the finger protocol, REBOL reports an access
error:
print read finger://host.dom
connecting to: host.dom
Access Error: Cannot connect to host.dom.
Where: print read finger://host.dom
7. Daytime - Network Time Protocol
The daytime protocol retrieves the current day and time. To connect to a
daytime server use read with a daytime URL. The URL contains the name of
the server to read the date from:
print read daytime://everest.cclabs.missouri.edu
Fri Jun 30 16:40:46 2000
The format of the information returned by servers may vary, depending on the
server. Notice that the time zone may not be present.
If the server you choose does not support daytime, REBOL returns an
error:
print read daytime://www.example.com
connecting to: www.example.com
** Access Error: Cannot connect to www.example.com.
** Where: print read daytime://www.example.com
8. HTTP - Hyper Text Transfer Protocol
The World Wide Web is driven by two fundamental technologies: HTTP and HTML.
HTTP is the Hypertext Transfer Protocol that controls how Web servers and Web
browsers communicate with each other. HTML is the Hypertext Markup Language that
defines the structure and contents of a Web page.
To retrieve a Web page, the browser sends a request to a Web server using
HTTP. On receiving the request, the server interprets it, sometimes using a CGI
script (see CGI -
Common Gateway Interface), and sends back data. This data can be just about
anything, including HTML, text, images, programs, and sound.
8.1 Reading a Web Page
To read a Web page, use the read function with an HTTP URL.
For example:
page: read http://www.rebol.com
This returns the Web page for www.rebol.com as a string that contains the
HTML code for the page. No graphics or other resources are fetched; to get them,
you would need additional reads. The page can be displayed as HTML code using
print, written to a file with write, or sent as
email using send.
print page
write %index.html page
send zaphod@example.com page
The page can be processed with a variety of REBOL functions, such as
parse, find, and load.
For instance, to search a Web page for all occurrences of the word
REBOL, you can write:
parse read http://www.rebol.com [
any [to "REBOL" copy line to newline (print line)]
]
8.2 Scripts on Web Sites
A Web server can provide more than just HTML pages. Web servers are also quite
useful for supplying REBOL scripts.
You can load REBOL scripts directly from a Web server with
load:
data: load http://www.rebol.com/data.r
You can also evaluate scripts directly from a server with do:
data: do http://www.rebol.com/code.r
Warning
Do this with care. Evaluating arbitrary scripts on open Internet
servers is asking for trouble. Evaluate a script only if you
completely trust its source, have fully inspected its source, or
have kept your REBOL security settings on maximum.
In addition, Web pages that contain HTML can contain embedded REBOL scripts,
and they can be run with:
data: do http://www.rebol.com/example.html
To determine if a script exists on a page before evaluating it, use the
script? function.
if page: script? http://www.rebol.com [do page]
The script? function reads the page from the Web site and
returns the page at its REBOL header position.
8.3 Loading Markup Pages
HTML and XML pages can be quickly converted to a REBOL block with the
load/markup function. This function returns a block that
consists of all the tags and strings found within the page. All spacing and line
breaks are left intact.
To filter out all of the tags for a Web page and just print its text,
type:
tag-text: load/markup http://www.rebol.com
text: make string! 2000
foreach item tag-text [
if string? item [append text item]
]
print text
You could then search this text for string patterns. It will contain all of
the spaces and line breaks of the original HTML file.
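To see the shape of the block that load/markup returns without contacting a server, it can also be given a string of markup. This is a sketch assuming REBOL/Core 2.x; the sample HTML string is made up for illustration. The block mixes tag! and string! values in document order, and the same kind of foreach filter keeps only the text:

```rebol
REBOL [Title: "load/markup on a string (illustrative sketch)"]

sample: {<p>Hello <b>REBOL</b> world</p>}    ; made-up markup for the example
tag-text: load/markup sample                  ; block of tag! and string! values
probe tag-text

plain: make string! 100
foreach item tag-text [
    if string? item [append plain item]       ; keep only the text pieces
]
print plain
```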
Here's another example that checks all links found on a Web page to make sure
that the pages they reference exist:
REBOL []
page: http://www.rebol.com/developer.html
set [path target] split-path page
system/options/quiet: true ; turn off connection msgs
tag-text: load/markup page
links: make block! 100
foreach tag tag-text [ ; find all anchor href tags
if tag? tag [
if parse tag [
"A" thru "HREF="
[{"} copy link to {"} | copy link to ">"]
to end
][
append links link
]
]
]
print links
foreach link unique links [ ; try each link
if all [
link/1 <> #"#"
any [flag: not find link ":"
find/match link "http:"]
][
link: either flag [path/:link][to-url link]
prin [link "... "]
print either error? try [read link]
["failed"]["OK"]
]
]
8.4 Other Functions
To check if a Web page exists, use the exists? function,
which returns true if the page exists.
if exists? http://www.rebol.com [
print "page still there"
]
Note: It is often faster to just read the page than to check
first whether it exists. Otherwise the script must contact the
server twice, which can be time consuming.
To request the date on which a Web page was last modified, use the
modified? function:
print modified? http://www.rebol.com/developer.html
However, note that not all Web servers provide modification date information.
Dynamically generated Web pages typically do not return a modification date.
Another way to determine if a Web page has changed is to poll it every so often
and check it. A handy way to verify that a Web page has changed is by using the
checksum function. If the previously calculated checksum of a
Web page differs from its current value, then the Web page has been modified
since it was last checked. Here is an example that uses this technique. It
checks a page every eight hours.
forever [
page: read http://www.rebol.com
page-sum: checksum page
if any [
not exists? %page-sum
page-sum <> (load %page-sum)
][
print ["Page changed" now]
save %page-sum page-sum
send luke@rebol.com page
]
wait 8:00
]
Whenever the page changes, it is sent to Luke via email.
8.5 Acting Like a Browser
Normally, REBOL identifies itself to a server when it reads from a Web site.
However, some servers are programmed to respond to particular browsers only. If
a request to a server does not produce the correct Web page, you can change the
request to make it look like it came from some other type of Web browser.
Pretending to be a Web browser is done by many programs to get Web sites to
respond correctly. However, this practice does end up defeating the purpose
behind the browser identification.
To change HTTP requests to look as though they are being sent by Netscape
4.0, you can modify the user-agent within the HTTP handler:
system/schemes/http/user-agent: "Mozilla/4.0"
Setting this variable affects all HTTP requests that follow.
8.6 Posting CGI Requests
HTTP CGI requests can be posted in two ways. You can include the CGI request
data in the URL or you can provide the request data through an HTTP post
operation.
The URL CGI request uses a normal URL. The example below sends the data value
10 to the CGI script test.r:
read http://www.example.com/cgi-bin/test.r?data=10
The post CGI request requires that you supply the CGI data with the
/custom refinement of the read function. The example below shows
how data is posted to a CGI script:
read/custom http://www.example.com/cgi-bin/test.r [
post "data: 10"
]
In this example, the /custom refinement is used to provide
additional information to the read. The second argument is a block that begins
with the word post and is followed by the string to send.
The post method is useful for easily sending REBOL code and data to a web
server that runs CGI. The following example illustrates this:
data: [sell 10 shares of "ACME" at $123.45]
read/custom http://www.example.com/cgi-bin/test.r reduce [
'post mold data
]
The mold function will produce the proper REBOL string to be sent to the
server.
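On the receiving side, that string can be turned back into REBOL values with load. The sketch below (assuming REBOL/Core 2.x) shows the round trip locally with no server involved; post-body and received are only illustrative names, and a real CGI script would first have to read the posted string from its input (see CGI - Common Gateway Interface):

```rebol
REBOL [Title: "mold/load round trip (illustrative sketch)"]

data: [sell 10 shares of "ACME" at $123.45]

post-body: mold data        ; the string that read/custom would post
print post-body

received: load post-body    ; what a receiving REBOL script could do
print equal? data received  ; the values survive the round trip
print received/5            ; the string "ACME" comes back as a string!
```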
9. SMTP - Simple Mail Transfer Protocol
The Simple Mail Transfer Protocol (SMTP) controls the transfer of email
messages on the Internet. SMTP defines the interaction between Internet hosts
that participate in forwarding email from a sender to its destination.
9.1 Sending Email
Email is sent through SMTP by using the send function. This
function can send an email message to one or more email addresses.
For send to operate correctly, your networking must be set
up. The send function requires that you specify your email From
address and your default email server. See Initial Setup above.
The send function takes two arguments: an email address and
a message. For example:
send user@example.com "Hi from REBOL"
The first argument must be an email or block data type. The second argument
can be any data type.
send luke@rebol.com $1000.00
send luke@rebol.com 10:30:40
send luke@rebol.com bill@ms.dom
send luke@rebol.com [Today 9-Apr-99 10:30]
Each of these simple email messages can be interpreted on the receiver's side
(with REBOL) or viewed with a normal email program.
You can send an entire file by reading the file and passing it as the second
argument to the send function:
send luke@rebol.com read %task.txt
Binary data, such as an image or executable file, can also be sent:
send luke@rebol.com read/binary %rebol
The binary data is encoded to allow it to be transferred as text.
To send a self-extracting binary message you can write:
send luke@rebol.com join "REBOL for the job" [
newline "REBOL []" newline
"write/binary %rebol decompress "
compress read/binary %rebol
]
When the message is received, the file can be extracted by using the
do function.
9.2 Multiple Recipients
To send to multiple recipients, you can provide a block of email names:
send [luke@rebol.com ben@example.com] message
In this case, each message is individually addressed with only the
recipient's email name appearing in the To field (similar to BCC
addressing).
The block of email addresses can be any size or even a file that you load.
Just be sure that they are valid addresses, not strings. Strings are
ignored.
friends: [
bob@cnn.dom
betty@cnet.dom
kirby@hooya.dom
belle@apple.dom
...
]
send friends read %newsletter.txt
9.3 Bulk Mail
If you are sending email to a large group, you can reduce the load on your
server by sending a single message to everyone in the group. This is the purpose
of the /only refinement. It uses a feature of SMTP to send only
one message to multiple email addresses. Using the friends list from the
previous example:
send/only friends message
The messages are not individually addressed. You may have seen this mode in
some of the bulk email that you receive. When you receive bulk email, your
address does not appear in the To field.
The bulk email mode of SMTP should be used for email lists and
not for sending spam. Spam email is not proper network
etiquette; it is illegal in some countries and states, and it
will get you banned from your ISP and from other sites.
9.4 Subject Line and Headers
By default the send function uses the first line of a
message as the subject line. To provide your own subject line, you need to
supply an email header to the send function. In addition to a
subject line, you can provide an organization, date, CC, and even your own
custom fields.
To include a header, use the /header refinement of the
send function and include a header object. The header object
must be made from the system/standard/email object. For example:
header: make system/standard/email [
Subject: "Seen REBOL yet?"
Organization: "Freedom Fighters"
]
Notice that the standard fields, such as the From address, are not
required and are supplied automatically by the send
function.
The header is then provided as an argument to send/header:
send/header friends message header
The email above is sent using the custom header for each message.
9.5 Debug Your Scripts
When testing email scripts, it is advised that you send email to yourself
first, before sending it to others. Examine your test email carefully to make
sure that it is what you want. It is common to have errors such as sending a
file name rather than the file contents. For instance, you might write:
send person %the-data-file.txt
This sends the name of the file, not the file itself.
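To send the contents rather than the name, read the file first, as in the earlier read %task.txt example. A minimal sketch that sends the test message to your own address (taken from system/user/email, which your network setup provides) before sending it to anyone else; person and %the-data-file.txt are the placeholders from the example above:

```rebol
; Test run: send the file's contents (not its name) to yourself first
send system/user/email read %the-data-file.txt
; Once the test message looks right, send it to the real recipient
send person read %the-data-file.txt
```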
10. POP - Post Office Protocol
The Post Office Protocol (POP) allows you to fetch email that is waiting in a
mail server mailbox. POP defines a number of operations for how to access and
store email on your server.
10.1 Reading Email
You can read all of your email in a single line without removing it from the
email server. This is done by reading from a POP URL that provides your
username, password, and email host.
mail: read pop://user:pass@mail.example.com
The messages are returned as a block of strings which you can handle one
message at a time using code such as:
foreach message mail [print message]
To read individual email messages from the server, you need to open a port
connection to the server and handle each message one at a time. To open the POP
port:
mailbox: open pop://user:pass@mail.example.com
In the example, mailbox can be accessed as a series. It
responds to many of the standard series functions, such as
length?, first, second, third, pick, next,
back, head, tail, head?, tail?, remove, and
clear.
To determine the number of mail messages residing on the server, use the
length? function.
print length? mailbox
37
In addition, you can find out the total size of all messages and the
individual sizes of messages with:
print mailbox/locals/total-size
print mailbox/locals/sizes
To display the first, second, and last messages, you can write:
print first mailbox
print second mailbox
print last mailbox
You can also use pick to fetch a specific message:
print pick mailbox 27
You can fetch and display each message from the oldest to the newest using a
loop that is identical to that used for other types of series:
while [not tail? mailbox] [
print first mailbox
mailbox: next mailbox
]
You can also read your email from newest to oldest with a loop such as:
mailbox: tail mailbox
while [not head? mailbox] [
mailbox: back mailbox
print first mailbox
]
When you are done, be sure to close the mailbox. This can be
done with a line such as:
close mailbox
10.2 Removing Email
As with series, the remove function can be used to delete a
single message, and the clear function can be used to delete
all of the messages from the current position to the end of the mailbox.
For example, to read a message, save it to a file, and remove it from the
server:
mailbox: open pop://user:pass@mail.example.com
write %mail.txt first mailbox
remove mailbox
close mailbox
The message is removed from the server when the port is
closed.
To remove the 22nd email message from the server, you can write:
mailbox: open pop://user:pass@mail.example.com
remove at mailbox 22
close mailbox
You can remove a number of messages by using the /part
refinement with the remove function:
remove/part mailbox 5
To remove all of the messages in your mailbox, use the clear
function:
mailbox: open pop://user:pass@example.com
clear mailbox
close mailbox
The clear function can also be used at different positions
within the mailbox to remove messages to the end of the mailbox.
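For example, here is a minimal sketch that keeps the first ten messages and deletes the rest. It combines the at positioning used in the remove example above with clear; the cutoff of ten is arbitrary:

```rebol
mailbox: open pop://user:pass@mail.example.com
; Position the mailbox at message 11, then clear from there to the end
clear at mailbox 11
; As with remove, the messages are removed when the port is closed
close mailbox
```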
10.3 Handling Email Headers
Email messages always include a header. The header holds information such as
the sender, recipient, subject, date, and other fields.
In REBOL, email headers are handled as objects that contain all of the
necessary fields. To convert an email message to a header object, you can use the
import-email function. For example:
msg: import-email first mailbox
print first msg/from ; the email address
print msg/date
print msg/subject
print msg/content
You can easily write a filter that scans your email for messages whose
subject line begins with a particular string:
mailbox: open pop://user:pass@example.com
while [not tail? mailbox] [
msg: import-email first mailbox
if find/match msg/subject "[REBOL]" [
print msg/subject
]
mailbox: next mailbox
]
close mailbox
Here is another example that informs you when email is received from a group
of friends:
friends: [orson@rebol.com hans@rebol.com]
messages: read pop://user:pass@example.com
foreach message messages [
msg: import-email message
if find friends first msg/from [
print [msg/from newline msg/content]
send first msg/from "Got your email!"
]
]
This spam filter removes all messages from the server that do not contain
your email address anywhere within the message:
mailbox: open pop://user:pass@example.com
while [not tail? mailbox] [
mailbox: either find first mailbox user@example.com
[next mailbox][remove mailbox]
]
close mailbox
Here is a simple email list server that receives messages and sends them to a
group. The server only accepts email from people in the group.
group: [orson@rebol.com hans@rebol.com]
mailbox: open pop://user:pass@example.com
while [not tail? mailbox] [
message: import-email first mailbox
mailbox: either find group first message/from [
send/only group first mailbox
remove mailbox
][next mailbox]
]
close mailbox
11. FTP - File Transfer Protocol
The File Transfer Protocol (FTP) is used widely on the Internet for
transferring files to and from a remote host. FTP is commonly used for uploading
pages to a Web site and for providing online file archives.
11.1 Using FTP
In REBOL FTP file operations are handled in much the same way as local file
operations. Functions such as read, write,
load, save, do,
open, close, exists?,
size?, modified?, and others are used with
FTP. REBOL distinguishes between local files and files accessible by FTP through
the use of an FTP URL.
Access to FTP servers can be open or closed. Open access allows anyone to
login to the site and download files. This is called anonymous access and it is
used frequently for public file archives. Closed access requires that you
provide a username and password to download and upload files. This is the mode
of operation for uploading Web pages to a Web site.
Although FTP does not require your REBOL networking to be
configured, if you wish to use anonymous access, an email
address is required. This address is found in the
system/user/email field. Normally, when you boot REBOL,
this field is set from your user.r file. See Initial Setup for more detail.
If you are using FTP through a proxy server or firewall, FTP may need to
operate in passive mode. Passive mode does not require reverse connections from
the FTP server to the client for data transfers. This mode only makes outgoing
connections from your machine and allows a greater level of security. To enable
passive mode you need to set a flag in the FTP protocol handler:
system/schemes/ftp/passive: true
If you do not know if it is necessary, try FTP first without it. If that does
not work, try setting the passive flag.
11.2 FTP URLs
An FTP URL has the basic form:
ftp://user:pass@host/directory/file
For anonymous access the username and password can be left out:
ftp://host/directory/file
Most of the examples in this section use this form for simplicity; however,
they also work with a username and password.
To access a remote directory, end the URL with a slash, such as:
ftp://user:pass@host/directory/
ftp://host/directory/
ftp://host/
More about directory access is shown below.
It is convenient to put the URL in a variable and use paths to provide the
file names. This allows you to refer to the URL with just a word. For
example:
site: ftp://ftp.rebol.com/pub/
read site/readme.txt
This technique is used in some of the sections that follow.
11.3 Transferring Text Files
FTP distinguishes between text files and binary files. When transferring text
files, FTP converts the line break characters. This is not desirable for binary
files.
To read a text file, supply the read function with an FTP
URL:
file: read ftp://ftp.site.com/file.r
This puts the contents of the file into a string. To write the file locally,
use this line:
write %file.r read ftp://ftp.site.com/file.r
Many of the refinements of read can also be used. For
instance, you can use read/lines with:
data: read/lines ftp://ftp.site.com/file.r
This example returns a block of lines for the file. See the Files
chapter for more information about the refinements to the read
function.
To write a text file to the server, use the write
function:
write ftp://ftp.site.com/file.r read %file.r
The write function can also include refinements. See the Files
chapter.
As with normal text file transfers, all line termination will be properly
converted during FTP transfers.
Here is a simple script that updates files to your Web site:
site: ftp://wwwuser:secret@www.site.dom/pages
files: [%index.html %home.html %info.html]
foreach file files [write site/:file read file]
This should not be used for transferring graphics or sound files, as they are
binary. Use the technique shown in Transferring
Binary Files.
In addition to the read and write
functions, you can use the load, save, and
do functions with FTP.
data: load ftp://ftp.site.com/database.r
save ftp://ftp.site.com/data.r data-block
do ftp://ftp.site.com/scripts/test.r
11.4 Transferring Binary Files
To avoid the line termination conversion when transferring binary files
(images, archives, executable files), use the /binary
refinement. For instance, to read a binary file from an FTP server:
data: read/binary ftp://ftp.site.com/file
To make a local copy of the file:
write/binary %file read/binary ftp://ftp.site.com/file
To write a binary file to a server:
write/binary ftp://ftp.site.com/file read/binary %file
No line termination conversions are performed.
To transfer a set of graphics files to a Web site, use this script:
site: ftp://user:pass@ftp.site.com/www/graphics
files: [%icon.gif %logo.gif %photo.jpg]
foreach file files [
write/binary site/:file read/binary file
]
11.5 Appending to Files
FTP also allows you to append text and data to an existing file. To do so,
use the write/append refinement as described in the Files
chapter.
write/append ftp://ftp.site.com/pub/log.txt reform
["Log entry date:" now newline]
This can also be used with binary files.
write/binary/append ftp://ftp.site.com/pub/log.txt
read/binary %datafile
11.6 Reading Directories
To read the file names of an FTP directory, follow the directory name with a
forward slash:
print read ftp://ftp.site.com/
pub-files: read ftp://ftp.site.com/pub/
The ending forward slash (/) indicates that this is a directory access, not a
file access. The forward slash is not always required, but it is recommended
when you know you are accessing a directory.
The block of files that is returned includes all of the files in the
directory. Within that block, directory names are indicated with a forward slash
following their names. For example:
foreach file read ftp://ftp.site.com/pub/ [
print file
]
readme.txt
rebol.r
rebol.exe
library/docs/
You can also use the dir? function on a file to determine if
it is a directory.
11.7 File Information
The same functions that provide information about files also
provide information about FTP files. This includes the
modified?, size?, exists?, dir?, and info?
functions.
You can use the exists? function to determine if a file
exists:
if exists? ftp://ftp.site.com/pub/log.txt [
print "Log file is there"
]
This works for directories too, but include the forward slash at the end of
the directory name:
if exists? ftp://ftp.site.com/pub/rebol/ [
print read ftp://ftp.site.com/pub/rebol/
]
To get the size or modification date for a file:
print size? ftp://ftp.site.com/pub/log.txt
print modified? ftp://ftp.site.com/pub/log.txt
To determine if the file is actually a directory:
if dir? ftp://ftp.site.com/pub/text [
print "It's a directory"
]
You can obtain all this information in a single access by using the
info? function:
file-info: info? ftp://ftp.site.com/pub/log.txt
probe file-info
print file-info/size
To perform the same operation on a directory:
probe info? ftp://ftp.site.com/pub/
To print a directory listing:
site: ftp://ftp.site.com/pub/
foreach file read site [
info: info? site/:file
print [file info/date info/size info/type]
]
11.8 Making Directories
New FTP directories can be created with the make-dir
function:
make-dir ftp://user:pass@ftp.site.com/newdir/
11.9 Deleting Files
With appropriate permission settings, files can be deleted from a remote FTP
server by using the delete function:
delete ftp://user:pass@ftp.site.com/upload.txt
You can also delete directories:
delete ftp://user:pass@ftp.site.com/newdir/
Note that a directory must be empty for this to succeed.
11.10 Renaming Files
You can rename a file with the line:
rename ftp://user:pass@ftp.site.com/foo.r %bar.r
The new name for the file will be bar.r.
FTP also allows you to move a file to a different directory with:
rename ftp://user:pass@ftp.site.com/foo.r %pub/bar.r
To rename a directory on an FTP site be sure to follow the directory name
with a slash:
rename ftp://user:pass@ftp.site.com/rebol/ %rebol-old/
11.11 About Passwords
The above examples include the password within their URLs, but if you plan on
sharing your script, you probably don't want that information to be known.
Here's a simple way to prompt for a password and build the correct URL:
pass: ask "Password? "
data: read join ftp://user: [pass "@ftp.site.com/file"]
Or, you can ask for both the username and password:
user: ask "Username? "
pass: ask "Password? "
data: read join ftp:// [
user ":" pass "@ftp.site.com/file"
]
You can also open FTP connections by using a port specification rather than a
URL. This allows you to use any password, even ones containing special
characters that are not easily written in URLs. An example of a port
specification to open an FTP connection is:
ftp-port: open [
scheme: 'ftp
host: "ftp.site.com"
user: ask "Username? "
pass: ask "Password? "
]
See Specifying Network Resources above for more detail.
11.12 Transferring Large Files
Transferring large files requires special considerations. You may want to
transfer the file in chunks to reduce the memory required by your computer and
to provide user feedback while the transfer is happening.
Here is an example that downloads a very large binary file in chunks.
inp: open/binary/direct ftp://ftp.site.com/big-file.bmp
out: open/binary/new/direct %big-file.bmp
buf-size: 200000
buffer: make binary! buf-size + 2
total: 0
while [not zero? size: read-io inp buffer buf-size][
write-io out buffer size
total: total + size
print ["transferred:" total]
]
close inp
close out
Be sure to use the /direct refinement; otherwise, the entire
file will be buffered internally by REBOL. The read-io and
write-io functions allow reuse of buffer memory that has
already been allocated. Other functions, such as copy, would allocate
additional memory.
If the transfer fails, you can restart FTP from where it left off. To do so,
examine the output file or the total variable to determine where to restart the
transfer. Open the file again with the /custom refinement, specifying
restart and the location from which to start the read. Here is
an example of the open function to use when the total
variable indicates the length already read:
inp: open/binary/direct/custom
ftp://ftp.site.com/big-file.bmp
reduce ['restart total]
You should note that restart only works for binary transfers. It cannot be
used with text transfers because the line terminator conversion that takes place
will cause incorrect offsets.
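Putting the pieces together, here is a rough sketch of resuming an interrupted download. It assumes the partial %big-file.bmp on disk holds exactly the bytes already received, so its size can serve as the restart offset. Unlike the earlier loop, it appends each chunk with write/binary/append, which copies the chunk and so gives up some of the memory savings of write-io in exchange for simplicity:

```rebol
; Resume an interrupted binary download (sketch)
; Bytes already saved locally, or 0 if nothing was saved yet
total: either exists? %big-file.bmp [size? %big-file.bmp][0]
inp: open/binary/direct/custom ftp://ftp.site.com/big-file.bmp
    reduce ['restart total]
buf-size: 200000
buffer: make binary! buf-size + 2
while [not zero? size: read-io inp buffer buf-size][
    ; Append only the bytes read in this pass to the partial file
    write/binary/append %big-file.bmp copy/part buffer size
    total: total + size
    print ["transferred:" total]
]
close inp
```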
12. NNTP - Network News Transfer Protocol
The Network News Transfer Protocol (NNTP) is the basis for tens of thousands
of newsgroups that provide a public forum for millions of Internet users. REBOL
includes two levels of support for NNTP:
- Built-in support, provided by the nntp scheme, which offers very limited
functionality and access.
- Extended functionality, provided by the news scheme, which is implemented in
the file distributed as nntp.r.
12.1 Reading the Newsgroup List
NNTP consists of two components: a list of newsgroups supported by a specific
newsgroup server (newsgroups are typically selected by an Internet service
provider), and a database of messages that are currently available for any
particular newsgroup.
To retrieve the list of all newsgroups from a specific news server, use the
read function with an NNTP URL such as:
groups: read nntp://news.example.com
This may take a while, depending on your connection; there are thousands of
newsgroups.
12.2 Reading All Messages
If you are using a fast connection, you can read all of the pending messages
for a newsgroup with:
messages: read nntp://news.example.com/alt.test
However, caution is advised. Some newsgroups can have thousands of messages.
It can take a long time to download all the messages, and you may run out of
memory to hold them.
12.3 Reading Single Messages
To read single messages, open NNTP as a port and use series functions to
access messages. This is similar to how you read email from a POP port. For
example:
group: open nntp://news.example.com/alt.test
You can use the length? function to determine the number of
messages that are available in the newsgroup:
print length? group
To read the first message available in the newsgroup, use
first:
message: first group
To select a specific message in the group by index, use
pick:
message: pick group 37
To create a simple loop that scans all messages for
a keyword, use:
forall group [
if find msg: first first group "REBOL" [
print msg
]
]
Remember that when the loop returns, the group series is positioned at the
tail. If you need to return to the head of the group:
group: head group
Be sure to close a port once you are done using it:
close group
12.4 Handling News Headers
News messages always include a header. The header holds information such as
the sender, summary, keywords, subject, date, and other fields.
Headers are handled as objects. To convert a news message to a news header
object you can use the import-email function. For example:
message: first first group
header: import-email message
You can now access the fields of the news message:
print [header/from header/subject header/date]
Different newsgroups and newsgroup clients use different fields in their
header. To view the fields available for a specific message, display the first
item of the header object:
print first header
12.5 Sending a News Message
Before you can send a news message, you need to create a header for it. Here
is a generic header that can be used for news:
news-header: make object! [
Path: "not-for-mail"
Sender: Reply-to: From: system/user/email
Subject: "Test message"
Newsgroups: "alt.test"
Message-ID: none
Organization: "Docs For All"
Keywords: "Test"
Summary: "A test message"
]
Before you can send it, you need to create a unique global identification
number for it. Here is a function that does that:
make-id: does [
rejoin [
"<"
system/user/email/user
"."
checksum form now
"."
random 999999
"@"
read dns://
">"
]
]
print news-header/message-id: make-id
<carl.4959961.534798@fred.example.com>
Now you can combine the header with the message. They must be separated by at
least one blank line. The content of the message is read from a file.
write nntp://news.example.com/alt.test rejoin [
net-utils/export news-header
newline newline
read %message.txt
newline
]
13. CGI - Common Gateway Interface
The Common Gateway Interface (CGI) is used with many Web servers to provide
processing beyond the normal HTTP Web interface. CGI requests are submitted from
Web browsers to Web servers. When a server receives a CGI request, it typically
executes a script to process the request and return a result to the browser.
These CGI scripts can be written in a variety of languages, and REBOL provides
one of the easier ways of handling CGI.
13.1 CGI Server Setup
Setting up CGI access is different for every Web server. See the instructions
provided with your server.
Typically a server has an option for enabling CGI operation. You need to
enable this option and provide a path to the directory where your CGI scripts
reside. A common directory for CGI scripts is cgi-bin.
On Apache servers, the ExecCGI option enables CGI scripts, and you
can provide a directory (cgi-bin) for your scripts. This is normally
set up by a default installation of Apache.
To configure CGI for Microsoft IIS, go to the properties for
cgi-bin and click on the configuration button. On the
configuration panel click add and enter the path to your
rebol.exe file. The format for this is:
C:\rebol\rebol.exe -cs %s %s
The two %s symbols are required for correctly passing the script and
command line arguments to REBOL. Add the extension for REBOL files (.r)
and set the last field to PUT, DELETE. The script engine
does not need to be selected.
The -cs option that is provided to REBOL enables CGI operation and
allows the script to access all files. (See the notes below on how scripts can
limit file access to selected directories.)
With Web servers other than those described above, the server requires
configuration to execute the REBOL executable for .r extension files
and run REBOL with the required option -cs.
13.2 CGI Scripts
Before a script can be executed on most CGI servers, it needs to have the
correct file permissions. On UNIX-type systems or those that use the Apache
server you need to change the permissions to enable the script to be readable
and executable by all users. This can be done with the chmod
command. If you are new to this concept, you should read your operating system
manual or talk with your system administrator before changing file
permissions.
For Apache and various other Web servers to run REBOL scripts, you need to
provide the correct header at the top of each script file. The header specifies
the path to the REBOL executable file and the -cs option. This can be
followed by the normal REBOL script header. Here is a simple CGI script that
prints the string Hello!:
#!/path/to/rebol -cs
REBOL [Title: "CGI Test Script"]
print "Content-Type: text/plain"
print "" ; required
print "Hello!"
There are many things that prevent a CGI script from running correctly. Get
this simple script working first before you try more complex scripts. If your
script does not work, here are a few items to check:
- You have CGI enabled on your Web server.
- The first line begins with a #! and the correct path to REBOL.
- The -cs option is supplied to REBOL.
- The script begins with "Content-Type:" being printed. (See below.)
- The script is in the correct directory. (normally the cgi-bin
directory)
- The script has the correct file permissions (readable and executable by
all).
- The script contains the correct line break characters. Some servers do not
run scripts that contain the CR character for line breaks. You may need to
convert the file. (REBOL can do this in one line: write file read file.)
- The script does not contain errors. Test it without CGI to make sure that the
script loads (does not have syntax errors) and functions properly. Provide some
sample data and test it.
- All files that are accessed by the script have the correct file
permissions.
Often one or more of the above items is wrong and prevents your script from
running. You may see an error when viewing the Web page. If it says "Server
Error" or "CGI Error" then it is typically something to do with the permissions
or setup of the script. If it shows a REBOL error message, then the script is
running, but you have an error within the script.
In the example script shown above, the Content-Type line is
critical. It is part of the HTTP header that is returned to the browser, and it
tells the browser the type of content being delivered. This is followed by a
blank line to separate it from the actual content.
Many different types of content can be delivered. The previous example was
plain text, but you can also deliver HTML as is shown in the next example. (See
your Web server manual for more information about content types.)
The content type and blank line can be combined into a single line. The caret
forward slash (^/) symbol provides an additional line break to separate
it from the content.
print "Content-Type: text/plain^/"
It is a good practice to always print this line immediately from your script.
This allows error messages to be seen by the browser if your script encounters
an error.
Here is a simple CGI script that prints the time:
#!/path/to/rebol -cs
REBOL [Title: "Time Script"]
print "Content-Type: text/plain^/"
print ["The time is now" now/time]
13.3 Generating HTML Content
There are as many ways to create HTML content as there are ways to create
strings. This script creates a page that displays a hit counter:
#!/path/to/rebol -cs
REBOL [Title: "HTML Example"]
print "Content-Type: text/html^/"
count: either exists? %counter [load %counter][0]
save %counter count: count + 1
print [
{<HTML><BODY><H2>Web Counter Page</H2>
You are visitor} count {to this page!<P>
</BODY></HTML>}
]
The script above loads and saves a counter text file. For this file to be
accessible, its permissions must be set to allow access by all users.
13.4 CGI Environment
When a CGI script is run the server provides information to REBOL about the
CGI request and its arguments. All of this information is provided as an object
within the system/options object. To view the fields of the object,
type:
probe system/options/cgi
make object! [
server-software: none
server-name: none
gateway-interface: none
server-protocol: none
server-port: none
request-method: none
path-info: none
path-translated: none
script-name: none
query-string: none
remote-host: none
remote-addr: none
auth-type: none
remote-user: none
remote-ident: none
content-type: none
content-length: none
other-headers: []
]
Of course, your script will ignore most of this information, but some of it
could be of use. For instance, you may want to create a log file that records
the network address of the system that made the request, or check the type of
browser being used.
To generate a CGI page that displays this content in your browser:
#!/path/to/rebol -cs
REBOL [Title: "Dump CGI Server Variables"]
print "Content-Type: text/plain^/"
print "Server Variables:"
probe system/options/cgi
If you want to use this information in a log, you can write it to a file. For
example, to log the addresses of visitors to your CGI page you could write:
write/append/lines %cgi.log
system/options/cgi/remote-addr
The /append and /lines refinements cause the data to be written at
the tail of the file, followed by a line break. Here's another
approach that puts multiple items on the same line:
write/append %cgi.log reform [
system/options/cgi/remote-addr
system/options/cgi/remote-ident
system/options/cgi/content-type
newline
]
13.5 CGI Requests
There are two methods for CGI to provide request data to your scripts: GET
and POST.
The GET method encodes CGI data into the URL. This is used to provide
information to the server. You may have noticed before that some URLs look like
this:
http://www.example.com/cgi-bin/test.r?&data=test
The string that follows the question mark (?) provides the arguments to CGI.
At times they can be quite long. This string is provided to your script when it
is run. It can be obtained from the cgi/query-string field. For
instance, to print the string from a script:
print system/options/cgi/query-string
The data within the string can include whatever data you require. However,
because the string is part of a URL, data must be encoded. There are
restrictions on the characters that are allowed.
In addition, when the data is created by HTML forms, it is encoded in a
standard way. This data can be decoded and placed within an object with the
code:
cgi: make object! decode-cgi-query
system/options/cgi/query-string
The decode-cgi-query function returns a block that contains variable
names and their values. See the HTML form example in the next section.
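Because decoding only needs a string, you can check it at the REBOL console without a Web server, which also helps with the advice above to test a CGI script with sample data before installing it. A minimal sketch, assuming decode-cgi-query behaves as described above; the query string is made up, using the field names of the HTML form in the next section:

```rebol
REBOL [Title: "Test form decoding without a server"]
; Made-up query string in the standard name=value&name=value form
query: "email=bob@example.com&message=Hello"
cgi: make object! decode-cgi-query query
print cgi/email
print cgi/message
```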
The POST method provides the CGI data as a string. The data does not need to
be encoded. It can be in any format you desire and can even be binary. Post data
is read from the standard input device. You will need to read it from the input
with a line such as:
data: make string! 2002
read-io system/ports/input data 2000
This would read up to the first 2000 bytes of POST data and put it in a
string.
A good format for POST data is to use a REBOL dialect and create a simple
parser. The POST data can be loaded and parsed as a block. See the Parsing
chapter.
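As a sketch of that approach, suppose the POST data uses a hypothetical dialect
of add and sub commands followed by integers. The string is loaded into a block
without being evaluated, and a parse rule decides what is accepted; the words
are only matched, never run. The dialect and the sample string are assumptions
for illustration:

```rebol
; Hypothetical dialect: a sequence of "add N" and "sub N" commands.
total: 0
rules: [
    some [
        'add set n integer! (total: total + n)
      | 'sub set n integer! (total: total - n)
    ]
]
data: "add 10 sub 3 add 5"   ; stands in for the POST data read above
either all [
    not error? try [request: load/all data]   ; reject malformed input
    parse request rules                       ; accept only the dialect
][
    print ["Total:" total]
][
    print "Invalid request"
]
```

Anything outside the dialect, such as a word like delete, makes parse fail, so
the request is rejected instead of executed.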
Warning About Blocks
It is not a good idea to pass REBOL blocks to be directly
evaluated because this can present a security risk. For
instance, someone could POST a block that reads or deletes files
on the server. However, it is safe to pass blocks that are
interpreted by your script (a dialect).
Here is an example script that displays the post data in your browser:
#!/path/to/rebol -cs
REBOL [Title: "Show POST data"]
print "Content-Type: text/html^/"
data: make string! 10000
foreach line copy system/ports/input [
repend data [line newline]
]
print [
<HTML><BODY>
{Here is the posted data.}
<HR><PRE> data </PRE>
</BODY></HTML>
]
13.6 Processing HTML Forms
CGI is often used for processing HTML forms. A form accepts input from
various fields and submits it to the Web server using the HTTP GET or POST
method.
Here is an example that uses the GET method to process a form and send an
email as the result. There are two parts to this: the HTML page and the CGI
script.
Here is an HTML page that includes a form:
<HTML><BODY>
<FORM ACTION="http://example.com/cgi-bin/send.r" METHOD="GET">
<H1>CGI Emailer</H1><HR>
Enter your email address:<P>
<INPUT TYPE="TEXT" NAME="email" SIZE="30"><P>
<TEXTAREA NAME="message" ROWS="7" COLS="35">
Enter message here.
</TEXTAREA><P>
<INPUT TYPE="SUBMIT" VALUE="Submit">
</FORM>
</BODY></HTML>
When the form above is submitted, a CGI script is needed to handle its
results. Here is an example of such a script. It decodes the form data, sends
the email, and returns a confirmation page.
#!/path/to/rebol -cs
REBOL [Title: "Send CGI Email"]
print "Content-Type: text/html^/"
cgi: make object! decode-cgi-query
system/options/cgi/query-string
print {<HTML><BODY><H1>Email Status</H1><HR><P>}
failed: error? try [send to-email cgi/email cgi/message]
print either failed [
{The email could not be sent.}
][
[{The email to} cgi/email {was sent.}]
]
print {</BODY></HTML>}
This script should be named send.r and stored in the
cgi-bin directory. Its permissions must be set to make it readable and
executable by all.
When the form has been submitted by a browser, this script will run. It
decodes the CGI query string into a cgi object. The object now has email and
message variables that are used for the send function. Before
send is done, the email field is converted from a string to an
email datatype.
The send function is placed within a try block to catch
errors if they occur while sending the email. The failed variable is set to true
if an error occurred, and the appropriate message is generated.
Other CGI examples can be found in the REBOL Script Library at
http://www.rebol.com/library/library.html.
14. TCP - Transmission Control Protocol
In addition to those protocols previously described, you can create your own
network servers and clients with the transmission control protocol, TCP.
14.1 Creating Clients
TCP ports can be opened in the same way as other REBOL protocols, using the
TCP URL. To open a TCP connection to an HTTP (Web) server on TCP port number
80:
http-port: open tcp://www.example.com:80
Another way of opening a TCP connection is to provide the port specification
directly. This is a substitute for using a URL and is often quite useful:
http-port: open [
scheme: 'tcp
host: "www.example.com"
port-id: 80
]
Since ports are series, you can use the same series functions for sending and
receiving data. The example below queries the HTTP server opened in the previous
example. It uses the insert function to put data into the port
series which sends it to the server:
insert http-port "GET / HTTP/1.0^/^/"
The two newline characters are used to tell the server that the header has
been sent.
The newline characters are automatically converted to CR LF
sequences because the port was opened in text mode.
The server processes the HTTP request and returns a result to
the port series. To read the result, use the copy function:
while [data: copy http-port] [prin data]
This loop will continue to fetch data until a none is returned from
copy. This behavior differs between protocols. A none is returned because the
server closes the connection. Other protocols may send a special character to
indicate the end of the transfer.
Now that all the data has been received, the HTTP port should be closed:
close http-port
Here is another example that connects to a POP port on a server:
pop: open/lines tcp://fred.example.com:110
This example uses the /lines refinement. The connection will
now be line oriented. Data will be written and read as lines. To read the first
line from the server:
print first pop
+OK QPOP (version 2.53) at fred.example.com starting.
To send the server a username for POP login:
insert pop "user carl"
Because the port is operating in line mode, a line terminator is sent after
the insert. The server response can be read with:
print first pop
+OK Password required for carl.
And the rest of the communication would proceed as:
insert pop "pass secret"
print first pop
+OK carl has 0 messages (0 octets).
insert pop "quit"
print first pop
+OK Pop server at fred.example.com signing off.
The connection should now be closed:
close pop
14.2 Creating Servers
To create a server you need to wait for connections and respond to them as
they occur. To set up a port on your machine that can be used to wait for
incoming connections:
listen: open tcp://:8001
Notice that you do not supply a host name, only a port number. This type of
port is called a listen port. The system now accepts
connections on port number 8001.
To wait for a connection from another machine, you wait on
the listen port.
wait listen
This function does not return until a connection has been made.
NOTE: There are other options available for wait. For
instance, you can wait on multiple ports, or for a timeout as
well.
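As a sketch of the timeout option, wait can be given a block holding the
listen port and a number of seconds. It returns the port when a connection
arrives, or none if the time runs out. The 30-second limit and the greeting
text are arbitrary choices for illustration:

```rebol
listen: open tcp://:8001
either wait [listen 30] [
    ; A client connected before the timeout expired.
    connection: first listen
    insert connection "you are connected^/"
    close connection
][
    ; wait returned none: nobody connected within 30 seconds.
    print "No connection within 30 seconds"
]
close listen
```

More ports can be added to the same block; wait then returns whichever port
becomes ready first.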
You can now open the connection port from the machine that has contacted your
system:
connection: first listen
This returns the connection that has been made to the listen port. It is a
port like all others and can now be used to receive and send data using the
insert, copy, first, and
other series functions:
insert connection "you are connected^/"
while [newline <> char: first connection] [
print char
]
When communication is complete, the connection should be closed:
close connection
You are now ready for the next connection on the listen port. You can
wait again and use first again to get the
connection.
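Putting these steps together, a single listen port can serve several clients
in turn. This sketch accepts three connections one after another (the count of
three is arbitrary), greets each client, and closes that client's connection
before waiting for the next:

```rebol
listen: open tcp://:8001
loop 3 [                       ; serve three clients, then stop
    wait listen                ; block until a client connects
    connection: first listen   ; take the pending connection
    insert connection "you are connected^/"
    close connection           ; done with this client
]
close listen
```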
When you are done with serving, you can close the listen port with:
close listen
14.3 A Tiny Server
Here is a useful REBOL server that only requires a few lines of code. This
server evaluates whatever REBOL code is sent to it. Lines of REBOL are read from
the client until an error occurs. Each line must be a complete REBOL expression.
They can be of any length but must be a single line.
server-port: open/lines tcp://:4321
forever [
connection-port: first server-port
until [
wait connection-port
error? try [do first connection-port]
]
close connection-port
]
close server-port
If an error occurs, the connection is closed and the server waits for the
next connection.
Here is an example of a client script that allows you to enter REBOL command
lines remotely:
server: open/lines tcp://localhost:4321
until [error? try [insert server ask "R> "]]
close server
Here error? and try are used to detect whether the connection has been
closed due to an error.
14.4 Testing TCP Code
To test your server code, connect from your own machine, rather than
requiring both a server and a client. This can be done from two separate REBOL
processes or even from the same process.
To connect to your local machine, you can use a line such as:
port: open tcp://localhost:8001
Here is an example that makes two ports connect to each other in line mode.
This is a sort of echo port since you're sending data to
yourself. It provides a good test of your code and networking:
listen: open/lines tcp://:8001
remote: open/lines tcp://localhost:8001
local: first listen
insert local "How are you?"
print first remote ; response
close local
close remote
close listen
15. UDP - User Datagram Protocol
The User Datagram Protocol is another transport layer protocol that provides
a connectionless method of communicating between machines. It allows you to send
datagrams (packets) between machines.
The operation of UDP is quite different from that of TCP. UDP is simpler, but it is
essentially unreliable. There is no guarantee that a packet will ever reach its
destination. In addition, UDP has no flow control. If you send messages too
quickly, packets may be lost.
Like TCP, the wait function can be used to wait for the next
packet to arrive and the copy function is used to return the
data. If there is no data, copy waits until there is. Note,
however, that insert never waits.
Here is an example of a simple UDP server script:
udp: open udp://:9999
wait udp
print copy udp
insert udp "response"
close udp
The messages inserted here by the server are sent to the client from which
the server last received a message. This allows responses to be sent for
incoming messages. However, unlike TCP, there is no continuous connection
between the machines. Each packet transfer is a separate exchange.
The client script to communicate with the above server would be:
udp: open udp://localhost:9999
insert udp "Test"
wait udp
print copy udp
close udp
You should know that the maximum UDP packet size depends on the operating
system. 32 KB and 64 KB are common values. In order to send larger amounts of
data, you will need to buffer the data, chopping it into smaller pieces.
However, careful programming is required to make sure that each piece of the
data is received. Remember that with UDP, there are no guarantees.
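A minimal sketch of the chopping on the sending side, assuming a piece size of
8 KB (well below the common limits above) and a made-up 100,000-character
payload. It does nothing to detect lost or reordered pieces; the receiver would
need its own scheme, such as sequence numbers and acknowledgements, which is
not shown here:

```rebol
chunk-size: 8192                 ; assumed safe piece size
data: make string! 100000
insert/dup data "x" 100000       ; stand-in for a large payload
udp: open udp://localhost:9999
while [not tail? data] [
    insert udp copy/part data chunk-size   ; send one piece
    data: skip data chunk-size             ; move past it
    wait 0.01                              ; crude pacing: UDP has no flow control
]
close udp
```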