SARA Architecture
SARA has an extensible architecture. At the center is a relatively
small generic kernel that knows little to nothing about system types,
network service names, vulnerabilities, or other details. Knowledge
about the details of network services, system types, etc. is built into
small, dedicated, data collection tools and rule bases. The behaviour
of SARA is controlled from a configuration file. Settings may be
overruled via command-line options or via a hypertext user interface.
The SARA kernel consists of the following main parts:
- Policy engine. Given the constraints specified in the SARA
configuration file, this subsystem determines whether a host may be
scanned, and what scanning level is appropriate for that host.
- Target acquisition. Given a list of target hosts, this SARA
subsystem generates a list of probes to be run on those hosts. The list
of probes serves as input to the data acquisition subsystem. The target
acquisition module also keeps track of a host's proximity level, and
handles the so-called subnet expansions.
- Data acquisition. Given a list of probes, this SARA subsystem runs
the corresponding data collection tools and generates new facts. These
facts serve as input to the inference engine.
- Inference engine. Given a list of facts, this subsystem generates
new target hosts, new probes, and new facts. New target hosts serve as
input to the target acquisition subsystem; new probes are handled by
the data acquisition subsystem, and new facts are processed by the
inference engine.
- Report and analysis. This subsystem takes the collected data and
builds a virtual hyperspace that you can explore with your favourite
HTML browser.
Once SARA is given an initial target host, the target acquisition,
data acquisition and inference engine subsystems keep feeding each
other new data until nothing new comes up. Technically speaking, the
system does a breadth-first search.
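The feedback loop between the three subsystems can be pictured as a worklist that runs until it is empty. The following is only a rough sketch in Python, not SARA's actual code: `probes_for`, `run_probe` and `infer` are hypothetical callables standing in for target acquisition, data acquisition and the inference engine.

```python
from collections import deque


def run_scan(initial_targets, probes_for, run_probe, infer):
    """Process targets, probes and facts until nothing new comes up.

    probes_for(host)  -> iterable of probes        (target acquisition)
    run_probe(probe)  -> iterable of facts         (data acquisition)
    infer(fact)       -> (targets, probes, facts)  (inference engine)
    All three callables are placeholders supplied by the caller.
    """
    queues = {"target": deque(), "probe": deque(), "fact": deque()}
    seen = {"target": set(), "probe": set(), "fact": set()}

    def enqueue(kind, items):
        # Only queue items that have never been seen, so the loop terminates.
        for item in items:
            if item not in seen[kind]:
                seen[kind].add(item)
                queues[kind].append(item)

    enqueue("target", initial_targets)
    while any(queues.values()):
        if queues["target"]:
            enqueue("probe", probes_for(queues["target"].popleft()))
        elif queues["probe"]:
            enqueue("fact", run_probe(queues["probe"].popleft()))
        else:
            targets, probes, facts = infer(queues["fact"].popleft())
            enqueue("target", targets)
            enqueue("probe", probes)
            enqueue("fact", facts)
    return seen
```

The first-in, first-out queues give the breadth-first order, and the `seen` sets are what make the loop stop once no subsystem produces anything new.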
When you start SARA in interactive mode, i.e., using the HTML user
interface, SARA performs the following actions before starting up
the HTML browser:
- Start the SARA httpd daemon. This is a very limited subset of the
typical httpd daemon, sufficient to support all activities that SARA
can perform.
- Generate a (hopefully "good") 32-byte cryptographic magic cookie for
the upcoming SARA run. SARA runs several system utilities in
parallel and compresses their quasi-random output with the MD5 hashing
function. The HTML browser must specify this magic cookie as part of
the URLs that it sends to the custom SARA httpd daemon. If this cookie
is ever compromised, intruders could potentially execute any programs
that the SARA program can run, with the same privileges as the user
that started the SARA program. SARA generates a new magic cookie for
each session. SARA and the HTML browser always run on the same host,
so there is no need to send the magic cookie over the network.
- Read in any previously collected scan data. By default, SARA will
read data from the $sara_data database. In the meantime, the HTML
browser comes up, but it will not be able to communicate with SARA
until the database has been read in. This can take anywhere from a few
seconds to several minutes, depending on the size of the database, the
speed of the machine running SARA, the amount of
available RAM, etc.
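The cookie step described above can be illustrated with Python's standard `hashlib` module. This is a minimal sketch under stated assumptions: the text does not say which utilities SARA runs, so the function simply accepts their captured output as byte strings, and the names `make_magic_cookie` and `cookie_matches` are invented for this example.

```python
import hashlib
import hmac


def make_magic_cookie(outputs):
    """Condense quasi-random utility output into one MD5 digest.

    outputs: iterable of byte strings, e.g. the captured output of several
    system utilities (which utilities is an assumption left to the caller).
    """
    digest = hashlib.md5()
    for chunk in outputs:
        digest.update(chunk)
    return digest.hexdigest()  # 32 hexadecimal characters


def cookie_matches(expected, presented):
    """Compare cookies in constant time, as an httpd-side check might do."""
    return hmac.compare_digest(expected, presented)
```

An MD5 hex digest is 32 characters long, which may be what the 32 bytes in the text refers to. Code written today would more likely draw the cookie from the operating system's random source, for example with `secrets.token_hex(16)`.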
The policy engine controls what hosts SARA may probe. The probing
intensity depends on the host's proximity
level, which is basically a measure of the distance from the
initial target host(s). Probing intensities and probing constraints are
specified in the configuration file. This
file can direct SARA to stay within certain internet domains, or to
stay away from specific internet domains.
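As an illustration of the domain constraints, the check below decides whether a host name falls inside an allowed domain and outside every excluded one. It is a sketch only: the parameter names `only_domains` and `dont_domains` are hypothetical and are not taken from the SARA configuration file.

```python
def in_domain(host, domain):
    """True if host equals domain or is a name underneath it."""
    host = host.lower().rstrip(".")
    domain = domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def may_scan(host, only_domains=(), dont_domains=()):
    """Apply 'stay away from' first, then 'stay within' (if any is given)."""
    if any(in_domain(host, d) for d in dont_domains):
        return False
    if only_domains:
        return any(in_domain(host, d) for d in only_domains)
    return True
```

Matching on a leading dot keeps a name such as notexample.com from being mistaken for a member of example.com.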
While SARA gathers information from the so-called primary
target(s) that you specified, the program may learn about the existence
of other hosts. Examples of such non-primary systems are:
- hosts found in remote login information from the finger
service,
- hosts that import file systems from the target, according to the
showmount command.
For each host, SARA maintains a proximity count. The proximity of a
primary host is zero; for hosts that SARA finds while probing a
primary host, the proximity is one, and so on. By default, SARA stays
away from hosts with non-zero proximity, but you can override this
policy by editing the configuration file,
via command-line switches, or from the hypertext user interface.
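The proximity bookkeeping can be sketched as follows. The function names and the `max_proximity` parameter are made up for this example, and keeping the smallest distance when a host is found along several paths is an assumption, not something the text states.

```python
def record_discovery(proximity, found_from, new_host):
    """Give new_host a proximity one greater than the host that revealed it.

    proximity: dict mapping host name -> proximity count (primary hosts: 0).
    Assumption: if the host was already known, the smaller count wins.
    """
    candidate = proximity[found_from] + 1
    proximity[new_host] = min(candidate, proximity.get(new_host, candidate))
    return proximity[new_host]


def within_reach(proximity, host, max_proximity=0):
    """Default of 0 mirrors the policy of probing primary hosts only."""
    return proximity[host] <= max_proximity
```

For example, starting from `{"primary": 0}`, a host found through the primary's finger output gets proximity 1 and is skipped unless `max_proximity` is raised to at least 1.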
SARA can gather data about just one host, or it can gather data about
all hosts within a subnet (a block of 256 adjacent network addresses).
The latter process is called a subnet scan. Target hosts may be
specified by the user, or may be generated
by the inference engine when it
processes facts that were generated by the
data acquisition module.
Once a list of targets is available, the target acquisition module
generates a list of probes, according to the scanning level derived by the policy engine. The actual data collection is
done under control of the data
acquisition module.
When requested to scan all hosts in a subnet (a block of 256 internet
addresses), SARA uses the fping utility to find out what
hosts in that subnet actually are available. This is to avoid wasting
time talking to hosts that no longer exist or that happen to be down at
the time of the measurement. The fping scan also may discover
unregistered systems that have been attached to the network without
permission from the network administrator.
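A subnet expansion of this kind can be sketched with Python's standard `ipaddress` module. The liveness check is left as a caller-supplied function (`is_alive`, a hypothetical name) instead of invoking fping, so the example stays self-contained.

```python
import ipaddress


def expand_subnet(address):
    """Return all 256 addresses of the /24 block that contains address."""
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return [str(ip) for ip in network]


def live_hosts(addresses, is_alive):
    """Keep only the addresses for which the liveness check succeeds."""
    return [addr for addr in addresses if is_alive(addr)]
```

For example, `expand_subnet("192.0.2.57")` yields the addresses 192.0.2.0 through 192.0.2.255, and `live_hosts` then narrows that list to the hosts that answered.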
The data acquisition engine takes a list of probes and executes each
probe, after it has verified that the probe may be run at the target's
scanning level. Which tools may be run at
a given scanning level is specified in the configuration file. The software keeps a
record of what probes it has already executed, to avoid doing
unnecessary work. The result of data acquisition is a list of new facts
that is processed by the inference engine.
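The two checks described here, "is this tool allowed at the target's level" and "has this probe already run", might look like the sketch below. The class, its arguments and the tool names used in the usage note are illustrative assumptions; only the `.sara` suffix follows the naming convention mentioned in this document.

```python
class DataAcquisition:
    """Run each permitted probe at most once and collect the facts."""

    def __init__(self, tools_by_level, run_tool):
        # tools_by_level: level name -> set of tool names allowed at it.
        # run_tool(tool, host) -> list of facts; supplied by the caller.
        self.tools_by_level = tools_by_level
        self.run_tool = run_tool
        self.done = set()  # (tool, host) pairs already executed

    def run(self, probes, level_of):
        facts = []
        for tool, host in probes:
            if (tool, host) in self.done:
                continue  # avoid unnecessary work
            if tool not in self.tools_by_level.get(level_of(host), set()):
                continue  # not permitted at this host's scanning level
            self.done.add((tool, host))
            facts.extend(self.run_tool(tool, host))
        return facts
```

With `tools_by_level = {"light": {"dns.sara"}}`, a request to run a hypothetical `ftp.sara` against a host scanned at the light level is silently skipped, and a repeated `dns.sara` request for the same host runs only once.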
SARA comes with a multitude of little tools. Each tool implements one
type of network probe. By convention, the name of a data collection
tool ends in .sara. Often these tools are just a few lines of
Perl or shell script. All tools produce output according to
the same common tool record format.
SARA derives a great deal of power from this toolbox approach. When a
new network feature becomes of interest, it is relatively easy to add your own probe.
SARA can probe hosts at various levels of intensity. The scanning
level is controlled with the configuration
file, but can be overruled with command-line switches or via the
graphical user interface.
- light: This is the least intrusive scan. SARA collects information
from the DNS (Domain Name System), tries to establish what RPC (Remote
Procedure Call) services the host offers, and what file systems it
shares via the network. With this information, SARA finds out the
general character of a host (file server, diskless workstation).
- normal (includes light scan probes): At this level, SARA probes for
the presence of common network services such as finger, remote login,
ftp, WWW, Gopher, email and a few others. With this information, SARA
establishes the operating system type and, where possible, the software
release version.
- heavy (includes normal scan probes): After it has found out what
services the target offers, SARA looks at them in more depth, and does
a more exhaustive scan for network services offered by the target. At
this scanning level SARA finds out if the anonymous FTP directory is
writable, if the X Windows server has its access control disabled, if
there is a wildcard in the /etc/hosts.equiv file, and so on. This level
avoids known problems in certain Microsoft Windows products, and other
services such as the font server.
- extreme (includes heavy scan probes): Same as heavy, but does not
avoid the problems mentioned above.
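Because each level includes the probes of the level below it, the set of probes for a level is the union of everything introduced at that level or lower. The sketch below shows that relationship; the probe labels are loosely based on the descriptions above and are illustrative, not SARA tool names.

```python
from enum import IntEnum


class ScanLevel(IntEnum):
    LIGHT = 1
    NORMAL = 2
    HEAVY = 3
    EXTREME = 4


# Probes first introduced at each level (labels are illustrative only).
INTRODUCED = {
    ScanLevel.LIGHT: {"dns", "rpc", "showmount"},
    ScanLevel.NORMAL: {"finger", "ftp", "www"},
    ScanLevel.HEAVY: {"anonymous-ftp-writable", "x-access-control"},
    ScanLevel.EXTREME: {"font-server"},
}


def probes_at(level):
    """Union of the probes introduced at this level and every lower one."""
    result = set()
    for lvl in ScanLevel:
        if lvl <= level:
            result |= INTRODUCED[lvl]
    return result
```

Using an ordered enumeration makes the "includes" relationship a simple comparison, so adding a fifth level would not require touching the existing ones.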
At each level SARA may discover that critical access controls are
missing or defective, or that the host is running a particular software
version that is known to have problems. SARA takes a conservative
approach and does not exploit the problem.
The heart of SARA is a collection of little inference engines. Each
engine is controlled by its own rule base. The rules are applied in
real time, while data is being collected. The results of these
inferences are lists of new facts for the inference engine, new probes
for the data acquisition engine, or new
targets for the target acquisition
engine.
- rules/todo: Rules that decide what probe to perform next. For
example, when the target host offers the FTP service, and when the
target is being scanned at a sufficient level, SARA will attempt to
determine if the host runs anonymous FTP, and if the FTP home directory
is writable for anonymous users.
- rules/hosttype: Rules that deduce the system class (for example DEC,
HP, or SUN) and, where possible, the operating system release version,
from telnet, ftp and other banners.
- rules/facts: Rules that deduce potential vulnerabilities. For
example, several versions of the FTP or sendmail daemons are known to
have problems. Daemon versions can be recognized by their greeting
banners.
- rules/services: Rules that translate cryptic daemon banners and/or
network port numbers to more user-friendly names such as WWW server, or
diskless NFS client.
- rules/trust: Like the services rules, these rules help SARA to
classify the data that was collected by the tools on NFS service, DNS,
NIS, and other cases of trust.
- rules/drop: What data-collection tool output SARA should ignore.
This can be used to silence SARA about things that you do not care
about. Implemented by the drop_fact.pl module.
Application of these rules in real time, to each tool output record,
and within the context of all information that has been collected
so far, offers an amazing potential that we are only beginning to
understand.
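To give a feel for what a banner-driven rule base does, here is a small sketch of host type deduction. SARA's real rule files use their own syntax, which this document does not show; the regular expressions and the `classify` function below are invented for illustration.

```python
import re

# Ordered (pattern, system class) pairs; the first match wins.
# The patterns are illustrative guesses at typical banner text.
HOSTTYPE_RULES = [
    (re.compile(r"SunOS", re.IGNORECASE), "SUN"),
    (re.compile(r"HP-UX", re.IGNORECASE), "HP"),
    (re.compile(r"ULTRIX|Digital UNIX", re.IGNORECASE), "DEC"),
]


def classify(banner):
    """Return the system class deduced from a banner, or None if unknown."""
    for pattern, system_class in HOSTTYPE_RULES:
        if pattern.search(banner):
            return system_class
    return None
```

Applying such a function to every tool output record as it arrives is one simple way to picture rules being evaluated in real time.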
When SARA scans a network with hundreds or thousands of hosts, it can
collect a tremendous amount of information. As we have found, it does
not make much sense to simply present all that information as huge
tables. You need the power of hypertext technology, combined with some
unusual implementation techniques to generate a dynamic hyperspace on
the fly.
With a minimal amount of effort (at least, by you; your computer may
disagree), SARA allows you to navigate through your networks. You can
break down the information according to:
- Domain or subnet,
- Network service,
- System type or operating system release,
- Trust relationships,
- Vulnerability type, danger level, or count.
Breakdowns by combinations of these properties are also possible.
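A breakdown by one property or by a combination of properties amounts to grouping the collected records by a key. The sketch below assumes each record is a dictionary with a "host" entry plus whatever properties are being grouped on; that record layout is an assumption made for this example, not SARA's database format.

```python
from collections import defaultdict


def break_down(records, *keys):
    """Group host names by the given record properties.

    Returns a dict mapping a tuple of property values to a list of hosts.
    """
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record["host"])
    return dict(groups)
```

Calling `break_down(records, "service")` groups hosts by network service, while `break_down(records, "subnet", "service")` gives the combined breakdown.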
SARA's reporting capabilities make it relatively easy to find out,
for example:
- What subnets have diskless workstations,
- What hosts offer anonymous FTP,
- Who runs Linux or FreeBSD on their PC,
- What unregistered (no DNS hostname) hosts are attached to your network.
Questions like these can be answered with only a few mouse clicks.
Printing a report is a matter of pressing the print button of
your favourite hypertext viewer.