Microsoft Word - OpenTC_newsletter_03_for

OpenTC Newsletter
January 2008

From the Open Trusted Computing (OpenTC) research project, sponsored by the
European Union.

In this issue:

- Editorial: Verified compartments to protect users against malware
- Private Electronic Transactions - The OpenTC proof-of-concept prototype
- Static analysis using Abstract Interpretation (part 2)
- Recent OpenTC publications

Editorial: Verified compartments to protect users against malware

By: Arnd Weber, ITAS, Forschungszentrum Karlsruhe, Germany

Dear reader,

At the beginning of the OpenTC project, many opinions on Trusted Computing (TC) gathered
on the Internet in discussion fora and articles were critical of the technology. Some people
believed that TC would be used for tracking citizens’ activities, locking computers into
proprietary configurations, enforcing unfair business models for content or software
provision, and even automatically deleting emails and documents without the user’s approval.
In this context, the project’s work package on requirement analysis (WP02) found that a
prototype showing TC's potential to protect users could be used to make TC more attractive,
as opposed to an application protecting "only" organisations like employers or content
providers. Such a prototype would make it possible to demonstrate to the public the broad
range of purposes for which TC can be used. A particularly suitable use-case in our opinion is
one that demonstrates how TC can be used to protect individuals against financial risks, e.g.,
security threats in homebanking or online auctions. Given that computer criminals frequently
produce malware software to fool users into giving them their security credentials, so-called
phishing, strong protection against such attacks would be the objective of a suitable use-case
implementation. On the basis of the OpenTC architecture, the design of the prototype would
comprise a TC-secured hypervisor with a verified compartment running an isolated browser,
to be used for the risky transactions underlined in the scenario (i.e., not for general web
surfing) in order to keep it secure. This compartment would run next to a compartment with a
legacy operating system and untrusted applications used for casual tasks. We nicknamed this
prototype the "PET", an abbreviation of "Private Electronic Transactions", and the consortium
implemented a proof-of-concept prototype based on this scenario.

The PET prototype is described in the first article by Stephane Lo Presti (Royal Holloway,
University of London), Gianluca Ramunno (Politecnico di Torino), and Dirk Kuhlmann
(Hewlett Packard Laboratories, Bristol). The design and decision process which led to
creating the PET is described in more detail in Kuhlmann/Weber [1], where the reader can
find a review of the various arguments found in the media, a description of the consortium’s
internal decision process, and the conclusions drawn from this process, as well as information
on the OpenTC architecture and PET design. The source code has been made publicly

available [2] in order to enable developers, in particular FOSS (Free and Open Source
Software) developers, to use it.

In future real-world systems that use the code produced by the OpenTC project, the various
mechanisms from the OpenTC architecture should be implemented at a very high level of
quality. In the second article in this OpenTC newsletter, we continue our description of the
consortium’s approach to improving the source code of the project. In the last issue, Pascal
Cuoq from CEA-LIST, Paris, provided an introduction to "Static analysis using Abstract
Interpretation", while in this second part of his article he goes on explaining precisely how
static analysis works. The consortium’s approach to the trustworthiness of code will continue
to be explored in upcoming articles in the next issues of the OpenTC newsletter.

We conclude this newsletter with an announcement of a new research paper published by
members of the OpenTC consortium.

References:
- [1] Kuhlmann, Dirk; Weber, Arnd (eds.): Requirements Definition and Specification.
OpenTC Project Deliverable D02.2.

http://www.opentc.net/deliverables2006/OTC_D02.2_Requirements_Definition_and_Specific
ation_update.pdf

- [2] Free and Open-Source Software (FOSS) components of the Open Trusted Computing
architecture:

http://www.opentc.net/index.php?option=com_content&task=view&id=47&Itemid=65

Contact: arnd.weber at itas.fzk.de

Our thanks go to Richard Brown, Alison Hepper, Dirk Kuhlmann, Stephane Lo Presti and
Dirk Weber for their help in preparing this issue.

Private Electronic Transactions - The OpenTC proof-of-concept prototype

By: Stephane Lo Presti, Royal Holloway, University of London, UK; Gianluca Ramunno,
Politecnico di Torino, Italy; and Dirk Kuhlmann, Hewlett Packard Laboratories, Bristol, UK

Introduction
---------------
The OpenTC project recently published the first proof-of-concept prototype of its Open
Trusted Computing architecture. This work involved several developers from HP (United
Kingdom), IBM (Switzerland), Politecnico di Torino (Italy), Ruhr-Universitaet Bochum
(Germany), Technische Universitaet Dresden (Germany), IAIK Graz University of
Technology (Austria) and SuSE (Germany). The prototype aims to demonstrate some core
ideas of the OpenTC approach, combining Trusted Computing (TC) and virtualisation
technologies with Free and Open Source Software (FOSS) development. The release
implements a concrete application of TC, is available to the general public, and could serve as
a working basis for interested developers.

As a proof-of-concept prototype, the implementation is far from comprehensive, and has not
been thoroughly tested. It does not include all the necessary components for the OpenTC
architecture, nor are the existing ones in any finalised form. Nevertheless, we hope that the
proof-of-concept work will give some impression of how TC could be applied in practice,

foster discussions about the use of TC and the OpenTC architecture, and encourage
contributions from the FOSS communities. With some minor exceptions, all source code is
released under the GPLv2 license. It is provided as both a Live CD image (binaries) and as
source code (see link at the end of the article). The prototype was developed and tested on HP
nx6325 and IBM T60 laptops equipped with Trusted Platform Modules (TPMs) v1.2 and 1
GB RAM.

OpenTC makes use of virtualisation layers - also called Virtual Machine Monitors (VMMs)
or hypervisors - and supports two different implementations (i.e., Xen and L4/Fiasco). This
layer hosts compartments, also called virtual machines (VMs), domains or tasks, depending
on the VMM being used. Some domains host trust services that are available to authorised
user compartments. Various system components make use of TPM capabilities, e.g., in order
to measure other components they depend on or to prove the system integrity to remote
challengers.

The current prototype implements a scenario called Private Electronic Transactions (PET),
which aims to improve the trustworthiness of interactions with remote servers. Transactions
are simply performed by accessing a web server through a standard web browser running in a
dedicated trusted compartment named "domT". In the PET scenario, the server is assumed to
host web pages belonging to a bank; however, the setup also applies to other e-commerce
services.

The communication setup between browser compartment and server was extended by a
protocol for mutual remote attestation tunnelled through an SSL/TLS channel. During the
attestation phase, each side assesses the trustworthiness of the other. If this assessment is
negative on either side, the SSL/TLS tunnel is closed, preventing further end-to-end
communication. If the assessment is positive, end-to-end communication between browser
and server is enabled via standard HTTPS tunnelled over SSL/TLS.

General approach
---------------------
The approach of this scenario relies on four elements:

1) Trusted platform setup
2) Authenticated boot process
3) Registration of the known-good measurement values
4) Trusted communication setup

The following sections give a cursory overview of these steps.

1. Trusted Platform Setup

As a first step, the user has to initialise the client platform and to prepare it for subsequent
operations.

The setup procedure is performed as soon as the OpenTC system has started up for the first
time. First, the user, acting as the platform owner, must "take ownership" of the TPM. This
assumes that the TPM is enabled in the BIOS and, if it was already used before, is cleared and
re-enabled. More information about this feature of TC can be found at [1].

Next, an Attestation Identity Key (AIK) must be created and certified, and the corresponding

identity activated. In a real-world situation, Privacy Certification Authorities (PCAs) that
operate a valid TC-enabled Public Key Infrastructure (PKI) would be used during this
process. However, for the sake of simplicity, both the requests for and the release of the AIK
certificate operations are currently handled locally on the client, using hard-coded passwords.
This is done by a software library that comes with hard-wired mock certificates for the
authorities involved (e.g., PCA).

2. Authenticated boot process

From the initial bootup process up to the start of trusted components, a chain of trust is
generated: each component of the chain is measured prior to passing control to it. The
component measurements, i.e. their fingerprints through cryptographic hashes of relevant
binary and configuration data, are accumulated in Platform Configuration Registers (PCRs) of
the TPM.

In order to generate the chain of trust, all components in this chain must be instrumented to
perform integrity measurements of their successors in execution. For instance, the BIOS must
include a Core Root of Trust for Measurement (CRTM), as defined by the Trusted Computing
Group. Further modifications concern the Master Boot Record (MBR) and the boot loader.
The latter has to measure the hypervisor, the kernel and initial ram disk images of privileged
domains, or files or disk images of trusted compartments. For this purpose, the OpenTC
prototype includes "tGRUB", an extended version of the familiar GRand Unified Bootloader
(GRUB). The tGRUB boot menu offers a choice between the two different virtualisation
layers, XEN and L4/Fiasco.

A second boot menu option concerns the execution mode. It offers a "normal user" mode that
comes with a simplified interface and restricts access to management functions, geared
towards showing how the user can perform a transaction in a real-life scenario. The "expert
user" mode, on the other hand, enables full access to management features, permitting a peek
“under the hood”. It allows interactive access and comes with a more complex interface. As a
demonstration of the differences between successful and failing verification of integrity
measurements, the boot menu also provides a "good domT" and a "rogue domT" option. In
the "good" mode, all measurements match their expected values. On the other hand, the
"rogue" mode simulates a modified compartment that could have been tampered with by an
attack from a malicious program. In this case, at least one PCR contains an unexpected
measurement value.

OpenTC’s tGRUB loader is constrained to supporting the so-called Static Core Root of Trust
for Measurement (S-CRTM) model. In this case, all security and trust-relevant components
must be measured, starting with the BIOS. To demonstrate the new Dynamic Core Root of
Trust for Measurement (D-CRTM) approach introduced with TPM v1.2, OpenTC provides
the Open Secure LOader (OSLO) boot loader implemented as a standard module for
GRUB/tGRUB. OSLO implements the D-CRTM for AMD's CPUs, invoking the SKINIT
instruction for re-initialising the platform in a trustworthy manner late at runtime.

3. Registration of the Known-Good Measurement Values

The client can now register with a server by uploading the measurements that represent its
platform state (that is assumed to be trusted). During this procedure, the user (acting as a bank
operator) registers his platform with a bank server. To this end, he uploads the integrity
measurements of his platform (i.e., the expected value for the domT compartment), thus

authorising the trusted compartment to connect to the bank server. In a realistic scenario, this
process would be automated using secure communications and a dedicated registration
protocol. Since the prototype does not yet implement this feature, the user instead launches a
script (in "normal user" mode) or uses a browser (in the "expert user" mode) to upload a file
containing a measurement digest to the front-end proxy running on the bank server. This
digest corresponds to the current state of the client platform, i.e., its trusted virtualisation layer
and the trusted domain domT. For simplification, the bank server (named "domS")
measurements are already present on the client side (in a privileged compartment of the
virtualisation layer named Domain-0 or dom0) of the OpenTC system: the PCR metrics
defined to correspond to a "trusted" server state are currently hard-wired into the system.

4. Trusted communication setup

Trust status information is exchanged between the client and the server through a pair of
proxy services running in a privileged compartment (dom0) on the client and as a front-end in
the server compartment (domS). The proxies communicate through an SSL/TLS tunnel that
can encapsulate any TCP-based protocol. For the PET scenario, HTTPS is used.

The communication setup is initiated when the browser is started in the trusted compartment
(domT) and the link provided in its toolbar clicked on. The connection request is passed to the
client proxy that runs in the privileged compartment (dom0). A dedicated component running
in this compartment opens an SSL/TLS tunnel to connect to the server-side proxy running in
the bank server compartment (domS). The permissibility of this connection must be stated in
the policy of the virtualisation layer. Measurements of software components on both sides
(client and server, see above) are represented by PCR values of the hardware and software
TPM. These values are signed with the AIK acquired in step 3, and communicated to the
respective peer system.

Communication between the client and the server will only be enabled if both the client and
the server metrics suggest that they booted into (and still run) the expected configuration. This
could prove a countermeasure to phishing attacks, preventing the user from following a false-
lead URL to connect to a fake bank server. The mechanism might also improve the protection
of the bank server against unauthorised connections.

Platform components and behaviour
-------------------------------------------
The authenticated boot process launches the selected virtualisation layer, which is responsible
for controlling four dedicated compartments, namely:

- A privileged compartment (called "dom0") that directly accesses the physical platform and
includes the drivers for hardware devices. This compartment is also used to perform
management operations at the virtualisation layer. In "normal user" mode, this compartment is
not visible, whereas the "expert user" mode allows client compartments (see below) to be
started manually via scripts. Dom0 is part of the Trusted Computing Base (TCB).

- A compartment for the server side of the demonstrator (called "domS") which executes the
banking application and its front-end. It locally simulates a remote server (web server and
proxy). For the prototype, this removes the dependency on external communication which
would require an additional, separate banking computer. The domain is accessed via the
network name "domSbox", and a software TPM emulator is used to perform the integrity-
related operations for this domain. The compartment is not visible to the user. It goes without

saying that this domain could equally be executed on a remote physical machine, and future
versions of the OpenTC prototype will support this.

- Two compartments for the client side of the system. The first compartment (called "domT")
solely provides web browsing as its single functionality. It is considered trusted in that its
integrity has been measured and the values correspond to a well-known configuration.
Measurements, which include the configuration file, the kernel and the virtual disk image with
the root file system, are accumulated within PCR[11] in the TPM through the “PCR extend”
operation. The second compartment is an untrusted one (called "domU"). It is not measured
and is intended for daily use, but specifically not for the PET transaction.

The platform policy ensures that only the trusted compartment (domT) is authorised to use the
client proxy (executing in the privileged compartment dom0) for connecting to the bank
server (executing in domS). By contrast, the untrusted compartment (domU) cannot connect
to the bank server, since no measurements exist that can be used to attest to its trusted state.

Client compartments run a stripped-down Linux system (a Debian-based Damn Small
Linux/DSL distribution), while the privileged compartment dom0 runs either a SuSE or a
DSL distribution, depending on which version of prototype is used. The server compartment
(domS) runs a standard Debian distribution.

The L4-based OpenTC implementation provides a secure GUI service for managing input and
output. This GUI is provided as a trusted service started by tGRUB at boot time and running
in a dedicated compartment. The top part of the display is reserved to indicate the number of
active compartments and the name of the one currently hooked to the display. This section of
the screen is considered trusted because it is under the exclusive control of the secure GUI.
With the L4 virtualisation layer, a pair of hotkeys are used to switch between compartments.

For the Xen-based OpenTC implementation, the whole screen is under the control of a
selected compartment. Each compartment is assigned a fixed hotkey. When a specific hot key
associated to a compartment is pressed, the user can be sure that the desired compartment will
actually be displayed since the key is under the exclusive control of the privileged
compartment dom0.

Shortcuts, simplifications, and limitations
--------------------------------------------------
Trusted Computing is an emerging technology, and infrastructure support, e.g., for issuing
certificates, does not yet exist. The technology is also fairly complex. For example, first-time
users are easily confused by the multiplicity of authorisation secrets. For the sake of
simplicity, the OpenTC prototype, therefore, uses fixed passwords as authorisation secrets for
the TPM and AIK keys (which can, of course, be changed for all operations by editing the
configuration scripts in the "expert user" mode).

The prototype attempts to strike a balance here in providing all necessary components
(including PKI mechanisms) as part of the distribution. In particular, the server-side
mechanisms reside in a dedicated compartment on the same physical hardware that runs the
client.

The PCR metrics corresponding to a trusted PKI server state are currently hard-wired into the
PKI component running on the client system. The banking compartment domS is not actually
measured since it employs a TPM emulator instead of a hardware TPM. Instead, we just

communicate the initial PCR values of the software TPM. In a future release of the OpenTC
prototype, the server side will run on a different physical platform, and integrity
measurements of the server domain will be duly recorded.

In the "rogue domT" scenario, the measurement of the client banking compartment (domT)
does not correspond to the one uploaded earlier to the bank server. Consequently, the
attestation will fail, and communication with the bank server will be disallowed. Due to
implementation specifics, the PCRs mismatch currently has to be simulated in that the values
are not generated from actual measurements of the persistent compartment image but only
from a different configuration file for domT. A future release will fix this issue.

Furthermore, the current prototype is also limited in that man-in-the-middle or "relay of
attestation challenge" attacks have not yet been considered. For more information, please see
[2].

Outlook
---------
Although the current implementation of the PET scenario is relatively straightforward and
cuts quite a number of corners, it is a concrete example of the possibilities of coupling TC
with virtualisation technology. As the source code of the prototype is publicly available, the
OpenTC prototype provides a working basis for any developer interested in implementing TC
applications.

About the authors: Stephane Lo Presti is a research assistant working at Royal Holloway,
University of London (UK), on mobile trusted computing and dissemination activities of the
OpenTC project. Gianluca Ramunno is a research assistant working at the technical university
Politecnico di Torino (IT) on security applications and PKI for trusted computing. In the
OpenTC project, he coordinated the development of the proof-of-concept prototype related to
the PET scenario. Dirk Kuhlmann is a senior research engineer for Hewlett Packard
Laboratories in Bristol, UK, where he works as a member of the Trusted Systems Laboratory.
He acts as the overall technical leader for the OpenTC project.

Download:
The prototype can be downloaded from the OpenTC website, URL:

http://www.opentc.net/index.php?option=com_content&task=view&id=45&Itemid=63

The source code for the components of the OpenTC system is referenced in the following
webpage:

http://www.opentc.net/index.php?option=com_content&task=view&id=47&Itemid=65

References:
- [1] "The Roots of Trust - The RTM and the TPM ", Eimear Gallery. Second Lecture of the
Trusted Computing course, MSc of Information Security, Royal Holloway, University of
London, 2007.

- [2] Kenneth Goldman, Ronald Perez and Reiner Sailer: Linking remote attestation to secure
tunnel endpoints. In: Proceedings of the first ACM workshop on Scalable Trusted Computing
(STC '06), pp. 21-24. Alexandria, Virginia, USA, 2006. ACM Press, ISBN: 1-59593-548-7,

http://doi.acm.org/10.1145/1179474.1179481

Contacts: Stephane.Lo-Presti at rhul.ac.uk; Ramunno at polito.it

Static analysis using Abstract Interpretation (part 2)

By: Pascal Cuoq, Commissariat à l'Energie Atomique (CEA-LIST), Paris, France

This is the second part of the article, the first part of which was published in the September
2007 issue of the OpenTC newsletter, see [1].

The sample program analysis used in the first part of the article showed how static analysis
can work at first sight. This second and final part of the article aims to show that it is not
always as easy as it seems, using various program examples.

The property of termination of a program

The programming example used in the first part of the article was simple, but not simplistic,
because it contained a loop statement. Many research developments in theoretical computer
science either have to do with the fact that loop statements are necessary for programming, or
try to deal with the difficult questions they automatically raise. One such question is whether
a given program is always guaranteed to end and return a result. This property is considered
particularly interesting and is called "termination" of the program. However, it was not
difficult to check the properties of the loop statement (a 'while' statement) used in the previous
programming example. It is easy to see that the condition in the 'while' construct (x >= 2)
would eventually become false and cause the loop to stop, meaning that the program would
not run forever. Let us now consider the following C example instead:

1  for (n = 4 ; ; n += 2)
2  {
3    for (p1 = 2 ; p1 < n ; p1 ++)
4    {
5      p2 = n - p1;
6      if (is_prime(p1) && is_prime(p2))
7        break;
8    }
9    if (p1 == n)
10     break; // sum of two primes not found for n.
11 }

The function "is_prime" always terminates and returns "true" when its argument is a prime
number. The implementation of this function requires some ingenuity in order to make it
efficient for big numbers, but a simple implementation that repeatedly divides its argument
"m" by every number smaller than m and returns "true" only if none was a divisor, would be
enough for our purposes. Programming this function in C is left as an exercise to the reader.

The above program enumerates for each value of the variable "n" the even integers greater
than 2 (4, 6, 8, 10, etc.). For each of these successive values of n, the program first (lines 3 to
8) enumerates the various ways of splitting n into the sum of two integers p1 and p2. If this
split is such that both p1 and p2 are prime numbers, the program jumps to the next even value
for n (line 7). If no pair of numbers p1 and p2 satisfying this property, i.e., n == p1 + p2 and
p1 and p2 both prime, is found, the program stops (line 10).

This program is very simple, but in fact conceals a very difficult theoretical problem: deciding
whether the program terminates or not is equivalent to proving or disproving the mathematical
"Goldbach's conjecture", which at the time of this writing remains an open problem in number
theory and has been so for more than 250 years. In practice, this means that it is in the worst
case impossible to know whether a program will stop under any circumstances. Although the
previous program does not use any complicated programming features, the question of
whether it terminates or not is theoretically even harder to solve than Fermat's last theorem.

Loop invariants

Even when the termination of a loop statement is easier to establish than in the previous
example, the correct behaviour of the loop may rely on complicated hidden invariants. "Loop
invariant" is the name given to a property that holds throughout the execution of a loop. This
situation should be familiar to any developer who has ever had to modify code that he did not
write himself: the program works for reasons that are not obvious in the code, for example
because the value of a certain variable "p" can never be negative. The invariants are not
always explicitly written in the program (experienced developers write them in comments or
in assertions, but even with experience, it is not easy to think of all the invariants that should
be written down). Taking the example of the variable p that is always positive, the invariant
holds because one can check on each program statement that a negative value is never
assigned to p. The difficulty lies in the fact that it is not easy to reconstruct the invariant by
simply looking at the source code. Similarly to the situation of the termination of loop
statements, it can in fact be arbitrarily difficult.

Analysis of actual source code does not always require proving or disproving Goldbach's
conjecture, however. The above example is only intended to show the kind of difficulties that
make the problem impossible to solve in all generality. In real life and for ordinary programs,
invariants are often simple enough to be discovered automatically.

Example of a program that is difficult to analyse

Besides loop statements, real programs also contain many statements that are difficult to
handle because they do not fit well into the theoretical frameworks used for program analysis.
The particular set of statements that causes problems depends on the analysis framework, but
it typically contains some or all of the following: casts, union types, architecture-dependent
statements, pointers and aliasing, integer overflows, floating-point arithmetic. The following
example illustrates several of the above issues:

1  void shift_string (char *s, unsigned int src_offs, unsigned int size)
2  {
3    unsigned int dst_offs = src_offs - 4;
4    while (size > 0)
5    {
6      * (int *) (s + dst_offs) = * (int *) (s + src_offs);
7      size -= 4;
8      dst_offs += 4;
9      src_offs += 4;
10   }
11 }

This function makes particular assumptions about the computing architecture and the
compilation model in order to shift a specified part of a string "s" (starting at "src_offs") by
four characters. The function also assumes that the variable "size" is a multiple of 4, and that
the parameters passed to it do not make it access memory out of bounds or in a misaligned
way. For the purpose of the program analysis, we assume that these assumptions are well
understood by the developer, and that the bug we are looking for (if any) does not lie in these
assumptions. Proving that the above function satisfies its informal specification (shifting part
of the string "s" by four characters) is difficult. Besides all the low-level statements that the
analyser needs to understand (e.g., the conversion from "char*" to "int*"), there are several
hidden invariants that ensure that the function works as intended.

To perform the analysis of this program, let us focus first on one of these invariants: at the
beginning of the body of the loop (line 6), the current value of variable "dst_offs" is always
equal to "src_offs-4".

It is necessary to establish this invariant in order to prove that the function "shift_string"
operates as it is supposed to. For a 32-bit architecture, the property implies, in particular, that
there is no overlap between the four bytes read at the address "s+src_offs" and the four bytes
written at "s+dst_offs" (line 6).

The equality "dst_offs = src_offs - 4" can be automatically discovered by relational abstract
interpretation. The idea behind relational abstract interpretation is to compute information
about the relationships that are guaranteed to hold between the values of several variables, in
addition to sets of possible values for individual variables. Sets of possible variable values are
propagated from instruction to instruction in the same fashion that intervals were propagated
in the first part of the article. But "values" here means possible values for the tuple of
variables (size, dst_offs, src_offs). In the first part of the article, we restricted the "sets" of
values to intervals. Here, we also need to make a choice regarding the representation of these
sets of possible tuples, so that they can be computed in a reasonable amount of time and
memory. There is a wide selection of options when choosing a representation for sets of
tuples, but we are only going to consider linear relations between two variables (e.g.,
y=a*x+b), because this representation is space-efficient and simplifies the computations.
Without precise knowledge of the values of the arguments "src_offs" and "size", we know
that values that do not satisfy the above equality are impossible for the pair (src_offs,
dst_offs) after executing line 3. In fact, the values that are possible for the pair can be
represented as a linear relation between these two variables: src_offs = dst_offs + 4.

Given these values for the pair (src_offs, dst_offs), the condition of the 'while' -loop can both
be true or false, so we have to assume that this state may propagate to both the inside of the
loop at line 6 and the exit of the loop (line 11), which is also the exit of the function.

We will assume that the instruction on line 6 does not affect the values of variables src_offs
and dst_offs (this would theoretically require proof that the memory access "*(int *)(s +
dst_offs)" is not out-of-bounds). Line 7 does not change the relation between src_offs and
dst_offs either.

The instruction on line 8 has an effect on the value of dst_offs and, therefore, on the relation
between src_offs and dst_offs. Fortunately, the effect of this instruction on the relation
between src_offs and dst_offs can be computed. Better yet, the resulting state can be
represented as another linear relation, "src_offs = dst_offs".

The instruction on line 9 has an effect on the relation between src_offs and dst_offs, too.
Since this assignment increments the value of src_offs by 4, its effect on a state where
"src_offs = dst_offs" is to produce a state where "src_offs = dst_offs + 4" (since the new
src_offs is the same as the old src_offs plus 4).

The state of the variables at the end of line 9 cannot be used to decide whether the condition
of the 'while' loop (size > 0) is true, so this state must be propagated again to both the exit of
the function and the beginning of the body of the loop. This state, which is identical to the
state that occurred after executing line 3, has in fact already been propagated to the exit of the
function and the beginning of the body of the loop. This means that this state is a fixpoint of
the program (i.e., a state that leads to itself when the body of the loop is applied to it), and that
the program analysis is finished. The analysis has determined that the property held, because
the fixpoint found represents the invariant we wanted to prove.

The choice of the kind of sets that are used determines the types of programs that can be
analysed accurately. In the above example, linear relations are a good choice because:

1. Some variables in the program are in a linear relation to each other, and
2. The instructions executed by the program (particularly, at lines 8 and 9) transform a linear
relation into another linear relation, which can also be represented.

What else is necessary to analyse this program?

The other important requirement in order to be able to analyse this program is a memory
model. Actually running the program would have allocated the string "s" to somewhere in the
memory, and it would be possible to analyse the function "shift_string" with respect to this
particular memory allocation (in this case, a particular choice of address where "s" is stored).
But this is not satisfactory because the goal of static analysis is to check that there are no
errors for all possible executions, which means not only for all inputs, but also for all
addresses at which the system decides to allocate string "s". If two strings are allocated in the
program, one goal of the verification process would be to check that one string is not modified
via an out-of-bounds access to the other. This error may manifest itself for some allocations
while it may not be visible for others, making it hard to find or debug solely by testing the
program.

For this reason, the static analysis of a program proceeds with respect to an abstract memory
model. An abstract memory model allows manipulation of a synthetic representation of all
possible choices for the layout of a program's data in memory. As with the choices that must
be made with respect to the representation of relationships between variables, there are
compromises in the design of an abstract memory model. It goes beyond the scope of this
article to delve into all the various possibilities. The purpose of an abstract memory model, on
the other hand, is simple: the abstract memory model should make it possible to analyse a
program without having to choose a particular memory allocation. Instead, when the memory
model is appropriately devised, a program property proved in the abstract memory model
holds for all the possible allocations that can occur at run-time. The adjective "correct", which
was applied to analyses in the first part of the article, is applied to abstract memory models
which have this property. Of course, one needs a correct memory model in order to build a
correct analyser.

In the previous example, it is the task of the memory model to ensure that the instruction on
line 6 does not have any effect on the relation between src_offs and dst_offs. Because the C

programming language is a low-level language, an out-of-bounds access does not necessarily
translate into a run-time error. Depending how the program's data is laid out in memory, the
memory access could also silently change the value of another arbitrary variable. A
reasonable memory model should exclude this possibility, and instead emit a warning
concerning the out-of-bounds access, if one might exist.

To analyse this example precisely, the memory model also needs to describe the fact that an
"int" variable takes up the same space as four "char" variables (on most current computers,
integers are stored on 32 bits while characters require 8 bits), and what exactly happens when
a "pointer to char" is cast into a "pointer to int", and subsequently accessed.

The Frama-C toolbox

The Laboratory for Software Safety at CEA LIST (Paris, France) is developing Frama-C, a
toolbox for the static analysis of C programs. This toolbox is designed to enable different
analysis techniques and features to be plugged in, and used with each other. One of the
distinguishing features of the Frama-C toolbox is the specification of the memory model,
which is especially adapted to the analysis of embedded (low-level) C code. Frama-C has
been used to analyse parts of the Xen hypervisor in the context of the OpenTC project.

Editor's note: An article about the analysis of the XEN hypervisor will be published in a
future issue of the OpenTC newsletter.

About the author: Pascal Cuoq has been playing with computers since he was 6. He is
currently doing so at CEA-LIST, within the Software Safety Laboratory, where he is one of
the architects of Frama-C, an assistance tool to improve confidence in critical C code.

Contact: pascal.cuoq at cea.fr

Reference:
[1] Static analysis using Abstract Interpretation. In: OpenTC newsletter of September 2007:

http://www.opentc.net/publications/OpenTC_Newsletter_02.html#link3

Recent OpenTC publications

Since the publication of the last newsletter, the OpenTC project partners have produced one
scientific publication:

- Murray, D.; Milos, G.; Hand, S.: Improving Xen security through disaggregation. In
Proceedings of the 4th international conference on Virtual Execution Environments (VEE
2008), March 2008 (to appear).

Edited by the Institute for Technology Assessment and Systems Analysis,
Forschungszentrum Karlsruhe, Germany, on behalf of the OpenTC research project
consortium, in co-operation with all partners.
Editor: Arnd Weber, Forschungszentrum Karlsruhe GmbH, ITAS, Hermann-von-Helmholtz-
Platz 1, D-76344 Eggenstein-Leopoldshafen, Telephone: + 49 7247 82 3737.
Contact: editor at opentc.net

Disclaimer: The views and opinions expressed in the articles do not necessarily reflect those
of the European Commission and the consortium or partners thereof. All articles are regarded
as personal statements of the authors and do not necessarily reflect those of the organisation
they work for.

The OpenTC-project is a research project supported by the European Commission, project
IST-027635. Its 23 partners are: Technikon Forschungs- und Planungsgesellschaft mbH
(project coordination, AT); Hewlett-Packard Ltd (technical leader, UK); AMD Saxony LLC
& Co. KG (DE); Budapest University of Technology and Economics (HU); Commissariat à
l’Energie Atomique – LIST (FR); COMNEON GmbH (DE); Forschungszentrum Karlsruhe
GmbH – ITAS (DE); Horst Goertz Institute for IT Security, Ruhr-Universitaet Bochum (DE);
IBM Research GmbH (CH); Infineon Technologies AG (DE); INTEK Closed Joint Stock
Company (RU); ISECOM (ES); Katholieke Universiteit Leuven (BE); Politecnico di Torino
(IT); Portakal Teknoloji (TR); Royal Holloway, University of London (UK); SUSE Linux
Products GmbH (DE); Technische Universitaet Dresden (DE); Technische Universitaet Graz
(AT); Technische Universitaet Muenchen (DE); Technical University of Sofia (BR);
TUBITAK – UEKAE (TR); and University of Cambridge (UK).

For more information about the project, see:

http://www.opentc.net

Feedback to the consortium:

http://www.opentc.net/feedback

Archive of newsletters:

http://www.opentc.net/newsletter

Subscription: To subscribe or unsubscribe to the newsletter, write an email to <subscribe at
opentc.net> or <unsubscribe at opentc.net>.