Notes from York September 2004 Meeting
Tuesday 7th September
Present:
Jim Woodcock, Steve King, Fiona Pollack, Susan Stepney (York)
John Derrick (Sheffield)
Richard Banach, Czeslaw Jeske (Manchester)
Adrian Hilton (Praxis)
Mike Poppleton (Southampton)
David Crocker (Escher)
Norbert Völker (Essex)
Ana Cavalcanti, Marcel Oliveira (Kent)
Phil Clayton (QinetiQ)
- 15:00: Welcome, introductions
- 15:15: "Mutating the Purse" (Susan Stepney)
- Basic security properties (SPs): "No value created",
"All value accounted", "This transfer permitted". The
SPs are functional properties which are preserved by refinement.
Modelled in Z. The abstract model was a world of purses
with atomic value transfer between purses, so it is easy to show that the SPs
hold in this model. There is also a "lost" component which models where
money goes that hasn't been transferred properly.
The concrete model is very different - atomic transfer is hard.
There is a value transfer protocol with a logging protocol to
account for failures - the protocol can be interrupted at any
point, so the proof must show that nevertheless the model
satisfies the SPs. There is also an "ether" of protocol messages
in flight.
Main challenge: the requirement for backward refinement rules,
because there is non-determinism in both the abstract and concrete models, and
since it is resolved later in the concrete model than in the abstract, it breaks
the normal Z refinement rules. Also, infelicities in the model mean
that retrenchment would be applicable: the protocol and ether
were already fixed by the customer and couldn't be tweaked to
make the proof work. You could retrench the finite sequence and
exception logs, the message hash function (not quite injective) and
a balance enquiry special state. Back then, they got around these
problems with a constructive approach to reality.
A formalist might say "don't need retrenchment, I'll change
the spec". But the customer says "no, we've already implemented the
spec" so you can't change it. Or "We must do it to a particular standard",
"there's no time", "we need to use the spec for a wide range of platforms and
this makes it non-general", or "the spec will be too baroque for others".
Common feature: "oh no you don't"!
Definition of a real world problem: you can't change it when
it becomes too hard.
Now: the purse in CSP.
"Challenging formal specifications by mutation: a CSP security
example" - paper (see Susan for details).
Here we have a security property of a secure system and an
abstract model which we want to show satisfies the spec. It's
a secure system with a repeated value transfer, ended with a finalisation
step. At finalisation you spit out your value v minus a random r and
r itself. Secure if, no matter what value is lost, it is accounted for.
The abstract purse spec manages transfers in various ways, then
at finalisation outputs its values. 3 types of purse: From, To and Other.
A From Purse can correctly transfer money, lose the value or ignore the
transaction. To Purse works similarly. Other Purse does nothing.
We wrap the whole thing in an ObservedSystem wrapper so that we can
talk about the security property; we are also (later on) interested in
the internal behaviour of the purse System.
ObservedSystem is the system of purses plus a process to gather
the finalisation totals, with internal channels hidden. ObservedSystem
has security property SecureSystem i.e. does not create value, accounts
for all lost value. This was checked with FDR2.
Now what? Extend use of program mutation into spec techniques.
Program mutation is used to test how good a test suite is. It measures
how much coverage the test suite gives you. Do a mutation (small syntactic
change) to the program, and see if the mutated program passes the test.
If caught, great. If passes, a modelled fault isn't covered by the
test suite - if original and mutation programs have same behaviour, no
worries (equivalent mutants) but if different behaviour the harness has
failed. Well known technique. So what about specs?
We have property P ("is secure"), spec S (secure system). Want to show
that S sat P. Mutate S to S' with small syntactic change. Check if S'
satisfies P. If not, modelled fault is a single-point vulnerability in
the spec. If S' sat P then these are equivalent mutants again. If
internal behaviours are different, both systems are secure. So this
reflects a design decision (you could write the spec different ways).
If internal behaviours are same, why so? Has an abstraction been missed?
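As a rough illustration only, the classification loop might look like the following Python sketch; mutate, satisfies and same_internal_behaviour are hypothetical stand-ins for the CSP mutation tool and the FDR2 checks, not real APIs.

    def classify_mutants(spec, mutate, satisfies, same_internal_behaviour):
        # Classify single-point mutants of a spec against a security property P.
        results = {"vulnerability": [], "equivalent_diff": [], "equivalent_same": []}
        for mutant in mutate(spec):                      # small syntactic change: S -> S'
            if not satisfies(mutant):                    # S' does not satisfy P
                results["vulnerability"].append(mutant)  # single-point vulnerability in the spec
            elif same_internal_behaviour(spec, mutant):
                results["equivalent_same"].append(mutant)   # same behaviour: missed abstraction?
            else:
                results["equivalent_diff"].append(mutant)   # both secure: a design decision
        return results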
Applied CSP mutation tool with mutation operators and checks they
are syntactically correct (and usually type correct). Then use FDR2.
Running 579 mutant specs gave: 241 compilation errors, 177 that were not trace
refinements, 156 that were trace refinements but had failures-divergences
violations, and 23 FD refinements (equivalent mutants), of which 20 had
different internal behaviour and 3 the same. Note that these mutations were
single-point changes; there is clearly scope for broader, multi-point changes.
Equivalent mutants that are also refinements: in most of the cases
it was just restricted behaviour resulting from strengthened guards e.g.
sometimes/always ignore the transfer request. One interesting case
was when a mutant transfers to other purses rather than the To purse -
shouldn't be secure, should it? But FDR claimed a refinement. That
was because checks were in a system of only 3 purses! When you introduced
a fourth purse ("Other2") into the system, no longer a refinement. So
that validated the model-checking choices, and suggested that testing should
be done in a system with more than 3 purses.
It was quite a powerful validation tool. Made specifiers ask
different kinds of questions, challenging design decisions and identifying
potential security vulnerabilities in the spec. A robust design is one where
small implementation errors don't break security.
- 16:00: "Recasting Mondex" (Jim Woodcock)
- Context of the Grand Challenge in Dependable Systems Evolution
which requires elucidating and developing basic scientific principles
that justify confidence in the correct behaviour of software, even
in the face of extreme threats. Part of the work is to try out a
strong software engineering toolset on representative challenge
problems - like Mondex.
The problem needs to be comprehensible to the man on the Clapham
omnibus, with a simplified version publicly available, a formal
spec, explanations, theorems and rigorous proof; it pushes the state of
the art, includes two good competing tools (ProofPowerZ and
Z/Eves), and has an active, interested community.
Observations - tension between exploring the theory (academic)
and getting the job done (industrial). Very successful project,
on time and budget, first to ITSEC E6.
Recent investigations suggest that rigorous proofs don't formalise
easily - they get to the heart of the problem, but it's the edges which
are the hard bits! Get to the start of the complete proof, then relate
the completed proof to the rest of the spec. One small team spent a
week with ProofPowerZ trying to formalise the Mondex Z spec. Very slow
progress! Steve King did an independent proof of the verbatim spec given
in the monograph. Respect. The reduced functionality in the monograph
(reduced by the sanitisation) suggests that Hoare's Law: "inside every
large program is a small one trying to get out" applies.
Why is formalisation difficult? Some language constructs are
different: finite sets and functions (repeated proof of finiteness),
definite descriptions with exactly one value, schemas. Specification
structure isn't always helpful for this. The promotion structure isn't
reflected in the proof structure. Promotion explains to the reader, but
not to the theorem prover. Linking the informal theorem to the spec text
is a bit tenuous.
Conjecture: Mondex can be restructured so that the proof can be
mechanised with a high degree of automation. How far can we get with
two days of work? Formalise spec in Z/Eves, compare with ProofPowerZ and
formalise refinement.
Recasting Mondex: the properties are very simple e.g.
all value accounted for is just credit+debit = credit'+debit'.
Summing over a series is pretty simple, so restructure around a
pair of sequences. Two-step update from-to, making the update as
simple as possible, develop a theory.
(Example definition in Z) "update" maps a sequence of integer triples
onto a sequence of integers. NB: Z/Eves requires you to prove that the source set
lies in the function's domain for partial functions, which is annoying, so work on
total functions for now. "sum" maps a sequence of integers to an integer.
Both are defined inductively over the empty, singleton and n-ary cases.
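For instance, the inductive definition of "sum" might look like this (a sketch only; "^" here is sequence concatenation):

    sum <>      = 0
    sum <x>     = x
    sum (s ^ t) = sum s + sum t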
Summing the updated sequences - prove the obvious theorem on
the sums of any updated sequence being the same as the sum of the original
sequence minus the old value plus the new value. In Z/Eves, set up a set
of all sequences with the required property and then show that the set of
integers is a subset of that set. Empty and singleton are easy.
The inductive step is quite hard; the theorem prover isn't good at
arithmetic and is really bad at generic types. Some lemmas were needed,
for example to prove that a bijection is finite and that set size always
returns a natural number. The final step is a substantial proof by cases.
Disappointing - over 50 proof steps to prove an "obvious" theorem.
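One plausible reading of that "obvious" theorem, assuming update(s, i, y) replaces the i-th element of s with y (the talk's exact formulation may differ):

    sum (update(s, i, y)) = sum s - s(i) + y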
Why bother? Better to redefine "update" by using the "sum" property,
and use this as the starting point. It's a perfectly reasonable
implicit definition of "update" (debatable...)
There is a bit of tension between the uses of a spec and its usability for proof,
since such an implicit definition may hide efficiency and attainability
issues.
Does this help? A crucial theorem is "doubleUpdate" - update
one by subtraction and one by addition, and show the sums are the
same. Nails about 15 proof steps.
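A plausible statement of doubleUpdate under the same reading, for distinct indices i and j:

    sum (update(update(s, i, s(i) - v), j, s(j) + v)) = sum s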
Smart cards are indexed by name not number so introduce a
map - does this hinder things? No. "No value creation" is proved
fairly easily. The transfer function domain check is OK. Now
try to prove no value creation. Nearly fully automatic, problem
lies in summing - need to show that the sum of every sequence is
natural. We didn't originally restrict the domain of "sum" because
of all the domain checks that would result. Now we need to do the
check we postponed. A bit tricky, but comes out OK. Everything else
comes out OK, although not fully automatic.
Final proof was 10 defs, 15 theorems, 20 proofs (some paras have
domain checks and therefore need extra proofs). About half were proven
via rewriting (no ingenuity). 15 were proven via apply/use which
uses a specific proof rule from library or hand-crafted lemma (needs
ingenuity), 10 using other special tricks (mostly ingenuity). Most
ingenuity was applied in the domain checks.
Two days' effort was therefore enough for a radical recast
of Mondex, making it much simpler: a full refinement proof may be possible.
Generic types in Z/Eves were an issue, but Leonardo Freitas's library
of results would have helped. The job got done by exploring the theory.
You're forced to tackle spec complexity when you use machine techniques.
How much should the theorem prover cart pull the horse of specification
style? Should we avoid certain techniques in specifications to ease automatic
proof? Not necessarily, but need to evaluate how useful specific techniques
are.
Issue of library of proofs - how to distribute, how to know the
result you want has already been done.
- 17:10: "Another Tale of Two Proofs" (Steve King)
- Original Mondex work: rigorous proofs done by hand; experience
is that current (1995) proof tools aren't yet appropriate for a project
of that size.
Long-term goal to mechanize in ProofPowerZ the proofs in the
published Mondex spec and design with as few changes as possible.
Short-term goal over 2 months study at QinetiQ Malvern: learn about
ProofPowerZ and start on long-term goal.
Motivation by Grand Challenge 6: Dependable Systems
Evolution, and personal interest in automated theorem proving.
Results? Now a reasonable understanding of basic use of
ProofPower for proving Z conjectures but much more will be needed.
Proved that the 3 abstract operations maintain certain security
properties. The original 2.5 pages of proofs in the specs became
15.5 pages in the automatic theorem prover output, including lemmas.
Have started the refinement proofs: A ⊑ B (100 pages) and
B ⊑ C (30 pages).
Now we're starting to see whether the proof structure in
the monograph is reflected in how it's done in ProofPower.
Some significant changes were made to the published text:
missing domain checks were added and proven, some schema
quantifications in function definitions were changed for ease of proof,
and inconsistencies between operations (caused by the
sanitisation process for the original publication) were fixed.
Lessons learned: ProofPower-Z was easier than expected to
use, but documentation on basic use could be improved. The
sanitisation process isn't easy, with an empty schema (caused
by hiding all components) and a component where two similarly
named components were merged. For real proof, the size and resolution of
the screen display is important - don't use a regular laptop! Maybe multiple
screens if you have multiple windows? You need to be able to refer to
the on-line manual. Mechanical theorem proving is actually fun.
Future plans: continue work on refinement proofs. Can we
maintain, or improve, the hand proof structure? Compare with
Jim Woodcock's work with Z/Eves. Long-term, automate the proof
not just mechanise it.
Publications: see Susan Stepney's page on the topic.
Does pushing the buttons on the proof tool give you the same
level of understanding of why the proof works as would proving
it by hand? The problem is that the bits it can't do aren't the
interesting bits. But automation is a real win when you have
multiple reworkings. Perhaps it's at points of proof failure
that one learns about the proof.
- 19:30: Dinner
- La Piazza in York - a good evening out. We can recommend
the risotto and tiramisú.
Wednesday 8th September
- 09:30: Where Refinement Fails (John Derrick)
- Example 1: an infinite buffer implemented by a bounded buffer.
The insert operation is very simple in the infinite case (append to the sequence)
but more complex in the bounded case (it must handle a full buffer).
Example 2: an adder that adds any number, implemented by an adder
of numbers from a bounded type (which adds modulo the type size).
Other examples: square root with a real-number specification but a
floating-point implementation.
Noted that there is a general theme of an idealised behaviour
but reality isn't as ideal. Reminiscent of fault tolerance - think
of ideal behaviour, and at a later stage of development have an idea
of what faults are possible and what happens after a fault. It's not
a refinement, but rather a development. It really is the way that
we design systems in industry.
- 09:45: Retrenchment
Tutorial (Richard Banach)
- Motivation: retrenchment came out of model-oriented refinement.
Finiteness issues get in the way of your development but you don't
want to worry about them up front.
Example: adding an element to a set. Concrete world refines
the set to a sequence, and an implementation where it is turned
into a (finite) array. Oops, at that point typical refinement
proof obligations can't be met. So you typically have to go back
up the refinement path making everything finite. Tedious.
Large systems development is a good example of contrast
between textbook and real world. The nice steps of refinement
go awry between the true abstraction and concrete abstraction. The
concrete abstraction becomes impenetrable. Scaling up makes
it worse and worse. Typically exponential difficulty as everything
tends to interact and so the various finitisms interfere.
The "ferocity" of the refinement proof obligations (POs) is
too high and restricts their application. So attempt to judiciously
weaken them. But refinement is built up on the assumption that you
can substitute concrete for abstract. Fiddle with this and you can
break the whole thing.
We are prepared to forego these aspects of
refinement in order to fix the problem. The refinement "square"
mapping abstract to concrete and input state to output state gets
tweaked. You add an "escape clause" which says that if you can't
establish the retrieve relation between abs and conc outputs then
there's another way around it - your retrenchment, a concrete
operation. It can be anything, but you need to be careful what
it is in practice.
The retrenchment definition is complex; it includes: abstract
and concrete operation sets, a retrieve (gluing) relation that
tells you how the state spaces correspond, the
within (provided) relation, the output relation, and the new "concedes"
relation which says what happens when things go wrong. Then you
have the usual initialisation proof obligation, then for every
operation a fairly complex proof obligation. This latter is the
starting point for investigating retrenchment.
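Schematically, the operation PO has roughly the following shape (a sketch in the style of Banach and Poppleton's papers, not a quotation; G is the retrieve relation, W the within relation, O the output relation and C the concedes relation):

    G(u,v) and W(i,j,u,v) and OpC(v,j,v',p)
      =>  exists u', o .
            OpA(u,i,u',o) and ( (G(u',v') and O(o,p,...)) or C(u',v',o,p,...) )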
Main differences from refinement: doesn't guarantee concrete
is faithful to abstract, but rather documents the model change in
a structured way. Many more properties of the model are addressable.
Proof obligation is much more liberal because you have a place to
put the inconvenient facts about what you're doing. It's a weaker theory but
is a generalisation of refinement so has wider applicability.
Refinement tends to expel design and risk from the process of
transforming to concrete. The abstract and concrete models are tied
together strongly and can become monolithic - you usually have to
develop the two together. Retrenchment can embrace design and risk,
separating concerns as required. Rather than "black box" it's
a "glass box". You can develop abstract and concrete model independently
and worry later about how to tie them together. If you discharge a
refinement PO then you've made some progress. If you discharge a
retrenchment PO you don't necessarily know if it's going to be useful
or necessary later on.
In the real world, you retrench from true abstract to concrete
abstract, then refine properly to a concrete implementation. The
"autopsy" gives a rationalisation (to domain experts) of the structure
of the model.
Now look at the key issues and opportunities of retrenchment.
You need to define the algebraic theory of retrenchment and refinement -
Czeslaw's PhD covers this area. Composition: if you have two retrenchment
steps, one after the other (abstract to concrete to implementation) then
can we compose to a single retrenchment? Yes. Retrieve relation is just
the relational composition. The within relation is more complex, strengthened
by the retrieve relation. The output relation is a straightforward
composition. The concedes relation is nastily complex and clearly shows
the exponentiality. In refinement you'd quickly be overwhelmed by the
"grubbiness", but retrenchment makes you able to define a simple abstraction.
This particular case is "plain vanilla" vertical composition. There are many
stronger forms.
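As a concrete illustration of the plain vanilla case, the composed retrieve relation is just the relational composition of the two (written schematically):

    G_AC(u,w)  <=>  exists v . G_AB(u,v) and G_BC(v,w)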
Horizontal composition is harder. What about when the final configuration
of one step doesn't satisfy the retrieve and within relation of the next one?
It turns out that the best way to study this is with the simulation
relation - replace the implication in the proof obligation with a
conjunction, and get the set of abs-conc in-out "squares" that fit
your problem. This is still under study, but there's a forthcoming
publication in SAFECOMP showing a nice fault tree extraction from this.
"Retrenchment is like refinement except round the edges"
There is a more worked-out horizontal composition that Mike
Poppleton studied for his PhD. There is an abs/conc retrenchment given
by sequential composition of operations, within relation with an initial
PO and retrieve relation and a wp relating to the operations and second
step. The output relations are sequentially composed, and the concessions
as before. That works, is associative and relatively clean. It can
capture "bulk sequential composition" of operations but is rather
stronger than the simulation relation.
Dataflow composition is a simplification of sequential composition;
ignore the state, just plug outputs into inputs and it is much better
behaved than unrestricted horizontal composition. The within relation
is just the original within relation, the output relations compose
sequentially, and the concede relations are the expected combinations.
Synchronous parallel composition is done by pairs of variables with
a similar structure as before.
"Fusion composition" is where you have several different retrenchments
between abstract and concrete. This can be done with conjunctive composition
(not too bad) or disjunctive composition (simpler).
In all these kinds of composition, the key issues are: does the
notion compose to give a retrenchment, does this retrenchment satisfy
the special conditions that you have in mind, and is the notion associative?
There are all sorts of strengthenings of compositions where the second
and third questions aren't easy to answer.
Decomposition mechanisms have been examined by Mike Poppleton
recently. Given a retrenchment, suppose we can decompose the
domain of the operation into some disjunction; then you can
retrench the individual operations from the disjunction list.
You could recover the original retrenchment by doing fusion composition.
Algebraic theory of retrenchment and refinement:
Try to factorise an abs-conc retrenchment into a retrenchment to something
universal and then a refinement to concrete (or vice versa). Then
you look at the family of possible factorisings. This was Czeslaw's
M.Sc. The details are horrendously complex, but have been nailed down
and do work on real examples.
You can also look at "completing the square": suppose you have
a refinement from abs to conc, but then you retrench the abs, can you
refine that retrenchment to something concrete which is itself a
retrenchment of the original concrete solution? This gives you the
refinement of new requirements automatically.
It's surprising how complicated it all turned out to be.
You wouldn't have expected this, but that's how it is. It's
been tested on examples and seems to give the right answer, which
is encouraging.
What you ideally want in such developments is the
stepwise simulation of fragments, but you don't always get it.
Retrenchment encourages this form but can't guarantee it as
the final configuration of one step need not satisfy the guard
of the next one. You could add strengthening assumptions, but
that misses the point. Retrenchment is mainly for cases where
this induction fails, so we deal with simulation directly.
Representing these sequences you get "slanted box" patterns
that represent development steps where you sometimes
dodge around various situations in abstract and concrete worlds.
You get some nice simulator transformers for properties, acting
like the box and diamond operators in a modal algebra. But the
mapping of things you can't simulate is completely unconstrained
and you have to address this somehow.
Probabilistic retrenchment - you define probabilities of
taking various transitions giving you a certain output, which
sum to 1 (of course). You can work out 1-step and n-step
probabilities of reaching a certain step. The underlying principles
are well known from probabilistic transition systems.
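Schematically, with p(s,t) the one-step transition probability, the n-step probability is built up in the usual way (a sketch of the standard calculation):

    sum over t of p(s,t) = 1
    p_n(s,s') = sum over t of p(s,t) * p_{n-1}(t,s')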
How does this formal stuff fit in with engineering practice?
How should individual bits of retrenchment affect practice, and
how should industrial practice impact retrenchment details?
For the former, you can generate checklists to see what you should
be covering in industrial practice corresponding to the various
within, retrieves, concedes relations. For the latter, consider how
retrenchment remit can be widened to take on board non-functional
requirements.
"Retrenchment aims to make quantitive the system engineering
process as it is at the moment."
- 11:30: Retrenchment
and Mondex (Mike Poppleton, Richard Banach)
- Mike defined the retrenchment rules for schemas, in a similar
way to forward simulation.
Look at the salient aspects of the Mondex model and four
examples of opportunities for retrenchment.
Mondex model: plug two purses into the wallet, type in instructions
and the money flows as required. Purses are on their own, assume a
hostile environment (e.g. spoofed protocol) and the total balance must
always be in favour of the bank despite the protocol being halted at any
point.
The original Mondex refinement, presented (sanitised) in the PRG-126
monograph, glosses over a number of things. Purse numbers are finite, not
infinite; the balance enquiry operation interacts badly with resolution
of nondeterminism during money transfer; the purse log is finite not
infinite; the log archiving relies on a noninjective hash function to
validate clearing of logs.
Model architecture: Abstract model (a single requirement)
B-refines to Between (concrete and global invariants), F-refines
to Concrete (concrete with infinite domains). We then
F-retrench to the Discrete world (concrete with finite domains),
and do an F-ret from Between to Effective (lifted finite concrete)
and F-refine from Effective
to Discrete, then Identity map to Filtered model and B-refine/B-retrench
from Filtered to Effective.
Sequence number: each concrete purse has a sequence number. In
the Concrete model it's a natural unbounded number. In reality and
Discrete model it's no more than BIGNUM. Model 0..BIGNUM as a subset
of the Z naturals so that all relations and arithmetic are available whenever
they deliver an in-range answer. Example: the IncreasePurseOkay operation; simple
in the C model, more complex in the D model. The data is effectively the same but the
predicates are more complex in D: the refinement case, plus the exceptional
case (if the next sequence number is outside 0..BIGNUM, send a message
saying that the operation is blocked). You can also rewrite the
single operation schema as three schemas - the two cases plus a
decision schema which is an OR of the other two.
Retrieve relation here is a straight equality of purse data.
Within relation is just true. Output relation is equality of outputs.
The concedes relation is that: the unmanipulated data in C and D is the
same, the C sequence number is no less than the D sequence number,
the C message is bottom and the D message is "blocked".
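Written schematically (the variable names below are illustrative, not the monograph's), this retrenchment data is roughly:

    G (retrieve):  purse_C = purse_D
    W (within):    true
    O (output):    out_C = out_D
    C (concedes):  rest_C = rest_D  and  seqno_C >= seqno_D
                   and  msg_C = bottom  and  msg_D = blocked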
The D world assembles the purses which are indexed by elements
of set DNAME. The normal schemas are defined. Promotion in the
retrenchment context is more difficult than in refinement, because
not every component (purse) can be guaranteed to be working. So
you can get a DNAME-indexed family of retrenchments, each referring to
the retrenchment of one component within the whole. Eventually
you can disjunctive-compose these retrenchments to get a full
retrenchment.
We defined world-level retrieve, within and output relations for
the Concrete-Discrete model development step. Not sure that the
way chosen is the only way, but it seems to work OK. Within
relation is equivalence of names plus an instantiation of retrieve
relation, which is a common trick in retrenchment. Concedes relation
at world level says that the unaffected data remains the same, the
concrete sequence may be greater than the discrete sequence, and
the message output will change accordingly. You can also choose
other retrieve relations which e.g. give up as soon as one purse
concedes, or track which purses are still good.
Lifting can be done when you have a retrench from B to D,
e.g. via model C, and want to lift it to the E model. When
you unravel all the predicates you end up with a schema which
expresses the various possibilities of failures. Filtering
to the F model; you can use backwards refinement to get back to
the A model since the only difference between C and D is the
extra "blocked" message and you can just throw it away.
The purse just blocks if the limit on the sequence number
is reached. Is this reasonable? Assume the increments are identically
distributed random variables from some probability distribution.
Analyse usage frequencies and times, test values for BIGNUM and see
how long we can go. A more sophisticated analysis notes that
the increments are arrivals of a renewal process - go and read
up on these in the text books, get the equations and plug in
the numbers. This gives about 4000 years until run-out at
100 transactions a day, to first order. The variance is of the order of
a week over those 4000 years, which means we're happy there's minimal
chance of people getting "lucky".
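As a back-of-the-envelope check of that figure, the first-order estimate is just the bound divided by the usage rate. The BIGNUM value in this Python sketch is a hypothetical figure chosen for illustration, not the Mondex one:

    BIGNUM = 150_000_000          # hypothetical sequence-number bound (assumption)
    TRANSACTIONS_PER_DAY = 100    # assumed usage rate from the talk

    years_to_run_out = BIGNUM / (TRANSACTIONS_PER_DAY * 365)
    print(f"approx years until run-out: {years_to_run_out:.0f}")   # ~4100 with these figures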
- 12:10: Refinement with Approximations
(John Derrick)
- This is about converging to the idealised behaviour.
Retrenchment measures this via a predicate.
Example: a garbage collector. "Collect all dead cells
immediately" does it too often; "collect periodically" may
collect none, ever; etc. You might try to make a precise version
with some uninstantiated parameters: "collect n dead cells
every 1/m seconds", where n, m go to infinity. Refine,
retaining n and m; compromise by picking specific values.
Why compromise late?
Example 1: the unbounded buffer. Implement it by a bounded
buffer of finite size (say 256). Where it fails to be
a refinement doesn't relate to the inputs but rather to the
state.
Example 2: the adder. Implement it by Add-N where N is bounded,
refine to ModAdd-N and instantiate N to maxint. Here the failure
of refinement depends on input data, not state.
Look at sequences of specs with a limit, define metrics
and limits wrt different criteria. There are four types of
step in the development process: element-wise refine (refine each
sequence member Sn uniformly), introduce sequence (replace S with
a sequence {Sn} such that S is the limit of the sequence),
instantiate parameter and one other.
Example for adder: introduce sequence {Add-n}, elementwise refine
to {ModAdd-n} and instantiate n to maxint, compromising.
Metrics: the idea is to define a distance between the limit and
each sequence element, i.e. an approximation. Define a distance
function d(x,y) satisfying the usual metric rules. The limit
of a sequence is defined as the point of convergence wrt the metric,
converging as per the usual definition with shrinking epsilon.
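For reference, these usual definitions are (a sketch):

    d(x,y) = 0 iff x = y,   d(x,y) = d(y,x),   d(x,z) <= d(x,y) + d(y,z)
    Sn tends to S  iff  for every epsilon > 0 there is an N
                        with d(Sn, S) < epsilon for all n > N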
First metric: program length. Assign a distance to specs which
agree on observations up to a certain length (say 2^-N where N is
your longest match). Two specs are "close" if it takes a long
time to tell them apart. It's certainly hard to calculate! Can
you show that it's symmetric as metrics must be? Think so.
Buffer example: the shortest program that can observe that Buf-n is
different from Buf has size 2n+2: add the first n+1 elements (the last one
is the first to be ignored), then n Remove operations succeed and
the next Remove fails. So d(Buf-n, Buf) = 2^-(2n+3), and Buf-n tends
to Buf as n tends to infinity.
Changing the finite buffer spec we get a different distance
but still get convergence to the unbounded buffer. E.g. if we
insert and remove from the same end then we can see the difference
more quickly.
Need to add an observer to the Add example, to make the
output visible. Sequence Init;Add;Ops is enough to see the
difference - if the input is big enough. This metric stresses
quantification over programs at the expense of quantification
over inputs/outputs.
An alternative approach for input/output differences: define
d in terms of the maximum distance between constituent operations,
with the distance between operations defined as an asymmetric distance
based on applicability (to inputs) and correctness (of outputs):
return the ratio of failures to the size of the input domain.
Defined as an example for the Add: d(Add-N, Add) still tends to 0,
which is fine.
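One plausible reading of that ratio, for a bounded operation Op_N approximating Op (a sketch, not the definition used in the talk):

    d(Op_N, Op) = |{ inputs on which Op_N is inapplicable or incorrect wrt Op }| / |input domain|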
Can we do this with infinite data types? Looks hard
but it would be useful.
How to manage arbitrary i/o types? Avoid the problem!
Recognise the nature of approximation of implementation.
You really will use a bounded data type when you implement
it, so why not look at how things will be implemented and
count them in that domain. Measure the failure ratio over the
size of the (bounded) implementation set. But this depends on
the implementation, and you'd ideally like a metric independent
of implementation.
Open questions on the relation between these metrics and
other CompSci-type metrics, basis in terms of data refinement,
topological characteristics, alternatives. Currently working
to define "uniform" in the definition of element-wise refinement
(apply refinement uniformly to each Sn).
- 13:50: Mondex in Perfect Developer (David Crocker)
- Put together the abstract model and the two security properties
in Perfect Developer. A Purse class with two state variables,
balance and lost, a function for sufficient funds, and schemas
for the operations. Purses were made a sequence rather than a mapping -
Perfect Developer is better at sequences than mappings at
the moment. A final class represents the abstract world, with
functions observing the total balance of all purses and the
total lost from all purses.
Can't yet do non-det choice between schemas in PD so
have combined three original schemas into one. Put the
security properties in as assertions in this schema (no
value created, all value accounted for).
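A minimal Python analogue of this model (not Perfect Developer syntax; the class and field names are illustrative), with the two security properties checked as assertions:

    from dataclasses import dataclass

    @dataclass
    class Purse:
        balance: int
        lost: int = 0

    class World:
        def __init__(self, purses):
            self.purses = list(purses)      # a sequence of purses, as in the PD model

        def total_balance(self):
            return sum(p.balance for p in self.purses)

        def total_lost(self):
            return sum(p.lost for p in self.purses)

        def transfer(self, frm, to, value, succeed=True):
            before_balance = self.total_balance()
            before_total = before_balance + self.total_lost()
            if self.purses[frm].balance >= value:        # sufficient funds
                self.purses[frm].balance -= value
                if succeed:
                    self.purses[to].balance += value
                else:
                    self.purses[frm].lost += value       # a failed transfer is accounted as lost
            # the two security properties as assertions:
            assert self.total_balance() <= before_balance                     # no value created
            assert self.total_balance() + self.total_lost() == before_total   # all value accounted for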
Generates 23 proof obligations: 2 security properties
and 21 domain checks. All these are proven automatically;
all of them very quick apart from "all value accounted for" -
that takes about a minute. Took about 2 hours to get
working from scratch.
Effort in moving from sequences to mappings: rewriting the
model is quick, but the prover needs extra rules adding.
It was noted that most proof tools omit schemas, and PD's schemas
are quite nice.
- 14:00: Refinement and Security (John Clark)
- What is security? Used to drive formal methods
research. Example: "stop people reading what they shouldn't"?
It's a start, but not the whole story. The first model:
Bell-LaPadula. A subject at security level N may not read from a higher
security level ("no read up"), nor write to a lower security level
("no write down").
But not without its problems. In practice, objects lie
outside the model, and access is not policed. File locks
and names, for instance, can act as information conduits for
information transfer. NB: even classifications have classifications!
"Top Secret" is not top secret, but "Eyes Only: Foo" may be top secret
(existence of security level Foo).
Covert channels - computers provide channels for inter-
subject communication, e.g. files. They are intended to
allow communication and access to them can be policed by the
system. Covert channels are unusual means of communicating
information. They arise because subjects share resources e.g.
CPU, memory. The resources allow one subject (the transmitter) to
modulate some aspect of how another subject (the receiver) perceives
the system. Usually grouped into storage channels (I affect
the data you see) and timing channels (I affect when you see data).
The "System Z" problem for Bell-LaPadula: start in a secure state,
where no-one can read a document above their clearance, and make
only transitions that preserve that property. Do you maintain
security? Not exactly. You can downgrade everything to the
lowest classification and then allow anyone access.
A better model "Non-interference" by Goguen and
Meseguer. Actions by high level subjects should not be visible
to lower level subjects. It is formulated in terms of trace
validity: if trace t is valid, trace t with all high level
security events *missing* must be valid too. Generally
considered too strong to implement effectively. In practice,
ruling out every bit of information flow is not the aim.
We may wish to allow communication but only via specific channels
(conditional non-interference).
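Roughly, the trace formulation described above is (a sketch, with purge_H deleting all high-level events from a trace):

    t in traces(S)  =>  purge_H(t) in traces(S)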
Formal methods: in the maths, f(x) just is. But in computation
you need to calculate f(x) on real hardware, consuming power.
The amount and pattern of computational work that has to be done
varies according to the data you're multiplying. Smartcards,
for instance, often leak information through their power consumption.
The first refinement problem - non-functional characteristics
like resource information (power, time taken) leak information.
Is this what security is all about? No. Confidentiality
is only one part of security. Also: integrity (if we know what
this means), availability, anonymity (not usually just all-or-nothing),
accountability / non-repudiation.
Breaking the Model: in the relational model of refinement,
the finalisation can break security. Finalisation is the process of moving
from the abstract or concrete world of programming variables to
the "real", but still modelled, world. It says how you interpret
programming variables. E.g. a clock that is 5 minutes slow -
finalise to make it right by adding 5 minutes. If the clock is
stopped, no finalisation view ("glasses") can make it right.
Finalisation is a great way of passing information!
Would be nice to restrict finalisation, but in practice
you need to send concrete things e.g. bit streams, voltages etc.
So forbidding the problem merely moves it somewhere more covert.
Instead, use concept of finalisation to categorise and expose
covert channels.
Vary your finalisation glasses: remove the glasses
(e.g. view page faults, interrupts, power, timing), enhance
your view - post-process by viewing sequences of outputs and
do statistical analysis, multiple viewpoints - use multiple
different glasses, invasive - break into the device.
Vary the system: perturb it by adding load, irradiating it,
flexing a smartcard; passive perturbation where the card is
worn out or faulty; multiple systems allow differential analyses;
heterogeneous multiples where "sameness" depends on the observation.
Vary the environment: heat or cool it, modulate power supply.
Cold can lower noise and defeat blinding defences. At -60C RAM
doesn't lose information as quickly as you might think.
Examples: fault injection, bit flip breaking RSA (intended
finalisation, single system, perturbed); differential power
analysis of smart cards (unintended finalisation, enhanced, single
system); static RAM persists for minutes at -20C (unintended
finalisation, single system, perturbed environment).
Prevention: enforce use of the glasses, e.g. by a protective
screen. Implement "magic eye" glasses when there is noise
without them. Detect unwanted finalisations (mainly invasive ones)
and then destroy the secrets.
Finalise the user - dynamics of hand signatures, text analyses
for authorship; this gives anonymity vs authentication tradeoffs.
Destructive quantum finalisation - look at something and it breaks,
essential in various security schemes.
- 15:00: Closing
- Location for next meeting: Kent or Sheffield? John Derrick
to investigate. January some time, call for talks etc. to come out
soon. Based on Foundations.
1-day refinement workshop co-located with ZB in Surrey
(April 14th or so). Exact date to be agreed with Steve and Helen.
Maybe get ENTCS publication of the papers, journal special issue
with extended versions. CFP date - deadline later than the ZB
one.
The meeting after that - Manchester in July perhaps?
September in Southampton for RODIN 1-year anniversary perhaps?
Topics might be selected from OO/UML, distributed and concurrent
systems, integrated notations.
Expenses for this meeting: send expense claims to John at
Kent; mail John for a claim form if you haven't got one.
Website: add new links page, add tools page with notes about
how tools are used with refinement (and citations if appropriate).
Full names on Partners page.