Simple key-value store
When designing a distributed system, we typically have to confront the problem of consistency in some form. That is, when processing a collection of transactions concurrently, we need to define in what sense the view of these transactions seen by various concurrent processes is consistent.
To illustrate this idea, we will use a consistency condition called linearizability (sometimes also called "strong consistency"). In a linearizable system, each transaction has a duration in time, marked by a begin event and an end event. The begin event corresponds to a "call" and passes the input parameters of the transaction. The end event corresponds to a "return" and passes output parameters. An execution of the system is linearizable if it is consistent with the hypothesis that each transaction executed instantaneously at some time between its begin and end events. This moment is called the commit point of the transaction.
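To make the definition concrete, here is a small Python sketch (not part of the Ivy development; the function and tuple layout are our own) that decides linearizability of a tiny history by brute force: it searches for a total order of the transactions that respects real time and replays correctly against a sequential store, with unwritten keys reading as 0.

```python
from itertools import permutations

# Each op is (begin, end, kind, key, value): for a write, value is the
# argument; for a read, value is the result returned to the client.
# A history is linearizable if some total order of the ops
#  (a) respects real time: if op A ends before op B begins, A comes first,
#  (b) replays correctly against a sequential store (unwritten keys read 0).
def linearizable(ops):
    for order in permutations(range(len(ops))):
        # (a) violation: an op placed later really ended before an
        # op placed earlier began
        if any(ops[order[j]][1] < ops[order[i]][0]
               for i in range(len(order)) for j in range(i + 1, len(order))):
            continue
        # (b) sequential replay
        store, ok = {}, True
        for idx in order:
            _, _, kind, key, value = ops[idx]
            if kind == "write":
                store[key] = value
            elif store.get(key, 0) != value:
                ok = False
                break
        if ok:
            return True
    return False

# A write that finishes before a read begins must be visible to that read:
print(linearizable([(0, 1, "write", "k", 1), (2, 3, "read", "k", 1)]))  # True
print(linearizable([(0, 1, "write", "k", 1), (2, 3, "read", "k", 0)]))  # False
```

The chosen position of each transaction in the total order corresponds to its commit point: some instant between its begin and end events at which it appears to take effect.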
To illustrate how we specify and verify a linearizable concurrent
system, we will use a simple server that acts as a key/value store
for multiple clients. The transactions implemented by the store are
read(K) and write(K,V), where K is a key and V is a value.
Our first key/value store server will be a simple sequential process that receives transaction requests and sends responses over a network. This will allow us to see in a simple setting how a linearizable service is specified.
Basic data types
We begin by declaring the data types needed to represent transactions and the messages that carry them.
The key and value types are uninterpreted for the moment.
type key
type value
Next, we declare a type txid as a transaction identifier. For
transaction identifiers, we use a ghost type. This is a value that
appears in the system when we specify and verify it, but not in
production (where it acts like a void type). Non-ghost values can
never depend on ghost values.
ghost type txid
type request_kind = {write,read}
object request_body = {
type t = struct {
knd : request_kind,
ky : key,
vl : value
}
}
object request = {
type t = struct {
tx : txid,
bd : request_body.t
}
}
object response_body = {
type t = struct {
vl : value
}
}
object response = {
type t = struct {
tx : txid,
bd : response_body.t
}
}
Next, we define a module replica that represents the state of the
store as a map from keys to values, with every key initially mapped
to 0. The module also defines the effect of executing a transaction
on the store using an action exec. This action takes a request,
updating the store if it is a write and returning the stored value
if it is a read.
module replica = {
function store(K:key) : value
after init {
store(K) := 0
}
action exec(inp : request_body.t) returns (out:response_body.t) = {
if inp.knd = write {
store(inp.ky) := inp.vl;
}
else if inp.knd = read {
out.vl := store(inp.ky);
}
}
}
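For intuition, the replica's behavior can be mirrored in a few lines of Python (a hand-written sketch, not generated from the Ivy; the class and method names are our own):

```python
class Replica:
    """Python sketch of the Ivy replica module."""
    def __init__(self):
        self.store = {}              # store(K) := 0 initially, for every key

    def exec(self, kind, key, value=None):
        # a write updates the store; a read returns the stored value
        if kind == "write":
            self.store[key] = value
            return None
        return self.store.get(key, 0)

r = Replica()
r.exec("write", "a", 7)
print(r.exec("read", "a"))   # 7
print(r.exec("read", "b"))   # 0, the initial value
```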
Next, we declare a ghost type of server destinations, dest. Since we
have only one server for now, this is a singleton set.
object dest = {
ghost type t = {srvr}
}
The reference object provides three actions: create, commit and
eval. The first corresponds to a begin event, the second to a commit
point and the third to an end event. This object is also "ghost" in
the sense that it will be abstracted away in production.
The create action takes a request and returns a unique transaction
id. At this point, the transaction has begun. The commit action
takes a transaction id and indicates the commit point of that
transaction. At the commit point, the reference object actually
executes the transaction on its replica and stores the result. The
eval action takes a transaction id of a committed transaction and
returns the correct result for the transaction.
object ref = {
action create(inp : request_body.t, dst:dest.t) returns (tx : txid)
action serialize(tx : txid)
action commit(tx : txid,dst:dest.t)
action eval(tx : txid) returns (res : response_body.t)
Next comes the state needed to implement create, commit and eval.
First, we have a replica of the store, called rep. To generate
unique transaction ids, we use a counter, whose value is next. The
map txs stores the request corresponding to each transaction id,
txres stores the result, and the relation committed keeps track of
the set of committed transactions. Initially, there are no committed
transactions.
instance rep : replica
var next : txid
function txs(X:txid) : request_body.t
function txres(X:txid) : response_body.t
relation committed(X:txid)
after init {
next := 0;
committed(X) := false;
}
implement create {
tx := next;
txs(tx) := inp;
next := next + 1;
}
The declaration delegate commit tells Ivy
that the assertions in commit are guarantees for the caller.
implement commit {
assert 0 <= tx & tx < next;
assert ~committed(tx);
txres(tx) := rep.exec(txs(tx));
committed(tx) := true;
}
delegate commit
implement eval {
assert committed(tx);
res := txres(tx);
}
delegate eval
Finally, we interpret txid as int in order to
get an infinite sequence of transaction ids.
interpret txid -> int
}
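Again for intuition only, here is a hand-written Python sketch of the reference object, with the commit and eval preconditions rendered as assertions (the names are ours, not generated from the Ivy):

```python
class Ref:
    """Sketch of the ghost reference object: create/commit/eval."""
    def __init__(self):
        self.rep = {}            # replica store; unwritten keys read as 0
        self.next = 0            # counter generating fresh transaction ids
        self.txs = {}            # request body recorded at create
        self.txres = {}          # result recorded at the commit point
        self.committed = set()   # set of committed transactions

    def create(self, kind, key, value=None):
        tx = self.next
        self.txs[tx] = (kind, key, value)
        self.next += 1
        return tx

    def commit(self, tx):
        # preconditions: tx was created and not yet committed
        assert 0 <= tx < self.next and tx not in self.committed
        kind, key, value = self.txs[tx]
        if kind == "write":
            self.rep[key] = value
            self.txres[tx] = None
        else:
            self.txres[tx] = self.rep.get(key, 0)
        self.committed.add(tx)

    def eval(self, tx):
        # precondition: tx is committed
        assert tx in self.committed
        return self.txres[tx]

ref = Ref()
t1 = ref.create("write", "k", 42)
t2 = ref.create("read", "k")
ref.commit(t1)
ref.commit(t2)
print(ref.eval(t2))   # 42
```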
The implementation
include udp
type client_id
type req_msg = struct {
cid : client_id,
req : request.t
}
The client module provides two actions: client_request and
client_response. The user calls client_request to send a request and
implements client_response to get the response as a callback. The
parameter dst indicates which server process to use, but in this
case there is only one choice.
module client(cid,srvr_chan,cl_chans) = {
action client_request(req : request_body.t, dst: dest.t)
action client_response(req : response_body.t, tx : txid)
implement client_request {
var m : req_msg;
m.cid := cid;
m.req.tx := ref.create(req,dst);
m.req.bd := req;
call srvr_chan.send(m);
}
When a response message arrives, we pass it to the client_response
callback. The transaction id parameter is "ghost" and is only used
for specification.
implement cl_chans.recv(resp : response.t) {
call client_response(resp.bd,resp.tx)
}
}
The server node is also a parameterized module. The port parameter
tells us what port to listen on.
module server_node(port, cl_chans) = {
instance rep : replica
instance req_chan : nondup_endpoint(port,req_msg)
The call to ref.commit is a no-op on the
production server, since the reference object is abstracted away.
It is only used as a witness for linearizability during
verification.
implement req_chan.recv(inp : req_msg) {
call ref.commit(inp.req.tx,dest.srvr);
var rr := inp.req.bd;
var res : response.t;
res.tx := inp.req.tx;
res.bd := rep.exec(rr);
call cl_chans(inp.cid).send(res)
}
}
Now we instantiate a server and an array of client endpoints,
connecting them up with network channels. We use a parameterized
array of endpoints for the clients called cl_chans. The endpoints
will use a range of port numbers beginning with 44100.
instance cl_chans : nondup_endpoint_set(client_id,44100,response.t)
instance cl(X:client_id) : client(X,srvr.req_chan,cl_chans)
instance srvr : server_node(44200,cl_chans)
For now we need only specify the client API. The service specification says that we get at most one response for each transaction, and that whenever we get a response, the transaction has been committed and the response gives the correct value according to the reference model. This specification allows us to verify the server in isolation, and then separately verify any clients using the API, using only the reference model. Notice that the role of a specification is not to give a "golden model" of correctness but to provide a contract that allows us to verify components in isolation.
object service_spec = {
relation responded(X:txid)
after init {
responded(X) := false;
}
before cl_chans.send(p : client_id, m : response.t) {
assert ~responded(m.tx);
assert m.bd = ref.eval(m.tx);
responded(m.tx) := true;
}
}
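The effect of this specification can be sketched in Python as a runtime monitor standing in for the before cl_chans.send check, with an oracle playing the role of ref.eval (all names here are illustrative assumptions):

```python
class ServiceSpecMonitor:
    """Sketch of the service spec as a runtime monitor: at most one
    response per transaction, and the value must match the reference."""
    def __init__(self, oracle):
        self.oracle = oracle          # oracle(tx) plays the role of ref.eval
        self.responded = set()        # responded(X) := false initially

    def check_response(self, tx, value):
        # assert ~responded(m.tx)
        assert tx not in self.responded, "duplicate response"
        # assert m.bd = ref.eval(m.tx)
        assert value == self.oracle(tx), "response disagrees with reference"
        self.responded.add(tx)

mon = ServiceSpecMonitor(lambda tx: 5)
mon.check_response(0, 5)              # accepted
try:
    mon.check_response(0, 5)          # second response to the same tx
except AssertionError as e:
    print("rejected:", e)             # rejected: duplicate response
```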
export cl.client_request
import cl.client_response
Finally, we create an isolate that allows us to verify our server implementation in isolation. We verify only the guarantees of the server, using the specifications of the network channels. Thus, the test generator will play the role of the environment and the network.
trusted isolate iso_srvr = srvr with cl,cl_chans,ref,service_spec
The theory strbv[1] encodes strings with 1-bit vectors. This means
we can have at most 2 distinct keys in a test run. This limitation
is a good thing since it means we have a good chance to have
multiple transactions on the same key. Otherwise, we would be
unlikely to detect any data errors.
interpret key -> strbv[1]
interpret value -> bv[16]
interpret client_id -> bv[1]
$ ivy_to_cpp target=test isolate=iso_srvr build=true repstore1.ivy
$ ./repstore1 iters=100 runs=10 out=file.iev
Look at the trace file file.iev to see the test generator, server
and reference object interact. Try putting some errors in this
example and see if they produce assertion failures. If not, why
not?