DistKV: A distributed no-master key-value store¶
Rationale¶
Any kind of distributed storage is subject to the CAP theorem (also called “Brewer’s theorem”): you can’t get all of (global) Consistency, Availability, and Partition tolerance. The problem is that you do want all three of these.
One way around this problem is to recognize that on most KV storage systems, any given record is rarely (if ever) changed by more than one entity at the same time. Thus, a simple gossip protocol is sufficient for distributing data.
DistKV does not have a master node, much less a consensus-based election system (Raft, Paxos, …). Instead, DistKV compiles a short list of available servers that’s broadcast every few seconds. The algorithm to select the next server is deterministic so that all nodes in a network agree which server is currently responsible for housekeeping.
When a partitioned network is re-joined, these housekeepers connect to each other and exchange a series of messages to establish which updates the other side(s) appear to have missed. These are then re-broadcast.
DistKV does not support data partitioning. Every node knows the whole data set.
The DistKV client library does not support reconnecting. That is intentional: if the local server ever dies, your client has stale state and should not continue to run. The clean solution is to wait until the client is again operational and up-to-date, and then restart the client.
DistKV is intended to be used in a mostly-RAM architecture. There is no disk-based storage backend; snapshots and event logs are used to restore a system, if necessary.
DistKV is based on the gossip system provided by HashiCorp's Serf. It supports all data types that can be transmitted by MsgPack <https://github.com/msgpack/msgpack/blob/master/spec.md>.
TODO: MsgPack has extension types, so constructing Python objects is possible.
API¶
DistKV offers an efficient interface to access and change data. For compatibility, a front-end that mostly mimics the etcd2 protocol is planned.
Status¶
Some of the above is still wishful thinking. In particular, we don’t have an etcd2 compatibility service yet.
DistKV’s client protocol¶
DistKV’s native client protocol is based on MsgPack. The client sends requests; the server sends one or more responses. You may (and indeed should) run concurrent requests on the same connection.
Strings must be UTF-8, as per MsgPack specification.
Requests and replies are mappings.
The server initially sends a greeting, using sequence number zero. It will not send any other unsolicited message.
Requests¶
Client requests are always mappings. seq and action must be present. All other fields are request specific. The server will ignore fields it doesn't understand.
seq¶
Every client request must contain a strictly increasing positive sequence number. All replies associated with a request carry the same sequence number.
action¶
The action which the server is requested to perform. Valid actions are described below.
nchain¶
This field tells the DistKV server how many change entries to return.
The default is zero. If you want to update a value, retrieve the
original with nchain
set to one. Synchronization between DistKV servers
requires the number of possible partitions plus one, in order to protect
against spurious conflict reports.
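For illustration, a get_value request with nchain set, and its reply (the values are made up; the chain element is described under "Replies"):
Send {'path': ('test',), 'nchain': 2, 'action': 'get_value', 'seq': 1}
Recv {'value': 123, 'chain': {'node': 'test1', 'tick': 2, 'prev': None}, 'seq': 1}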
Replies¶
Server replies are always mappings. At least one of seq and error must be present. The client must ignore fields it doesn't expect.
seq¶
The sequence number of the request which caused this reply.
Server messages which don’t include a sequence number are errors and will close the connection.
The server will either send exactly one reply with any given sequence number, or a multi-reply sequence which starts with a state=start message.
error¶
This field contains a human-readable error message. A request has failed if this field is present.
value¶
The value of the DistKV entry, assuming one was requested.
state¶
May be start or end.
start indicates the beginning of a multi-value result. end indicates that a multi-value result has finished. No more messages with this sequence number will be sent.
The start message will contain neither an error nor a value.
chain¶
The change chain resulting from, or retrieved by, a command.
Change chains track which server last modified a value, so that replayed updates can be ignored and conflicting updates can be recognized. A chain never contains any one DistKV server more than once.
See the server protocol for a detailed description.
tick¶
The current server’s change counter. This field can be used to ensure that the local server is not restarted with old state.
tock¶
An always-increasing integer that’s (supposed to be) shared within the whole DistKV system. You can use it when you need to reconnect to a server, to make sure that the system is (mostly) up-to-date.
Actions¶
connect¶
This is a pseudo-action with sequence number zero, which the server assumes
to have received after connecting. The server’s first message will contain
seq=0
, its node
name, a version
(as a list of integers), and
possibly its current tick
and tock
sequence numbers.
The auth parameter, if present, carries a list of configured authorization methods. The first method in the list should be used to authorize the client. If the list's first entry is None then authorization is not required. Other entries may be used for testing after a client is logged in.
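For illustration, a greeting might look like this (node name, version, counters and auth method are made up):
Recv {'seq': 0, 'node': 'test1', 'version': (0, 30, 2), 'tick': 5, 'tock': 12, 'auth': ('password',)}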
auth¶
Tell the server about your identity. This method must be sent first if the server requests authorization.
The identity parameter tells the server which user ID (or equivalent) to use for logging in. typ contains the auth type to use; this must be identical to the first entry in the connect reply's auth parameter.
If this is not the first message, the authorization is verified but the resulting user identity is ignored.
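A login might then look like this (the user name is made up; any method-specific credential fields, e.g. the actual password, are omitted here):
Send {'action': 'auth', 'typ': 'password', 'identity': 'joe', 'seq': 1}
Recv {'seq': 1}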
stop¶
Send this action to abort a running multi-value request. Set task to the sequence number of the request to abort.
This action only works after you received a start state message. It returns a bool which is True if the command was still running.
A positive reply does not indicate that no more messages with the stated sequence number will arrive; this will be indicated by the state=end message.
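For example, to abort a multi-value request that was started with sequence number 3 (numbers are made up):
Send {'action': 'stop', 'task': 3, 'seq': 4}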
get_value¶
Retrieve a single value. The path to the value needs to be sent as a list. If the value does not exist or has been deleted, you'll get None back.
Alternatively, you can set node and tick, which returns the entry that has been set by this event (if the event is still available). The entry will contain the current value even if the event has set a previous value.
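A simple retrieval might look like this (path and value are made up):
Send {'path': ('test', 'foo'), 'action': 'get_value', 'seq': 2}
Recv {'value': 123, 'seq': 2}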
set_value¶
Set a single value. The path to that value needs to be sent as a list.
If you are updating a known value, you should send a chain entry to help ensure that no other node has changed it unexpectedly. (Of course, due to the distributed nature of DistKV, this may happen anyway.) You can also use prev to send an expected old value, but you really shouldn't.
This action returns the node's new change chain. If you did not send a chain field, the previous value is returned in prev.
delete_value¶
Remove a single value. This is the same as setting it to None.
get_state¶
Retrieve the current system state. The following bool
attributes can be
set to specify what is returned. The reply is stored in an attribute of the
same name.
- nodes
A dict of node ⇒ tick.
- known
A dict of node ⇒ ranges of ticks known. This contains current data as well as events that have been superseded.
- current
A dict of node ⇒ ranges of ticks corresponding to the current state of nodes. This is expensive to calculate. It is a superset of ‘known`.
- missing
A dict of node ⇒ ranges of ticks not available locally. This is the inverse
of known
.
- remote_missing
A dict of node ⇒ ranges of ticks reported to be missing at some other node.
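For illustration, a request for the nodes attribute might look like this (node names and ticks are made up):
Send {'action': 'get_state', 'nodes': True, 'seq': 5}
Recv {'nodes': {'test1': 42, 'test2': 17}, 'seq': 5}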
get_tree¶
Retrieves all values with the prefix given in path.
This is a multi-value reply; each reply contains path and value entries. Deleted nodes may or may not be reported.
If the path does not exist or does not have children, a single-value reply is returned.
Optimization: if a reply contains a "depth" key, its path omits the request's path as well as the first "depth" elements of the previous reply's (relative) path.
Thus, if you request a path of ['a','b','c'], this reply:
{ seq=13, path=['a','b','c'], value="one" }
{ seq=13, path=['a','b','c','d','e'], value="two" }
{ seq=13, path=['a','b','c','d','f'], value="three" }
is equivalent to:
{ seq=13, depth=0, value="one" }
{ seq=13, depth=0, path=['d','e'], value="two" }
{ seq=13, depth=1, path=['f'], value="three" }
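The following sketch (a hypothetical helper, not part of the DistKV API) shows how a client can rebuild the full paths from such depth-compressed replies; it reproduces the equivalence above:

def expand_paths(request_path, replies):
    prev = ()  # the previous reply's path, relative to the request path
    for msg in replies:
        rel = tuple(msg.get('path', ()))
        if 'depth' in msg:
            # keep the first 'depth' elements of the previous relative path
            rel = prev[:msg['depth']] + rel
        prev = rel
        yield tuple(request_path) + rel, msg['value']

Feeding the three depth-compressed replies above through expand_paths(['a','b','c'], …) yields the three full paths from the first listing.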
root¶
Switch the client’s root to the given path. This request returns the new root node.
It is not possible to undo this request (other than to reconnect). Tasks started before this action are not affected.
This action returns the new root node’s value.
watch¶
Monitor changes to this node (and those below it). Replies look like those from get_tree.
The recommended way is to run the watch call with fetch=True. This fetches the current state and guarantees that no updates are lost. To mark the end of the static data, the server sends a state=uptodate message. This process will not send stale data after an update, so your code may safely replace an old entry's state with new data.
This task obeys min_depth and max_depth restrictions.
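A typical call might look like this (path and values are made up):
Send {'path': ('test',), 'fetch': True, 'action': 'watch', 'seq': 6}
Recv {'seq': 6, 'state': 'start'}
Recv {'value': 1234, 'depth': 0, 'seq': 6}
Recv {'seq': 6, 'state': 'uptodate'}
followed by one additional reply per subsequent change.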
save¶
Instruct the server to save its state to the given path (a string with a filename).
log¶
Instruct the server to continuously write change entries to the given path (a string with a filename). If fetch is True, the server will also write its current state to that file.
This command returns after the new file has been opened and the initial state has been written, if so requested. If there was an old log stream, there may be some duplicate entries. No updates are skipped.
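For example (the file name is made up):
Send {'action': 'log', 'path': '/var/tmp/distkv-log.mp', 'fetch': True, 'seq': 7}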
serfsend¶
Pass-through call to transmit a message via serf. Parameters are type (the user event to send to), data (the data to send), and optionally tag (a string that limits recipients to Serf nodes with this tag).
Raw binary data may be transmitted by using raw instead of data.
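For example, broadcasting a message as the user event test (values are made up):
Send {'action': 'serfsend', 'type': 'test', 'data': {'hello': 'world'}, 'seq': 8}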
serfmon¶
Pass-through call to receive broadcast messages via serf. You'll get a stream with data containing the decoded message. If decoding fails, raw contains the message's bytes and error holds a string representation of the decoder problem.
Set raw to True if the incoming messages are not supposed to be msgpack-encoded in the first place. In this case, data and error will always be missing.
Examples¶
You can turn on message debugging with ‘distkv -vvv’.
Get and set a value¶
If the value is not set:
Send {'path': ('test',), 'nchain': 3, 'action': 'get_tree', 'seq': 1}
Recv {'value': None, 'seq': 1}
Setting an initial value:
Send {'value': 1234, 'path': ('test',), 'nchain': 2, 'chain': None, 'action': 'set_value', 'seq': 2}
Recv {'changed': True, 'chain': {'node': 'test1', 'tick': 2, 'prev': None}, 'seq': 2}
Trying the same thing again will result in an error:
Send {'value': 1234, 'path': ('test',), 'nchain': 2, 'chain': None, 'action': 'set_value', 'seq': 3}
Recv {'error': 'This entry already exists', 'seq': 3}
To fix that, use the chain value you got when setting or retrieving the previous value:
Send {'value': 123, 'path': ('test',), 'nchain': 2, 'chain': {'node': 'test1', 'tick': 2}, 'action': 'set_value', 'seq': 4}
Recv {'changed': True, 'chain': {'node': 'test1', 'tick': 3, 'prev': None}, 'seq': 4}
Sending no precondition would also work.
After you set multiple values:
Send {'value': 123, 'path': ('test', 'foo'), 'nchain': 0, 'action': 'set_value', 'seq': 5}
Recv {'changed': True, 'prev': None, 'seq': 5}
Send {'value': 12, 'path': ('test', 'foo', 'bap'), 'nchain': 0, 'action': 'set_value', 'seq': 6}
Recv {'changed': True, 'prev': None, 'seq': 6}
Send {'value': 1, 'path': ('test', 'foo', 'bar', 'baz'), 'nchain': 0, 'action': 'set_value', 'seq': 7}
Recv {'changed': True, 'prev': None, 'seq': 7}
Send {'value': 1234, 'path': ('test',), 'nchain': 0, 'action': 'set_value', 'seq': 8}
Recv {'changed': True, 'prev': 123, 'seq': 8}
you can retrieve the whole subtree:
Send {'path': ('test',), 'nchain': 0, 'action': 'get_tree', 'seq': 1}
Recv {'seq': 1, 'state': 'start'}
Recv {'value': 1234, 'depth': 0, 'seq': 1}
Recv {'value': 123, 'path': ('foo',), 'depth': 0, 'seq': 1}
Recv {'value': 12, 'path': ('bap',), 'depth': 1, 'seq': 1}
Recv {'value': 1, 'path': ('bar', 'baz'), 'depth': 1, 'seq': 1}
Recv {'seq': 1, 'state': 'end'}
Retrieving this tree with distkv client get -ryd ':val' test would print:
test:
  :val: 1234
  foo:
    :val: 123
    bap: {':val': 12}
    bar:
      baz: {':val': 1}
DistKV’s server protocol¶
DistKV instances broadcast messages via Serf <http://serf.io>.
The payload is encoded with msgpack <https://github.com/msgpack/msgpack/blob/master/spec.md> (Serf does not pass arbitrary payload objects) and sent as user events with a configurable name that defaults to distkv.XXX ("XXX" being the action's type). The coalesce flag must always be False.
All strings are required to be UTF-8 encoded.
TODO: investigate whether replicating Serf in Python would make sense.
Data types¶
Chains¶
A chain, in DistKV, is a bounded list of ordered (node, tick) pairs.
node is the name of the DistKV node that effected a change. tick is a node-specific counter which increments by one when any entry on that node is changed.
A chain entry might lack a tick element; in that case the node has not been initialized yet. Such entries are only valid in ping chains.
Chains are governed by three rules:
- The latest change is at the front of the chain.
- Any node may only appear on the chain once, with the tick of the latest change by that node. If a node changes an entry again, the old entry is removed before the new entry is prepended. This rule does not apply to ping chains.
- Their length is bounded. If a new entry causes the chain to grow too long, the oldest entry is removed.
If an entry is removed from the chain, its (node, tick) value is stored in a per-node known list.
Chains are typically represented by (node,tick,prev) maps, where prev is either Null (the chain ends here), nonexistent (the chain was truncated here), or another chain triple (the previous change on a different node).
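For example, a two-element chain might look like this (node names and ticks are made up):

{'node': 'test1', 'tick': 3, 'prev': {'node': 'test2', 'tick': 7}}

Here test1 made the most recent change; the inner entry carries no prev key, so the chain was truncated at that point.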
Ticks increment sequentially so that every node can verify that it knows all of every other node’s changes.
The chain concept is based on vector clocks <https://queue.acm.org/detail.cfm?id=2917756>. Nodes are sorted so that causality may be established more easily (no need to compare the whole vectors) and vector length may be bounded without sacrificing reliability.
The default chain length should be two larger than the maximum of
- the number of partitions a DistKV system might break up into,
- the number of hosts within one partition that might change any single value. Ideally, this number should be two: one for the host that changes the value in normal operation, e.g. a measurement system, and one for any manual intervention.
ticks¶
All tick values are 63-bit unsigned integers. As this space takes roughly 20 million years to wrap around, assuming ten messages per millisecond (which is well above the capacity of a typical Serf network), this protocol does not specify what shall happen if this value overflows.
Ranges¶
Tick ranges are used to signal known (or missing) messages. They are transmitted as sorted lists which contain either single elements or [begin,end) pairs (that is, the begin value is part of the interval but end is not).
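A minimal sketch (a hypothetical helper, not part of DistKV) of expanding such a list into individual tick values:

def iter_ticks(ranges):
    # 'ranges' mixes single ticks and [begin, end) pairs, e.g. [1, [3, 6], 10]
    for r in ranges:
        if isinstance(r, (list, tuple)):
            yield from range(r[0], r[1])  # 'end' itself is excluded
        else:
            yield r

list(iter_ticks([1, [3, 6], 10])) evaluates to [1, 3, 4, 5, 10].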
Path¶
Every entry is associated with a path, i.e. a list of names leading to it. Names may be UTF-8 strings, byte strings, or numbers. The empty UTF-8 and byte strings are considered equivalent; any other values are not.
Common items¶
Bidirectional¶
path¶
The path to the entry you're accessing. This is a list. The contents of that list may be anything hashable, i.e. strings, integers, True/False/None.
value¶
A node's value. This can be anything that msgpack can work with: you do not need to encode your values to binary strings, and in fact you should not, because some of DistKV's features (like type checking) would no longer work, or would be much more awkward to use.
Replies¶
node¶
The node which is responsible for this message. For update events this is the node which originated the change; for all other events, it's the sending node.
tick¶
This node’s current tick. The tick is incremented every time a value is changed by that node.
prev¶
A dict with node,tick,prev entries, which describes the node which originated the change that this one is based on.
If this value is None, the entry has been created at that time. If it is missing, further chain members have been elided.
In the client protocol, the node, tick and prev members are stored in a chain element; otherwise the semantics are the same.
A chain will not contain any node more than once. When a value is changed again, that node's tick is incremented and its entry is added or moved to the head of the chain.
tock¶
This is a global message counter. Each server has one; it is incremented every time its node counter is incremented or a Serf message is sent.
A server must not send a message with a smaller (or equal) tock value than any it has received or previously sent. Since Serf does not guarantee order of delivery, receiving a message with a smaller tock than the preceding one is not an error.
Message types¶
update¶
This message updates an entry.
Each server remembers the change chain's per-node tick values so that it can verify that all messages from other servers have been received.
path¶
The list of path elements leading to the entry to be updated.
value¶
The value to set. Null means the same as deleting the entry.
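Putting the common elements together, a complete update event might look like this (all values are made up):

{'path': ('test', 'foo'), 'value': 123, 'node': 'test1', 'tick': 3, 'prev': None, 'tock': 15}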
info¶
This message contains generic information. It is sent whenever required.
known¶
This element contains a map of (node ⇒ ranges of tick values) which the sending server has seen. This includes existing events as well as events that no longer exist; this happens when a node re-updates an entry.
This message's change chain refers to the ping it replies to.
ticks¶
This element contains a map of (node ⇒ last_tick_seen), sent to verify that all nodes agree on each node's current tick; in particular, a node that has been restarted learns which tick value it is supposed to continue with.
missing¶
A map of (node ⇒ ranges of tick values) which the sending node has not seen. Any node that sees this request will re-send change messages in that range.
reason¶
This element is sent in the first step of split reconciliation recovery. If the first ping after being reconnected "wins", then the winning side needs to be told that there's a problem.
This element contains the losing side’s ping chain, which the nodes in the winning side’s ping chain use to initiate their recovery procedure.
ping¶
A periodic “I am alive” message. This message’s change chain shows which node was pinged previously.
Timing and concurrency¶
Server to Server¶
Ping sequence¶
Every clock seconds, each node starts thinking about sending a ping sometime during the next clock seconds. The node that's last in the chain (assuming that the chain has maximum length) does this quite early, while the node that transmitted the previous ping does this at the end of the interval. Nodes not in the current chain do this immediately, with some low probability (one in ten times the number of known nodes) so that the chain varies. If no ping has arrived after another clock/2 seconds, each node sends a ping sometime during the next clock/2 seconds. Thus, at least one ping must be seen every 3*clock seconds.
Ping messages can collide. If so, the message with the higher tock value wins. If they match, the node with the higher tick value wins. If they match too, the node with the alphabetically-lower name wins. The winning message becomes the basis for the next cycle.
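A sketch of this tie-breaking rule (a hypothetical helper, not DistKV's actual code):

def ping_wins(a, b):
    # a, b: colliding ping messages with 'tock', 'tick' and 'node' elements.
    # Returns True if message a wins over message b.
    if a['tock'] != b['tock']:
        return a['tock'] > b['tock']  # higher tock wins
    if a['tick'] != b['tick']:
        return a['tick'] > b['tick']  # then higher tick
    return a['node'] < b['node']      # then the alphabetically-lower name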
This protocol assumes that the prev chains of any colliding ticks are identical. If they are not, there was at least one network split that is now healed. When this is detected, the nodes mentioned in the messages' chains send info messages containing ticks for all nodes they know.
The non-topmost nodes will delay this message by clock/ping.length (times their position in the chain) seconds and not send their message if they see a previous node's message first. Resolution of which chain is the "real" one shall proceed as above.
clock is configurable (ping.clock); the default is 5. It must be at least twice the time Serf requires to deliver a message to all nodes.
The length of the ping chain is likewise configurable (ping.length). It should be larger than the number of possible network partitions; the default is 4.
TODO: Currently, this protocol does not tolerate overloaded Serf networks well, if at all.
Startup¶
When starting up, a new node sends a ping query with an empty prev chain, every 3*clock seconds. The initial tick value shall be zero; the first message shall be delayed by a random interval between clock/2 and clock seconds.
Reception of an initial ping does trigger an info message, but does not affect the regular ping interval, on nodes that already participate in the protocol. A new node, however, may assume that the ping message it sees is authoritative (unless the "new" ping is followed by one with a non-empty chain). In case of multiple nodes joining a new network, the last ping seen shall be the next entry in the chain.
The new node is required to contact a node in the (non-empty) ping chain it attaches to, in order to download its current set of entries, before answering client queries. If a new node already knows a (possibly outdated) set of messages and there is no authoritative chain, it shall broadcast them in a series of update messages.
The first node that initiates a new network shall send an update event for the root node (with any value). A chain is not authoritative if it only contains nodes with zero tick values. Nodes with zero ticks shall not send a ping when the first half of the chain does not contain a non-zero-tick node (unless the second half doesn't contain any such nodes either).
The practical effect of this is that when a network is restarted, fast-starting empty nodes will quickly agree on a ping sequence. A node with recovered data, which presumably takes longer to start up since it has to load the data first, will then take over as soon as it is operational; it will not be booted from the chain by nodes that have not yet recovered the data store.
Event recovery¶
After a network split is healed, there can be any number of update events that the “other side” doesn’t know about. These need to be redistributed.
Step zero: a ping message with an incompatible chain arrives.
First step: Send an info message with a ticks element, so that any node that has been restarted knows which tick value it is supposed to continue with.
Second step (after half a tick): Send a message with missing elements that describe which events you do not yet know about.
Third step: Nodes retransmit missing events, followed by a known message that lists ticks which no longer appear on an event's chain.
After completing this sequence, every node should have a node list which marks no event as missing. For error recovery, a node may randomly (at most one such request every 10*clock interval) retransmit its local missing list, assuming there is one.
This protocol assumes that new nodes connect to an existing non-split network. If new nodes first form their own little club before being reconnected to the "real" network (or a branch of it), this would force a long list of events to be retransmitted. Therefore, nodes with zero ticks must initially be passive. They shall open a client connection to any on-chain node and download its state. If a node has received a non-zero tick for itself in a known message, it may participate only after it has received a complete download, and must not allow client connections before its list of missing events is empty.
All of these steps are to be performed by the first nodes in the pre-joined chains. If these messages are not seen after clock/2 seconds (counting from reception of the ping, ticks or missing element that occurred in the previous step), the second node in the chain is required to send them; the third node will take over after an additional clock/4 interval, and so on. Of course, only messages originating from hosts on the correct chain shall suppress a node's transmission.
DistKV and authentication¶
DistKV ships with a couple of rudimentary auth modules.
Currently there is no access control. That’s on the TODO list.
Included user auth methods¶
root¶
No access control. There is one user named “*”.
password¶
Username plus password.
API¶
The authorization code is modular. DistKV allows loading multiple auth methods, one of which is active. A method may use more than one record type (think “user” or “group”). Each of those records has a name.
The “user” type is only special because server and client use that to process login requests.
Multiple distinct DistKV domains or subdomains are possible, by adding an additional meta-root record anywhere in the entry hierarchy.
distkv.auth.loader(method: str, typ: str, *a, **k)¶
class distkv.auth.BaseServerAuth(data: dict = {})¶
This class is used on the server to represent / verify a user. The schema verifies whatever data the associated ClientAuth initially sends.
classmethod load(data: distkv.model.Entry)¶
Create a ServerAuth object from existing stored data.
await auth(cmd: distkv.server.StreamCommand, data)¶
Verify that @data authenticates this user.
info()¶
Return whatever public data the user might want to have displayed. This includes information to identify the user, but not anything that'd be suitable for verifying or even faking authorization.
await check_read(*path, client: distkv.server.ServerClient, data=None)¶
Check that this user may read the element at this location. This method may modify the data.
await check_write(*path, client: distkv.server.ServerClient, data=None)¶
Check that this user may write the element at this location. This method may modify the data.
class distkv.auth.BaseClientAuth(**data)¶
This class is used for creating a data record which authenticates a user. The schema verifies the input to build().
classmethod build(user)¶
Create a user record from the data conforming to this schema.
ident¶
Some user identifier. Required so that the server can actually find the record.
await auth(client: distkv.client.Client, chroot=())¶
Authorizes this record with the server.
class distkv.auth.BaseServerAuthMaker(chain=None, data=None, aux=None)¶
This class is used on the server to verify the transmitted user record and to store it in DistKV. The schema verifies the data from the client.
classmethod load(data: distkv.model.Entry)¶
Read the user data from DistKV.
classmethod await recv(cmd: distkv.server.StreamCommand, data: distkv.util.attrdict) → distkv.auth.BaseServerAuthMaker¶
Create a new user by reading the record from the client.
ident¶
The record to store this user under.
save()¶
Return a record to represent this user, suitable for saving to DistKV.
await send(cmd: distkv.server.StreamCommand)¶
Send a record to the client, possibly multi-step / secured / whatever.
class distkv.auth.BaseClientAuthMaker(**data)¶
This class is used for creating a data record which describes a user record. This is not the same as a BaseClientAuth; this class is used to represent stored user data on the server, while a BaseClientAuth is used solely for authentication. The schema verifies the input to build().
classmethod build(user)¶
Create a user record from the data conforming to this schema.
ident¶
The identifier for this user. Required so that the server can actually find the record.
await send(client: distkv.client.Client, _kind='user')¶
Send this user to the server.
Verifying and Translating Entries¶
Verification¶
Your application may require consistency guarantees. Instead of committing
fraud when a transaction in your bookkeeping system doesn’t add up to zero,
you might want to add a verification step to make sure that that doesn’t
happen in the first place. More prosaically, the statement “The door is
locked” is either True or False. (However, you always should be prepared
for an answer of “No idea”, aka None
. That’s not avoidable.)
Types¶
Type entries may contain a schema attribute with a JSON Schema that verifies the data. They also may contain a code attribute which forms the body of a validation procedure. The variable value contains the value in question.
Type entries are hierarchic: An (“int”,”percent”) type is first validated against (None,”type”,”int”), then against (None,”type”,”int”,”percent”).
Type checkers cannot modify data.
A value of None may represent a deleted entry and thus is never typechecked.
Type check entries must be accompanied by "good" and "bad" values, which must be non-empty arrays of values which pass or fail this type check. For subordinate types, both kinds must pass the supertype check: if you add a type "float percentage", the bad list may contain values like -1.2 or 123.45, but not "hello".
Beware that restricting an existing type is dangerous. The DistKV server does not verify that all existing entries verify correctly. In pedantic mode, your network may no longer load its data or converge.
Matches¶
The (None,”match”) hierarchy mirrors the actual object tree, except that wildcards are allowed:
- "#" matches any number of levels
- "+" matches exactly one level
This matches MQTT’s behavior.
Unlike MQTT, there may be more than one “#” wildcard.
Be aware that adding or modifying matches to existing entries is dangerous. The DistKV server does not verify that all existing entries verify correctly. In pedantic mode, your network may no longer load its data or converge.
Putting it all together¶
Given the following structure, values stored at ("foo", anything, "bar") must be integers. The DistKV content of testcase tests/test_feature_typecheck.py::test_72_cmd looks like this when dumped with the command get -ryd_:
_: 123
null:
  match:
    foo:
      +:
        bar:
          _:
            type:
            - int
            - percent
  type:
    int:
      _:
        bad: [none, "foo"]
        code: 'if not isinstance(value,int): raise ValueError(''not an int'')'
        good: [0,2]
      percent:
        _:
          bad: [-1,555]
          code: 'if not 0<=value<=100: raise ValueError(''not a percentage'')'
          good: [0,100,50]
foo:
  dud:
    bar:
      _: 55
Translation¶
Sometimes, clients need special treatment. For instance, an IoT-MQTT message that reports turning on a light might send "ON" to topic /home/state/bath/light, while what you'd really like to do is to change the Boolean state attribute of home.bath.lights. Or maybe the value is a percentage and you'd like to ensure that the stored value is 0.5 instead of "50%", and that no rogue client can set it to -20 or "gotcha".
To ensure this, DistKV employs a two-level type mechanism.
- “type” entries describe the type of entry (“this is an integer between 0 and 42”).
- "match" entries describe the path position to which that type applies.
In addition, a similar mechanism may be used to convert clients’ values to DistKV entries and back.
- "codec" entries describe distinct converters ("50%" => 0.5; "ON" => 'set the entry's "state" property to True').
- "map" entries are activated per client (via command, or controlled by its login) and describe the path position to which a codec applies.
All of these are stored below the global (None) top-level path.
Codecs¶
Codec entries contain decode and encode attributes which form the bodies of procedures that rewrite external data to DistKV values and vice versa, respectively, using the value parameter as input. The decode procedure gets an additional prev variable which contains the old value. That value must not be modified; create a copy or (preferably) use distkv.util.combine_dict() to assemble the result.
Codecs may be named hierarchically for convenience; if you want to call the “parent” codec, put the common code in a module and import that.
Codecs also require “in” and “out” attributes, each of which must contain a list of 2-tuples with that conversion’s source value and its result. “in” corresponds to decoding, “out” to encoding – much like Python’s binary codecs.
distkv.util.combine_dict(*d)¶
Returns a dict with all keys+values of all dict arguments. The first found value wins. This recurses if values are dicts.
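For illustration, given that documented behavior (first value wins, recursing into dicts), a call would behave like this (values are made up):

combine_dict({'a': 1, 'sub': {'x': 1}}, {'a': 2, 'b': 3, 'sub': {'y': 2}})
# == {'a': 1, 'b': 3, 'sub': {'x': 1, 'y': 2}}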
Converters¶
While the (None,"map") contains a single mapping, (None,"conv") contains an additional single level of names. A mapping must be applied to a user before it is used. This change is instantaneous, i.e. an existing user does not need to reconnect.
Below that, converter naming works like that for mappings. Of course, the pointing attribute is named codec instead of type.
Putting it all together¶
Given the following data structure, the user “conv” will only be able to write stringified integers under keys below the “inty” key, which will be stored as integers:
null:
  auth:
    _:
      current: _test
    _test:
      user:
        con:
          _:
            _aux:
              conv: foo
        std:
          _:
            _aux: {}
  codec:
    int:
      _:
        decode: assert isinstance(value,str); return int(value)
        encode: return str(value)
        in:
        - [ '1', 1 ]
        - [ '2', 2 ]
        - [ '3', 3 ]
        out:
        - [ 1, '1' ]
        - [ 2, '2' ]
        - [ -3, '-3' ]
  conv:
    foo:
      inty:
        '#':
          _:
            codec:
            - int
inty:
  _: hello
  ten:
    _: 10
  yep:
    yepyepyep:
      _: 13
    yep:
      _: 99
The above is the content at the end of the testcase tests/test_feature_convert.py::test_71_basic.