Benoit Perroud : concepts and cloud ready applications: scalability

jeudi, 6 janvier 2011

Eventually Consistency demystified

In my crusade into the NoSQL world, Eventually Consistency is everywhere. I want to demystify this property a little bit here.

But let's begin with an example to have the same base for the discussion :

Let "Node1", "Node2" and "Node3" be three nodes (servers) that are part of our distributed datastore.
Let "User A", "User B", "User c" be three users wanting to read and write data in our fictive distributed datastore.

At time (1), "User A" write the value "A" to "Node1". "Node1" will replicate asynchronously this value to both "Node2" and "Node3" (specific to my example).
At time (2) the write call of "Node A" returns. But the replication of value "A" hasn't been completely propagate to "Node2" and "Node3".
At time (3), "User B" and "User C" will read value "A" from "Node1" and "Node2" respectively. "User B" got the latest value (because it reads the node which initiate the update), "User C" will read either the old or the new version of "A", but without any guarantee regarding what it will read.

In a future time (5), "User B" and "User C" re-read value "A" and then got the same value. At this point of time, the datastore is consistent.

Immediate Consistency

In a Immediate Consistency, opposing Eventually Consistency, the write call from "User A" should wait till the replication is done on other nodes before returning, and replica nodes ("Node2" and "Node3") should be synchronized to expose the new value at the same time.

Moreover, if "Node1" is unable to talk to "Node2", the write replication will probably fail then the write call from "User A" will fail.

As we can notice, Immediate Consistency is hard to scale (see two-phase commit or paxos algorithm), because it increases the latency of the writes and makes the system not redundant to failure.

Trade-off for scaling writes

Eventually Consistency is then a trade-off for scaling writes that seems reasonable in certain use-cases.

mercredi, 30 décembre 2009

Sequences scalability

Sequences are central part of most computer systems. They are used both in technical part (generate unique identifier for object) and in business part (count the remaining free tickets which can be sold).

When the load on the application goes higher, sequences become quickly central bottlenecks of systems. So the goal here is to first categorize different types of sequences and then gives some keys to see how sequences be distributed to increase their scalability, and hence the global scalability of the whole system.

Characteristics of sequences :

Order : the number generate should be used in a given order (ascendant or descendant), or the order do not matter. I will talk about ordered or unordered sequence.

Missing values : some values can be missing in the sequence. For example the sequence miss the value 4 should not alter the behavior of the application. I will talk about continued or discontinued sequence.

Finite sequence : all the values of the sequence are know and countable. Countable sequences are particular case of finite sequence where the sequence has a reasonable amount of values, and thus could be represented by one object per values. I will talk about finite or infinite sequence, and countable sequence.

Key to distribute sequences

Discontinued sequences are easy to scale : the application can pick numbers n by n in the sequence, reducing the amount of calls to the sequence by n. Good example for of discontinued sequences are objects identifier generators.

Unordered sequences are also easy to scale : a counter can be decomposed in n subcounter, picking at random one of the subcounter to increment or decrement will reduce the contention on one counter by n. A typical example is a quantity of object to sell. N subcounters are initialized with the sum is the total amount to sell. The object cannot be sold anymore when every subcounters are at 0.

Countable sequences can be scaled by representing every value of the sequence as a single object in the database. Updating an unused object is a easy task.

Finaly infinite, continued ordered sequence are very hard to scale because synchronization between all callers is required, so a single point of contention is mandatory.

Conclusion

Sequences are choke point in most systems, but as we saw above they can scale relatively simply with two simple tricks : read number n by n instead of 1 by 1, or decompose a counter in n subcounters.

Benoit Perroud : concepts and cloud ready applications

jeudi, 6 janvier 2011

Eventually Consistency demystified

mercredi, 30 décembre 2009

Sequences scalability

Qui êtes-vous ?

RSS Feed(burner)

Private links

Archives du blog

Tags