Affichage des articles dont le libellé est concurrent. Afficher tous les articles
Affichage des articles dont le libellé est concurrent. Afficher tous les articles

jeudi, 6 janvier 2011

Eventually Consistency demystified

In my crusade into the NoSQL world, Eventually Consistency is everywhere. I want to demystify this property a little bit here.

But let's begin with an example to have the same base for the discussion :

  • Let "Node1", "Node2" and "Node3" be three nodes (servers) that are part of our distributed datastore.
  • Let "User A", "User B", "User c" be three users wanting to read and write data in our fictive distributed datastore.
At time (1), "User A" write the value "A" to "Node1". "Node1" will replicate asynchronously this value to both "Node2" and "Node3" (specific to my example).
At time (2) the write call of "Node A" returns. But the replication of value "A" hasn't been completely propagate to "Node2" and "Node3".
At time (3), "User B" and "User C" will read value "A" from "Node1" and "Node2" respectively. "User B" got the latest value (because it reads the node which initiate the update), "User C" will read either the old or the new version of "A", but without any guarantee regarding what it will read.



In a future time (5), "User B" and "User C" re-read value "A" and then got the same value. At this point of time, the datastore is consistent.

Immediate Consistency

In a Immediate Consistency, opposing Eventually Consistency, the write call from "User A" should wait till the replication is done on other nodes before returning, and replica nodes ("Node2" and "Node3") should be synchronized to expose the new value at the same time.

Moreover, if "Node1" is unable to talk to "Node2", the write replication will probably fail then the write call from "User A" will fail.

As we can notice, Immediate Consistency is hard to scale (see two-phase commit or paxos algorithm), because it increases the latency of the writes and makes the system not redundant to failure.

Trade-off for scaling writes

Eventually Consistency is then a trade-off for scaling writes that seems reasonable in certain use-cases.

dimanche, 25 avril 2010

Implementing unique constraints with Cassandra

I was talking about my "Doodle clone with Cassandra backend" in a previous post, I want to explain a little bit how I implement 2 kinds of constraints with Cassandra.

Unique name constrain

When subscribing to a poll, a unique constraint on the name of the subscriber is applied. There is a first check in Javascript in the GUI, but there is also a check of the uniqueness of the names on the server side.

Before going deeper, I just need to say a word about the structure used :

To store the subscribers, I use a SuperColumnFamily, with the ID of the poll as key. For a poll having key "1234", it gives me something like :

"1234" => {
"subscriberKey1" => { name = "subscriberName1", options = ... }
"subscriberKey2" => { name = "subscriberName2", options = ... }
...
}
The trick about the subscriberKey* is to generate a unique ID based on a timestamp in order to be sorted by Cassandra.
The key will be something like : currentTimestamp - ClusterNodeId - ThreadId. Adding the ThreadId into the key give me the guarantee that using the System.currentTimeMillis() will be unique, because the processing will unfortunately take more than a millisecond :)

Using this key, I will write the subscriber row with ConsistencyLevel.QUORUM, and then I will read all the rows till the one I have inserted (with ConsistencyLevel.QUORUM again).
At this point I use Cassandra magic (rows are ordered) to be sure that I have selected all the subscribers they subscribe before the current one.
I remains me to check manually the uniqueness of the name. If another row has the same name, I will simply remove the current subscriber.

Limiting the number of subscribers per option

Another constraint I have implemented is the limitation of the number of subscribers per option.

The principle is the same as the unique name constraint : I count the number of previously inserted subscriber.
One point to keep in mind is that we can read a row with a duplicated name, which will be deleted later so it should not be taken into account for the actual constraint. This mean that we need to check if we are not counting duplicated names in this constraint too...

But the good news about this constraint is that it show a way to implement counters in Cassandra. Maybe I will write another post about this topic. The bad news is that we need to check the duplicated names in several places.

Limitation of the model

Complexity : adding more constraints grows the complexity in O(n^2), as every new constraints (duplicated names) should be implemented into every other constraints (limitation of subscribers per option).

Scalability : this model will scale very well for billions of polls with some tens of users. The inverse, some tens of polls with billions of users, needs a lot of resources for checking constraint, which will make the performance drop.

mercredi, 30 décembre 2009

Sequences scalability

Sequences are central part of most computer systems. They are used both in technical part (generate unique identifier for object) and in business part (count the remaining free tickets which can be sold).

When the load on the application goes higher, sequences become quickly central bottlenecks of systems. So the goal here is to first categorize different types of sequences and then gives some keys to see how sequences be distributed to increase their scalability, and hence the global scalability of the whole system.

Characteristics of sequences :

Order : the number generate should be used in a given order (ascendant or descendant), or the order do not matter. I will talk about ordered or unordered sequence.

Missing values : some values can be missing in the sequence. For example the sequence miss the value 4 should not alter the behavior of the application. I will talk about continued or discontinued sequence.

Finite sequence : all the values of the sequence are know and countable. Countable sequences are particular case of finite sequence where the sequence has a reasonable amount of values, and thus could be represented by one object per values. I will talk about finite or infinite sequence, and countable sequence.

Key to distribute sequences

Discontinued sequences are easy to scale : the application can pick numbers n by n in the sequence, reducing the amount of calls to the sequence by n. Good example for of discontinued sequences are objects identifier generators.

Unordered sequences are also easy to scale : a counter can be decomposed in n subcounter, picking at random one of the subcounter to increment or decrement will reduce the contention on one counter by n. A typical example is a quantity of object to sell. N subcounters are initialized with the sum is the total amount to sell. The object cannot be sold anymore when every subcounters are at 0.

Countable sequences can be scaled by representing every value of the sequence as a single object in the database. Updating an unused object is a easy task.

Finaly infinite, continued ordered sequence are very hard to scale because synchronization between all callers is required, so a single point of contention is mandatory.

Conclusion

Sequences are choke point in most systems, but as we saw above they can scale relatively simply with two simple tricks : read number n by n instead of 1 by 1, or decompose a counter in n subcounters.

dimanche, 15 novembre 2009

Call different services, use the first to reply

The needs


My needs were the following : I have an arbitrary amount of webservices to call, but I don't know, for a particular query, which one will return the answer. More than that, some are slower and others are faster, but again, I don't know which is the fastest for my particular query.

So what I will do is to parallelize the queries, and use the value of the first which return my a good value.

Use an Executor

The first component to implement such a requirement is an Executor.
The Executor will run all my queries, in as many threads as I want, permitting to fine tune the behavior of the system.
As many queries will input the system and will be translated in many more threads, it is important to use a limited amount of threads to not crash the system.

Synchronization on result

A specific object will be used to synchronize the result : the first result will be returned to the caller, letting the running threads die, and no more threads will be run for this query.

Some basic primitive for concurrent programming (java.util.concurrent) will be used, such as CountDownLatch : the caller will block on the latch, and once a thread will find the answer, it will set the value into the result and count down the latch. The caller will be freed and the value will be available.

Watchdog if all the queries fail

One point to not forget is if every queries fail, the caller should be informed of this fail.
A watchdog will be used : it will block till all the threads executing the query will finish, and will then be run. Its only task is to free the caller. This one will continue the processing, but the result will not be available. This will be the responsibility of the caller to run user code to handle this special case.

Strong requirements

The requirements are the following :
  • Use as little synchronization as possible
  • Use as much concurrent primitives (java.util.concurrent) as possible
  • Do it as generic as possible
Java implementation

The interesting part of the implementation is given here below.
The Executor is simply an java.util.concurrent.Executor. I'm not good enough to implement a better solution than they do.
The result object is composed of the following attributes :
public class FutureResult {
private T result;
private boolean done;
private AtomicInteger threadCount = new AtomicInteger(0);
private CountDownLatch latch = new CountDownLatch(1);
}
A generic type for the value, a AtomicInteger to count how many other threads are working on this result, and a latch to make the caller wait.

The watchdog class will simply wait till all the threads are finish and free the caller if it is not already the case :
class WatchDogRunnable implements Runnable {
public void run() {
while (result.hasThreadsWorking()) {
synchronized (result) {
result.wait(2000);
}
}
latch.countDown();
}
}
And finally an abstract callable worker which can be easily extended to run the wanted tasks. The call function will increment the AtomicInteger, do the work if it is not already done, if a result is found, free the caller, decrement the AtomicInteger and return the value if it is set. Not that the worker implementation will implement the callInternal function.
public abstract class AbstractCallableWorker
implements Callable {
private FutureResult result;
public final T call() {
result.setOneMoreThread();
if (!result.isDone()) {
T callResult = callInternal();
if (callResult != null) {
setResult(callResult);
latch.countDown();
}
}
result.setOneLessWorkingThead();
return result.getResult();
}
protected abstract T callInternal();
}
Everything put together, you can implement as much worker as you want and call as many webservice as you want, with the only guaranty to get the caller continue with the fastest service to reply !