patternMinor
Measuring availability from CAP theorem
Problem
I wonder how to check the availability property from the CAP theorem.
A consistency check is conceptually easy: if the same query, issued at the same time, ever returns different results, then the system may be considered inconsistent (or, under some circumstances, eventually consistent).
But what about availability? If the system takes 1 millisecond to respond, it should probably be considered available. If it takes 1 second for a huge query, it is probably also available. But what about 50 seconds for a simple query? Where is the threshold: how long must the system take to respond before we consider it unavailable?
One possible threshold is to consider the system unavailable only if it throws an error immediately or doesn't respond at all. But with such a definition we may consider available any system that makes the user wait a long time until all inconsistencies and partitions are resolved.
So my questions are:
1) How do we check whether a system satisfies the availability property as defined in the CAP theorem?
2) How do we distinguish the following cases:
- The system is available, but it may take 10 minutes to process some huge query.
- The system is actually not available, but there is a wrapper (which the user calls instead of the original system) that, whenever the original system is unavailable, simply waits until it becomes available, so to the outside user it looks as if the system is available and is simply processing the query.
Solution
This is indeed a concern for those building real-world applications: how does one measure "availability", meaning not the binary property discussed in the CAP theorem but the experience of the system's users?
There is an industry convention for this, Apdex (Application Performance Index): a standardized way of measuring user-perceived availability that can be applied to any system that serves requests. (Note: as stated in the comments, this is a separate concept from the theoretical notion of availability, which is established by analyzing an algorithm, or perhaps a simulation of a system.)
The Apdex method assumes you have a definition of "satisfied service" at some target response time. Let's say you consider a sub-1-second response time satisfactory, and use that to define "available." Then, by convention, "tolerable" service is anything up to 4 times the target time: in our case, a response time from 1 to 4 seconds. Requests with response times longer than 4 seconds are considered "frustrated."
Then, the Apdex score is the number of satisfied requests, plus half the number of tolerable requests, plus none of the frustrated requests (i.e. 0), all divided by the total number of requests.
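Written as a formula, the definition above reads as follows. The symbols are only notation introduced here for illustration: $N_{\text{sat}}$, $N_{\text{tol}}$ and $N_{\text{total}}$ stand for the counts of satisfied, tolerable and all requests in the measurement period.

$$
\text{Apdex} = \frac{N_{\text{sat}} + \tfrac{1}{2}\,N_{\text{tol}}}{N_{\text{total}}}
$$

Frustrated requests appear only in the denominator, which is why every one of them pulls the score down.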
For example, if you had 100 total requests, and 30 of them were satisfied (<1s), 50 were tolerable (>1s but <4s), and 20 were frustrated (>4s), the Apdex score for your service for the period of time in which those requests were serviced would be (30+25+0)/100, or 0.55. (An Apdex score, by virtue of being a ratio, is always between 0 and 1.)
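A minimal Python sketch of this calculation, under a few assumptions: the function name `apdex_score` and the sample data are made up for illustration, the 1-second target is the example value used in this answer, and a response taking exactly the target time is counted as satisfied.

```python
def apdex_score(response_times, target=1.0):
    """Return the Apdex score for a list of response times in seconds.

    `target` is the "satisfied" threshold; by convention anything up to
    4 * target counts as tolerable and anything slower as frustrated.
    """
    if not response_times:
        raise ValueError("need at least one request to compute a score")
    satisfied = sum(1 for t in response_times if t <= target)
    tolerable = sum(1 for t in response_times if target < t <= 4 * target)
    # Frustrated requests add nothing to the numerator but still count
    # in the total, so they lower the score.
    return (satisfied + tolerable / 2) / len(response_times)


# Hypothetical sample: 30 satisfied, 50 tolerable, 20 frustrated requests.
samples = [0.5] * 30 + [2.0] * 50 + [6.0] * 20
print(apdex_score(samples))  # 0.55
```

One possible design choice, not something the method as described here prescribes: a request that never returned could be recorded as `float("inf")`, so that it falls into the frustrated bucket instead of being silently dropped from the total.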
Fun fact: a bunch of businesses then teamed up to turn this quite simple idea into a fully-fledged business. If you are a business, you can pay the Apdex Alliance money so they can produce education, webinars, and blogs, as well as certify compliant businesses so they may use the Apdex name and logo(!)
(I am not affiliated with the Alliance. I just like standards.)
Context
StackExchange Computer Science Q#68079, answer score: 4