patternjavaMinor
Scalability issues, storing scores for users on a particular level for later retrieval
Viewed 0 times
scalabilitylaterissueslevelforscoresparticularusersstoringretrieval
Problem
I recently handed in a test assignment for a job application, which I was declined, with the motivation that it had scalability issues.
The test was to setup a REST-like backend for a hypothetical game, with three methods:
No persistence was required.
The session and user handling was done using service layers, aptly called
The session service did not assume any invalidation of sessions previously associated with the given user (and no requirements suggested anything to that effect) but simply
The user service was similar to the session service. The exception being that it used
Finally the
```
import java.util.*;
import java.util.concurrent.ConcurrentHash
The test was to setup a REST-like backend for a hypothetical game, with three methods:
- create session with a given user id (implicitly creating the user if it did not exist).
- register scores for a given user (through a session created earlier) and a given level. Each score had to be stored, not just the top score.
- retrieve a high score listing for a given level (max fifteen entries). If a single user had more than one of the top fifteen scores for a level, then only the highest score for that user would count.
No persistence was required.
The session and user handling was done using service layers, aptly called
SessionService and UserService. Both services used concurrent hashmaps to store information.The session service did not assume any invalidation of sessions previously associated with the given user (and no requirements suggested anything to that effect) but simply
put an entry into the concurrent hashmap. The entry key was the external representation and the value was an internal representation which contained the user id, thus on validation, no information was retrieved from the external representation.The user service was similar to the session service. The exception being that it used
putIfAbsent rather than just put, I don't see issues here either, but include the relevant code just in case:User user = users.get(userId);
if (null == user) {
long creationTime = timeProvider.getCurrentTimeMillis();
user = new User(userId, creationTime);
User previouslyStoredUser = users.putIfAbsent(userId, user);
if (null != previouslyStoredUser) {
user = previouslyStoredUser;
}
}Finally the
ScoreService in which I suspect the scalability issues are most likely to have occurred and include in its entirety:```
import java.util.*;
import java.util.concurrent.ConcurrentHash
Solution
This looks like a dreadful scalability problem:
You are sorting the list of scores first, and constraining it to the number you need second. This means that you are sorting a bunch of entries that you subsequently throw away. A good sort algorithm is
Since you only need to keep 15 scores in the end, you should be looking for a scalable data structure that can keep track of the 15 scores you need at any given time. Ideally, it would give you a way to compare the score you are looking at with the 15th best you have seen so far.
In other words, a
It might be better to implement the priority heap yourself - if you were to store the high scores in an array, and apply the heap properties to it by hand, then replacing the low score with another would save a siftUp operation that you don't need. That said, it's more complicated code to address a hypothetical additional performance concern. One possible compromise would be to write an interface that shows how you really want to interact with the heap, an implementation that relies on PriorityQueue, and then a comment to the effect that hand rolling the heap would be a possibility if the PriorityQueue were too slow.
Additionally, your design could have been improved by recognizing that the scores for the individual users could also be stored in a heap. This is hidden in the problem statement - you are expected to store all of a users scores, but you only ever need to retrieve the best score. Maintaining a heap for each user makes saving a new score slower (because the heap properties need to be maintained), but makes creating the high score list much faster (you only need the best score for each user - that
Or better still - instead of a heap, simply maintain the invariant that the list always has the highest score in at index = 0.
Furthermore, the problem statement tells you that when retrieving data, you only need to consider high scores for the requested level. That means that any time spent iterating through scores for other levels is wasted. You can get around this by stratifying the scores by level first, and then breaking them down by user. That is to say, the basic idea for your database should be
Other notes:
You'd normally prefer to avoid creating array lists that you don't need. A better design would use a cache with a CacheLoader, so that the new parts are created only when needed. The Guava library has a lot of Shiny! hidden in it. Well worth a read.
My local coding conventions very sternly frown upon wildcards in import statements.
Your class has two different constructors, which is a bad idea. Having a constructor that can accept any kind of ConcurrentMap you want is good; but if you are also going to provide a default constructor, it should create the map and then pass it to the previous constructor.
Iterating over Map.Entries is good - my preferred style is to then copy the key/value to local variables, rather than using getKey() over and over again.
Bare variable declarations often indicate that there is a method that you haven't written.
looks as though it should be
List highScores = new ArrayList<>(highestScoreForLevelByUserId.values());
Collections.sort(highScores);
return highScores.subList(0, Math.min(MAX_NUMBER_OF_HIGH_SCORE_RESULTS, highScores.size()));You are sorting the list of scores first, and constraining it to the number you need second. This means that you are sorting a bunch of entries that you subsequently throw away. A good sort algorithm is
O(n log n) but that's still a lot for large n.Since you only need to keep 15 scores in the end, you should be looking for a scalable data structure that can keep track of the 15 scores you need at any given time. Ideally, it would give you a way to compare the score you are looking at with the 15th best you have seen so far.
In other words, a
heap. The hands in the air explanation is that the heap property keeps an extreme value at the root, so you only need to do one compare to see if the next value you encounter should replace one of the values you have in the heap. When you find a higher score, you replace the root with the new score, then rebalance the ordering to restore the heap property. Once you have iterated through all the scores, you can either sort the scores in the heap, or pop them out of it as you prefer.java.util.PriorityQueue is an unbounded queue based on a priority heap, with the lowest entry at the root. Therefore, you can add() the first 15 high scores you find into it. After that, you can use peek() to compare your next score with the lowest score you have stored. If the new score is higher, poll() to remove the old value, and add() the new value.It might be better to implement the priority heap yourself - if you were to store the high scores in an array, and apply the heap properties to it by hand, then replacing the low score with another would save a siftUp operation that you don't need. That said, it's more complicated code to address a hypothetical additional performance concern. One possible compromise would be to write an interface that shows how you really want to interact with the heap, an implementation that relies on PriorityQueue, and then a comment to the effect that hand rolling the heap would be a possibility if the PriorityQueue were too slow.
Additionally, your design could have been improved by recognizing that the scores for the individual users could also be stored in a heap. This is hidden in the problem statement - you are expected to store all of a users scores, but you only ever need to retrieve the best score. Maintaining a heap for each user makes saving a new score slower (because the heap properties need to be maintained), but makes creating the high score list much faster (you only need the best score for each user - that
O(1) in a heap).Or better still - instead of a heap, simply maintain the invariant that the list always has the highest score in at index = 0.
List.add(int,E) would be one option, but List.set(int,E) is probably better, provided you remember to append the previous best score.Furthermore, the problem statement tells you that when retrieving data, you only need to consider high scores for the requested level. That means that any time spent iterating through scores for other levels is wasted. You can get around this by stratifying the scores by level first, and then breaking them down by user. That is to say, the basic idea for your database should be
Map> database;Other notes:
You'd normally prefer to avoid creating array lists that you don't need. A better design would use a cache with a CacheLoader, so that the new parts are created only when needed. The Guava library has a lot of Shiny! hidden in it. Well worth a read.
My local coding conventions very sternly frown upon wildcards in import statements.
Your class has two different constructors, which is a bad idea. Having a constructor that can accept any kind of ConcurrentMap you want is good; but if you are also going to provide a default constructor, it should create the map and then pass it to the previous constructor.
Iterating over Map.Entries is good - my preferred style is to then copy the key/value to local variables, rather than using getKey() over and over again.
Bare variable declarations often indicate that there is a method that you haven't written.
List userScores;
synchronized (entry.getValue()) {
userScores = new ArrayList<>(entry.getValue());
}looks as though it should be
List userScores = copy(entry.getValue());Code Snippets
List<Score> highScores = new ArrayList<>(highestScoreForLevelByUserId.values());
Collections.sort(highScores);
return highScores.subList(0, Math.min(MAX_NUMBER_OF_HIGH_SCORE_RESULTS, highScores.size()));Map<Level, Map<User, Heap<Score>> database;List<Score> userScores;
synchronized (entry.getValue()) {
userScores = new ArrayList<>(entry.getValue());
}List<Score> userScores = copy(entry.getValue());Context
StackExchange Code Review Q#51842, answer score: 2
Revisions (0)
No revisions yet.