HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Complex query to count votes with a redistribution system

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
redistributionwithquerysystemcountcomplexvotes

Problem

I've spent the last couple of evenings trying to build a system for redistribution of votes in a ranked voting system. I've finally come up with the model for redistributing surpluses from a candidate whose number of votes exceeds the threshold.

I'm at the beginning of my learning curve and I've used Stack Overflow extensively during this process to gain knowledge about how to produce each and every query within this system.

Finally I've reached the result I want as can be seen below on the expense of a really confusing set of queries:

+-----------+-------+----------------+----------------------------+
 | CANDIDATE | VOTES | REDISTRIBUTION | VOTES_AFTER_REDISTRIBUTION |
 +-----------+-------+----------------+----------------------------+
 |         1 |     8 |             -1 |                          7 |
 |         2 |     1 |          0.125 |                      1.125 |
 |         3 |     2 |           0.25 |                       2.25 |
 |         4 |     4 |            0.5 |                        4.5 |
 |         5 |     2 |           0.25 |                       2.25 |
 |         6 |     3 |              0 |                          3 |
 +-----------+-------+----------------+----------------------------+


As mentioned - these queries are horrible. Absolutely and utterly HORRIBLE. Now I need to understand how this could be simplified and at the same time as effective or even more effective than this mess of queries.

Could anyone explain how to optimize this in a way that I can learn more about writing good queries?

This is my (awful) query:

```
SELECT vote_candidate candidate, original_votes votes, surplus_redistribution redistribution, (original_votes + surplus_redistribution) votes_after_redistribution
FROM (
SELECT c.vote_candidate, (
SELECT (
(MAX(votes_above_the_threshold) - (
SELECT FLOOR((COUNT(*) / (2 + 1)) + 1) threshold
FROM votes
)) / MAX(votes_above_the_threshold)

Solution

Your query is simultaneously awful and awesome. It is truly impressive, especially if you are self-taught.

Unfortunately, I don't believe it is correct. (Or, I may have misunderstood how you intended for the vote redistribution scheme to work.) For example, it reports that the redistribution in favour of Candidate 2 should be +0.125. Of the 8 ballots that have Candidate 1 as the first choice, 4 of them have Candidate 2 as the second choice. Shouldn't the redistribution be +0.5 then?

As further evidence, the redistribution column should sum to 0. However, -1 + 0.125 + 0.25 + 0.5 + 0.25 + 0 = 0.125, which isn't neutral.

Schema

I would have changed the terminology of your votes table as

CREATE TABLE ballots
( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, choice1 INT
, choice2 INT
, choice3 INT
, choice4 INT
, choice5 INT
, choice6 INT
);


However, it may have been better to do away with that table altogether, and go straight to the vote_orders table, because it's a bad idea to design your schema to support a fixed number of candidates. (As the saying goes, there are only three numbers in computer science: zero, one, and many. You want to support many.) Consider the ballots table to be more of a data entry tool (possibly a TEMPORARY TABLE) than a storage format.

I would have used the name "votes" for what you called the vote_orders table:

CREATE TABLE votes
( ballot INT NOT NULL
, rank INT NOT NULL
, candidate INT NOT NULL
, UNIQUE (ballot, rank)
, UNIQUE (ballot, candidate)
);


For the rest of the review, I'll stick with your original names, so as to avoid confusion.

Choice of RDBMS

For a complex query like this, I recommend using a more capable database than MySQL. For example, PostgreSQL (and others) offers several advantages:

-
Support for Common Table Expressions (WITH clauses), which help tremendously with managing the complexity. MySQL does not support CTEs. As a workaround, you can create views instead. However, it's harder to edit all of your views at once as easily as a single query with CTEs, so development is more cumbersome. Also, CTEs don't pollute the global namespace, and it's easier to see how subsequent CTEs depend on earlier CTEs.

I've written my solution below using CTEs, but it's trivial to rewrite each CTE as a CREATE VIEW statement instead, and I've verified that it does work with MySQL that way.

-
Support for recursive queries. Using recursive CTEs, you should be able to write one query that processes multiple rounds of vote transfers. With MySQL, your alternative would be to write a stored procedure that manipulates a temporary table, which is less elegant.

-
Stricter interpretation of SQL. I often find that MySQL adheres loosely to the SQL standard, and makes sense of queries that shouldn't make sense. For example, you have some rather suspect SQL such as

WHERE votes_above_the_threshold = (
    SELECT MAX(votes_above_the_threshold)
    FROM vote_orders
)


that PostgreSQL would have rejected (see discussion about this excerpt below). You'll end up acquiring better habits developing on PostgreSQL than MySQL.

Overall impression

Unfortunately, your query is too complex for me to try to understand, especially since it yields wrong results. However, I can say that you did some things well:

  • Indentation is consistent and logical (though using subqueries to select attributes results in messy indentation; see below).



  • Keywords are capitalized according to common practice.



  • Column names are pretty clear and meaningful.



Some problems are:

  • Repetition, such as the threshold calculation.



  • Cryptic table aliases, such as t, c, and y.



  • Many of your subqueries are to select an attribute (SELECT (SELECT …) AS attribute). Try to formulate your subselects to appear in the FROM clause (SELECT subsel.attr FROM (SELECT attr FROM …)), which will make your indentation saner and the query generally more readable.



-
Convoluted reasoning that looks almost circular. For example, in the first two dozen lines…

SELECT c.vote_candidate, (
    SELECT (
        (MAX(votes_above_the_threshold) - (
            SELECT FLOOR((COUNT(*) / (2 + 1)) + 1) threshold
            FROM votes
        )) / MAX(votes_above_the_threshold)
    ) ratio
    FROM (
        SELECT vote_candidate vote_candidate, COUNT(*) votes_above_the_threshold
        FROM vote_orders
        WHERE vote_order = 1
        GROUP BY vote_candidate
        HAVING votes_above_the_threshold >= (
            SELECT FLOOR((COUNT(*) / (2 + 1)) + 1) threshold
            FROM votes
        )
    ) t
    WHERE votes_above_the_threshold = (
        SELECT MAX(votes_above_the_threshold)
        FROM vote_orders
    )
) surplus_ratio,


votes_above_the_threshold is defined as a column of t. t is filtered through a HAVING clause. It is further filtered by WHERE votes_about_the_threshold = (SELECT MAX(votes_above_the_threshold) FROM vote_orders), b

Code Snippets

CREATE TABLE ballots
( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
, choice1 INT
, choice2 INT
, choice3 INT
, choice4 INT
, choice5 INT
, choice6 INT
);
CREATE TABLE votes
( ballot INT NOT NULL
, rank INT NOT NULL
, candidate INT NOT NULL
, UNIQUE (ballot, rank)
, UNIQUE (ballot, candidate)
);
WHERE votes_above_the_threshold = (
    SELECT MAX(votes_above_the_threshold)
    FROM vote_orders
)
SELECT c.vote_candidate, (
    SELECT (
        (MAX(votes_above_the_threshold) - (
            SELECT FLOOR((COUNT(*) / (2 + 1)) + 1) threshold
            FROM votes
        )) / MAX(votes_above_the_threshold)
    ) ratio
    FROM (
        SELECT vote_candidate vote_candidate, COUNT(*) votes_above_the_threshold
        FROM vote_orders
        WHERE vote_order = 1
        GROUP BY vote_candidate
        HAVING votes_above_the_threshold >= (
            SELECT FLOOR((COUNT(*) / (2 + 1)) + 1) threshold
            FROM votes
        )
    ) t
    WHERE votes_above_the_threshold = (
        SELECT MAX(votes_above_the_threshold)
        FROM vote_orders
    )
) surplus_ratio,
WITH quota AS (
    SELECT FLOOR(COUNT(*) / (2 + 1)) + 1 AS threshold
        FROM votes
), preferences AS (
    SELECT 1 AS round
         , NULL AS preferred_candidate
         , vote_candidate AS candidate
         , COUNT(*) AS votes
         , CAST(NULL AS INTEGER) AS elected_in_round
        FROM vote_orders
        WHERE vote_order = 1
        GROUP BY vote_candidate
  UNION ALL
    SELECT fallback.vote_order AS round
         , preferred.vote_candidate AS preferred_candidate
         , fallback.vote_candidate AS candidate
         , COUNT(*) AS votes
         , NULL AS elected_in_round
        FROM vote_orders AS preferred
            JOIN vote_orders AS fallback
                ON preferred.vote_id = fallback.vote_id
                AND preferred.vote_order + 1 = fallback.vote_order
        GROUP BY fallback.vote_order, preferred.vote_candidate, fallback.vote_candidate
), electees AS (
    SELECT preferences.round
         , preferences.preferred_candidate
         , preferences.candidate
         , preferences.votes
         , round AS elected_in_round
         , votes - quota.threshold AS surplus
        FROM preferences, quota
        WHERE votes >= quota.threshold
            AND round = (SELECT MIN(round) FROM preferences)
), per_electee_redistribution AS (
    SELECT this_round.round AS round
         , this_round.candidate AS preferred_candidate
         , next_round.candidate AS candidate
         , this_round.votes AS preferred_votes
         , this_round.surplus AS surplus
         , next_round.votes AS fallback_votes
         , this_round.surplus * next_round.votes / (this_round.votes + 0.0) AS redistribution
        FROM electees AS this_round
            LEFT OUTER JOIN preferences AS next_round
                ON this_round.round + 1 = next_round.round
                AND this_round.candidate = next_round.preferred_candidate
        WHERE this_round.elected_in_round = this_round.round
), redistribution AS (
    SELECT round
         , candidate AS beneficiary
         , SUM(redistribution) AS redistribution
        FROM per_electee_redistribution
        GROUP BY round, candidate
  UNION ALL
    SELECT round
         , candidate AS beneficiary
         , -surplus
         FROM electees
         WHERE elected_in_round = round
), redistributed AS (
    SELECT preferences.round AS round
         , preferences.preferred_candidate
         , preferences.candidate
         , preferences.votes
         , COALESCE(redistribution.redistribution, 0) AS redistribution
         , preferences.votes + COALESCE(redistribution.redistribution, 0) AS votes_after_redistribution
        FROM preferences
            LEFT OUTER JOIN redistribution
                ON preferences.round = redistribution.round
                AND preferences.candidate = redistribution.beneficiary
), raw_next_round AS (
    SELECT round
         , NULL AS preferred_candidate
         , candidate
         , SUM(votes) AS votes
         , NULL AS elected_in_round
     

Context

StackExchange Code Review Q#54099, answer score: 19

Revisions (0)

No revisions yet.