patternsqlModerate
Calculating Lost Reputation
Viewed 0 times
calculatinglostreputation
Problem
A discussion arose not long ago on the 2nd Monitor about how much reputation has been lost due to the reputation caps. There are a number of queries on the SEDE which try to address this:
All of these queries (there are others as well) appear to suffer from the same two flaws... ( ... until now with the query in this question .... )
To understand why the post migration and deletion is important, you have to understand that reputation caps happens on a daily cycle. The number of days that people max their rep is much lower (for non-Skeets) than the number of times things are deleted, etc. There are things that happen on days other than rep-max days, that affect your reputation, and mean that voting records for those things are not available on the Data Explorer. Your user record records your total reputation, but there is no way to sum up all the reputation events and re-create that. Thus, any attempts to figure out how much rep is lost will come up short, or long, and this leads to funny things like negative-lost-reputation.
To understand why the order is significant, consider the following poor user 'Bob':
If the order was different, then it could be lucky 'Bill' instead:
- A Users total Rep ON META if there was no reputation cap
- Percent lost to reputation cap among top 1000 users
- Top-rated users if there was no reputation cap
All of these queries (there are others as well) appear to suffer from the same two flaws... ( ... until now with the query in this question .... )
- the user's actual reputation is affected by things that are complicated, like posts being deleted, migrated on to, and off from the site, the starting rep bonus, and other things.
- that you cannot calculate a person's lost reputation without knowing the order in which the votes happened.
To understand why the post migration and deletion is important, you have to understand that reputation caps happens on a daily cycle. The number of days that people max their rep is much lower (for non-Skeets) than the number of times things are deleted, etc. There are things that happen on days other than rep-max days, that affect your reputation, and mean that voting records for those things are not available on the Data Explorer. Your user record records your total reputation, but there is no way to sum up all the reputation events and re-create that. Thus, any attempts to figure out how much rep is lost will come up short, or long, and this leads to funny things like negative-lost-reputation.
To understand why the order is significant, consider the following poor user 'Bob':
- Lucky day, and ends up at exactly 200 rep from a good answer.
- Someone upvotes again, and he loses 10 rep to the cap.
- Someone downvotes and he loses 2 (198)
- Someone downvotes again, he loses another 2
- Now his rep is 196 and the day ends ... sorry, no mortarboard for you, and you lost 10 rep
If the order was different, then it could be lucky 'Bill' instead:
- Lucky day, and ends up at exactly 2
Solution
Going through this code again, it appears there are a few things that could be improved.
Use #Temp Table for User Selection
Instead of doing a cheap-shot UNION select to get the set of users to process, the right way would be to create a temp table, and then conditionally populate it:
This approach is more verbose, but the intent of the code is clearer and it makes it more maintainable.
Explicit vs. Implicit JOIN
Explicit joins have been available in SQL since the SQL-92 standard. The code in this question uses the coding standards from the older SQL-89 standard.
There is nothing technically wrong with the implicit join syntax used in the code, but it is old-fashioned, and using explicit join syntax allows the join conditions in larger queries to be more obvious... changes would be, for example:
would become:
Use #Temp Table for User Selection
Instead of doing a cheap-shot UNION select to get the set of users to process, the right way would be to create a temp table, and then conditionally populate it:
create table #TopUsers (
UserId int not null,
Reputation int not null)
if @userid < 0
begin
insert into #TopUsers
select Users.Id as UserId, Users.Reputation as Reputation
from Users, Badges
where badges.Name = 'Mortarboard'
and Badges.UserId = Users.id
end else begin
insert into #TopUsers
select Id, Reputation
from Users
where Id = @userid
end
declare TOPUSERS cursor for
select *
from #TopUsersThis approach is more verbose, but the intent of the code is clearer and it makes it more maintainable.
Explicit vs. Implicit JOIN
Explicit joins have been available in SQL since the SQL-92 standard. The code in this question uses the coding standards from the older SQL-89 standard.
There is nothing technically wrong with the implicit join syntax used in the code, but it is old-fashioned, and using explicit join syntax allows the join conditions in larger queries to be more obvious... changes would be, for example:
select Users.Id as UserId, Users.Reputation as Reputation
from Users, Badges
where @userid < 0
and badges.Name = 'Mortarboard'
and Badges.UserId = Users.idwould become:
select Users.Id as UserId, Users.Reputation as Reputation
from Users
inner join Badges
on Badges.UserId = Users.id
where @userid < 0
and badges.Name = 'Mortarboard'Code Snippets
create table #TopUsers (
UserId int not null,
Reputation int not null)
if @userid < 0
begin
insert into #TopUsers
select Users.Id as UserId, Users.Reputation as Reputation
from Users, Badges
where badges.Name = 'Mortarboard'
and Badges.UserId = Users.id
end else begin
insert into #TopUsers
select Id, Reputation
from Users
where Id = @userid
end
declare TOPUSERS cursor for
select *
from #TopUsersselect Users.Id as UserId, Users.Reputation as Reputation
from Users, Badges
where @userid < 0
and badges.Name = 'Mortarboard'
and Badges.UserId = Users.idselect Users.Id as UserId, Users.Reputation as Reputation
from Users
inner join Badges
on Badges.UserId = Users.id
where @userid < 0
and badges.Name = 'Mortarboard'Context
StackExchange Code Review Q#40595, answer score: 10
Revisions (0)
No revisions yet.