patternModerate

Peer grading design - choosing a graph, to get accurate rankings/ratings

Submitted by: @import:stackexchange-cs·Mar 10, 2026·

Viewed 0 times

graphratingsgradingdesignpeerrankingsgetaccuratechoosing

Problem

Background. I am writing some code for semi-automated grading, using peer grading as part of the grading process. Students are given pairs of essays at a time, and the students have a slider to choose which is better and how much better it is. e.g., the slider might look something like this:

A---X-B

Based on the results of the peer grading, essays are ranked and the teacher will then grade the top X% and bottom X% and scores for all essays will be automatically calculated based on this. I have already come up with methods for doing this ranking/scoring process; that part works well.

My question. How should I select which pairs of essays to give to students?

Simulations suggest we need an essay to be peer-graded at least 3 times, to get an accurate ranking. Thus, each essay should appear in at least 3 of the pairs that are presented for peer grading.

We can think of this as a graph problem. Think of the essays as nodes. Each edge represents a pair of essays that are presented during the peer grading process. The accuracy results above suggest that the degree of each node (or of most nodes) should be at least 3. What sort of graph should I use? How should I generate the graph to be used during peer grading?

One challenge is that if you have clusters in the graph, this will skew the peer-gradings. For example, we wouldn't want to have high-quality essays peer-graded mostly against high-quality essays, because that would skew the results of the peer grading.

What would you recommend?

I think this problem could be modelled with a undirected graph using something like the following:

Start by taking the node with the least degree and link it with the next least

Continue until your average degree is at least 3

Maximise node connectivity

Minimise number of cliques

Is this a good approach? If not what would you recommend instead?

Solution

There are two parts to this: (a) selecting a graph (experimental design) to determine which pairs of essays the students will evaluate in the peer grading process, and (b) ranking all the essays, based upon the student's peer grades, to determine which the teacher should rank. I will suggest some methods for each.

Choosing a graph

Problem statement. The first step is to generate a graph. In other words, you need to select which pairs of essays to show to the students, during the peer grading exercise.

Suggested solution. For this task, I suggest that you generate a random graph $G$, selected uniformly at random from the set of all 3-regular (simple) graphs.

Justification and details. It is known that that a random $d$-regular graph is a good expander. In fact, the regular graphs have asymptotically optimal expansion factor. Also, because the graph is random, this should eliminate the risk of skewing the grading. By selecting a graph uniformly at random, you are ensuring that your approach is equally fair to all students. I suspect that a uniformly random 3-regular graph will be optimal for your purposes.

This raises the question: how do we select a 3-regular (simple) graph on $n$ vertices, uniformly at random?

Fortunately, there are known algorithms for doing this. Basically, you do the following:

-
Create $3n$ points. You can think of this as 3 copies of each of the $n$ vertices. Generate, uniformly at random, a random perfect matching on these $3n$ points. (In other words, repeat the following procedure until all $3n$ points are paired off: select any unpaired point, and pair it with another point chosen uniformly at random from the set of unpaired points.)

-
For each two points that are matched by the matching, draw an edge between the corresponding vertices (that they are a copy of). This gives you a graph on $n$ vertices.

-
Next, test if the resulting graph is simple (i.e., it has no self-loops and no repeated edges). If it is not simple, discard the graph and go back to step 1. If it is simple, you are done; output this graph.

It is known that this procedure generates a uniform distribution on the set of 3-regular (simple) graphs. Also, it is known that at step 3 you have a constant probability of accepting the resulting graph, so on average the algorithm will do $O(1)$ trials -- so this is pretty efficient (e.g., polynomial running time).

I have seen this approach credited to Bollobas, Bender, and Canfield. The approach is also summarized briefly on Wikipedia. You can also find a discussion on this blog post.

Technically speaking, this requires that the number $n$ be even (otherwise there is no 3-regular graph on $n$ vertices). However, this is easy to deal with. For instance, if $n$ is odd, you can randomly choose one essay, set it aside, generate a random 3-regular graph on the remaining essays, then add 3 more edges from the set-aside essay to 3 randomly chosen other essays. (This means that there will be 3 essays that are actually graded 4 times, but that shouldn't do any harm.)

Ranking all the essays

Problem statement. OK, so now you have a graph, and you have presented these pairs of essays (as indicated by the edges in the graph) to the students for them to grade during the peer grading exercise. You have the results of each comparison of essays. Now your task is to infer a linear ranking on all of the essays, to help you determine which ones to have the teacher evaluate.

Solution. I suggested you use the Bradley-Terry model. It is a mathematical approach that solves exactly this problem. It was designed for ranking players in some sport, based upon the results of matches between some pairs of the players. It assumes that each player has an (unknown) strength, which can be quantified as a real number, and the probability that Alice beats Bob is determined by some smooth function of the difference of their strengths. Then, given the pairwise win/loss records, it estimates the strength of each player.

This should be perfect for you. You can treat each essay as a player. Each comparison between two essays (during the peer grading process) is like the result of a match between them. The Bradley-Terry model will allow you to take all of that data, and infer a strength for each essay, where higher strengths correspond to better essays. Now you can use those strengths to rank-order all of the essays.

Details and discussion. In fact, the Bradley-Terry model is even better than what you asked for. You asked for a linear ranking, but the Bradley-Terry model actually gives a (real-number) rating to each essay. This means you know not only whether essay $i$ is stronger than essay $j$, but a rough estimate of how much stronger it is. For instance, you could use this to inform your selection of which essays to rank.

There are alternative ways to infer ratings or rankings for all the essays, given the data you have. For instance, the Elo method is ano

Context

StackExchange Computer Science Q#18493, answer score: 10

Revisions (0)

No revisions yet.