May 12
Introduction
Mean quadratic weighted kappa is useful for measuring the agreement between a set of raters/scorers (e.g. judges' scores) because it discounts agreement that would occur by chance. For example, two raters scoring at random would still, over a large data set, give matching scores some of the time. The value of kappa lies between -1 and 1: negative values mean less agreement than expected by chance (with -1 being complete disagreement), 0 means agreement no better than chance, and 1 means complete agreement.
Calculation
Definitions
1. Let the possible scores/ratings range from s1 to sN, e.g. s1=3, s2=4, s3=5 (there are 3 possible scores here: 3, 4 or 5, so N=3)
2. Let j1 and j2 be the lists of scores allocated by each judge/scorer, in the same item order, e.g. j1=(3,3,5), j2=(5,5,5) (judge 1 scored 3, then 3, then 5, and judge 2 scored all fives)
Step 1 – Calculate the agreement matrix
The matrix is N by N, where N is the number of possible ratings. Rows correspond to the score given by judge 1 and columns to the score given by judge 2. Each element counts the number of items that received that combination of scores. E.g. let A be the agreement matrix
Let
s1=3, s2=4, s3=5
j1=(3,4,5,4), j2=(5,4,5,4)
Then A is calculated to be:
| 0 | 0 | 1 |
| 0 | 2 | 0 |
| 0 | 0 | 1 |
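The counting in Step 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name agreement_matrix and the min_score/max_score parameters are assumptions made for this sketch, and it assumes the possible scores are consecutive integers.

```python
def agreement_matrix(j1, j2, min_score, max_score):
    """Count how often each (judge 1 score, judge 2 score) pair occurs.

    Rows index judge 1's score, columns index judge 2's score.
    Assumes scores are consecutive integers from min_score to max_score.
    """
    if len(j1) != len(j2):
        raise ValueError("both judges must score the same number of items")
    n = max_score - min_score + 1
    A = [[0] * n for _ in range(n)]
    for a, b in zip(j1, j2):
        # Shift each score so that min_score maps to index 0.
        A[a - min_score][b - min_score] += 1
    return A


A = agreement_matrix([3, 4, 5, 4], [5, 4, 5, 4], min_score=3, max_score=5)
for row in A:
    print(row)
# [0, 0, 1]
# [0, 2, 0]
# [0, 0, 1]
```

The diagonal holds the items on which both judges agreed exactly; here that is 3 of the 4 items.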
Step 2 – Calculate the score histogram for each rater
The score histogram is calculated for each rater. It is a vector with one component per possible score, where each component counts the number of times that judge gave that score.
e.g. Let H1,H2 be the score histograms for judge 1 and 2 respectively
Let
s1=3, s2=4, s3=5
j1=(3,4,5,4), j2=(5,4,5,4)
Then H1 is:
(1,2,1)
H2 is:
(0,2,2)
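A possible Python sketch of Step 2 is shown here. The function name score_histogram and its parameters are hypothetical, chosen for illustration, and the scores are again assumed to be consecutive integers.

```python
def score_histogram(scores, min_score, max_score):
    """Return how many times each possible score was given by one judge.

    Component k counts the score min_score + k.
    """
    H = [0] * (max_score - min_score + 1)
    for s in scores:
        H[s - min_score] += 1
    return H


H1 = score_histogram([3, 4, 5, 4], min_score=3, max_score=5)
H2 = score_histogram([5, 4, 5, 4], min_score=3, max_score=5)
print(H1)  # [1, 2, 1]
print(H2)  # [0, 2, 2]
```

Note that H1 equals the row sums of the agreement matrix A and H2 equals its column sums, so the histograms can also be read directly off A.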
Step 3 – Calculate Mean Quadratic Weighted Kappa
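Let N be the number of possible scores and T the number of scores made by each judge. We sum over every combination (i,j) of a score from judge 1 and a score from judge 2 (e.g. (3,3),(3,4),(3,5),(4,3)…(i,j)), weighting each combination by the squared distance between the two scores:
w[i,j] = ((i-j)^2) / ((N-1)^2)
E[i,j] = (H1[i]*H2[j]) / T
Kappa = 1 - (SUMi,j of w[i,j]*A[i,j]) / (SUMi,j of w[i,j]*E[i,j])
Here E[i,j] is the count expected in cell (i,j) if the two judges scored independently of each other. Dividing both sums by T turns the counts into proportions and leaves the ratio unchanged.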
A numeric calculation of the above equation for kappa in pseudo code would be:
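double numerator = 0.0;
double denominator = 0.0;
for (int i = 0; i < N; i++)
{
    for (int j = 0; j < N; j++)
    {
        // count expected in this cell if the two judges scored independently
        double expected_count = (double)(H1[i] * H2[j]) / T;
        // quadratic weight: 0 on the diagonal, 1 for the largest possible disagreement
        double weight = (double)((i - j) * (i - j)) / ((N - 1) * (N - 1));
        numerator += (weight * A[i, j]) / T;
        denominator += (weight * expected_count) / T;
    }
}
double kappa = 1.0 - (numerator / denominator);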
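For readers who prefer something directly runnable, the loop above could be translated to Python roughly as follows. This is a sketch under stated assumptions: the function name quadratic_weighted_kappa is made up for this example, and the explicit error for a zero denominator (which happens when both judges give the same single score to every item) is a design choice of this sketch, not something the formula prescribes.

```python
def quadratic_weighted_kappa(A, H1, H2):
    """Quadratic weighted kappa from an agreement matrix and two histograms.

    A[i][j] counts items scored i by judge 1 and j by judge 2 (as indices),
    H1 and H2 are the per-judge score histograms.
    """
    N = len(A)   # number of possible scores
    T = sum(H1)  # number of items scored by each judge
    if N < 2:
        raise ValueError("need at least two possible scores")
    numerator = 0.0
    denominator = 0.0
    for i in range(N):
        for j in range(N):
            # Count expected if the judges scored independently.
            expected_count = H1[i] * H2[j] / T
            # Quadratic weight: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / (N - 1) ** 2
            numerator += weight * A[i][j] / T
            denominator += weight * expected_count / T
    if denominator == 0:
        raise ValueError("kappa is undefined: no disagreement is expected")
    return 1.0 - numerator / denominator


# Values from the Step 1 and Step 2 examples.
A = [[0, 0, 1], [0, 2, 0], [0, 0, 1]]
print(quadratic_weighted_kappa(A, [1, 2, 1], [0, 2, 2]))  # 0.0
```

For those example values the weighted observed disagreement equals the weighted expected disagreement, so kappa is 0: the judges agree no better than chance once the size of their one disagreement is taken into account.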
Example:
If two judges each scored 6 items, with possible scores ranging from 4 to 6, and the scores were as follows:
j1 = (4,4,5,6,5,6)
j2 = (5,4,6,5,4,5)
Then:
A =
| 1 | 1 | 0 |
| 1 | 0 | 1 |
| 0 | 2 | 0 |
H1 = (2,2,2)
H2 = (2,3,1)
T = 6
N = 3
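Carrying this example through Step 3: the weighted observed disagreement (the sum of w[i,j]*A[i,j]) is 1.25, the weighted expected disagreement (the sum of w[i,j]*H1[i]*H2[j]/T) is 1.75, so kappa = 1 - 1.25/1.75 = 2/7, or about 0.286. The Python sketch below reproduces that figure from the raw scores using exact fractions. The function name kappa_from_scores and its parameters are assumptions made for this illustration, and the scores are assumed to be consecutive integers.

```python
from fractions import Fraction


def kappa_from_scores(j1, j2, min_score, max_score):
    """Quadratic weighted kappa computed end to end from two score lists."""
    N = max_score - min_score + 1
    T = len(j1)
    A = [[0] * N for _ in range(N)]
    H1 = [0] * N
    H2 = [0] * N
    for a, b in zip(j1, j2):
        A[a - min_score][b - min_score] += 1
        H1[a - min_score] += 1
        H2[b - min_score] += 1

    observed = Fraction(0)  # weighted observed disagreement
    expected = Fraction(0)  # weighted disagreement expected by chance
    for i in range(N):
        for j in range(N):
            weight = Fraction((i - j) ** 2, (N - 1) ** 2)
            observed += weight * A[i][j]
            expected += weight * Fraction(H1[i] * H2[j], T)
    return 1 - observed / expected


kappa = kappa_from_scores([4, 4, 5, 6, 5, 6], [5, 4, 6, 5, 4, 5], 4, 6)
print(kappa, round(float(kappa), 4))  # 2/7 0.2857
```

Exact fractions are used here only so the result can be compared with the hand calculation; ordinary floating point works equally well in practice.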