
Kappa

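The kappa statistic is frequently used to measure agreement between repeated measurements of the same test, particularly when comparing results obtained by different individuals. Agreement between different raters is termed inter-rater reliability. Kappa measures agreement beyond chance: it discounts the agreement that would be expected if the raters' results matched by chance alone.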
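The page does not spell out the formula. Assuming the usual two-rater version (Cohen's kappa) is meant, kappa compares the observed agreement p_o with the agreement p_e expected by chance from each rater's marginal totals. Here n_ii is the number of items both raters placed in category i, n_i. and n_.i are the row and column totals, and N is the number of items rated:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{1}{N}\sum_{i} n_{ii},
\qquad
p_e = \sum_{i} \frac{n_{i\cdot}}{N}\cdot\frac{n_{\cdot i}}{N}
```

A kappa of 1 means perfect agreement, and a kappa of 0 means agreement no better than chance.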

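Consider the following. Bill and Steve each toss a coin four times. On the first toss both get heads, on the second toss both get tails, and on the third and fourth tosses they get opposite results (once Bill gets heads and Steve tails, once the reverse). Placed in a 2×2 contingency table of Bill's results against Steve's, each of the four cells (both heads, both tails, Bill heads with Steve tails, Bill tails with Steve heads) contains exactly one toss.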

If we used this data to calculate Kappa, we would obtain a result of 0, indicating no agreement beyond what would be expected by chance alone.
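As a check on the coin example, here is a minimal sketch that computes Cohen's kappa directly from the two raters' toss-by-toss results. The function name `cohens_kappa` is hypothetical, and the sketch assumes the two disagreements went in opposite directions, since that is the arrangement that gives a kappa of 0:

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who each rated the same N items."""
    n = len(ratings_a)
    # Observed agreement: fraction of items on which the two raters match.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: for each category, multiply the two raters'
    # marginal counts, then sum and divide by N squared.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    # Note: undefined (division by zero) if p_e == 1.
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toss-by-toss records: agree on tosses 1 and 2,
# disagree in opposite directions on tosses 3 and 4.
bill = ["H", "T", "H", "T"]
steve = ["H", "T", "T", "H"]

print(cohens_kappa(bill, steve))  # 0.0
```

The raters agree on half the tosses, and because each throws two heads and two tails, chance alone also predicts agreement on half, so kappa is 0. If both disagreements had gone the same way (for example Bill heads on both), the marginals would be unbalanced and the same calculation gives 0.2 instead.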
 

Now consider the following. The same individuals read a series of 100 x-rays to determine the presence or absence of pneumonia. They agree that pneumonia is present in 75 cases and that it is absent in 20 cases, and they disagree in the remaining 5 cases. Here, agreement is clearly better than chance.

These results give a kappa value of 0.86, indicating excellent agreement.
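The x-ray figure can be reproduced from the four cells of the 2×2 table. The text gives only the total number of disagreements, so the 3/2 split below is a hypothetical choice; the function name `kappa_2x2` is likewise illustrative:

```python
def kappa_2x2(both_yes, a_yes_b_no, a_no_b_yes, both_no):
    """Cohen's kappa from the four cells of a 2x2 agreement table."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    # Observed agreement: the two diagonal cells.
    p_o = (both_yes + both_no) / n
    # Each rater's marginal proportion of "pneumonia present".
    a_yes = (both_yes + a_yes_b_no) / n
    b_yes = (both_yes + a_no_b_yes) / n
    # Chance agreement: both say yes by chance, or both say no by chance.
    p_e = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_o - p_e) / (1 - p_e)


# 75 agree present, 20 agree absent; the 3/2 split of the
# 5 disagreements is an assumption, not from the source.
k = kappa_2x2(75, 3, 2, 20)
print(round(k, 2))  # 0.86
```

With this split, p_o = 0.95 and p_e = 0.6512. Any split of the 5 disagreements, from 5/0 to 0/5, rounds to the same kappa of 0.86, so the missing detail does not affect the result.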

Definitions vary; however, poor reliability is often defined as a kappa below 0.4, fair reliability as 0.4 to 0.6, good reliability as above 0.6 up to 0.8, and excellent reliability as above 0.8.
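The cut-offs above translate into a simple lookup. This is a sketch with a hypothetical function name; the handling of the exact boundary values (0.4 and 0.6 counted as fair, 0.8 as good) is one reading of the quoted ranges, and other sources draw the lines differently:

```python
def describe_reliability(kappa):
    """Label a kappa value using the poor/fair/good/excellent cut-offs.

    Boundary handling is an assumption: 0.4 and 0.6 count as fair,
    0.8 counts as good, anything above 0.8 is excellent.
    """
    if kappa < 0.4:
        return "poor"
    elif kappa <= 0.6:
        return "fair"
    elif kappa <= 0.8:
        return "good"
    else:
        return "excellent"


for k in (0.0, 0.5, 0.7, 0.86):
    print(k, describe_reliability(k))
```

Under these cut-offs the coin example (kappa 0) is labelled poor and the x-ray example (kappa 0.86) excellent.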