The interobserver reliability of a survey instrument, such as a psychological test, measures the agreement between two or more observers (raters) rating the same object, phenomenon, or concept.
For example, 5 critics are asked to evaluate the quality of 10 different works of art ("objects"), using scores from "A" (the highest) to "D" (the lowest). In the ideal case of maximal interobserver reliability, all critics give the same score to the same object (while the scores still vary from object to object).
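The critics example can be made concrete with a small sketch. It uses mean pairwise percent agreement, which is only one simple way to quantify agreement and is an assumption here, not a measure named in this entry. The function name `pairwise_agreement` and all scores below are hypothetical illustration data.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Return the fraction of matching scores over all pairs of raters.

    ratings: one list per rater, each holding one score per object,
    in the same object order. (Hypothetical helper, for illustration.)
    """
    n_objects = len(ratings[0])
    if any(len(r) != n_objects for r in ratings):
        raise ValueError("every rater must score every object")
    pairs = list(combinations(ratings, 2))
    # Count identical scores, object by object, for each pair of raters.
    matches = sum(a == b for r1, r2 in pairs for a, b in zip(r1, r2))
    return matches / (len(pairs) * n_objects)


# Ideal case: 5 critics give identical scores to each of 10 works,
# and the scores still differ from work to work.
ideal = [list("ABCDABCDAB") for _ in range(5)]
perfect = pairwise_agreement(ideal)

# Hypothetical non-ideal case: the fifth critic swaps the first two scores.
mixed = [list("ABCDABCDAB") for _ in range(4)] + [list("BACDABCDAB")]
partial = pairwise_agreement(mixed)

print(perfect, partial)
```

A value of 1.0 corresponds to the ideal case described above. Percent agreement does not correct for agreement that would occur by chance, so it should be read as a rough indicator only.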
A related concept is Intraobserver Reliability.