Reliability
Reliability concerns the quality of measurements and can be described as the repeatability or consistency of the measurements obtained. Interrater reliability indices focus on the ratings or judgments made across judges or raters. Drawing on modern test theory and reliability generalization, these indices measure the stability of the ratings that raters, observers, or judges assign to a particular event, person, or object. In conducting research, collecting data through raters or observers involves examining the relative consistency of the judgments made by two or more raters and using those judgments as data for formulating the findings and recommendations of the study (Gamer, Lemon & Singh, 2016).
Data Collection through Raters or Observers
One scenario in which interrater reliability can serve as a form of data collection is when a researcher interviewing respondents on a particular topic makes a judgment about each respondent once the interview is completed. For instance, the interviewer may develop a 0-10 scale to judge how interested the respondent was in the survey being conducted. In some cases a respondent may seem disinterested in the research, which suggests that he or she might have given inaccurate information. Another case where rater or observer data can be used is when collecting observational data on a particular participant, neighborhood, or household to supplement data gathered through other methods such as questionnaires. By observing how participants respond to questions or carry out their activities, the researcher can complement the data collected through other methods and ensure that the research findings are soundly formulated.
Refusal Report in Data Collection
This form of data collection can also be used in scenarios where interviewers complete a refusal report. Refusal occurs in a study when some participants decline to answer certain questions or to complete the study. Calculating refusal proportions is important because refusal, as a type of non-response, can be used to compute error estimates. Through these error estimates, the researcher can assess the participants' responses and determine the reliability of the data collected from them.
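As a minimal sketch, the refusal proportion can be computed directly from disposition counts in Python; the category names and figures below are hypothetical, and the exact denominator depends on how a study defines its eligible cases.

# A minimal sketch of computing a refusal proportion from hypothetical
# disposition counts; the exact denominator depends on how the study
# defines eligible cases.
completed = 412          # interviews completed
refused = 58             # eligible respondents who refused
other_nonresponse = 30   # e.g., eligible but never reached

eligible = completed + refused + other_nonresponse
refusal_rate = refused / eligible
print(f"Refusal rate: {refusal_rate:.1%} of {eligible} eligible cases")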
Evaluating Interrater Reliability
Different statistical procedures can be applied to evaluate interrater reliability in research. One is the Observer Agreement Percentage, which measures the percentage of agreement between two or more raters on a particular set of observations. For instance, if two judges code a child's playful behavior across a series of time intervals, their Observer Agreement Percentage is based on the number of intervals on which they agree. Interrater reliability can also be evaluated with interobserver correlations, computing a Spearman correlation coefficient when the judges' observations are ranked, or a Pearson coefficient when they are measured on a continuous scale (Kline, 2005). A major limitation of this approach as a measure of agreement is that the resulting coefficient reflects both random error and systematic deviations in judgment among the different judges. Another statistical procedure for evaluating interrater reliability is Kendall's Coefficient of Concordance, which is used when more than two judges rank-order a series of stimuli (Koo & Li, 2016). The stimuli can range from people to objects, and each judge gives his or her preferred rank order. Under this analysis, Friedman's two-way analysis of variance by ranks is used to evaluate the rankings from the multiple judges.
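As an illustration, the following minimal sketch computes all three indices in Python with NumPy and SciPy rather than the R irr package cited above; the rating data are hypothetical, and kendalls_w is a hand-rolled helper implementing the standard formula W = 12S / (m^2(n^3 - n)), since SciPy offers no built-in function for Kendall's W.

# A minimal sketch of the three indices discussed above using NumPy/SciPy;
# all rating data are hypothetical.
import numpy as np
from scipy import stats

# Observer Agreement Percentage: two observers code each interval
# (1 = playful behavior observed, 0 = not observed).
obs_a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
obs_b = np.array([1, 0, 1, 0, 0, 1, 0, 1, 1, 1])
agreement_pct = np.mean(obs_a == obs_b) * 100
print(f"Observer Agreement Percentage: {agreement_pct:.1f}%")

# Interobserver correlations: Pearson for interval-level ratings,
# Spearman for ranked ratings.
ratings_a = np.array([7, 4, 8, 6, 5, 9, 3, 6])
ratings_b = np.array([6, 5, 9, 6, 4, 8, 4, 7])
pearson_r, _ = stats.pearsonr(ratings_a, ratings_b)
spearman_rho, _ = stats.spearmanr(ratings_a, ratings_b)
print(f"Pearson r = {pearson_r:.2f}, Spearman rho = {spearman_rho:.2f}")

# Kendall's Coefficient of Concordance: m judges rank n stimuli;
# W = 12S / (m^2 (n^3 - n)), where S is the sum of squared deviations
# of the per-stimulus rank sums from their mean.
def kendalls_w(scores):
    """scores: (m judges) x (n stimuli) matrix of ratings or ranks."""
    ranks = np.apply_along_axis(stats.rankdata, 1, scores)  # rank within each judge
    m, n = ranks.shape
    rank_sums = ranks.sum(axis=0)                     # per-stimulus rank sums
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    return 12 * s / (m ** 2 * (n ** 3 - n))

judges = np.array([[8, 5, 9, 6, 4],   # one row per judge, one column per stimulus
                   [7, 5, 8, 6, 3],
                   [9, 4, 8, 7, 5]])
print(f"Kendall's W = {kendalls_w(judges):.2f}")  # 1.0 = perfect concordance

# Friedman's two-way analysis of variance by ranks on the same data;
# the Friedman statistic relates to W via chi2 = m * (n - 1) * W.
chi2, p = stats.friedmanchisquare(*judges.T)
print(f"Friedman chi2 = {chi2:.2f}, p = {p:.3f}")

In this hypothetical example the three judges' rankings are highly concordant (W of roughly 0.91), and the Friedman test yields the corresponding chi-square statistic of about 10.9.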
References
Gamer, M., Lemon, J., & Singh, I. F. P. (2016). irr: Various coefficients of interrater reliability and agreement. R package version 0.84.
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Sage.
Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155-163.