Reliability and Its Categories
Reliability can be described as a measure of consistency, which psychologists commonly assess in three ways: internal consistency (consistency across the items of a test), test-retest reliability (consistency over time), and alternative-forms reliability (consistency across parallel versions of a test administered to the same test takers) (Kline, 2015). Assessing reliability, then, is how one establishes that a test is appropriately stable.
Test-Retest Reliability
Test-retest reliability is an index of consistency over time: the same sample completes the same test on two occasions. For instance, intelligence is assumed to be stable over time, so an individual who scores as highly intelligent today should, in all likelihood, still do so the following week (Kline, 2005). A good measure should therefore produce nearly the same outcomes across administrations, and a test that yields inconsistent scores is a poor measure. In a test-retest design, the initial administration, denoted T1, should produce almost the same results as a later administration, T2, and the items or stimuli must be identical in all respects on both occasions.
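To make the design concrete, the following is a minimal sketch in Python (the scores and sample size are hypothetical, invented purely for illustration) of correlating the same participants' scores across the two administrations:

    import numpy as np

    # Hypothetical test scores for the same ten participants, tested twice.
    t1 = np.array([98, 112, 105, 87, 120, 101, 93, 110, 108, 95], dtype=float)
    t2 = np.array([101, 110, 103, 90, 118, 99, 95, 112, 106, 97], dtype=float)

    # Test-retest reliability: the zero-order (Pearson) correlation between T1 and T2.
    r_tt = np.corrcoef(t1, t2)[0, 1]
    print(f"Test-retest reliability: r = {r_tt:.2f}")  # near, but below, 1.0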
The Reliability Index
The reliability index is the zero-order correlation between the T1 and T2 scores (Kline, 2005). Identical results across the two administrations would yield a perfect correlation of 1.0, which in practice is not achievable. Unlike an ordinary correlation, which must be squared to give the proportion of shared variance, the reliability coefficient is interpreted directly as a proportion of variance, so no squaring is required. Measurement error is inevitable, and the results from T1 and T2 will differ because of random error (Kline, 2015). Attrition is a common problem: fewer individuals typically complete the later administration, T2, than T1, because some participants drop out, fall sick, or simply fail to show up (Kline, 2015).
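Attrition of this kind means the correlation can only be computed over participants observed at both waves. A minimal sketch (hypothetical data, with missing T2 scores coded as NaN) of dropping incomplete pairs before correlating:

    import numpy as np

    t1 = np.array([98, 112, 105, 87, 120, 101, 93, 110], dtype=float)
    # Two participants dropped out before T2; their scores are missing.
    t2 = np.array([101, 110, np.nan, 90, 118, np.nan, 95, 112])

    # Retain only the pairs observed on both occasions, then correlate.
    complete = ~np.isnan(t1) & ~np.isnan(t2)
    r_tt = np.corrcoef(t1[complete], t2[complete])[0, 1]
    print(f"Retained {complete.sum()} of {len(t1)} participants; r = {r_tt:.2f}")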
Alternative Forms for Assessing Reliability
The alternative-forms approach to assessing reliability was developed primarily to overcome situational changes and carryover effects on test takers. Beyond the cost of the analysis, great care must be taken to ensure that the items are equal, or parallel, across the test versions. The approach entails collecting data with one form of the test, say Form 1, at a given time, T1, followed by a later administration of Form 2 at T2 to the same participants (Kline, 2005). As with test-retest reliability, the zero-order correlation between the two sets of scores provides the reliability index (DeVellis, 2016).
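The computation mirrors the test-retest case; only the data collection differs. A minimal sketch (hypothetical totals for the same participants, with Form 1 given at T1 and Form 2 at T2), using scipy.stats.pearsonr for the zero-order correlation:

    import numpy as np
    from scipy import stats

    # Hypothetical totals: Form 1 administered at T1, Form 2 at T2.
    form1_t1 = np.array([24, 31, 28, 19, 35, 27, 22, 30], dtype=float)
    form2_t2 = np.array([26, 30, 27, 21, 34, 25, 24, 31], dtype=float)

    # Alternative-forms reliability: zero-order correlation across the two forms.
    r_af, p = stats.pearsonr(form1_t1, form2_t2)
    print(f"Alternative-forms reliability: r = {r_af:.2f}")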
Equality of Test Items in Alternative Form Reliability
The equivalence of the test items is the primary concern in alternative-forms reliability. Initially, experts' judgments and comparisons of pass rates were the only means of establishing that two forms were equivalent. Under this approach, test developers must ensure that the items are equivalent for the reliability index to be meaningful (DeVellis, 2016). Attrition can also affect this approach, so the sample at T2 may be smaller than that at T1 for varied reasons; this is sometimes mitigated by shortening the interval between the two administrations (Kline, 2005). Although the assessment is undertaken on two separate occasions, the index is interpreted in the same way as in test-retest reliability: it reflects the stability of scores from T1 to T2.
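Before the correlation is trusted as a reliability index, the two forms can be screened for rough equivalence. A minimal sketch of one such check (hypothetical data; the comparison of means and spreads and the paired t-test are illustrative choices, not a procedure prescribed by Kline or DeVellis):

    import numpy as np
    from scipy import stats

    form1 = np.array([24, 31, 28, 19, 35, 27, 22, 30], dtype=float)
    form2 = np.array([26, 30, 27, 21, 34, 25, 24, 31], dtype=float)

    # Parallel forms should show comparable means and standard deviations.
    print(f"Form 1: M = {form1.mean():.1f}, SD = {form1.std(ddof=1):.1f}")
    print(f"Form 2: M = {form2.mean():.1f}, SD = {form2.std(ddof=1):.1f}")

    # A paired t-test flags a systematic difference in difficulty between forms.
    t, p = stats.ttest_rel(form1, form2)
    print(f"Paired t-test: t = {t:.2f}, p = {p:.3f}")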
Conclusion
In conclusion, the two approaches discussed are interpreted by the same procedure, although the sources of error variance between the two administrations differ. Both methods are effective at establishing the reliability of test items, allowing psychologists to test the same or different respondents with confidence that consistent scores reflect a sound measure. If the outcomes of the two administrations differ substantially, the test is inconsistent, implying that something is wrong with it and that it is a poor measure.
References
DeVellis, R. F. (2016). Scale development: Theory and applications (Vol. 26). Sage Publications.
Kline, P. (2015). A handbook of test construction (Psychology Revivals): Introduction to psychometric design. Routledge.
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Sage.