Outlier Analysis

Updated 12 December 2023

Outliers and their Causes

Outliers are data points or observations that lie far outside the range of the rest of a variable's population (Osborne & Overbay, 2004). Such values are inconsistent with the majority of the intended population or with the variable's expected range. They can be caused by experimental error, or they can arise naturally in heavily skewed distributions, in which case the assumption of a normal distribution may not hold. In other cases, they occur purely by chance (Hawkins, 2014). Some statistical estimators and calculations are robust to outliers, while others are not.

Causes of Outliers

According to Osborne & Overbay (2004), outliers can arise from errors made in data collection, recording, or entry. Such errors can be corrected by returning to the original documents or subjects and obtaining the correct values. For example, suppose a study of how self-esteem varies with age records one university student's age as 3 years. This is an obvious outlier, and checking it against the original records or the participant can recover the valid figure. Other outliers can result from intentional misreporting meant to sabotage or influence the results, sampling error, standardization failure, or faulty distributional assumptions (Rosenfeld & Penrod, 2013). Still others may be legitimate values drawn from the correct population (Osborne & Overbay, 2004). For instance, at the same university, where students' ages typically range from 14 to 40 years and most fall between 18 and 24, a participant who is genuinely 75 years old is an outlier sampled from the right population.
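
A simple range check is one way to catch data-entry errors such as the recorded age of 3 before analysis. The Python sketch below is an illustration, not a method taken from the cited sources; the bounds, the `flag_for_review` helper, and the participant records are all hypothetical. Flagged values are meant to be verified against the original documents rather than deleted, and a legitimate but extreme value such as 75 passes the check.

```python
# Minimal sketch: flag recorded ages outside a plausible range so they can be
# checked against the original records. The bounds below are hypothetical
# and would need to be chosen for the actual study.

PLAUSIBLE_MIN_AGE = 14   # hypothetical lower bound for university students
PLAUSIBLE_MAX_AGE = 100  # hypothetical upper bound; a genuine 75-year-old stays in


def flag_for_review(records, low=PLAUSIBLE_MIN_AGE, high=PLAUSIBLE_MAX_AGE):
    """Return (participant_id, age) pairs whose age lies outside [low, high]."""
    return [(pid, age) for pid, age in records if not low <= age <= high]


# Hypothetical records: (participant_id, recorded_age)
records = [("P01", 19), ("P02", 3), ("P03", 21), ("P04", 75)]
flagged = flag_for_review(records)
print(flagged)  # [('P02', 3)]
```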

Effects of Outliers on Statistical Analysis

For researchers, outliers play a major role in statistical analysis. They inflate error variance (and therefore the standard deviation) and weaken the power of statistical tests. They can substantially change the magnitude of correlation coefficients, making them inaccurate. If they are not randomly distributed, they can reduce the normality of the distribution (Osborne & Overbay, 2004). In the example above, the mean increases or decreases depending on whether the age outlier lies above or below the rest of the data. As for the shape of the distribution, an outlier far below the range skews the distribution to the left, whereas an outlier far above the range skews it to the right (Hawkins, 2014).
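
The effect on spread and shape can be checked numerically. The following Python sketch is an illustration rather than a procedure from the cited sources: it uses the standard `statistics` module plus a hand-written, moment-based `skewness` helper (a hypothetical name) to compare a set of typical student ages with and without an extreme value. The value 75 is the legitimate older student, and the value 3 stands in for the mistyped age discussed earlier.

```python
import statistics


def skewness(values):
    """Moment-based (population) skewness, m3 / m2**1.5.
    Positive means a longer right tail, negative a longer left tail."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


typical = [14, 15, 18, 18, 18, 19, 20, 21, 22, 23, 24, 26, 30, 32]
high_outlier = typical + [75]  # the legitimate 75-year-old
low_outlier = typical + [3]    # the mistyped age of 3

for label, data in [("typical", typical), ("with 75", high_outlier), ("with 3", low_outlier)]:
    # statistics.stdev is the sample standard deviation
    print(f"{label:8s} sd={statistics.stdev(data):6.2f} skew={skewness(data):+.2f}")
```

Adding the 75-year-old raises the standard deviation and makes the skewness positive (a longer right tail), while adding the mistyped 3 also raises the standard deviation but makes the skewness negative.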

Impact on Statistical Measures

The mean is strongly affected by outliers, the median only slightly, and the mode rarely. In the age example above, the 75-year-old student increases the mean age because the mean is the average of all the subjects' values (Rosenfeld & Penrod, 2013). In some cases, an outlier also shifts the position of the median. Consider the following ages of university students: 14, 15, 18, 18, 18, 19, 20, 21, 22, 23, 24, 26, 30, 32, and 75. With the outlier 75 included, the median is 21; without it, the median would be the average of 20 and 21, that is, 20.5. This shift is small, however, because the median depends only on the middle position of the ordered values: if the number of subjects stayed fixed and the outlier were replaced by a typical value, the median would not change at all. The mean, by contrast, is not resistant and is pulled toward the outlier. If the age 75 were replaced by 33, which falls within the usual range, the mean would be 22.2; with the outlier, the mean rises to 25.
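
The figures in this paragraph can be reproduced with Python's standard `statistics` module. The sketch below simply recomputes them for the ages listed above; the variable names are illustrative.

```python
import statistics

ages = [14, 15, 18, 18, 18, 19, 20, 21, 22, 23, 24, 26, 30, 32, 75]
without_outlier = ages[:-1]           # drop the 75-year-old (14 subjects)
replaced = without_outlier + [33]     # replace 75 with an in-range 33 (15 subjects)

print(statistics.mean(ages))              # 25
print(statistics.mean(replaced))          # 22.2
print(statistics.median(ages))            # 21
print(statistics.median(without_outlier)) # 20.5
print(statistics.median(replaced))        # 21
print(statistics.mode(ages))              # 18
```

The median moves only when the number of subjects changes, and the mode (18) is unaffected by the outlier, while the mean shifts by almost three years.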

Identifying and Handling Outliers

To identify outliers, a researcher needs to examine the data for extreme data points that are influential. Researchers are free to correct, remove, or retain outliers, especially legitimate ones. According to Osborne & Overbay (2004), removing outliers can be beneficial, since doing so significantly improved accuracy and reduced error rates in correlations and t-tests. However, some researchers prefer to accommodate outliers using “robust” methods in order to preserve a true picture of the study. For instance, with univariate distributions, researchers can use a trimmed mean or truncation. In our example, truncation might involve assuming that no student at the university is 75 years old and truncating this age to a reasonable maximum value of, say, 40.
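
The source does not prescribe a particular detection rule, so the sketch below swaps in one common convention, Tukey's fences (values more than 1.5 times the interquartile range beyond the quartiles), and then applies the two robust accommodations mentioned above. The helper names, the 10% trimming proportion, and the `k=1.5` multiplier are assumptions for illustration; the cap of 40 comes from the example in the text.

```python
import math
import statistics

ages = [14, 15, 18, 18, 18, 19, 20, 21, 22, 23, 24, 26, 30, 32, 75]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.
    Quartiles use statistics.quantiles' default 'exclusive' method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping floor(proportion * n) values from each end."""
    data = sorted(values)
    cut = math.floor(proportion * len(data))
    kept = data[cut:len(data) - cut]
    return statistics.fmean(kept)


def cap(values, upper):
    """Truncate (cap) every value above `upper` to `upper`."""
    return [min(x, upper) for x in values]


low, high = tukey_fences(ages)
outliers = [x for x in ages if x < low or x > high]
print(low, high, outliers)                         # 6.0 38.0 [75]
print(trimmed_mean(ages))                          # 22.0
print(round(statistics.fmean(cap(ages, 40)), 2))   # 22.67
```

With these ages, the fences flag only 75. Trimming one value from each end gives a mean of 22.0, and capping 75 at 40 gives about 22.67, both much closer to the bulk of the data than the raw mean of 25.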

References

Hawkins, D. (2014). Identification of outliers. Amsterdam: Springer.

Osborne, J. W., & Overbay, A. (2004). The power of outliers (and why researchers should always check for them). Practical Assessment, Research & Evaluation. North Carolina: North Carolina State University.

Rosenfeld, B., & Penrod, S. D. (2013). Research methods in forensic psychology. Hoboken: Wiley.