1) Describe what is being measured and the level of measurement for the following variables: P344pr, A093r, SexHRP, A094r, and G018r.
The Living Costs and Food Survey (LCF) collects information on household budgets in the UK, based on data about expenditure and the cost of living. It is carried out across the UK on an annual basis, making it the most substantial source of household expenditure information. The dataset contains several variables, which can be either qualitative or quantitative in nature. The variables P344pr, A093r, SexHRP, A094r, and G018r are part of the LCF2013 dataset. The variable P344pr represents the gross normal weekly household income; it is continuous and, because it has a true zero point, it is measured at the ratio level. A093r represents the economic position of the household reference person and is a categorical nominal variable. Its categories include economically inactive, unemployed and work-related government training programmes, part-time working, and full-time working. SexHRP represents the sex of the household reference person, which is a dichotomous nominal variable (male or female). A094r represents the occupational class of the household reference person, following the National Statistics Socio-economic Classification. The categories include higher managerial, administrative and professional occupations; intermediate occupations; routine and manual occupations; never worked and long-term unemployed, students and occupation not stated; and not classified for other reasons. It is a categorical nominal variable. G018r is the number of adults in the household, that is, the count of all persons aged 18 years and above in the household; it is therefore a discrete variable measured at the ratio level.
2) Using the appropriate measures, report and interpret the central tendency and dispersion for P550tpr, P425r, A121r, and G019r. You should report your output in a table or plots.
Measures of central tendency and dispersion are important for an exploratory analysis of the data of interest. Measures of central tendency include the mean, median and mode, while measures of variability or dispersion include the variance, standard deviation and range. In addition, descriptive plots or charts can be used to represent the data. The variables to be summarised are P550tpr, P425r, A121r, and G019r, which measure total weekly household expenditure, main source of household income, household tenure and number of children in the household, respectively. Table 1 below reports the measures of central tendency and dispersion for total weekly household expenditure. The mean total weekly household expenditure is £479.76 (SD = £292.37), and the median is £419.90. The mean exceeds the median, which suggests that the distribution of expenditure is right-skewed, with a minority of high-spending households pulling the mean upwards.
Table 1: Descriptive statistics for the total weekly expenditure
Statistic                  Value
Mean                       479.76
Median                     419.90
Standard deviation (SD)    292.37
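The measures reported in Table 1 can be reproduced along the following lines. This is a minimal sketch using only the Python standard library; the list `expenditure` holds made-up values for illustration and is not taken from the LCF dataset, and the helper `describe` is a hypothetical name.

```python
import statistics

# Hypothetical weekly expenditure values in pounds; NOT taken from the LCF data.
expenditure = [210.5, 385.0, 419.9, 402.3, 655.1, 1204.8, 298.7]


def describe(values):
    """Return central tendency and dispersion measures for a numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (divides by n - 1)
        "sd": statistics.stdev(values),           # sample standard deviation
        "range": max(values) - min(values),
    }


summary = describe(expenditure)
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

In this illustrative sample the single large value pulls the mean above the median, the same pattern seen in the reported figures.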
The other three variables are categorical, or take only a few discrete values, and are therefore explored using frequency charts such as bar and pie charts. In particular, household tenure and main source of household income are displayed using pie charts, while the number of children in the household is represented using a bar chart.
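The counts and percentages behind a bar or pie chart can be tabulated as in the sketch below. The category labels in `tenure` are placeholders chosen for illustration and are not the actual A121r codes; `frequency_table` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical tenure categories; labels and values are illustrative only.
tenure = [
    "Owned outright", "Mortgage", "Rented",
    "Owned outright", "Rented", "Owned outright",
]


def frequency_table(values):
    """Map each category to (count, percentage), most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.most_common()}


table = frequency_table(tenure)
for category, (count, percent) in table.items():
    print(f"{category:<16} {count:>3} {percent:6.1f}%")

# The first key is the most frequent category, i.e. the mode.
mode = next(iter(table))
```

For a nominal variable the mode is the only meaningful measure of central tendency, which is why the frequency table is the natural summary here.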
3) Graphically display the distribution of P344pr by Gorx. How does P344pr vary between and within Gorx? Interpret your results.
PART B: Inferential Statistics: Confidence intervals, chi square and t-tests
1) Calculate and interpret a 95% confidence interval for the sample mean of P550tpr. Explain your working.
2) Calculate and interpret a 99% confidence interval for the sample proportion working full time (A093r) of those in employment or looking for work (i.e. A093r!= "Economically inactive"). Explain your working.
3) Create and report a cross tabulation between G018r and A121r for those living in the North West and Merseyside region. Describe any patterns observed in the table and determine if there is a statistically significant association.
4) Report the strength of the association using the appropriate test. Interpret what this can tell you about the relationship between G018r and A121r.
5) Report and interpret the mean gross normal weekly household income (P344pr) for those who work (full-time or part-time) in higher managerial, administrative and professional occupations versus those who work (full-time or part-time) in lower social class occupations (A094r) in the full sample (i.e. living in all regions in the UK).
6) Is there a statistically significant difference in mean gross normal weekly household income (P344pr) between those who work in higher managerial, administrative and professional occupations versus those who work in lower social class occupations (A094r)? State a null hypothesis and alternative hypothesis. Explain why you chose this test and whether the data meet the assumptions to conduct the test.
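As a rough guide to the working for the confidence intervals in questions 1 and 2 of this part, the sketch below uses the large-sample normal approximation, in which the interval is the estimate plus or minus z times its standard error. The mean and SD are the figures reported for P550tpr in Part A; the sample size and the full-time count are placeholders, since neither is stated in this report, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    margin = z * sd / math.sqrt(n)             # z times the standard error
    return mean - margin, mean + margin


def proportion_ci(successes, n, level=0.99):
    """Normal-approximation (Wald) confidence interval for a sample proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Mean and SD as reported above; n = 5000 is an ASSUMED placeholder.
low, high = mean_ci(479.76, 292.37, n=5000, level=0.95)
print(f"95% CI for mean expenditure: ({low:.2f}, {high:.2f})")

# 700 full-time workers out of 1000 economically active is HYPOTHETICAL.
p_low, p_high = proportion_ci(700, 1000, level=0.99)
print(f"99% CI for proportion full-time: ({p_low:.3f}, {p_high:.3f})")
```

With a sample of several thousand households the normal and t critical values are practically identical, so the approximation is a reasonable choice; for a small sample a t critical value would be more appropriate.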
PART C: Correlation and linear regression
1) State a research hypothesis on the relationship between P550tpr and P344pr. Give a brief explanation as to why you would expect this hypothesis.
2) Report the correlation between P550tpr and P344pr. Graphically display and statistically test the relationship between these variables. Interpret your results both statistically and substantively.
3) Estimate and present output from a simple regression model using P550tpr as the dependent variable and P344pr as the explanatory variable.
4) Check for heteroskedasticity in your model using a plot and an appropriate post-estimation function. Interpret the results and correct your model, if necessary.
5) Interpret your model statistically and substantively drawing on your hypothesis above.
6) Estimate expenditure when the value of your explanatory variable is £1,200. Indicate why it may not be appropriate to use your model to make this prediction.
7) Comment on the limitations of your model and whether you can infer causality.
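The correlation, simple regression and prediction steps in this part can be sketched as follows. The income and expenditure pairs are invented for illustration and are not LCF values, and `fit_simple_ols` is a hypothetical helper that computes ordinary least squares by hand rather than through a statistics package.

```python
import math

# Hypothetical (income, expenditure) pairs in pounds per week; NOT LCF data.
income = [250.0, 400.0, 520.0, 640.0, 800.0, 950.0]
expenditure = [230.0, 310.0, 400.0, 430.0, 560.0, 600.0]


def fit_simple_ols(x, y):
    """Return (intercept, slope, Pearson r) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


intercept, slope, r = fit_simple_ols(income, expenditure)
print(f"expenditure = {intercept:.2f} + {slope:.3f} * income, r = {r:.3f}")

# Predict expenditure at an income of 1,200 pounds and flag extrapolation.
new_income = 1200.0
predicted = intercept + slope * new_income
extrapolating = not (min(income) <= new_income <= max(income))
print(f"predicted expenditure: {predicted:.2f} (extrapolating: {extrapolating})")
```

The extrapolation flag illustrates one reason a prediction may be inappropriate: if the chosen income lies outside the range of the data used to fit the model, the linear relationship has not been observed there.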