Problem 1
Simple Linear Regression (Formula Method)
In this problem we find the relation between exam score y and study hours x.
We then interpret the regression equation, the correlation coefficient R, and the coefficient of determination R squared.
This process is called a basic regression analysis.
Question
A teacher wants to check whether there is any connection between the number of hours a student prepares for an exam and the student's score. He records study hours and exam scores.
Fit the regression line of exam score y on study hours x. Interpret the regression equation, R, and R squared.
Independent variable
Study hours, x
Dependent variable
Exam score, y
Observed data
| x (hours) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| y (score) | 50 | 54 | 57 | 61 | 62 | 67 | 69 | 73 |
Solution
Step 1 Calculate the sums (use a calculator in STAT mode)
Σx = 36
Σy = 493
Σx² = 204
Σxy = 2352
n = 8
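If no calculator with a STAT mode is at hand, the same sums can be checked with a few lines of Python. This is a minimal sketch; the variable names (`sum_x`, `sum_xy`, and so on) are chosen here for illustration and are not part of the original problem.

```python
# Observed data from the problem statement
x = [1, 2, 3, 4, 5, 6, 7, 8]            # study hours
y = [50, 54, 57, 61, 62, 67, 69, 73]    # exam scores

n = len(x)                                       # number of observations
sum_x = sum(x)                                   # Σx
sum_y = sum(y)                                   # Σy
sum_x2 = sum(xi ** 2 for xi in x)                # Σx²
sum_xy = sum(xi * yi for xi, yi in zip(x, y))    # Σxy

print(n, sum_x, sum_y, sum_x2, sum_xy)  # 8 36 493 204 2352
```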
Step 2 Compute the regression coefficient byx, which is the slope of the regression line
$$ b_{yx} = \frac{n\Sigma xy - (\Sigma x)(\Sigma y)}{n\Sigma x^{2} - (\Sigma x)^{2}} $$
$$ b_{yx} = \frac{8(2352) - 36(493)}{8(204) - 36^{2}} $$
$$ b_{yx} = \frac{18816 - 17748}{1632 - 1296} $$
$$ b_{yx} = \frac{1068}{336} = 3.179 $$
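The slope formula translates directly into code. The sketch below wraps it in a small helper; the function name `slope` is an illustrative choice, and the inputs are the sums from Step 1.

```python
def slope(n, sum_x, sum_y, sum_x2, sum_xy):
    """Regression coefficient of y on x, computed from the raw sums."""
    numerator = n * sum_xy - sum_x * sum_y      # 18816 - 17748 = 1068
    denominator = n * sum_x2 - sum_x ** 2       # 1632 - 1296 = 336
    return numerator / denominator

b_yx = slope(n=8, sum_x=36, sum_y=493, sum_x2=204, sum_xy=2352)
print(round(b_yx, 3))  # 3.179
```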
Step 3 Compute the mean values of x and y
$$ \bar{x} = \frac{\Sigma x}{n} = \frac{36}{8} = 4.500 $$
$$ \bar{y} = \frac{\Sigma y}{n} = \frac{493}{8} = 61.625 $$
Step 4 Substitute into the formula and simplify to get the regression equation
$$ y - \bar{y} = b_{yx}(x - \bar{x}) \quad \text{(centred form)} $$
$$ y - 61.625 = 3.179(x - 4.500) $$
$$ y = 47.321 + 3.179x $$
Final regression equation (rounded to 3 decimals; the intercept is computed from the unrounded slope 1068/336)
ŷ = 47.321 + 3.179x
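Steps 3 and 4 can be sketched in Python as follows. The helper `predict` and the example input of 5 hours are assumptions added for illustration. The slope is kept unrounded, because substituting the rounded 3.179 would give an intercept of about 47.320 rather than 47.321.

```python
n, sum_x, sum_y = 8, 36, 493
b_yx = 1068 / 336              # unrounded slope from Step 2

x_bar = sum_x / n              # mean of x = 4.5
y_bar = sum_y / n              # mean of y = 61.625
a = y_bar - b_yx * x_bar       # intercept, from y - ȳ = b_yx (x - x̄)

def predict(hours):
    """Predicted exam score for a given number of study hours."""
    return a + b_yx * hours

print(round(a, 3))             # 47.321
print(round(predict(5), 2))    # 63.21
```

The fitted line should only be trusted for study hours near the observed range of 1 to 8.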
Correlation coefficient and coefficient of determination
Karl Pearson’s formula
$$ R =
\frac{n\Sigma xy - (\Sigma x)(\Sigma y)}
{\sqrt{\left[n\Sigma x^{2} - (\Sigma x)^{2}\right]\left[n\Sigma y^{2} - (\Sigma y)^{2}\right]}} $$
First find Σy² = 50² + 54² + 57² + 61² + 62² + 67² + 69² + 73² = 30809
$$ R =
\frac{8(2352) - 36(493)}
{\sqrt{\left[8(204) - 36^{2}\right]\left[8(30809) - 493^{2}\right]}} $$
$$ R =
\frac{18816 - 17748}
{\sqrt{\left[1632 - 1296\right]\left[246472 - 243049\right]}} $$
$$ R = \frac{1068}{\sqrt{(336)(3423)}} = 0.99586 $$
R² = (0.99586)² = 0.99174
which is 99.174 percent
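The same calculation can be checked numerically. This is a minimal sketch using only the standard `math` module; the variable names are illustrative.

```python
import math

n = 8
sum_x, sum_y = 36, 493
sum_x2, sum_y2, sum_xy = 204, 30809, 2352

numerator = n * sum_xy - sum_x * sum_y                 # 1068
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)   # 336 * 3423
)
r = numerator / denominator        # Karl Pearson's correlation coefficient
r_squared = r ** 2                 # coefficient of determination

print(round(r, 5), round(r_squared, 5))  # 0.99586 0.99174
```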
Interpretation
The slope (regression coefficient) gives the average change in score for a one-hour increase in study time.
Here it is about 3.179 marks per hour. That means that for every extra hour a student studies, the predicted score increases by about 3.179 marks on average.
The value of R is positive and very close to 1, so the relationship is strongly positive and close to linear.
R squared gives the proportion of the variation in exam scores that is explained by this linear model in study hours: about 99.174 percent.
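One way to see what "explained variation" means is to compare the leftover (residual) variation around the fitted line with the total variation around the mean. For a simple linear regression with an intercept, one minus that ratio equals R squared. The sketch below checks this on the data; the names `sse` (residual sum of squares) and `sst` (total sum of squares) are labels introduced here, not taken from the problem.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8]            # study hours
y = [50, 54, 57, 61, 62, 67, 69, 73]    # exam scores
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept (deviation form of the same formulas)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                         # fitted scores
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation

r_squared = 1 - sse / sst
print(round(r_squared, 5))  # 0.99174
```

Less than 1 percent of the variation in scores is left unexplained by the line, which matches the value obtained from Karl Pearson's formula.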