QCAA General Mathematics Bivariate data analysis 2
15 sample questions with marking guides and sample answers
A least squares line can be used to model the birth rate (children per 1000 population) in a country from the average daily food energy intake (megajoules) in that country.
When a least squares line is fitted to data from a selection of countries it is found that:
- for a country with an average daily food energy intake of 8.53 megajoules, the birth rate will be 32.2 children per 1000 population
- for a country with an average daily food energy intake of 14.9 megajoules, the birth rate will be 9.9 children per 1000 population.
The slope of this least squares line is closest to
2.7
25
Reveal Answer
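Incorrect. This value does not match the result of the slope formula $\frac{y_2 - y_1}{x_2 - x_1}$.
Correct. The slope is the change in birth rate divided by the change in energy intake: $\frac{9.9 - 32.2}{14.9 - 8.53} = \frac{-22.3}{6.37} \approx -3.5$.
Incorrect. This is the reciprocal of the slope, incorrectly calculated by dividing the change in energy intake by the change in birth rate ($\frac{6.37}{-22.3} \approx -0.29$).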
2.7
Incorrect. The data shows that as energy intake increases, birth rate decreases, meaning the slope must be negative.
25
Incorrect. This value is positive, but the inverse relationship between energy intake and birth rate requires a negative slope.
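The slope calculation above can be checked with a few lines of Python. This is a minimal sketch using only the two points stated in the question; the variable names are illustrative.

```python
# Two points read from the question: (energy intake in MJ, birth rate per 1000)
x1, y1 = 8.53, 32.2
x2, y2 = 14.9, 9.9

# Slope = change in the response (birth rate) / change in the explanatory variable
slope = (y2 - y1) / (x2 - x1)

# The inverted ratio, shown only to illustrate the reciprocal mistake
reciprocal = (x2 - x1) / (y2 - y1)

print(round(slope, 1))       # about -3.5
print(round(reciprocal, 2))  # about -0.29
```

The negative sign confirms that birth rate falls as energy intake rises.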
A confounding variable is a variable that
can only take on a certain number of values.
remains constant throughout a statistical investigation.
is used to predict a difference in the response variable.
other than the explanatory variable, influences the response variable.
Reveal Answer
can only take on a certain number of values.
This describes a discrete variable, which is defined by having a countable number of possible values, rather than a confounding variable.
remains constant throughout a statistical investigation.
A value that remains constant is simply a constant or a controlled variable, whereas a confounding variable varies and affects the outcome.
is used to predict a difference in the response variable.
This describes the explanatory (or independent) variable, which is the specific factor the researcher is studying to see if it causes a change.
other than the explanatory variable, influences the response variable.
A confounding variable is an outside influence that affects the response variable and is related to the explanatory variable, making it difficult to determine the true cause of the observed effect.
Use the following information to answer the question.
The least squares equation for the relationship between the average number of male athletes per competing nation, males, and the number of the Summer Olympic Games, number, is
At which Summer Olympic Games will the predicted average number of males be closest to 25.6?
31st
32nd
33rd
34th
Reveal Answer
31st
Incorrect. Substituting 31 into the equation does not yield the value closest to 25.6.
32nd
Incorrect. Substituting 32 into the equation does not yield the value closest to 25.6.
33rd
Correct. Setting the predicted average number of males to 25.6 and solving for the Games number gives a value nearest to 33, making the 33rd Games the closest.
34th
Incorrect. Substituting 34 into the equation does not yield the value closest to 25.6.
It is observed that as the number of ice blocks sold each month increases, the number of fans sold also increases. Which of these statements is therefore true?
There is a negative causation between the number of ice blocks sold and the number of fans sold each month.
There is a positive causation between the number of ice blocks sold and the number of fans sold each month.
There is a negative association between the number of ice blocks sold and the number of fans sold each month.
There is a positive association between the number of ice blocks sold and the number of fans sold each month.
Reveal Answer
There is a negative causation between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because a 'negative' relationship implies that as one variable increases, the other decreases. Additionally, observational data establishes association, not necessarily causation.
There is a positive causation between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because correlation does not imply causation. While the variables move together, it is likely a third variable (like hot weather) causes both to increase, rather than ice blocks causing fan sales.
There is a negative association between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because a negative association describes an inverse relationship where one variable decreases as the other increases, contradicting the observation that both increase.
There is a positive association between the number of ice blocks sold and the number of fans sold each month.
This is correct because 'positive' indicates that both variables move in the same direction (both increase), and 'association' correctly describes the statistical relationship without assuming one causes the other.
Determine the equation of the least-squares line where and .
Reveal Answer
While the slope is correct, the y-intercept is incorrect. The intercept should be calculated as $a = \bar{y} - b\bar{x}$.
This is the correct equation. The slope is $b = r\frac{s_y}{s_x}$, and the y-intercept is $a = \bar{y} - b\bar{x}$.
This option incorrectly calculates the slope by inverting the ratio of standard deviations ($\frac{s_x}{s_y}$). The correct slope formula is $b = r\frac{s_y}{s_x}$.
This option uses an incorrect slope derived from inverting the standard deviations and an incorrect intercept calculation.
A non-causal explanation is most likely for a strong association between which pair of variables?
a town's child population and its number of schools
a phone's hours of use and its remaining battery percentage
the amount of bread sold by a bakery and its number of customers
the number of cars at a petrol station and the height of the car drivers
Reveal Answer
a town's child population and its number of schools
An increase in a town's child population directly causes a need for more schools, making this a causal relationship.
a phone's hours of use and its remaining battery percentage
Using a phone directly causes its battery to drain, which is a clear causal relationship.
the amount of bread sold by a bakery and its number of customers
An increase in the number of customers directly causes an increase in the amount of bread sold, indicating a causal relationship.
the number of cars at a petrol station and the height of the car drivers
There is no logical reason why the number of cars at a petrol station would cause a change in the drivers' heights, or vice versa. Any strong association between these variables would likely be non-causal, such as a coincidence.
The table shows the number of sales for a small business in their first six months of trading.
| Time in months | Number of sales |
|---|---|
| 1 | 86 |
| 2 | 180 |
| 3 | 160 |
| 4 | 226 |
| 5 | 240 |
| 6 | 335 |
Use your calculator to determine the equation of the least-squares line.
Reveal Answer
Number of sales $= 42.6 \times$ (time in months) $+ 55.4$
| Descriptor | Marks |
|---|---|
| Correctly determines the equation of the least-squares line | 1 |
Use the equation from Question 16a) to predict the number of sales in the 21st month.
Reveal Answer
Let the time in months be 21: number of sales $= 42.6 \times 21 + 55.4 = 950$
The predicted number of sales is 950.
| Descriptor | Marks |
|---|---|
| Substitutes into equation from Question 16a) | 1 |
| Predicts number of sales | 1 |
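The calculator steps for this question can be reproduced in Python. The sketch below fits the least-squares line to the six data points from the table and then extrapolates to month 21; `least_squares` is a hypothetical helper written for this illustration, not part of any library.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


months = [1, 2, 3, 4, 5, 6]
sales = [86, 180, 160, 226, 240, 335]

slope, intercept = least_squares(months, sales)
prediction = slope * 21 + intercept  # extrapolating well beyond the data

print(round(slope, 1), round(intercept, 1), round(prediction))
```

Month 21 lies far outside the observed range of 1 to 6 months, so the prediction is an extrapolation and should be treated with caution.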
Which statement is always true for a causal relationship between an explanatory variable and a response variable?
One of the variables is a confounding variable.
The relationship is explained by a third variable.
There is a positive association between the variables.
The response variable is dependent on the explanatory variable.
Reveal Answer
One of the variables is a confounding variable.
A confounding variable is an external third variable that influences both the explanatory and response variables; the explanatory and response variables themselves are not confounders.
The relationship is explained by a third variable.
If a relationship is entirely explained by a third variable, the association is often spurious rather than causal; a direct causal relationship implies the explanatory variable itself influences the response.
There is a positive association between the variables.
Causal relationships can be negative (inverse) as well as positive; for example, increased price often causes a decrease in sales.
The response variable is dependent on the explanatory variable.
In a causal relationship, the explanatory variable is the cause and the response variable is the effect, meaning changes in the response variable depend on changes in the explanatory variable.
The table shows the average superannuation account balance for workers of various ages in two different industries. The coefficient of determination, $R^2$, for age versus account balance is 0.95 for industry A and 0.96 for industry B. 40-year-old Leigh works in the industry for which age explains a higher percentage of the account balance variation. Tony is 10 years older than Leigh and works in the other industry.
| Age (years) | Account balance ($) Industry A | Account balance ($) Industry B |
|---|---|---|
| 22 | 7500 | 8100 |
| 32 | 42 000 | 60 000 |
| 42 | 98 000 | 120 000 |
| 52 | 160 000 | 210 000 |
| 62 | 290 000 | 360 000 |
| 72 | 400 000 | 480 000 |
Use linear models to predict the difference in current superannuation account balances for Leigh and Tony.
Reveal Answer
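Compare $R^2$ values: $0.96 > 0.95$.
So, age explains a higher percentage of the account balance variation for the industry B dataset.
Linear model for industry A:
Let $x$ be age (years) and $y$ be account balance (dollars), with $y = a + bx$
Using calculator, $a = -205\,520$ and $b = 7910$
Linear model for industry B:
Let $x$ be age (years) and $y$ be account balance (dollars), with $y = a + bx$
Using calculator, $a = -243\,440$ and $b = 9570$
40-year-old Leigh works in industry B; substitute $x = 40$: $y = -243\,440 + 9570 \times 40 = 139\,360$
Tony's age $= 40 + 10 = 50$
Tony works in industry A; substitute $x = 50$: $y = -205\,520 + 7910 \times 50 = 189\,980$
Difference $= 189\,980 - 139\,360 = 50\,620$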
The difference in account balances for Leigh and Tony is predicted to be $50 620.
Response
| Descriptor | Marks |
|---|---|
| correctly identifies dataset for which age explains a higher percentage of the account balance variation | 1 |
| correctly determines linear model for age vs account balance for industry A data | 1 |
| correctly determines linear model for age vs account balance for industry B data | 1 |
| substitutes x = 40 into appropriate equation and calculates Leigh’s current account balance | 1 |
| substitutes x = 50 into appropriate equation and calculates Tony’s current account balance | 1 |
| calculates difference in current account balances for Leigh and Tony | 1 |
Communication
| Descriptor | Marks |
|---|---|
| shows logical organisation communicating key steps | 1 |
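The two linear models and the final difference can be verified numerically. The following sketch fits one least-squares line per industry from the table data, then evaluates the industry B line at age 40 (Leigh) and the industry A line at age 50 (Tony); `least_squares` is a hypothetical helper defined here for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ages = [22, 32, 42, 52, 62, 72]
balance_a = [7500, 42000, 98000, 160000, 290000, 400000]
balance_b = [8100, 60000, 120000, 210000, 360000, 480000]

slope_a, intercept_a = least_squares(ages, balance_a)
slope_b, intercept_b = least_squares(ages, balance_b)

# Industry B has the higher coefficient of determination (0.96), so Leigh is in B
leigh = intercept_b + slope_b * 40
# Tony is 10 years older and works in the other industry (A)
tony = intercept_a + slope_a * 50

difference = tony - leigh
print(round(leigh), round(tony), round(difference))
```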
Table 1 shows the latitude and ultraviolet index for Australian locations at noon on the first day of autumn. Table 2 categorises the ultraviolet index.
Table 1
| Location | Latitude (°S) | Ultraviolet index |
|---|---|---|
| Brisbane | 27 | 12 |
| Darwin | 12 | 13 |
| Melbourne | 38 | 6 |
| Perth | 32 | 11 |
| Sydney | 34 | 9 |
Table 2
| Ultraviolet index | Category |
|---|---|
| 11+ | extreme |
| 8, 9, 10 | very high |
| 6, 7 | high |
| 3, 4, 5 | moderate |
| 1, 2 | low |
A person in Hobart (43°S 147°E) at noon on the first day of autumn receives a phone app notification that the ultraviolet index is high.
Use the equation for the least-squares line for the data in table 1 and the information in table 2 to evaluate the reasonableness of the phone app notification.
Reveal Answer
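slope, $b \approx -0.227$
vertical axis intercept, $a \approx 16.7$
Let $x = 43$ (Hobart's latitude in °S, to the nearest degree): $y = 16.7 - 0.227 \times 43 \approx 6.9$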
The predicted ultraviolet index is 7.
The notification is reasonable because an ultraviolet index of 7 corresponds to high.
| Descriptor | Marks |
|---|---|
| correctly determines the values for the slope and vertical axis intercept | 1 |
| determines least-squares line equation | 1 |
| substitutes latitude into least-squares line equation | 1 |
| predicts UV index | 1 |
| provides appropriate statement of reasonableness linked to prior working | 1 |
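The evaluation can be sketched in Python as follows. The data come from Table 1 and the category boundaries from Table 2. The latitude of 43 used for Hobart is an assumption (its latitude to the nearest degree), and `least_squares` and `uv_category` are hypothetical helpers written for this illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def uv_category(index):
    """Map a whole-number UV index (1 or more) to its Table 2 category."""
    if index >= 11:
        return "extreme"
    if index >= 8:
        return "very high"
    if index >= 6:
        return "high"
    if index >= 3:
        return "moderate"
    return "low"


# Brisbane, Darwin, Melbourne, Perth, Sydney
latitude = [27, 12, 38, 32, 34]
uv_index = [12, 13, 6, 11, 9]

slope, intercept = least_squares(latitude, uv_index)

hobart_latitude = 43  # assumption: Hobart's latitude to the nearest degree
predicted = intercept + slope * hobart_latitude
category = uv_category(round(predicted))

print(round(slope, 3), round(intercept, 1), round(predicted), category)
```

Hobart's latitude is beyond the largest latitude in Table 1 (38°S), so this prediction is an extrapolation.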
A calculator is used to determine the equation of the least-squares line for the plant growth data in the table.
| Number of days | 6 | 15 | 20 | 24 | 35 |
|---|---|---|---|---|---|
| Height of plant | 12 | 14 | 16 | 18 | 30 |
What is the correct equation?
Reveal Answer
This option incorrectly identifies the independent and dependent variables. Since height depends on the number of days, the equation should give height in terms of the number of days.
Using linear regression on the data points yields a slope of approximately 0.61 and a y-intercept of approximately 5.71, resulting in the equation height $\approx 0.61 \times$ days $+ 5.71$.
This option incorrectly swaps the variables and also swaps the values for the slope and y-intercept.
This option incorrectly swaps the slope and the y-intercept. The calculated slope is approximately 0.61, not 5.71.
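As a check on the calculator result, the slope and intercept can be computed from the raw sums. This sketch uses the standard summation form of the least-squares formulas with the table data; the variable names are illustrative.

```python
days = [6, 15, 20, 24, 35]
height = [12, 14, 16, 18, 30]

n = len(days)
sum_x = sum(days)
sum_y = sum(height)
sum_xy = sum(x * y for x, y in zip(days, height))
sum_xx = sum(x * x for x in days)

# Summation form of the least-squares slope and intercept,
# with days as the explanatory variable and height as the response
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

print(round(slope, 2), round(intercept, 2))
```

Swapping the two lists would regress days on height and produce a different line, which is the error described in the incorrect options.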
The equation of a fitted line for the number of free throws in basketball, , and the number of hours in a training session, , is
The predicted number of free throws for a 5-hour training session, when rounded to the nearest whole number, is
64
65
91
92
Reveal Answer
64
This value is close to the result of multiplying the slope by the 5 hours, but it ignores the y-intercept entirely.
65
This is the product of the slope and 5 hours, rounded to the nearest whole number. It accounts for the rate of change but fails to add the initial constant (the y-intercept).
91
This option rounds the calculated value down (truncates) instead of rounding to the nearest whole number.
92
Substitute 5 hours into the equation. Rounding the result to the nearest whole number gives 92.
Hiroki believes that more fish are caught on warmer days. Jiro believes that the number of fish caught in a day is more dependent on the number of people fishing.
Bivariate datasets for six days are shown.
| Temperature (°C) | 32 | 26 | 20 | 27 | 23 | 29 |
|---|---|---|---|---|---|---|
| Number of fish caught | 530 | 400 | 320 | 220 | 180 | 120 |

| Number of people fishing | 46 | 58 | 38 | 34 | 30 | 28 |
|---|---|---|---|---|---|---|
| Number of fish caught | 530 | 400 | 320 | 220 | 180 | 120 |
Calculate the correlation coefficient for each dataset and use the results to identify the explanatory variable for the stronger linear association. Use the least-squares line equation for the stronger linear association to predict the number of fish caught on a 25 °C day when 50 people are fishing.
Reveal Answer
Calculate correlation coefficient for each dataset.
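| Dataset | Correlation coefficient, $r$ |
|---|---|
| Temperature vs number of fish caught | 0.3 |
| Number of people fishing vs number of fish caught | 0.8 |
The explanatory variable for the stronger linear association is the number of people fishing.
Using calculator, slope $\approx 10.89 \approx 11$ and vertical axis intercept $\approx -129.84 \approx -130$
Equation in terms of given variables is number of fish caught $= 11 \times$ (number of people fishing) $- 130$
Substituting 50 people fishing: $11 \times 50 - 130 = 420$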
It is predicted that 420 fish will be caught.
| Descriptor | Marks |
|---|---|
| correctly calculates correlation coefficient for each dataset | 1 |
| identifies explanatory variable for stronger linear association | 1 |
| determines least-squares line equation for dataset with stronger linear association | 1 |
| substitutes value for relevant explanatory variable | 1 |
| predicts number of fish caught | 1 |
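The comparison of the two datasets can be reproduced in Python. This sketch computes Pearson's correlation coefficient for each dataset and fits the least-squares line for the stronger one. `pearson_r` and `least_squares` are hypothetical helpers written for this illustration. The sample answer of 420 appears to rely on coefficients rounded to whole numbers, which is an inference from the stated result; the unrounded line gives a slightly lower prediction.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


temperature = [32, 26, 20, 27, 23, 29]
people = [46, 58, 38, 34, 30, 28]
fish = [530, 400, 320, 220, 180, 120]

r_temperature = pearson_r(temperature, fish)
r_people = pearson_r(people, fish)

# People fishing has the larger r, so it is the explanatory variable to use
slope, intercept = least_squares(people, fish)

exact_prediction = intercept + slope * 50
# Assumption: coefficients rounded to whole numbers, as the sample answer implies
rounded_prediction = round(intercept) + round(slope) * 50

print(round(r_temperature, 1), round(r_people, 1))
print(round(exact_prediction), rounded_prediction)
```

The temperature of 25 °C is not used in the prediction, because temperature is not the explanatory variable of the chosen model.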
The least-squares line for a sample of five data points was found to be , with a correlation coefficient of .
Determine a set of values for and , given that these values differ by 3.
| 4 | 3 | 8 | 4 | 6 | |
|---|---|---|---|---|---|
| 4 | 16 | 8 |
Reveal Answer
From the table of values
Using
From the table
If then
| Descriptor | Marks |
|---|---|
| Correctly identifies the and values | 1 |
| Correctly determines | 1 |
| Determines | 1 |
| Determines sum of missing values | 1 |
| Determines values for and | 1 |
| Shows logical organisation, communicating key steps | 1 |
In a company’s first 10 years of operation, the average annual profit ($\bar{y}$) was \$9660 with a standard deviation ($s_y$) of \$3010. Fitting a least-squares line to the data comparing annual profit ($y$) to the year of operation ($x$) produced a correlation coefficient of 0.9987.
Show that the predicted profit, to the nearest dollar, for this company in the 11th year of operation will be $15 121.
Reveal Answer
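Year of operation ($x$) parameters for years 1 to 10: $\bar{x} = 5.5$, $s_x \approx 3.0277$
Given $\bar{y} = 9660$, $s_y = 3010$, $r = 0.9987$
Least-squares line parameters: $b = r\frac{s_y}{s_x} = 0.9987 \times \frac{3010}{3.0277} \approx 992.88$ and $a = \bar{y} - b\bar{x} = 9660 - 992.88 \times 5.5 \approx 4199.17$
Profit in the 11th year: $y = a + bx = 4199.17 + 992.88 \times 11 \approx 15\,120.8$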
Predicted profit in the 11th year is $15 121.
| Descriptor | Marks |
|---|---|
| correctly determines $\bar{x}$ and $s_x$ | 1 |
| determines $b$ | 1 |
| determines $a$ | 1 |
| determines 11th year profit to the nearest dollar | 1 |
| shows logical organisation communicating key steps | 1 |
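The working for this question can be confirmed from the summary statistics alone. The sketch below assumes the years of operation are 1 to 10 and that the sample standard deviation (divisor n − 1) is intended for the year values, matching how the profit standard deviation would usually be reported; both assumptions are consistent with the stated result of $15 121.

```python
from statistics import mean, stdev

# Given summary statistics for annual profit (y)
mean_y = 9660
s_y = 3010
r = 0.9987

# Assumption: years of operation are 1, 2, ..., 10
years = list(range(1, 11))
mean_x = mean(years)
s_x = stdev(years)  # sample standard deviation (divisor n - 1)

# Least-squares line parameters from summary statistics
b = r * s_y / s_x
a = mean_y - b * mean_x

profit_year_11 = a + b * 11
print(round(b, 2), round(a, 2), round(profit_year_11))
```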