QCAA General Mathematics Bivariate data analysis 2

15 sample questions with marking guides and sample answers

Q9
2023
VCAA
Paper 1
1 mark

A least squares line can be used to model the birth rate (children per 1000 population) in a country from the average daily food energy intake (megajoules) in that country.

When a least squares line is fitted to data from a selection of countries it is found that:

  • for a country with an average daily food energy intake of 8.53 megajoules, the birth rate will be 32.2 children per 1000 population
  • for a country with an average daily food energy intake of 14.9 megajoules, the birth rate will be 9.9 children per 1000 population.

The slope of this least squares line is closest to

A

$-4.7$

B

$-3.5$

C

$-0.29$

D

2.7

E

25

Reveal Answer
A

$-4.7$

Incorrect. This value does not match the result of the slope formula $m = \frac{y_2 - y_1}{x_2 - x_1}$.

B

$-3.5$

Correct Answer

Correct. The slope is the change in birth rate divided by the change in energy intake: $m = \frac{9.9 - 32.2}{14.9 - 8.53} = \frac{-22.3}{6.37} \approx -3.5$.

C

$-0.29$

Incorrect. This is the reciprocal of the slope, incorrectly calculated by dividing the change in $x$ by the change in $y$ ($\frac{6.37}{-22.3} \approx -0.29$).

D

2.7

Incorrect. The data shows that as energy intake increases, birth rate decreases, meaning the slope must be negative.

E

25

Incorrect. This value is positive, but the inverse relationship between energy intake and birth rate requires a negative slope.
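As a quick check of the working above, the slope can be computed directly from the two points given in the question. This is a minimal sketch; the variable names are chosen here for illustration only.

```python
# Two points on the fitted line: (energy intake in MJ, birth rate per 1000).
x1, y1 = 8.53, 32.2
x2, y2 = 14.9, 9.9

# Slope = change in birth rate divided by change in energy intake.
slope = (y2 - y1) / (x2 - x1)
print(round(slope, 1))
```

The negative sign confirms that the predicted birth rate falls as energy intake rises.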

Q4
2021
QCAA
Paper 1
1 mark

A confounding variable is a variable that

A

can only take on a certain number of values.

B

remains constant throughout a statistical investigation.

C

is used to predict a difference in the response variable.

D

other than the explanatory variable, influences the response variable.

Reveal Answer
A

can only take on a certain number of values.

This describes a discrete variable, which is defined by having a countable number of possible values, rather than a confounding variable.

B

remains constant throughout a statistical investigation.

A value that remains constant is simply a constant or a controlled variable, whereas a confounding variable varies and affects the outcome.

C

is used to predict a difference in the response variable.

This describes the explanatory (or independent) variable, which is the specific factor the researcher is studying to see if it causes a change.

D

other than the explanatory variable, influences the response variable.

Correct Answer

A confounding variable is an outside influence that affects the response variable and is related to the explanatory variable, making it difficult to determine the true cause of the observed effect.

Q10
2024
VCAA
Paper 1
1 mark

Use the following information to answer the question.

The least squares equation for the relationship between the average number of male athletes per competing nation, males, and the number of the Summer Olympic Games, number, is

$males = 67.5 - 1.27 \times number$

At which Summer Olympic Games will the predicted average number of males be closest to 25.6?

A

31st

B

32nd

C

33rd

D

34th

Reveal Answer
A

31st

Incorrect. Substituting 31 into the equation yields $67.5 - 1.27 \times 31 = 28.13$, which is not the closest value to 25.6.

B

32nd

Incorrect. Substituting 32 into the equation yields $67.5 - 1.27 \times 32 = 26.86$, which is not the closest value to 25.6.

C

33rd

Correct Answer

Correct. Setting the equation to $25.6 = 67.5 - 1.27 \times number$ and solving for $number$ gives $41.9 / 1.27 \approx 32.99$, making the 33rd Games the closest.

D

34th

Incorrect. Substituting 34 into the equation yields $67.5 - 1.27 \times 34 = 24.32$, which is not the closest value to 25.6.
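The two approaches used in the explanations above (solving the equation for the Games number, and substituting each option in turn) can be sketched as follows. The function name `predicted_males` is an illustrative assumption, not part of the question.

```python
INTERCEPT = 67.5
SLOPE = -1.27
TARGET = 25.6

def predicted_males(number):
    """Predicted average number of male athletes at a given Games number."""
    return INTERCEPT + SLOPE * number

# Approach 1: rearrange the equation to solve for the Games number.
exact = (TARGET - INTERCEPT) / SLOPE

# Approach 2: test each option and keep the one with the smallest gap.
closest = min(range(31, 35), key=lambda n: abs(predicted_males(n) - TARGET))

print(round(exact, 2), closest)
```

Both approaches point to the same option, because the exact solution lies very close to a whole number.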

Q9
2020
QCAA
Paper 1
1 mark

It is observed that as the number of ice blocks sold each month increases, the number of fans sold also increases. Which of these statements is therefore true?

A

There is a negative causation between the number of ice blocks sold and the number of fans sold each month.

B

There is a positive causation between the number of ice blocks sold and the number of fans sold each month.

C

There is a negative association between the number of ice blocks sold and the number of fans sold each month.

D

There is a positive association between the number of ice blocks sold and the number of fans sold each month.

Reveal Answer
A

There is a negative causation between the number of ice blocks sold and the number of fans sold each month.

This is incorrect because a 'negative' relationship implies that as one variable increases, the other decreases. Additionally, observational data establishes association, not necessarily causation.

B

There is a positive causation between the number of ice blocks sold and the number of fans sold each month.

This is incorrect because correlation does not imply causation. While the variables move together, it is likely a third variable (like hot weather) causes both to increase, rather than ice blocks causing fan sales.

C

There is a negative association between the number of ice blocks sold and the number of fans sold each month.

This is incorrect because a negative association describes an inverse relationship where one variable decreases as the other increases, contradicting the observation that both increase.

D

There is a positive association between the number of ice blocks sold and the number of fans sold each month.

Correct Answer

This is correct because 'positive' indicates that both variables move in the same direction (both increase), and 'association' correctly describes the statistical relationship without assuming one causes the other.

Q5
2020
QCAA
Paper 1
1 mark

Determine the equation of the least-squares line where $r = 0.926$, $\bar{x} = 5.2$, $s_x = 2.3$, $\bar{y} = 68.6$ and $s_y = 41.7$.

A

$y = 16.79x - 1146.51$

B

$y = 16.79x - 18.70$

C

$y = 0.05x + 68.33$

D

$y = 0.05x + 1.70$

Reveal Answer
A

$y = 16.79x - 1146.51$

While the slope ($16.79$) is correct, the y-intercept is incorrect. The intercept should be calculated as $b_0 = \bar{y} - b_1\bar{x} = 68.6 - (16.79)(5.2) \approx -18.70$.

B

$y = 16.79x - 18.70$

Correct Answer

This is the correct equation. The slope is $b_1 = r \frac{s_y}{s_x} = 0.926\left(\frac{41.7}{2.3}\right) \approx 16.79$, and the y-intercept is $b_0 = \bar{y} - b_1\bar{x} = 68.6 - 16.79(5.2) \approx -18.70$.

C

$y = 0.05x + 68.33$

This option incorrectly calculates the slope by inverting the ratio of standard deviations ($r \frac{s_x}{s_y}$). The correct slope formula is $b_1 = r \frac{s_y}{s_x}$.

D

$y = 0.05x + 1.70$

This option uses an incorrect slope ($0.05$) derived from inverting the standard deviations and an incorrect intercept calculation.
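The summary-statistics formulas quoted in the explanations above can be checked with a few lines of arithmetic. This is a minimal sketch using only the values given in the question; the variable names are illustrative.

```python
# Summary statistics from the question.
r, x_bar, s_x, y_bar, s_y = 0.926, 5.2, 2.3, 68.6, 41.7

# Slope: b1 = r * s_y / s_x; intercept: b0 = y_bar - b1 * x_bar.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(f"y = {slope:.2f}x + ({intercept:.2f})")

# The common error in options C and D: inverting the ratio of standard deviations.
inverted_slope = r * s_x / s_y
```

Keeping the unrounded slope when calculating the intercept avoids a small rounding drift in the second decimal place.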

Q14
2025
QCAA
Paper 1
1 mark

A non-causal explanation is most likely for a strong association between which pair of variables?

A

a town's child population and its number of schools

B

a phone's hours of use and its remaining battery percentage

C

the amount of bread sold by a bakery and its number of customers

D

the number of cars at a petrol station and the height of the car drivers

Reveal Answer
A

a town's child population and its number of schools

An increase in a town's child population directly causes a need for more schools, making this a causal relationship.

B

a phone's hours of use and its remaining battery percentage

Using a phone directly causes its battery to drain, which is a clear causal relationship.

C

the amount of bread sold by a bakery and its number of customers

An increase in the number of customers directly causes an increase in the amount of bread sold, indicating a causal relationship.

D

the number of cars at a petrol station and the height of the car drivers

Correct Answer

There is no logical reason why the number of cars at a petrol station would cause a change in the drivers' heights, or vice versa. Any strong association between these variables would likely be non-causal, such as a coincidence.

Q16
2022
QCAA
Paper 1
3 marks

The table shows the number of sales for a small business in their first six months of trading.

| Time in months, $t$ | Number of sales, $n$ |
| --- | --- |
| 1 | 86 |
| 2 | 180 |
| 3 | 160 |
| 4 | 226 |
| 5 | 240 |
| 6 | 335 |

Q16a
1 mark

Use your calculator to determine the equation of the least-squares line.

Reveal Answer

$n = 42.6t + 55.4$

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| Correctly determines the equation of the least-squares line | 1 |

Q16b
2 marks

Use the equation from Question 16a) to predict the number of sales in the 21st month.

Reveal Answer

Let $t = 21$
$n = 42.6(21) + 55.4$
$= 950$

The predicted number of sales is 950.

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| Substitutes into equation from Question 16a) | 1 |
| Predicts number of sales | 1 |

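For Q16, the calculator step can be reproduced by hand. The sketch below assumes a small helper, `least_squares`, written here for illustration; it applies the standard least-squares formulas (slope equals the sum of cross-deviations divided by the sum of squared $x$ deviations) to the sales table.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

months = [1, 2, 3, 4, 5, 6]
sales = [86, 180, 160, 226, 240, 335]

slope, intercept = least_squares(months, sales)
prediction = intercept + slope * 21  # month 21 is far outside the observed range

print(round(slope, 1), round(intercept, 1), round(prediction))
```

Month 21 lies well beyond the six months of data, so the prediction is an extrapolation and should be treated with caution.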
Q7
2023
QCAA
Paper 1
1 mark

Which statement is always true for a causal relationship between an explanatory variable and a response variable?

A

One of the variables is a confounding variable.

B

The relationship is explained by a third variable.

C

There is a positive association between the variables.

D

The response variable is dependent on the explanatory variable.

Reveal Answer
A

One of the variables is a confounding variable.

A confounding variable is an external third variable that influences both the explanatory and response variables; the explanatory and response variables themselves are not confounders.

B

The relationship is explained by a third variable.

If a relationship is entirely explained by a third variable, the association is often spurious rather than causal; a direct causal relationship implies the explanatory variable itself influences the response.

C

There is a positive association between the variables.

Causal relationships can be negative (inverse) as well as positive; for example, increased price often causes a decrease in sales.

D

The response variable is dependent on the explanatory variable.

Correct Answer

In a causal relationship, the explanatory variable is the cause and the response variable is the effect, meaning changes in the response variable depend on changes in the explanatory variable.

Q6
2023
QCAA
Paper 2
7 marks

The table shows the average superannuation account balance for workers of various ages in two different industries. The coefficient of determination, $R^2$, for age versus account balance is 0.95 for industry A and 0.96 for industry B. 40-year-old Leigh works in the industry for which age explains a higher percentage of the account balance variation. Tony is 10 years older than Leigh and works in the other industry.

| Age (years) | Account balance ($), Industry A | Account balance ($), Industry B |
| --- | --- | --- |
| 22 | 7500 | 8100 |
| 32 | 42 000 | 60 000 |
| 42 | 98 000 | 120 000 |
| 52 | 160 000 | 210 000 |
| 62 | 290 000 | 360 000 |
| 72 | 400 000 | 480 000 |

Use linear models to predict the difference in current superannuation account balances for Leigh and Tony.

Reveal Answer

Compare $R^2$ values: $0.95 < 0.96$.
So, age explains a higher percentage of the account balance variation for the industry B dataset.

Linear model for industry A:
Let $x = \text{age}$, $y = \text{account balance}$
$y = bx + a$
Using calculator, $b = 7910$ and $a = -205\,520$
$y = 7910x - 205\,520$

Linear model for industry B:
Let $x = \text{age}$, $y = \text{account balance}$
$y = bx + a$
Using calculator, $b = 9570$ and $a = -243\,440$
$y = 9570x - 243\,440$

40-year-old Leigh works in industry B; substitute $x = 40$
$y = 9570 \times 40 - 243\,440$
$= 139\,360$

Tony's age $= 40 + 10 = 50$
Tony works in industry A; substitute $x = 50$
$y = 7910 \times 50 - 205\,520$
$= 189\,980$

Difference $= 189\,980 - 139\,360$
$= 50\,620$
The difference in account balances for Leigh and Tony is predicted to be $50 620.

Marking Criteria

Response

| Descriptor | Marks |
| --- | --- |
| correctly identifies dataset for which age explains a higher percentage of the account balance variation | 1 |
| correctly determines linear model for age vs account balance for industry A data | 1 |
| correctly determines linear model for age vs account balance for industry B data | 1 |
| substitutes x = 40 into appropriate equation and calculates Leigh’s current account balance | 1 |
| substitutes x = 50 into appropriate equation and calculates Tony’s current account balance | 1 |
| calculates difference in current account balances for Leigh and Tony | 1 |

Communication

| Descriptor | Marks |
| --- | --- |
| shows logical organisation communicating key steps | 1 |

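The superannuation working above fits one line per industry and then substitutes each person's age into the appropriate model. A minimal sketch of those steps follows; the helper `least_squares` and the variable names are assumptions made for illustration, standing in for the calculator's regression function.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

ages = [22, 32, 42, 52, 62, 72]
industry_a = [7500, 42000, 98000, 160000, 290000, 400000]
industry_b = [8100, 60000, 120000, 210000, 360000, 480000]

b_a, a_a = least_squares(ages, industry_a)
b_b, a_b = least_squares(ages, industry_b)

# Industry B has the higher R^2 (0.96 > 0.95), so Leigh (40) is in B and Tony (50) is in A.
leigh = a_b + b_b * 40
tony = a_a + b_a * 50
difference = tony - leigh

print(round(leigh), round(tony), round(difference))
```

The industry assignment is the step most easily missed: the $R^2$ comparison decides which model each person's age is substituted into.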
Q3
2024
QCAA
Paper 2
5 marks

Table 1 shows the latitude, $x$, and ultraviolet index, $y$, for Australian locations at noon on the first day of autumn. Table 2 categorises the ultraviolet index.

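Table 1

| Location | Latitude ($^\circ$S) | Ultraviolet index |
| --- | --- | --- |
| Brisbane | 27 | 12 |
| Darwin | 12 | 13 |
| Melbourne | 38 | 6 |
| Perth | 32 | 11 |
| Sydney | 34 | 9 |

Table 2

| Ultraviolet index | Category |
| --- | --- |
| 11+ | extreme |
| 8, 9, 10 | very high |
| 6, 7 | high |
| 3, 4, 5 | moderate |
| 1, 2 | low |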

A person in Hobart ($43^\circ$ S, $147^\circ$ E) at noon on the first day of autumn receives a phone app notification that the ultraviolet index is high.

Use the equation for the least-squares line for the data in table 1 and the information in table 2 to evaluate the reasonableness of the phone app notification.

Reveal Answer

slope, $b = -0.227$
vertical axis intercept, $a = 16.7$

$y = a + bx$

$y = 16.7 - 0.227x$

Let $x = 43$
$y = 16.7 - 0.227(43)$

$y = 6.9$

The predicted ultraviolet index is 7.

The notification is reasonable because an ultraviolet index of 7 corresponds to high.

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| correctly determines the values for the slope and vertical axis intercept | 1 |
| determines least-squares line equation | 1 |
| substitutes latitude into least-squares line equation | 1 |
| predicts UV index | 1 |
| provides appropriate statement of reasonableness linked to prior working | 1 |

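The ultraviolet index working above can be sketched in code: fit the line to Table 1, predict at Hobart's latitude, then look up the category from Table 2. The helpers `least_squares` and `uv_category` are illustrative assumptions; in particular, Table 2 does not list values below 1, so treating them as "low" is an assumption of this sketch.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def uv_category(index):
    """Map a whole-number UV index to its Table 2 category (below 1 assumed 'low')."""
    if index >= 11:
        return "extreme"
    if index >= 8:
        return "very high"
    if index >= 6:
        return "high"
    if index >= 3:
        return "moderate"
    return "low"

latitudes = [27, 12, 38, 32, 34]   # Brisbane, Darwin, Melbourne, Perth, Sydney
uv_index = [12, 13, 6, 11, 9]

slope, intercept = least_squares(latitudes, uv_index)
hobart_uv = round(intercept + slope * 43)  # Hobart is at 43 degrees S
category = uv_category(hobart_uv)

print(round(slope, 3), round(intercept, 1), hobart_uv, category)
```

Hobart's latitude of 43 is outside the range of the data (12 to 38), so the prediction is an extrapolation, although only a modest one.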
Q14
2023
QCAA
Paper 1
1 mark

A calculator is used to determine the equation of the least-squares line for the plant growth data in the table.

| Number of days, $d$ | 6 | 15 | 20 | 24 | 35 |
| --- | --- | --- | --- | --- | --- |
| Height of plant, $h$ | 12 | 14 | 16 | 18 | 30 |

What is the correct equation?

A

$d = 0.6h + 5.7$

B

$h = 0.6d + 5.7$

C

$d = 5.7h + 0.6$

D

$h = 5.7d + 0.6$

Reveal Answer
A

$d = 0.6h + 5.7$

This option incorrectly identifies the independent and dependent variables. Since height ($h$) depends on the number of days ($d$), the equation should be solved for $h$.

B

$h = 0.6d + 5.7$

Correct Answer

Using linear regression on the data points yields a slope of approximately $0.6$ and a y-intercept of approximately $5.7$, resulting in the equation $h = 0.6d + 5.7$.

C

$d = 5.7h + 0.6$

This option incorrectly swaps the variables and also swaps the values for the slope and y-intercept.

D

$h = 5.7d + 0.6$

This option incorrectly swaps the slope and the y-intercept. The calculated slope is approximately $0.6$, not $5.7$.
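To confirm the plant growth equation, the regression can be run with days as the explanatory variable and height as the response. The helper `least_squares` below is an illustrative stand-in for the calculator function, not something named in the question.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

days = [6, 15, 20, 24, 35]      # explanatory variable, d
height = [12, 14, 16, 18, 30]   # response variable, h

slope, intercept = least_squares(days, height)
print(f"h = {slope:.1f}d + {intercept:.1f}")
```

Passing the lists in the opposite order would regress days on height, which is the variable mix-up behind options A and C.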

Q11
2022
QCAA
Paper 1
1 mark

The equation of a fitted line for the number of free throws in basketball, $t$, and the number of hours in a training session, $h$, is $t = 26.781 + 12.974h$.

The predicted number of free throws for a 5-hour training session, when rounded to the nearest whole number, is

A

64

B

65

C

91

D

92

Reveal Answer
A

64

This value is close to the result of multiplying the slope by the hours ($12.974 \times 5 = 64.87$), but it ignores the y-intercept ($26.781$) entirely.

B

65

This is the result of $12.974 \times 5$ rounded to the nearest whole number. It accounts for the rate of change but fails to add the initial constant (y-intercept) of $26.781$.

C

91

The calculated value is $91.651$. This option incorrectly rounds down (truncates) instead of rounding to the nearest whole number.

D

92

Correct Answer

Substitute $h = 5$ into the equation: $t = 26.781 + 12.974(5) = 26.781 + 64.87 = 91.651$. Rounding to the nearest whole number gives $92$.
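The four options in this question correspond to four slightly different calculations, which the short sketch below reproduces. The variable names are illustrative assumptions.

```python
INTERCEPT = 26.781
SLOPE = 12.974
hours = 5

full = INTERCEPT + SLOPE * hours   # correct substitution
slope_only = SLOPE * hours         # forgets the intercept

option_a = int(slope_only)    # truncated, intercept ignored
option_b = round(slope_only)  # rounded, intercept ignored
option_c = int(full)          # truncated instead of rounded
option_d = round(full)        # rounded to the nearest whole number

print(option_a, option_b, option_c, option_d)
```

Only the last calculation both includes the intercept and rounds correctly.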

Q4
2023
QCAA
Paper 2
5 marks

Hiroki believes that more fish are caught on warmer days. Jiro believes that the number of fish caught in a day is more dependent on the number of people fishing.

Bivariate datasets for six days are shown.

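| Temperature, $t$ ($^\circ$C) | 32 | 26 | 20 | 27 | 23 | 29 |
| --- | --- | --- | --- | --- | --- | --- |
| Number of fish caught, $f$ | 530 | 400 | 320 | 220 | 180 | 120 |

| Number of people fishing, $p$ | 46 | 58 | 38 | 34 | 30 | 28 |
| --- | --- | --- | --- | --- | --- | --- |
| Number of fish caught, $f$ | 530 | 400 | 320 | 220 | 180 | 120 |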

Calculate the correlation coefficient for each dataset and use the results to identify the explanatory variable for the stronger linear association. Use the least-squares line equation for the stronger linear association to predict the number of fish caught on a 25 $^\circ$C day when 50 people are fishing.

Reveal Answer

Calculate correlation coefficient for each dataset.

| Dataset | Correlation coefficient, $r$ |
| --- | --- |
| $t$ vs $f$ | 0.3 |
| $p$ vs $f$ | 0.8 |

$0.8 > 0.3$
The explanatory variable for the stronger linear association is $p$, number of people fishing.

$y = a + bx$
Using calculator, $a = -130$, $b = 11$
Equation in terms of given variables is
$f = -130 + 11p$

$f = -130 + 11 \times 50$
$= 420$
It is predicted that 420 fish will be caught.

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| correctly calculates correlation coefficient for each dataset | 1 |
| identifies explanatory variable for stronger linear association | 1 |
| determines least-squares line equation for dataset with stronger linear association | 1 |
| substitutes value for relevant explanatory variable | 1 |
| predicts number of fish caught | 1 |

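The fishing question compares two correlation coefficients before fitting a line to the stronger association. The sketch below follows those steps; the helpers `deviation_sums`, `correlation` and `least_squares` are illustrative assumptions written for this example, and the coefficients are rounded to match the sample answer.

```python
from math import sqrt

def deviation_sums(xs, ys):
    """Return (x_bar, y_bar, Sxx, Syy, Sxy) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, syy, sxy

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    _, _, sxx, syy, sxy = deviation_sums(xs, ys)
    return sxy / sqrt(sxx * syy)

def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    x_bar, y_bar, sxx, _, sxy = deviation_sums(xs, ys)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

temps = [32, 26, 20, 27, 23, 29]
people = [46, 58, 38, 34, 30, 28]
fish = [530, 400, 320, 220, 180, 120]

r_temp = correlation(temps, fish)
r_people = correlation(people, fish)

slope, intercept = least_squares(people, fish)
a, b = round(intercept), round(slope)   # rounded as in the sample answer
prediction = a + b * 50                 # temperature is not used in this model
unrounded_prediction = intercept + slope * 50

print(round(r_temp, 1), round(r_people, 1), a, b, prediction)
```

The sample answer rounds the coefficients before substituting; carrying the unrounded values through gives a somewhat lower prediction of about 415, so the degree of rounding matters here.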
Q3
2020
QCAA
Paper 2
6 marks

The least-squares line for a sample of five data points was found to be $y = 2.1875x + 0.0625$, with a correlation coefficient of $r = 0.875$.

Determine a set of values for $p$ and $q$, given that these values differ by 3.

| $x$ | 4 | 3 | 8 | 4 | 6 |
| --- | --- | --- | --- | --- | --- |
| $y$ | $p$ | 4 | 16 | 8 | $q$ |

Reveal Answer

$y = 2.1875x + 0.0625$
$\therefore b = 2.1875$
$a = 0.0625$

From the table of values
$\bar{x} = 5$

Using $a$
$a = \bar{y} - b\bar{x}$
$0.0625 = \bar{y} - 2.1875 \times 5$
$\therefore \bar{y} = 11$

From the table
$\bar{y} = \frac{\Sigma y}{n}$
$\therefore 11 = \frac{4 + 8 + p + q + 16}{5}$
$\therefore 55 = 28 + p + q$
$\therefore p + q = 27$

If $q = p + 3$ then
$p + p + 3 = 27$
$\therefore 2p = 24$
$\therefore p = 12$
$\therefore q = 15$

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| Correctly identifies the $a$ and $b$ values | 1 |
| Correctly determines $\bar{x}$ | 1 |
| Determines $\bar{y}$ | 1 |
| Determines sum of missing values | 1 |
| Determines values for $p$ and $q$ | 1 |
| Shows logical organisation, communicating key steps | 1 |

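The working above relies on the fact that a least-squares line passes through $(\bar{x}, \bar{y})$. The sketch below repeats that reasoning and then, as an extra check that goes beyond the sample answer, refits the line for both orderings of the two values to see which one reproduces the given slope. The helper `fitted_slope` is an illustrative assumption.

```python
SLOPE, INTERCEPT = 2.1875, 0.0625
xs = [4, 3, 8, 4, 6]
known_sum = 4 + 16 + 8          # the three known y values
n = len(xs)

x_bar = sum(xs) / n
y_bar = INTERCEPT + SLOPE * x_bar      # the line passes through (x_bar, y_bar)
p_plus_q = n * y_bar - known_sum

smaller = (p_plus_q - 3) / 2
larger = (p_plus_q + 3) / 2

def fitted_slope(p, q):
    """Least-squares slope for the table with y = [p, 4, 16, 8, q]."""
    ys = [p, 4, 16, 8, q]
    mean_y = sum(ys) / n
    sxy = sum((x - x_bar) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

slope_if_q_larger = fitted_slope(smaller, larger)   # p = 12, q = 15
slope_if_p_larger = fitted_slope(larger, smaller)   # p = 15, q = 12

print(smaller, larger, slope_if_q_larger, slope_if_p_larger)
```

Under these assumptions, only the ordering with $q$ as the larger value returns the stated slope, which supports the choice $q = p + 3$ made in the sample answer.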
Q3
2022
QCAA
Paper 2
5 marks

In a company’s first 10 years of operation, the average annual profit ($\bar{y}$) was $9660 with a standard deviation ($s_y$) of $3010. Fitting a least-squares line to the data comparing annual profit ($y$) to the year of operation ($x$) produced a correlation coefficient of 0.9987.

Show that the predicted profit, to the nearest dollar, for this company in the 11th year of operation will be $15 121.

Reveal Answer

$x$ parameters
$x = 1, 2, \ldots, 10$
$\bar{x} = 5.5$
$s_x = 3.02765$
Given
$\bar{y} = 9660$
$s_y = 3010$
$r = 0.9987$

Least-squares line parameters
$b = r \frac{s_y}{s_x}$
$= 0.9987 \times \frac{3010}{3.02765}$
$= 992.878$

$a = \bar{y} - b\bar{x}$
$= 9660 - 992.878 \times 5.5$
$= 4199.17$

Profit in the 11th year
$y = a + bx$
$= 4199.17 + 992.878(11)$
$= 15\,120.83$
$\approx \$15\,121$

Predicted profit in the 11th year is $15 121.

Marking Criteria

| Descriptor | Marks |
| --- | --- |
| correctly determines $\bar{x}$ and $s_x$ | 1 |
| determines $b$ | 1 |
| determines $a$ | 1 |
| determines 11th year profit to the nearest dollar | 1 |
| shows logical organisation communicating key steps | 1 |
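The key step in the profit question is recognising that the $x$ values are the years 1 to 10, so $\bar{x}$ and $s_x$ can be calculated even though no table is given. The sketch below uses Python's standard `statistics` module, whose `stdev` function returns the sample standard deviation, matching the value of $s_x$ used in the sample answer; the variable names are illustrative.

```python
from statistics import mean, stdev

years = list(range(1, 11))   # years of operation: 1, 2, ..., 10
x_bar = mean(years)
s_x = stdev(years)           # sample standard deviation (divides by n - 1)

y_bar, s_y, r = 9660, 3010, 0.9987

b = r * s_y / s_x            # slope
a = y_bar - b * x_bar        # vertical axis intercept
profit_year_11 = a + b * 11

print(round(s_x, 5), round(b, 3), round(a, 2), round(profit_year_11))
```

Using the population standard deviation instead would change $s_x$ and therefore the predicted profit, so the choice of the sample formula matters for reaching the stated figure.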
