How many VCAA General Mathematics questions cover Data analysis, probability and statistics?

AusGrader has 356 VCAA General Mathematics questions on Data analysis, probability and statistics, all with instant AI grading and detailed marking feedback.


VCE General Mathematics — Data analysis, probability and statistics Questions & Answers

Q5

2025

VCAA

Paper 1

1 mark

The heights of females aged between 16 and 18 years within a population are normally distributed.

Analysis of the heights of this group of females showed that:

2.5% of the heights were greater than 178.9 cm
16% of the heights were less than 157.6 cm.

Using the 68–95–99.7% rule, the mean and standard deviation of the heights of these females are respectively

A

150.5 and 21.3

B

154.9 and 7.1

C

164.7 and 7.1

D

171.8 and 14.2

Reveal Answer

A

150.5 and 21.3

This option incorrectly uses the difference between the two given values ( $3\sigma = 21.3$ ) as the standard deviation instead of dividing by 3 to find $\sigma$ .

B

154.9 and 7.1

While the standard deviation of 7.1 is correct, the mean is miscalculated. The mean should be found by adding the standard deviation to 157.6.

C

164.7 and 7.1

Correct Answer

Using the 68-95-99.7% rule, the bottom 16% corresponds to $\mu - \sigma = 157.6$ and the top 2.5% corresponds to $\mu + 2\sigma = 178.9$ . Solving this system of equations yields a standard deviation of $\sigma = 7.1$ and a mean of $\mu = 164.7$ .

D

171.8 and 14.2

This option incorrectly calculates the standard deviation as $2\sigma = 14.2$ and subsequently miscalculates the mean.
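
The working for option C can be checked numerically. The following is a minimal Python sketch, assuming the usual reading of the 68–95–99.7% rule (16% of values lie below $\mu - \sigma$ and 2.5% lie above $\mu + 2\sigma$). The function name `solve_mean_sd` is illustrative and does not come from the question.

```python
def solve_mean_sd(low_value, high_value):
    """Solve mu - sigma = low_value and mu + 2*sigma = high_value."""
    # Subtracting the first equation from the second gives
    # 3*sigma = high_value - low_value.
    sigma = (high_value - low_value) / 3
    # Rearranging mu - sigma = low_value gives mu = low_value + sigma.
    mu = low_value + sigma
    return mu, sigma


mu, sigma = solve_mean_sd(157.6, 178.9)
print(round(mu, 1), round(sigma, 1))
```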

Q22

2020

QCAA

Paper 1

4 marks

A store asked its junior and senior staff whether or not they would like to change the store uniform.
The results are in the frequency table.

	Change uniform	Do not change uniform
Junior staff	92	28
Senior staff	23	67

Q22a

2 marks

Convert the two-way table into a percentaged two-way frequency table using column totals.

Reveal Answer

Total # change uniform = 115
Total # do not change = 95

	Change uniform	Do not change uniform
Junior staff	80%	29.5%
Senior staff	20%	70.5%
	100%	100%

Marking Criteria

Descriptor	Marks
Correctly determines column totals	1
Correctly represents the data in a percentaged two-way table	1
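
The column percentages in Q22a can be reproduced with a short calculation. This is a minimal Python sketch using the counts from the frequency table; the variable names are illustrative only.

```python
# Counts from the two-way frequency table in the question.
counts = {
    "Junior staff": {"Change uniform": 92, "Do not change uniform": 28},
    "Senior staff": {"Change uniform": 23, "Do not change uniform": 67},
}
columns = ["Change uniform", "Do not change uniform"]

# Column totals: add the counts in each column across both staff groups.
column_totals = {c: sum(row[c] for row in counts.values()) for c in columns}

# Each cell as a percentage of its column total, to one decimal place.
percentages = {
    group: {c: round(100 * row[c] / column_totals[c], 1) for c in columns}
    for group, row in counts.items()
}

for group, row in percentages.items():
    print(group, row)
```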

Q22b

2 marks

Explain whether there is an association between staff groups and a desire to change the store uniform.

Reveal Answer

There does appear to be an association between the staff groups and wanting to change the uniform.
The column percentages show that, of the staff who want to change the uniform, 80% are junior and only 20% are senior, whereas of the staff who do not want to change, 70.5% are senior and only 29.5% are junior.

Marking Criteria

Descriptor	Marks
Suggests the presence of an association	1
Provides reasons to support conclusion	1

Q3

2024

QCAA

Paper 1

1 mark

The coefficient of determination, $R^2$ , is equal to 0.36 for the linear association between $x$ (explanatory variable) and $y$ (response variable).

Which statement is correct?

A

36% of the variation in $x$ can be explained by the variation in $y$ .

B

36% of the total variation can be explained by the linear association.

C

36% of the predicted outcomes can be explained by the variation in $x$ .

D

36% of the variation in $x$ can be predicted by the linear association.

Reveal Answer

A

36% of the variation in $x$ can be explained by the variation in $y$ .

This reverses the variables; $R^2$ measures the proportion of variation in the response variable ( $y$ ) explained by the explanatory variable ( $x$ ), not the variation in $x$ explained by $y$ .

B

36% of the total variation can be explained by the linear association.

Correct Answer

The coefficient of determination, $R^2$ , is defined as the proportion of the total variation in the response variable ( $y$ ) that is explained by the linear relationship with the explanatory variable ( $x$ ).

C

36% of the predicted outcomes can be explained by the variation in $x$ .

$R^2$ measures the proportion of the variation in the observed response values ( $y$ ), not the predicted outcomes, that is explained by the model.

D

36% of the variation in $x$ can be predicted by the linear association.

This refers to the variation in the explanatory variable ( $x$ ), whereas $R^2$ specifically measures the explained variation in the response variable ( $y$ ).

Q1

2021

QCAA

Paper 1

1 mark

The second smoothed value for the 3-point moving average is

Day	1	2	3	4	5	6	7
Value	5	10	18	32	52	70	90

A

32

B

25

C

20

D

18

Reveal Answer

A

32

This is the raw data value for Day 4, not the calculated moving average.

B

25

This value is incorrect; the average of the second window of data points (10, 18, and 32) is 20, not 25.

C

20

Correct Answer

The second smoothed value is calculated by averaging the second set of three data points (Days 2, 3, and 4): $\frac{10 + 18 + 32}{3} = \frac{60}{3} = 20$ .

D

18

This is the raw data value for Day 3, not the calculated moving average.
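
All of the 3-point smoothed values for this data set can be generated in one pass, which makes it easy to confirm the second one. This is a minimal Python sketch; the function name `moving_average_3` is illustrative.

```python
def moving_average_3(values):
    """Return the 3-point moving averages of a list of values."""
    # Each smoothed value is the mean of three consecutive data points.
    return [sum(values[i:i + 3]) / 3 for i in range(len(values) - 2)]


values = [5, 10, 18, 32, 52, 70, 90]  # Days 1 to 7 from the question
smoothed = moving_average_3(values)

# smoothed[0] is centred on Day 2, smoothed[1] on Day 3, and so on.
print(smoothed[1])
```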

Q2

2024

VCAA

Paper 1

1 mark

Freddie organised a function at work. He surveyed the staff about their preferences.

He asked them about their payment preference (cash or electronic payment) and their budget preference (less than ＄50 or more than ＄50).

The variables in this survey, payment preference and budget preference, are

A

both categorical variables.

B

both numerical variables.

C

categorical and numerical variables, respectively.

D

numerical and categorical variables, respectively.

Reveal Answer

A

both categorical variables.

Correct Answer

Both variables group responses into distinct categories ('cash' vs. 'electronic' and 'less than ＄50' vs. 'more than ＄50') rather than measuring specific numerical quantities.

B

both numerical variables.

Neither variable asks for a specific numerical measurement, such as an exact dollar amount. Since the responses are groups or labels, they are not numerical variables.

C

categorical and numerical variables, respectively.

While payment preference is categorical, budget preference is also categorical because it groups responses into ranges ('less than ＄50' or 'more than ＄50') rather than asking for an exact numerical value.

D

numerical and categorical variables, respectively.

Payment preference ('cash' or 'electronic') is clearly a categorical variable, not numerical. Budget preference is also categorical, making this option entirely incorrect.

Q1

2023

QCAA

Paper 1

1 mark

A linear association with a correlation coefficient of 0.23 is best described as

A

weak positive.

B

weak negative.

C

strong positive.

D

strong negative.

Reveal Answer

A

weak positive.

Correct Answer

The correlation coefficient is positive ( $r > 0$ ) and closer to $0$ than to $1$ , which indicates a weak positive linear association.

B

weak negative.

A negative association requires a correlation coefficient less than zero ( $r < 0$ ), but the given value is $0.23$ .

C

strong positive.

A strong positive association typically corresponds to an $r$ value closer to $1$ (e.g., $r > 0.7$ ), whereas $0.23$ represents a weak relationship.

D

strong negative.

This option describes a correlation coefficient close to $-1$ , but the given value is positive and indicates a weak relationship.

Q24

2025

QCAA

Paper 1

5 marks

The average weekly earnings for Australian workers from 2013 to 2023 are modelled by the least-squares line equation $y = 49x - 97\,140$, where $x$ is the year and $y$ is average weekly earnings (＄).

The coefficient of determination, $R^2$ , is 0.997.

Q24a

1 mark

State the percentage of the variation in $y$ that is explained by the linear relationship.

Reveal Answer

$0.997 \times 100\% = 99.7\%$

Marking Criteria

Descriptor	Marks
correctly states the percentage	1

Q24b

2 marks

Identify and interpret the slope of the least-squares line.

Reveal Answer

slope = 49

Australia's average weekly earnings increased each year by ＄49.

Marking Criteria

Descriptor	Marks
correctly identifies the slope value as 49	1
correctly interprets the slope as an increase per year	1

Q24c

2 marks

Use the equation of the least-squares line to predict the average weekly earnings in 2035.

Reveal Answer

Substitute $x = 2035$ into $y = 49x - 97\,140$:

$y = 49(2035) - 97\,140 = 2575$

Predicted value for average weekly earnings in 2035 is ＄2575.

Marking Criteria

Descriptor	Marks
correctly substitutes into the least-squares line equation	1
predicts value	1
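
The substitution in Q24c can be checked with a one-line function. This is a minimal Python sketch; the name `predict_earnings` is illustrative, and the slope and intercept are those given in the question. Because 2035 lies outside the 2013 to 2023 data range, the result is an extrapolation and should be read with caution.

```python
def predict_earnings(year):
    """Average weekly earnings ($) predicted by y = 49x - 97140."""
    return 49 * year - 97140


prediction_2035 = predict_earnings(2035)
print(prediction_2035)
```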

Q16

2025

VCAA

Paper 1

1 mark

The seasonal index for the number of meat pie sales in winter is 1.75

To correct for seasonality, the actual number of meat pie sales for winter should be reduced, to the nearest whole percentage, by

A

25%

B

43%

C

57%

D

75%

Reveal Answer

A

25%

Incorrect. A 25% reduction would mean multiplying the actual sales by 0.75, which does not match the required deseasonalising factor of $1/1.75 \approx 0.57$.

B

43%

Correct Answer

Correct. To correct for seasonality, you divide the actual sales by the seasonal index, which is the same as multiplying by $1/1.75 \approx 0.5714$. This means the deseasonalised value is about 57% of the actual value, requiring a reduction of $100\% - 57\% = 43\%$.

C

57%

Incorrect. This is the percentage that the deseasonalised sales represent of the actual sales ( $1/1.75 \approx 57\%$ ), rather than the percentage by which the actual sales must be reduced.

D

75%

Incorrect. While a seasonal index of 1.75 means sales are 75% above the seasonal average, reversing this increase requires dividing by 1.75, not subtracting 75%.
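
The percentage reduction for any seasonal index greater than 1 follows the same pattern. This is a minimal Python sketch; the function name `percent_reduction` is illustrative.

```python
def percent_reduction(seasonal_index):
    """Percentage by which actual values are reduced when deseasonalised."""
    # Deseasonalised value = actual value / seasonal index,
    # so the fraction that remains is 1 / seasonal_index.
    remaining_fraction = 1 / seasonal_index
    return (1 - remaining_fraction) * 100


reduction = percent_reduction(1.75)
print(round(reduction))
```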

Q11

2021

QCAA

Paper 1

1 mark

Which option is an example of bivariate data?

A

The rating given to a brand of meat pies as poor, fair or good.

B

The number of people in a household and amount of water used.

C

The number of cars passing through a particular set of traffic lights.

D

The time a person spends using a mobile phone on a Friday evening.

Reveal Answer

A

The rating given to a brand of meat pies as poor, fair or good.

This is an example of univariate data because it involves only one variable (the rating) for each observation.

B

The number of people in a household and amount of water used.

Correct Answer

This is bivariate data because it involves two distinct variables (household size and water usage) collected for each household to analyse the relationship between them.

C

The number of cars passing through a particular set of traffic lights.

This is univariate data because it records only a single variable (the count of cars) at a specific location.

D

The time a person spends using a mobile phone on a Friday evening.

This is univariate data because it measures only one variable (time spent) for each person observed.

Q4

2021

QCAA

Paper 1

1 mark

A confounding variable is a variable that

A

can only take on a certain number of values.

B

remains constant throughout a statistical investigation.

C

is used to predict a difference in the response variable.

D

other than the explanatory variable, influences the response variable.

Reveal Answer

A

can only take on a certain number of values.

This describes a discrete variable, which is defined by having a countable number of possible values, rather than a confounding variable.

B

remains constant throughout a statistical investigation.

A value that remains constant is simply a constant or a controlled variable, whereas a confounding variable varies and affects the outcome.

C

is used to predict a difference in the response variable.

This describes the explanatory (or independent) variable, which is the specific factor the researcher is studying to see if it causes a change.

D

other than the explanatory variable, influences the response variable.

Correct Answer

A confounding variable is an outside influence that affects the response variable and is related to the explanatory variable, making it difficult to determine the true cause of the observed effect.

Q7

2021

QCAA

Paper 2

6 marks

The table shows the total number of times a new song is played on a music service in the days following its first release.

Number of days since first release	5	10	15	20
Total number of times played ('000s)	8	12	18	27

The songwriter is paid 0.175 cents every time their song is played and will be paid after 60 days. They predict that by that time, they will be owed at least ＄1000.

Given that the number of times the song is played is increasing exponentially, evaluate the reasonableness of this prediction.

Reveal Answer

Let $n = \frac{\# \text{ of days}}{5}$
Let $t_n =$ the total number of plays

$\therefore t_1 = 8$

$r = \frac{12}{8}$
$= 1.5$

$\therefore t_n = 8 \times 1.5^{(n-1)}$
At 60 days
$n = \frac{60}{5}$
$= 12$

Total number of plays (in 1000s)
$\therefore t_{12} = 8 \times 1.5^{11}$
$= 691.98$

Total predicted income
Income $= 0.175 \times 691\,980$
$= 121\,096.5 \text{ cents}$
$= ＄1210.97$

At least $＄1000$ is a reasonable prediction if plays continue as a geometric progression.

Marking Criteria

Descriptor	Marks
correctly defines the variables	1
correctly determines the parameter $r$	1
correctly determines a geometric (exponential) model	1
determines total number of plays	1
determines income	1
evaluates reasonableness of solution	1
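
The geometric model in the answer above can be checked step by step. This is a minimal Python sketch, assuming (as the answer does) that the common ratio of 1.5 per 5-day period continues to day 60; the variable names are illustrative. The sketch keeps full precision for the number of plays, whereas the worked answer rounds to 691.98 first, and both give the same income to the nearest cent.

```python
first_term = 8            # total plays ('000s) at day 5, i.e. n = 1
ratio = 12 / 8            # common ratio between consecutive 5-day periods
n = 60 // 5               # term number corresponding to day 60

# General term of a geometric sequence: t_n = t_1 * r^(n - 1).
plays_thousands = first_term * ratio ** (n - 1)

cents_per_play = 0.175
income_dollars = cents_per_play * plays_thousands * 1000 / 100

print(round(plays_thousands, 2), round(income_dollars, 2))
print(income_dollars >= 1000)
```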

Q25

2024

QCAA

Paper 1

5 marks

The table shows Darwin’s actual rainfall (mm) each season for two years.

	2022	2023
Autumn	410	390
Winter	30	20
Spring	205	150
Summer	1135	1100

Q25a

3 marks

Calculate the seasonal index for each season in Darwin.

Reveal Answer

2022 mean rainfall $= (410 + 30 + 205 + 1135)/4 = 445$
2023 mean rainfall $= (390 + 20 + 150 + 1100)/4 = 415$

	2022	2023
Autumn	$410/445 = 0.9213$	$390/415 = 0.9398$
Winter	$30/445 = 0.0674$	$20/415 = 0.0482$
Spring	$205/445 = 0.4607$	$150/415 = 0.3614$
Summer	$1135/445 = 2.5506$	$1100/415 = 2.6506$

	Seasonal index
Autumn	$(0.9213 + 0.9398)/2 = 0.9306$
Winter	$(0.0674 + 0.0482)/2 = 0.0578$
Spring	$(0.4607 + 0.3614)/2 = 0.4111$
Summer	$(2.5506 + 2.6506)/2 = 2.6006$

Marking Criteria

Descriptor	Marks
correctly calculates the 2022 mean rainfall and 2023 mean rainfall	1
calculates seasonal ratios for 2022 and 2023	1
calculates seasonal index for each season	1
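
The three steps in Q25a (yearly means, seasonal ratios, averaged indices) can be sketched in Python as follows. The variable names are illustrative. The sketch averages unrounded ratios, which agrees with the worked answer to four decimal places.

```python
# Darwin rainfall (mm) per season as (2022, 2023), from the question.
rainfall = {
    "Autumn": (410, 390),
    "Winter": (30, 20),
    "Spring": (205, 150),
    "Summer": (1135, 1100),
}

# Step 1: mean seasonal rainfall for each year.
year_means = [
    sum(values[i] for values in rainfall.values()) / len(rainfall)
    for i in range(2)
]

# Steps 2 and 3: divide each value by its year's mean, then average
# the two ratios for each season to obtain the seasonal index.
seasonal_index = {
    season: sum(v / m for v, m in zip(values, year_means)) / 2
    for season, values in rainfall.items()
}

for season, index in seasonal_index.items():
    print(season, round(index, 4))
```

As a sanity check, the four seasonal indices should sum to 4 (the number of seasons), apart from rounding.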

Q25b

2 marks

This table shows Hobart’s actual rainfall (mm) each season for 2023 and the long-term seasonal indices.

	Autumn	Winter	Spring	Summer
2023 rainfall (mm)	130	145	155	132
Seasonal index	0.92	1.02	1.12	0.94

Deseasonalise the Hobart rainfall data to identify the 2023 season with the highest seasonally adjusted rainfall.

Reveal Answer

	Autumn	Winter	Spring	Summer
Deseasonalised rainfall	$130/0.92 = 141.30$	$145/1.02 = 142.16$	$155/1.12 = 138.39$	$132/0.94 = 140.43$

Winter has the highest seasonally adjusted rainfall.

Marking Criteria

Descriptor	Marks
correctly calculates the deseasonalised rainfall for each season	1
identifies season with highest seasonally adjusted rainfall	1
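
The deseasonalising step in Q25b divides each actual value by its seasonal index. This is a minimal Python sketch using the Hobart figures from the question; the variable names are illustrative.

```python
# Hobart 2023 rainfall (mm) and long-term seasonal indices.
rainfall_2023 = {"Autumn": 130, "Winter": 145, "Spring": 155, "Summer": 132}
seasonal_index = {"Autumn": 0.92, "Winter": 1.02, "Spring": 1.12, "Summer": 0.94}

# Deseasonalised value = actual value / seasonal index.
deseasonalised = {
    season: rainfall_2023[season] / seasonal_index[season]
    for season in rainfall_2023
}

# Season with the largest seasonally adjusted rainfall.
highest = max(deseasonalised, key=deseasonalised.get)

for season, value in deseasonalised.items():
    print(season, round(value, 2))
print(highest)
```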

Q1

2024

QCAA

Paper 2

5 marks

Each of the 60 performers in a music and dance concert is either a Year 11 or Year 12 student and either a musician or a dancer.

There are four more Year 11 students than Year 12 students. One quarter of the Year 11 students are dancers and half of the Year 12 students are dancers.

Complete the two-way frequency table to calculate the percentage of students who are musicians.

	Year 11	Year 12	Total
Musician
Dancer
Total			60

Reveal Answer

	Year 11	Year 12	Total
Musician	$32 - 8 = 24$	half of $28 = 14$	$24 + 14 = 38$
Dancer	one-quarter of $32 = 8$	half of $28 = 14$	$8 + 14 = 22$
Total	32	28	60

Percentage of students who are musicians:
$\frac{38}{60}\times 100\% = 63.\dot{3}\%$

Marking Criteria

Descriptor	Marks
correctly calculates the frequencies for total Year 11 students and total Year 12 students	1
calculates frequencies for dancers in Year 11 and dancers in Year 12	1
calculates frequencies for musicians in Year 11 and musicians in Year 12	1
calculates frequencies for total musicians and total dancers	1
calculates percentage of students who are musicians	1
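
The table entries above follow from two conditions: the year groups total 60, and Year 11 has four more students than Year 12. This is a minimal Python sketch of that reasoning; the variable names are illustrative.

```python
total = 60
difference = 4  # Year 11 has four more students than Year 12

# year11 + year12 = 60 and year11 - year12 = 4, so year11 = (60 + 4) / 2.
year11 = (total + difference) // 2
year12 = total - year11

dancers11 = year11 // 4   # one quarter of Year 11 are dancers
dancers12 = year12 // 2   # half of Year 12 are dancers

musicians = (year11 - dancers11) + (year12 - dancers12)
dancers = dancers11 + dancers12
percent_musicians = 100 * musicians / total

print(year11, year12, musicians, dancers, round(percent_musicians, 1))
```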

Q7

2025

QCAA

Paper 1

1 mark

The association between two numerical variables is modelled by the equation $y = 4.6x - 35$ , with a correlation coefficient of 0.92.

The association is best described as

A

weak and linear.

B

strong and linear.

C

weak and non-linear.

D

strong and non-linear.

Reveal Answer

A

weak and linear.

While the equation represents a linear relationship, a correlation coefficient of 0.92 indicates a strong association, not a weak one.

B

strong and linear.

Correct Answer

The equation $y = 4.6x - 35$ is a linear equation, and a correlation coefficient of 0.92 is close to 1, indicating a strong positive linear association.

C

weak and non-linear.

The equation $y = 4.6x - 35$ represents a linear relationship, and a correlation coefficient of 0.92 indicates a strong association, making both parts of this description incorrect.

D

strong and non-linear.

Although the association is strong, the equation $y = 4.6x - 35$ is in the form $y = mx + c$ , which models a linear relationship, not a non-linear one.

Q9

2023

VCAA

Paper 1

1 mark

A least squares line can be used to model the birth rate (children per 1000 population) in a country from the average daily food energy intake (megajoules) in that country.

When a least squares line is fitted to data from a selection of countries it is found that:

for a country with an average daily food energy intake of 8.53 megajoules, the birth rate will be 32.2 children per 1000 population
for a country with an average daily food energy intake of 14.9 megajoules, the birth rate will be 9.9 children per 1000 population.

The slope of this least squares line is closest to

A

$-4.7$

B

$-3.5$

C

$-0.29$

D

2.7

E

25

Reveal Answer

A

$-4.7$

Incorrect. Applying the slope formula $m = \frac{y_2 - y_1}{x_2 - x_1}$ to the two given points gives approximately $-3.5$, not $-4.7$.

B

$-3.5$

Correct Answer

Correct. The slope is the change in birth rate divided by the change in energy intake: $m = \frac{9.9 - 32.2}{14.9 - 8.53} = \frac{-22.3}{6.37} \approx -3.5$ .

C

$-0.29$

Incorrect. This is the reciprocal of the slope, incorrectly calculated by dividing the change in $x$ by the change in $y$ ( $\frac{6.37}{-22.3} \approx -0.29$ ).

D

2.7

Incorrect. The data shows that as energy intake increases, birth rate decreases, meaning the slope must be negative.

E

25

Incorrect. This value is positive, but the inverse relationship between energy intake and birth rate requires a negative slope.
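
The slope in option B, and the reciprocal error behind option C, can both be verified from the two given points. This is a minimal Python sketch; the variable names are illustrative.

```python
# Two points on the least squares line: (energy intake in MJ, birth rate).
x1, y1 = 8.53, 32.2
x2, y2 = 14.9, 9.9

# Slope = change in response (birth rate) / change in explanatory (energy).
slope = (y2 - y1) / (x2 - x1)

# Dividing the other way round gives the reciprocal, as in option C.
reciprocal = (x2 - x1) / (y2 - y1)

print(round(slope, 1), round(reciprocal, 2))
```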

VCAA General Mathematics Data analysis, probability and statistics

Frequently Asked Questions

Ready to practise VCAA General Mathematics?
