SCSA Mathematics Applications Bivariate data analysis
15 sample questions with marking guides and sample answers · Avg. score: 58.7%
A store asked its junior and senior staff whether or not they would like to change the store uniform.
The results are in the frequency table.
| | Change uniform | Do not change uniform |
|---|---|---|
| Junior staff | 92 | 28 |
| Senior staff | 23 | 67 |
Convert the two-way table into a percentaged two-way frequency table using column totals.
Reveal Answer
Total # change uniform = 115
Total # do not change = 95
| | Change uniform | Do not change uniform |
|---|---|---|
| Junior staff | 80% | 29.5% |
| Senior staff | 20% | 70.5% |
| Total | 100% | 100% |

| Descriptor | Marks |
|---|---|
| Correctly determines column totals | 1 |
| Correctly represents the data in a percentaged two-way table | 1 |
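The column-percentage conversion above can be checked with a short script. This is a minimal sketch: the dictionary layout and the function name `column_percentages` are illustrative choices, not part of the original question.

```python
# Counts from the frequency table (rows: staff group, columns: opinion).
counts = {
    "Junior staff": {"Change uniform": 92, "Do not change uniform": 28},
    "Senior staff": {"Change uniform": 23, "Do not change uniform": 67},
}


def column_percentages(table):
    """Return each cell as a percentage of its column total (1 d.p.)."""
    columns = list(next(iter(table.values())).keys())
    # Column totals: 115 for "Change uniform", 95 for "Do not change uniform".
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        group: {c: round(100 * row[c] / totals[c], 1) for c in columns}
        for group, row in table.items()
    }


percentages = column_percentages(counts)
for group, row in percentages.items():
    print(group, row)
```

Each column of the result sums to 100%, which is what "using column totals" requires.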
Explain whether there is an association between staff groups and a desire to change the store uniform.
Reveal Answer
There does appear to be an association between the staff groups and wanting to change the uniform.
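Among staff who want to change the uniform, 80% are junior and only 20% are senior, whereas among those who do not want to change, 70.5% are senior and only 29.5% are junior. Because the percentages differ markedly between the two columns, staff group and uniform preference appear to be associated.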
| Descriptor | Marks |
|---|---|
| Suggests the presence of an association | 1 |
| Provides reasons to support conclusion | 1 |
The two-way table summarises the semester 1 results for students enrolled in two courses, Machinery and Electrical. Students achieved either satisfactory (S) or unsatisfactory (U).
| | Machinery S | Machinery U |
|---|---|---|
| Electrical S | 80% | 10% |
| Electrical U | 20% | 90% |
The 10% cell in the table indicates that
10% of all students achieved satisfactory in Electrical.
10% of all students achieved unsatisfactory in Machinery.
10% of the students who achieved satisfactory in Electrical achieved unsatisfactory in Machinery.
10% of the students who achieved unsatisfactory in Machinery achieved satisfactory in Electrical.
Reveal Answer
10% of all students achieved satisfactory in Electrical.
This option describes the marginal percentage of all students who passed Electrical. The table provides conditional percentages based on Machinery results, not the total population distribution.
10% of all students achieved unsatisfactory in Machinery.
This option describes the marginal percentage of all students who failed Machinery. The value 10% represents a specific intersection of results relative to a subgroup, not the total proportion of students failing Machinery.
10% of the students who achieved satisfactory in Electrical achieved unsatisfactory in Machinery.
This interprets the condition in reverse (conditioning on the row). Since the rows do not sum to 100% (for example, $80\% + 10\% = 90\%$), the percentages are not based on the group of students who achieved satisfactory in Electrical.
10% of the students who achieved unsatisfactory in Machinery achieved satisfactory in Electrical.
The columns in the table sum to 100% ($80\% + 20\% = 100\%$ and $10\% + 90\% = 100\%$), indicating that the percentages are conditional on the column variable. Therefore, the 10% represents the portion of students within the 'Machinery U' group who achieved 'Electrical S'.
A linear association with a correlation coefficient of 0.23 is best described as
weak positive.
weak negative.
strong positive.
strong negative.
Reveal Answer
weak positive.
The correlation coefficient is positive ($r = 0.23 > 0$) and closer to $0$ than to $1$, which indicates a weak positive linear association.
weak negative.
A negative association requires a correlation coefficient less than zero ($r < 0$), but the given value is $r = 0.23$.
strong positive.
A strong positive association typically corresponds to an $r$ value closer to $1$, whereas $r = 0.23$ represents a weak relationship.
strong negative.
This option describes a correlation coefficient close to $-1$, but the given value is positive and indicates a weak relationship.
For a dataset with 10 points, the value of $\sum z_x z_y$ (the sum of the standardized products) is equal to $-4.5$.
Calculate the correlation coefficient.
-0.50
-0.45
0.45
0.50
Reveal Answer
-0.50
The sample correlation coefficient is calculated as $r = \frac{\sum z_x z_y}{n - 1}$. With $n = 10$, you divide the given sum ($-4.5$) by $n - 1 = 9$, resulting in $r = -0.50$.
-0.45
This result comes from dividing the sum by $n$ ($10$) instead of $n - 1$ ($9$). The formula for the sample correlation coefficient uses $n - 1$ degrees of freedom.
0.45
This option has the incorrect sign and uses the incorrect divisor ($n = 10$). Since the sum of the products is negative, the correlation coefficient must be negative.
0.50
This option has the incorrect sign. The correlation coefficient always shares the same sign as the sum of the standardized products, which is given as negative.
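The calculation in the explanations above can be reproduced in a few lines. This is a sketch only: the function name is hypothetical, and it assumes the given value of -4.5 is the sum of the products of standardized scores, as the explanations state.

```python
def correlation_from_z_products(sum_of_products, n):
    """Sample correlation: sum of z_x * z_y divided by (n - 1)."""
    return sum_of_products / (n - 1)


r = correlation_from_z_products(-4.5, 10)
wrong_divisor = -4.5 / 10  # dividing by n instead of n - 1 gives the -0.45 distractor

print(f"r = {r:.2f}")
print(f"dividing by n instead: {wrong_divisor:.2f}")
```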
Which example states an explanatory variable followed by a response variable?
car manufacturers and car colours
dog breeds and frequency of names
plant growth and amount of fertiliser used
daily temperatures and daily ice cream sales
Reveal Answer
car manufacturers and car colours
These are typically treated as two categorical variables associated with a car, rather than a clear explanatory variable driving a response variable.
dog breeds and frequency of names
This example describes an association between a category and a summary statistic, rather than a direct explanatory-response relationship between variables.
plant growth and amount of fertiliser used
This option lists the response variable (plant growth) first and the explanatory variable (amount of fertiliser) second, which is the reverse of the order requested.
daily temperatures and daily ice cream sales
Daily temperature is the explanatory variable because it influences or causes changes in the response variable, daily ice cream sales.
A scatterplot is created to identify the nature of the relationship between two variables: vehicle age and distance travelled.
Which statement is correct?
The vertical axis should show vehicle age as the response variable.
The horizontal axis should show vehicle age as the explanatory variable.
The horizontal axis should show distance travelled as the response variable.
The vertical axis should show distance travelled as the explanatory variable.
Reveal Answer
The vertical axis should show vehicle age as the response variable.
Vehicle age is the explanatory variable, not the response variable, because it is used to predict the distance travelled.
The horizontal axis should show vehicle age as the explanatory variable.
Vehicle age is the explanatory (independent) variable, which is conventionally plotted on the horizontal axis ($x$-axis).
The horizontal axis should show distance travelled as the response variable.
Response variables are plotted on the vertical axis ($y$-axis), not the horizontal axis.
The vertical axis should show distance travelled as the explanatory variable.
Distance travelled is the response (dependent) variable, not the explanatory variable, because it depends on the age of the vehicle.
A least squares line can be used to model the birth rate (children per 1000 population) in a country from the average daily food energy intake (megajoules) in that country.
When a least squares line is fitted to data from a selection of countries it is found that:
- for a country with an average daily food energy intake of 8.53 megajoules, the birth rate will be 32.2 children per 1000 population
- for a country with an average daily food energy intake of 14.9 megajoules, the birth rate will be 9.9 children per 1000 population.
The slope of this least squares line is closest to
2.7
25
Reveal Answer
Incorrect. This value does not match the result of the slope formula $\frac{y_2 - y_1}{x_2 - x_1}$.
Correct. The slope is the change in birth rate divided by the change in energy intake: $\frac{9.9 - 32.2}{14.9 - 8.53} = \frac{-22.3}{6.37} \approx -3.5$.
Incorrect. This is the reciprocal of the slope, incorrectly calculated by dividing the change in $x$ by the change in $y$ ($\frac{6.37}{-22.3} \approx -0.29$).
2.7
Incorrect. The data shows that as energy intake increases, birth rate decreases, meaning the slope must be negative.
25
Incorrect. This value is positive, but the inverse relationship between energy intake and birth rate requires a negative slope.
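The slope arithmetic can be verified from the two points on the line given in the question. This is a minimal sketch: the helper name `slope_between` is illustrative, with energy intake taken as $x$ and birth rate as $y$.

```python
def slope_between(x1, y1, x2, y2):
    """Slope of the straight line through (x1, y1) and (x2, y2)."""
    return (y2 - y1) / (x2 - x1)


# (energy intake in megajoules, birth rate per 1000 population)
slope = slope_between(8.53, 32.2, 14.9, 9.9)
reciprocal = 1 / slope  # the value obtained if the two changes are swapped

print(round(slope, 1))       # about -3.5
print(round(reciprocal, 2))  # about -0.29
```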
The coefficient of determination, $r^2$, is equal to 0.36 for the linear association between $x$ (explanatory variable) and $y$ (response variable).
Which statement is correct?
36% of the variation in $x$ can be explained by the variation in $y$.
36% of the total variation can be explained by the linear association.
36% of the predicted outcomes can be explained by the variation in .
36% of the variation in $x$ can be predicted by the linear association.
Reveal Answer
36% of the variation in $x$ can be explained by the variation in $y$.
This reverses the variables; $r^2$ measures the proportion of variation in the response variable ($y$) explained by the explanatory variable ($x$), not the variation in $x$ explained by $y$.
36% of the total variation can be explained by the linear association.
The coefficient of determination, $r^2$, is defined as the proportion of the total variation in the response variable ($y$) that is explained by the linear relationship with the explanatory variable ($x$).
36% of the predicted outcomes can be explained by the variation in .
$r^2$ measures the proportion of the variation in the observed response values ($y$), not the predicted outcomes, that is explained by the model.
36% of the variation in $x$ can be predicted by the linear association.
This refers to the variation in the explanatory variable ($x$), whereas $r^2$ specifically measures the explained variation in the response variable ($y$).
Use the following information to answer the question.
The least squares equation for the relationship between the average number of male athletes per competing nation, males, and the number of the Summer Olympic Games, number, is
At which Summer Olympic Games will the predicted average number of males be closest to 25.6?
31st
32nd
33rd
34th
Reveal Answer
31st
Incorrect. Substituting 31 into the equation yields , which is not the closest value to 25.6.
32nd
Incorrect. Substituting 32 into the equation yields , which is not the closest value to 25.6.
33rd
Correct. Setting the equation to and solving for gives , making the 33rd Games the closest.
34th
Incorrect. Substituting 34 into the equation yields , which is not the closest value to 25.6.
The table shows the number of sales for a small business in their first six months of trading.
| Time in months | Number of sales |
|---|---|
| 1 | 86 |
| 2 | 180 |
| 3 | 160 |
| 4 | 226 |
| 5 | 240 |
| 6 | 335 |
Use your calculator to determine the equation of the least-squares line.
Reveal Answer
| Descriptor | Marks |
|---|---|
| Correctly determines the equation of the least-squares line | 1 |
Use the equation from Question 16a) to predict the number of sales in the 21st month.
Reveal Answer
Substituting a time of 21 months into the equation from Question 16a):
The predicted number of sales is 950.
| Descriptor | Marks |
|---|---|
| Substitutes into equation from Question 16a) | 1 |
| Predicts number of sales | 1 |
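In place of a calculator, the least-squares line for the six months of sales data can be computed directly. This is a sketch using the standard least-squares formulas; the function and variable names are illustrative and not taken from the original answer.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4, 5, 6]
sales = [86, 180, 160, 226, 240, 335]

slope, intercept = least_squares(months, sales)
predicted = slope * 21 + intercept  # prediction for the 21st month

print(f"sales = {slope:.1f} x months + {intercept:.1f}")
print(f"predicted sales in month 21: {predicted:.0f}")
```

With these data the fitted slope is about 42.6 and the intercept about 55.4, which gives the predicted 950 sales quoted in the sample answer. Month 21 lies well outside the observed range of 1 to 6 months, so this is an extrapolation.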
It is observed that as the number of ice blocks sold each month increases, the number of fans sold also increases. Which of these statements is therefore true?
There is a negative causation between the number of ice blocks sold and the number of fans sold each month.
There is a positive causation between the number of ice blocks sold and the number of fans sold each month.
There is a negative association between the number of ice blocks sold and the number of fans sold each month.
There is a positive association between the number of ice blocks sold and the number of fans sold each month.
Reveal Answer
There is a negative causation between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because a 'negative' relationship implies that as one variable increases, the other decreases. Additionally, observational data establishes association, not necessarily causation.
There is a positive causation between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because correlation does not imply causation. While the variables move together, it is likely a third variable (like hot weather) causes both to increase, rather than ice blocks causing fan sales.
There is a negative association between the number of ice blocks sold and the number of fans sold each month.
This is incorrect because a negative association describes an inverse relationship where one variable decreases as the other increases, contradicting the observation that both increase.
There is a positive association between the number of ice blocks sold and the number of fans sold each month.
This is correct because 'positive' indicates that both variables move in the same direction (both increase), and 'association' correctly describes the statistical relationship without assuming one causes the other.
A confounding variable is a variable that
can only take on a certain number of values.
remains constant throughout a statistical investigation.
is used to predict a difference in the response variable.
other than the explanatory variable, influences the response variable.
Reveal Answer
can only take on a certain number of values.
This describes a discrete variable, which is defined by having a countable number of possible values, rather than a confounding variable.
remains constant throughout a statistical investigation.
A value that remains constant is simply a constant or a controlled variable, whereas a confounding variable varies and affects the outcome.
is used to predict a difference in the response variable.
This describes the explanatory (or independent) variable, which is the specific factor the researcher is studying to see if it causes a change.
other than the explanatory variable, influences the response variable.
A confounding variable is an outside influence that affects the response variable and is related to the explanatory variable, making it difficult to determine the true cause of the observed effect.
Each of the 60 performers in a music and dance concert is either a Year 11 or Year 12 student and either a musician or a dancer.
There are four more Year 11 students than Year 12 students. One quarter of the Year 11 students are dancers and half of the Year 12 students are dancers.
Complete the two-way frequency table to calculate the percentage of students who are musicians.
| | Year 11 | Year 12 | Total |
|---|---|---|---|
| Musician | | | |
| Dancer | | | |
| Total | | | 60 |
Reveal Answer
| | Year 11 | Year 12 | Total |
|---|---|---|---|
| Musician | $32 - 8 = 24$ | half of $28 = 14$ | 38 |
| Dancer | one-quarter of $32 = 8$ | half of $28 = 14$ | 22 |
| Total | 32 | 28 | 60 |
Percentage of students who are musicians: $\frac{38}{60} \times 100\% \approx 63.3\%$
| Descriptor | Marks |
|---|---|
| Correctly calculates the frequencies for total Year 11 students and total Year 12 students | 1 |
| Calculates frequencies for dancers in Year 11 and dancers in Year 12 | 1 |
| Calculates frequencies for musicians in Year 11 and musicians in Year 12 | 1 |
| Calculates frequencies for total musicians and total dancers | 1 |
| Calculates percentage of students who are musicians | 1 |
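The steps in the marking guide can be traced with a short calculation. This is a sketch that works only from the conditions stated in the question; the variable names are illustrative.

```python
total = 60

# Year 11 has four more students than Year 12: y11 + (y11 - 4) = 60.
year11 = (total + 4) // 2
year12 = total - year11

dancers11 = year11 // 4   # one quarter of Year 11 are dancers
dancers12 = year12 // 2   # half of Year 12 are dancers
musicians11 = year11 - dancers11
musicians12 = year12 - dancers12

musicians = musicians11 + musicians12
dancers = dancers11 + dancers12
percent_musicians = 100 * musicians / total

print(year11, year12)                 # column totals
print(musicians11, musicians12, musicians)
print(dancers11, dancers12, dancers)
print(f"{percent_musicians:.1f}% of the performers are musicians")
```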
Data was collected relating the number of hours spent fishing and the total number of fish caught.
The linear model for this data was found to be , where is the number of hours spent fishing, and is the total number of fish caught.
Use the model to predict the number of fish caught if 12 hours were spent fishing.
Reveal Answer
| Descriptor | Marks |
|---|---|
| Correctly calculates 59 | 1 |
The correlation coefficient for this data is 0.688 and the coefficient of determination is 0.473. Use each of these to describe the strength of the linear association between the two variables and decide if your prediction is valid.
Reveal Answer
A correlation coefficient of 0.688 suggests a moderate positive linear association: as the hours spent fishing increase, the number of fish caught tends to increase.
A coefficient of determination of 0.473 means that about 47% of the variation in the number of fish caught can be explained by the variation in hours spent fishing.
Therefore the prediction of catching 59 fish after fishing for 12 hours may be reasonable; however, more than half of the variation is unexplained by the model, so other factors will also come into play.
| Descriptor | Marks |
|---|---|
| Correctly describes the strength as either moderate or strong | 1 |
| Correctly describes the meaning of the coefficient of determination | 1 |
| Evaluates the reasonableness of the solution | 1 |
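The two statistics quoted in this question are linked, since the coefficient of determination for a linear model is the square of the correlation coefficient. A quick check, using only the values given in the question:

```python
r = 0.688
r_squared = r ** 2

print(round(r_squared, 3))                         # 0.473
print(f"{100 * r_squared:.0f}% of the variation explained")
print(f"{100 * (1 - r_squared):.0f}% unexplained")
```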
Hiroki believes that more fish are caught on warmer days. Jiro believes that the number of fish caught in a day is more dependent on the number of people fishing.
Bivariate datasets for six days are shown.
| Temperature (°C) | 32 | 26 | 20 | 27 | 23 | 29 |
|---|---|---|---|---|---|---|
| Number of fish caught | 530 | 400 | 320 | 220 | 180 | 120 |

| Number of people fishing | 46 | 58 | 38 | 34 | 30 | 28 |
|---|---|---|---|---|---|---|
| Number of fish caught | 530 | 400 | 320 | 220 | 180 | 120 |
Calculate the correlation coefficient for each dataset and use the results to identify the explanatory variable for the stronger linear association. Use the least-squares line equation for the stronger linear association to predict the number of fish caught on a 25 °C day when 50 people are fishing.
Reveal Answer
Calculate correlation coefficient for each dataset.
| Dataset | Correlation coefficient, $r$ |
|---|---|
| Number of fish caught vs temperature | 0.3 |
| Number of fish caught vs number of people fishing | 0.8 |
The explanatory variable for the stronger linear association is the number of people fishing.
Using calculator,
Equation in terms of given variables is
It is predicted that 420 fish will be caught.
| Descriptor | Marks |
|---|---|
| Correctly calculates correlation coefficient for each dataset | 1 |
| Identifies explanatory variable for stronger linear association | 1 |
| Determines least-squares line equation for dataset with stronger linear association | 1 |
| Substitutes value for relevant explanatory variable | 1 |
| Predicts number of fish caught | 1 |
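The whole working for this question can be reproduced without a calculator. This is a sketch using the standard Pearson correlation and least-squares formulas; the function and variable names are illustrative. The rounded coefficients 11 and -130 in the last step are an assumption about how the sample answer arrives at 420, since the unrounded line gives a slightly lower value.

```python
from math import sqrt


def deviations(values):
    """Deviations of each value from the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    dx, dy = deviations(xs), deviations(ys)
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line of y on x."""
    dx, dy = deviations(xs), deviations(ys)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    intercept = sum(ys) / len(ys) - slope * sum(xs) / len(xs)
    return slope, intercept


temperature = [32, 26, 20, 27, 23, 29]
people = [46, 58, 38, 34, 30, 28]
fish = [530, 400, 320, 220, 180, 120]

r_temperature = pearson_r(temperature, fish)
r_people = pearson_r(people, fish)

# The number of people fishing has the stronger association, so it is the
# explanatory variable; the 25 degree temperature is not used in the prediction.
slope, intercept = least_squares(people, fish)
unrounded_prediction = slope * 50 + intercept

# Assumption: coefficients rounded to 11 and -130 reproduce the quoted answer.
rounded_prediction = 11 * 50 - 130

print(f"r (temperature) = {r_temperature:.1f}, r (people) = {r_people:.1f}")
print(f"fish = {slope:.2f} x people + ({intercept:.2f})")
print(f"unrounded: {unrounded_prediction:.0f}, rounded coefficients: {rounded_prediction}")
```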