SCSA Mathematics Specialist Statistical inference
15 sample questions with marking guides and sample answers · Avg. score: 47.3%
Consider the following information.
| | mean | variance |
|---|---|---|
| Continuous random variable | $\mu = E(X) = \int x\,f(x)\,dx$ | $\sigma^2 = \mathrm{Var}(X) = \int (x-\mu)^2 f(x)\,dx$ |
The waiting time (minutes) until workers at a certain call centre receive their th phone call, where , is a random variable with probability density function
where is a positive constant.
The waiting time until workers receive their 5th call is collected from a random sample of 80 workers.
Determine the probability that the mean waiting time from this sample is more than 16 minutes.
Answer
Using the property of a PDF
Using in the given PDF
Solving the equation:
Mean of distribution for waiting time until 5th call,
Variance of distribution for 5th call
Consider the distribution of the sample mean of the waiting time until the 5th phone call is received, .
As the sample size (80) is large, by the Central Limit Theorem the distribution of the sample mean can be considered approximately normal.
and
Using normal cdf on GDC:
| Descriptor | Marks |
|---|---|
Correctly determines equation in terms of k | 1 |
Solves equation to determine k | 1 |
Determines population mean | 1 |
Determines population variance | 1 |
Justifies that the distribution of T can be considered normal | 1 |
Determines mean and standard deviation of the sample mean | 1 |
Determines required probability | 1 |
In a town, the mean number of residents per household is 3.79 people with a standard deviation of 1.47 people.
Using a random sample of 45 households from the town, determine the probability that the mean number of residents per household will be more than 4.
0.17
0.33
0.83
0.96
Answer
0.17
First, calculate the z-score: $z = \frac{4 - 3.79}{1.47/\sqrt{45}} \approx 0.96$. The probability is $P(Z > 0.96) \approx 1 - 0.83 = 0.17$.
0.33
This value does not result from the standard normal distribution calculation using the Central Limit Theorem parameters provided.
0.83
This represents the probability that the mean is less than 4 ($P(Z < 0.96) \approx 0.83$). You must subtract this from 1 to find the probability of the mean being more than 4.
0.96
This value is the calculated z-score ($z \approx 0.96$), not the probability associated with that z-score.
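Below is a minimal Python check of the calculation above. It uses only the standard library (`statistics.NormalDist`) and applies the Central Limit Theorem approximation for the sample mean with the values given in the question.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.79, 1.47, 45   # population mean, population SD, sample size
threshold = 4                   # residents per household

# By the Central Limit Theorem the sample mean is approximately N(mu, (sigma/sqrt(n))^2)
se = sigma / sqrt(n)            # standard deviation of the sample mean
z = (threshold - mu) / se       # standardise the threshold
p_more = 1 - NormalDist().cdf(z)   # upper-tail probability

print(f"standard error = {se:.4f}")
print(f"z = {z:.2f}, P(mean > 4) = {p_more:.2f}")
```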
The inner diameter of a cylinder in a motor car engine is critical to its performance. Let mm denote the population mean cylinder diameter produced by a manufacturing process. A random sample, , of 100 cylinder diameters is taken and the standard deviation for this sample was found to be 1 mm.
Let the sample mean cylinder diameter for sample .
From random sample , a 95% confidence interval for is formed.
State the distribution for and its parameters.
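Answer
Since the sample size (100) is large, by the Central Limit Theorem the sample mean is approximately normally distributed,
i.e. centred on the population mean, with estimated standard deviation of the sample mean $\frac{s}{\sqrt{n}} = \frac{1}{\sqrt{100}} = 0.1$ mm.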
| Descriptor | Marks |
|---|---|
states that the sample mean will be normally distributed | 1 |
states the mean of the distribution is the population mean | 1 |
states the expected standard deviation is 0.1 mm | 1 |
What is the probability that differs from by more than 0.2 mm? Give your answer correct to 0.001.
Answer
| Descriptor | Marks |
|---|---|
forms the correct probability statement in terms of | 1 |
calculates the probability correct to | 1 |
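The worked answer for this part is not shown above, so here is a minimal sketch of the calculation. It assumes, as in the previous part, that the sample mean is approximately normal with estimated standard deviation $1/\sqrt{100} = 0.1$ mm. A difference of 0.2 mm is then two standard deviations, in either direction.

```python
from statistics import NormalDist

se = 1 / 100 ** 0.5                  # estimated SD of the sample mean: s / sqrt(n) = 0.1 mm
z = 0.2 / se                         # a 0.2 mm difference is 2 standard deviations
p = 2 * (1 - NormalDist().cdf(z))    # two tails: P(|Z| > 2)
print(round(p, 3))
```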
Calculate the width of this confidence interval, correct to 0.001.
Answer
Width $= 2 \times 1.96 \times \frac{1}{\sqrt{100}} \approx 0.392$ mm
| Descriptor | Marks |
|---|---|
forms the correct expression for the width | 1 |
calculates the width correctly | 1 |
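Below is a short Python sketch of the width calculation. It takes the 95% critical value from `statistics.NormalDist` (about 1.96) and uses the sample standard deviation of 1 mm as an estimate of the population value.

```python
from statistics import NormalDist

s, n = 1.0, 100                      # sample SD (mm) and sample size
z = NormalDist().inv_cdf(0.975)      # 95% two-sided critical value, about 1.96
width = 2 * z * s / n ** 0.5         # full width = 2 * margin of error
print(f"{width:.3f} mm")
```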
Lilian, the production manager, wishes to decrease the width of the confidence interval. She suggests:
"We can form sample by using the data from sample and then combining this data with itself to form a sample with 200 observations. Using will decrease the width of the confidence interval."
State two major problems with using this idea.
Answer
Any 2 of the following:
- The idea simply replicates (repeats) the data in sample . As such, the sample is no longer random. Therefore the assumptions for using the normal distribution for the sample mean no longer hold.
- Repeating the data values in sample will not reflect the true random variation in the data (manufacturing process). The confidence interval will therefore not be a true representation of the variation in the sample mean.
- The sample mean and standard deviation will change if a new larger sample is taken. However, with Lilian's idea these will not change. This will affect both the width and location of the confidence interval.
- Replicating the sample does NOT decrease the width of the confidence interval. Let the sample observations for be .
| Descriptor | Marks |
|---|---|
states that the data will no longer be a random sample | 1 |
states that the assumptions for using the normal distribution for the sample mean no longer hold (i.e. states any two of the four points outlined in the solution) | 1 |
A company claims that the mean battery life of their latest model of smartphone is 9.5 hours.
To test this claim, the battery lives of a random sample of 40 of the smartphones were measured.
A sample mean of 9.31 hours and a standard deviation of 0.52 hours were calculated from this data.
Determine an approximate 95% confidence interval for . Give your answer to at least two decimal places.
Answer
Given sample mean 9.31 hours, sample standard deviation 0.52 hours and n = 40
Using GDC
(9.15, 9.47) hours
| Descriptor | Marks |
|---|---|
correctly calculates 95% confidence interval to at least two decimal places | 1 |
Determine an approximate 99% confidence interval for . Give your answer to at least two decimal places.
Answer
Using GDC
(9.10, 9.52) hours
| Descriptor | Marks |
|---|---|
correctly calculates 99% confidence interval to at least two decimal places | 1 |
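Both intervals can be reproduced without a GDC. This sketch assumes the usual large-sample interval $\bar{x} \pm z\frac{s}{\sqrt{n}}$, with the sample standard deviation standing in for the population one. `approx_ci` is a hypothetical helper name, not something from the question.

```python
from statistics import NormalDist

xbar, s, n = 9.31, 0.52, 40      # sample mean (hours), sample SD (hours), sample size
se = s / n ** 0.5                # estimated SD of the sample mean

def approx_ci(level):
    """Approximate CI for the population mean using the normal critical value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return xbar - z * se, xbar + z * se

for level in (0.95, 0.99):
    lo, hi = approx_ci(level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f}), contains 9.5: {lo < 9.5 < hi}")
```

The claimed 9.5 hours lies outside the 95% interval but inside the 99% interval. That is the comparison the next part asks about.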
A manager comments that either confidence interval could be used to support the company’s claim.
Use your 95% and 99% confidence intervals from the previous two parts to evaluate the reasonableness of the manager’s comment. Justify your decision using mathematical reasoning.
Answer
The 95% confidence interval (9.15, 9.47) does not include the claimed mean battery life of 9.5 hours, although the 99% CI (9.10, 9.52) does.
So the comment is not reasonable: only the 99% interval supports the claim.
| Descriptor | Marks |
|---|---|
justifies decision using mathematical reasoning | 1 |
provides appropriate statement of reasonableness | 1 |
Rounded to two decimal places, the z-value used in the calculation of an approximate 95% confidence interval for is
0.95
1.64
1.96
2.58
Answer
0.95
This value represents the confidence level itself (0.95), not the critical z-score derived from the standard normal distribution.
1.64
This z-value (approximately 1.645) is typically used for a 90% confidence interval, corresponding to a tail area of 0.05.
1.96
For a 95% confidence interval, the significance level is $\alpha = 0.05$. The critical value leaves $\frac{\alpha}{2} = 0.025$ in the upper tail, which corresponds to $z \approx 1.96$.
2.58
This z-value is typically used for a 99% confidence interval, corresponding to a tail area of 0.005.
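The critical values in these options can be recovered from the inverse standard normal CDF. Here is a minimal sketch. It assumes a two-sided interval, so that $(1 - \text{level})/2$ of the area lies in each tail.

```python
from statistics import NormalDist

def critical_z(level):
    """Two-sided critical value: (1 - level)/2 of the area lies in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```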
The time taken to complete orders at a pizza store is normally distributed with a mean time ($\mu$) of 10 minutes.
The owner of the pizza store records the time taken to complete orders for a random sample of 20 pizzas each day over a 30-day period. From this data, an approximate 90% confidence interval for is calculated at the end of each day.
How many of these confidence intervals would be expected to contain $\mu$?
3
18
27
30
Answer
3
This represents 10% of the 30 days ($0.10 \times 30 = 3$). This is the expected number of intervals that would fail to contain the mean, not the number that would contain it.
18
This represents only 60% of the 30 days ($0.60 \times 30 = 18$). Given a 90% confidence level, the expected number of successful intervals should be higher.
27
By definition, a 90% confidence interval is expected to contain the true population parameter 90% of the time in repeated sampling. Therefore, the expected number is $0.90 \times 30 = 27$.
30
This assumes that every single interval will contain the mean (100%). While possible, the expected value is determined by the specific confidence level of 90%, not 100%.
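A small simulation can make the "expected 27 out of 30" idea concrete. It is only a sketch under stated assumptions. The population standard deviation of 2 minutes is hypothetical, since the question does not give one. That value is also treated as known, so each daily interval is an exact 90% z-interval rather than the approximate interval the owner computes. The average count over many simulated 30-day periods should be close to $0.9 \times 30 = 27$.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)                      # fixed seed so the run is repeatable
MU = 10.0                           # population mean time (minutes), from the question
SIGMA = 2.0                         # HYPOTHETICAL population SD: not given in the question
N, DAYS, PERIODS = 20, 30, 1000     # pizzas per day, days per period, simulated periods
HALF = NormalDist().inv_cdf(0.95) * SIGMA / N ** 0.5   # half-width of a 90% z-interval

def days_covering_mu():
    """Simulate one 30-day period; count the daily 90% intervals that contain MU."""
    count = 0
    for _ in range(DAYS):
        xbar = fmean(random.gauss(MU, SIGMA) for _ in range(N))
        if xbar - HALF < MU < xbar + HALF:
            count += 1
    return count

counts = [days_covering_mu() for _ in range(PERIODS)]
print(f"average number of intervals containing mu: {fmean(counts):.2f}")
```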
The mass of chocolate that is placed into each biscuit produced by the BikkiesAreUs company has been observed to be normally distributed with mean $\mu = 7.5$ grams and standard deviation $\sigma = 1.5$ grams.
Determine the probability, correct to 0.01, that the total amount of chocolate used for 50 biscuits is less than 365 grams.
Answer
| Descriptor | Marks |
|---|---|
states that the sample mean is a normal random variable | 1 |
states the correct parameters for the normal random variable | 1 |
calculates the sample mean correctly for the total 365 grams | 1 |
determines the correct probability (to 0.01) | 1 |
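This copy of the question has lost the population parameters. The sketch below assumes μ = 7.5 g, the BikkiesAreUs mean quoted in the final marking guide for this question. It also assumes σ = 1.5 g, the value that reproduces the 305-biscuit answer to the next part. Treat both as inferred, not given. Because individual masses are normal, the sample mean of 50 biscuits is exactly normal. A total below 365 g is the same event as a sample mean below 365/50 = 7.3 g.

```python
from statistics import NormalDist

mu, sigma, n = 7.5, 1.5, 50      # ASSUMED (inferred) mean and SD of chocolate per biscuit (g)
total_limit = 365                # grams of chocolate for 50 biscuits

# Total < 365 g  <=>  sample mean < 365/50 = 7.3 g
xbar_limit = total_limit / n
sample_mean = NormalDist(mu, sigma / n ** 0.5)   # exact, since each mass is normal
p = sample_mean.cdf(xbar_limit)
print(f"P(total < {total_limit} g) = {p:.2f}")
```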
If the probability that the mean amount of chocolate used per biscuit differs from $\mu$ by less than 0.2 grams is 98%, determine $n$, the number of biscuits that need to be sampled.
Answer
i.e. we require at least 305 biscuits to have the sample mean differ by less than 0.2 grams
| Descriptor | Marks |
|---|---|
uses the standard score that represents 98% confidence | 1 |
forms the correct inequality/equation to solve for $n$ | 1 |
states the correct minimum integer value for $n$ | 1 |
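Here is a sketch of the sample-size calculation, assuming σ = 1.5 g (inferred, not stated in this copy of the question). A 98% probability leaves 1% in each tail, so the critical value is the 99th percentile of the standard normal distribution. The result must then be rounded up rather than to the nearest integer.

```python
import math
from statistics import NormalDist

sigma = 1.5          # ASSUMED population SD of chocolate per biscuit (g)
tolerance = 0.2      # allowed difference between sample mean and mu (g)

# P(|Xbar - mu| < 0.2) = 0.98 leaves 0.01 in each tail
z = NormalDist().inv_cdf(0.99)           # about 2.326
n_exact = (z * sigma / tolerance) ** 2   # solves z * sigma / sqrt(n) = tolerance
n_min = math.ceil(n_exact)               # round up so the probability is at least 98%
print(f"n = {n_exact:.2f} -> at least {n_min} biscuits")
```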
A competitor company called YouBeautChokkies produces similar biscuits. A sample of 144 biscuits was taken and it was found that the standard deviation of the mass of chocolate used in each biscuit was 1.8 grams and the total amount of chocolate used in the sample of 144 biscuits was 1.09 kg.
Charlie Chokka, a representative from the YouBeautChokkies company, stated that "we are using significantly more chocolate for each biscuit than BikkiesAreUs. If you want that real chocolate taste, then buy from us!"
Perform the necessary calculations to comment on Charlie's claim.
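Answer
Let be the population mean for the mass of chocolate per biscuit for the YBC company (grams).
For the YBC total of 1090 grams, this gives a sample mean of $\frac{1090}{144} \approx 7.57$ grams, with standard deviation of the sample mean $\frac{1.8}{\sqrt{144}} = 0.15$ grams.
The BikkiesAreUs mean of 7.5 grams is WITHIN the confidence interval for the YBC population mean, i.e. the claim is NOT vindicated.
i.e. the YBC company is NOT using significantly more chocolate per biscuit than BAU.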
| Descriptor | Marks |
|---|---|
determines the expected variation using $\frac{1.8}{\sqrt{144}} = 0.15$ | 1 |
determines an appropriate confidence interval for the YouBeautChokkies population mean | 1 |
states that the BikkiesAreUs population mean 7.5 is within the confidence interval | 1 |
concludes correctly by writing a comment about the claim | 1 |
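The marking guide does not say which confidence level was used. This sketch is therefore a hypothetical check, not the official solution: it computes the YouBeautChokkies interval at 90%, 95% and 99%. It uses the sample standard deviation of 1.8 g as an estimate of the population value. At every one of these levels the BikkiesAreUs mean of 7.5 g (from the marking guide) falls inside the interval.

```python
from statistics import NormalDist

n, s, total = 144, 1.8, 1090.0    # YBC sample size, sample SD (g), total chocolate (g)
bau_mean = 7.5                    # BikkiesAreUs population mean (g)

xbar = total / n                  # about 7.57 g per biscuit
se = s / n ** 0.5                 # 0.15 g

for level in (0.90, 0.95, 0.99):  # ASSUMED levels: the guide does not state one
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lo, hi = xbar - z * se, xbar + z * se
    verdict = "inside" if lo < bau_mean < hi else "outside"
    print(f"{level:.0%} CI ({lo:.3f}, {hi:.3f}): 7.5 is {verdict}")
```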
The travel time for students attending a certain university is assumed to be normally distributed, with a population mean of 25.2 minutes and standard deviation of 4.7 minutes.
Travel times are collected from a random sample of 120 of these students and used to calculate a sample mean, , in minutes.
Determine .
Answer
Given
Using GDC
| Descriptor | Marks |
|---|---|
correctly calculates for the first sample | 1 |
calculates required probability | 1 |
Given , determine the value of .
Answer
Using GDC
minutes
| Descriptor | Marks |
|---|---|
calculates | 1 |
Travel times are collected from a second random sample of the university's students and used to calculate a second sample mean, , in minutes.
Given , determine the number of students in the second sample.
Answer
Using GDC
The sample size is 35.
| Descriptor | Marks |
|---|---|
correctly calculates the z-value based on given probability | 1 |
determines an equation in terms of the sample size (n) | 1 |
determines an approximate value of n | 1 |
evaluates the reasonableness of the solution by rounding n to an integer value | 1 |
A random variable is normally distributed with a mean of 36 and a standard deviation of 4.
The respective mean and standard deviation of the distribution of from repeated random samples of size 9 are
4 and
4 and
36 and
36 and
Answer
4 and
This is incorrect because the mean of the sampling distribution should equal the population mean (36), not the population standard deviation (4). Additionally, the standard deviation is calculated incorrectly; it should be $\frac{4}{\sqrt{9}} = \frac{4}{3}$.
4 and
This is incorrect because the mean of the sampling distribution is equal to the population mean (36), not 4. However, the standard deviation value of $\frac{4}{3}$ is calculated correctly.
36 and
This is incorrect because the standard deviation of the sample mean (standard error) is calculated incorrectly instead of with the correct formula $\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{9}} = \frac{4}{3}$.
36 and
The mean of the sampling distribution equals the population mean (36), and the standard deviation is calculated as $\frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{9}} = \frac{4}{3}$.
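A quick simulation agrees with the formula $\frac{\sigma}{\sqrt{n}} = \frac{4}{3}$. This is a sketch: with a fixed seed, it draws 20 000 samples of size 9 from $N(36, 4^2)$ and summarises the resulting sample means.

```python
import random
from statistics import fmean, pstdev

random.seed(0)                      # fixed seed so the run is repeatable
MU, SIGMA, N = 36, 4, 9             # population mean, population SD, sample size

# Draw many samples of size 9 and record each sample mean
means = [fmean(random.gauss(MU, SIGMA) for _ in range(N)) for _ in range(20_000)]

print(f"mean of sample means: {fmean(means):.2f}")    # should be close to 36
print(f"SD of sample means:   {pstdev(means):.3f}")   # should be close to 4/3
```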
A researcher is interested in estimating the population mean (dollars) that Perth residents had spent via online shopping in December 2020. A random sample of size gave a sample mean of $400, a sample standard deviation and a 95% confidence interval of width $200.
Four different confidence intervals (A, B, C and D) are obtained for the mean amount spent via online shopping by Perth residents in December 2020.
| Confidence interval | Sample size | Sample standard deviation | Confidence level |
|---|---|---|---|
| A | | | 95% |
| B | | | 99% |
| C | | | 95% |
| D | | | 95% |
For each of the following, state the confidence interval that has the smaller width. Justify your answers.
State the 95% confidence interval obtained.
Answer
The interval is centred on the sample mean of $400 with half-width $200 ÷ 2 = $100, i.e. ($300, $500).
| Descriptor | Marks |
|---|---|
states the upper and lower limits of the interval correctly | 1 |
Calculate the standard deviation of the sample mean, correct to $0.01.
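Answer
Margin of error = 200 ÷ 2 = 100 = 1.96 × (standard deviation of the sample mean)
i.e. standard deviation of the sample mean = 100 ÷ 1.96 ≈ $51.02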
| Descriptor | Marks |
|---|---|
forms the equation relating the margin of error and the standard deviation correctly | 1 |
determines the standard deviation correctly to 0.01 | 1 |
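The same working takes a few lines of Python. This assumes the 95% critical value (about 1.96, taken from `statistics.NormalDist`) and a margin of error equal to half the $200 width.

```python
from statistics import NormalDist

width = 200.0                       # width of the 95% interval (dollars)
z = NormalDist().inv_cdf(0.975)     # 95% two-sided critical value, about 1.96
se = (width / 2) / z                # margin of error = z * SD of the sample mean
print(f"SD of the sample mean = ${se:.2f}")
```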
In terms of n, what sample size would yield a 95% confidence interval of width $50? Show your reasoning.
Answer
The interval width is reduced by a factor of 4 (from $200 to $50). Since the width is proportional to 1/√n, the sample size needs to increase by a factor of 4² = 16,
i.e. a sample size of 16n is required.
| Descriptor | Marks |
|---|---|
uses an interval width equal to one-quarter the original | 1 |
states the new sample size in terms of n | 1 |
What is the probability that another sample of size would produce a sample mean that differs from by more than $50?
Answer
| Descriptor | Marks |
|---|---|
determines the standard deviation for the sample size correctly | 1 |
forms the correct probability statement | 1 |
calculates the correct probability | 1 |
Which of the confidence intervals (A, B, C or D) contains , the population mean expenditure for online shopping in December 2020? Justify your answer.
Answer
Since the true value of is unknown, we CANNOT determine which interval contains the true mean. This is due to the inherent nature of random sampling.
| Descriptor | Marks |
|---|---|
states we cannot determine which interval contains | 1 |
states that either is unknown OR refers to the nature of random sampling | 1 |
(i) A and B.
Answer
Confidence interval A will have the smaller width since its level of confidence (95%) is lower than that of B (99%), so a smaller critical z-value is used.
| Descriptor | Marks |
|---|---|
justifies why A will have the smaller width | 1 |
(ii) C and D.
Answer
Need to compare the standard deviation of the sample means:
Hence confidence interval C will have the smaller width.
| Descriptor | Marks |
|---|---|
justifies why C will have the smaller width by correctly comparing the respective standard deviations of the sample mean | 1 |
A scientist investigates the distribution of the masses of fish in a particular river. A 95% confidence interval for the mean mass of a fish, in grams, calculated from a random sample of 100 fish is (70.2, 75.8).
The sample mean divided by the population standard deviation is closest to
1.3
2.6
5.1
10.2
13.0
Answer
1.3
This is incorrect. The sample mean is 73 and the population standard deviation is approximately 14.29, which does not yield a ratio of 1.3.
2.6
This is incorrect. This value is half of the correct ratio, which might result from incorrectly using the full interval width (5.6) instead of the margin of error (2.8) to calculate the standard deviation.
5.1
This is correct. The sample mean is the midpoint of the interval, $\bar{x} = \frac{70.2 + 75.8}{2} = 73$. The margin of error is 2.8, so $1.96 \times \frac{\sigma}{\sqrt{100}} = 2.8$, giving $\sigma \approx 14.29$. The ratio is $\frac{73}{14.29} \approx 5.1$.
10.2
This is incorrect. This is double the correct ratio; it corresponds to a standard deviation of about 7.14, half the correct value, as would result from halving the margin of error a second time (using 1.4 instead of 2.8).
13.0
This is incorrect. This value does not represent the ratio of the sample mean (73) to the population standard deviation (approximately 14.29).
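Here is a minimal sketch of the reasoning behind the correct option. It assumes the interval was built as $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ with $n = 100$.

```python
from statistics import NormalDist

lower, upper, n = 70.2, 75.8, 100
xbar = (lower + upper) / 2           # midpoint: sample mean
margin = (upper - lower) / 2         # half-width: margin of error
z = NormalDist().inv_cdf(0.975)      # 95% critical value, about 1.96
sigma = margin * n ** 0.5 / z        # from margin = z * sigma / sqrt(n)
print(f"xbar = {xbar:.1f}, sigma = {sigma:.2f}, ratio = {xbar / sigma:.1f}")
```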
The scores on a test are assumed to be normally distributed.
Researchers use the results from a random sample of scores to calculate a confidence interval for the population mean. However, a shorter confidence interval width is required so the researchers decide to use a second sample for their calculations.
Assuming that the standard deviations for both samples are the same, the researchers can ensure that a shorter confidence interval width is produced by
decreasing the sample size and decreasing the confidence level.
decreasing the sample size and increasing the confidence level.
increasing the sample size and decreasing the confidence level.
increasing the sample size and increasing the confidence level.
Answer
decreasing the sample size and decreasing the confidence level.
Decreasing the sample size increases the standard error ($\frac{s}{\sqrt{n}}$), which widens the interval and counteracts the narrowing effect of a lower confidence level.
decreasing the sample size and increasing the confidence level.
Both decreasing the sample size and increasing the confidence level contribute to a wider confidence interval, not a shorter one.
increasing the sample size and decreasing the confidence level.
A confidence interval width is determined by $2z\frac{s}{\sqrt{n}}$. Increasing the sample size ($n$) reduces the standard error, and decreasing the confidence level reduces the critical value ($z$), both of which shorten the interval.
increasing the sample size and increasing the confidence level.
Increasing the confidence level requires a larger critical value, which widens the interval and opposes the narrowing effect of the increased sample size.
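The sketch below compares widths $2z\frac{s}{\sqrt{n}}$ to show why only increasing the sample size while decreasing the confidence level guarantees a shorter interval. The standard deviation (10) and the first sample (n = 50 at 95%) are hypothetical values chosen for illustration, and `ci_width` is a hypothetical helper.

```python
from statistics import NormalDist

def ci_width(s, n, level):
    """Approximate CI width for a mean: 2 * z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * s / n ** 0.5

s = 10.0                          # HYPOTHETICAL common standard deviation
base = ci_width(s, 50, 0.95)      # HYPOTHETICAL first sample: n = 50 at 95%

print(ci_width(s, 100, 0.90) < base)   # larger n, lower level: shorter
print(ci_width(s, 100, 0.99) < base)   # larger n, higher level: shorter here...
print(ci_width(s, 60, 0.99) < base)    # ...but longer here, so not guaranteed
```

The last two lines show that increasing both the sample size and the confidence level can shorten or lengthen the interval, depending on the numbers. That option therefore cannot guarantee a shorter width.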
Repeated random samples will be used to calculate a large number of 90% confidence intervals for a population mean $\mu$.
Which statement best describes the possible outcomes?
Approximately 90% of the intervals will contain $\mu$.
More than 90% of the intervals will contain $\mu$.
Less than 90% of the intervals will contain $\mu$.
Exactly 90% of the intervals will contain $\mu$.
Answer
Approximately 90% of the intervals will contain $\mu$.
This is the definition of a confidence level; in repeated sampling, the proportion of intervals capturing the true population mean will approach the stated confidence level (90%).
More than 90% of the intervals will contain $\mu$.
A 90% confidence level indicates that the method is designed to capture the parameter 90% of the time, not more than that.
Less than 90% of the intervals will contain $\mu$.
A 90% confidence level indicates that the method is designed to capture the parameter 90% of the time, not less than that.
Exactly 90% of the intervals will contain $\mu$.
Due to sampling variability, the actual proportion of intervals capturing $\mu$ in a finite number of samples will likely be close to 90%, but rarely exactly 90%.
The mass of a certain species of kangaroo is known to be normally distributed with a mean mass of $\mu$ kg and standard deviation of $\sigma$ kg.
When one of the kangaroos is randomly selected, the probability that its mass is greater than 83.2 kg is 0.145.
When a sample of 12 kangaroos is randomly selected, the probability that the sample mean mass is less than 74.1 kg is 0.079.
A 90% approximate confidence interval for $\mu$ is calculated using a random sample of $n$ of the kangaroos that has a sample mean mass of 79.1 kg and a sample standard deviation equal to $\sigma$.
Determine the possible range of values that $n$ could have been, given that the confidence interval did not contain $\mu$.
Answer
Sample 1:
Sample 2:
Using graph facility of GDC to solve (1) and (2)
Sample 3: Consider the 90% CI
Since the sample mean (79.1 kg) is greater than $\mu$, $\mu$ can only lie below the lower bound of the CI.
Determining where the lower bound of CI
Using solve facility of GDC,
As must lie in an interval below the lower bound of CI, the range of values is where .
| Descriptor | Marks |
|---|---|
correctly uses the sample of 1 to determine an equation in terms of μ and σ | 1 |
correctly uses the sample of 12 to determine an equation in terms of μ and σ | 1 |
solves simultaneous equations to determine the values of μ and σ | 1 |
determines solution of n | 1 |
evaluates the reasonableness of the solution to the equation to determine suitable integer values of n | 1 |
shows logical organisation communicating key steps | 1 |
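The solution above relies on the GDC. A minimal standard-library sketch of the same steps follows, with two flagged assumptions. First, the garbled question is read as saying the sample standard deviation equals σ. Second, the approximate 90% interval uses z ≈ 1.645. Under these assumptions μ ≈ 76.63 kg and σ ≈ 6.21 kg. The interval first excludes μ at n = 18, and since its lower bound rises with n, any sample size of 18 or more is possible.

```python
from statistics import NormalDist

Z = NormalDist()
# Equation (1): P(X > 83.2) = 0.145  ->  83.2 = mu + z1 * sigma
z1 = Z.inv_cdf(1 - 0.145)
# Equation (2): P(Xbar < 74.1) = 0.079 with n = 12  ->  74.1 = mu + z2 * sigma / sqrt(12)
z2 = Z.inv_cdf(0.079)

# Subtract (2) from (1) and solve the linear pair for sigma, then mu
sigma = (83.2 - 74.1) / (z1 - z2 / 12 ** 0.5)
mu = 83.2 - z1 * sigma

# 90% CI: 79.1 +/- z90 * s / sqrt(n), with s ASSUMED equal to sigma
z90 = Z.inv_cdf(0.95)
n = 1
while 79.1 - z90 * sigma / n ** 0.5 <= mu:   # lower bound still at or below mu
    n += 1
print(f"mu = {mu:.2f} kg, sigma = {sigma:.2f} kg, smallest n = {n}")
```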
The WeLuvYas Bank extends personal loans to approved customers. A random sample of personal loans is taken. A 99% confidence interval for the population mean loan (in thousands of dollars) based on this sample is .
What is the mean personal loan for this sample?
Answer
Hence the mean personal loan was $17 800.
| Descriptor | Marks |
|---|---|
calculates the midpoint of the confidence interval correctly | 1 |
states the personal loan amount in dollars | 1 |
Calculate the standard deviation of the sample mean.
Answer
Half-width
i.e. standard deviation of the sample mean ≈ 2.95 (thousand dollars, 2 d.p.)
i.e. the standard deviation is $2950
| Descriptor | Marks |
|---|---|
forms the expression for half-width of interval in terms of the standard deviation | 1 |
calculates the standard deviation correctly | 1 |
Ali exclaims excitedly 'everyone here at WeLuvYas is 99% certain that the true population mean is within the interval '.
State two reasons why Ali is not correct.
Answer
Ali is not correct. It can be said that:
- A single confidence interval either contains $\mu$ or it does not.
- The value of $\mu$ is unknown, so we do not know whether any given CI contains $\mu$.
- If we repeatedly take samples of size $n$, then we will find that approximately 99% of these intervals will contain the true value of $\mu$.
- The value of $n$ may be less than 30, in which case the sample mean may not be approximately normally distributed, so the 99% confidence interval may not be valid.
| Descriptor | Marks |
|---|---|
states one reason | 1 |
states a second reason | 1 |
A data analyst discovers that the sample size was actually . In addition to this, the sample mean was actually $2000 more than that originally determined.
Re-calculate the 99% confidence interval for the population mean on the basis of the updated information.
Answer
The confidence interval changes in two ways.
- The whole interval is translated upwards by 2 (i.e. $2000, since the interval is in thousands of dollars).
- The standard error, and consequently the width of the interval, is scaled by a factor of :
| Descriptor | Marks |
|---|---|
indicates the midpoint of the confidence interval is increased by 2 | 1 |
calculates the new standard deviation correctly | 1 |
calculates the new confidence interval correctly | 1 |