Statistics MCQs & Answers
- Standard Deviation
- Range
- Median
- Variance
Ans. C
The median is a measure of central tendency that represents the middle value of a dataset when arranged in ascending order.
- Sum of data values / Number of data points
- Standard Deviation / Mean
- (Sum of squared differences from the mean) / Number of data points
- Median / Range
Ans. C
The variance is calculated by taking the sum of the squared differences of each data point from the mean and then dividing it by the number of data points.
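The formula can be sketched in a few lines of Python (a minimal illustration using made-up data; note this is the population variance, dividing by n — sample variance divides by n − 1):

```python
# Population variance: mean of squared deviations from the mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(data) / len(data)  # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)
print(variance)  # 4.0
```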
- The power of the test
- The significance level of the test
- The probability of observing the data or more extreme data under the null hypothesis
- The confidence interval of the test
Ans. C
The p-value represents the probability of observing the data or more extreme data under the null hypothesis, indicating the strength of evidence against the null hypothesis.
- To show the distribution of data and identify outliers
- To display the frequency distribution of data
- To represent the relationship between two variables
- To compare means of different data sets
Ans. A
A box plot is used to visualize the distribution of data, including the identification of outliers and the spread of the data.
- (Sum of products of deviations from means) / (Product of standard deviations)
- (Sum of squared deviations from means) / (Product of means)
- (Sum of squared deviations from means) / (Product of standard deviations)
- (Sum of products of deviations from means) / (Sum of squared deviations from means)
Ans. A
The formula for Pearson’s correlation coefficient (r) involves the sum of products of deviations from means divided by the product of standard deviations of the two variables.
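The numerator and denominator of the formula map directly onto code (a minimal sketch with hypothetical data; for a perfectly linear pair the result is 1):

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Numerator: sum of products of deviations from the means
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Denominator: product of the (unnormalized) standard deviations
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0 for a perfect positive linear relationship
```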
- Mean
- Median
- Range
- Mode
Ans. C
The range measures the spread or variability of data in a data set, calculated as the difference between the maximum and minimum values.
- Normal distribution
- Poisson distribution
- Binomial distribution
- Exponential distribution
Ans. B
The Poisson distribution is used to model the number of events occurring within a fixed interval when events are rare and independent.
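The Poisson probability mass function P(X = k) = λᵏe^(−λ)/k! can be computed directly (a minimal sketch with an assumed rate of 3 events per interval):

```python
import math

def poisson_pmf(k, lam):
    # P(X = k) = lam^k * e^(-lam) / k!
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Example: on average 3 calls per hour; probability of exactly 2 calls in an hour
p = poisson_pmf(2, 3.0)
print(round(p, 4))  # 0.224
```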
- Analysis of Variability
- Analysis of Varying Outcomes
- Analysis of Variance
- Association of Variables
Ans. C
ANOVA stands for “Analysis of Variance,” a statistical technique used to analyze the variance between groups in a dataset.
- To determine the sample size required for an experiment
- To estimate a population parameter with a range of values
- To test the null hypothesis
- To compare means of two groups
Ans. B
The primary purpose of a confidence interval is to provide an estimate of a population parameter along with a range of values within which the parameter is likely to fall.
- The measure of how spread out data values are
- The measure of symmetry in a data distribution
- The measure of central tendency
- The measure of variability
Ans. B
Skewness measures the degree of symmetry or asymmetry in a data distribution. Positive skew indicates a tail on the right, and negative skew indicates a tail on the left.
- 0.05
- 0.01
- 0.10
- 0.50
Ans. A
For a two-tailed test at a 95% confidence level, the significance level is often set at 0.05, meaning there is a 5% chance of making a Type I error.
- Range
- Mean Absolute Deviation (MAD)
- Variance
- Interquartile Range (IQR)
Ans. D
The Interquartile Range (IQR) is less sensitive to extreme outliers because it is based on the middle 50% of the data and ignores extreme values.
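This robustness is easy to demonstrate with made-up data containing one extreme outlier (a sketch using the median-of-halves method for quartiles; other conventions, such as NumPy's default interpolation, can give slightly different values):

```python
def quartiles(data):
    # Q1 and Q3 as the medians of the lower and upper halves of the sorted data.
    def median(xs):
        n = len(xs)
        mid = n // 2
        return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s[half + len(s) % 2:])

data = [1, 2, 3, 4, 5, 6, 7, 1000]      # one extreme outlier
q1, q3 = quartiles(data)
print(q3 - q1)                # IQR = 4.0, unaffected by the outlier
print(max(data) - min(data))  # range = 999, dominated by it
```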
- T-test
- Chi-squared test
- ANOVA
- Regression analysis
Ans. B
The Chi-squared test is used to determine if there is a significant relationship between two categorical variables by comparing observed and expected frequencies.
- (Standard Deviation / Mean) × 100
- (Range / Median) × 100
- (Variance / Mode) × 100
- (Mean Absolute Deviation / Range) × 100
Ans. A
The coefficient of variation (CV) is calculated by dividing the standard deviation by the mean and multiplying the result by 100.
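A minimal sketch of the CV calculation, using hypothetical data and the population standard deviation:

```python
import math

def coefficient_of_variation(data):
    mean = sum(data) / len(data)
    # Population standard deviation (divide by n)
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
    return sd / mean * 100

# mean = 5, sd = 2, so CV = 40%
print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))  # 40.0
```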
- The probability of making a Type II error
- The probability of observing the null hypothesis being true
- The probability of obtaining the observed results by chance under the null hypothesis
- The power of the test
Ans. C
The p-value represents the probability of obtaining the observed results by chance under the null hypothesis, and it helps assess the strength of evidence against the null hypothesis.
- To compare two data sets
- To show the distribution of categorical data
- To display the relationship between two variables
- To represent the frequency distribution of a continuous variable
Ans. D
A histogram is used to represent the frequency distribution of a continuous variable, showing how data is distributed across different values or intervals.
- The mean of a dataset
- Data points that are significantly different from the others
- The median of a dataset
- The range of a dataset
Ans. B
Outliers are data points that are significantly different from the majority of the data in a dataset and may skew statistical analysis.
- Normal distribution
- Poisson distribution
- Binomial distribution
- Exponential distribution
Ans. C
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials (experiments with two possible outcomes).
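The binomial probability mass function P(X = k) = C(n, k)·pᵏ·(1 − p)ⁿ⁻ᵏ follows directly from this definition (a sketch using a fair-coin example):

```python
from math import comb

def binomial_pmf(k, n, p):
    # P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 5 fair coin flips
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```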
- Alternative hypothesis
- Null hypothesis
- Two-tailed hypothesis
- Significance hypothesis
Ans. B
The null hypothesis (H0) assumes no effect or relationship in a statistical test and is used for hypothesis testing.
- Standard Deviation / √(Sample Size)
- Sample Size / Standard Deviation
- Range / Mean
- Variance / Median
Ans. A
The standard error of the mean (SEM) is calculated by dividing the standard deviation by the square root of the sample size.
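The SEM calculation can be sketched as follows (hypothetical data; the sample standard deviation, dividing by n − 1, is used here):

```python
import math

def sem(data):
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (divide by n - 1)
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return sd / math.sqrt(n)

# sd ≈ 3.162, n = 5, so SEM ≈ 1.414
print(sem([10, 12, 14, 16, 18]))
```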
- Student’s t-test
- Mann-Whitney U test
- Analysis of Variance (ANOVA)
- Chi-squared test
Ans. C
Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.
- To compare means of two groups
- To represent the frequency distribution of data
- To display the relationship between two continuous variables
- To show the distribution of categorical data
Ans. C
A scatter plot is used to display the relationship between two continuous variables, helping to identify patterns and correlations between them.
- (Data Value – Mean) / Standard Deviation
- (Data Value – Median) / Range
- (Data Value – Variance) / Mean
- (Data Value – Mode) / Sample Size
Ans. A
The z-score of a data point in a normal distribution is calculated by subtracting the mean from the data value and dividing by the standard deviation.
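In code (a trivial sketch with made-up exam scores):

```python
def z_score(x, mean, sd):
    # Number of standard deviations x lies above (positive) or below (negative) the mean
    return (x - mean) / sd

# An exam score of 85 where the class mean is 70 and the sd is 10
print(z_score(85, 70, 10))  # 1.5
```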
- The null hypothesis is likely true
- There is strong evidence against the null hypothesis
- The significance level is 0.02
- The test is inconclusive
Ans. B
A p-value of 0.02 is below the conventional 0.05 threshold, indicating strong evidence against the null hypothesis, which would typically be rejected.
- The error made by the statistician during data collection
- The difference between the sample statistic and the population parameter
- The error introduced during data entry and analysis
- The variation within the sample data
Ans. B
Sampling error is the difference between a sample statistic and the population parameter it estimates and is due to random sampling.
- Student’s t-test
- Chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)
Ans. C
The Wilcoxon signed-rank test is a non-parametric test used for comparing two related groups or paired data when assumptions of parametric tests are not met.
- Normal distribution
- Poisson distribution
- Exponential distribution
- Negative Binomial distribution
Ans. D
The Negative Binomial distribution describes the number of successes in a sequence of independent Bernoulli trials before a specified number of failures occurs.
- The level of significance in hypothesis testing
- The likelihood that a sample is representative of the population
- The range of values within which a parameter is estimated to fall
- The proportion of a population captured by a sample
Ans. C
The confidence interval gives the range of values within which a parameter is estimated to fall; the associated degree of certainty, typically expressed as a percentage, is the confidence level.
- (Probability of the event) / (1 – Probability of the event)
- (Probability of the event) × (1 – Probability of the event)
- (Odds of the event) / (1 – Odds of the event)
- (Odds of the event) × (1 – Odds of the event)
Ans. A
The odds of an event are calculated from its probability as (Probability of the event) / (1 – Probability of the event); for example, a probability of 0.75 corresponds to odds of 3 to 1.
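Odds and probability convert back and forth as follows (a minimal sketch):

```python
def odds_from_probability(p):
    # odds = p / (1 - p)
    return p / (1 - p)

def probability_from_odds(odds):
    # Inverse relationship: p = odds / (1 + odds)
    return odds / (1 + odds)

print(odds_from_probability(0.75))  # 3.0  (odds of 3 to 1)
print(probability_from_odds(3.0))   # 0.75
```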
- The strength of a linear relationship between two variables
- The difference between the mean and median of a dataset
- The variability within a sample
- The spread of data values in a dataset
Ans. A
Correlation measures the strength and direction of a linear relationship between two variables, indicating how one variable changes when the other changes.
- Variance
- Standard Deviation
- Range
- Mode
Ans. B
The standard deviation is a measure of how much data values tend to deviate from the mean in a dataset, indicating the spread or dispersion of data.
- The probability of making a Type II error
- The probability of observing the null hypothesis being true
- The probability of obtaining the observed results by chance under the null hypothesis
- The power of the test
Ans. C
The p-value represents the probability of obtaining the observed results by chance under the null hypothesis, helping assess the strength of evidence against the null hypothesis.
- Range
- Mean Absolute Deviation (MAD)
- Variance
- Interquartile Range (IQR)
Ans. D
The Interquartile Range (IQR) is a measure of the spread of data values around the median and is less affected by extreme outliers.
- The standard deviation of a sample
- The margin of error in a confidence interval
- The mean of a population
- The range of values in a data set
Ans. B
The standard error underlies the margin of error in a confidence interval (the margin of error is the standard error multiplied by a critical value), indicating the precision of an estimate based on sample data.
- Pearson’s correlation coefficient (r)
- Spearman’s rank correlation (rho)
- Chi-squared test
- ANOVA
Ans. B
Spearman’s rank correlation (rho) is a measure of association used to assess the relationship between two ordinal variables by ranking their values.
- Incorrectly rejecting a true null hypothesis
- Incorrectly accepting a false null hypothesis
- Correctly rejecting a false null hypothesis
- Correctly accepting a true null hypothesis
Ans. A
A Type I error occurs when a true null hypothesis is incorrectly rejected, leading to a false positive result in hypothesis testing.
- Outlier detection
- Data transformation
- Imputation
- Sampling
Ans. C
Imputation is the method used to assign a value to missing data points based on other available data in a dataset, allowing for analysis with complete data.
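A minimal sketch of the simplest variant, mean imputation, on made-up data with missing values marked as `None` (real pipelines often use median, regression, or multiple imputation instead):

```python
# Replace missing values (None) with the mean of the observed values.
data = [4.0, None, 6.0, 8.0, None, 2.0]
observed = [x for x in data if x is not None]
mean = sum(observed) / len(observed)  # 5.0
imputed = [mean if x is None else x for x in data]
print(imputed)  # [4.0, 5.0, 6.0, 8.0, 5.0, 2.0]
```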
- Interquartile Range (IQR)
- Variance
- Standard Error
- Coefficient of Variation
Ans. A
The Interquartile Range (IQR) is the range of values that separates the central 50% of data from the extreme values in a dataset and is a measure of data spread.
- Chi-squared test
- Two-sample t-test
- Mann-Whitney U test
- ANOVA
Ans. B
The two-sample t-test is used to determine if there is a significant difference in means between two independent groups or samples.
- The measure of symmetry in a data distribution
- The measure of central tendency
- The measure of variability within a sample
- The measure of the spread of data values
Ans. A
Skewness measures the symmetry or asymmetry in a data distribution. Positive skew indicates a right-skewed distribution, while negative skew indicates a left-skewed distribution.
- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution
Ans. C
The exponential distribution is commonly used to model the time between events occurring at a constant rate or in a Poisson process.
- Confidence interval
- Type I error
- P-value
- Margin of error
Ans. C
The p-value is the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test. A smaller p-value indicates stronger evidence against the null hypothesis.
- Mean
- Median
- Mode
- Range
Ans. A
The mean is the measure of central tendency most affected by outliers, as it takes all values into account when calculating the average.
- Chi-squared test
- Pearson’s correlation coefficient
- Mann-Whitney U test
- T-test
Ans. B
Pearson’s correlation coefficient is used to assess the strength and direction of the linear relationship between two continuous variables.
- (Odds of the event) / (Odds against the event)
- (Probability of the event) × (Probability against the event)
- (Odds of the event) / (Probability of the event)
- (Probability of the event) / (1 – Probability of the event)
Ans. A
If the odds in favor of an event are a to b, the probability of the event is a / (a + b); expressed as a single number, probability = odds / (1 + odds).
- Standard Deviation
- Range
- Variance
- Dispersion
Ans. D
Dispersion is a general term for how spread out data values are around a central point, reflecting the degree of scatter or concentration in the data.
- A range of values within which a parameter is estimated to fall
- The probability of making a Type I error
- The strength of a linear relationship between two variables
- The margin of error in a hypothesis test
Ans. A
A confidence interval represents a range of values within which a parameter is estimated to fall with a specified level of confidence.
- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution
Ans. D
The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, which are experiments with two possible outcomes.
- The probability of making a Type I error
- The probability of observing the null hypothesis being true
- The probability of correctly rejecting the null hypothesis
- The probability of obtaining extreme data by chance
Ans. C
Power represents the probability of correctly rejecting the null hypothesis when it is false, indicating the test’s ability to detect a true effect.
- Scatter plot
- Box plot
- Bar chart
- Frequency table
Ans. B
A box plot (box-and-whisker plot) is a graphical representation that displays the distribution, central tendency, and spread of a dataset.
- Continuous data
- Nominal data
- Ordinal data
- Discrete data
Ans. D
Discrete data is a type of data that can take on only specific, often whole-number values, and is typically used to represent counts.
- Confidence interval
- Sampling error
- Type I error
- Standard error
Ans. D
Standard error is a measure of the degree of uncertainty or variability associated with a statistic and indicates how much the sample statistic might vary from the population parameter.
- Student’s t-test
- Chi-squared test
- Mann-Whitney U test
- Analysis of Variance (ANOVA)
Ans. D
Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.
- The range of data values
- The strength and direction of a relationship between two variables
- The probability of making a Type I error
- The margin of error in a confidence interval
Ans. B
A correlation coefficient measures the strength and direction of a relationship between two variables, indicating how they are related to each other.
- Standard deviation
- Range
- Interquartile range
- Mode
Ans. A
Standard deviation is a measure of how data values are distributed around a central point, representing the degree of spread or dispersion in the data.
- Skewness
- Variance
- Standard error
- Mean absolute deviation (MAD)
Ans. B
Variance is a measure of how much data values tend to deviate from the mean in a dataset, indicating the degree of variability.
- The probability of making a Type I error
- The significance level or threshold for rejecting the null hypothesis
- The p-value
- The probability of correctly accepting the null hypothesis
Ans. B
The alpha level represents the significance level or threshold for rejecting the null hypothesis in hypothesis testing.
- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution
Ans. C
The exponential distribution is used to model the time between events occurring at a constant rate or in a Poisson process.
- Outlier detection
- Feature selection
- Imputation
- Hypothesis testing
Ans. B
Feature selection is a method used to reduce the dimensionality of data while retaining as much relevant information as possible for analysis.
- The number of data points in a dataset
- The sample size
- The number of groups in an ANOVA test
- The number of values that are free to vary in a statistical calculation
Ans. D
Degrees of freedom refer to the number of values that are free to vary in a statistical calculation and play a role in various statistical tests.
- The hypothesis that is proven to be true
- The alternative hypothesis
- The hypothesis to be rejected if evidence suggests otherwise
- The initial assumption to be tested
Ans. D
The null hypothesis is the initial assumption to be tested in hypothesis testing, typically representing no effect or no difference.
- Student’s t-test
- Pearson’s chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)
Ans. B
Pearson’s chi-squared test is used to assess the significant difference between observed and expected frequencies in a contingency table.
- Simple random sampling
- Cluster sampling
- Stratified sampling
- Convenience sampling
Ans. C
Stratified sampling is a method in which the population is divided into non-overlapping strata, and a random sample is taken from each stratum, ensuring representation from all groups.
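The procedure can be sketched as follows (a toy example with a hypothetical population of (id, region) pairs; a fixed seed keeps the draw reproducible):

```python
import random

def stratified_sample(population, strata_key, frac, seed=0):
    # Group the population into strata, then draw a random sample from each stratum.
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(strata_key(item), []).append(item)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * frac))
        sample.extend(rng.sample(group, k))
    return sample

# Hypothetical population: 6 people in the north, 4 in the south; sample 50% of each
people = [(i, "north" if i < 6 else "south") for i in range(10)]
sample = stratified_sample(people, strata_key=lambda p: p[1], frac=0.5)
print(sorted({region for _, region in sample}))  # both strata are represented
```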
- Mean
- Median
- Mode
- Range
Ans. C
The mode is a measure of central tendency used with nominal data and represents the most frequently occurring value in a dataset.
- The level of significance in hypothesis testing
- The likelihood that a sample is representative of the population
- The range of values within which a parameter is estimated to fall
- The proportion of a population captured by a sample
Ans. C
A confidence interval gives the range of values within which a parameter is estimated to fall; the confidence level is the specified degree of certainty attached to that range.
- Central tendency
- Variance
- Standard error
- Mode
Ans. B
Variance is the measure of how data values are spread out or dispersed in a dataset, indicating the degree of variability.
- Probability
- Causation
- Correlation
- Variance
Ans. C
Correlation is a statistical measure that describes the direction and strength of a relationship between two variables, indicating how they are related.
- Skewness
- Kurtosis
- Central tendency
- Spread
Ans. A
Skewness is a statistical measure that describes the symmetry or asymmetry of a probability distribution, indicating whether it is skewed to the left or right.
- The process of selecting a sample from a population
- The list of all elements in the population
- The margin of error in a confidence interval
- The probability of making a Type I error
Ans. B
A sampling frame is the list of all elements in the population from which a sample is drawn, serving as the basis for selecting a sample.
- Central tendency
- Standard error
- Variance
- Mode
Ans. C
Variance is a measure of how much data values are dispersed around the mean in a dataset, indicating the degree of spread.
- Student’s t-test
- Chi-squared test
- Analysis of Variance (ANOVA)
- Wilcoxon signed-rank test
Ans. D
The Wilcoxon signed-rank test is used to determine whether there is a significant difference between two paired groups or conditions when the data are not normally distributed.
- Skewness
- Variance
- Kurtosis
- Dispersion
Ans. C
Kurtosis is a statistical measure that describes the "tailedness" of a distribution — how heavy its tails are relative to a normal distribution — which also affects how peaked or flat it appears.
- Variance
- Interquartile range
- Mode
- Range
Ans. B
The interquartile range is a measure of the spread of data values around the median in a dataset, representing the central 50% of the data.
- Pearson’s correlation coefficient
- Spearman’s rank correlation
- Chi-squared test
- ANOVA
Ans. B
Spearman’s rank correlation is a statistical measure that describes the strength and direction of a monotonic (not necessarily linear) relationship between two variables by ranking their values.
- The probability of making a Type I error
- The likelihood of observing the data or more extreme data, assuming the null hypothesis is true
- The strength of a linear relationship between two variables
- The margin of error in a confidence interval
Ans. B
The p-value represents the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in hypothesis testing. A smaller p-value indicates stronger evidence against the null hypothesis.
- Type I error
- Type II error
- Power
- Significance level
Ans. C
Power is a measure of the reliability of a statistical test in detecting a true effect, indicating the test’s ability to avoid a Type II error (false negative).
- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution
Ans. D
The binomial distribution is used to model the number of successes in a fixed number of independent Bernoulli trials, which are experiments with two possible outcomes.
- R-squared (R^2)
- Coefficient of determination
- Pearson’s correlation coefficient
- Standard error of the estimate
Ans. A
R-squared (R^2) is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model.
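R² can be computed as 1 − SS_res/SS_tot (a sketch with hypothetical observed values and predictions from some assumed fitted model):

```python
def r_squared(y_true, y_pred):
    # R^2 = 1 - (sum of squared residuals) / (total sum of squares)
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot

y_true = [3, 5, 7, 9]
y_pred = [2.8, 5.1, 7.2, 8.9]  # hypothetical model predictions
print(round(r_squared(y_true, y_pred), 4))  # 0.995
```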
- Dispersion
- Range
- Skewness
- Central tendency
Ans. D
Central tendency is the measure of the center of a probability distribution, indicating the typical or central value of a dataset.
- Analysis of Variance (ANOVA)
- Chi-squared test
- T-test
- Mann-Whitney U test
Ans. C
The T-test is used to compare the means of two independent groups or conditions in a statistical analysis.
- Regression
- Correlation
- Variance
- Kurtosis
Ans. B
Correlation is a measure of the extent to which two variables change together in a linear relationship, indicating the strength and direction of the relationship.
- Standard error
- Interquartile range
- Mode
- Skewness
Ans. B
The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data.
- Simple random sampling
- Cluster sampling
- Stratified sampling
- Convenience sampling
Ans. A
Simple random sampling is a method in which elements are randomly selected from a population, and every element has an equal chance of being selected, ensuring unbiased representation.
- Skewness
- Kurtosis
- Central tendency
- Spread
Ans. B
Kurtosis is a statistical measure that describes the shape of a probability distribution, indicating whether it is peaked or flat compared to a normal distribution.
- Confidence level
- Z-score
- Area under the curve
- Percentile
Ans. C
The term “Area under the curve” represents the proportion of the total area under a normal distribution curve between two specific values, indicating the probability of observing data within that range.
- Rejecting the null hypothesis when it is true
- Failing to reject the null hypothesis when it is false
- The power of the test
- The probability of making a Type II error
Ans. A
Type I error represents the error of rejecting the null hypothesis when it is true, leading to a false positive conclusion in hypothesis testing.
- Standard deviation
- Interquartile range
- Pearson’s correlation coefficient
- Mode
Ans. C
Pearson’s correlation coefficient is a measure of the strength of a linear relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.
- The range of data values in a dataset
- The likelihood of making a Type I error in hypothesis testing
- The range of values within which a parameter is estimated to fall with a specified level of confidence
- The margin of error in a confidence interval
Ans. C
A confidence interval represents the range of values within which a parameter is estimated to fall with a specified level of confidence, typically denoted by a confidence level.
- Student’s t-test
- Chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)
Ans. B
The Chi-squared test is used to determine if there is a significant association between two categorical variables, testing the independence of variables in a contingency table.
- Interquartile range
- Mode
- Standard deviation
- Range
Ans. C
Standard deviation is a measure of the average distance of data values from the mean in a dataset, indicating the degree of dispersion or variability.
- Range
- Skewness
- Interquartile range
- Mode
Ans. C
The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data distribution.
- Chi-squared statistic
- P-value
- Pearson’s correlation coefficient
- Standard error
Ans. C
Pearson’s correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two variables, providing a value between -1 and 1.
- The probability of making a Type I error
- The margin of error in a confidence interval
- The significance level or threshold for rejecting the null hypothesis
- The strength of a linear relationship between two variables
Ans. A
The significance level (α) represents the probability of making a Type I error, which is the error of incorrectly rejecting the null hypothesis when it is true.
- Sensitivity
- Specificity
- Precision
- Accuracy
Ans. A
Sensitivity is a measure that describes the proportion of true positive results out of all actual positive cases in a classification problem, indicating the model’s ability to detect positives correctly.
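Both sensitivity and its counterpart specificity fall straight out of confusion-matrix counts (a sketch with hypothetical counts):

```python
def sensitivity(tp, fn):
    # True positive rate: TP / (TP + FN)
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: TN / (TN + FP)
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts
tp, fn, tn, fp = 80, 20, 90, 10
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```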
- Outlier detection
- Feature selection
- Hypothesis testing
- Imputation
Ans. D
Imputation is a method used to replace missing values in a dataset with estimated values based on other data points, ensuring completeness for analysis.
- Z-score
- Confidence level
- Percentile
- P-value
Ans. C
The term “Percentile” represents the measure of the proportion of the total area under a probability distribution curve to the left of a specific value, indicating the position of a value within a distribution.
- Type I error
- Type II error
- Power
- Confidence level
Ans. C
Power is the probability of correctly rejecting the null hypothesis in hypothesis testing, indicating the test’s ability to detect a true effect.
- Variance
- Range
- Standard error
- Interquartile range
Ans. B
The range is a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values.
- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution
Ans. B
The Poisson distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence, such as the number of emails received per hour.
- Median
- Mode
- Mean
- Interquartile range
Ans. C
The mean is the measure of the average value of a set of data points, calculated by summing all values and dividing by the number of data points.
- The probability of making a Type I error
- The likelihood that a sample is representative of the population
- The range of values within which a parameter is estimated to fall
- The proportion of a population captured by a sample
Ans. A
The significance level in hypothesis testing represents the probability of making a Type I error, typically denoted by alpha (α).
- Hypothesis testing
- Confidence interval
- Regression analysis
- Bootstrapping
Ans. B
A confidence interval is a method used to estimate population parameters based on sample data, taking into account sampling variability and providing a range of possible values for the parameter.
- Sensitivity
- Specificity
- Precision
- Accuracy
Ans. B
Specificity is a measure that describes the proportion of true negative results out of all actual negative cases in a classification problem, indicating the model’s ability to correctly identify negatives.
- R-squared (R^2)
- Coefficient of determination
- Pearson’s correlation coefficient
- Standard error of the estimate
Ans. A
R-squared (R^2) is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model, also known as the coefficient of determination.