## Statistics MCQs & Answers

**Which of the following is a measure of central tendency?**

- Standard Deviation
- Range
- Median
- Variance

Ans. C

The median is a measure of central tendency that represents the middle value of a dataset when arranged in ascending order.

**What is the formula for calculating the variance of a data set?**

- Sum of data values / Number of data points
- Standard Deviation / Mean
- (Sum of squared differences from the mean) / Number of data points
- Median / Range

Ans. C

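The variance is calculated by taking the sum of the squared differences of each data point from the mean and then dividing it by the number of data points. This is the population variance; the sample variance divides by the number of data points minus one instead.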
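As a minimal sketch of this calculation, the Python snippet below computes the variance by hand and compares it with the standard library's `statistics` module. The data values are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not from the quiz

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: divide by the number of data points.
population_variance = sum(squared_diffs) / len(data)

# Sample variance: divide by n - 1 instead (Bessel's correction).
sample_variance = sum(squared_diffs) / (len(data) - 1)

print(population_variance, statistics.pvariance(data))
print(sample_variance, statistics.variance(data))
```

The two hand-computed values should agree with `statistics.pvariance` and `statistics.variance` respectively.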

**What does the p-value represent in hypothesis testing?**

- The power of the test
- The significance level of the test
- The probability of observing the data or more extreme data under the null hypothesis
- The confidence interval of the test

Ans. C

The p-value represents the probability of observing the data or more extreme data under the null hypothesis, indicating the strength of evidence against the null hypothesis.

**What is the purpose of a box plot in data visualization?**

- To show the distribution of data and identify outliers
- To display the frequency distribution of data
- To represent the relationship between two variables
- To compare means of different data sets

Ans. A

A box plot is used to visualize the distribution of data, including the identification of outliers and the spread of the data.

**What is the formula for calculating the correlation coefficient (Pearson’s r) between two variables X and Y?**

- (Sum of products of deviations from means) / (Product of standard deviations)
- (Sum of squared deviations from means) / (Product of means)
- (Sum of squared deviations from means) / (Product of standard deviations)
- (Sum of products of deviations from means) / (Sum of squared deviations from means)

Ans. A

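Pearson’s correlation coefficient (r) is the covariance of the two variables divided by the product of their standard deviations. Equivalently, it is the sum of products of deviations from the means divided by the square root of the product of the two sums of squared deviations.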
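A short Python sketch of Pearson's r, written from the deviation form of the formula. The function name `pearson_r` and the sample values are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_sq_x = sum((a - mean_x) ** 2 for a in x)
    sum_sq_y = sum((b - mean_y) ** 2 for b in y)
    # Normalizing by the root of the two sums of squares keeps r in [-1, 1].
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

x = [1, 2, 3, 4, 5]  # illustrative values
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))
```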

**What is the term for the measure of how spread out data values are in a data set?**

- Mean
- Median
- Range
- Mode

Ans. C

The range measures the spread or variability of data in a data set, calculated as the difference between the maximum and minimum values.

**Which statistical distribution is often used to model the number of events occurring within a fixed interval of time or space?**

- Normal distribution
- Poisson distribution
- Binomial distribution
- Exponential distribution

Ans. B

The Poisson distribution is used to model the number of events occurring within a fixed interval when events occur independently and at a constant average rate.
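To make this concrete, here is a hedged Python sketch of the Poisson probability mass function, P(k) = e^(−λ) λ^k / k!. The rate of 3 events per interval is an assumed example value.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average is `rate` per interval."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

rate = 3.0  # assumed average number of events per interval
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))
```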

**In statistics, what does the acronym “ANOVA” stand for?**

- Analysis of Variability
- Analysis of Varying Outcomes
- Analysis of Variance
- Association of Variables

Ans. C

ANOVA stands for “Analysis of Variance,” a statistical technique that compares group means by partitioning the total variance in a dataset into between-group and within-group components.

**What is the primary purpose of a confidence interval in statistics?**

- To determine the sample size required for an experiment
- To estimate a population parameter with a range of values
- To test the null hypothesis
- To compare means of two groups

Ans. B

The primary purpose of a confidence interval is to provide an estimate of a population parameter along with a range of values within which the parameter is likely to fall.
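As a rough illustration, a 95% confidence interval for a mean can be sketched in Python as mean ± critical value × standard error. This version assumes the normal approximation; for small samples a t-distribution critical value is usually preferred, and the data are made up.

```python
import math
import statistics
from statistics import NormalDist

sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]  # illustrative data

sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Two-sided 95% critical value under the normal approximation (about 1.96).
z_critical = NormalDist().inv_cdf(0.975)

lower = sample_mean - z_critical * standard_error
upper = sample_mean + z_critical * standard_error
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```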

**What does the term “skewness” refer to in statistics?**

- The measure of how spread out data values are
- The measure of symmetry in a data distribution
- The measure of central tendency
- The measure of variability

Ans. B

Skewness measures the degree of symmetry or asymmetry in a data distribution. Positive skew indicates a tail on the right, and negative skew indicates a tail on the left.
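The sign convention can be checked with a small Python sketch. It uses the moment-based (population) skewness, m3 / m2^1.5, which is one of several common definitions; the datasets are illustrative.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]   # long tail on the right
left_tailed = [-v for v in right_tailed]   # mirror image: long tail on the left
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed), skewness(left_tailed), skewness(symmetric))
```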

**In hypothesis testing, what is the significance level often set at for a two-tailed test at a 95% confidence level?**

- 0.05
- 0.01
- 0.10
- 0.50

Ans. A

For a two-tailed test at a 95% confidence level, the significance level is often set at 0.05, meaning there is a 5% chance of making a Type I error when the null hypothesis is true.

**Which measure of dispersion is less sensitive to extreme outliers in a data set?**

- Range
- Mean Absolute Deviation (MAD)
- Variance
- Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is less sensitive to extreme outliers because it is based on the middle 50% of the data and ignores extreme values.
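A quick Python sketch of this robustness, using `statistics.quantiles` with its default method. The helper name `iqr` and the data are assumptions for illustration.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

clean = [10, 12, 13, 15, 16, 18, 19, 21]
with_outlier = clean[:-1] + [210]  # swap the largest value for an extreme one

for label, values in (("clean", clean), ("with outlier", with_outlier)):
    print(label, "range:", max(values) - min(values), "IQR:", iqr(values))
```

The range jumps when the extreme value is introduced, while the IQR is unchanged because the quartiles depend only on the middle of the sorted data.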

**Which of the following statistical tests is used to determine if there is a significant relationship between two categorical variables?**

- T-test
- Chi-squared test
- ANOVA
- Regression analysis

Ans. B

The Chi-squared test is used to determine if there is a significant relationship between two categorical variables by comparing observed and expected frequencies.
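The comparison of observed and expected frequencies can be sketched in Python for a 2×2 contingency table. The counts are invented, and the snippet stops at the test statistic; a p-value would need a chi-squared distribution function, which the standard library does not provide.

```python
observed = [
    [30, 10],  # illustrative counts: rows are one variable's categories
    [20, 40],  # columns are the other variable's categories
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi_squared, degrees_of_freedom)
```

With 1 degree of freedom, a statistic above roughly 3.84 is significant at the 0.05 level.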

**What is the formula for calculating the coefficient of variation (CV) in statistics?**

- (Standard Deviation / Mean) × 100
- (Range / Median) × 100
- (Variance / Mode) × 100
- (Mean Absolute Deviation / Range) × 100

Ans. A

The coefficient of variation (CV) is calculated by dividing the standard deviation by the mean and multiplying the result by 100.
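A minimal Python sketch of the CV, assuming the sample standard deviation is the one intended. The function name and values are illustrative.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

small_scale = [10, 20, 30]
large_scale = [100, 200, 300]  # same data multiplied by 10

print(coefficient_of_variation(small_scale))
print(coefficient_of_variation(large_scale))
```

Because the CV is a ratio, rescaling the data leaves it unchanged, which is why it is used to compare variability across datasets with different units or magnitudes.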

**What does the term “p-value” represent in hypothesis testing?**

- The probability of making a Type II error
- The probability of observing the null hypothesis being true
- The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
- The power of the test

Ans. C

The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It helps assess the strength of evidence against the null hypothesis; it is not the probability that the null hypothesis is true.

**What is the primary purpose of a histogram in data visualization?**

- To compare two data sets
- To show the distribution of categorical data
- To display the relationship between two variables
- To represent the frequency distribution of a continuous variable

Ans. D

A histogram is used to represent the frequency distribution of a continuous variable, showing how data is distributed across different values or intervals.

**In statistics, what does the term “outlier” refer to?**

- The mean of a dataset
- Data points that are significantly different from the others
- The median of a dataset
- The range of a dataset

Ans. B

Outliers are data points that are significantly different from the majority of the data in a dataset and may skew statistical analysis.

**Which type of probability distribution is commonly used to model the number of successes in a fixed number of Bernoulli trials?**

- Normal distribution
- Poisson distribution
- Binomial distribution
- Exponential distribution

Ans. C

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials (experiments with two possible outcomes).
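For illustration, the binomial probability mass function, P(k) = C(n, k) p^k (1 − p)^(n − k), can be sketched in Python with `math.comb`. The parameter values are assumed examples.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.5  # e.g. four fair coin flips
for k in range(n + 1):
    print(k, binomial_pmf(k, n, p))
```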

**What is the term for a hypothesis that assumes no effect or relationship in a statistical test?**

- Alternative hypothesis
- Null hypothesis
- Two-tailed hypothesis
- Significance hypothesis

Ans. B

The null hypothesis (H0) assumes no effect or relationship in a statistical test and is used for hypothesis testing.

**What is the formula for calculating the standard error of the mean (SEM) in statistics?**

- Standard Deviation / √(Sample Size)
- Sample Size / Standard Deviation
- Range / Mean
- Variance / Median

Ans. A

The standard error of the mean (SEM) is calculated by dividing the standard deviation by the square root of the sample size.
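A short Python sketch of the SEM, dividing the sample standard deviation by the square root of the sample size. The function name and data are illustrative.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
sem = standard_error_of_mean(sample)
print(round(sem, 4))
```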

**Which statistical test is appropriate for comparing the means of three or more groups in a study?**

- Student’s t-test
- Mann-Whitney U test
- Analysis of Variance (ANOVA)
- Chi-squared test

Ans. C

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.
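As a hedged sketch, the one-way ANOVA F statistic can be computed by hand in Python as the between-group mean square divided by the within-group mean square. The groups are made-up data, and the snippet stops short of a p-value, which would need an F-distribution function.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # illustrative data
f_statistic = one_way_f(groups)
print(f_statistic)
```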

**What is the primary purpose of a scatter plot in data visualization?**

- To compare means of two groups
- To represent the frequency distribution of data
- To display the relationship between two continuous variables
- To show the distribution of categorical data

Ans. C

A scatter plot is used to display the relationship between two continuous variables, helping to identify patterns and correlations between them.

**What is the formula for calculating the z-score of a data point in a normal distribution?**

- (Data Value – Mean) / Standard Deviation
- (Data Value – Median) / Range
- (Data Value – Variance) / Mean
- (Data Value – Mode) / Sample Size

Ans. A

The z-score of a data point in a normal distribution is calculated by subtracting the mean from the data value and dividing by the standard deviation.
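A small Python sketch of the z-score, with `statistics.NormalDist` used to turn it into a cumulative probability. The mean of 100 and standard deviation of 15 are assumed example values.

```python
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies from the mean."""
    return (value - mean) / std_dev

z = z_score(130, mean=100, std_dev=15)

# Proportion of a normal distribution lying below this z-score.
proportion_below = NormalDist().cdf(z)
print(z, round(proportion_below, 4))
```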

**In a hypothesis test, what does a p-value of 0.02 indicate?**

- The null hypothesis is likely true
- There is strong evidence against the null hypothesis
- The significance level is 0.02
- The test is inconclusive

Ans. B

A p-value of 0.02 is below the conventional 0.05 significance level, so it is treated as strong evidence against the null hypothesis, which would be rejected at that level.

**What does the term “sampling error” refer to in statistics?**

- The error made by the statistician during data collection
- The difference between the sample statistic and the population parameter
- The error introduced during data entry and analysis
- The variation within the sample data

Ans. B

Sampling error is the difference between a sample statistic and the population parameter it estimates and is due to random sampling.

**Which of the following is a non-parametric statistical test used for comparing two related groups?**

- Student’s t-test
- Chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)

Ans. C

The Wilcoxon signed-rank test is a non-parametric test used for comparing two related groups or paired data when assumptions of parametric tests are not met.

**What is the term for the probability distribution that describes the number of successful Bernoulli trials before a specified number of failures is reached?**

- Normal distribution
- Poisson distribution
- Exponential distribution
- Negative Binomial distribution

Ans. D

The Negative Binomial distribution describes the number of successful Bernoulli trials before a specified number of failures occurs.

**What does the term “confidence level” represent in statistics?**

- The level of significance in hypothesis testing
- The likelihood that a sample is representative of the population
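- The long-run proportion of intervals, built by the same procedure, that contain the true parameter
- The proportion of a population captured by a sample

Ans. C

The confidence level is the long-run proportion of confidence intervals, constructed by the same procedure on repeated samples, that would contain the true parameter. It is typically expressed as a percentage, such as 95%. The range of values itself is the confidence interval, not the confidence level.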

**What is the formula for calculating the odds of an event from its probability?**

- (Probability of the event) / (1 – Probability of the event)
- (Probability of the event) × (1 – Probability of the event)
- (Odds of the event) / (1 – Odds of the event)
- (Odds of the event) × (1 – Odds of the event)

Ans. A

The odds of an event are calculated from its probability as (Probability of the event) / (1 – Probability of the event). For example, a probability of 0.75 corresponds to odds of 3 to 1.
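The conversion between probability and odds can be sketched in Python in both directions. The function names are illustrative assumptions.

```python
def probability_to_odds(p: float) -> float:
    """Odds in favor of an event: p / (1 - p)."""
    return p / (1 - p)

def odds_to_probability(odds: float) -> float:
    """Inverse conversion: odds / (1 + odds)."""
    return odds / (1 + odds)

odds = probability_to_odds(0.75)
print(odds, odds_to_probability(odds))
```

Applying one conversion and then the other returns the starting value, which is a handy check that the two formulas have not been swapped.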

**What does the term “correlation” measure in statistics?**

- The strength of a linear relationship between two variables
- The difference between the mean and median of a dataset
- The variability within a sample
- The spread of data values in a dataset

Ans. A

Correlation measures the strength and direction of a linear relationship between two variables, indicating how one variable changes when the other changes.

**What is the term for the measure of how much data values tend to deviate from the mean in a dataset?**

- Variance
- Standard Deviation
- Range
- Mode

Ans. B

The standard deviation is a measure of how much data values tend to deviate from the mean in a dataset, indicating the spread or dispersion of data.

**In statistics, what is the term for a measure of the spread of data values around the median?**

- Range
- Mean Absolute Deviation (MAD)
- Variance
- Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is a measure of the spread of data values around the median and is less affected by extreme outliers.

**What does the term “standard error” represent in statistics?**

- The standard deviation of a sample
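- The standard deviation of the sampling distribution of a statistic
- The mean of a population
- The range of values in a data set

Ans. B

The standard error is the standard deviation of the sampling distribution of a statistic, indicating the precision of an estimate based on sample data. The margin of error in a confidence interval is derived from it by multiplying by a critical value.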

**Which of the following is a measure of association used to assess the strength and direction of the relationship between two ordinal variables?**

- Pearson’s correlation coefficient (r)
- Spearman’s rank correlation (rho)
- Chi-squared test
- ANOVA

Ans. B

Spearman’s rank correlation (rho) is a measure of association used to assess the relationship between two ordinal variables by ranking their values.
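A minimal Python sketch of Spearman's rho using the rank-difference formula, 1 − 6Σd² / (n(n² − 1)). This shortcut assumes there are no tied values; the helper names and data are illustrative.

```python
def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation via squared rank differences (no ties)."""
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # increases with x, but not linearly
print(spearman_rho(x, y))
```

Because only the ranks matter, any strictly increasing relationship gives a rho of 1, even when the relationship is not a straight line.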

**In hypothesis testing, what does a Type I error refer to?**

- Incorrectly rejecting a true null hypothesis
- Incorrectly accepting a false null hypothesis
- Correctly rejecting a false null hypothesis
- Correctly accepting a true null hypothesis

Ans. A

A Type I error occurs when a true null hypothesis is incorrectly rejected, leading to a false positive result in hypothesis testing.

**What is the term for the method used to assign a value to missing data points based on other available data in a dataset?**

- Outlier detection
- Data transformation
- Imputation
- Sampling

Ans. C

Imputation is the method used to assign a value to missing data points based on other available data in a dataset, allowing for analysis with complete data.

**What is the term for the range of values that separates the central 50% of data from the extreme values in a dataset?**

- Interquartile Range (IQR)
- Variance
- Standard Error
- Coefficient of Variation

Ans. A

The Interquartile Range (IQR) is the distance between the first and third quartiles. It spans the central 50% of the data, separating it from the more extreme values, and is a measure of data spread.

**Which statistical test is appropriate for determining if there is a significant difference in means between two independent groups?**

- Chi-squared test
- Two-sample t-test
- Mann-Whitney U test
- ANOVA

Ans. B

The two-sample t-test is used to determine if there is a significant difference in means between two independent groups or samples.

**What does the term “skewness” refer to in statistics?**

- The measure of symmetry in a data distribution
- The measure of central tendency
- The measure of variability within a sample
- The measure of the spread of data values

Ans. A

Skewness measures the symmetry or asymmetry in a data distribution. Positive skew indicates a right-skewed distribution, while negative skew indicates a left-skewed distribution.

**Which statistical distribution is often used to model the time between events occurring at a constant rate?**

- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution

Ans. C

The exponential distribution is commonly used to model the time between events occurring at a constant rate or in a Poisson process.
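As an illustration, the exponential cumulative distribution function, P(T ≤ t) = 1 − e^(−λt), can be sketched in Python. The rate of 2 events per unit time is an assumed example value.

```python
import math

def exponential_cdf(t: float, rate: float) -> float:
    """Probability that the waiting time until the next event is at most t."""
    return 1 - math.exp(-rate * t)

rate = 2.0                  # assumed: 2 events per unit time on average
mean_wait = 1 / rate        # mean time between events

print(exponential_cdf(mean_wait, rate))
```

The probability of waiting no longer than the mean waiting time is 1 − e^(−1), about 63%, regardless of the rate.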

**What is the term for the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test?**

- Confidence interval
- Type I error
- P-value
- Margin of error

Ans. C

The p-value is the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test. A smaller p-value indicates stronger evidence against the null hypothesis.

**What is the term for the measure of the central tendency that is most affected by outliers in a dataset?**

- Mean
- Median
- Mode
- Range

Ans. A

The mean is the measure of central tendency most affected by outliers, as it takes all values into account when calculating the average.
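A quick Python sketch of this effect, using made-up values with one extreme observation added.

```python
import statistics

values = [30, 32, 35, 38, 40]       # illustrative data
with_outlier = values + [400]       # add a single extreme value

print("mean:", statistics.mean(values), "->", statistics.mean(with_outlier))
print("median:", statistics.median(values), "->", statistics.median(with_outlier))
```

The mean shifts by a large amount, while the median moves only slightly.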

**Which statistical test is used to determine if there is a significant relationship between two continuous variables?**

- Chi-squared test
- Pearson’s correlation coefficient
- Mann-Whitney U test
- T-test

Ans. B

Pearson’s correlation coefficient is used to assess the strength and direction of the linear relationship between two continuous variables.

**What is the formula for calculating the probability of an event using odds in statistics?**

- (Odds of the event) / (1 + Odds of the event)
- (Probability of the event) × (Probability against the event)
- (Odds of the event) / (Probability of the event)
- (Probability of the event) / (1 – Probability of the event)

Ans. A

The probability of an event can be calculated from its odds using the formula (Odds of the event) / (1 + Odds of the event). For example, odds of 3 to 1 correspond to a probability of 3 / (1 + 3) = 0.75.

**What is the term for a measure of how data values tend to cluster around a central point in a dataset?**

- Standard Deviation
- Range
- Variance
- Dispersion

Ans. D

Dispersion is the general term for how tightly or loosely data values cluster around a central point, reflecting the degree of spread or concentration in the data. Standard deviation, range, and variance are specific measures of dispersion.

**What does the term “confidence interval” represent in statistics?**

- A range of values within which a parameter is estimated to fall
- The probability of making a Type I error
- The strength of a linear relationship between two variables
- The margin of error in a hypothesis test

Ans. A

A confidence interval represents a range of values within which a parameter is estimated to fall with a specified level of confidence.

**Which statistical distribution is used to model the number of successes in a fixed number of independent Bernoulli trials?**

- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution

Ans. D

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, which are experiments with two possible outcomes.

**What does the term “power” represent in hypothesis testing?**

- The probability of making a Type I error
- The probability of observing the null hypothesis being true
- The probability of correctly rejecting the null hypothesis
- The probability of obtaining extreme data by chance

Ans. C

Power represents the probability of correctly rejecting the null hypothesis when it is false, indicating the test’s ability to detect a true effect.

**What is the term for the graphical representation of data that displays the distribution, central tendency, and spread of a dataset?**

- Scatter plot
- Box plot
- Bar chart
- Frequency table

Ans. B

A box plot (box-and-whisker plot) is a graphical representation that displays the distribution, central tendency, and spread of a dataset.

**What is the term for a type of data that can take on only specific values, typically whole numbers, and is often used to represent counts or categories?**

- Continuous data
- Nominal data
- Ordinal data
- Discrete data

Ans. D

Discrete data is a type of data that can take on only specific, often whole number values, and is used to represent counts or categories.

**What is the term for a measure of the degree of uncertainty or variability associated with a statistic?**

- Confidence interval
- Sampling error
- Type I error
- Standard error

Ans. D

Standard error is a measure of the degree of uncertainty or variability associated with a statistic and indicates how much the sample statistic might vary from the population parameter.

**Which statistical test is used to compare the means of three or more groups in a study?**

- Student’s t-test
- Chi-squared test
- Mann-Whitney U test
- Analysis of Variance (ANOVA)

Ans. D

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.

**What does the term “correlation coefficient” measure in statistics?**

- The range of data values
- The strength and direction of a relationship between two variables
- The probability of making a Type I error
- The margin of error in a confidence interval

Ans. B

A correlation coefficient measures the strength and direction of a relationship between two variables, indicating how they are related to each other.

**What is the term for a measure of how data values are distributed around a central point in a dataset?**

- Standard deviation
- Range
- Interquartile range
- Mode

Ans. A

Standard deviation is a measure of how data values are distributed around a central point, representing the degree of spread or dispersion in the data.

**What is the term for a measure of how much data values tend to deviate from the mean in a dataset?**

- Skewness
- Variance
- Standard error
- Mean absolute deviation (MAD)

Ans. B

Variance is the average squared deviation of data values from the mean in a dataset, indicating the degree of variability.

**In hypothesis testing, what does the term “alpha level” represent?**

- The probability of making a Type I error
- The significance level or threshold for rejecting the null hypothesis
- The p-value
- The probability of correctly accepting the null hypothesis

Ans. B

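The alpha level is the significance level: the threshold below which a p-value leads to rejecting the null hypothesis. It also equals the probability of a Type I error that the test is designed to tolerate, so option A describes a consequence of the same quantity.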

**Which statistical distribution is used to model the time between events occurring at a constant rate?**

- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution

Ans. C

The exponential distribution is used to model the time between events occurring at a constant rate or in a Poisson process.

**What is the term for a method used to reduce the dimensionality of data while retaining as much information as possible?**

- Outlier detection
- Feature selection
- Imputation
- Hypothesis testing

Ans. B

Feature selection is a method used to reduce the dimensionality of data while retaining as much relevant information as possible for analysis.

**What does the term “degrees of freedom” refer to in statistics?**

- The number of data points in a dataset
- The sample size
- The number of groups in an ANOVA test
- The number of values that are free to vary in a statistical calculation

Ans. D

Degrees of freedom refer to the number of values that are free to vary in a statistical calculation and play a role in various statistical tests.

**What does the term “null hypothesis” represent in hypothesis testing?**

- The hypothesis that is proven to be true
- The alternative hypothesis
- The hypothesis to be rejected if evidence suggests otherwise
- The initial assumption to be tested

Ans. D

The null hypothesis is the initial assumption to be tested in hypothesis testing, typically representing no effect or no difference.

**Which statistical test is used to determine if there is a significant difference between the observed and expected frequencies in a contingency table?**

- Student’s t-test
- Pearson’s chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)

Ans. B

Pearson’s chi-squared test is used to assess the significant difference between observed and expected frequencies in a contingency table.

**What is the term for a type of sampling method in which the population is divided into non-overlapping subgroups or strata, and a random sample is then taken from each stratum?**

- Simple random sampling
- Cluster sampling
- Stratified sampling
- Convenience sampling

Ans. C

Stratified sampling is a method in which the population is divided into non-overlapping strata, and a random sample is taken from each stratum, ensuring representation from all groups.
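The procedure can be sketched in Python with the standard `random` module. The function `stratified_sample`, its parameters, and the toy population are hypothetical names introduced for illustration, and this version draws the same number of items from every stratum rather than sampling proportionally.

```python
import random

def stratified_sample(population, key, per_stratum, seed=0):
    """Group items into strata by `key`, then draw a random sample from each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    # Sample without replacement inside every stratum.
    return {label: rng.sample(members, per_stratum) for label, members in strata.items()}

# Toy population: (stratum label, member id) pairs.
population = [("A", i) for i in range(10)] + [("B", i) for i in range(5)]
sample = stratified_sample(population, key=lambda item: item[0], per_stratum=3)
print(sample)
```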

**What is the term for a measure of the central tendency that is often used with nominal data and represents the most frequently occurring value?**

- Mean
- Median
- Mode
- Range

Ans. C

The mode is a measure of central tendency used with nominal data and represents the most frequently occurring value in a dataset.

**What does the term “confidence level” represent in a confidence interval?**

- The level of significance in hypothesis testing
- The likelihood that a sample is representative of the population
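- The long-run proportion of intervals, built by the same procedure, that contain the true parameter
- The proportion of a population captured by a sample

Ans. C

The confidence level of a confidence interval is the long-run proportion of such intervals, constructed by the same procedure on repeated samples, that would contain the true parameter. The range of values itself is the interval; the confidence level, such as 95%, describes how reliable the procedure is.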

**What is the term for the measure of how data values are spread out in a dataset?**

- Central tendency
- Variance
- Standard error
- Mode

Ans. B

Variance is the measure of how data values are spread out or dispersed in a dataset, indicating the degree of variability.

**What is the term for a statistical measure that describes the direction and strength of a relationship between two variables?**

- Probability
- Causation
- Correlation
- Variance

Ans. C

Correlation is a statistical measure that describes the direction and strength of a relationship between two variables, indicating how they are related.

**What is the term for a statistical measure that describes the symmetry of a probability distribution?**

- Skewness
- Kurtosis
- Central tendency
- Spread

Ans. A

Skewness is a statistical measure that describes the symmetry or asymmetry of a probability distribution, indicating whether it is skewed to the left or right.

**What does the term “sampling frame” refer to in sampling methods?**

- The process of selecting a sample from a population
- The list of all elements in the population
- The margin of error in a confidence interval
- The probability of making a Type I error

Ans. B

A sampling frame is the list of all elements in the population from which a sample is drawn, serving as the basis for selecting a sample.

**What is the term for a measure of how much data values are dispersed around the mean in a dataset?**

- Central tendency
- Standard error
- Variance
- Mode

Ans. C

Variance is a measure of how much data values are dispersed around the mean in a dataset, indicating the degree of spread.

**Which statistical test is used to determine if there is a significant difference between two paired groups or conditions when the differences are not normally distributed?**

- Student’s t-test
- Chi-squared test
- Analysis of Variance (ANOVA)
- Wilcoxon signed-rank test

Ans. D

The Wilcoxon signed-rank test is used to determine if there is a significant difference between two paired groups or conditions when the paired differences are not normally distributed. It works on the ranks of the differences rather than on the means; when the differences are approximately normal, a paired t-test is the usual choice.

**What is the term for a statistical measure that describes the degree to which data values are concentrated around the mean?**

- Skewness
- Variance
- Kurtosis
- Dispersion

Ans. C

Kurtosis is a statistical measure of the shape of a distribution, describing how heavily the data are concentrated near the mean and in the tails relative to a normal distribution.

**What is the term for a measure of the spread of data values around the median in a dataset?**

- Variance
- Interquartile range
- Mode
- Range

Ans. B

The interquartile range is a measure of the spread of data values around the median in a dataset, representing the central 50% of the data.

**What is the term for a statistical measure that describes the strength and direction of a monotonic, not necessarily linear, relationship between two variables?**

- Pearson’s correlation coefficient
- Spearman’s rank correlation
- Chi-squared test
- ANOVA

Ans. B

Spearman’s rank correlation is a statistical measure that describes the strength and direction of a monotonic relationship between two variables by ranking their values. It can detect relationships that consistently increase or decrease without being linear, but not arbitrary non-linear patterns.

**What does the term “p-value” represent in hypothesis testing?**

- The probability of making a Type I error
- The likelihood of observing the data or more extreme data, assuming the null hypothesis is true
- The strength of a linear relationship between two variables
- The margin of error in a confidence interval

Ans. B

The p-value represents the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in hypothesis testing. A smaller p-value indicates stronger evidence against the null hypothesis.

**What is the term for a measure of the reliability of a statistical test in detecting a true effect?**

- Type I error
- Type II error
- Power
- Significance level

Ans. C

Power is a measure of the reliability of a statistical test in detecting a true effect, indicating the test’s ability to avoid a Type II error (false negative).

**What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?**

- R-squared (R^2)
- Coefficient of variation
- Pearson’s correlation coefficient
- Standard error of the estimate

Ans. A

R-squared (R^2), also called the coefficient of determination, is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model.
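As a hedged sketch, R-squared for a simple linear regression can be computed in Python as 1 − SS_residual / SS_total. The least-squares fit is done by hand and the data are illustrative.

```python
x = [1, 2, 3, 4, 5]  # illustrative predictor values
y = [2, 4, 5, 4, 5]  # illustrative response values

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Ordinary least-squares slope and intercept for one predictor.
slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * a for a in x]

ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))  # unexplained variation
ss_total = sum((b - mean_y) ** 2 for b in y)                   # total variation

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 4))
```

With a single predictor, this value equals the square of Pearson's correlation coefficient between `x` and `y`.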

**What is the term for the measure of the center of a probability distribution in statistics?**

- Dispersion
- Range
- Skewness
- Central tendency

Ans. D

Central tendency is the measure of the center of a probability distribution, indicating the typical or central value of a dataset.

**Which statistical test is used to compare the means of two independent groups or conditions?**

- Analysis of Variance (ANOVA)
- Chi-squared test
- T-test
- Mann-Whitney U test

Ans. C

The T-test is used to compare the means of two independent groups or conditions in a statistical analysis.

**What is the term for a measure of the extent to which two variables change together in a linear relationship?**

- Regression
- Correlation
- Variance
- Kurtosis

Ans. B

Correlation is a measure of the extent to which two variables change together in a linear relationship, indicating the strength and direction of the relationship.

**What is the term for a measure of how data values tend to cluster around the median in a dataset?**

- Standard error
- Interquartile range
- Mode
- Skewness

Ans. B

The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data.

**What is the term for a type of sampling method in which elements are randomly selected from a population, and every element has an equal chance of being selected?**

- Simple random sampling
- Cluster sampling
- Stratified sampling
- Convenience sampling

Ans. A

Simple random sampling is a method in which elements are randomly selected from a population, and every element has an equal chance of being selected, ensuring unbiased representation.

**What is the term for a statistical measure that describes how heavy the tails of a probability distribution are?**

- Skewness
- Kurtosis
- Central tendency
- Spread

Ans. B

Kurtosis is a statistical measure that describes the shape of a probability distribution, indicating whether its tails are heavier or lighter than those of a normal distribution.

**What is the term for the proportion of the total area under a normal distribution curve between two specific values?**

- Confidence level
- Z-score
- Area under the curve
- Percentile

Ans. C

The term “Area under the curve” represents the proportion of the total area under a normal distribution curve between two specific values, indicating the probability of observing data within that range.
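For a normal distribution, the area between two values is the difference of the cumulative distribution function (CDF) at those values. A short sketch using `statistics.NormalDist`, assuming a standard normal distribution as the example:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area between -1 and +1 standard deviations = CDF(1) - CDF(-1).
area_within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
print(f"P(-1 <= Z <= 1) is about {area_within_one_sd:.4f}")
```

The result is roughly 0.68, the first part of the familiar 68-95-99.7 rule.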

**What does the term “Type I error” represent in hypothesis testing?**

- Rejecting the null hypothesis when it is true
- Failing to reject the null hypothesis when it is false
- The power of the test
- The probability of making a Type II error

Ans. A

Type I error represents the error of rejecting the null hypothesis when it is true, leading to a false positive conclusion in hypothesis testing.

**What is the term for the measure of the strength of a relationship between two variables that varies from -1 to 1, with 0 indicating no linear relationship?**

- Standard deviation
- Interquartile range
- Pearson’s correlation coefficient
- Mode

Ans. C

Pearson’s correlation coefficient is a measure of the strength of a linear relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.
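The coefficient can be computed directly from deviations around the means. The sketch below is one way to write it in plain Python; `pearson_r` and the sample data are illustrative. The last case shows why 0 means "no linear relationship" rather than "no relationship".

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: co-deviation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly increasing line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # perfectly decreasing line
# y = x^2 on symmetric x: strongly related, but not linearly.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```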

**What does the term “confidence interval” represent in statistics?**

- The range of data values in a dataset
- The likelihood of making a Type I error in hypothesis testing
- The range of values within which a parameter is estimated to fall with a specified level of confidence
- The margin of error in a confidence interval

Ans. C

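A confidence interval is the range of values within which a parameter is estimated to fall with a specified level of confidence, such as 95%. The confidence level describes the procedure: across repeated samples, about that share of the intervals would contain the true parameter.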
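A minimal sketch of a confidence interval for a mean, assuming the normal approximation (a z critical value). For small samples a t critical value would usually be preferred; the function name and data are made up for illustration.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    # Two-sided critical value, about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = statistics.fmean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - margin, centre + margin


sample = [10, 12, 11, 13, 14, 12, 11, 13]  # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, since a larger critical value is needed.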

**Which statistical test is used to determine if there is a significant association between two categorical variables?**

- Student’s t-test
- Chi-squared test
- Wilcoxon signed-rank test
- Analysis of Variance (ANOVA)

Ans. B

The Chi-squared test is used to determine if there is a significant association between two categorical variables, testing the independence of variables in a contingency table.
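The test statistic compares observed counts with the counts expected under independence. The sketch below computes Pearson's chi-squared statistic for a contingency table given as a list of rows; the function name and the 2x2 counts are invented, and converting the statistic to a p-value (via the chi-squared distribution) is left out.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = group, columns = outcome.
observed_counts = [[30, 10], [20, 40]]
print(f"chi-squared = {chi_squared_statistic(observed_counts):.3f}")
```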

**What is the term for a measure of the average distance of data values from the mean in a dataset?**

- Interquartile range
- Mode
- Standard deviation
- Range

Ans. C

Standard deviation measures the typical distance of data values from the mean in a dataset. It is the square root of the variance (the mean squared deviation) and indicates the degree of dispersion or variability.
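Strictly speaking, the standard deviation is the square root of the mean squared deviation, which is not the same number as the mean absolute deviation. A small worked sketch with invented data, using the population (divide by n) formula:

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = sum(data) / len(data)                                # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)   # 4.0
std_dev = math.sqrt(variance)                               # 2.0

# For comparison: the literal "average distance from the mean".
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)  # 1.5

print(std_dev, mean_abs_dev)
```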

**What is the term for the measure of how data values tend to cluster around the median in a dataset?**

- Range
- Skewness
- Interquartile range
- Mode

Ans. C

The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data distribution.

**What is the term for a statistical measure that quantifies the strength and direction of a relationship between two variables in a linear model?**

- Chi-squared statistic
- P-value
- Pearson’s correlation coefficient
- Standard error

Ans. C

Pearson’s correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two variables, providing a value between -1 and 1.

**What does the term “p-value” represent in hypothesis testing?**

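- The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true
- The margin of error in a confidence interval
- The significance level or threshold for rejecting the null hypothesis
- The strength of a linear relationship between two variables

Ans. A

The p-value is the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is not the probability of making a Type I error; that probability is fixed in advance by the significance level (α), against which the p-value is compared.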
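As a sketch of how a p-value is obtained in practice, the function below computes the two-sided p-value for a z statistic, that is, the probability under the null hypothesis of a standard normal value at least as extreme as the one observed. The function name is an assumption made for this example.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic under H0."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


p = two_sided_p_value(1.96)
print(f"p = {p:.4f}")  # close to 0.05
```

The p-value is then compared with the chosen significance level to decide whether to reject the null hypothesis.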

**What is the term for a statistical measure that describes the proportion of true positive results out of all actual positive cases in a classification problem?**

- Sensitivity
- Specificity
- Precision
- Accuracy

Ans. A

Sensitivity is a measure that describes the proportion of true positive results out of all actual positive cases in a classification problem, indicating the model’s ability to detect positives correctly.

**What is the term for a method used to impute missing values in a dataset by replacing them with estimated values based on other data points?**

- Outlier detection
- Feature selection
- Hypothesis testing
- Imputation

Ans. D

Imputation is a method used to replace missing values in a dataset with estimated values based on other data points, ensuring completeness for analysis.
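One of the simplest forms is mean imputation, sketched below. Missing entries are represented by `None`, and the helper name `impute_with_mean` and the data are illustrative assumptions; other strategies (median, regression-based, multiple imputation) follow the same idea of estimating from the observed values.

```python
import statistics


def impute_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill_value = statistics.fmean(observed)
    return [fill_value if v is None else v for v in values]


ages = [25, None, 31, 40, None]  # None marks a missing value
print(impute_with_mean(ages))    # [25, 32.0, 31, 40, 32.0]
```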

**What is the term for a measure of the proportion of the total area under a probability distribution curve to the left of a specific value?**

- Z-score
- Confidence level
- Percentile
- P-value

Ans. C

A percentile (more precisely, a percentile rank) gives the proportion of the total area under a probability distribution curve to the left of a specific value, expressed as a percentage. It indicates the position of that value within the distribution.
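For a known distribution, the area to the left of a value is its cumulative distribution function, and the inverse CDF maps a percentile back to a value. The sketch assumes a hypothetical test score that is normally distributed with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

# Assumed model for illustration: scores ~ Normal(mean=100, sd=15).
scores = NormalDist(mu=100.0, sigma=15.0)

# Percentile rank of a score of 115: area to the left, as a percentage.
percentile_rank = 100.0 * scores.cdf(115.0)

# Going the other way: the score that sits at the 90th percentile.
score_at_p90 = scores.inv_cdf(0.90)

print(f"115 is at about the {percentile_rank:.1f}th percentile")
print(f"90th percentile score is about {score_at_p90:.1f}")
```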

**What is the term for the probability of correctly rejecting the null hypothesis in hypothesis testing?**

- Type I error
- Type II error
- Power
- Confidence level

Ans. C

Power is the probability of correctly rejecting the null hypothesis in hypothesis testing, indicating the test’s ability to detect a true effect.
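Power depends on the effect size, the noise level, the sample size, and the significance level. The sketch below computes the power of a one-sided z-test with known standard deviation; the function name and parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 vs H1: mu = mu0 + effect."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1.0 - alpha)  # rejection threshold
    shift = effect * math.sqrt(n) / sigma         # mean of the z statistic under H1
    return 1.0 - std_normal.cdf(critical_z - shift)


print(f"n=25:  power = {z_test_power(0.5, 1.0, 25):.3f}")
print(f"n=100: power = {z_test_power(0.5, 1.0, 100):.3f}")
```

With no true effect the "power" collapses to the significance level, and with a fixed effect it grows as the sample size increases.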

**What is the term for a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values?**

- Variance
- Range
- Standard error
- Interquartile range

Ans. B

The range is a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values.

**Which statistical distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence?**

- Normal distribution
- Poisson distribution
- Exponential distribution
- Binomial distribution

Ans. B

The Poisson distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence, such as the number of emails received per hour.
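The probability of exactly k events is P(X = k) = e^(-λ) λ^k / k!, where λ is the average rate. A minimal sketch, assuming an average of 4 emails per hour as the example rate:

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


rate = 4.0  # hypothetical average: 4 emails per hour
for k in range(0, 9, 2):
    print(f"P(X = {k}) = {poisson_pmf(k, rate):.4f}")
```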

**What is the term for the measure of the average value of a set of data points?**

- Median
- Mode
- Mean
- Interquartile range

Ans. C

The mean is the measure of the average value of a set of data points, calculated by summing all values and dividing by the number of data points.

**What does the term “significance level” represent in hypothesis testing?**

- The probability of making a Type I error
- The likelihood that a sample is representative of the population
- The range of values within which a parameter is estimated to fall
- The proportion of a population captured by a sample

Ans. A

The significance level, denoted alpha (α), is the probability of making a Type I error, that is, of rejecting the null hypothesis when it is actually true. It is chosen before the test is run, commonly as 0.05.
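A small simulation can make this concrete: when the null hypothesis is true, a test run at α = 0.05 should reject in roughly 5% of repeated experiments. The sketch assumes a two-sided z-test with known standard deviation 1; the function name, sample size, trial count, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=20, trials=2000, seed=0):
    """Share of simulated experiments that reject a true null hypothesis."""
    rng = random.Random(seed)
    critical_z = NormalDist().inv_cdf(1.0 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Data are generated with mean 0, so H0: mu = 0 is true.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * math.sqrt(n)  # z statistic with known sigma = 1
        if abs(z) > critical_z:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Observed false-positive rate: {rate:.3f}")
```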

**What is the term for a method used to estimate population parameters based on sample data, taking into account sampling variability?**

- Hypothesis testing
- Confidence interval
- Regression analysis
- Bootstrapping

Ans. B

A confidence interval is a method used to estimate population parameters based on sample data, taking into account sampling variability and providing a range of possible values for the parameter.

**What is the term for a statistical measure that describes the proportion of true negative results out of all actual negative cases in a classification problem?**

- Sensitivity
- Specificity
- Precision
- Accuracy

Ans. B

Specificity is a measure that describes the proportion of true negative results out of all actual negative cases in a classification problem, indicating the model’s ability to correctly identify negatives.
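Sensitivity and specificity both come from the confusion-matrix counts: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below assumes binary labels where 1 is the positive class; the function name and label lists are made up.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for binary labels, 1 = positive."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```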

**What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?**

- R-squared (R^2)
- Coefficient of determination
- Pearson’s correlation coefficient
- Standard error of the estimate

Ans. A

R-squared (R^2), also known as the coefficient of determination, is the proportion of the total variation in the dependent variable that is explained by the independent variables in a regression model. Because option B is another name for the same quantity, it is equally correct.
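R-squared can be computed as 1 - SS_res / SS_tot, where SS_res is the residual sum of squares and SS_tot is the total sum of squares around the mean. The sketch below fits a simple least-squares line with one predictor; the function name and data points are illustrative.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # Unexplained variation: squared residuals around the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the mean of y.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(f"R^2 = {r_squared(xs, ys):.2f}")  # 0.60 for this data
```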