[100+] Statistics MCQs & Answers {PDF} For NTS, NET, SSB, SSC, IBPS, RRB, Govt. Exams ~ BrainyNote

Statistics MCQs & Answers

Which of the following is a measure of central tendency?

Standard Deviation
Range
Median
Variance

Ans. C

The median is a measure of central tendency that represents the middle value of a dataset when arranged in ascending order.

What is the formula for calculating the variance of a data set?

Sum of data values / Number of data points
Standard Deviation / Mean
(Sum of squared differences from the mean) / Number of data points
Median / Range

Ans. C

The variance is the sum of the squared differences of each data point from the mean, divided by the number of data points. This is the population variance; the sample variance divides by one less than the number of data points (n – 1).
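As a minimal sketch of this formula in Python (the function names and data are illustrative, not taken from any library), the first function divides by the number of data points, and the second shows the sample variant that divides by n - 1:

```python
from statistics import mean

def population_variance(values):
    """Sum of squared differences from the mean, divided by n."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Same sum of squares, divided by n - 1 instead of n."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5
print(population_variance(data))  # 4.0
print(sample_variance(data))      # slightly larger, since it divides by 7 instead of 8
```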

What does the p-value represent in hypothesis testing?

The power of the test
The significance level of the test
The probability of observing the data or more extreme data under the null hypothesis
The confidence interval of the test

Ans. C

The p-value represents the probability of observing the data or more extreme data under the null hypothesis, indicating the strength of evidence against the null hypothesis.
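To make "the data or more extreme data" concrete, here is a small Python sketch. It assumes, for illustration only, that the test statistic follows a standard normal distribution under the null hypothesis (a z-test); the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(two_tailed_p_value(1.96))  # close to 0.05
print(two_tailed_p_value(0.0))   # 1.0, the observed value is not extreme at all
```

The further the statistic lies from zero, the smaller the p-value becomes.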

What is the purpose of a box plot in data visualization?

To show the distribution of data and identify outliers
To display the frequency distribution of data
To represent the relationship between two variables
To compare means of different data sets

Ans. A

A box plot is used to visualize the distribution of data, including the identification of outliers and the spread of the data.

What is the formula for calculating the correlation coefficient (Pearson’s r) between two variables X and Y?

(Sum of products of deviations from means) / (Product of standard deviations)
(Sum of squared deviations from means) / (Product of means)
(Sum of squared deviations from means) / (Product of standard deviations)
(Sum of products of deviations from means) / (Sum of squared deviations from means)

Ans. A

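Pearson’s correlation coefficient (r) is the sum of the products of the deviations from the means divided by the product of the two standard deviations, with the sum also scaled by the number of data points (n for population standard deviations, n – 1 for sample ones). Equivalently, r is the covariance of the two variables divided by the product of their standard deviations.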
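A minimal Python sketch of the calculation follows. It uses the equivalent form in which the sum of products is divided by the square root of the product of the two sums of squared deviations, so the sample-size factor cancels out. The function name and data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_sq_x = sum((x - mean_x) ** 2 for x in xs)
    sum_sq_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / sqrt(sum_sq_x * sum_sq_y)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))  # 1.0, a perfect positive linear relationship
print(pearson_r(xs, [10, 8, 6, 4, 2]))  # -1.0, a perfect negative linear relationship
```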

What is the term for the measure of how spread out data values are in a data set?

Mean
Median
Range
Mode

Ans. C

The range measures the spread or variability of data in a data set, calculated as the difference between the maximum and minimum values.

Which statistical distribution is often used to model the number of events occurring within a fixed interval of time or space?

Normal distribution
Poisson distribution
Binomial distribution
Exponential distribution

Ans. B

The Poisson distribution is used to model the number of events occurring within a fixed interval when events are rare and independent.
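For illustration, the Poisson probability of seeing exactly k events in an interval can be sketched in Python as below. The average rate of 2 events per interval is a made-up value, and the function name is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """P(X = k) for a Poisson distribution with the given average rate per interval."""
    return exp(-rate) * rate ** k / factorial(k)

rate = 2.0  # hypothetical average of 2 events per interval
for k in range(5):
    print(k, round(poisson_pmf(k, rate), 4))
```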

In statistics, what does the acronym “ANOVA” stand for?

Analysis of Variability
Analysis of Varying Outcomes
Analysis of Variance
Association of Variables

Ans. C

ANOVA stands for “Analysis of Variance,” a statistical technique that compares group means by partitioning the total variance in a dataset into between-group and within-group components.

What is the primary purpose of a confidence interval in statistics?

To determine the sample size required for an experiment
To estimate a population parameter with a range of values
To test the null hypothesis
To compare means of two groups

Ans. B

The primary purpose of a confidence interval is to provide an estimate of a population parameter along with a range of values within which the parameter is likely to fall.
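One common way to build such an interval for a mean is sketched below in Python. It assumes the normal approximation (a z-interval), which is a simplification for small samples where a t-interval is usually preferred. The function name and the measurements are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation interval: mean +/- z * (sd / sqrt(n))."""
    n = len(values)
    center = mean(values)
    standard_error = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return center - margin, center + margin

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # made-up measurements
low, high = mean_confidence_interval(sample)
print(low, high)
```

Raising the confidence level (for example to 0.99) widens the interval.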

What does the term “skewness” refer to in statistics?

The measure of how spread out data values are
The measure of symmetry in a data distribution
The measure of central tendency
The measure of variability

Ans. B

Skewness measures the degree of symmetry or asymmetry in a data distribution. Positive skew indicates a tail on the right, and negative skew indicates a tail on the left.
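The sign convention can be checked with a short Python sketch. It uses the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5), which is one of several definitions in use. The function name and data are illustrative.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5, using population moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 20]
left_tailed = [-20, -3, -2, -2, -1]
print(skewness(symmetric))     # 0.0
print(skewness(right_tailed))  # positive
print(skewness(left_tailed))   # negative
```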

In hypothesis testing, what is the significance level often set at for a two-tailed test at a 95% confidence level?

0.05
0.01
0.10
0.50

Ans. A

For a two-tailed test at a 95% confidence level, the significance level is often set at 0.05 (1 – 0.95), meaning there is a 5% chance of making a Type I error when the null hypothesis is true.

Which measure of dispersion is less sensitive to extreme outliers in a data set?

Range
Mean Absolute Deviation (MAD)
Variance
Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is less sensitive to extreme outliers because it is based on the middle 50% of the data and ignores extreme values.
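The effect of a single extreme value can be seen in a small Python sketch. It relies on the default quartile method of the standard library's statistics.quantiles; other quartile conventions can give slightly different values. The helper names and data are illustrative.

```python
from statistics import quantiles

def iqr(values):
    """Q3 - Q1, using the default quartile method of statistics.quantiles."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1

def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(value_range(clean), value_range(with_outlier))  # 8 99
print(iqr(clean), iqr(with_outlier))                  # 5.0 5.0
```

Replacing the largest value with 100 changes the range dramatically but leaves the IQR untouched.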

Which of the following statistical tests is used to determine if there is a significant relationship between two categorical variables?

T-test
Chi-squared test
ANOVA
Regression analysis

Ans. B

The Chi-squared test is used to determine if there is a significant relationship between two categorical variables by comparing observed and expected frequencies.
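The comparison of observed and expected frequencies can be sketched in Python as follows. This computes only the test statistic, without a continuity correction and without a p-value; the statistic would then be compared against a chi-squared distribution with (rows – 1) × (columns – 1) degrees of freedom. The function name and the table are illustrative.

```python
def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

observed_table = [[10, 20], [20, 10]]  # made-up counts
print(chi_squared_statistic(observed_table))
```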

What is the formula for calculating the coefficient of variation (CV) in statistics?

(Standard Deviation / Mean) × 100
(Range / Median) × 100
(Variance / Mode) × 100
(Mean Absolute Deviation / Range) × 100

Ans. A

The coefficient of variation (CV) is calculated by dividing the standard deviation by the mean and multiplying the result by 100.
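A minimal Python sketch of the formula is shown below. It assumes the sample standard deviation (statistics.stdev) and a non-zero mean; the function name and data are illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return stdev(values) / mean(values) * 100

print(coefficient_of_variation([10, 12, 14]))  # standard deviation 2, mean 12
```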

What does the term “p-value” represent in hypothesis testing?

The probability of making a Type II error
The probability of observing the null hypothesis being true
The probability of obtaining the observed results by chance under the null hypothesis
The power of the test

Ans. C

The p-value represents the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true, and it helps assess the strength of evidence against the null hypothesis.

What is the primary purpose of a histogram in data visualization?

To compare two data sets
To show the distribution of categorical data
To display the relationship between two variables
To represent the frequency distribution of a continuous variable

Ans. D

A histogram is used to represent the frequency distribution of a continuous variable, showing how data is distributed across different values or intervals.

In statistics, what does the term “outlier” refer to?

The mean of a dataset
Data points that are significantly different from the others
The median of a dataset
The range of a dataset

Ans. B

Outliers are data points that are significantly different from the majority of the data in a dataset and may skew statistical analysis.

Which type of probability distribution is commonly used to model the number of successes in a fixed number of Bernoulli trials?

Normal distribution
Poisson distribution
Binomial distribution
Exponential distribution

Ans. C

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials (experiments with two possible outcomes).
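As an illustration, the probability of exactly k successes in n trials, each with success probability p, can be sketched in Python as follows (the function name is hypothetical):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): choose which k trials succeed, times the probability of that pattern."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(2, 4, 0.5))  # 0.375, exactly 2 heads in 4 fair coin flips
```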

What is the term for a hypothesis that assumes no effect or relationship in a statistical test?

Alternative hypothesis
Null hypothesis
Two-tailed hypothesis
Significance hypothesis

Ans. B

The null hypothesis (H0) assumes no effect or relationship in a statistical test and is used for hypothesis testing.

What is the formula for calculating the standard error of the mean (SEM) in statistics?

Standard Deviation / √(Sample Size)
Sample Size / Standard Deviation
Range / Mean
Variance / Median

Ans. A

The standard error of the mean (SEM) is calculated by dividing the standard deviation by the square root of the sample size.
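In Python, this calculation can be sketched as below, assuming the sample standard deviation is used as the estimate (the function name and data are illustrative):

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of the sample size."""
    return stdev(values) / sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(standard_error_of_mean(data))
```

Because of the square root, quadrupling the sample size only halves the standard error.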

Which statistical test is appropriate for comparing the means of three or more groups in a study?

Student’s t-test
Mann-Whitney U test
Analysis of Variance (ANOVA)
Chi-squared test

Ans. C

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.

What is the primary purpose of a scatter plot in data visualization?

To compare means of two groups
To represent the frequency distribution of data
To display the relationship between two continuous variables
To show the distribution of categorical data

Ans. C

A scatter plot is used to display the relationship between two continuous variables, helping to identify patterns and correlations between them.

What is the formula for calculating the z-score of a data point in a normal distribution?

(Data Value – Mean) / Standard Deviation
(Data Value – Median) / Range
(Data Value – Variance) / Mean
(Data Value – Mode) / Sample Size

Ans. A

The z-score of a data point in a normal distribution is calculated by subtracting the mean from the data value and dividing by the standard deviation.
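A one-line Python sketch of the formula, with made-up numbers (a score of 85 in a distribution with mean 70 and standard deviation 10):

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations the value lies above (+) or below (-) the mean."""
    return (value - mean) / standard_deviation

print(z_score(85, 70, 10))  # 1.5
print(z_score(60, 70, 10))  # -1.0
```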

In a hypothesis test, what does a p-value of 0.02 indicate?

The null hypothesis is likely true
There is strong evidence against the null hypothesis
The significance level is 0.02
The test is inconclusive

Ans. B

A p-value of 0.02 is below the common 0.05 significance level, so it provides fairly strong evidence against the null hypothesis, which would be rejected at that level (though not at the stricter 0.01 level).

What does the term “sampling error” refer to in statistics?

The error made by the statistician during data collection
The difference between the sample statistic and the population parameter
The error introduced during data entry and analysis
The variation within the sample data

Ans. B

Sampling error is the difference between a sample statistic and the population parameter it estimates and is due to random sampling.

Which of the following is a non-parametric statistical test used for comparing two related groups?

Student’s t-test
Chi-squared test
Wilcoxon signed-rank test
Analysis of Variance (ANOVA)

Ans. C

The Wilcoxon signed-rank test is a non-parametric test used for comparing two related groups or paired data when assumptions of parametric tests are not met.

What is the term for the probability distribution that describes the number of successful Bernoulli trials before a specified number of failures is reached?

Normal distribution
Poisson distribution
Exponential distribution
Negative Binomial distribution

Ans. D

The Negative Binomial distribution describes the number of successful Bernoulli trials before a specified number of failures occurs.

What does the term “confidence level” represent in statistics?

The level of significance in hypothesis testing
The likelihood that a sample is representative of the population
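The long-run proportion of intervals, built the same way from repeated samples, that contain the true parameter
The proportion of a population captured by a sample

Ans. C

The confidence level is the long-run proportion of confidence intervals, constructed by the same procedure from repeated samples, that would contain the true parameter. It is typically expressed as a percentage, such as 95%. The range of values itself is the confidence interval, not the confidence level.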

What is the formula for calculating the odds of an event from its probability?

(Probability of the event) / (1 – Probability of the event)
(Probability of the event) × (1 – Probability of the event)
(Odds of the event) / (1 – Odds of the event)
(Odds of the event) × (1 – Odds of the event)

Ans. A

The odds of an event are calculated from its probability using the formula (Probability of the event) / (1 – Probability of the event). For example, a probability of 0.75 corresponds to odds of 0.75 / 0.25 = 3, or 3 to 1.

What does the term “correlation” measure in statistics?

The strength of a linear relationship between two variables
The difference between the mean and median of a dataset
The variability within a sample
The spread of data values in a dataset

Ans. A

Correlation measures the strength and direction of a linear relationship between two variables, indicating how one variable changes when the other changes.

What is the term for the measure of how much data values tend to deviate from the mean in a dataset?

Variance
Standard Deviation
Range
Mode

Ans. B

The standard deviation, the square root of the variance, is a measure of how much data values tend to deviate from the mean in a dataset. It is expressed in the same units as the data and indicates the spread or dispersion of the data.

In statistics, what is the term for a measure of the spread of data values around the median?

Range
Mean Absolute Deviation (MAD)
Variance
Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is a measure of the spread of data values around the median and is less affected by extreme outliers.

What does the term “standard error” represent in statistics?

The standard deviation of a sample
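The standard deviation of the sampling distribution of a statistic
The mean of a population
The range of values in a data set

Ans. B

The standard error is the standard deviation of the sampling distribution of a statistic, indicating the precision of an estimate based on sample data. The margin of error in a confidence interval is derived from it, as a critical value multiplied by the standard error.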

Which of the following is a measure of association used to assess the strength and direction of the relationship between two ordinal variables?

Pearson’s correlation coefficient (r)
Spearman’s rank correlation (rho)
Chi-squared test
ANOVA

Ans. B

Spearman’s rank correlation (rho) is a measure of association used to assess the relationship between two ordinal variables by ranking their values.

In hypothesis testing, what does a Type I error refer to?

Incorrectly rejecting a true null hypothesis
Incorrectly accepting a false null hypothesis
Correctly rejecting a false null hypothesis
Correctly accepting a true null hypothesis

Ans. A

A Type I error occurs when a true null hypothesis is incorrectly rejected, leading to a false positive result in hypothesis testing.

What is the term for the method used to assign a value to missing data points based on other available data in a dataset?

Outlier detection
Data transformation
Imputation
Sampling

Ans. C

Imputation is the method used to assign a value to missing data points based on other available data in a dataset, allowing for analysis with complete data.
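Mean imputation is one of the simplest approaches and can be sketched in Python as below. Representing missing values as None is an assumption made for this example, and the function name is hypothetical; more careful methods (regression or multiple imputation, for instance) exist.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

print(impute_with_mean([4.0, None, 6.0, 8.0, None]))  # [4.0, 6.0, 6.0, 8.0, 6.0]
```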

What is the term for the range of values that separates the central 50% of data from the extreme values in a dataset?

Interquartile Range (IQR)
Variance
Standard Error
Coefficient of Variation

Ans. A

The Interquartile Range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1). It spans the central 50% of the data, separating it from the more extreme values, and is a measure of data spread.

Which statistical test is appropriate for determining if there is a significant difference in means between two independent groups?

Chi-squared test
Two-sample t-test
Mann-Whitney U test
ANOVA

Ans. B

The two-sample t-test is used to determine if there is a significant difference in means between two independent groups or samples.

What does the term “skewness” refer to in statistics?

The measure of symmetry in a data distribution
The measure of central tendency
The measure of variability within a sample
The measure of the spread of data values

Ans. A

Skewness measures the symmetry or asymmetry in a data distribution. Positive skew indicates a right-skewed distribution, while negative skew indicates a left-skewed distribution.

Which statistical distribution is often used to model the time between events occurring at a constant rate?

Normal distribution
Poisson distribution
Exponential distribution
Binomial distribution

Ans. C

The exponential distribution is commonly used to model the time between events occurring at a constant rate or in a Poisson process.
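For illustration, the waiting-time probabilities of an exponential distribution can be sketched in Python as below. The rate of 2 events per unit of time is a made-up value, and the function names are hypothetical.

```python
from math import exp

def prob_wait_longer_than(t, rate):
    """P(T > t): no event occurs during a span of length t."""
    return exp(-rate * t)

def prob_wait_at_most(t, rate):
    """P(T <= t): the cumulative distribution function."""
    return 1 - exp(-rate * t)

rate = 2.0  # hypothetical: 2 events per unit of time, so the mean wait is 1 / rate = 0.5
print(prob_wait_longer_than(0.5, rate))
print(prob_wait_at_most(0.5, rate))
```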

What is the term for the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test?

Confidence interval
Type I error
P-value
Margin of error

Ans. C

The p-value is the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test. A smaller p-value indicates stronger evidence against the null hypothesis.

What is the term for the measure of the central tendency that is most affected by outliers in a dataset?

Mean
Median
Mode
Range

Ans. A

The mean is the measure of central tendency most affected by outliers, as it takes all values into account when calculating the average.

Which statistical test is used to determine if there is a significant relationship between two continuous variables?

Chi-squared test
Pearson’s correlation coefficient
Mann-Whitney U test
T-test

Ans. B

Pearson’s correlation coefficient is used to assess the strength and direction of the linear relationship between two continuous variables.

What is the formula for calculating the probability of an event using odds in statistics?

(Odds of the event) / (1 + Odds of the event)
(Probability of the event) × (Probability against the event)
(Odds of the event) / (Probability of the event)
(Probability of the event) / (1 – Probability of the event)

Ans. A

The probability of an event can be calculated from its odds using the formula (Odds of the event) / (1 + Odds of the event). For example, odds of 3 (3 to 1) correspond to a probability of 3 / 4 = 0.75.
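Treating the odds of an event as the ratio p / (1 – p), the conversion in both directions can be sketched in Python as follows (the function names are illustrative):

```python
def odds_from_probability(p):
    """Odds in favour of an event with probability p (p must be less than 1)."""
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: probability of an event with the given odds."""
    return odds / (1 + odds)

print(odds_from_probability(0.75))  # 3.0, i.e. 3 to 1
print(probability_from_odds(3.0))   # 0.75
```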

What is the term for a measure of how data values tend to cluster around a central point in a dataset?

Standard Deviation
Range
Variance
Dispersion

Ans. D

Dispersion is the general term for how tightly or loosely data values cluster around a central point, reflecting the degree of spread or concentration in the data. Standard deviation, range, and variance are specific measures of dispersion.

What does the term “confidence interval” represent in statistics?

A range of values within which a parameter is estimated to fall
The probability of making a Type I error
The strength of a linear relationship between two variables
The margin of error in a hypothesis test

Ans. A

A confidence interval represents a range of values within which a parameter is estimated to fall with a specified level of confidence.

Which statistical distribution is used to model the number of successes in a fixed number of independent Bernoulli trials?

Normal distribution
Poisson distribution
Exponential distribution
Binomial distribution

Ans. D

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, which are experiments with two possible outcomes.

What does the term “power” represent in hypothesis testing?

The probability of making a Type I error
The probability of observing the null hypothesis being true
The probability of correctly rejecting the null hypothesis
The probability of obtaining extreme data by chance

Ans. C

Power represents the probability of correctly rejecting the null hypothesis when it is false, indicating the test’s ability to detect a true effect.

What is the term for the graphical representation of data that displays the distribution, central tendency, and spread of a dataset?

Scatter plot
Box plot
Bar chart
Frequency table

Ans. B

A box plot (box-and-whisker plot) is a graphical representation that displays the distribution, central tendency, and spread of a dataset.

What is the term for a type of data that can take on only specific values, typically whole numbers, and is often used to represent counts or categories?

Continuous data
Nominal data
Ordinal data
Discrete data

Ans. D

Discrete data is a type of data that can take on only specific, often whole number values, and is used to represent counts or categories.

What is the term for a measure of the degree of uncertainty or variability associated with a statistic?

Confidence interval
Sampling error
Type I error
Standard error

Ans. D

Standard error is a measure of the degree of uncertainty or variability associated with a statistic and indicates how much the sample statistic might vary from the population parameter.

Which statistical test is used to compare the means of three or more groups in a study?

Student’s t-test
Chi-squared test
Mann-Whitney U test
Analysis of Variance (ANOVA)

Ans. D

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.

What does the term “correlation coefficient” measure in statistics?

The range of data values
The strength and direction of a relationship between two variables
The probability of making a Type I error
The margin of error in a confidence interval

Ans. B

A correlation coefficient measures the strength and direction of a relationship between two variables, indicating how they are related to each other.

What is the term for a measure of how data values are distributed around a central point in a dataset?

Standard deviation
Range
Interquartile range
Mode

Ans. A

Standard deviation is a measure of how data values are distributed around a central point, representing the degree of spread or dispersion in the data.

What is the term for a measure of how much data values tend to deviate from the mean in a dataset?

Skewness
Variance
Standard error
Mean absolute deviation (MAD)

Ans. B

Variance is a measure of how much data values tend to deviate from the mean in a dataset, indicating the degree of variability.

In hypothesis testing, what does the term “alpha level” represent?

The probability of making a Type I error
The significance level or threshold for rejecting the null hypothesis
The p-value
The probability of correctly accepting the null hypothesis

Ans. B

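The alpha level is the significance level, the threshold for rejecting the null hypothesis in hypothesis testing. It is also the maximum probability of a Type I error that the test is designed to allow, so the first option describes the same quantity from a different angle.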

Which statistical distribution is used to model the time between events occurring at a constant rate?

Normal distribution
Poisson distribution
Exponential distribution
Binomial distribution

Ans. C

The exponential distribution is used to model the time between events occurring at a constant rate or in a Poisson process.

What is the term for a method used to reduce the dimensionality of data while retaining as much information as possible?

Outlier detection
Feature selection
Imputation
Hypothesis testing

Ans. B

Feature selection is a method used to reduce the dimensionality of data while retaining as much relevant information as possible for analysis.

What does the term “degrees of freedom” refer to in statistics?

The number of data points in a dataset
The sample size
The number of groups in an ANOVA test
The number of values that are free to vary in a statistical calculation

Ans. D

Degrees of freedom refer to the number of values that are free to vary in a statistical calculation and play a role in various statistical tests.

What does the term “null hypothesis” represent in hypothesis testing?

The hypothesis that is proven to be true
The alternative hypothesis
The hypothesis to be rejected if evidence suggests otherwise
The initial assumption to be tested

Ans. D

The null hypothesis is the initial assumption to be tested in hypothesis testing, typically representing no effect or no difference.

Which statistical test is used to determine if there is a significant difference between the observed and expected frequencies in a contingency table?

Student’s t-test
Pearson’s chi-squared test
Wilcoxon signed-rank test
Analysis of Variance (ANOVA)

Ans. B

Pearson’s chi-squared test is used to assess whether the observed frequencies in a contingency table differ significantly from the expected frequencies.

What is the term for a type of sampling method in which the population is divided into non-overlapping subgroups or strata, and a random sample is then taken from each stratum?

Simple random sampling
Cluster sampling
Stratified sampling
Convenience sampling

Ans. C

Stratified sampling is a method in which the population is divided into non-overlapping strata, and a random sample is taken from each stratum, ensuring representation from all groups.
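A minimal Python sketch of the procedure follows. It assumes proportional allocation (the same fraction drawn from every stratum) and a fixed seed for reproducibility; the function name, strata, and sizes are all made up for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum (at least one element each)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        size = max(1, round(len(members) * fraction))
        sample[name] = rng.sample(members, size)  # simple random sample within the stratum
    return sample

population = {
    "urban": list(range(100)),       # hypothetical stratum of 100 units
    "rural": list(range(100, 140)),  # hypothetical stratum of 40 units
}
picked = stratified_sample(population, fraction=0.1)
print({name: len(members) for name, members in picked.items()})  # {'urban': 10, 'rural': 4}
```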

What is the term for a measure of the central tendency that is often used with nominal data and represents the most frequently occurring value?

Mean
Median
Mode
Range

Ans. C

The mode is a measure of central tendency used with nominal data and represents the most frequently occurring value in a dataset.

What does the term “confidence level” represent in a confidence interval?

The level of significance in hypothesis testing
The likelihood that a sample is representative of the population
The long-run proportion of intervals, built the same way from repeated samples, that contain the true parameter
The proportion of a population captured by a sample

Ans. C

The confidence level in a confidence interval is the long-run proportion of such intervals, constructed by the same procedure from repeated samples, that would contain the true parameter. The range of values itself is the confidence interval.

What is the term for the measure of how data values are spread out in a dataset?

Central tendency
Variance
Standard error
Mode

Ans. B

Variance is the measure of how data values are spread out or dispersed in a dataset, indicating the degree of variability.

What is the term for a statistical measure that describes the direction and strength of a relationship between two variables?

Probability
Causation
Correlation
Variance

Ans. C

Correlation is a statistical measure that describes the direction and strength of a relationship between two variables, indicating how they are related.

What is the term for a statistical measure that describes the symmetry of a probability distribution?

Skewness
Kurtosis
Central tendency
Spread

Ans. A

Skewness is a statistical measure that describes the symmetry or asymmetry of a probability distribution, indicating whether it is skewed to the left or right.

What does the term “sampling frame” refer to in sampling methods?

The process of selecting a sample from a population
The list of all elements in the population
The margin of error in a confidence interval
The probability of making a Type I error

Ans. B

A sampling frame is the list of all elements in the population from which a sample is drawn, serving as the basis for selecting a sample.

What is the term for a measure of how much data values are dispersed around the mean in a dataset?

Central tendency
Standard error
Variance
Mode

Ans. C

Variance is a measure of how much data values are dispersed around the mean in a dataset, indicating the degree of spread.

Which non-parametric statistical test is used to determine if there is a significant difference between two paired groups or conditions?

Student’s t-test
Chi-squared test
Analysis of Variance (ANOVA)
Wilcoxon signed-rank test

Ans. D

The Wilcoxon signed-rank test is used to determine if there is a significant difference between two paired groups or conditions when the differences are not normally distributed. It works on the ranks of the paired differences rather than on the means, which are what the paired t-test compares.

What is the term for a statistical measure that describes the degree to which data values are concentrated around the mean?

Skewness
Variance
Kurtosis
Dispersion

Ans. C

Kurtosis is a statistical measure that describes the shape of a distribution. It is traditionally described as the degree to which data values are concentrated around the mean (peakedness), although it is driven mainly by how heavy the tails are.

What is the term for a measure of the spread of data values around the median in a dataset?

Variance
Interquartile range
Mode
Range

Ans. B

The interquartile range is a measure of the spread of data values around the median in a dataset, representing the central 50% of the data.

What is the term for a statistical measure that describes the strength and direction of a monotonic, possibly non-linear, relationship between two variables?

Pearson’s correlation coefficient
Spearman’s rank correlation
Chi-squared test
ANOVA

Ans. B

Spearman’s rank correlation is a statistical measure that describes the strength and direction of a monotonic relationship between two variables, which need not be linear, by ranking their values.

What does the term “p-value” represent in hypothesis testing?

The probability of making a Type I error
The likelihood of observing the data or more extreme data, assuming the null hypothesis is true
The strength of a linear relationship between two variables
The margin of error in a confidence interval

Ans. B

The p-value represents the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in hypothesis testing. A smaller p-value indicates stronger evidence against the null hypothesis.

What is the term for a measure of the reliability of a statistical test in detecting a true effect?

Type I error
Type II error
Power
Significance level

Ans. C

Power is a measure of the reliability of a statistical test in detecting a true effect, indicating the test’s ability to avoid a Type II error (false negative).

What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?

R-squared (R^2)
Coefficient of determination
Pearson’s correlation coefficient
Standard error of the estimate

Ans. A

R-squared (R^2), also called the coefficient of determination, is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model. Options A and B therefore name the same quantity.
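The "proportion of variation explained" can be sketched in Python as one minus the ratio of the residual sum of squares to the total sum of squares. The predicted values are assumed to come from some already fitted regression model; the function name and numbers are illustrative.

```python
def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((y - mean_actual) ** 2 for y in actual)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return 1 - ss_residual / ss_total

actual = [1, 2, 3, 4, 5]
print(r_squared(actual, [1, 2, 3, 4, 5]))  # 1.0, predictions explain all the variation
print(r_squared(actual, [3, 3, 3, 3, 3]))  # 0.0, no better than predicting the mean
```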

What is the term for the measure of the center of a probability distribution in statistics?

Dispersion
Range
Skewness
Central tendency

Ans. D

Central tendency is the measure of the center of a probability distribution, indicating the typical or central value of a dataset.

Which statistical test is used to compare the means of two independent groups or conditions?

Analysis of Variance (ANOVA)
Chi-squared test
T-test
Mann-Whitney U test

Ans. C

The T-test is used to compare the means of two independent groups or conditions in a statistical analysis.

What is the term for a measure of the extent to which two variables change together in a linear relationship?

Regression
Correlation
Variance
Kurtosis

Ans. B

Correlation is a measure of the extent to which two variables change together in a linear relationship, indicating the strength and direction of the relationship.

What is the term for a measure of how data values tend to cluster around the median in a dataset?

Standard error
Interquartile range
Mode
Skewness

Ans. B

The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data.

What is the term for a type of sampling method in which elements are randomly selected from a population, and every element has an equal chance of being selected?

Simple random sampling
Cluster sampling
Stratified sampling
Convenience sampling

Ans. A

Simple random sampling is a method in which elements are randomly selected from a population, and every element has an equal chance of being selected, ensuring unbiased representation.

What is the term for a statistical measure that describes the shape of a probability distribution?

Skewness
Kurtosis
Central tendency
Spread

Ans. B

Kurtosis is a statistical measure that describes the shape of a probability distribution, indicating whether its tails are heavier or lighter than those of a normal distribution. This is often loosely described as the distribution being peaked or flat.

What is the term for the proportion of the total area under a normal distribution curve between two specific values?

Confidence level
Z-score
Area under the curve
Percentile

Ans. C

The term “Area under the curve” represents the proportion of the total area under a normal distribution curve between two specific values, indicating the probability of observing data within that range.

What does the term “Type I error” represent in hypothesis testing?

Rejecting the null hypothesis when it is true
Failing to reject the null hypothesis when it is false
The power of the test
The probability of making a Type II error

Ans. A

Type I error represents the error of rejecting the null hypothesis when it is true, leading to a false positive conclusion in hypothesis testing.

What is the term for the measure of the strength of a relationship between two variables that varies from -1 to 1, with 0 indicating no linear relationship?

Standard deviation
Interquartile range
Pearson’s correlation coefficient
Mode

Ans. C

Pearson’s correlation coefficient is a measure of the strength of a linear relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.

What does the term “confidence interval” represent in statistics?

The range of data values in a dataset
The likelihood of making a Type I error in hypothesis testing
The range of values within which a parameter is estimated to fall with a specified level of confidence
The margin of error in a confidence interval

Ans. C

A confidence interval is the range of values within which a parameter is estimated to fall with a specified level of confidence, such as 95%.
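
As a hedged example, an interval for a population mean can be sketched with the normal approximation (mean plus or minus z times the standard error). This assumes a reasonably large sample; for small samples a t-based interval is usually preferred. The measurements are invented.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical measurements (illustrative only).
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = mean_confidence_interval(data)
print(low, high)
```

Raising the confidence level (for example to 0.99) widens the interval.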

Which statistical test is used to determine if there is a significant association between two categorical variables?

Student’s t-test
Chi-squared test
Wilcoxon signed-rank test
Analysis of Variance (ANOVA)

Ans. B

The Chi-squared test is used to determine if there is a significant association between two categorical variables, testing the independence of variables in a contingency table.
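
The test statistic compares observed counts with the counts expected under independence. The sketch below computes only the statistic, not the p-value; the 2x2 table is made up.

```python
def chi_squared_statistic(table):
    """Sum of (observed - expected)^2 / expected over a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows are groups, columns are outcomes.
observed_table = [[10, 20], [20, 10]]
stat = chi_squared_statistic(observed_table)
print(stat)
```

For a 2x2 table there is 1 degree of freedom, and the 5% critical value is about 3.841, so a statistic above that suggests an association.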

What is the term for a measure of the average distance of data values from the mean in a dataset?

Interquartile range
Mode
Standard deviation
Range

Ans. C

Standard deviation measures the typical distance of data values from the mean, computed as the square root of the average squared deviation, and indicates the degree of dispersion or variability.
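
A small worked example, using invented data, shows the calculation. It uses the population form (divide by n); the mean absolute deviation is printed alongside to show that it is a related but different quantity.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

# Population standard deviation: square root of the mean squared deviation.
population_sd = statistics.pstdev(data)

# Mean absolute deviation, shown only for comparison.
mean = statistics.fmean(data)
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

print(population_sd)  # 2.0
print(mean_abs_dev)   # 1.5
```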

What is the term for the measure of how data values tend to cluster around the median in a dataset?

Range
Skewness
Interquartile range
Mode

Ans. C

The interquartile range is the difference between the third and first quartiles. It spans the middle 50% of the data, so a small value means the values cluster tightly around the median.
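
A quick sketch of the calculation follows. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's statistics.quantiles, and the data are invented.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# quantiles(n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 5.0 7.0
print(iqr)         # 4.0
```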

What is the term for a statistical measure that quantifies the strength and direction of a relationship between two variables in a linear model?

Chi-squared statistic
P-value
Pearson’s correlation coefficient
Standard error

Ans. C

Pearson’s correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two variables, providing a value between -1 and 1.

What does the term “p-value” represent in hypothesis testing?

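The probability of observing the data or more extreme data under the null hypothesis
The margin of error in a confidence interval
The significance level or threshold for rejecting the null hypothesis
The strength of a linear relationship between two variables

Ans. A

The p-value is the probability of observing the data or more extreme data under the null hypothesis. It is not the probability of making a Type I error; that probability is set in advance by the significance level (α), and the null hypothesis is rejected when the p-value falls below it.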
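
For a concrete case, a p-value can be computed as a tail probability of the test statistic's distribution under the null hypothesis. The sketch below assumes a two-sided z-test; the function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under the null, of a z statistic at least as extreme as |z|."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

print(two_sided_p_value(1.96))  # about 0.05
print(two_sided_p_value(0.0))   # 1.0: the data match the null exactly
```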

What is the term for a statistical measure that describes the proportion of true positive results out of all actual positive cases in a classification problem?

Sensitivity
Specificity
Precision
Accuracy

Ans. A

Sensitivity is a measure that describes the proportion of true positive results out of all actual positive cases in a classification problem, indicating the model’s ability to detect positives correctly.
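
In code, sensitivity is TP / (TP + FN). The labels below are invented, with 1 marking a positive case and 0 a negative case.

```python
def sensitivity(actual, predicted):
    """True positives divided by all actual positives (TP / (TP + FN))."""
    pairs = list(zip(actual, predicted))
    true_positives = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_negatives = sum(1 for a, p in pairs if a == 1 and p == 0)
    return true_positives / (true_positives + false_negatives)

# Hypothetical labels: four actual positives, three of them detected.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

print(sensitivity(actual, predicted))  # 0.75
```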

What is the term for a method used to impute missing values in a dataset by replacing them with estimated values based on other data points?

Outlier detection
Feature selection
Hypothesis testing
Imputation

Ans. D

Imputation is a method used to replace missing values in a dataset with estimated values based on other data points, ensuring completeness for analysis.
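
Mean imputation is one of the simplest approaches and is sketched below as an example; other strategies (median, regression-based, and so on) exist. Missing values are assumed to be represented as None, and the data are invented.

```python
import statistics

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill_value = statistics.fmean(observed)
    return [fill_value if v is None else v for v in values]

# Hypothetical column with two missing entries.
raw = [10, None, 20, 30, None]
print(impute_mean(raw))  # [10, 20.0, 20, 30, 20.0]
```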

What is the term for a measure of the proportion of the total area under a probability distribution curve to the left of a specific value?

Z-score
Confidence level
Percentile
P-value

Ans. C

A percentile gives the percentage of the total area under a probability distribution curve that lies to the left of a specific value, indicating the position of that value within the distribution.
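
For a normal distribution, this left-hand area is the cumulative distribution function evaluated at the value. The mean of 100 and standard deviation of 15 below are assumed purely for illustration.

```python
from statistics import NormalDist

# Hypothetical test scores: normal with mean 100 and standard deviation 15.
scores = NormalDist(mu=100.0, sigma=15.0)

# Area to the left of 115, which is one standard deviation above the mean.
left_area = scores.cdf(115.0)
print(round(left_area * 100, 1))  # about 84.1, i.e. roughly the 84th percentile
```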

What is the term for the probability of correctly rejecting the null hypothesis in hypothesis testing?

Type I error
Type II error
Power
Confidence level

Ans. C

Power is the probability of correctly rejecting the null hypothesis when it is false, indicating the test’s ability to detect a true effect. It equals 1 minus the probability of a Type II error.
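
As a sketch under stated assumptions, power has a closed form for a one-sided z-test with known standard deviation: 1 - Φ(z_α - effect·√n/σ). The function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test when the true mean shift is `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)    # rejection threshold
    shift = effect * math.sqrt(n) / sigma       # true shift in z units
    return 1 - NormalDist().cdf(z_crit - shift)

print(z_test_power(effect=0.5, sigma=1.0, n=10))
print(z_test_power(effect=0.5, sigma=1.0, n=50))  # larger sample, more power
```

With an effect of 0 the function returns alpha, since every rejection is then a Type I error.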

What is the term for a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values?

Variance
Range
Standard error
Interquartile range

Ans. B

The range is a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values.

Which statistical distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence?

Normal distribution
Poisson distribution
Exponential distribution
Binomial distribution

Ans. B

The Poisson distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence, such as the number of emails received per hour.
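
The probability of seeing exactly k events is e^(-λ) · λ^k / k!, where λ is the average rate. The rate of 4 emails per hour below is a made-up figure that continues the example.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events when the average count is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

rate = 4.0  # hypothetical average of 4 emails per hour

for k in range(8):
    print(k, round(poisson_pmf(k, rate), 4))
```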

What is the term for the measure of the average value of a set of data points?

Median
Mode
Mean
Interquartile range

Ans. C

The mean is the measure of the average value of a set of data points, calculated by summing all values and dividing by the number of data points.

What does the term “significance level” represent in hypothesis testing?

The probability of making a Type I error
The likelihood that a sample is representative of the population
The range of values within which a parameter is estimated to fall
The proportion of a population captured by a sample

Ans. A

The significance level in hypothesis testing represents the probability of making a Type I error, typically denoted by alpha (α).
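
A small simulation can make this concrete: when the null hypothesis is true, a test run at α = 0.05 should reject in roughly 5% of repeated experiments. The sketch assumes a two-sided z-test with known standard deviation; the trial count, sample size, and seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, trials=5000, n=20, seed=0):
    """Fraction of true-null experiments that a two-sided z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean 0, sigma 1) is true by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * math.sqrt(n)  # sample mean / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # expected to be close to 0.05
```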

What is the term for a method used to estimate population parameters based on sample data, taking into account sampling variability?

Hypothesis testing
Confidence interval
Regression analysis
Bootstrapping

Ans. B

A confidence interval is a method used to estimate population parameters based on sample data, taking into account sampling variability and providing a range of possible values for the parameter.

What is the term for a statistical measure that describes the proportion of true negative results out of all actual negative cases in a classification problem?

Sensitivity
Specificity
Precision
Accuracy

Ans. B

Specificity is a measure that describes the proportion of true negative results out of all actual negative cases in a classification problem, indicating the model’s ability to correctly identify negatives.
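
In terms of confusion-matrix counts, specificity is TN / (TN + FP). The counts below are invented for illustration.

```python
def specificity(true_negatives, false_positives):
    """True negatives divided by all actual negatives (TN / (TN + FP))."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening result: 100 actual negatives, 10 wrongly flagged.
print(specificity(true_negatives=90, false_positives=10))  # 0.9
```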

What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?

R-squared (R^2)
Coefficient of variation
Pearson’s correlation coefficient
Standard error of the estimate

Ans. A

R-squared (R^2) is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model, also known as the coefficient of determination.
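
R-squared can be computed as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean. The observed values and predictions below are made up; the predictions are assumed to come from some already-fitted model.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` accounted for by `predicted`."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot

observed = [1.0, 2.0, 3.0, 4.0]

print(r_squared(observed, [1.0, 2.0, 3.0, 4.0]))  # perfect fit: 1.0
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # predicting the mean: 0.0
print(r_squared(observed, [1.2, 1.9, 3.1, 3.8]))  # close fit: just below 1
```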
