# [100+] Statistics MCQs & Answers {PDF} For NTS, NET, SSB, SSC, IBPS, RRB, Govt. Exams

Which of the following is a measure of central tendency?
1. Standard Deviation
2. Range
3. Median
4. Variance

Ans. C

The median is a measure of central tendency that represents the middle value of a dataset when arranged in ascending order.

What is the formula for calculating the variance of a data set?
1. Sum of data values / Number of data points
2. Standard Deviation / Mean
3. (Sum of squared differences from the mean) / Number of data points
4. Median / Range

Ans. C

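The variance is calculated by summing the squared differences between each data point and the mean, then dividing by the number of data points. This gives the population variance; the sample variance divides by the number of data points minus one instead.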
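
As a minimal sketch of that calculation, the Python snippet below computes the variance by hand on made-up values (the data are illustrative, not from any exam). It shows both the population form (divide by N) and the sample form (divide by N - 1).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance: divide by the number of data points (N).
population_variance = sum(squared_diffs) / len(data)

# Sample variance: divide by N - 1 (Bessel's correction).
sample_variance = sum(squared_diffs) / (len(data) - 1)

print(population_variance)  # 4.0
print(sample_variance)

# Cross-check against the standard library.
print(statistics.pvariance(data), statistics.variance(data))
```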

What does the p-value represent in hypothesis testing?
1. The power of the test
2. The significance level of the test
3. The probability of observing the data or more extreme data under the null hypothesis
4. The confidence interval of the test

Ans. C

The p-value represents the probability of observing the data or more extreme data under the null hypothesis, indicating the strength of evidence against the null hypothesis.

What is the purpose of a box plot in data visualization?
1. To show the distribution of data and identify outliers
2. To display the frequency distribution of data
3. To represent the relationship between two variables
4. To compare means of different data sets

Ans. A

A box plot is used to visualize the distribution of data, including the identification of outliers and the spread of the data.

What is the formula for calculating the correlation coefficient (Pearson’s r) between two variables X and Y?
1. (Sum of products of deviations from means) / (Product of standard deviations)
2. (Sum of squared deviations from means) / (Product of means)
3. (Sum of squared deviations from means) / (Product of standard deviations)
4. (Sum of products of deviations from means) / (Sum of squared deviations from means)

Ans. A

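Pearson’s correlation coefficient (r) is the covariance of the two variables divided by the product of their standard deviations. Equivalently, it is the sum of products of deviations from the means divided by the square root of the product of the two sums of squared deviations.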
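
The calculation can be sketched in Python as follows. The two lists are made-up values, and the form used here divides the sum of products of deviations by the square root of the product of the two sums of squared deviations, which is equivalent to covariance over the product of standard deviations.

```python
import math

x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of products of deviations from the means.
sum_products = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

# Sums of squared deviations for each variable.
sum_sq_x = sum((a - mean_x) ** 2 for a in x)
sum_sq_y = sum((b - mean_y) ** 2 for b in y)

r = sum_products / math.sqrt(sum_sq_x * sum_sq_y)
print(round(r, 4))
```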

What is the term for the measure of how spread out data values are in a data set?
1. Mean
2. Median
3. Range
4. Mode

Ans. C

The range measures the spread or variability of data in a data set, calculated as the difference between the maximum and minimum values.

Which statistical distribution is often used to model the number of events occurring within a fixed interval of time or space?
1. Normal distribution
2. Poisson distribution
3. Binomial distribution
4. Exponential distribution

Ans. B

The Poisson distribution is used to model the number of events occurring within a fixed interval when events occur independently and at a constant average rate.
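
For illustration, the Poisson probability of seeing exactly k events is e^(-rate) × rate^k / k!. The helper name `poisson_pmf` and the rate of 3 events per interval are assumptions chosen for this sketch.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k events when the average count per interval is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

rate = 3.0  # hypothetical average of 3 events per interval
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))
```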

In statistics, what does the acronym “ANOVA” stand for?
1. Analysis of Variability
2. Analysis of Varying Outcomes
3. Analysis of Variance
4. Association of Variables

Ans. C

ANOVA stands for “Analysis of Variance,” a statistical technique that compares group means by partitioning the total variance in a dataset into between-group and within-group components.

What is the primary purpose of a confidence interval in statistics?
1. To determine the sample size required for an experiment
2. To estimate a population parameter with a range of values
3. To test the null hypothesis
4. To compare means of two groups

Ans. B

The primary purpose of a confidence interval is to provide an estimate of a population parameter along with a range of values within which the parameter is likely to fall.
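
As a rough sketch, a 95% confidence interval for a mean can be computed as the sample mean plus or minus a critical value times the standard error. The sample below is made up, and using the normal critical value is an assumption suited to large samples; small samples would normally use a t critical value instead.

```python
import math
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values only

mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Normal critical value for a two-sided 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * standard_error
upper = mean + z * standard_error
print(round(lower, 3), round(upper, 3))
```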

What does the term “skewness” refer to in statistics?
1. The measure of how spread out data values are
2. The measure of symmetry in a data distribution
3. The measure of central tendency
4. The measure of variability

Ans. B

Skewness measures the degree of symmetry or asymmetry in a data distribution. Positive skew indicates a tail on the right, and negative skew indicates a tail on the left.

In hypothesis testing, what is the significance level often set at for a two-tailed test at a 95% confidence level?
1. 0.05
2. 0.01
3. 0.10
4. 0.50

Ans. A

For a two-tailed test at a 95% confidence level, the significance level is 1 – 0.95 = 0.05, split as 0.025 in each tail. This means that, if the null hypothesis is true, there is a 5% chance of making a Type I error.

Which measure of dispersion is less sensitive to extreme outliers in a data set?
1. Range
3. Variance
4. Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is less sensitive to extreme outliers because it is based on the middle 50% of the data and ignores extreme values.
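
A small Python sketch makes the contrast concrete. The values are made up, the helper name `iqr` is an assumption, and `statistics.quantiles` (Python 3.8+) uses its default quartile method, so other software may give slightly different quartiles.

```python
import statistics

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

clean = [10, 12, 13, 15, 16, 18, 19, 21]  # illustrative values only
with_outlier = clean + [200]              # one extreme value added

for name, values in [("clean", clean), ("with outlier", with_outlier)]:
    data_range = max(values) - min(values)
    print(name, "range:", data_range, "IQR:", iqr(values))
```

The range jumps when the extreme value is added, while the IQR moves only slightly.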

Which of the following statistical tests is used to determine if there is a significant relationship between two categorical variables?
1. T-test
2. Chi-squared test
3. ANOVA
4. Regression analysis

Ans. B

The Chi-squared test is used to determine if there is a significant relationship between two categorical variables by comparing observed and expected frequencies.
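
The comparison of observed and expected frequencies can be sketched as below. The 2×2 table is invented for illustration, and the snippet only computes the test statistic and degrees of freedom; turning that into a p-value requires the chi-squared distribution, which is not in the Python standard library.

```python
# Rows: two groups; columns: two outcome categories (illustrative counts only).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_squared = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if the two variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_squared += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_squared, 3), degrees_of_freedom)
```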

What is the formula for calculating the coefficient of variation (CV) in statistics?
1. (Standard Deviation / Mean) × 100
2. (Range / Median) × 100
3. (Variance / Mode) × 100
4. (Mean Absolute Deviation / Range) × 100

Ans. A

The coefficient of variation (CV) is calculated by dividing the standard deviation by the mean and multiplying the result by 100.
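
A minimal Python sketch of this formula, using made-up measurements and the sample standard deviation (an assumption; the population standard deviation could be used instead):

```python
import statistics

heights_cm = [160, 165, 170, 175, 180]  # illustrative values only

# CV = (standard deviation / mean) x 100, expressed as a percentage.
cv = statistics.stdev(heights_cm) / statistics.mean(heights_cm) * 100
print(round(cv, 2))
```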

What does the term “p-value” represent in hypothesis testing?
1. The probability of making a Type II error
2. The probability of observing the null hypothesis being true
3. The probability of obtaining the observed results by chance under the null hypothesis
4. The power of the test

Ans. C

The p-value represents the probability of obtaining the observed results by chance under the null hypothesis, and it helps assess the strength of evidence against the null hypothesis.

What is the primary purpose of a histogram in data visualization?
1. To compare two data sets
2. To show the distribution of categorical data
3. To display the relationship between two variables
4. To represent the frequency distribution of a continuous variable

Ans. D

A histogram is used to represent the frequency distribution of a continuous variable, showing how data is distributed across different values or intervals.

In statistics, what does the term “outlier” refer to?
1. The mean of a dataset
2. Data points that are significantly different from the others
3. The median of a dataset
4. The range of a dataset

Ans. B

Outliers are data points that are significantly different from the majority of the data in a dataset and may skew statistical analysis.

Which type of probability distribution is commonly used to model the number of successes in a fixed number of Bernoulli trials?
1. Normal distribution
2. Poisson distribution
3. Binomial distribution
4. Exponential distribution

Ans. C

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials (experiments with two possible outcomes).
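
For illustration, the binomial probability of exactly k successes in n trials is C(n, k) × p^k × (1 - p)^(n - k). The helper name `binomial_pmf` is an assumption for this sketch; `math.comb` requires Python 3.8+.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 fair coin flips.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```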

What is the term for a hypothesis that assumes no effect or relationship in a statistical test?
1. Alternative hypothesis
2. Null hypothesis
3. Two-tailed hypothesis
4. Significance hypothesis

Ans. B

The null hypothesis (H0) assumes no effect or relationship in a statistical test and is used for hypothesis testing.

What is the formula for calculating the standard error of the mean (SEM) in statistics?
1. Standard Deviation / √(Sample Size)
2. Sample Size / Standard Deviation
3. Range / Mean
4. Variance / Median

Ans. A

The standard error of the mean (SEM) is calculated by dividing the standard deviation by the square root of the sample size.
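
A short Python sketch of the calculation, on made-up data and using the sample standard deviation as the estimate of spread:

```python
import math
import statistics

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values only

# SEM = standard deviation / square root of the sample size.
sem = statistics.stdev(sample) / math.sqrt(len(sample))
print(round(sem, 4))
```

Because the divisor is the square root of the sample size, quadrupling the sample size only halves the standard error.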

Which statistical test is appropriate for comparing the means of three or more groups in a study?
1. Student’s t-test
2. Mann-Whitney U test
3. Analysis of Variance (ANOVA)
4. Chi-squared test

Ans. C

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.

What is the primary purpose of a scatter plot in data visualization?
1. To compare means of two groups
2. To represent the frequency distribution of data
3. To display the relationship between two continuous variables
4. To show the distribution of categorical data

Ans. C

A scatter plot is used to display the relationship between two continuous variables, helping to identify patterns and correlations between them.

What is the formula for calculating the z-score of a data point in a normal distribution?
1. (Data Value – Mean) / Standard Deviation
2. (Data Value – Median) / Range
3. (Data Value – Variance) / Mean
4. (Data Value – Mode) / Sample Size

Ans. A

The z-score of a data point in a normal distribution is calculated by subtracting the mean from the data value and dividing by the standard deviation.
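
As a quick sketch, the formula translates directly into Python. The function name `z_score` and the exam-score numbers are assumptions for illustration.

```python
def z_score(value: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# A score of 85 when the mean is 70 and the standard deviation is 10.
print(z_score(85, 70, 10))  # 1.5
```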

In a hypothesis test, what does a p-value of 0.02 indicate?
1. The null hypothesis is likely true
2. There is strong evidence against the null hypothesis
3. The significance level is 0.02
4. The test is inconclusive

Ans. B

A p-value of 0.02 is below the conventional 0.05 significance level, so it is usually read as strong evidence against the null hypothesis, which would be rejected at that level (though not at the stricter 0.01 level).

What does the term “sampling error” refer to in statistics?
1. The error made by the statistician during data collection
2. The difference between the sample statistic and the population parameter
3. The error introduced during data entry and analysis
4. The variation within the sample data

Ans. B

Sampling error is the difference between a sample statistic and the population parameter it estimates and is due to random sampling.

Which of the following is a non-parametric statistical test used for comparing two related groups?
1. Student’s t-test
2. Chi-squared test
3. Wilcoxon signed-rank test
4. Analysis of Variance (ANOVA)

Ans. C

The Wilcoxon signed-rank test is a non-parametric test used for comparing two related groups or paired data when assumptions of parametric tests are not met.

What is the term for the probability distribution that describes the number of successful Bernoulli trials before a specified number of failures is reached?
1. Normal distribution
2. Poisson distribution
3. Exponential distribution
4. Negative Binomial distribution

Ans. D

The Negative Binomial distribution describes the number of successful Bernoulli trials before a specified number of failures occurs.

What does the term “confidence level” represent in statistics?
1. The level of significance in hypothesis testing
2. The likelihood that a sample is representative of the population
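3. The long-run proportion of intervals, built by the same method, that would contain the true parameter
4. The proportion of a population captured by a sample

Ans. C

The confidence level is the long-run proportion of confidence intervals, constructed by the same method from repeated samples, that would contain the true parameter. It is typically expressed as a percentage, such as 95%. The range of values itself is the confidence interval, not the confidence level.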

What is the formula for calculating the odds of an event from its probability?
1. (Probability of the event) / (1 – Probability of the event)
2. (Probability of the event) × (1 – Probability of the event)
3. (Odds of the event) / (1 – Odds of the event)
4. (Odds of the event) × (1 – Odds of the event)

Ans. A

The odds of an event are calculated from its probability using the formula (Probability of the event) / (1 – Probability of the event). The odds ratio is a different quantity: the ratio of the odds in one group to the odds in another.

What does the term “correlation” measure in statistics?
1. The strength of a linear relationship between two variables
2. The difference between the mean and median of a dataset
3. The variability within a sample
4. The spread of data values in a dataset

Ans. A

Correlation measures the strength and direction of a linear relationship between two variables, indicating how one variable changes when the other changes.

What is the term for the measure of how much data values tend to deviate from the mean in a dataset?
1. Variance
2. Standard Deviation
3. Range
4. Mode

Ans. B

The standard deviation is a measure of how much data values tend to deviate from the mean in a dataset, indicating the spread or dispersion of data.

In statistics, what is the term for a measure of the spread of data values around the median?
1. Range
3. Variance
4. Interquartile Range (IQR)

Ans. D

The Interquartile Range (IQR) is a measure of the spread of data values around the median and is less affected by extreme outliers.

What does the term “standard error” represent in statistics?
1. The standard deviation of a sample
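2. The standard deviation of the sampling distribution of a statistic
3. The mean of a population
4. The range of values in a data set

Ans. B

The standard error is the standard deviation of the sampling distribution of a statistic, indicating the precision of an estimate based on sample data. The margin of error in a confidence interval is derived from it by multiplying by a critical value.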

Which of the following is a measure of association used to assess the strength and direction of the relationship between two ordinal variables?
1. Pearson’s correlation coefficient (r)
2. Spearman’s rank correlation (rho)
3. Chi-squared test
4. ANOVA

Ans. B

Spearman’s rank correlation (rho) is a measure of association used to assess the relationship between two ordinal variables by ranking their values.

In hypothesis testing, what does a Type I error refer to?
1. Incorrectly rejecting a true null hypothesis
2. Incorrectly accepting a false null hypothesis
3. Correctly rejecting a false null hypothesis
4. Correctly accepting a true null hypothesis

Ans. A

A Type I error occurs when a true null hypothesis is incorrectly rejected, leading to a false positive result in hypothesis testing.

What is the term for the method used to assign a value to missing data points based on other available data in a dataset?
1. Outlier detection
2. Data transformation
3. Imputation
4. Sampling

Ans. C

Imputation is the method used to assign a value to missing data points based on other available data in a dataset, allowing for analysis with complete data.

What is the term for the range of values that separates the central 50% of data from the extreme values in a dataset?
1. Interquartile Range (IQR)
2. Variance
3. Standard Error
4. Coefficient of Variation

Ans. A

The Interquartile Range (IQR) is the range of values that separates the central 50% of data from the extreme values in a dataset and is a measure of data spread.

Which statistical test is appropriate for determining if there is a significant difference in means between two independent groups?
1. Chi-squared test
2. Two-sample t-test
3. Mann-Whitney U test
4. ANOVA

Ans. B

The two-sample t-test is used to determine if there is a significant difference in means between two independent groups or samples.

What does the term “skewness” refer to in statistics?
1. The measure of symmetry in a data distribution
2. The measure of central tendency
3. The measure of variability within a sample
4. The measure of the spread of data values

Ans. A

Skewness measures the symmetry or asymmetry in a data distribution. Positive skew indicates a right-skewed distribution, while negative skew indicates a left-skewed distribution.

Which statistical distribution is often used to model the time between events occurring at a constant rate?
1. Normal distribution
2. Poisson distribution
3. Exponential distribution
4. Binomial distribution

Ans. C

The exponential distribution is commonly used to model the time between events occurring at a constant rate or in a Poisson process.
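
For illustration, if events arrive at a constant average rate, the probability that the next event occurs within time t is 1 - e^(-rate × t), and the mean waiting time is 1 / rate. The function name and the rate of 2 events per hour are assumptions for this sketch.

```python
import math

def prob_event_within(t: float, rate: float) -> float:
    """P(T <= t) for an exponential waiting time with the given average event rate."""
    return 1 - math.exp(-rate * t)

rate = 2.0  # hypothetical: 2 events per hour on average
mean_wait = 1 / rate  # expected waiting time in hours
p_half_hour = prob_event_within(0.5, rate)

print(mean_wait, round(p_half_hour, 4))
```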

What is the term for the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test?
1. Confidence interval
2. Type I error
3. P-value
4. Margin of error

Ans. C

The p-value is the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in a hypothesis test. A smaller p-value indicates stronger evidence against the null hypothesis.

What is the term for the measure of the central tendency that is most affected by outliers in a dataset?
1. Mean
2. Median
3. Mode
4. Range

Ans. A

The mean is the measure of central tendency most affected by outliers, as it takes all values into account when calculating the average.
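
A small Python sketch with made-up values shows the effect: adding one extreme value shifts the mean substantially while the median barely moves.

```python
import statistics

salaries = [30, 32, 35, 36, 40]   # illustrative values only (in thousands)
with_outlier = salaries + [500]   # one extreme value added

mean_shift = statistics.mean(with_outlier) - statistics.mean(salaries)
median_shift = statistics.median(with_outlier) - statistics.median(salaries)

print("mean shift:", round(mean_shift, 2))
print("median shift:", median_shift)
```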

Which statistical test is used to determine if there is a significant relationship between two continuous variables?
1. Chi-squared test
2. Pearson’s correlation coefficient
3. Mann-Whitney U test
4. T-test

Ans. B

Pearson’s correlation coefficient is used to assess the strength and direction of the linear relationship between two continuous variables.

What is the formula for calculating the probability of an event using odds in statistics?
1. (Odds of the event) / (1 + Odds of the event)
2. (Probability of the event) × (Probability against the event)
3. (Odds of the event) / (Probability of the event)
4. (Probability of the event) / (1 – Probability of the event)

Ans. A

The probability of an event can be calculated from its odds using the formula (Odds of the event) / (1 + Odds of the event). For odds quoted as a to b in favor, this is a / (a + b). The expression (Probability of the event) / (1 – Probability of the event) goes the other way, converting a probability into odds.
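
The two-way conversion between probability and odds can be sketched in Python as follows. The function names are assumptions for illustration, and odds are expressed as a single number (for example, 3 means "3 to 1 in favor").

```python
def odds_from_probability(p: float) -> float:
    """Odds in favor of an event with probability p (requires p < 1)."""
    return p / (1 - p)

def probability_from_odds(odds: float) -> float:
    """Probability of an event given its odds in favor."""
    return odds / (1 + odds)

odds = odds_from_probability(0.75)
print(odds)                         # 3.0
print(probability_from_odds(odds))  # 0.75
```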

What is the term for a measure of how data values tend to cluster around a central point in a dataset?
1. Standard Deviation
2. Range
3. Variance
4. Dispersion

Ans. D

Dispersion is the general term for how tightly or loosely data values cluster around a central point, reflecting the degree of spread or concentration in the data. Standard deviation, range, and variance are specific measures of dispersion.

What does the term “confidence interval” represent in statistics?
1. A range of values within which a parameter is estimated to fall
2. The probability of making a Type I error
3. The strength of a linear relationship between two variables
4. The margin of error in a hypothesis test

Ans. A

A confidence interval represents a range of values within which a parameter is estimated to fall with a specified level of confidence.

Which statistical distribution is used to model the number of successes in a fixed number of independent Bernoulli trials?
1. Normal distribution
2. Poisson distribution
3. Exponential distribution
4. Binomial distribution

Ans. D

The binomial distribution models the number of successes in a fixed number of independent Bernoulli trials, which are experiments with two possible outcomes.

What does the term “power” represent in hypothesis testing?
1. The probability of making a Type I error
2. The probability of observing the null hypothesis being true
3. The probability of correctly rejecting the null hypothesis
4. The probability of obtaining extreme data by chance

Ans. C

Power represents the probability of correctly rejecting the null hypothesis when it is false, indicating the test’s ability to detect a true effect.

What is the term for the graphical representation of data that displays the distribution, central tendency, and spread of a dataset?
1. Scatter plot
2. Box plot
3. Bar chart
4. Frequency table

Ans. B

A box plot (box-and-whisker plot) is a graphical representation that displays the distribution, central tendency, and spread of a dataset.

What is the term for a type of data that can take on only specific values, typically whole numbers, and is often used to represent counts or categories?
1. Continuous data
2. Nominal data
3. Ordinal data
4. Discrete data

Ans. D

Discrete data is a type of data that can take on only specific, often whole number values, and is used to represent counts or categories.

What is the term for a measure of the degree of uncertainty or variability associated with a statistic?
1. Confidence interval
2. Sampling error
3. Type I error
4. Standard error

Ans. D

Standard error is a measure of the degree of uncertainty or variability associated with a statistic and indicates how much the sample statistic might vary from the population parameter.

Which statistical test is used to compare the means of three or more groups in a study?
1. Student’s t-test
2. Chi-squared test
3. Mann-Whitney U test
4. Analysis of Variance (ANOVA)

Ans. D

Analysis of Variance (ANOVA) is used to compare the means of three or more groups in a study to determine if there are statistically significant differences among them.

What does the term “correlation coefficient” measure in statistics?
1. The range of data values
2. The strength and direction of a relationship between two variables
3. The probability of making a Type I error
4. The margin of error in a confidence interval

Ans. B

A correlation coefficient measures the strength and direction of a relationship between two variables, indicating how they are related to each other.

What is the term for a measure of how data values are distributed around a central point in a dataset?
1. Standard deviation
2. Range
3. Interquartile range
4. Mode

Ans. A

Standard deviation is a measure of how data values are distributed around a central point, representing the degree of spread or dispersion in the data.

What is the term for a measure of how much data values tend to deviate from the mean in a dataset?
1. Skewness
2. Variance
3. Standard error

Ans. B

Variance is a measure of how much data values tend to deviate from the mean in a dataset, indicating the degree of variability.

In hypothesis testing, what does the term “alpha level” represent?
1. The probability of making a Type I error
2. The significance level or threshold for rejecting the null hypothesis
3. The p-value
4. The probability of correctly accepting the null hypothesis

Ans. B

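The alpha level is the significance level, the threshold against which the p-value is compared when deciding whether to reject the null hypothesis. It is also the probability of a Type I error that the researcher accepts when the null hypothesis is true, so option A describes the same quantity.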

Which statistical distribution is used to model the time between events occurring at a constant rate?
1. Normal distribution
2. Poisson distribution
3. Exponential distribution
4. Binomial distribution

Ans. C

The exponential distribution is used to model the time between events occurring at a constant rate or in a Poisson process.

What is the term for a method used to reduce the dimensionality of data while retaining as much information as possible?
1. Outlier detection
2. Feature selection
3. Imputation
4. Hypothesis testing

Ans. B

Feature selection is a method used to reduce the dimensionality of data while retaining as much relevant information as possible for analysis.

What does the term “degrees of freedom” refer to in statistics?
1. The number of data points in a dataset
2. The sample size
3. The number of groups in an ANOVA test
4. The number of values that are free to vary in a statistical calculation

Ans. D

Degrees of freedom refer to the number of values that are free to vary in a statistical calculation and play a role in various statistical tests.

What does the term “null hypothesis” represent in hypothesis testing?
1. The hypothesis that is proven to be true
2. The alternative hypothesis
3. The hypothesis to be rejected if evidence suggests otherwise
4. The initial assumption to be tested

Ans. D

The null hypothesis is the initial assumption to be tested in hypothesis testing, typically representing no effect or no difference.

Which statistical test is used to determine if there is a significant difference between the observed and expected frequencies in a contingency table?
1. Student’s t-test
2. Pearson’s chi-squared test
3. Wilcoxon signed-rank test
4. Analysis of Variance (ANOVA)

Ans. B

Pearson’s chi-squared test is used to assess the significant difference between observed and expected frequencies in a contingency table.

What is the term for a type of sampling method in which the population is divided into non-overlapping subgroups or strata, and a random sample is then taken from each stratum?
1. Simple random sampling
2. Cluster sampling
3. Stratified sampling
4. Convenience sampling

Ans. C

Stratified sampling is a method in which the population is divided into non-overlapping strata, and a random sample is taken from each stratum, ensuring representation from all groups.

What is the term for a measure of the central tendency that is often used with nominal data and represents the most frequently occurring value?
1. Mean
2. Median
3. Mode
4. Range

Ans. C

The mode is a measure of central tendency used with nominal data and represents the most frequently occurring value in a dataset.

What does the term “confidence level” represent in a confidence interval?
1. The level of significance in hypothesis testing
2. The likelihood that a sample is representative of the population
3. The long-run proportion of intervals, built by the same method, that would contain the true parameter
4. The proportion of a population captured by a sample

Ans. C

The confidence level of a confidence interval is the long-run proportion of such intervals, constructed by the same method from repeated samples, that would contain the true parameter. The range of values itself is the confidence interval.

What is the term for the measure of how data values are spread out in a dataset?
1. Central tendency
2. Variance
3. Standard error
4. Mode

Ans. B

Variance is the measure of how data values are spread out or dispersed in a dataset, indicating the degree of variability.

What is the term for a statistical measure that describes the direction and strength of a relationship between two variables?
1. Probability
2. Causation
3. Correlation
4. Variance

Ans. C

Correlation is a statistical measure that describes the direction and strength of a relationship between two variables, indicating how they are related.

What is the term for a statistical measure that describes the symmetry of a probability distribution?
1. Skewness
2. Kurtosis
3. Central tendency

Ans. A

Skewness is a statistical measure that describes the symmetry or asymmetry of a probability distribution, indicating whether it is skewed to the left or right.

What does the term “sampling frame” refer to in sampling methods?
1. The process of selecting a sample from a population
2. The list of all elements in the population
3. The margin of error in a confidence interval
4. The probability of making a Type I error

Ans. B

A sampling frame is the list of all elements in the population from which a sample is drawn, serving as the basis for selecting a sample.

What is the term for a measure of how much data values are dispersed around the mean in a dataset?
1. Central tendency
2. Standard error
3. Variance
4. Mode

Ans. C

Variance is a measure of how much data values are dispersed around the mean in a dataset, indicating the degree of spread.

Which statistical test is used to determine if there is a significant difference between two paired groups or conditions when the differences are not normally distributed?
1. Student’s t-test
2. Chi-squared test
3. Analysis of Variance (ANOVA)
4. Wilcoxon signed-rank test

Ans. D

The Wilcoxon signed-rank test is used to determine if there is a significant difference between two paired groups or conditions when the differences are not normally distributed. It works on the ranks of the paired differences rather than on the means; when normality holds, the paired t-test compares the means directly.

What is the term for a statistical measure that describes the degree to which data values are concentrated around the mean?
1. Skewness
2. Variance
3. Kurtosis
4. Dispersion

Ans. C

Kurtosis is a statistical measure often described as the degree to which data values are concentrated around the mean. More precisely, it reflects how heavy the tails of the distribution are relative to a normal distribution.

What is the term for a measure of the spread of data values around the median in a dataset?
1. Variance
2. Interquartile range
3. Mode
4. Range

Ans. B

The interquartile range is a measure of the spread of data values around the median in a dataset, representing the central 50% of the data.

What is the term for a statistical measure that describes the strength and direction of a monotonic (not necessarily linear) relationship between two variables?
1. Pearson’s correlation coefficient
2. Spearman’s rank correlation
3. Chi-squared test
4. ANOVA

Ans. B

Spearman’s rank correlation is a statistical measure that describes the strength and direction of a monotonic relationship between two variables by ranking their values. It captures relationships that consistently increase or decrease even when they are not linear.

What does the term “p-value” represent in hypothesis testing?
1. The probability of making a Type I error
2. The likelihood of observing the data or more extreme data, assuming the null hypothesis is true
3. The strength of a linear relationship between two variables
4. The margin of error in a confidence interval

Ans. B

The p-value represents the likelihood of observing the data or more extreme data, assuming the null hypothesis is true, in hypothesis testing. A smaller p-value indicates stronger evidence against the null hypothesis.

What is the term for a measure of the reliability of a statistical test in detecting a true effect?
1. Type I error
2. Type II error
3. Power
4. Significance level

Ans. C

Power is a measure of the reliability of a statistical test in detecting a true effect, indicating the test’s ability to avoid a Type II error (false negative).

What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?
1. R-squared (R^2)
2. Coefficient of determination
3. Pearson’s correlation coefficient
4. Standard error of the estimate

Ans. A

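R-squared (R^2), also called the coefficient of determination, is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model. Option B names the same quantity, so the two terms are interchangeable.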

What is the term for the measure of the center of a probability distribution in statistics?
1. Dispersion
2. Range
3. Skewness
4. Central tendency

Ans. D

Central tendency is the measure of the center of a probability distribution, indicating the typical or central value of a dataset.

Which statistical test is used to compare the means of two independent groups or conditions?
1. Analysis of Variance (ANOVA)
2. Chi-squared test
3. T-test
4. Mann-Whitney U test

Ans. C

The T-test is used to compare the means of two independent groups or conditions in a statistical analysis.

What is the term for a measure of the extent to which two variables change together in a linear relationship?
1. Regression
2. Correlation
3. Variance
4. Kurtosis

Ans. B

Correlation is a measure of the extent to which two variables change together in a linear relationship, indicating the strength and direction of the relationship.

What is the term for a measure of how data values tend to cluster around the median in a dataset?
1. Standard error
2. Interquartile range
3. Mode
4. Skewness

Ans. B

The interquartile range is a measure of how data values tend to cluster around the median in a dataset, representing the middle 50% of the data.

What is the term for a type of sampling method in which elements are randomly selected from a population, and every element has an equal chance of being selected?
1. Simple random sampling
2. Cluster sampling
3. Stratified sampling
4. Convenience sampling

Ans. A

Simple random sampling is a method in which elements are randomly selected from a population, and every element has an equal chance of being selected, ensuring unbiased representation.

What is the term for a statistical measure that describes the shape of a probability distribution?
1. Skewness
2. Kurtosis
3. Central tendency

Ans. B

Kurtosis is a statistical measure that describes the shape of a probability distribution, in particular how heavy its tails are, which is often summarized as being peaked or flat compared to a normal distribution.

What is the term for the proportion of the total area under a normal distribution curve between two specific values?
1. Confidence level
2. Z-score
3. Area under the curve
4. Percentile

Ans. C

The term “Area under the curve” represents the proportion of the total area under a normal distribution curve between two specific values, indicating the probability of observing data within that range.

What does the term “Type I error” represent in hypothesis testing?
1. Rejecting the null hypothesis when it is true
2. Failing to reject the null hypothesis when it is false
3. The power of the test
4. The probability of making a Type II error

Ans. A

Type I error represents the error of rejecting the null hypothesis when it is true, leading to a false positive conclusion in hypothesis testing.

What is the term for the measure of the strength of a relationship between two variables that varies from -1 to 1, with 0 indicating no linear relationship?
1. Standard deviation
2. Interquartile range
3. Pearson’s correlation coefficient
4. Mode

Ans. C

Pearson’s correlation coefficient is a measure of the strength of a linear relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear relationship.
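A minimal sketch of the calculation, written out by hand so each term is visible. The function name and the sample data are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_products / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]
r_pos = pearson_r(hours, [2, 4, 6, 8, 10])   # rises in step with hours
r_neg = pearson_r(hours, [10, 8, 6, 4, 2])   # falls in step with hours
print(r_pos, r_neg)
```

The first pair is perfectly positively correlated and the second perfectly negatively correlated, so the results sit at the two ends of the range.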

What does the term “confidence interval” represent in statistics?
1. The range of data values in a dataset
2. The likelihood of making a Type I error in hypothesis testing
3. The range of values within which a parameter is estimated to fall with a specified level of confidence
4. The margin of error in a confidence interval

Ans. C

A confidence interval is a range of values, computed from sample data, within which a parameter is estimated to fall with a specified level of confidence, such as 95%.
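The sketch below builds an approximate 95% confidence interval for a mean using the normal approximation. The sample values are invented, and for a sample this small a t-based interval would usually be more appropriate, so treat this only as an outline of the idea.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of measurements
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)

# Critical value for 95% confidence (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

# Margin of error = critical value times the standard error of the mean
margin = z_crit * sample_sd / math.sqrt(len(sample))

lower, upper = sample_mean - margin, sample_mean + margin
print(lower, upper)
```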

Which statistical test is used to determine if there is a significant association between two categorical variables?
1. Student’s t-test
2. Chi-squared test
3. Wilcoxon signed-rank test
4. Analysis of Variance (ANOVA)

Ans. B

The Chi-squared test is used to determine if there is a significant association between two categorical variables, testing the independence of variables in a contingency table.
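To make the idea concrete, the following sketch computes the chi-squared statistic for a 2x2 contingency table by comparing observed counts with the counts expected under independence. The table values are made up for illustration.

```python
# Hypothetical 2x2 contingency table (rows: group, columns: outcome)
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum of (observed - expected)^2 / expected
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(round(chi2, 3))
```

With 1 degree of freedom, the critical value at the 0.05 level is about 3.841, so a statistic well above that would suggest the two variables are associated.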

What is the term for a measure of the average distance of data values from the mean in a dataset?
1. Interquartile range
2. Mode
3. Standard deviation
4. Range

Ans. C

Standard deviation is a measure of the typical distance of data values from the mean in a dataset. It is calculated as the square root of the variance and indicates the degree of dispersion or variability.
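A short sketch with invented data shows the calculation. It also computes the plain mean absolute deviation for comparison, since the standard deviation is based on squared distances and so is not literally the arithmetic average of the distances.

```python
import statistics

# Hypothetical dataset with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)

# Population standard deviation: square root of the mean squared deviation
std_dev = statistics.pstdev(data)

# Mean absolute deviation: plain average of the distances from the mean
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)

print(std_dev, mean_abs_dev)
```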

What is the term for the measure of how data values tend to cluster around the median in a dataset?
1. Range
2. Skewness
3. Interquartile range
4. Mode

Ans. C

The interquartile range (the third quartile minus the first quartile) measures the spread of the middle 50% of the data distribution, so it indicates how tightly values cluster around the median.

What is the term for a statistical measure that quantifies the strength and direction of a relationship between two variables in a linear model?
1. Chi-squared statistic
2. P-value
3. Pearson’s correlation coefficient
4. Standard error

Ans. C

Pearson’s correlation coefficient is a statistical measure that quantifies the strength and direction of a linear relationship between two variables, providing a value between -1 and 1.

What does the term “p-value” represent in hypothesis testing?
1. The probability of observing the data or more extreme data under the null hypothesis
2. The margin of error in a confidence interval
3. The significance level or threshold for rejecting the null hypothesis
4. The strength of a linear relationship between two variables

Ans. A

The p-value represents the probability of observing the data or more extreme data, assuming the null hypothesis is true. It is not the probability of making a Type I error; that probability is set in advance by the significance level (α).

What is the term for a statistical measure that describes the proportion of true positive results out of all actual positive cases in a classification problem?
1. Sensitivity
2. Specificity
3. Precision
4. Accuracy

Ans. A

Sensitivity is a measure that describes the proportion of true positive results out of all actual positive cases in a classification problem, indicating the model’s ability to detect positives correctly.
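In terms of confusion-matrix counts, sensitivity is true positives divided by all actual positives. The counts below are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    # Actual positives = correctly detected positives + missed positives
    return true_positives / (true_positives + false_negatives)

# Hypothetical screening result: 80 cases detected, 20 cases missed
sens = sensitivity(true_positives=80, false_negatives=20)
print(sens)
```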

What is the term for a method used to impute missing values in a dataset by replacing them with estimated values based on other data points?
1. Outlier detection
2. Feature selection
3. Hypothesis testing
4. Imputation

Ans. D

Imputation is a method used to replace missing values in a dataset with estimated values based on other data points, ensuring completeness for analysis.
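One of the simplest versions is mean imputation, sketched below with made-up values where `None` marks a missing entry. Other approaches (median, regression-based, and so on) follow the same pattern with a different estimate.

```python
import statistics

# Hypothetical column with missing entries marked as None
values = [4.0, None, 6.0, 8.0, None]

# Estimate the fill value from the entries that are present
observed = [v for v in values if v is not None]
fill_value = statistics.mean(observed)

# Replace each missing entry with the estimate
imputed = [fill_value if v is None else v for v in values]
print(imputed)
```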

What is the term for a measure of the proportion of the total area under a probability distribution curve to the left of a specific value?
1. Z-score
2. Confidence level
3. Percentile
4. P-value

Ans. C

The percentile of a value is the proportion (usually expressed as a percentage) of the total area under a probability distribution curve that lies to the left of that value, indicating the position of the value within the distribution.
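As an illustration, assume test scores follow a normal distribution with mean 100 and standard deviation 15 (both numbers are assumptions for the example). The cumulative distribution function gives the area to the left of a score, and its inverse turns a percentile back into a score.

```python
from statistics import NormalDist

# Assumed score distribution: mean 100, standard deviation 15
scores = NormalDist(mu=100, sigma=15)

# Area to the left of 115, expressed as a percentile
percentile_of_115 = scores.cdf(115) * 100

# Going the other way: the score that sits at the 50th percentile
median_score = scores.inv_cdf(0.50)

print(round(percentile_of_115, 1), median_score)
```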

What is the term for the probability of correctly rejecting the null hypothesis in hypothesis testing?
1. Type I error
2. Type II error
3. Power
4. Confidence level

Ans. C

Power is the probability of correctly rejecting the null hypothesis in hypothesis testing, indicating the test’s ability to detect a true effect.
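The sketch below estimates power for a one-sided z-test with a known standard deviation. The effect size, standard deviation, and sample sizes are assumptions chosen for illustration, and the helper name `z_test_power` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    std_normal = NormalDist()
    # Rejection threshold for a one-sided test at level alpha
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true effect shifts the test statistic, in standard errors
    shift = effect * math.sqrt(n) / sigma
    # Probability that the statistic exceeds the threshold when the effect is real
    return 1 - std_normal.cdf(z_crit - shift)

power_small = z_test_power(effect=0.5, sigma=1.0, n=25)
power_large = z_test_power(effect=0.5, sigma=1.0, n=100)
print(round(power_small, 3), round(power_large, 3))
```

Increasing the sample size raises the power, because a larger sample makes a true effect easier to detect.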

What is the term for a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values?
1. Variance
2. Range
3. Standard error
4. Interquartile range

Ans. B

The range is a measure of the spread of data values in a dataset, representing the difference between the maximum and minimum values.

Which statistical distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence?
1. Normal distribution
2. Poisson distribution
3. Exponential distribution
4. Binomial distribution

Ans. B

The Poisson distribution is used to model the number of events occurring in a fixed interval of time or space, given an average rate of occurrence, such as the number of emails received per hour.
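The probability of seeing exactly k events when the average rate is λ is λ^k e^(-λ) / k!. A minimal sketch, assuming an average of 3 emails per hour:

```python
import math

def poisson_pmf(k, rate):
    # P(X = k) = rate^k * e^(-rate) / k!
    return rate ** k * math.exp(-rate) / math.factorial(k)

# Probability of exactly 2 emails in an hour when the average is 3
p_two = poisson_pmf(2, 3.0)

# The probabilities over all possible counts should add up to (almost) 1
total = sum(poisson_pmf(k, 3.0) for k in range(50))

print(round(p_two, 4), round(total, 6))
```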

What is the term for the measure of the average value of a set of data points?
1. Median
2. Mode
3. Mean
4. Interquartile range

Ans. C

The mean is the measure of the average value of a set of data points, calculated by summing all values and dividing by the number of data points.

What does the term “significance level” represent in hypothesis testing?
1. The probability of making a Type I error
2. The likelihood that a sample is representative of the population
3. The range of values within which a parameter is estimated to fall
4. The proportion of a population captured by a sample

Ans. A

The significance level in hypothesis testing, typically denoted by alpha (α), represents the probability of making a Type I error, that is, of rejecting the null hypothesis when it is actually true.
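In practice the significance level is chosen before the test and then compared with the p-value. The sketch below does this for a two-sided z-test; the test statistic of 2.1 and α = 0.05 are assumed values for illustration.

```python
from statistics import NormalDist

alpha = 0.05     # significance level chosen in advance
z_stat = 2.1     # hypothetical test statistic from a z-test

# Two-sided p-value: probability of a statistic at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Reject the null hypothesis only if the p-value falls below alpha
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```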

What is the term for a method used to estimate population parameters based on sample data, taking into account sampling variability?
1. Hypothesis testing
2. Confidence interval
3. Regression analysis
4. Bootstrapping

Ans. B

A confidence interval is a method used to estimate population parameters based on sample data, taking into account sampling variability and providing a range of possible values for the parameter.

What is the term for a statistical measure that describes the proportion of true negative results out of all actual negative cases in a classification problem?
1. Sensitivity
2. Specificity
3. Precision
4. Accuracy

Ans. B

Specificity is a measure that describes the proportion of true negative results out of all actual negative cases in a classification problem, indicating the model’s ability to correctly identify negatives.
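A brief sketch that works directly from label lists, where 1 marks a positive case and 0 a negative case. Both lists are invented for the example.

```python
# Hypothetical labels: 1 = positive, 0 = negative
actual    = [0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
predicted = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

# Look only at the cases that are actually negative
true_negatives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)

# Specificity = true negatives / all actual negatives
specificity = true_negatives / (true_negatives + false_positives)
print(true_negatives, false_positives, round(specificity, 3))
```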

What is the term for a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model?
1. R-squared (R^2)
2. Coefficient of variation
3. Pearson’s correlation coefficient
4. Standard error of the estimate

Ans. A

R-squared (R^2) is a measure of the proportion of total variation in a dependent variable explained by independent variables in a regression model, also known as the coefficient of determination.
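R-squared can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. In the sketch below, the observed values and the model's predictions are both assumed for illustration.

```python
# Hypothetical observed values and predictions from a regression model
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]

mean_observed = sum(observed) / len(observed)

# Total variation in the dependent variable
ss_total = sum((y - mean_observed) ** 2 for y in observed)

# Variation left unexplained by the model
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))
```

Here the model explains about 60% of the variation in the observed values.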

