Dataset and Null Hypothesis

A dataset is a collection of numbers or values relating to one area of study. The dataset assigns a value to each of its variables, and each individual value is called a datum. A dataset may also contain nominal data. A dataset is usually obtained by sampling a population. In statistics, a population is the complete group of objects or individuals about which information is sought, and a sample is the part of that group that is actually observed.

Example: The test scores of the students in a particular class.

The different types of datasets are discussed below.

1] Numerical data sets: It is also known as quantitative data as it is usually expressed in terms of numbers. Arithmetic operations can be performed on numerical datasets.

2] Bivariate data sets: A dataset with 2 variables is called a bivariate dataset. It explains the relationship between the 2 variables.

3] Multivariate data sets: A dataset with three or more variables is called a multivariate dataset. Each observation records a value for every one of these variables.

4] Categorical data sets: A dataset that represents the attributes or characteristics of an object is called a categorical dataset. It contains qualitative variables, whose values are categories rather than numbers. A qualitative variable that can take exactly 2 values is called a dichotomous variable.

5] Correlation data sets: In correlation data sets, the values exhibit some relationship with each other.
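The types listed above can be illustrated with small Python structures. This is only a sketch: every value and variable name below is hypothetical and chosen for illustration.

```python
from collections import Counter

# Numerical: values are numbers, so arithmetic is meaningful.
heights_cm = [150.0, 162.5, 171.0]
average_height = sum(heights_cm) / len(heights_cm)

# Bivariate: each observation pairs 2 variables (hours studied, test score).
hours_and_score = [(1, 52), (2, 60), (3, 71)]

# Multivariate: each observation records three or more variables.
students = [
    {"age": 15, "hours": 2, "score": 60},
    {"age": 16, "hours": 3, "score": 71},
]

# Categorical: values are labels, so we count them instead of averaging.
blood_groups = ["A", "O", "B", "O"]
counts = Counter(blood_groups)

# Dichotomous: a categorical variable with exactly 2 possible values.
passed = [True, False, True]
```

Counting is the natural summary for the categorical lists, while the numerical list supports an average.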

Measures of central tendency for datasets

A statistic that denotes the center point of a dataset is called a measure of central tendency. Mean, median and mode are the most frequently used measures of central tendency.

1] Arithmetic mean: It is also known as the arithmetic average. It is found by adding up all the observations and dividing the sum by the total number of observations. It incorporates every observation in the data. The mean identifies the location of the center accurately when the distribution is symmetric. In a skewed distribution, extreme values pull the mean away from the center toward the tail, and the more skewed the distribution is, the farther the mean shifts.

2] Median: The median is the middlemost value of the data. It divides the data into 2 equal parts. To find the median, first arrange the dataset from the smallest to the largest number. If the number of observations is odd, the median is the middle value. If the number of observations is even, the median is the mean of the two middle values. The median is less affected than the mean when the data is skewed or contains outliers.

3] Mode: The most frequently occurring value in the dataset is called the mode. A dataset can have more than one mode. The mode is used for discrete, ordinal and categorical data.
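The three measures above can be computed with Python's standard `statistics` module. The following is a minimal sketch on a hypothetical list of test scores; the values are made up for illustration.

```python
import statistics

# Hypothetical test scores for one class (illustrative values only).
scores = [55, 62, 62, 70, 71, 75, 98]

mean_score = statistics.mean(scores)      # sum of observations / number of observations
median_score = statistics.median(scores)  # middle value of the sorted data (odd count)
mode_score = statistics.mode(scores)      # most frequently occurring value

# With an even number of observations, the median is the mean of the two middle values.
even_median = statistics.median([1, 3, 5, 7])

# Replace the largest score with an extreme outlier and recompute.
with_outlier = scores[:-1] + [500]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print(mean_score, median_score, mode_score)
print(mean_with_outlier, median_with_outlier)
```

In this example the outlier moves the mean substantially while the median stays the same, which is why the median is often preferred for skewed data.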

Properties of a dataset

1] The description of spread among the different members of the data.

2] The center of the data.

3] The amount of data skewness.

4] Details about the probability distribution of the dataset.

5] The relationship between the variables in the dataset (correlation).

6] The number of outliers that exist in the dataset.
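Several of these properties can be estimated directly from the data. The sketch below uses hypothetical values and makes two assumptions that are not part of the text above: outliers are flagged with the common 1.5 × IQR rule, and skewness is judged only by comparing the mean with the median.

```python
import math
import statistics

# Hypothetical paired observations; one point lies far from the rest.
x = [1, 2, 3, 4, 5, 6, 7, 40]
y = [2, 4, 5, 4, 5, 7, 8, 30]

center = statistics.median(x)   # center of the data
spread = statistics.stdev(x)    # spread (sample standard deviation)

# Rough skewness indicator: a mean above the median suggests a right skew.
right_skewed = statistics.mean(x) > center

# Outliers by the 1.5 * IQR rule (an assumed convention, not the only one).
q1, _, q3 = statistics.quantiles(x, n=4)
iqr = q3 - q1
outliers = [v for v in x if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

r = pearson(x, y)  # relationship between the two variables
```

A single extreme point can dominate the standard deviation and the correlation coefficient, so it is usually worth checking for outliers before interpreting the other properties.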

Applications of datasets

a] They are used to demonstrate an example of Poisson regression.

b] They are useful in exhibiting paired t-tests, mixed between-within ANOVA and ANOVA with repeated measures.

c] They are mainly used for analysis of variance (ANOVA).
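As an illustration of one of these applications, the paired t-test statistic can be computed by hand from a small dataset. This is a sketch only: the before and after scores are hypothetical, and the code stops at the t statistic and degrees of freedom without looking up a p-value.

```python
import math
import statistics

# Hypothetical scores for the same five students before and after a course.
before = [72, 65, 80, 58, 90]
after = [75, 70, 82, 60, 95]

# A paired test works on the per-student differences.
differences = [a - b for a, b in zip(after, before)]
n = len(differences)

mean_diff = statistics.mean(differences)  # average change
sd_diff = statistics.stdev(differences)   # sample standard deviation of the changes

# t = mean difference / standard error of the mean difference
t_statistic = mean_diff / (sd_diff / math.sqrt(n))
degrees_of_freedom = n - 1

print(t_statistic, degrees_of_freedom)
```

The resulting t statistic would then be compared against a t distribution with n - 1 degrees of freedom.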

Datasets have many more applications; the ones mentioned above are only a few examples. Related concepts in statistics include mean deviation, quartile deviation, the correlation coefficient, the null hypothesis and the coefficient of determination.
