## Calculating Basic Statistics

Whenever we are exploring a new dataset, the very first thing to do is calculate some basic statistics: number of observations, mean or average, minimum, maximum, median and standard deviation.

This helps us get an overview of our data quickly.

We will illustrate this with a dataset containing the heights (in cm) of a sample of 18-year-old males. In this case, the measured height of each student is one value, or observation.

Our first statistic, the **number of observations or sample size**, is easy to get but important: a common rule of thumb is to have at least 30 observations before deciding which statistical test to apply later.
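As a minimal sketch, assuming Python and a small list of made-up heights (the variable names and values are hypothetical, not the dataset described above), the sample size is simply the length of the list:

```python
# Hypothetical heights in cm (made-up values, for illustration only)
heights = [172.5, 168.0, 181.2, 175.4, 169.8, 178.1, 174.0, 171.3]

# Number of observations (sample size)
n = len(heights)
print("Sample size:", n)

# Check the sample against the 30-observation rule of thumb
meets_rule_of_thumb = n >= 30
print("At least 30 observations?", meets_rule_of_thumb)
```

With only eight values, this toy sample falls short of the threshold of 30.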

The **mean or average** is the sum of all observations divided by the number of observations. In many contexts it gives a good estimate of a typical value in our data.
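The definition translates directly into code. The sketch below assumes Python and uses made-up heights; it computes the mean by hand and compares it with `statistics.mean` from the standard library:

```python
import statistics

# Hypothetical heights in cm (made-up values, for illustration only)
heights = [172.5, 168.0, 181.2, 175.4, 169.8, 178.1, 174.0, 171.3]

# Mean by definition: sum of all observations divided by their count
mean_manual = sum(heights) / len(heights)

# The same statistic using the standard library
mean_library = statistics.mean(heights)

print(f"Mean (manual):  {mean_manual:.2f} cm")
print(f"Mean (library): {mean_library:.2f} cm")
```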

The **minimum and maximum** are useful for determining the range of the data, that is, the interval between the smallest and largest values in our dataset. They can be calculated with the **min()** and **max()** functions in a similar way.
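The text does not say which language it has in mind, so the sketch below assumes Python, where `min()` and `max()` are built-in functions. The heights are made-up values:

```python
# Hypothetical heights in cm (made-up values, for illustration only)
heights = [172.5, 168.0, 181.2, 175.4, 169.8, 178.1, 174.0, 171.3]

smallest = min(heights)  # minimum observation
largest = max(heights)   # maximum observation

# Width of the range: distance between the largest and smallest values
range_width = largest - smallest

print(f"Minimum: {smallest} cm")
print(f"Maximum: {largest} cm")
print(f"Range width: {range_width:.1f} cm")
```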

The **median** is the value that is right in the middle. That is, if we order the observations from smallest to largest, the median is the value such that 50% of the values are above it and 50% are below it.

Sometimes, we will find that the median and the mean are very similar. This can be an indication that there is some symmetry in our data: for every large observation there is also a small observation, in similar proportions. When the median and the mean differ noticeably, this points to a certain skew in our data, perhaps caused by outliers or unusual observations.
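To see how an outlier pulls the mean away from the median, here is a small sketch assuming Python's `statistics` module. Both the heights and the extreme value of 210 cm are made up for illustration:

```python
import statistics

# Hypothetical heights in cm (made-up values, for illustration only)
heights = [172.5, 168.0, 181.2, 175.4, 169.8, 178.1, 174.0, 171.3]

# Same sample plus one unusually tall (hypothetical) observation
with_outlier = heights + [210.0]

mean_before = statistics.mean(heights)
median_before = statistics.median(heights)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# How far each statistic moved after adding the outlier
mean_shift = abs(mean_after - mean_before)
median_shift = abs(median_after - median_before)

print(f"Without outlier: mean={mean_before:.2f}, median={median_before:.2f}")
print(f"With outlier:    mean={mean_after:.2f}, median={median_after:.2f}")
print(f"Mean moved by {mean_shift:.2f} cm, median by {median_shift:.2f} cm")
```

In this toy sample the mean and median start out close together, and the single extreme value moves the mean much more than the median.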

The **standard deviation** measures the spread of the data, that is, how far the values in our dataset are from the mean on average. The larger the standard deviation, the bigger the spread. A small standard deviation suggests that all the observations are similar to each other and to the average.
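The following sketch assumes Python and made-up heights. It computes the sample standard deviation, which divides by n - 1. That choice is an assumption, since the text does not say whether the sample or the population formula is intended. The result is checked against `statistics.stdev`:

```python
import math
import statistics

# Hypothetical heights in cm (made-up values, for illustration only)
heights = [172.5, 168.0, 181.2, 175.4, 169.8, 178.1, 174.0, 171.3]

n = len(heights)
mean = sum(heights) / n

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1 (assumed convention)
sd_manual = math.sqrt(sum((x - mean) ** 2 for x in heights) / (n - 1))

# The same statistic using the standard library
sd_library = statistics.stdev(heights)

# A second made-up sample whose values are much closer together
tight_heights = [173.0, 174.0, 173.5, 174.5]
sd_tight = statistics.stdev(tight_heights)

print(f"Standard deviation (manual):  {sd_manual:.2f} cm")
print(f"Standard deviation (library): {sd_library:.2f} cm")
print(f"Standard deviation (tight sample): {sd_tight:.2f} cm")
```

The tightly clustered sample has the smaller standard deviation, matching the idea that a small value means the observations are similar to each other.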