IB DP Maths AI SL Study Notes

4.1.1 Measures of Central Tendency

Mean

Definition and Formula

The mean, commonly known as the average, is one of the most frequently used measures of the central tendency of a dataset. It is calculated by adding all the values in the dataset and dividing by the number of values. Mathematically, it is expressed as:

Mean = (Sum of all values) / (Number of values)
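
The formula can be written as a short Python function. This is a minimal sketch: the name `mean_of` is our own choice, and the empty-list check is an added assumption (the mean of no values is undefined). Applied to the ages in Example 1, it reproduces the hand calculation.

```python
def mean_of(values):
    """Return the arithmetic mean: sum of all values / number of values."""
    if len(values) == 0:
        # The mean of an empty dataset is undefined (it would divide by zero).
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


ages = [21, 22, 23, 24, 25]  # the ages from Example 1
print(mean_of(ages))  # 115 / 5 -> 23.0
```

Python's standard library also offers `statistics.mean`, which returns the same value for this data.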

Detailed Explanation

The mean is most representative when the data are fairly evenly spread and contain no outliers (unusually high or low values). Because the mean takes every value in the dataset into account, it is highly sensitive to outliers. For instance, when calculating the average income in a community, a few extremely high incomes would raise the mean, giving a misleading picture of the typical income level.

Example 1: Calculating the Mean

Imagine a dataset representing the ages of a group of people: 21, 22, 23, 24, and 25.

To find the mean age:

  • Add all the ages together: 21 + 22 + 23 + 24 + 25 = 115
  • Divide by the number of ages: 115 / 5 = 23

Thus, the mean age is 23.

Considerations and Limitations

  • Influence of Outliers: The mean is susceptible to being influenced by outliers. A single, significantly higher or lower value can drastically affect the mean.
  • Applicability: The mean is most applicable to ratio and interval data, where the distances between points are meaningful and consistent.

Median

Definition and Formula

The median is the middle value in a dataset when the numbers are arranged in order (either ascending or descending). If there is an even number of observations, the median will be the average of the two middle numbers.

Detailed Explanation

The median often gives a more representative picture of a dataset’s centre when the data contain outliers, because it is largely unaffected by extremely large or small values. It divides the data into two halves, with half of the values at or below the median and half at or above it. This makes the median a better indicator than the mean for skewed distributions.

Example 2: Finding the Median

Consider a dataset of seven numbers: 11, 15, 16, 21, 23, 25, 26.

Since the numbers are already in ascending order, the median is the middle number, which is 21.

Example 3: Median with Even Data Points

Now consider eight numbers: 11, 15, 16, 21, 23, 25, 26, 28.

The median is the average of the two middle numbers (the 4th and 5th values), i.e., (21 + 23) / 2 = 22.
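
The ordering-then-middle procedure used in Examples 2 and 3 can be sketched in Python as follows. The function name `median_of` is hypothetical; the standard library's `statistics.median` follows the same odd/even rule.

```python
def median_of(values):
    """Return the median: the middle value of the sorted data, or the
    average of the two middle values when the count is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # arrange in ascending order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value sits at index n // 2.
        return ordered[mid]
    # Even count: average the two middle values (indices mid - 1 and mid).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([11, 15, 16, 21, 23, 25, 26]))      # Example 2 -> 21
print(median_of([11, 15, 16, 21, 23, 25, 26, 28]))  # Example 3 -> 22.0
```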

Considerations

  • Data Type: The median can be used for ordinal, interval, and ratio data.
  • Resistant to Extreme Values: Making the largest value larger, or the smallest value smaller, does not change the median, so it gives a more representative measure of the dataset’s centre when outliers are present.
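
To see the difference between the mean and the median when an outlier appears, the sketch below takes the data from Example 2 and, as a purely hypothetical change, replaces the largest value 26 with 260. It uses `mean` and `median` from Python's `statistics` module.

```python
from statistics import mean, median

data = [11, 15, 16, 21, 23, 25, 26]          # Example 2
with_outlier = [11, 15, 16, 21, 23, 25, 260]  # hypothetical: 26 becomes 260

print(f"mean = {mean(data):.2f}, median = {median(data)}")
# mean = 19.57, median = 21
print(f"mean = {mean(with_outlier):.2f}, median = {median(with_outlier)}")
# mean = 53.00, median = 21
```

The median stays at 21 because the middle value of the ordered data is still 21, while the mean jumps from about 19.57 to 53.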

Mode

Definition

The mode is the value that appears most frequently in a data set. A data set may have one mode, more than one mode, or no mode at all.

Detailed Explanation

The mode is particularly useful for identifying the most common item in a dataset, and it is the only measure of central tendency that can be used with nominal data (categories rather than numbers, such as favourite colours). A dataset in which no value repeats has no mode, and a dataset in which two values share the highest frequency is called bimodal.

Example 4: Identifying the Mode

Consider a dataset of numbers: 3, 7, 7, 11, and 15.

The mode of this dataset is 7, as it appears more frequently than the other numbers.

Example 5: Bimodal Data

Now consider the numbers: 3, 7, 7, 11, 11, and 15.

This dataset is bimodal since two values (7 and 11) appear most frequently.
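
Finding the mode amounts to counting how often each value occurs. The sketch below uses `collections.Counter`; the function name `modes_of` is hypothetical, and it encodes the convention used in these notes that a dataset in which every value occurs equally often has no mode (returned as an empty list). The colour labels in the last line are made-up nominal data.

```python
from collections import Counter


def modes_of(values):
    """Return a sorted list of the mode(s), or [] if there is no mode.

    Convention assumed here: if every distinct value occurs equally often
    (including when no value repeats), there is no mode.
    """
    counts = Counter(values)  # value -> how many times it occurs
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # no value stands out, so there is no mode
    # Sorting gives a predictable order; values must be mutually comparable.
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([3, 7, 7, 11, 15]))       # Example 4 -> [7]
print(modes_of([3, 7, 7, 11, 11, 15]))   # Example 5 -> [7, 11] (bimodal)
print(modes_of([3, 7, 11, 15]))          # no value repeats -> []
print(modes_of(["red", "blue", "red"]))  # nominal data -> ['red']
```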

Considerations

  • Data Type: The mode can be used with nominal, ordinal, interval, and ratio data.
  • Usefulness: The mode is especially useful when it is important to know which is the most common item in the dataset.

Applications in Real-World Scenarios

Example 6: Analysing Test Scores

Suppose a maths teacher analyses the test scores of her class and finds the following scores: 65, 85, 89, 90, 92, 92, 93, 95, and 100.

  • Mean: (65 + 85 + 89 + 90 + 92 + 92 + 93 + 95 + 100) / 9 = 801 / 9 = 89
  • Median: The middle (5th of 9) score is 92.
  • Mode: The score 92 appears most frequently (twice).

The teacher notices that the median and mode are both 92 and the mean of 89 is only slightly lower, so most scores are tightly clustered around 90. The single low score of 65 pulls the mean a little below the median, a mild sign of negative skew.
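
The three measures in Example 6 can be checked with Python's built-in `statistics` module. Since Python 3.8, `statistics.mode` returns the first mode it encounters instead of raising an error on ties, so it is only safe here because these scores have a single mode.

```python
from statistics import mean, median, mode

scores = [65, 85, 89, 90, 92, 92, 93, 95, 100]  # Example 6

print(f"mean   = {mean(scores):.2f}")  # 801 / 9 -> 89.00
print(f"median = {median(scores)}")    # 5th of 9 sorted scores -> 92
print(f"mode   = {mode(scores)}")      # 92 occurs twice -> 92
```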

Example 7: Analysing Salaries

A small IT company analyses the salaries (in thousands) of its employees: 40, 45, 45, 50, 52, 54, 60, 65, and 70.

  • Mean: (40 + 45 + 45 + 50 + 52 + 54 + 60 + 65 + 70) / 9 = 481 / 9 ≈ 53.44
  • Median: The middle (5th of 9) salary is 52.
  • Mode: The salary 45 appears most frequently.

The HR manager observes that the mean is slightly greater than the median, indicating that a few higher salaries are pulling the average up, but not drastically; salaries are spread fairly evenly across the company.
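
The comparison in Example 7 can be made explicit by computing the difference between the mean and the median. A positive difference means the mean lies above the median; reading this as a hint of positive skew is a rule of thumb, not a formal test.

```python
from statistics import mean, median

salaries = [40, 45, 45, 50, 52, 54, 60, 65, 70]  # Example 7, in thousands

m, md = mean(salaries), median(salaries)
# A positive difference means the mean sits above the median.
print(f"mean = {m:.2f}, median = {md}, mean - median = {m - md:.2f}")
# mean = 53.44, median = 52, mean - median = 1.44
```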

FAQ

Can a dataset have no mode?

Yes. A dataset has no mode when no value repeats, or when every value repeats with the same frequency. This is common with continuous data, where exact repeats are rare, and with small or highly varied datasets. In practical terms, no single value is more common than the others, which often reflects a high level of variability or diversity within the data.

Can a dataset have more than one mode, and how is this interpreted?

Yes. A dataset can have more than one mode, and the modes correspond to peaks in the data distribution. When two values are tied as the most frequent, the dataset is bimodal; when three values are tied, it is trimodal. In general, a dataset with two or more modes is called multimodal. Multiple modes can indicate distinct groups within the data. For example, a bimodal distribution of test scores might suggest two groups of students, one that performed well and another that performed poorly, each with its own mode.

How does the mode help describe the shape of a distribution?

The mode shows where the data are most concentrated. A dataset with one mode (unimodal) has a single prominent peak. Two modes (bimodal) indicate two prominent peaks, which might mean the data come from two different groups, and more than two modes (multimodal) might suggest several groups. The mode can also hint at symmetry: in a perfectly symmetrical unimodal distribution, the mode equals the mean and the median. Identifying the mode(s) therefore helps describe the general tendency of the dataset and the characteristics of the populations from which the data were drawn.

Why is the mean a poor measure of central tendency for a skewed distribution?

The mean is heavily influenced by outliers and extreme values. In a positively skewed distribution, a few notably high values drag the mean upwards, so it is larger than the median and no longer reflects the central point of the data. In a negatively skewed distribution, a few notably low values pull the mean downwards in the same way. Using the mean to generalise about such data can therefore be misleading, as it may not represent the majority of the observations.

Why is the median resistant to outliers and skewed data?

The median depends on the order of the values rather than on their size. After the data are arranged in ascending or descending order, only the middle value (or the two middle values) determines the median. Making the largest value even larger, or the smallest value even smaller, therefore leaves the median unchanged, whereas the mean would shift. This makes the median a robust measure of central tendency for data with outliers or skew, giving a more accurate reflection of the dataset’s centre.

Practice Questions

The ages of a group of 10 friends are: 21, 22, 23, 24, 25, 26, 27, 28, 29, and 30 years. Calculate the mean, median, and mode of their ages.

The mean is found by adding all the ages together and dividing by the number of ages, so the mean age is (21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30) / 10 = 255 / 10 = 25.5 years. The median is the middle value when the ages are arranged in order. Since there are 10 ages (an even number), the median is the average of the 5th and 6th values, which are 25 and 26, so the median age is (25 + 26) / 2 = 25.5 years. The mode is the value that appears most frequently; here no age appears more than once, so there is no mode.

A maths teacher has recorded the scores of 7 students in a test as follows: 15, 20, 15, 18, 20, 15, and 22. Calculate the mean, median, and mode of these scores.

To find the mean score, we add all the scores together and divide by the number of scores. Thus, the mean score is (15 + 20 + 15 + 18 + 20 + 15 + 22) / 7 = 125 / 7 = 17.86 (rounded to two decimal places). To find the median, we arrange the scores in ascending order: 15, 15, 15, 18, 20, 20, 22. The median score, being the middle score, is 18. The mode, being the most frequently occurring score, is 15 since it appears three times, more frequently than the other scores. This dataset is unimodal as it has one mode.
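
Both practice answers can be checked with Python's `statistics` module. One caution: `statistics.multimode` returns every value when no value repeats, so it lists all ten ages for the first question; the notes' "no mode" conclusion has to be applied by hand.

```python
from statistics import mean, median, multimode

ages = [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]  # first question
scores = [15, 20, 15, 18, 20, 15, 22]            # second question

# First question: even count, so the median averages 25 and 26.
print(mean(ages), median(ages))  # 25.5 25.5

# Second question: mean to 2 d.p., median, and the list of modes.
print(round(mean(scores), 2), median(scores), multimode(scores))  # 17.86 18 [15]

# Every age occurs once, so multimode returns all of them: no mode here.
print(len(multimode(ages)))  # 10
```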
