Statistics is the study of collecting, organizing, analyzing, interpreting and presenting data. It involves methods to gather information from datasets, summarize findings using measures like averages or percentages, and draw conclusions or make predictions based on that information. This chapter covers measures of central tendency (arithmetic mean, median and mode), the empirical formula that relates them, and graphical representations such as bar graphs, histograms, pie charts, frequency polygons and ogives.
Data refers to information or facts that are collected, observed or recorded. It can be in the form of numbers, words, measurements, or observations about people, events, things, or phenomena.
Raw data refers to the original, unprocessed information collected directly from observations, measurements, or recordings.
Frequency refers to the count or number of times a particular value, category, or event occurs in a dataset. The frequency of data is represented by f.
For example, the letter ‘s’ appears three times in the word ‘statistics’, so the frequency of ‘s’ is 3.
Cumulative frequency is the running total of frequencies. The cumulative frequency of a value is the sum of the frequencies of that value and of all the values before it in the distribution. As each frequency is added, the cumulative frequency increases.
For example: The shoe sizes of ten students of Class IX are 6, 8, 9, 7, 6, 7, 9, 6, 10, 8.
Cumulative frequency distribution table:
Shoe Size | Frequency | Cumulative Frequency |
6 | 3 | 3 |
7 | 2 | 3 + 2 = 5 |
8 | 2 | 5 + 2 = 7 |
9 | 2 | 7 + 2 = 9 |
10 | 1 | 9 + 1 = 10 |
Notice that the final cumulative frequency always equals the total number of observations, because by that point every frequency has been added into the running total.
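The shoe-size table can be reproduced with a short Python sketch. It is only an illustration of the counting and running-total steps; the variable names are chosen here and are not part of the original material.

```python
from collections import Counter
from itertools import accumulate

# Shoe sizes of the ten students from the example.
shoe_sizes = [6, 8, 9, 7, 6, 7, 9, 6, 10, 8]

# Count how often each size occurs, then order the table by size.
table = sorted(Counter(shoe_sizes).items())
sizes = [size for size, _ in table]
frequencies = [count for _, count in table]

# A running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))

for size, f, cf in zip(sizes, frequencies, cumulative):
    print(size, f, cf)
```

The last cumulative frequency equals the number of observations, which is a quick way to check a table built by hand.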
Grouped data refers to a method of organising a large set of numerical data into intervals or ranges, rather than listing individual values. This grouping allows for easier analysis and presentation of data when dealing with a wide range of values.
Class intervals are the ranges or divisions into which the data is grouped. They are created by grouping data values into categories or intervals of equal width or size.
The lower limit is the smallest value or the starting point of a class interval. It defines the lowest value included in a particular interval.
The upper limit is the largest value or the endpoint of a class interval. It defines the highest value included in that interval.
For example, if a class interval in a grouped frequency distribution is 25 - 35, the lower limit is 25 and the upper limit is 35.
The class mark of a class interval is the middle value within that interval. It is calculated as the average of the lower and upper limits of the interval.
Class Mark = (Lower limit + Upper limit) / 2
For example, what is the class mark of the class 10 − 20?
Lower limit = 10
Upper limit = 20
Class mark = (Lower limit + Upper limit) / 2
= (10 + 20) / 2
= 30 / 2
= 15
Measures of central tendency are single values that represent the centre of a dataset. The most common statistical averages are the arithmetic mean (or simply the mean), the median and the mode.
The arithmetic mean is the arithmetic average of all the values in a dataset. It's calculated by adding up all the values and dividing by the number of values.
The mean of n observations is
Mean = (Sum of all observations) / (Total no. of observations)
Example: The weights (in kg) of 5 students are 45.5, 52, 55, 65 and 49.5. What is the arithmetic mean of their weights?
a) 53.8 kg
b) 53.6 kg
c) 53.2 kg
d) 53.4 kg
Answer: d) 53.4 kg
Explanation: The arithmetic mean is the arithmetic average of all the values in a dataset.
Mean = (Sum of all observations) / (Total number of observations)
= (45.5 + 52 + 55 + 65 + 49.5) / 5
= 267 / 5
= 53.4 kg
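The same calculation can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative only.

```python
# Weights (in kg) of the five students from the example.
weights = [45.5, 52, 55, 65, 49.5]

# Mean = (sum of all observations) / (total number of observations)
total = sum(weights)
mean_weight = total / len(weights)

print(total, mean_weight)
```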
The arithmetic mean for a given discrete frequency distribution can be obtained by using one of the three methods:
(i) Direct method
The formula for finding the mean by the direct method is
Mean = Σf_{i}x_{i} / Σf_{i}
Where
x is the variate.
f is the frequency.
Σf_{i}x_{i} is the sum of the product of each x and its frequency f.
Σf_{i} is the total of all frequencies.
i varies from 1 to n
(ii) Assumed Mean method
The formula for finding the mean by the assumed mean method is
Mean = A + (Σf_{i}d_{i} / Σf_{i})
Where
x is the variate.
f is the frequency.
A is the assumed mean.
deviation (d_{i}) = x_{i} − A
Σf_{i}d_{i} is the sum of the product of each deviation d and its frequency f.
Σf_{i} is the total of all frequencies.
i varies from 1 to n
Example: The weight of 40 students of a class is given below:
Weight (in kg) | 55 | 57 | 59 | 61 | 63 |
No. of students | 8 | 11 | 9 | 7 | 5 |
What is the mean weight of the students using the assumed mean method?
a) 58.5 kg
b) 58.25 kg
c) 58.75 kg
d) 58 kg
Answer: a) 58.5 kg
Explanation: Steps for finding the mean using the assumed mean method are:
a. Create a four-column frequency table.
(i) Enter the variate (x_{i}) values in the first column from the left.
(ii) Record the frequency (f_{i}) of each variate in the second column from the left.
b. Select a number 'A', ideally one of the variate (x_{i}) values in the first column. 'A' is called the assumed mean.
To obtain the deviation d_{i}, subtract the assumed mean 'A' from each value of the variate x_{i} in the first column.
Thus, deviation (d_{i}) = x_{i} − A
Record each deviation in the third column, against the corresponding variate and frequency.
c. To obtain the values of f_{i}d_{i}, multiply the frequency (f_{i}) in the second column by the matching deviation (d_{i}) in the third column.
Record the values of f_{i}d_{i} in the fourth column, against the corresponding deviations d_{i}.
d. Determine Σf_{i}d_{i}, the total of all the values of f_{i}d_{i} in the fourth column.
Also determine Σf_{i} = n, the sum of all the frequencies f_{i}.
e. The following formula gives the required mean using the assumed mean method:
Mean = A + (Σf_{i}d_{i} / Σf_{i})
Let assumed mean (A) = 59
Thus,
Weight (in kg) | No. of students (f_{i}) | d_{i} = x_{i} − A | f_{i}d_{i} |
55 | 8 | −4 | −32 |
57 | 11 | −2 | −22 |
59 | 9 | 0 | 0 |
61 | 7 | 2 | 14 |
63 | 5 | 4 | 20 |
Total | Σf_{i} = 40 | | Σf_{i}d_{i} = −20 |
Mean = A + (Σf_{i}d_{i} / Σf_{i})
= 59 + (−20) / 40
= 59 − 0.5
= 58.5 kg
(iii) Step-deviation method
The following formula gives the required mean using the step-deviation method:
Mean = A + (Σf_{i}t_{i} / Σf_{i}) × h
Where
x is the variate.
f is the frequency.
A is the assumed mean.
t_{i} = (x_{i} − A) / h
Σf_{i}t_{i} is the sum of the product of each t and its frequency f.
Σf_{i} is the total of all frequencies.
h is the largest number that divides every deviation (x_{i} − A) exactly.
i varies from 1 to n.
(i) Direct Method
Steps:
The formula for finding the mean by the direct method is
Mean = Σf_{i}x_{i} / Σf_{i}
Where
x is the variate.
f is the frequency.
Σf_{i}x_{i} is the sum of the product of each x and its frequency f.
Σf_{i} is the total of all frequencies.
i varies from 1 to n.
(ii) Assumed Mean Method
Steps:
The formula for finding the mean by the assumed mean method is
Mean = A + (Σf_{i}d_{i} / Σf_{i})
Where
x is the variate.
f is the frequency.
A is the assumed mean.
deviation (d_{i}) = x_{i} − A
Σf_{i}d_{i} is the sum of the product of each deviation d and its frequency f.
Σf_{i} is the total of all frequencies.
Any number can be taken as the assumed mean but to make the calculations easier, it should be taken from the middle of the values of x.
(iii) Step-deviation method
According to this method,
Mean = A + (Σf_{i}t_{i} / Σf_{i}) × h
Where
A is the assumed mean.
t_{i} = (x_{i} − A) / h
h = class size
i varies from 1 to n
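As a rough check, the three methods can be applied to the weight table from the assumed mean example (weights 55 to 63 kg, 40 students). The sketch below is illustrative only: A = 59 follows the worked example, while h = 2 is an assumption made here, taken as the common gap between the listed weights.

```python
# Variate x_i (weight in kg) and frequency f_i (number of students).
weights = [55, 57, 59, 61, 63]
students = [8, 11, 9, 7, 5]
n = sum(students)  # total of all frequencies

# (i) Direct method: Mean = sum(f_i * x_i) / sum(f_i)
direct_mean = sum(f * x for x, f in zip(weights, students)) / n

# (ii) Assumed mean method: Mean = A + sum(f_i * d_i) / sum(f_i), d_i = x_i - A
A = 59  # assumed mean, as in the worked example
assumed_mean = A + sum(f * (x - A) for x, f in zip(weights, students)) / n

# (iii) Step-deviation method: Mean = A + (sum(f_i * t_i) / sum(f_i)) * h,
# with t_i = (x_i - A) / h
h = 2  # assumption: common gap between the weights
step_mean = A + sum(f * (x - A) / h for x, f in zip(weights, students)) / n * h

print(direct_mean, assumed_mean, step_mean)
```

All three methods give the same mean; the assumed mean and step-deviation methods only make the arithmetic smaller when it is done by hand.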
When data is arranged in an order, the middle value is known as the median. The data can be arranged in ascending or descending order.
Let there be n terms and they are arranged in ascending or descending order.
(i) If n is odd, then median = [(n + 1) / 2]th term
(ii) If n is even, there are two middle terms, that is, the (n / 2)th term and the (n / 2 + 1)th term.
Thus, the median is the arithmetic mean of these two terms.
Example: What is the median of 11, 7, 14, 22, 9, 5 and 12?
a) 5
b) 12
c) 11
d) 9
Answer: c) 11
Explanation: We know that the median is the middle value in a dataset when arranged in ascending or descending order.
Arrange the given terms in ascending order:
5, 7, 9, 11, 12, 14, 22
Since the number of values is odd, median = [(n + 1) / 2]th term
Where
n is the total number of terms.
n = 7
Thus, median = [(7 + 1) / 2]th term
= (8 / 2)th term
= 4th term
Thus, median = 11
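The odd and even rules for the median can be sketched as a small Python function. The function name and the second (even-sized) dataset are illustrative choices made here, not part of the original example.

```python
def median(values):
    """Median of raw data using the (n + 1) / 2 and n / 2 rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the [(n + 1) / 2]th term (lists are 0-based, so subtract 1).
        return ordered[(n + 1) // 2 - 1]
    # Even n: mean of the (n / 2)th and (n / 2 + 1)th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


odd_case = median([11, 7, 14, 22, 9, 5, 12])
even_case = median([5, 8, 9, 11, 12, 15, 17, 22])
print(odd_case, even_case)
```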
Example: The ages of 35 children in a society are given below.
Age (in years) | 12 | 13 | 14 | 15 | 16 |
No. of Children | 9 | 10 | 5 | 4 | 7 |
What is the median age?
a) 15 years
b) 16 years
c) 14 years
d) 13 years
Answer: d) 13 years
Explanation: Construct the cumulative frequency table
Age (x) | No. of children (f) | Cumulative frequency (cf) |
12 | 9 | 9 |
13 | 10 | 19 |
14 | 5 | 24 |
15 | 4 | 28 |
16 | 7 | 35 |
Total number of children = 35
i.e. n = 35, which is odd.
Thus, median = [(n + 1) / 2]th term
= (36 / 2)th term
= 18th term
= age of the 18th child
According to the table above, every child from the 10th to the 19th is 13 years old.
Age of the 18th child = 13 years
Median age = 13 years
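Locating the median term through cumulative frequencies can be sketched in Python as follows. The helper name term_at is hypothetical and used only for this illustration.

```python
from itertools import accumulate

ages = [12, 13, 14, 15, 16]
children = [9, 10, 5, 4, 7]

cumulative = list(accumulate(children))  # 9, 19, 24, 28, 35
n = cumulative[-1]                       # total number of children


def term_at(position):
    """Value of the term at a 1-based position in the ordered data."""
    for age, cf in zip(ages, cumulative):
        if position <= cf:
            return age
    raise ValueError("position is larger than the number of observations")


if n % 2 == 1:
    median_age = term_at((n + 1) // 2)
else:
    median_age = (term_at(n // 2) + term_at(n // 2 + 1)) / 2

print(median_age)
```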
For grouped data, the median is given by
Median = l + [(n / 2 − cf) / f] × h
Where
l = lower limit of median class
h = class size
n = number of observations
f = frequency of median class
cf = cumulative frequency of class preceding the median class
To find the median class, find the cumulative frequencies of all the classes and the value of n / 2. Now, locate the class whose cumulative frequency is greater than (and nearest to) n / 2. This is called the median class.
Example: A survey regarding the heights (in cm) of 45 girls of Class XII of a school was conducted and the following data was obtained:
Height (in cm) | No. of girls |
Below 145 | 3 |
Less than 150 | 7 |
Less than 155 | 19 |
Less than 160 | 30 |
Less than 165 | 37 |
Less than 170 | 45 |
What is the median?
a) 156.591 cm
b) 155.591 cm
c) 156.581 cm
d) 155.581 cm
Answer: a) 156.591 cm
Explanation: The frequency distribution table with the given cumulative frequencies becomes:
Class Interval | Frequency | Cumulative frequency |
Below 145 | 3 | 3 |
145 - 150 | 4 | 7 |
150 - 155 | 12 | 19 |
155 - 160 | 11 | 30 |
160 - 165 | 7 | 37 |
165 - 170 | 8 | 45 |
We know that
Median = l + [(n / 2 − cf) / f] × h
Here, n = 45
→ n / 2 = 45/2
= 22.5
This observation lies in the class interval 155 - 160.
→ l (lower limit) = 155
→ h (class size) = 5
→ f (frequency of the median class) = 11
→ cf (cumulative frequency of the preceding class, i.e. 150 - 155) = 19
→ Median = 155 + [(22.5 − 19) / 11] × 5
= 155 + (3.5 / 11) × 5
= 155 + 17.5 / 11
= 155 + 1.591
= 156.591 cm
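The grouped-median calculation, Median = l + [(n / 2 − cf) / f] × h, can be sketched in Python starting from the "less than" cumulative frequencies given in the question. The variable names are illustrative, and the sketch assumes every class has the same size h = 5, as in the explanation.

```python
# Upper limits of the classes and their "less than" cumulative frequencies.
upper_limits = [145, 150, 155, 160, 165, 170]
cumulative = [3, 7, 19, 30, 37, 45]
h = 5  # class size (assumed equal for all classes)

n = cumulative[-1]
half = n / 2

# Median class: the first class whose cumulative frequency exceeds n / 2.
index = next(i for i, cf in enumerate(cumulative) if cf > half)

l = upper_limits[index] - h                         # lower limit of the median class
cf_before = cumulative[index - 1] if index > 0 else 0
f = cumulative[index] - cf_before                   # frequency of the median class

median_height = l + (half - cf_before) / f * h
print(round(median_height, 3))
```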
The mode is the value that appears most frequently in a set of data. It's the number that occurs most often.
Example: What is the mode of the data: 2, 3, 6, 4, 3, 2, 3, 4, 3, 6, 2, 7, 3?
a) 2
b) 3
c) 7
d) 6
Answer: b) 3
Explanation: We know that the mode is the value or values that appear most frequently in a dataset.
In the given dataset, 3 appears the maximum number of times, that is, 5 times.
Thus, mode = 3
Example: Consider the given frequency distribution:
Number | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
Frequency | 11 | 5 | 13 | 7 | 17 | 10 | 8 | 6 |
What is the mode?
a) 7
b) 11
c) 9
d) 12
Answer: b) 11
Explanation: The mode is the value or values that appear most frequently in a dataset.
From the given data, the frequency of the number 11 is maximum.
Thus, mode = 11
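Both mode examples can be checked with a short Python sketch: one counts raw values, the other reads the largest frequency from a table. The names used are illustrative only.

```python
from collections import Counter

# Raw data: count each value and keep those with the highest count.
data = [2, 3, 6, 4, 3, 2, 3, 4, 3, 6, 2, 7, 3]
counts = Counter(data)
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)

# Frequency distribution: the mode is the number with the largest frequency.
distribution = {7: 11, 8: 5, 9: 13, 10: 7, 11: 17, 12: 10, 13: 8, 14: 6}
table_mode = max(distribution, key=distribution.get)

print(modes, highest, table_mode)
```

Collecting the modes in a list allows for datasets in which more than one value shares the highest frequency.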
For grouped data, the mode is given by
Mode = l + [(f_{1} − f_{0}) / (2f_{1} − f_{0} − f_{2})] × h
Where
l = lower limit of modal class
h = size of the class interval
f_{1} = frequency of the modal class
f_{0} = frequency of the class preceding the modal class
f_{2} = frequency of the class succeeding the modal class
Modal class is the class with the maximum frequency.
Example: The data for the number of family members in a household in a locality is given below:
Family Size | 1 - 3 | 3 - 5 | 5 - 7 | 7 - 9 | 9 - 11 | 11 - 13 |
No. of families | 8 | 5 | 10 | 3 | 2 | 2 |
What is the mode of this data?
a) 5.933
b) 5.667
c) 5.833
d) 5.733
Answer: c) 5.833
Explanation: Here the maximum class frequency is 10.
Thus, the modal class is 5 - 7.
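We know that
Mode = l + [(f_{1} − f_{0}) / (2f_{1} − f_{0} − f_{2})] × h
Where
l (lower limit) = 5
h (class size) = 2
f_{1} (frequency of the modal class) = 10
f_{0} (frequency of the class preceding the modal class) = 5
f_{2} (frequency of the class succeeding the modal class) = 3
→ Mode = 5 + [(10 − 5) / (2 × 10 − 5 − 3)] × 2
= 5 + (5 / 12) × 2
= 5 + 0.833
= 5.833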
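The family-size example can be checked with a short Python sketch. It assumes the usual grouped-data formula, Mode = l + [(f_1 − f_0) / (2f_1 − f_0 − f_2)] × h, which reproduces the stated answer of 5.833. Treating a missing neighbouring class as frequency 0 is a convention adopted here for the sketch.

```python
# Lower limits of the classes 1 - 3, 3 - 5, ..., 11 - 13 and their frequencies.
lower_limits = [1, 3, 5, 7, 9, 11]
frequencies = [8, 5, 10, 3, 2, 2]
h = 2  # class size

# Modal class: the class with the maximum frequency.
i = frequencies.index(max(frequencies))

f1 = frequencies[i]                                          # modal class
f0 = frequencies[i - 1] if i > 0 else 0                      # preceding class
f2 = frequencies[i + 1] if i < len(frequencies) - 1 else 0   # succeeding class

mode = lower_limits[i] + (f1 - f0) / (2 * f1 - f0 - f2) * h
print(round(mode, 3))
```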
There is an empirical relationship between the three measures of central tendency:
3 Median = Mode + 2 Mean
Example: What is the value of median if the values of mean and mode are 24 and 35 respectively?
a) 28.67
b) 27.67
c) 27.33
d) 28.33
Answer: b) 27.67
Explanation: We are given Mode = 35 and mean = 24
We know that 3 Median = Mode + 2 Mean
→ 3 Median = 35 + 2 (24)
= 35 + 48
= 83
→ Median = 83 / 3
= 27.67
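The empirical relationship can be rearranged to find any one measure from the other two. A minimal Python sketch, with function names chosen here for illustration:

```python
def median_from(mean, mode):
    """Rearranged from 3 Median = Mode + 2 Mean."""
    return (mode + 2 * mean) / 3


def mode_from(mean, median):
    """Rearranged from 3 Median = Mode + 2 Mean."""
    return 3 * median - 2 * mean


estimated_median = median_from(mean=24, mode=35)
print(round(estimated_median, 2))
```

Because the relationship is empirical, values obtained this way are estimates rather than exact results.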
Quartiles divide a dataset into four equal parts or quarters. There are three quartiles - Q_{1}, Q_{2} (also the median), and Q_{3} - representing specific points in a dataset when arranged in ascending order.
When the lower half, before the median, is divided into two equal parts, the value of the dividing variate is called the lower quartile (Q_{1}).
Let n terms be arranged in ascending order,
→ If n is even, then Q_{1} = (n / 4)th term.
→ If n is odd, then Q_{1} = [(n + 1) / 4]th term.
When the upper half, after the median, is divided into two equal parts, the value of the dividing variate is called the upper quartile (Q_{3}).
Let n terms be arranged in ascending order,
→ If n is even, then Q_{3} = (3n / 4)th term.
→ If n is odd, then Q_{3} = [3(n + 1) / 4]th term.
The inter-quartile range is the difference between the third quartile (Q_{3}) and the first quartile (Q_{1}).
Inter-Quartile Range = Q_{3} − Q_{1}
→ Since Q_{3} ≥ Q_{1}, the inter-quartile range is never negative.
Example: What is the interquartile range for the data: 12, 5, 8, 17, 22, 15, 9, 11?
a) 9
b) 6
c) 7
d) 8
Answer: c) 7
Explanation: Inter-Quartile Range is the difference between the third quartile (Q_{3}) and the first quartile (Q_{1}).
Arrange the given data in ascending order.
5, 8, 9, 11, 12, 15, 17, 22
Thus, n = 8, which is an even number.
If n is even, then Q_{1} = (n / 4)th term.
→ Q_{1} = (8 / 4)th term
= 2nd term
= 8
If n is even, then Q_{3} = (3n / 4)th term.
→ Q_{3} = (3(8) / 4)th term
= 6th term
= 15
Inter-Quartile Range = Q_{3} − Q_{1}
= 15 − 8
= 7
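The quartile rules used in this chapter can be sketched in Python as follows. The sketch follows the term-position formulas given above, which may differ from the quartile conventions used by statistical software. The text does not say what to do when a position is not a whole number, so the hypothetical helper term_at simply rejects that case.

```python
def term_at(ordered, position):
    """Return the term at a 1-based position in already ordered data."""
    if position != int(position):
        # Not covered by the rules in this chapter.
        raise ValueError("position is not a whole number")
    return ordered[int(position) - 1]


def quartiles(values):
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 0:
        q1_position, q3_position = n / 4, 3 * n / 4
    else:
        q1_position, q3_position = (n + 1) / 4, 3 * (n + 1) / 4
    return term_at(ordered, q1_position), term_at(ordered, q3_position)


q1, q3 = quartiles([12, 5, 8, 17, 22, 15, 9, 11])
inter_quartile_range = q3 - q1
print(q1, q3, inter_quartile_range)
```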
Graphical representation of data refers to the visual depiction of data using graphs, charts, diagrams or other visual tools. Common types of graphical representations include bar graphs, histograms, line graphs and pie charts.
A histogram is a graphical representation of the distribution of numerical data, presented as a series of adjacent rectangles or bars. It displays the frequency of data within specified intervals along a continuous range.
In a histogram:
→ The horizontal axis represents the numerical range or intervals of the data.
→ The vertical axis shows the frequency of data points falling within each interval.
→ Bars are drawn adjacent to each other with widths representing the intervals and heights indicating the frequency of values within those intervals.
→ The bars have no gaps between them, as they represent continuous data ranges.
The histogram representing the salary distribution of employees of ABC Corporation is shown below:
A frequency polygon is a graph that represents the frequency distribution of a dataset. It is created by connecting the midpoints of the tops of the bars in a histogram or the plotted points of a frequency table using straight-line segments.
In a frequency polygon:
→ The horizontal axis typically represents the variable being measured (such as values or intervals).
→ The vertical axis represents the frequency.
→ Points are plotted above the midpoint of each interval or value in the frequency distribution.
→ These points are connected by straight line segments to form a polygonal line, emphasising the pattern in the data's frequency distribution.
The frequency polygon representing the engine size of cars is shown below:
An ogive, also known as a cumulative frequency curve, is a graphical representation that displays the cumulative frequencies of a dataset.
In an ogive:
→ The horizontal axis represents the variable being measured (values or intervals).
→ The vertical axis represents the cumulative frequency.
→ Points are plotted and connected to form a curve or line, indicating the cumulative total of frequencies up to that point.
→ The curve gradually rises, showing the increasing cumulative frequency as values progress.
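Before drawing a frequency polygon or an ogive, the points to plot have to be worked out. The sketch below does this in Python for the family-size distribution used in the mode example. It assumes a "less than" ogive, where each cumulative frequency is plotted against the upper limit of its class; the variable names are illustrative.

```python
from itertools import accumulate

# Class intervals (lower limit, upper limit) and their frequencies.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11), (11, 13)]
frequencies = [8, 5, 10, 3, 2, 2]

# Frequency polygon: (class mark, frequency) for each class.
polygon_points = [
    ((lower + upper) / 2, f) for (lower, upper), f in zip(classes, frequencies)
]

# "Less than" ogive: (upper limit, cumulative frequency) for each class.
ogive_points = [
    (upper, cf) for (_, upper), cf in zip(classes, accumulate(frequencies))
]

print(polygon_points)
print(ogive_points)
```

Joining the polygon points with straight line segments gives the frequency polygon, and joining the ogive points gives the cumulative frequency curve.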
The ogive of age of people attending library reading is shown below: