SCATTER DIAGRAM

Representing Paired Data Using a Scatter Diagram

Prep Time (hours) on the x-axis and Marks (20) on the y-axis

As a teacher, I know that the number of hours a student prepares for a test and the score obtained in that test are related.

After correcting a test paper (marked out of 20), I asked the students approximately how many hours they had prepared for the test. Based on their responses, I obtained the following data.

The first step in analysing this data is to represent it graphically. So, I plot the number of hours of preparation on the x-axis and the marks obtained on the y-axis.

This type of graph, where each pair of values is represented by a point, is called a scatter diagram.

Since real-life data is not perfect like theoretical data, the points do not lie exactly on a straight line. However, by observing the scatter diagram, we can study the general pattern or trend present in the data.

DATA Observed Values

Prep Time (hours)    Marks (20)
5                    14
2                    7
7                    17
3                    11
6                    15
1                    6
8                    18
4                    13

Each row is one student. We treat each pair as a single point on the scatter diagram.
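To make the idea of "one pair, one point" concrete, here is a minimal Python sketch that stores the observed values as (hours, marks) pairs and draws a rough character-grid version of the scatter diagram. The function name text_scatter and the grid layout are illustrative choices, not part of the original lesson.

```python
# Observed (prep_hours, marks) pairs from the table above, one per student.
data = [(5, 14), (2, 7), (7, 17), (3, 11), (6, 15), (1, 6), (8, 18), (4, 13)]

def text_scatter(points, x_max=8, y_max=20):
    """Return a rough character-grid scatter diagram as a list of lines."""
    marked = set(points)
    lines = []
    for y in range(y_max, -1, -1):  # top row is the highest mark
        cells = ["x" if (x, y) in marked else "." for x in range(x_max + 1)]
        lines.append(f"{y:2d} | " + " ".join(cells))
    # Bottom line labels the x-axis (prep time in hours).
    lines.append("     " + " ".join(str(x) for x in range(x_max + 1)))
    return lines

print("\n".join(text_scatter(data)))
```

Each "x" in the printed grid is one student: its column is the prep time and its row is the mark.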

GRAPH Scatter Diagram


Scatter diagram: Paired data plotted as points on a graph is called a scatter diagram. In this graph, Prep Time (hours) is on the x-axis and Marks (20) is on the y-axis, so each student gives one point.

TREND

Understanding Trend in a Scatter Diagram

A trend is the overall direction the points appear to follow in a scatter diagram.

When studying a scatter diagram, we do not focus on individual points. Instead, we observe the entire pattern formed by the points.

Even when points are scattered, they may still show a clear direction. This overall direction is called the trend.

The three graphs below show the most common types of trends found in real data.

UPWARD TREND Positive relationship

As the x-values increase, the y-values also tend to increase.

DOWNWARD TREND Negative relationship

As the x-values increase, the y-values tend to decrease, even though the points are scattered.

NO CLEAR TREND No relationship

The points do not follow any clear direction. Knowing x does not help in predicting y.
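One common way to put a number on the direction of a trend is the Pearson correlation coefficient r, which is positive for an upward trend, negative for a downward trend, and close to zero when there is no clear straight-line trend. The lesson itself judges trend by eye, so the sketch below is an added illustration. The helper names and the cut-off of 0.3 are assumptions chosen only for this example, and r captures only straight-line patterns.

```python
import math

# Prep time and marks for the eight students in the lesson's data.
hours = [5, 2, 7, 3, 6, 1, 8, 4]
marks = [14, 7, 17, 11, 15, 6, 18, 13]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe_trend(r, threshold=0.3):
    """Label the trend; the threshold is an arbitrary illustrative cut-off."""
    if r > threshold:
        return "upward trend"
    if r < -threshold:
        return "downward trend"
    return "no clear trend"

r = pearson_r(hours, marks)
print(round(r, 2), describe_trend(r))  # prints: 0.98 upward trend
```

For the test data, r is close to +1, which matches the strong upward pattern seen in the scatter diagram.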

NOISE AND OUTLIERS

Noise and Outliers in a Scatter Diagram

Why real data scatters and why some points stand far away from the pattern

NOISE IN REAL DATA

Consider two students who studied the same number of hours, but still obtained different marks in the exam.

This happens because marks are not affected by study time alone. They are also influenced by factors such as:

  • understanding of the subject
  • memory and recall
  • sleep and physical condition
  • health on the day of the exam
  • exam pressure and stress

In practice, it is very difficult to know how many parameters control a dependent variable. Some are unknown to us, and some are deliberately left out because collecting the data would be too difficult.

Because of this, the marks do not increase perfectly with study time. The points spread out around the main direction instead of lying exactly on a straight line. This spreading of points is called scatter, and it is caused by noise.
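The effect of noise can be imitated with a small simulation. The sketch below assumes a made-up straight-line rule (5 marks plus 1.5 marks per hour) and adds a random amount to stand in for all the unmeasured factors. The rule, the noise size, and the function name simulate_marks are hypothetical and are not taken from the data above. Marks are not clamped to the 0 to 20 range, to keep the sketch short.

```python
import random

def simulate_marks(hours, noise_sd, seed=0):
    """Marks from a hypothetical rule 5 + 1.5 * hours, plus random noise."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [5 + 1.5 * h + rng.gauss(0, noise_sd) for h in hours]

same_hours = [4, 4]  # two students who both studied 4 hours

# Without noise, both students get exactly the same mark.
print(simulate_marks(same_hours, noise_sd=0))

# With noise, the same study time gives different marks.
print(simulate_marks(same_hours, noise_sd=2))
```

With the noise set to zero, every point lies exactly on the line. As the noise grows, the points spread further from it, which is the scatter described above.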

NOISE IN MODELLING

While building a mathematical model, we usually include only a few important variables. Many other factors remain unknown or unmeasured.

These missing parameters introduce variation in the output. This unexplained variation is also called noise.
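One way to see this unexplained variation is to fit a simple model and look at what is left over. The sketch below fits a straight line to the lesson's data by ordinary least squares, a standard technique that the lesson does not itself work through, and prints the residual (observed mark minus predicted mark) for each student. The function name fit_line is an illustrative choice.

```python
# Prep time and marks for the eight students in the lesson's data.
hours = [5, 2, 7, 3, 6, 1, 8, 4]
marks = [14, 7, 17, 11, 15, 6, 18, 13]

def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(hours, marks)
print(f"marks = {intercept:.2f} + {slope:.2f} * hours")  # marks = 4.75 + 1.75 * hours

# Residual = observed mark - mark predicted by the line (the "noise" part).
residuals = [y - (intercept + slope * x) for x, y in zip(hours, marks)]
for x, y, res in zip(hours, marks, residuals):
    print(f"hours={x}  marks={y}  residual={res:+.2f}")
```

The residuals are small and fall on both sides of zero: the model explains the main direction, and the leftover part is what the text calls noise.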

OUTLIER

An outlier is a data point that lies far away from the general pattern or trend of the data.

In a scatter diagram, most points cluster around a main direction. An outlier appears as a point that does not follow this pattern.

An outlier may occur due to:

  • a recording or measurement error
  • an unusual situation affecting the observation
  • a student performing much better or worse than expected for that level of preparation

It is important to identify outliers because they can:

  • distort the trend
  • affect calculations such as averages and regression lines
  • lead to misleading conclusions if not identified

Noise occurs naturally because we have not considered all parameters that affect the dependent variable, while outliers are extreme values that may arise due to errors or unusual conditions.
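The effect of a single outlier on averages and regression lines can be checked numerically. The sketch below adds one hypothetical student, who prepared for 8 hours but scored only 4, to the lesson's data and compares the average mark and the least-squares slope with and without that point. The extra point and the function name fit_line are invented for illustration.

```python
# The eight observed (hours, marks) pairs from the lesson.
clean = [(5, 14), (2, 7), (7, 17), (3, 11), (6, 15), (1, 6), (8, 18), (4, 13)]

# Hypothetical outlier: 8 hours of preparation but only 4 marks.
with_outlier = clean + [(8, 4)]

def fit_line(points):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_marks(points):
    return sum(y for _, y in points) / len(points)

for label, pts in [("without outlier", clean), ("with outlier", with_outlier)]:
    slope, _ = fit_line(pts)
    print(f"{label}: mean marks = {mean_marks(pts):.2f}, slope = {slope:.2f}")
# without outlier: mean marks = 12.62, slope = 1.75
# with outlier: mean marks = 11.67, slope = 0.88
```

One unusual point roughly halves the fitted slope here, which is why outliers should be identified and examined before drawing conclusions.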


Need help with mathematics? Contact SJMathTube.
