DATA SCIENCE FOUNDATION
From Laws and Equations to Data and Prediction
Long ago, many philosophers, scientists, and thinkers observed that the universe does not behave randomly.
They noticed that patterns exist everywhere. Some patterns are easy to observe, while others remain hidden
because of limitations in technology or human understanding.
Examples of patterns:
- Day follows night.
- Seasons repeat year after year.
- Plants grow in stages.
- Human height increases with age and then stabilizes.
- Life begins, grows, weakens, and ends.
These regularities convinced early thinkers that the universe is governed by laws.
Because of this belief, scientists aimed to discover exact rules behind natural phenomena.
They believed that once these rules were discovered, nature could be described using precise mathematical equations.
This belief gave rise to theoretical models.
What is a theoretical model?
A theoretical model is built from first principles. It assumes ideal conditions and aims to describe a phenomenon exactly.
Many foundational scientific laws were developed using this approach.
PHYSICS
In physics, the time period of a simple pendulum is
\[
T = 2\pi\sqrt{\frac{L}{g}}.
\]
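Here T is the period, L is the length of the string, and g is the acceleration due to gravity.
This formula assumes small oscillations, no air resistance, and a massless string that does not stretch.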
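As a quick numerical check, the formula can be evaluated directly. The sketch below is only an illustration: the function name pendulum_period and the default value g = 9.81 m/s^2 are assumptions chosen for the example.

```python
import math

def pendulum_period(length_m, g=9.81):
    """Period in seconds of an ideal simple pendulum: T = 2*pi*sqrt(L/g)."""
    if length_m <= 0 or g <= 0:
        raise ValueError("length and g must be positive")
    return 2 * math.pi * math.sqrt(length_m / g)

# T grows with the square root of L, so a 4x longer string
# gives a 2x longer period.
for length in (0.25, 1.0, 4.0):
    print(f"L = {length:4.2f} m  ->  T = {pendulum_period(length):.3f} s")
```

A pendulum one metre long comes out at roughly two seconds per swing, which is why the formula is easy to test against a stopwatch.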
THERMODYNAMICS
In thermodynamics, the ideal gas law
\[
PV = nRT
\]
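assumes an ideal gas, whose molecules take up no volume and exert no forces on one another. No real gas behaves exactly this way.
Here P is pressure, V is volume, n is the amount of gas in moles, R is the gas constant, and T is absolute temperature.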
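The law can be rearranged to predict any one quantity from the others. The sketch below solves it for pressure in SI units. The function name and the sample values (one mole at 0 degrees Celsius in 22.414 litres) are assumptions chosen for illustration.

```python
R = 8.314462618  # molar gas constant in J/(mol*K)

def ideal_gas_pressure(n_mol, temperature_k, volume_m3):
    """Pressure in pascals predicted by PV = nRT."""
    if temperature_k <= 0 or volume_m3 <= 0:
        raise ValueError("temperature and volume must be positive")
    return n_mol * R * temperature_k / volume_m3

# One mole at 273.15 K in 22.414 litres (1 litre = 1e-3 cubic metres).
pressure = ideal_gas_pressure(1.0, 273.15, 22.414e-3)
print(f"Predicted pressure: {pressure:.0f} Pa")
```

The result is close to one standard atmosphere (101325 Pa). A real gas under the same conditions would differ slightly, because its molecules do interact.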
MECHANICS
In mechanics, Newton’s second law
\[
F = ma
\]
works well for everyday motion, even though it ignores relativistic effects.
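One way to see why ignoring relativity is harmless at everyday speeds is to compute the Lorentz factor
\[
\gamma = \frac{1}{\sqrt{1 - v^2/c^2}},
\]
which measures how far relativistic momentum departs from the Newtonian value mv. The sketch below is an illustration only. The function name and the sample speeds are assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact by definition)

def lorentz_excess(speed_m_s):
    """Return gamma - 1, where gamma = 1 / sqrt(1 - v^2/c^2).

    gamma is computed as exp(-0.5 * log(1 - beta^2)); using log1p and
    expm1 keeps the tiny low-speed result from being lost to rounding.
    """
    beta_sq = (speed_m_s / C) ** 2
    if beta_sq >= 1:
        raise ValueError("speed must be below the speed of light")
    return math.expm1(-0.5 * math.log1p(-beta_sq))

samples = {
    "car at 30 m/s": 30.0,
    "airliner at 250 m/s": 250.0,
    "half the speed of light": 0.5 * C,
}
for label, speed in samples.items():
    print(f"{label:>24}: gamma - 1 = {lorentz_excess(speed):.3e}")
```

For the car and the airliner the correction is far below anything an everyday measurement could detect. At half the speed of light it is about 15 percent, and Newton's law is no longer adequate.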
CHEMISTRY
In chemistry, the first-order rate law
\[
\text{Rate} = k[A]
\]
assumes uniform temperature and no side reactions. Here [A] is the concentration of reactant A and k is the rate constant.
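If A is a reactant that is being consumed, this rate law says that its concentration falls at a rate proportional to its current value. That leads to exponential decay: the concentration at time t is the initial concentration multiplied by e^(-kt). The sketch below compares this closed-form result with a simple step-by-step (Euler) simulation. The rate constant, initial concentration, and step count are hypothetical values chosen for illustration.

```python
import math

def concentration_exact(a0, k, t):
    """Closed-form solution of d[A]/dt = -k[A]."""
    return a0 * math.exp(-k * t)

def concentration_euler(a0, k, t, steps=1000):
    """Approximate the same decay by taking many small time steps."""
    dt = t / steps
    a = a0
    for _ in range(steps):
        a -= k * a * dt  # amount consumed in one step: rate * dt = k[A] * dt
    return a

a0, k = 1.0, 0.3  # hypothetical: 1.0 mol/L initially, k = 0.3 per second
for t in (0.0, 2.0, 5.0):
    exact = concentration_exact(a0, k, t)
    approx = concentration_euler(a0, k, t)
    print(f"t = {t:3.1f} s   exact = {exact:.4f}   euler = {approx:.4f}")

half_life = math.log(2) / k  # time for [A] to fall to half its value
print(f"half-life = {half_life:.2f} s")
```

The two columns agree closely. The half-life does not depend on the starting concentration, which is a hallmark of first-order behavior.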
ECONOMICS
In economics, linear demand models such as
\[
Q = a - bP
\]
assume rational consumers and perfect information. Here Q is the quantity demanded, P is the price, and a and b are positive constants.
PSYCHOLOGY
In psychology, Fechner’s law
\[
S = k\log R
\]
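assumes a predictable relationship between stimulus and perception. Here S is the perceived intensity, R is the physical intensity of the stimulus, and k is a constant.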
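A direct consequence of the logarithm is that multiplying the stimulus by a fixed factor always adds the same amount to the perceived intensity. The sketch below checks this for repeated doubling. The function name, the use of the natural logarithm, k = 1, and the stimulus levels are assumptions chosen for illustration.

```python
import math

def fechner_sensation(stimulus, k=1.0):
    """Perceived intensity S = k * log(R), with R > 0 in arbitrary units."""
    if stimulus <= 0:
        raise ValueError("stimulus must be positive")
    return k * math.log(stimulus)

levels = [1, 2, 4, 8, 16]  # each stimulus is double the previous one
increments = [
    fechner_sensation(b) - fechner_sensation(a)
    for a, b in zip(levels, levels[1:])
]
for level, step in zip(levels[1:], increments):
    print(f"stimulus {level:2d}: sensation rises by {step:.4f}")
```

Every doubling raises the sensation by the same step, k times log 2, even though the physical increase gets larger each time.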
These models are elegant and powerful. However, they rely on assumptions that rarely hold exactly in real life.
While developing theoretical models, scientists are forced to simplify reality. Many parameters are assumed to be zero.
Many effects are ignored. Many interactions are treated as negligible.
Why assumptions are used
This is not because scientists are careless. Without assumptions, the mathematics often becomes too difficult to solve.
Air resistance is ignored. Friction is neglected. Noise is assumed absent. Variables are treated as independent.
As a result, the final equations look clean and beautiful. On paper, everything works perfectly.
But when applied to real-world systems, predictions often deviate from reality.
Human behavior, biological systems, social interactions, and markets are too complex to be captured by a few clean equations.
Small ignored factors accumulate and interact. Pure theory alone becomes insufficient.
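The pendulum gives a concrete case of an ignored factor that starts to matter. The textbook formula assumes small swings. For an otherwise ideal pendulum released from an angle \theta_0, the exact period is
\[
T_{\text{exact}} = \frac{2\pi\sqrt{L/g}}{\operatorname{AGM}(1, \cos(\theta_0/2))},
\]
where AGM is the arithmetic-geometric mean. The sketch below computes how much longer the exact period is than the small-angle prediction. The function names and the sample angles are assumptions chosen for illustration.

```python
import math

def agm(a, b, iterations=30):
    """Arithmetic-geometric mean of two positive numbers.

    Convergence is very fast, so a fixed number of iterations is enough.
    """
    for _ in range(iterations):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def period_ratio(amplitude_deg):
    """Exact period divided by the small-angle period, for 0 <= angle < 180."""
    if not 0 <= amplitude_deg < 180:
        raise ValueError("amplitude must be in [0, 180) degrees")
    theta0 = math.radians(amplitude_deg)
    return 1.0 / agm(1.0, math.cos(theta0 / 2))

for angle in (5, 20, 60, 90):
    excess = (period_ratio(angle) - 1) * 100
    print(f"release angle {angle:2d} deg: period is {excess:5.2f}% longer than the ideal formula")
```

At 5 degrees the error is negligible, but at 60 degrees the ideal formula is off by about 7 percent. Air resistance and a stretching string would add further deviations that this sketch still ignores.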
After the 1970s, computers developed rapidly. Storage became cheaper. Processing power increased.
Data collection became easier. This changed the strategy of scientific modeling.
THE SHIFT
Instead of forcing reality into perfect equations, researchers began asking a practical question:
What if we model relationships approximately using observed data?
This marked the rise of empirical modeling.
Empirical models do not start with equations. They start with data. Observations are collected. Patterns are studied.
Relationships are approximated. The goal is not perfection, but usefulness.
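A minimal sketch of this workflow: start from measurements of a pendulum, assume only a flexible form T = c * L^p, and let least squares on the logarithms choose c and p. The measurements are hypothetical numbers invented for illustration, not real experimental data, and the function name fit_line is an assumption.

```python
import math

# Hypothetical measurements, invented for illustration only.
lengths_m = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
periods_s = [0.91, 1.26, 1.57, 1.78, 2.02, 2.19]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Taking logs turns T = c * L^p into log T = log c + p * log L,
# which is a straight line in (log L, log T).
log_lengths = [math.log(v) for v in lengths_m]
log_periods = [math.log(v) for v in periods_s]
exponent, log_c = fit_line(log_lengths, log_periods)
coefficient = math.exp(log_c)

print(f"fitted model: T = {coefficient:.2f} * L^{exponent:.2f}")
```

The fitted exponent comes out close to 0.5, the square root in the theoretical formula, even though no physics was supplied. The fit is not exact, but it is useful, which is the point of empirical modeling.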
Modern systems that use empirical models:
- Video and social media apps analyze how long users watch videos and what they skip.
- Online stores analyze browsing and purchase history.
- Navigation systems analyze traffic patterns.
- Healthcare systems analyze patient data.
- Financial systems analyze spending and repayment behavior.
In none of these cases is there an exact physical law. Instead, approximate models are learned from data.
This way of thinking leads to regression and predictive modeling.
Key idea
Regression and predictive modeling are used when we want to study how one variable changes with another
and when we want to estimate unknown values using known data. In data science, most models are empirical.
An empirical model is built from observed data only. There is no fixed law or theoretical equation controlling
the relationship. The relationship is learned from data.
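Both uses can be seen in a small regression sketch: the slope describes how one variable changes with another, and the fitted line estimates values that were never observed. The trip records below are hypothetical numbers invented for illustration, and the function names are assumptions.

```python
import math
from statistics import mean

# Hypothetical trip records, invented for illustration only:
# (distance in km, travel time in minutes).
trips = [(2, 7), (4, 11), (6, 16), (8, 19), (10, 25), (12, 28)]

def fit_simple_regression(points):
    """Least-squares slope and intercept for time = intercept + slope * distance."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in points)
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar

slope, intercept = fit_simple_regression(trips)

def predict(distance_km):
    """Estimate travel time for a distance using the fitted line."""
    return intercept + slope * distance_km

# Use 1: how one variable changes with another.
print(f"each extra km adds about {slope:.2f} minutes")

# Use 2: estimate an unknown value from known data.
print(f"estimated time for a 9 km trip: {predict(9):.1f} minutes")

# The model is approximate: measure its typical error on the known data.
rmse = math.sqrt(mean((y - predict(x)) ** 2 for x, y in trips))
print(f"typical error on the observed trips: {rmse:.2f} minutes")
```

No law of traffic was written down here. The line is simply the one that best matches the observed trips, and its error shows how far to trust its estimates.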