Measure Theoretic Probability explained


In this article, we explain, in undergraduate-level mathematical language, how the concept of Probability can be formulated with the aid of Measure Theory from Real Analysis. It is intended for readers with some mathematical and statistical background, but please feel free to continue reading if you are interested in this topic.

Measure Theory in Probability

So, the first question that might be asked is: why do mathematicians incorporate Measure Theory into Probability? There are two main reasons to do so:

1. The existence of non-measurable sets

It can be shown mathematically that, under the usual definition of a measure (one that agrees with length on intervals and is translation-invariant) and assuming the axiom of choice, there exist subsets of uncountable sets in the Reals (say, [0, 1]) that are non-measurable. Interested readers can refer to [1] for a concrete method to construct such a set. The existence of these non-measurable sets implies that it is sometimes impossible to assign a 'proper' probability to every subset of the sample space we want.

2. False Dichotomy of Discrete and Continuous Random Variables

It is, in some ways, a false dichotomy to classify every random variable as either a 'Discrete Random Variable' or a 'Continuous Random Variable'. In fact, we can easily construct a random variable that is neither discrete nor continuous in the sense we learnt in elementary Statistics or Probability classes.

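For example, let X be a random variable with law N(0, 1), and let Y be another random variable with P(Y = 1) = P(Y = 0) = 1/2. Define a third random variable Z that, according to an independent fair coin flip, equals X with probability 1/2 and Y with probability 1/2. Then Z is neither discrete nor continuous: it takes each of the values 0 and 1 with probability 1/4, so it has no density, yet the remaining half of its probability is spread continuously over the real line.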
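To make the example concrete, here is a minimal sketch that evaluates the distribution function of Z, assuming Z is chosen to equal X or Y by an independent fair coin (that reading, and the function names below, are my own illustrative choices). A jump in the distribution function at a point z has size P(Z = z), so jumps at 0 and 1 together with a smooth increase elsewhere show that Z is of mixed type.

```python
import math

def normal_cdf(z: float) -> float:
    """Distribution function of X ~ N(0, 1), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bernoulli_cdf(z: float) -> float:
    """Distribution function of Y, where P(Y = 0) = P(Y = 1) = 1/2."""
    if z < 0:
        return 0.0
    if z < 1:
        return 0.5
    return 1.0

def mixture_cdf(z: float) -> float:
    """Distribution function of Z, assuming an independent fair coin
    decides whether Z equals X or Y."""
    return 0.5 * normal_cdf(z) + 0.5 * bernoulli_cdf(z)

def jump_at(z: float, eps: float = 1e-9) -> float:
    """Numerical approximation of the jump at z, i.e. of P(Z = z)."""
    return mixture_cdf(z) - mixture_cdf(z - eps)

for point in (0.0, 0.5, 1.0):
    print(point, round(jump_at(point), 6))
```

The jumps at 0 and 1 come out at about 1/4 each, while the jump at 0.5 is numerically zero, so neither the 'discrete' nor the 'continuous' formulas from an elementary course describe Z on their own.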

However, since elementary courses develop separate theories and formulas for discrete and continuous random variables, it is hard for us to deal with this type of random variable without Measure Theory.

What's next?

Now, I hope you are convinced that Measure Theory can help us develop a rigorous theory of Probability by filling the gaps in our elementary theories. In the next section, we outline the basic concepts of Probability Triples. We will skip some of the technical parts of their construction; again, we refer interested readers to [1] for the details.

Probability Triples explained

A Probability Triple is defined to be a triple consisting of

  1. a sample space (usually denoted by 'omega'),
  2. a sigma-algebra, which consists of all 'measurable' subsets of our sample space, and finally
  3. a Probability measure, which assigns to each element of our sigma-algebra a number in [0, 1].

1. Sample space

It is literally the non-empty set from which we draw samples. Sometimes the sample space is not even stated explicitly. For example, if we wish to construct a uniform distribution on [0, 1], it is pretty obvious that [0, 1], as a subset of the Reals, will be our sample space.

2. Sigma-algebra (Sigma-field)

We will not dig into the finer layers of how subsets of the sample space are categorised. There are indeed many notions to study (e.g. semi-algebra, algebra, etc.) if we have to, say, construct from scratch a probability triple on [0, 1] whose sigma-algebra contains all intervals. Here we will instead focus on the concept of a sigma-algebra, which is defined as a collection of subsets of the sample space that includes the empty set and the whole sample space, and is closed under complements, countable unions and (thus) countable intersections.
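On a finite sample space the definition can be checked mechanically. The following is a minimal sketch under that assumption: with only finitely many subsets available, closure under pairwise unions already gives closure under countable unions. The function name and the two example collections are illustrative only.

```python
from itertools import combinations

def is_sigma_algebra(omega, collection):
    """Check the sigma-algebra axioms for a collection of subsets of a
    FINITE sample space omega (finiteness is an assumption of this sketch)."""
    omega = frozenset(omega)
    sets = {frozenset(s) for s in collection}
    # Must contain the empty set and the whole sample space.
    if frozenset() not in sets or omega not in sets:
        return False
    # Closed under complements.
    if any((omega - a) not in sets for a in sets):
        return False
    # Closed under pairwise unions, hence under all unions on a finite space.
    return all((a | b) in sets for a, b in combinations(sets, 2))

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]   # a valid sigma-algebra
broken = [set(), {1}, {2}, omega]         # the complement of {1} is missing

print(is_sigma_algebra(omega, coarse))  # True
print(is_sigma_algebra(omega, broken))  # False
```

Closure under intersections is not tested separately because it follows from complements and unions by De Morgan's laws, which is the '(thus)' in the definition.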

The sigma-algebra can be thought of as the collection of all 'measurable sets' to which we wish to assign a probability; as mentioned, it might not include every subset of the sample space.

There are many possible sigma-algebras on a given sample space. For the sample space [0, 1], two common sigma-algebras to consider are

  • the Borel sigma-algebra, the smallest sigma-algebra containing all open subsets of [0, 1], and
  • the sigma-algebra constructed on all open intervals by the famous Extension Theorem in Measure Theory.

They can differ in aspects such as completeness of the measure and cardinality, but we need not worry too much about this in our context.

3. Probability Measure

We define it as a function P mapping elements of our sigma-algebra to [0, 1]. It essentially assigns a 'probability' to each measurable subset of the sample space. There are several important restrictions on P that are worth mentioning, though they may seem trivial to some of you:

  • the probability of the empty set is zero and that of the whole sample space is one, and
  • P is countably additive: the probability of a countable union of pairwise disjoint measurable sets is the sum of their probabilities.

There are also other important properties of P that can be derived from these restrictions, such as countable sub-additivity and the principle of inclusion-exclusion, which we leave for you to explore.
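As a small sanity check rather than a proof, the sketch below takes a fair six-sided die as the sample space, all subsets as the sigma-algebra, and the uniform measure as P (all three are my own illustrative choices). It then verifies the two restrictions and two of the derived properties by brute force over every pair of events.

```python
from fractions import Fraction
from itertools import chain, combinations

# Illustrative sample space: the six faces of a die.
OMEGA = frozenset(range(1, 7))

def power_set(s):
    """All subsets of s; on a finite sample space this is a valid sigma-algebra."""
    items = sorted(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(c) for c in subsets]

def prob(event):
    """Uniform probability measure P(A) = |A| / |OMEGA|, kept exact with Fraction."""
    return Fraction(len(event), len(OMEGA))

EVENTS = power_set(OMEGA)
PAIRS = [(a, b) for a in EVENTS for b in EVENTS]

# The two defining restrictions (additivity is checked pairwise, which is
# enough on a finite space).
normalised = prob(frozenset()) == 0 and prob(OMEGA) == 1
additive = all(prob(a | b) == prob(a) + prob(b) for a, b in PAIRS if not (a & b))

# Two derived properties, for two events at a time.
sub_additive = all(prob(a | b) <= prob(a) + prob(b) for a, b in PAIRS)
inclusion_exclusion = all(
    prob(a | b) == prob(a) + prob(b) - prob(a & b) for a, b in PAIRS
)

print(normalised, additive, sub_additive, inclusion_exclusion)
```

Exact fractions are used instead of floats so that the equalities are tested without rounding error.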

Random Variables as functions

With Probability Triples defined, we are now able to define our familiar notion of random variables.

Given an underlying probability triple, a (real-valued) random variable X is defined to be a map from the sample space to the Reals that is 'measurable'. Here, a map X is measurable if, for every real number x, the subset {w in the sample space: X(w) <= x} is measurable, i.e. an element of the sigma-algebra we have. (Note that the measurability of a set and the measurability of a function are two distinct concepts, though they are closely related.) It is important to note that, on any given Probability Triple, it is possible to define many different random variables.
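The measurability condition is easiest to see on a finite sample space with a deliberately coarse sigma-algebra. In the sketch below the condition is read as: for every real x, the set of outcomes w with X(w) <= x belongs to the sigma-algebra. On a finite space it is enough to test x at the values X actually takes, provided the collection passed in really is a sigma-algebra; the helper name and the example maps are hypothetical.

```python
def is_measurable(omega, sigma_algebra, X):
    """Check that {w in omega : X(w) <= x} lies in sigma_algebra for every
    threshold x. Assumes omega is finite and sigma_algebra is a genuine
    sigma-algebra, so only the values taken by X need to be tested."""
    sets = {frozenset(s) for s in sigma_algebra}
    thresholds = sorted({X(w) for w in omega})
    return all(
        frozenset(w for w in omega if X(w) <= x) in sets
        for x in thresholds
    )

omega = {1, 2, 3, 4}
# A coarse sigma-algebra that cannot tell 1 from 2, or 3 from 4.
coarse = [set(), {1, 2}, {3, 4}, omega]

def indicator_high(w):
    """Equals 0 on {1, 2} and 1 on {3, 4}."""
    return 0 if w <= 2 else 1

def identity(w):
    """Needs to separate every outcome from the others."""
    return w

print(is_measurable(omega, coarse, indicator_high))  # True
print(is_measurable(omega, coarse, identity))        # False
```

The identity map fails because {w : w <= 1} = {1} is not in the coarse sigma-algebra: the same function can be a random variable on one probability triple and not on another.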

You can view random variables as functions, but you should be careful not to confuse this function with another very commonly used function in probability theory, the distribution (or law), which is essentially the push-forward measure induced by our random variable [2]. Don't panic if you are confused by these notions - we will have another article explaining pdfs and push-forwards later.
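For readers who prefer symbols, the push-forward can be written down in one line. The notation here is my own assumption, not taken from [2]: the probability triple is written (Ω, F, P), the law of X is written μ_X, and B(R) denotes the Borel sigma-algebra on the Reals.

```latex
\mu_X(B) \;=\; P\bigl(X^{-1}(B)\bigr)
        \;=\; P\bigl(\{\omega \in \Omega : X(\omega) \in B\}\bigr),
\qquad B \in \mathcal{B}(\mathbb{R}).
% Taking B = (-\infty, x] recovers the familiar distribution function:
F_X(x) \;=\; \mu_X\bigl((-\infty, x]\bigr) \;=\; P(X \le x).
```

So X is a function on the sample space, whereas its law is a probability measure on the Reals; measurability of X is what guarantees that the sets on the right-hand side have a probability at all.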


By now you should have grasped the basic idea of viewing probability through the lens of Measure Theory. We have briefly introduced why Measure Theory is needed, and the concepts of Probability Triples and Random Variables.

It has been my intention not to make things rigorous here as I always believe interested readers should consult textbooks and fill in the details themselves.

In the next article, I will delve into the concept of Expectation, and we will derive a unified definition of the expectation of a general random variable without the need to consider the discrete and continuous cases separately.


[1] Rosenthal, J. S. (2016). A first look at rigorous probability theory. World Scientific.

[2] Peyré, G., & Cuturi, M. (2019). Computational optimal transport: With applications to data science. Now Publishers.


