For the layperson, it's probably most helpful to think of the density function \(f\) associated to a random variable \(X\) as the function you integrate to compute probabilities. Similarly, the expected value \(E(X)\) is thought of as the average value that \(X\) takes. Often \(E(X)\) is defined already in terms of the density function, and it's not clear to the beginner why \(\int_{\mathbb R} x f(x) \ dx\) should compute the expected value of \(X\). What should be perhaps a bit more obvious is that if you integrate \(X\) over the entire probability space, with respect to the given probability measure, then you obtain the average value of \(X\). This is indeed how the expectation of \(X\) is typically defined in a more analytical setting.

Given an arbitrary measure space \((\Omega, \mathcal B, P)\), a measurable space \((\Omega', \mathcal B')\), and a \((\mathcal B, \mathcal B')\)-measurable map \(X \colon \Omega \to \Omega'\), the pushforward \(P_X(E):=P(X^{-1} (E))\) is a measure on \(\Omega'\). It is easy to verify that if \(f \colon \Omega' \to \mathbb R\) is any measurable function for which either of the following integrals exists, then so does the other and \(\int_{\Omega'} f \ dP_X = \int_{\Omega} (f \circ X) \ dP\).
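On a finite probability space both integrals reduce to finite sums, so the change-of-variables identity can be checked by hand. The following is a minimal sketch under assumed, purely illustrative choices: a fair six-sided die as \(\Omega\), the map `X` sending a roll to its remainder modulo 3, and the test function `f(y) = y * y`. None of these come from the text above.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical finite probability space: a fair six-sided die.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def X(w):
    # Illustrative measurable map: remainder of the roll modulo 3.
    return w % 3

def f(y):
    # Illustrative function on the target space.
    return y * y

# Pushforward P_X(E) = P(X^{-1}(E)), computed on singletons of the target space.
P_X = defaultdict(Fraction)
for w in omega:
    P_X[X(w)] += P[w]

# Integral of f against the pushforward measure P_X.
lhs = sum(f(y) * mass for y, mass in P_X.items())

# Integral of f composed with X against the original measure P.
rhs = sum(f(X(w)) * P[w] for w in omega)

print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared with equality rather than up to rounding error.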

Now let us restrict ourselves to the context of a probability space \((\Omega, \mathcal B, P)\). A random variable \(X \colon \Omega \to \mathbb R\) is a \((\mathcal B, \mathcal B')\)-measurable function, where \(\mathcal B'\) denotes the Borel subsets of \(\mathbb R\). The expectation \(E(X)\) of \(X\) is defined to be \(\int_{\Omega} X \ dP\), whenever this integral exists. In light of the proposition above, with \(1_{\mathbb R} \circ X\) replacing \(f \circ X\) (where \(1_{\mathbb R}\) is the identity function on \(\mathbb R\), not an indicator function), we have

\[\begin{equation*} E(X) = \int_{\Omega} X \ dP= \int_{\Omega} 1_{\mathbb R} \circ X \ dP =\int_{\mathbb R} 1_{\mathbb R} \ d P_X= \int_{\mathbb R} x \ d P_X. \end{equation*}\]

The pushforward measure \(P_X\) gives rise to, and is determined by, a function called the distribution function of \(X\), defined by \(F(t):=P_X( (-\infty, t]) = P(X \leq t)\). When \(P_X\) is absolutely continuous with respect to the usual Lebesgue measure \(\lambda\) on \(\mathbb R\), the density function \(f\) of \(X\) is taken to be the Radon-Nikodym derivative \(d P_X/ d \lambda\). This means that for any Borel set \(B\), the density \(f\) satisfies

\[\begin{equation*} P(X \in B) = \int_{X^{-1}(B)} \ dP = \int_B f \ d \lambda = \int_B f \ dx. \end{equation*}\]
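As a quick numerical illustration of this identity, one can take a concrete density and compare \(\int_B f \ dx\) with a known value of \(P(X \in B)\). The sketch below assumes \(X\) is standard normal and \(B = [-1, 1]\), for which \(P(X \in B) = \operatorname{erf}(1/\sqrt 2)\); the helper `midpoint_integral` is a hypothetical name introduced only for this example.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def normal_density(x):
    # Assumed example: density of a standard normal random variable.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

# Integral of the density over the Borel set B = [-1, 1].
prob_from_density = midpoint_integral(normal_density, -1.0, 1.0)

# Closed form for P(-1 <= X <= 1) when X is standard normal.
prob_closed_form = math.erf(1.0 / math.sqrt(2.0))

print(prob_from_density, prob_closed_form)
```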

One of the properties of the Radon-Nikodym derivative is that if \(g \colon \mathbb R \to \mathbb R\) is \(P_X\)-integrable, then \(\int_{\mathbb R} g \ d P_X = \int_{\mathbb R} g \cdot (d P_X/d \lambda) \ d \lambda = \int_{\mathbb R} g \cdot f \ d \lambda\). When \(X\) has finite expectation, the identity function on \(\mathbb R\) is \(P_X\)-integrable, so

\[\begin{equation*} E(X) = \int_{\mathbb R} x \ d P_X = \int_{\mathbb R} x \cdot f(x) \ dx, \end{equation*}\]

which recovers the usual definition of the expectation in terms of the density.
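To see the two descriptions of \(E(X)\) agree in a concrete case, here is a minimal numerical sketch. It assumes a specific model that does not appear above: \(\Omega = (0,1)\) with Lebesgue measure as \(P\), and \(X(\omega) = -\log(1-\omega)\), whose pushforward \(P_X\) is the exponential distribution with density \(f(x) = e^{-x}\) on \([0, \infty)\). The helper `midpoint_integral` and the truncation of the real line at 50 are choices made only for this illustration.

```python
import math

def midpoint_integral(g, a, b, n=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

def X(w):
    # Assumed random variable on Omega = (0, 1) with P = Lebesgue measure.
    return -math.log(1.0 - w)

def density(x):
    # Density of the pushforward P_X with respect to Lebesgue measure, for x >= 0.
    return math.exp(-x)

# E(X) as the integral of X over the probability space itself.
expectation_over_omega = midpoint_integral(X, 0.0, 1.0)

# E(X) as the integral of x * f(x); the tail beyond 50 is negligible here.
expectation_via_density = midpoint_integral(lambda x: x * density(x), 0.0, 50.0)

print(expectation_over_omega, expectation_via_density)
```

Both quantities approximate the exact value \(E(X) = 1\). The first integrates \(X\) against \(P\) directly, and the second integrates \(x\) against the density.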