Why?
Say we have a stick of length 1, and we break it at a uniformly random point $X$. Then we take the piece of length $X$ and break it again at a uniformly random point $Y$.
What is $\mathbb{E}(Y|X)$?
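Since, given $X = x$, the second break point $Y$ is uniform on $(0, x)$, we should get $\mathbb{E}(Y|X) = X/2$. Here is a quick Monte Carlo sketch with numpy to confirm (the seed and slice width are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10**6

x = rng.uniform(0, 1, n)   # first break: X ~ Unif(0, 1)
y = rng.uniform(0, x)      # second break: Y | X = x ~ Unif(0, x)

# E(Y | X) should be X/2: check on a thin slice of X around 0.6
near = np.abs(x - 0.6) < 0.01
print(y[near].mean())      # ~0.30 = 0.6 / 2
print(y.mean())            # ~0.25 = E(X/2) = 1/4
```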
Here are some useful properties related to conditional expectation.
\begin{align} &\text{(1) } \mathbb{E}\left( h(X) \, Y|X \right) = h(X) \, \mathbb{E}(Y|X) &\text{"taking out what is known"} \\ &\text{(2) } \mathbb{E}(Y|X) = \mathbb{E}(Y) &\text{if } X,Y \text{ are independent} \\ &\text{(3) } \mathbb{E}\left( \mathbb{E}(Y|X) \right) = \mathbb{E}(Y) &\text{Iterated Expectation, or Adam's Law} \\ &\text{(4) } \mathbb{E}\left( \left(Y - \mathbb{E}(Y|X)\right) h(X) \right) = 0 &\text{residual is uncorrelated with } h(X) \\ &\text{(5) } \operatorname{Var}(Y) = \mathbb{E}\left( \operatorname{Var}(Y|X) \right) + \operatorname{Var}\left( \mathbb{E}(Y|X) \right) &\text{EVvE's Law} \end{align}
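Before the proofs, a quick numerical sanity check of Properties (1) and (3). This is just a sketch: the pair $Y = X^2 + Z$ (with $Z$ independent noise) is chosen so that $\mathbb{E}(Y|X) = X^2$ is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)   # constructed so that E(Y | X) = X^2

h = x**2                        # any function of X will do

# (1) + (3): E( h(X) Y ) = E( h(X) E(Y|X) )
print(np.mean(h * y))           # direct; ~3 = E(X^4) for a standard normal
print(np.mean(h * x**2))        # via E(Y|X); same quantity

# (3) alone: E( E(Y|X) ) = E(Y)
print(np.mean(x**2), y.mean())  # both ~1.0
```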
Here is a pictorial explanation to aid your intuition. A vector can be anything (a point, a function, a cow); as long as it satisfies the axioms of a vector space, it can be treated as a vector. In this picture, $\mathbb{E}(Y|X)$ is the projection of $Y$ onto the space of all functions of $X$, so the residual $Y - \mathbb{E}(Y|X)$ is orthogonal to that space; for random variables, orthogonal means uncorrelated.
So let us show that the residual $Y - \mathbb{E}(Y|X)$ is uncorrelated with any function $h(X)$:
\begin{align} \operatorname{Cov}\left( Y - \mathbb{E}(Y|X) , \, h(X) \right) &= \mathbb{E}\left( (Y - \mathbb{E}(Y|X)) \, h(X) \right) - \mathbb{E}\left(Y-\mathbb{E}(Y|X)\right) \, \mathbb{E}\left(h(X)\right) \\ &= \mathbb{E}\left( (Y - \mathbb{E}(Y|X)) \, h(X) \right) - \left[\mathbb{E}(Y) - \mathbb{E}(Y) \right] \, \mathbb{E}\left(h(X)\right) &\text{linearity, Adam's Law} \\ &= \mathbb{E}\left( (Y - \mathbb{E}(Y|X)) \, h(X) \right) - 0 \\ &= \mathbb{E}\left( Y \, h(X) \right) - \mathbb{E}\left( \mathbb{E}(Y|X) \, h(X) \right) \\ &= \mathbb{E}\left( Y \, h(X) \right) - \mathbb{E}\left( \mathbb{E}(Y \, h(X) \,|\, X) \right) &\text{if we can take out, we can put back} \\ &= \mathbb{E}\left( Y \, h(X) \right) - \mathbb{E}\left( Y \, h(X) \right) &\text{Adam's Law} \\ &= 0 &\quad \blacksquare \end{align}

And so the residual $Y - \mathbb{E}(Y|X)$ is indeed uncorrelated with any function $h(X)$.
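The same fact can be seen in simulation. Reusing the construction $Y = X^2 + Z$ from the earlier sketch, the residual $Y - \mathbb{E}(Y|X)$ is exactly the independent noise $Z$, so its covariance with any function of $X$ should be near zero:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

x = rng.normal(size=n)
z = rng.normal(size=n)
y = x**2 + z                 # E(Y | X) = X^2, so the residual is exactly Z

residual = y - x**2
for h in (np.sin(x), x**3, np.exp(-x**2)):
    print(np.cov(residual, h)[0, 1])   # each ~0
```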
Returning to Property 3, let's prove the discrete case (the continuous case is analogous).
Since $\mathbb{E}(Y|X)$ is just a function of $X$, we can call it by another name, say $g(X)$.
\begin{align} \mathbb{E}\left( \mathbb{E}(Y|X) \right) &= \mathbb{E}\left( g(X) \right) \\ &= \sum_x g(x) \, P(X=x) &\text{LOTUS} \\ &= \sum_x \mathbb{E}(Y|X=x) \, P(X=x) \\ &= \sum_x \left[ \sum_y y \, P(Y=y|X=x) \right] P(X=x) \\ &= \sum_y \sum_x y \, P(Y=y, X=x) &P(Y{=}y|X{=}x) \, P(X{=}x) = P(Y{=}y, X{=}x) \\ &= \sum_y y \sum_x P(Y=y, X=x) \\ &= \sum_y y \, P(Y=y) &\text{marginalizing over } x \\ &= \mathbb{E}(Y) &\quad \blacksquare \end{align}
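The telescoping above can be checked exactly on a toy joint pmf (the numbers below are arbitrary, just for illustration):

```python
# a small joint pmf P(X=x, Y=y); numbers chosen arbitrarily
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# marginal pmf of X
p_x = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in (0, 1)}

def cond_exp_y(x):
    """E(Y | X = x) = sum_y y * P(Y = y | X = x)."""
    return sum(y * p / p_x[x] for (xx, y), p in pmf.items() if xx == x)

adam = sum(cond_exp_y(x) * p_x[x] for x in (0, 1))  # E(E(Y|X))
e_y = sum(y * p for (_, y), p in pmf.items())       # E(Y) directly
print(adam, e_y)                                    # both 0.6
```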
Conditional variance is defined as follows:

\begin{align} \operatorname{Var}(Y|X) &= \mathbb{E}(Y^2|X) - \mathbb{E}(Y|X)^2 &\text{or equivalently} \\ &= \mathbb{E}\left[ \left(Y - \mathbb{E}(Y|X)\right)^2 \,\big|\, X \right] \end{align}

Let $g(X) = \mathbb{E}(Y|X)$; this will make things a bit clearer.
\begin{align} \operatorname{Var}(Y|X) &= \mathbb{E}\left[ (Y - g(X))^2 \,|\, X \right] \\ &= \mathbb{E}\left[ Y^2 - 2\,Y \, g(X) + g(X)^2 \,|\, X \right] \\ &= \mathbb{E}(Y^2|X) - 2\,\mathbb{E}(Y\,g(X)|X) + \mathbb{E}(g(X)^2 | X) \\ &= \mathbb{E}(Y^2|X) - 2 \, g(X) \, \mathbb{E}(Y|X) + \mathbb{E}(g(X)^2 | X) &\text{taking out what is known} \\ &= \mathbb{E}(Y^2|X) - 2 \, g(X) \, g(X) + g(X)^2 \\ &= \mathbb{E}(Y^2|X) - 2 \, g(X)^2 + g(X)^2 \\ &= \mathbb{E}(Y^2|X) - g(X)^2 \\ &= \mathbb{E}(Y^2|X) - \mathbb{E}(Y|X)^2 &\quad \blacksquare \end{align}
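The agreement of the two forms can be verified exactly, reusing the toy pmf from the earlier sketch:

```python
# same toy joint pmf as before
pmf = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
p_x = {x: sum(p for (xx, _), p in pmf.items() if xx == x) for x in (0, 1)}

def cond_moment(x, k):
    """E(Y^k | X = x)."""
    return sum(y**k * p / p_x[x] for (xx, y), p in pmf.items() if xx == x)

for x in (0, 1):
    m1, m2 = cond_moment(x, 1), cond_moment(x, 2)
    form_a = m2 - m1**2                    # E(Y^2|X) - E(Y|X)^2
    form_b = sum((y - m1)**2 * p / p_x[x]  # E[(Y - E(Y|X))^2 | X]
                 for (xx, y), p in pmf.items() if xx == x)
    print(x, form_a, form_b)               # the two forms agree for each x
```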
EVvE's Law states that $\operatorname{Var}(Y) = \mathbb{E}\left( \operatorname{Var}(Y|X) \right) + \operatorname{Var}\left( \mathbb{E}(Y|X) \right)$. Graphically, it splits the total variance into the variance within each sub-group, $\mathbb{E}\left( \operatorname{Var}(Y|X) \right)$, and the variance between the groups, $\operatorname{Var}\left( \mathbb{E}(Y|X) \right)$.
In order to prove EVvE's Law, we will again write $g(X) = \mathbb{E}(Y|X)$ to make things simpler.
Then:
\begin{align} \mathbb{E}\left( \operatorname{Var}(Y|X) \right) &= \mathbb{E}\left[ \mathbb{E}(Y^2|X) - \left(\mathbb{E}(Y|X)\right)^2 \right] &\text{for the first part} \\ &= \mathbb{E}(Y^2) - \mathbb{E}\left(g(X)^2\right) &\text{Adam's Law} \\ \\ \operatorname{Var}\left( \mathbb{E}(Y|X) \right) &= \operatorname{Var}(g(X)) &\text{for the second part} \\ &= \mathbb{E}\left(g(X)^2\right) - \left(\mathbb{E}(g(X))\right)^2 \\ \\ \mathbb{E}\left( \operatorname{Var}(Y|X) \right) + \operatorname{Var}\left( \mathbb{E}(Y|X) \right) &= \mathbb{E}(Y^2) - \mathbb{E}\left(g(X)^2\right) + \mathbb{E}\left(g(X)^2\right) - \left(\mathbb{E}(g(X))\right)^2 &\text{adding the two parts} \\ &= \mathbb{E}(Y^2) - \left(\mathbb{E}\left( \mathbb{E}(Y|X) \right)\right)^2 \\ &= \mathbb{E}(Y^2) - \left(\mathbb{E}(Y)\right)^2 &\text{Adam's Law again} \\ &= \operatorname{Var}(Y) &\quad \blacksquare \end{align}
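Here is a simulation sketch of that within-group/between-group decomposition, with three hypothetical groups (the means and spreads below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6

means = np.array([0.0, 2.0, 5.0])   # group means   (hypothetical)
sds   = np.array([1.0, 0.5, 2.0])   # group spreads (hypothetical)

g = rng.integers(0, 3, size=n)      # X: which group each draw belongs to
y = rng.normal(means[g], sds[g])    # Y | X = g ~ N(mean_g, sd_g^2)

within  = np.mean(sds[g] ** 2)      # E( Var(Y|X) ): average within-group variance
between = np.var(means[g])          # Var( E(Y|X) ): variance of the group means
print(within + between)             # ~5.97
print(np.var(y))                    # matches the total variance
```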
Suppose we are studying infectious disease in a certain state. Due to circumstances (lack of resources and/or time), rather than taking samples from across the state, we will randomly select a city and study a random sample of $n$ people there.

Let $X$ be the number of infected people in the sample.
Let $Q$ be the proportion of infected people in the randomly selected city. Keep in mind that different cities will have different proportions, hence $Q$ is a random variable.
Find $\mathbb{E}(X)$ and $\operatorname{Var}(X)$.
But to do this, we need to make an assumption about the distribution of $Q$. Given its flexibility, computational convenience and the fact that it is the conjugate prior to the binomial distribution, we will assume $Q \sim \operatorname{Beta}(a,b)$.
It should be clear then that we are assuming $X|Q \sim \operatorname{Bin}(n, Q)$. Strictly speaking, since we are sampling without replacement, a hypergeometric distribution would be more accurate; but since $n$ is probably small compared to the total city population, the binomial is an excellent approximation, and it pairs conveniently with the Beta prior.
Remember that conditioning is the soul of statistics, and so we will condition on the proportion of infection $Q$ of our randomly selected city.
Thinking conditionally, we have:
\begin{align} \mathbb{E}(X) &= \mathbb{E}\left( \mathbb{E}(X|Q) \right) \\ &= \mathbb{E}(nQ) &\text{expected value of }\operatorname{Bin}(n,Q) \\ &= n \, \mathbb{E}(Q) \\ &= n \, \frac{a}{a+b} &\text{expected value of }\operatorname{Beta}(a,b) \end{align}
Again thinking conditionally, we have:

\begin{align} \operatorname{Var}(X) &= \mathbb{E}\left( \operatorname{Var}(X|Q) \right) + \operatorname{Var}\left( \mathbb{E}(X|Q) \right) &\text{by EVvE's Law} \\ \\ \mathbb{E}\left( \operatorname{Var}(X|Q) \right) &= \mathbb{E}\left( n \, Q \, (1-Q) \right) &\text{for the first part} \\ &= n \, \mathbb{E}\left( Q \, (1-Q) \right) \\ &= n \, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \, \int_{0}^{1} q \, (1-q) \, q^{a-1} \, (1-q)^{b-1} \, dq &\text{LOTUS} \\ &= n \, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \, \int_{0}^{1} q^{a} \, (1-q)^{b} \, dq \\ &= n \, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \, \frac{\Gamma(a+1)\Gamma(b+1)}{\Gamma(a+b+2)} &\text{the }\operatorname{Beta}(a+1,b+1)\text{ normalizing constant} \\ &= n \, \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} \, \frac{a\,\Gamma(a)\,b\,\Gamma(b)}{(a+b+1)(a+b)\,\Gamma(a+b)} &\text{using } \Gamma(t+1) = t\,\Gamma(t) \\ &= \frac{n \, a \, b}{(a+b+1)(a+b)} \\ \\ \operatorname{Var}\left( \mathbb{E}(X|Q) \right) &= \operatorname{Var}(n \, Q) &\text{for the second part} \\ &= n^2 \, \operatorname{Var}(Q) \\ &= n^2 \, \frac{\mu(1-\mu)}{a+b+1} &\text{where } \mu = \frac{a}{a+b} \\ \\ \operatorname{Var}(X) &= \frac{n \, a \, b}{(a+b+1)(a+b)} + n^2 \, \frac{\mu(1-\mu)}{a+b+1} &\text{putting it all together} \end{align}

View Lecture 27: Conditional Expectation given an R.V. | Statistics 110 on YouTube.
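Finally, a Monte Carlo sanity check of both results above (a sketch; the parameters $a=2$, $b=5$, $n=50$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, n, reps = 2.0, 5.0, 50, 10**6

q = rng.beta(a, b, size=reps)   # Q ~ Beta(a, b): one city per repetition
x = rng.binomial(n, q)          # X | Q ~ Bin(n, Q)

mu = a / (a + b)
e_x = n * mu                                                     # ~14.29
var_x = n*a*b / ((a+b+1) * (a+b)) + n**2 * mu*(1-mu) / (a+b+1)   # ~72.7

print(x.mean(), e_x)   # empirical vs. formula
print(x.var(), var_x)
```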