# Talk:Chain Rule for Probability

So what's your point? --prime mover (talk) 22:09, 2 September 2022 (UTC)

I asked if this is mathematics. Do you really accept this proof? For me:
$\condprob A B = \dfrac {\map \Pr {A \cap B} } {\map \Pr B}$

is the definition, and what I read here is an informal justification e.g. for kids.

Why should we not cater for "kids", as you contemptuously call them? --prime mover (talk) 21:28, 3 September 2022 (UTC)
I tried to say that this formulation lacks mathematical formality. But this style may make sense in its own right. It looks casual because it starts with the axiomatic framework $\struct {\Omega, \Sigma, \Pr}$ but the proof does not rely on the axioms at all. Some refactoring is required. --Usagiop (talk) 22:21, 3 September 2022 (UTC)

As much as I appreciate the work put into writing this theorem, I have to agree with Usagiop. This is not a strict mathematical proof; it is a justification of a definition. My sources give:

2001: Pierre Brémaud, Markov Chains, §2.2:
$\condprob A B \stackrel{ \mathrm{def} }{ = } \dfrac{ \map \Pr { A \cap B } }{ \map \Pr B }$

2000: Michael Sørensen, En introduktion til sandsynlighedsregning, §1.4:
The conditional probability of $B$ given $A$ ... is defined by:
$\condprob B A = \dfrac{ \map \Pr { A \cap B } }{ \map \Pr A }$

I think that Definition:Conditional Probability should be changed to give $\condprob A B = \dfrac{ \map \Pr { A \cap B } }{ \map \Pr B }$ as the definition. Then explain that the conditional probability is undefined when $\map \Pr B = 0$, for obvious reasons. --Anghel (talk) 12:16, 3 September 2022 (UTC)
Compromise. How about 2 definitions with an equivalence proof? --prime mover (talk) 21:29, 3 September 2022 (UTC)
Maybe we need to distinguish modern probability theory (founded by Kolmogorov) from ancient probability theories, like some in Greek mathematics. I guess those definitions and proofs were accepted before the foundation of modern probability theory. --Usagiop (talk) 17:35, 3 September 2022 (UTC)
Maybe this style is related to Cox's theorem? --Usagiop (talk) 18:49, 3 September 2022 (UTC)
As far as I know, the argument given here is pretty close to Kolmogorov's rationale for defining $\condprob A B$. It is not ancient theory: the sources given are from 1986, 1988 and 2014. But all sources I know of define $\condprob A B$ directly (or accept it as an axiom of probability).
My earlier comment was perhaps too harsh. I still think Definition:Conditional Probability should be updated. Maybe we should give two definitions, one that accepts the definition as an axiom, and one that explains it like here. --Anghel (talk) 20:45, 3 September 2022 (UTC)
Yes, I agree that giving two definitions is best. This presumably older definition is also interesting. We need to separate the frameworks, too. The problem here is mixing two different frameworks. Explicitly:
Let $\EE$ be an experiment with probability space $\struct {\Omega, \Sigma, \Pr}$.
Let $A, B \in \Sigma$ be events of $\EE$ such that $\map \Pr B > 0$.
Those are a mixture. It should be in the axiomatic style:
Let $\struct {\Omega, \Sigma, \Pr}$ be a probability space.
Let $A, B \in \Sigma$ such that $\map \Pr B > 0$.
and in the older style:
Let $\EE$ be an experiment.
Let $A, B$ be events of $\EE$ such that $\map \Pr B > 0$.

--Usagiop (talk) 22:11, 3 September 2022 (UTC)

I want to take the following as Definition 1: $\map \Pr {A \mid B} = \expect {1_A \mid \map \sigma B} = \expect {1_A \mid \set {\O, B, B^c, \Omega} }$. This chain rule falls out as a direct consequence of Conditional Expectation Conditioned on Event of Non-Zero Probability by taking $X = 1_A$. We can then give this chain rule as Definition 2. Any objections to this? I believe this would be the "modern" definition, and it allows you to compute conditional probabilities with respect to general $\sigma$-algebras too. Caliburn (talk) 13:33, 1 December 2022 (UTC)

Though this introduces a technicality: $\map \Pr {A \mid B}$ is then a random variable (!!!!), but it is almost surely constant so can be treated as a constant without much issue. Caliburn (talk) 13:37, 1 December 2022 (UTC)
OK but then Definition 1 and Definition 2 are not equivalent. --Usagiop (talk) 14:09, 1 December 2022 (UTC)
Why? There would be one part of the page for "Conditioned on Sigma-Algebra" and then "Conditioned on Event". You could treat the problem about technically having a random variable that is almost surely constant with a bit more care, but I don't see a huge roadblock there. Caliburn (talk) 14:15, 1 December 2022 (UTC)
I think you should try. By the way, $\expect {1_A \mid \map \sigma B}$ isn't almost surely constant. It is only constant on $B$ with that value, isn't it? --Usagiop (talk) 18:25, 1 December 2022 (UTC)
I mean:
$\map {\expect {1_A \mid \map \sigma B} } \omega = \begin{cases} \dfrac {\map \Pr {A \cap B} } {\map \Pr B} & : \omega \in B \\ \dfrac {\map \Pr {A \cap B^c} } {\map \Pr {B^c} } & : \omega \in B^c \end{cases}$
--Usagiop (talk) 18:31, 1 December 2022 (UTC)
Oh yes I see, they are two different things. One of them gives the probability of $A$ given that we know whether $B$ happens or not, and the other is the probability of $A$ given that $B$ does happen. I need to find a source for this anyway, I'm not sure where to find a proper treatment. Caliburn (talk) 18:45, 1 December 2022 (UTC)
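
A small worked check of the piecewise formula above, assuming a fair six-sided die (an illustrative setup, not taken from any of the cited sources): let $\Omega = \set {1, 2, 3, 4, 5, 6}$ with the uniform measure, $A = \set {2, 4, 6}$ and $B = \set {1, 2, 3}$, so that $A \cap B = \set 2$ and $A \cap B^c = \set {4, 6}$. Then:
$\dfrac {\map \Pr {A \cap B} } {\map \Pr B} = \dfrac {1/6} {1/2} = \dfrac 1 3$
$\dfrac {\map \Pr {A \cap B^c} } {\map \Pr {B^c} } = \dfrac {2/6} {1/2} = \dfrac 2 3$
So in this example $\expect {1_A \mid \map \sigma B}$ takes the value $\dfrac 1 3$ on $B$ and $\dfrac 2 3$ on $B^c$, hence is not almost surely constant, and only its value on $B$ agrees with $\condprob A B$. As a sanity check, averaging the two values against $\map \Pr B$ and $\map \Pr {B^c}$ recovers $\map \Pr A$:
$\dfrac 1 3 \cdot \dfrac 1 2 + \dfrac 2 3 \cdot \dfrac 1 2 = \dfrac 1 2$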