May 30, 2023
It occurs to me that the probability bounds most commonly used in both randomized algorithms and quantitative analysis are few in number. I briefly summarize their intended usage here.
Take the principle of inclusion-exclusion for the set of events $A_1, \dots, A_n$. We have

$$\Pr\left[\bigcup_{i=1}^n A_i\right] = \sum_{i} \Pr[A_i] - \sum_{i < j} \Pr[A_i \cap A_j] + \sum_{i < j < k} \Pr[A_i \cap A_j \cap A_k] - \cdots$$

Truncating after the first sum can only overcount, which gives the union bound:

$$\Pr\left[\bigcup_{i=1}^n A_i\right] \le \sum_{i=1}^n \Pr[A_i].$$
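As a quick numerical illustration (a sketch of my own; the threshold events on correlated Gaussians are an arbitrary choice), the union bound can be checked empirically:

```python
import numpy as np

# Illustrative sketch: compare Pr[A_1 or A_2 or A_3] with the sum of the
# individual probabilities (the union bound). The events are arbitrary
# threshold events on correlated Gaussian samples.
rng = np.random.default_rng(0)
n_samples = 100_000
x = rng.standard_normal((n_samples, 3))
x[:, 1] = 0.5 * x[:, 0] + 0.5 * x[:, 1]      # introduce some correlation

events = x > 1.5                              # A_i = {x_i > 1.5}
p_union = np.mean(events.any(axis=1))         # empirical Pr[union of A_i]
sum_p = events.mean(axis=0).sum()             # sum of Pr[A_i]

print(f"Pr[union] ~ {p_union:.4f} <= sum of Pr[A_i] ~ {sum_p:.4f}")
```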
For a non-negative random variable $X$, take any $a > 0$. Should we collapse all the mass above $a$ onto the point $a$, and send the mass below $a$ to $0$, we would have expected value $a \cdot \Pr[X \ge a]$. Necessarily, this is at most the original expected value, since mass was only shifted down or removed. Hence the Markov bound:

$$\Pr[X \ge a] \le \frac{\mathbb{E}[X]}{a}.$$
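A small numerical sanity check (an illustrative sketch; the exponential distribution and the threshold are arbitrary choices):

```python
import numpy as np

# Illustrative sketch: Markov's bound Pr[X >= a] <= E[X]/a for a non-negative X.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # non-negative samples, E[X] = 2
a = 5.0

tail = np.mean(x >= a)                        # empirical Pr[X >= a]
markov = x.mean() / a                         # E[X] / a

print(f"Pr[X >= {a}] ~ {tail:.4f} <= E[X]/a ~ {markov:.4f}")
```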
And thus from the Markov bound we have upper bounds on the tails of any distribution, simply by taking its square to make it non-negative. Notably, we will want to recenter the distribution first, so that the bound is stated in terms of the variance. This gives the Chebyshev bound:

$$\Pr\big[|X - \mathbb{E}[X]| \ge a\big] = \Pr\big[(X - \mathbb{E}[X])^2 \ge a^2\big] \le \frac{\operatorname{Var}[X]}{a^2}.$$
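And the corresponding check for the Chebyshev bound (same arbitrary example distribution as above):

```python
import numpy as np

# Illustrative sketch: Chebyshev's bound Pr[|X - E[X]| >= a] <= Var[X] / a^2,
# i.e. Markov applied to the recentered, squared variable.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # arbitrary example distribution
a = 4.0

tail = np.mean(np.abs(x - x.mean()) >= a)     # empirical two-sided tail
chebyshev = x.var() / a**2                    # Var[X] / a^2

print(f"Pr[|X - E[X]| >= {a}] ~ {tail:.4f} <= Var/a^2 ~ {chebyshev:.4f}")
```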
For a function $f$ whose second derivative does not change sign, i.e. $f$ is either convex or concave, the chord between any two points on the curve lies entirely above or below the curve, depending on whether $f$ is convex or concave.
For a concave function $f$, every point on the chord between two points of $f$ must lie below $f$. Thus, Jensen's inequality:

$$\mathbb{E}[f(X)] \le f(\mathbb{E}[X]).$$
This expectation is over any distribution of $X$; in particular, a two-point distribution with weights $\lambda$ and $1 - \lambda$ recovers the chord statement, and thus covers any point on the line between the two points.
Notably, with $f = \sqrt{\cdot}$ and $X = S^2$ the unbiased sample variance, the sample standard deviation appears on the LHS: $\mathbb{E}[S] \le \sqrt{\mathbb{E}[S^2]} = \sigma$, so it necessarily has non-positive bias, underestimating the true standard deviation.
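This downward bias is easy to see numerically (an illustrative sketch; the normal distribution and the small sample size $n = 5$ are arbitrary choices, picked to make the bias visible):

```python
import numpy as np

# Illustrative sketch of Jensen via the sample standard deviation: since sqrt is
# concave, E[sqrt(S^2)] <= sqrt(E[S^2]) = sigma, so the sample SD underestimates
# the true standard deviation on average.
rng = np.random.default_rng(0)
sigma = 2.0
n, trials = 5, 200_000

samples = rng.normal(loc=0.0, scale=sigma, size=(trials, n))
sample_sd = samples.std(axis=1, ddof=1)       # sqrt of the unbiased sample variance

print(f"E[sample SD] ~ {sample_sd.mean():.4f} <= sigma = {sigma}")
```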
Similar to the Chebyshev, the Chernoff bounds provide a family of exponential bounds on distribution tails. These are derived from the Markov bound applied to the moment generating function, noting that $x \mapsto e^{tx}$ is monotone in $x$ for any fixed $t \ne 0$.
Let us then recall the moment generating function of a distribution $X$:

$$M_X(t) = \mathbb{E}\left[e^{tX}\right].$$
We note that $e^{tx}$ is monotonic in $x$ (increasing for $t > 0$, decreasing for $t < 0$), and so we have the symmetric identities

$$\Pr[X \ge a] = \Pr\big[e^{tX} \ge e^{ta}\big] \quad (t > 0), \qquad \Pr[X \le a] = \Pr\big[e^{tX} \ge e^{ta}\big] \quad (t < 0).$$
Then, as mentioned, we apply the Markov bound:

$$\Pr[X \ge a] \le \frac{\mathbb{E}\big[e^{tX}\big]}{e^{ta}} = e^{-ta} M_X(t) \quad (t > 0), \qquad \Pr[X \le a] \le e^{-ta} M_X(t) \quad (t < 0).$$
And now we must become creative in choosing $t$ for these inequalities, typically by minimizing the RHS over the allowed range of $t$.
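As a concrete instance (a sketch of my own, with the standard normal as an arbitrary choice): here $M_X(t) = e^{t^2/2}$, and minimizing $e^{-ta} M_X(t)$ over $t > 0$ gives $t = a$ and the bound $\Pr[X \ge a] \le e^{-a^2/2}$, which we can also check numerically:

```python
import numpy as np

# Illustrative sketch: Chernoff bound for a standard normal, whose MGF is
# M(t) = exp(t^2 / 2). The bound exp(-t*a) * M(t) is minimized at t = a,
# giving Pr[X >= a] <= exp(-a^2 / 2).
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
a = 3.0

ts = np.linspace(0.1, 6.0, 600)
chernoff = np.min(np.exp(-ts * a) * np.exp(ts**2 / 2))  # numeric minimum over t
tail = np.mean(x >= a)                                   # empirical Pr[X >= a]

print(f"Pr[X >= {a}] ~ {tail:.5f} <= min_t e^(-ta) M(t) ~ {chernoff:.5f}")
print(f"closed form e^(-a^2/2) = {np.exp(-a**2 / 2):.5f}")
```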
The Cauchy-Schwarz inequality states that for vectors $u, v$, with inner product $\langle u, v \rangle$ and norms denoted by the double bar, we have

$$|\langle u, v \rangle| \le \|u\| \, \|v\|.$$
The single bars on the LHS denote the absolute value of the inner product (the modulus, in the complex case). Taking the inner product to be $\langle X, Y \rangle = \mathbb{E}[XY]$ on square-integrable random variables, and squaring both sides to clear the roots, leads to an inequality useful in probability:

$$\big(\mathbb{E}[XY]\big)^2 \le \mathbb{E}\big[X^2\big] \, \mathbb{E}\big[Y^2\big].$$
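A quick numerical check of this probabilistic form (an illustrative sketch; the correlated Gaussian samples are an arbitrary choice):

```python
import numpy as np

# Illustrative sketch: the probabilistic Cauchy-Schwarz inequality
# (E[XY])^2 <= E[X^2] * E[Y^2], checked on arbitrary correlated samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 0.7 * x + 0.3 * rng.standard_normal(100_000)

lhs = np.mean(x * y) ** 2                     # (E[XY])^2
rhs = np.mean(x**2) * np.mean(y**2)           # E[X^2] * E[Y^2]

print(f"(E[XY])^2 ~ {lhs:.4f} <= E[X^2]E[Y^2] ~ {rhs:.4f}")
```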
Amplification is a trick similar to square-root decomposition, to balance out the sides of an imbalanced equality (or inequality, in this case). We begin with the trivial

$$0 \le \|u - v\|^2 = \|u\|^2 - 2\langle u, v \rangle + \|v\|^2 \quad \Longrightarrow \quad \langle u, v \rangle \le \frac{\|u\|^2 + \|v\|^2}{2},$$
where the LHS is close to the LHS of Cauchy-Schwarz, but the RHS still has some ways to go. Instead, we take some $\lambda > 0$ and apply the same inequality to the rescaled vectors $\lambda u$ and $v / \lambda$:

$$\langle u, v \rangle = \langle \lambda u, \, v / \lambda \rangle \le \frac{\lambda^2 \|u\|^2 + \|v\|^2 / \lambda^2}{2}.$$
It remains to minimize the RHS to strengthen the inequality, which we do by taking its derivative with respect to $\lambda$ and setting it to zero: $\lambda^4 = \|v\|^2 / \|u\|^2$, i.e. $\lambda^2 = \|v\| / \|u\|$, which recovers

$$\langle u, v \rangle \le \|u\| \, \|v\|.$$
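As a sanity check of the amplification step (an illustrative sketch with arbitrary random vectors), minimizing the amplified RHS over a grid of $\lambda$ recovers $\|u\| \, \|v\|$ numerically:

```python
import numpy as np

# Illustrative sketch: the amplified bound (lambda^2 ||u||^2 + ||v||^2 / lambda^2) / 2,
# minimized over lambda, recovers ||u|| * ||v||, which upper-bounds <u, v>.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(10), rng.standard_normal(10)

nu, nv = np.linalg.norm(u), np.linalg.norm(v)
lambdas = np.linspace(0.1, 5.0, 2000)
amplified = 0.5 * (lambdas**2 * nu**2 + nv**2 / lambdas**2)

print(f"min over lambda ~ {amplified.min():.4f}")
print(f"||u|| * ||v||   = {nu * nv:.4f}")
print(f"<u, v>          = {np.dot(u, v):.4f}")
```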
There is further depth to explore here (obviously), but that is beyond the scope of this note.