In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution differs from a second, reference probability distribution. In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric of spread; it also does not satisfy the triangle inequality. In the simple case, a Kullback–Leibler divergence of 0 indicates that the two distributions in question are identical.
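As a minimal sketch of the properties just listed, the following Python snippet computes the divergence for discrete distributions using the standard definition, the sum over outcomes of p_i log(p_i / q_i). The function name `kl_divergence` and the example distributions are illustrative assumptions, not taken from the text.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions given as probability lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) contributes nothing
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total

# Hypothetical example distributions on three outcomes.
p = [0.5, 0.4, 0.1]
q = [1 / 3, 1 / 3, 1 / 3]

forward = kl_divergence(p, q)   # divergence of p from the reference q
backward = kl_divergence(q, p)  # roles swapped: generally a different value
same = kl_divergence(p, p)      # identical distributions give 0

print(forward, backward, same)
```

Swapping the arguments changes the value, which is the asymmetry that prevents the divergence from being a metric.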

In simplified terms, it is a measure of surprise, with diverse applications such as applied statistics, fluid mechanics, neuroscience and machine learning. The Kullback–Leibler divergence was introduced by Solomon Kullback and Richard Leibler as the directed divergence between two distributions; Kullback preferred the term discrimination information. Equivalently, by the chain rule, this can be written as. Most formulas involving the Kullback–Leibler divergence hold regardless of the base of the logarithm.

Kullback [2] gives the following example (Table 2). The Kullback–Leibler divergence is a special case of a broader class of statistical divergences called f-divergences, as well as of the class of Bregman divergences.

It is the only such divergence over probabilities that is a member of both classes. Although it is often intuited as a way of measuring the distance between probability distributions, the Kullback–Leibler divergence is not a true metric. However, its infinitesimal form, specifically its Hessian, gives a metric tensor known as the Fisher information metric. Arthur Hobson proved that the Kullback–Leibler divergence is the only measure of difference between probability distributions that satisfies some desired properties, which are the canonical extension to those appearing in a commonly used characterization of entropy.

Unfortunately it still isn't symmetric. There is a relation between the Kullback–Leibler divergence and the "rate function" in the theory of large deviations. The logarithm in the last term must be taken to base e, since all terms apart from the last are base-e logarithms of expressions that are either factors of the density function or otherwise arise naturally.

The equation therefore gives a result measured in nats. A special case, and a common quantity in variational inference, is the KL divergence between a diagonal multivariate normal and a standard normal distribution (with zero mean and unit variance). Even so, being a premetric, it generates a topology on the space of probability distributions. Pinsker's inequality entails that. The Kullback–Leibler divergence is directly related to the Fisher information metric.
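For the variational-inference special case mentioned above, the well-known closed form is one half of the sum over dimensions of (sigma_i^2 + mu_i^2 - 1 - ln sigma_i^2), measured in nats. A minimal sketch follows; the function name and the choice to pass standard deviations rather than variances are assumptions made for illustration.

```python
import math

def kl_diag_gaussian_to_standard(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in nats.

    mu    -- list of means, one per dimension
    sigma -- list of standard deviations (positive), one per dimension
    """
    return 0.5 * sum(
        s * s + m * m - 1.0 - math.log(s * s)
        for m, s in zip(mu, sigma)
    )

# A standard normal has zero divergence from itself.
print(kl_diag_gaussian_to_standard([0.0, 0.0], [1.0, 1.0]))
# Shifting one mean by 1 with unit variance costs 0.5 nats.
print(kl_diag_gaussian_to_standard([1.0], [1.0]))
```

Because the covariance is diagonal, the divergence is a sum of independent one-dimensional terms.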

This can be made explicit as follows. Specifically, up to first order one has, using the Einstein summation convention. More formally, as for any minimum, the first derivatives of the divergence vanish. Another information-theoretic metric is variation of information, which is roughly a symmetrization of conditional entropy.
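The link to the Fisher information can be checked numerically: since the first derivatives vanish at the minimum, the divergence between nearby members of a parametric family behaves like one half of the Fisher information times the squared parameter shift. The sketch below uses the Bernoulli family, whose Fisher information is 1 / (theta (1 - theta)); the family and the parameter values are assumptions chosen for illustration.

```python
import math

def kl_bernoulli(a, b):
    """D_KL( Bernoulli(a) || Bernoulli(b) ) in nats, for 0 < a, b < 1."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def fisher_bernoulli(theta):
    """Fisher information of the Bernoulli family at parameter theta."""
    return 1.0 / (theta * (1.0 - theta))

theta = 0.3
for delta in (1e-1, 1e-2, 1e-3):
    exact = kl_bernoulli(theta, theta + delta)
    quadratic = 0.5 * fisher_bernoulli(theta) * delta ** 2
    # The ratio approaches 1 as delta shrinks.
    print(delta, exact, quadratic, exact / quadratic)
```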

It is a metric on the set of partitions of a discrete probability space. Many of the other quantities of information theory can be interpreted as applications of the Kullback–Leibler divergence to specific cases.

The self-information, also known as the information content of a signal, random variable, or event, is defined as the negative logarithm of the probability of the given outcome occurring. When applied to a discrete random variable, the self-information can be represented as. The mutual information.
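A sketch of how these quantities can be expressed through the divergence, using standard identities: the self-information of an outcome is the divergence of a point mass on that outcome from the distribution, the mutual information is the divergence of the joint distribution from the product of its marginals, and the Shannon entropy of a distribution on N outcomes is log N minus its divergence from the uniform distribution. All function names and example values are illustrative assumptions.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for discrete distributions (0 log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def self_information(prob):
    """Negative log-probability of a single outcome, in nats."""
    return -math.log(prob)

def entropy(p):
    """Shannon entropy: expected self-information."""
    return sum(pi * self_information(pi) for pi in p if pi > 0.0)

def mutual_information(joint):
    """I(X; Y) as KL(joint || product of marginals); joint is a 2-D list."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat_joint = [v for row in joint for v in row]
    flat_product = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_product)

p = [0.5, 0.4, 0.1]
uniform = [1 / 3] * 3

print(self_information(p[0]), kl_divergence([1.0, 0.0, 0.0], p))
print(entropy(p), math.log(3) - kl_divergence(p, uniform))
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # perfectly dependent bits
```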

The Shannon entropy.

We recall and extend some sufficient conditions for the convex comparison of martingale measures in a one period setting, based on the elasticity of the pricing kernel. We show that the minimal entropy martingale measure (MEMM) and the Esscher martingale measure are comparable in the convex order, and which one is dominating depends on the sign of the risk premium on the underlying.

If it is positive, then the MEMM gives a lower price to each convex payoff. We show how the comparison result can be extended to the multiperiod i.i.d. case. The relationship between stochastic orders and option prices is well established in the financial literature, at least starting with the seminal paper [14].

Convex ordering of martingale measures underlies many papers on the comparison of option prices under different models, such as [2, 3, 10] in continuous time and [5, 17, 19] in discrete time, where the same techniques are applied to compute option pricing bounds, which are provided by extremal martingale measures. The purpose of this note is to show that in a one period setting the minimal entropy martingale measure (MEMM henceforth, introduced in [8]) and the Esscher martingale measure (in the original sense of [9]) are always comparable in the convex order, and which one is dominating depends on the sign of the risk premium of the underlying; in the typical case of a positive risk premium the MEMM gives lower prices to each convex payoff (in particular, it gives lower prices to each plain vanilla call and put).
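The claim can be illustrated numerically in a small one period model. The sketch below rests on assumptions that are not taken from this text: a hypothetical three-point distribution for the logreturn X, the Esscher density taken proportional to exp(h_1 X), and the MEMM density taken proportional to exp(h_2 e^X), which is the form obtained by minimizing relative entropy under the martingale constraint. In both cases the parameter is fixed by requiring that the expected price equal e^r. The helper names are illustrative.

```python
import math

def tilted_probs(probs, scores, h):
    """Reweight probs by exp(h * score) and renormalise."""
    m = max(h * s for s in scores)  # subtract the max for numerical stability
    w = [p * math.exp(h * s - m) for p, s in zip(probs, scores)]
    z = sum(w)
    return [wi / z for wi in w]

def solve_martingale_tilt(probs, scores, prices, target, lo=-200.0, hi=200.0):
    """Bisection for h such that the tilted mean of prices equals target.

    Assumes scores increase with prices, so the tilted mean increases in h.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        q = tilted_probs(probs, scores, mid)
        if sum(qi * s for qi, s in zip(q, prices)) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical trinomial model with a positive risk premium (E[e^X] > e^r).
logreturns = [-0.2, 0.0, 0.3]
probs = [0.3, 0.4, 0.3]
r = 0.0
prices = [math.exp(x) for x in logreturns]  # terminal price, with S_0 = 1
target = math.exp(r)

h1 = solve_martingale_tilt(probs, logreturns, prices, target)  # Esscher
h2 = solve_martingale_tilt(probs, prices, prices, target)      # MEMM
q_esscher = tilted_probs(probs, logreturns, h1)
q_memm = tilted_probs(probs, prices, h2)

def call_price(q, strike):
    """Discounted expected call payoff under the measure q."""
    return math.exp(-r) * sum(qi * max(s - strike, 0.0) for qi, s in zip(q, prices))

for strike in (0.9, 1.0, 1.1):
    print(strike, call_price(q_memm, strike), call_price(q_esscher, strike))
```

In this example the MEMM call prices do not exceed the Esscher ones at any of the strikes, in line with the ordering stated for a positive risk premium.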

This result can be easily extended to the multiperiod i.i.d. case. In Section 2 we briefly review and generalize some sufficient conditions for the convex ordering of martingale measures; in Section 3 we state our main result on the comparison of the Esscher and the MEMM measures; finally, in Section 4 we show how the comparison can be generalized in the multiperiod i.i.d. case.

This criterion is quite useful in applications, since in the Black-Scholes case the pricing kernel has constant elasticity. In the following theorem we generalize this result and state several similar sufficient conditions for the convex ordering of densities. Hence all pricing kernels have to cut in at least two points; each sufficient condition excludes that there are more than two cuts. The solutions of (6) and (7) may in general not exist; however, we can state the following:

Lemma 1. Proposition 1. Assuming that the solutions h_1 and h_2 of (6) and (7) exist, they are both strictly increasing as a function of the riskfree logreturn r. This enables us to state our main comparison result:

Theorem 2. In this case the analytical expressions of the densities are identical to the one period case with N h_1 and N h_2 replacing h_1 and h_2, and hence the two measures are ordered as in the thesis of Theorem 2.

If we try to remove the hypothesis of independent logreturns, the problem is that the representation (8) does not hold anymore; it is no longer true that the density of the MEMM is a product of one period entropy minimizing densities.

In contrast, the multiperiod Esscher measure is always, by construction, defined as a product of one period Esscher densities. We can however state a positive general result showing that if two densities are defined as products of one period densities, then convex ordering of the one period factors implies the convex ordering of the product densities:

This Lemma shows that in the general, non-i.i.d. case ...

Convex ordering of Esscher and minimal entropy martingale measures for discrete time models

In [7] the authors also provide another sufficient condition based on the elasticities of the pricing kernels.

That is, what extra assumption do we need to make or is there another result that gives the below?

Please give full details and steps.

Then the expectation of this process is infinite at any nonzero time; just compute it explicitly using LOTUS.

Or, since you are using discrete time, you can use the random walk with iid normal(0,1) steps as the martingale, and the same convex function. By the way, you don't need the bound on L1 to be uniform in time. For example, you definitely want a simple random walk to be a martingale, but the L1 norms are increasing as time goes on. Not sure if that's what you intended or not.
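To make the last remark concrete, here is a quick sketch that computes E|S_n| exactly for the simple random walk (steps of +1 or -1 with probability 1/2 each), using the binomial distribution of the number of up-steps. The function name and the chosen values of n are just for illustration.

```python
from math import comb

def expected_abs_simple_random_walk(n):
    """Exact E|S_n| where S_n is a sum of n iid +/-1 steps, each with probability 1/2."""
    # With k up-steps the walk sits at 2k - n, which has probability C(n, k) / 2^n.
    return sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1)) / 2 ** n

norms = [expected_abs_simple_random_walk(n) for n in (1, 4, 16, 64)]
print(norms)
```

The values keep growing with n, so each S_n is integrable but there is no bound on the L1 norm that holds uniformly in time.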


