Logical Entropy and Negative Probabilities in Quantum Mechanics

The concept of Logical Entropy, $S_L = 1- \sum_{i=1}^n p_i^2$, where the $p_i$ are normalized probabilities, was introduced by David Ellerman in a series of recent papers. Although the mathematical formula itself is not new, Ellerman provided a sound probabilistic interpretation of $S_L$ as a measure of the distinctions of a partition on a given set. The same formula emerges as a useful definition of entropy in quantum mechanics, where it is linked to the notion of purity of a quantum state. The quadratic form of the logical entropy lends itself to a generalization of the probabilities that includes negative values, an idea that goes back to Feynman and Wigner. Here, we analyze and reinterpret negative probabilities in the light of the concept of logical entropy. Several intriguing quantum-like properties of the logical entropy are derived and discussed in finite-dimensional spaces. For infinite-dimensional spaces (continuum), we show that, under the sole hypothesis that the logical entropy and the total probability are preserved in time, one obtains an evolution equation for the probability density that is basically identical to the quantum evolution of the Wigner function in phase space, at least when one considers only the momentum variable. This result suggests that the logical entropy plays a profound role in establishing the peculiar rules of quantum physics.


Introduction
As its title suggests, this work sits at the crossroads of three different topics: (a) an alternative definition of entropy, (b) the extension of standard probabilities to negative values, and (c) the relevance of the first two items to our understanding of quantum mechanics. Here, we will introduce each topic separately, before bringing them together in the following sections.

Logical entropy
"Logical entropy" is a concept introduced by David Ellerman in a series of works spanning the last decade [1,2]; see also Ellerman's paper in this Special Issue. Succinctly, logical entropy is based on the concept of distinctions. If a certain set $U$ is partitioned into a number $n$ of subsets $B_i$ (such that $\bigcup_{i=1}^n B_i = U$), each endowed with a probability $p_i$ of finding an element of $U$ in that subset, then the probability that in two independent draws one will obtain elements in distinct subsets $B_i$ and $B_{j \neq i}$ is $p_i (1 - p_i)$. This is precisely the concept of distinction, i.e., the ability to establish that two independent draws are different from one another.
Summing over all $n$ subsets, we obtain the total number of distinctions, which is the definition of the logical entropy $S_L$: $$S_L = \sum_{i=1}^n p_i (1 - p_i) = 1 - \sum_{i=1}^n p_i^2, \qquad (1)$$ where we used the fact that $\sum_i p_i = 1$. The subsets $B_i$ can possibly contain one single element, in which case $S_L$ represents the probability that two consecutive draws yield different elements of $U$. In this work, we will mainly consider this case, unless otherwise stated. It is clear that $0 \le S_L \le 1$. The lower bound is reached when one element has probability $p_i = 1$, while for all others $p_{j \neq i} = 0$. For equal probabilities ($p_i = 1/n$, $\forall i$), one gets $S_L = 1 - 1/n \to 1$ when $n \to \infty$.
Following [3,4], one can also define the information $I$ as the complement of the entropy to unity: $$I = 1 - S_L = \sum_{i=1}^n p_i^2. \qquad (2)$$ This quantity reflects the knowledge we have of the state of a physical system, being maximum when we know its state with certainty, and minimum when all states are equally probable. The information $I$ has the nice property of being the square of a norm in $\mathbb{R}^n$, namely the Euclidean norm. This connection to Euclidean geometry allows one to use standard geometrical concepts when making use of the logical entropy. For instance, one can define the scalar product $p \cdot q = \sum_{i=1}^n p_i q_i$ between two probability distributions $\{p_i\}$ and $\{q_i\}$, and their Euclidean distance $d(p, q)$ as: $$d(p, q) = \sqrt{\sum_{i=1}^n (p_i - q_i)^2}.$$ Of course, the logical entropy definition (1) implies very different properties from the standard Shannon-Von Neumann entropy $$S_{VN} = -\sum_{i=1}^n p_i \log p_i.$$ In particular, $S_{VN}$ is additive, while $S_L$ is not, at least not in the standard fashion, see [4,5]. For a system known with certainty, both entropies yield $S_{VN} = S_L = 0$, but for maximal uncertainty $S_{VN} = \log n$, whereas $S_L = 1 - 1/n$. Again we emphasize that, in contrast to the Shannon-Von Neumann entropy, the logical entropy $S_L$ represents both a probability (of obtaining different results in two consecutive draws, as mentioned above) and, through $I = 1 - S_L$, a squared norm in the Euclidean space $\mathbb{R}^n$. These facts have important consequences, as we will see in the next section.
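As a minimal numerical sketch of definitions (1) and (2), one may check that the probability of a distinction coincides with $S_L$ and compare the limiting values of the logical and Shannon entropies. The function names below are illustrative choices of ours, and natural logarithms are assumed for the Shannon entropy.

```python
import math

def logical_entropy(p):
    """Logical entropy S_L = 1 - sum_i p_i^2."""
    return 1.0 - sum(x * x for x in p)

def information(p):
    """Information I = 1 - S_L = sum_i p_i^2 (squared Euclidean norm)."""
    return sum(x * x for x in p)

def shannon_entropy(p):
    """Shannon entropy -sum_i p_i log p_i (only meaningful for p_i >= 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def distinction_probability(p):
    """Probability that two independent draws give distinct outcomes."""
    n = len(p)
    return sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)

n = 4
uniform = [1.0 / n] * n          # maximal uncertainty
certain = [1.0, 0.0, 0.0, 0.0]   # state known with certainty

print(logical_entropy(uniform), 1 - 1 / n)       # S_L = 1 - 1/n
print(shannon_entropy(uniform), math.log(n))     # S_VN = log n
print(logical_entropy(certain), shannon_entropy(certain))  # both vanish
```

The last helper sums $p_i p_j$ over all pairs $i \neq j$, which is the "two draws in distinct subsets" reading of $S_L$.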
Although Ellerman [1,2] provided a solid and fruitful probabilistic interpretation of this definition of entropy, the formulae (1) and (2) are not new. Quite the contrary, they have been discovered and rediscovered many times in the past, in very different areas of research. In biology and ecology, $S_L$ is known as the Gini-Simpson index [6-8], which quantifies the diversity of species in an ecosystem. It was used by Polish mathematicians (and then by Alan Turing himself) to find patterns in messages generated by the Enigma machine during World War 2 [9]. In statistical mechanics, $S_L$ is a special case of the Tsallis entropy [10] with index $q = 2$. In quantum physics, a version of $S_L$ was used to quantify our knowledge of the state of a quantum system [3,11]. It was also shown to be particularly well suited to the Wigner phase-space representation of quantum mechanics [4].

Negative probabilities
The very definitions of $S_L$ and $I$ lend themselves to the natural generalization whereby the probabilities $p_i$ can take negative values. This is in analogy with vectors in $\mathbb{R}^n$, which can indeed have negative components, although their norm remains positive.
Negative probabilities have a long history of interest, especially among physicists struggling to make sense of some of the weird properties of quantum mechanics. Feynman [12] was one of the first to ponder the meaning of negative probabilities in a quantum context (although he published his ideas in 1987 in a volume in honor of David Bohm, he states there that he developed these reflections some twenty years earlier). For Feynman, negative probabilities should be considered as a useful bookkeeping tool, just like negative numbers. As an example, he mentions a man starting a day with five apples, giving away ten at midday and earning eight in the evening. The initial (5) and final (3) numbers of apples owned by the man are both positive and thus unambiguous to interpret. But if we take the numbers at face value, the man will have $-5$ apples some time in the afternoon, which does not quite make sense unless we postulate that one is allowed to count the number of apples only in the morning and in the evening, but not in the middle of the day. Hence, negative probabilities are allowed as long as they intervene in contexts where they cannot be observed directly. All this is reminiscent of the limitations on measuring some quantities, which are intrinsic to quantum physics [13,14].
Of course, negative probabilities had appeared in quantum mechanics even earlier, when Wigner [15] introduced his celebrated pseudo-probability distribution in the classical phase space ("Wigner function"), which almost always takes negative values. Indeed, the negativity of a Wigner function can be used as a tool to quantify the degree of quantumness of a particular state, as was done even experimentally [16].
Negative probabilities have also been studied in a fundamental mathematical context [17-20] and for applications to financial modeling [21]. A thorough, if not very recent, review on the topic of negative probabilities in physics was published in 1986 [22], and contains quotations from several eminent scientists on this somewhat controversial problem.

Quantum mechanics
The earliest relationship between negative probabilities and quantum mechanics dates back to Wigner [15], who in 1932 introduced a pseudo-probability distribution in the phase space $(x, p)$ which possesses many of the properties of classical probability distributions (for instance, it can be used to compute averages using the classical formula), except non-negativity. The Wigner function $w(x, p, t)$ can describe both pure and mixed quantum states and evolves in time according to an integro-differential equation similar to the classical Liouville equation. Wigner functions have proven exceedingly useful in a variety of domains, ranging from condensed matter and nanophysics, to quantum plasmas and quantum optics (see [23] for a review).
The Wigner equation conserves in time not only the total probability $\iint w(x, p, t)\,dx\,dp$, but also the integral of the square of the Wigner function, $\iint w^2(x, p, t)\,dx\,dp$. Note that higher powers $\iint w^r\,dx\,dp$, with $r > 2$, are not conserved, in contrast to the classical Liouville equation, for which the conservation property is valid for any value of $r$. Some time ago, the present author suggested that one use $$S_L = 1 - h \iint w^2(x, p, t)\,dx\,dp, \qquad I = h \iint w^2(x, p, t)\,dx\,dp \qquad (5)$$ as the definitions of entropy and information [4], where $h$ is Planck's constant (this is necessary to render the integral term in the above expression non-dimensional). Equation (5) can be viewed as the continuous counterpart of equation (1), i.e., its extension to an infinite-dimensional space. Also note that the logical entropy can be expressed in terms of the trace of the density operator $\hat{\rho}$, as $S_L = 1 - \mathrm{Tr}(\hat{\rho}^2)$.
More recently, negative probabilities have been explored in various quantum mechanical contexts, such as indistinguishability [24], quantum computation [25], and contextuality [26]. Besides, an operational interpretation of negative probabilities has been proposed by Abramsky and Brandenburger [27,28]. In [28], they propose a simple scenario to illustrate pedagogically the use of negative probabilities in quantum mechanics, by considering a system comprising two-bit registers.
The rest of this work is devoted to the study of the properties of the logical entropy (1) and information (2) when one relaxes the requirement that $p_i \ge 0$, $\forall i$. We will argue that the logical entropy constitutes the natural framework for the introduction of negative probabilities. Interestingly, by combining the definition of logical entropy with negative probabilities, one can recover many properties that are typical of quantum systems.
The main result obtained here is that, simply by requiring the logical entropy to be conserved in time, one obtains an evolution equation for the probability density that is virtually identical to the evolution equation of the Wigner function in quantum physics, at least when one considers only the momentum variable. This remarkable result suggests that the logical entropy plays a profound role in establishing the peculiar rules of quantum physics.

Finite-dimensional spaces
We consider a set of $n$ outcomes, each endowed with probability $p_i$. The probabilities satisfy $$\sum_{i=1}^n p_i = 1, \qquad (6)$$ $$\sum_{i=1}^n p_i^2 = R^2, \qquad (7)$$ where $0 \le R \le 1$. Then the logical entropy and information are, respectively, $S_L = 1 - R^2$ and $I = R^2$. The number $R$ can be interpreted as the Euclidean norm of the vector $p = (p_1, \dots, p_n)$ in $\mathbb{R}^n$: $\|p\| = R$. Geometrically, equations (6) and (7) represent respectively a hyperplane and a hypersphere of radius $R$ in $\mathbb{R}^n$, and their intersection yields the probability distributions $\{p_i\}$ satisfying those equations.
In analogy with quantum physics, we shall call pure states the probability distributions for which $R = 1$ (corresponding to maximum information and minimum entropy) and mixed states those for which $R < 1$. Indeed, Wigner functions for pure and mixed quantum states satisfy precisely these properties, when the entropy is defined as in equation (5). If we request all probabilities to be nonnegative, then the only pure states are those for which $p_i = 1$ and $p_{j \neq i} = 0$, that is, the $i$th outcome can be predicted with certainty. However, if we admit negative probabilities, there exist other pure states with some $p_i < 0$ which still satisfy equations (6) and (7) with $R = 1$.
To dispel any ambiguity, here we are not dealing with "probability amplitudes" as in quantum mechanics. Probability amplitudes are complex quantities, while our $p_i$ are real numbers, albeit potentially negative. Our approach is the same as the one based on Wigner functions (also real quantities), which represent quantum states with real, but signed, numbers.
In the rest of the present section, we will focus on the cases $n = 2$, which is trivial and does not admit negative probabilities, and $n = 3$, which is much richer. The infinite-dimensional case will be treated in Section 3.

General properties for n = 2 and n = 3
For $n = 2$, the solution is given by the intersection of the straight line and the circle shown in Figure 1. It is clear that for $R \le 1$, only nonnegative solutions are allowed. Solving equations (6) and (7) yields $$p_{1,2} = \frac{1 \pm \sqrt{2R^2 - 1}}{2}.$$ No solutions exist for $R < \sqrt{2}/2$. For $R = \sqrt{2}/2$ (dashed straight line tangent to the circle), one obtains $p_1 = p_2 = 1/2$, which is the maximally mixed state (with largest entropy $S_L = 1/2$).
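For $n = 2$ the two constraints can be solved by hand: eliminating $p_2 = 1 - p_1$ from equation (7) leaves a quadratic in $p_1$ with roots $(1 \pm \sqrt{2R^2 - 1})/2$. The short sketch below evaluates these roots; the function name and the small rounding tolerance are our own illustrative choices.

```python
import math

def two_outcome_solutions(R, tol=1e-12):
    """Solve p1 + p2 = 1 and p1^2 + p2^2 = R^2.

    Returns the pair (p1, p2), or None when the line misses the circle.
    """
    disc = 2.0 * R * R - 1.0
    if disc < -tol:
        return None                      # R < sqrt(2)/2: no real solution
    root = math.sqrt(max(disc, 0.0))     # clamp rounding noise at tangency
    return ((1.0 + root) / 2.0, (1.0 - root) / 2.0)

print(two_outcome_solutions(1.0))                 # pure state
print(two_outcome_solutions(math.sqrt(0.5)))      # tangency: maximally mixed
print(two_outcome_solutions(0.5))                 # no solution
```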
The case $n = 3$ is depicted schematically in Figure 2a for the special case $R = 1$ (pure states). It is evident that there are three pure states with nonnegative probabilities: $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$, which represent certainty for one of the three possible outcomes. These states form an orthonormal basis which we denote by $e_i$. However, there exists an infinity of other pure states with negative probabilities. These are the states that lie on the circle given by the intersection between the sphere of radius $R$ and the plane $\pi$ defined by the tips of the three vectors $e_i$. Actually, all pure states, except $e_1$, $e_2$ and $e_3$, feature some negative probabilities. A simple example is the state $p = (2/3, 2/3, -1/3)$.
A view of the plane $\pi$ is shown in Figure 2b. The circles represent the intersections of the plane and the sphere, for different values of the radius $R$. Points that lie outside the equilateral triangle ABC (with sides $a = \sqrt{2}$) have negative probabilities. The circumscribed circle, corresponding to $R = 1$, has radius $r_e = a/\sqrt{3} = \sqrt{6}/3$. For information $I = R^2$ smaller than unity, i.e. for mixed states, there are some positive and negative solutions (thin red circle in Fig. 2b). Further decreasing $R$, we reach the situation of the inner circle of radius $r_i = r_e/2 = \sqrt{6}/6$, for which all probabilities are nonnegative. To determine the value of $R$ corresponding to $r_i$, we consider the cone of vertex $O$ and base radius $r_i$ (see Fig. 2c). The height $k$ of the cone is the distance between the origin $O$ and the plane $\pi$, which turns out to be $k = 1/\sqrt{3}$. From this, we deduce that the radius $R$ corresponding to the inner circle in Figure 2b is $R = \sqrt{k^2 + r_i^2} = 1/\sqrt{2}$. For $R = 1/\sqrt{3}$, the sphere is tangent to the plane $\pi$, and the only solution is $p_1 = p_2 = p_3 = 1/3$, corresponding to maximum entropy $S_L = 2/3$. For smaller $R$, there are no solutions.
In summary, defining the radii $R_{\min} = 1/\sqrt{3}$, $R_{\rm pos} = 1/\sqrt{2}$ and $R_{\max} = 1$, we obtain that: for $R_{\rm pos} < R \le R_{\max}$, there exist some negative-probability solutions; for $R_{\min} < R \le R_{\rm pos}$, there exist only positive-probability solutions; for $R = R_{\min}$, the only solution is the maximum entropy one, $p_1 = p_2 = p_3 = 1/3$; for $R < R_{\min}$, there exist no solutions.
The above considerations can be easily extended to $n > 3$, yielding $R_{\min} = 1/\sqrt{n}$, $R_{\rm pos} = 1/\sqrt{n-1}$ and $R_{\max} = 1$, with maximum entropy solution $p_i = 1/n$, $\forall i$. We note that for large $n$, one has $R_{\min} \approx R_{\rm pos} \sim 1/\sqrt{n}$. Therefore, almost all existing solutions will display some negative values.
Finally, we stress that, as this simple example with $n = 3$ shows, negative probabilities arise very naturally if the $p_i$ are requested to satisfy the two equations (6) and (7), which fix the total probability and total entropy (or information) of the system. Indeed, for a mixed state such as described by the thin red circle in Figure 2b, it would be odd to retain only the positive-probability solutions (inside the triangle) and discard the negative ones (outside the triangle). Hence, the entropy definition (7) calls for the acceptance of negative probabilities on the same footing as positive ones.
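The geometry for $n = 3$ can be explored numerically by parametrizing the circle where the hyperplane (6) meets the sphere (7). The sketch below is only illustrative: the orthonormal pair `u`, `v` spanning the plane, the function names and the sampling resolution are our own choices, and the threshold radii are taken as $1/\sqrt{n}$ and $1/\sqrt{n-1}$.

```python
import math

def radii(n):
    """Return (R_min, R_pos, R_max) for n outcomes."""
    return (1.0 / math.sqrt(n), 1.0 / math.sqrt(n - 1), 1.0)

def circle_point(R, theta):
    """Point with sum p_i = 1 and sum p_i^2 = R^2 for n = 3."""
    r = math.sqrt(max(R * R - 1.0 / 3.0, 0.0))    # circle radius in the plane
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    v = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))
    # u and v are orthonormal and both orthogonal to (1, 1, 1)
    return tuple(1.0 / 3.0 + r * (math.cos(theta) * a + math.sin(theta) * b)
                 for a, b in zip(u, v))

def has_negative(R, samples=360):
    """True if some sampled point of the circle has a negative component."""
    return any(min(circle_point(R, 2 * math.pi * k / samples)) < -1e-12
               for k in range(samples))

R_min, R_pos, R_max = radii(3)
print(has_negative(R_max))               # pure states: negative values appear
print(has_negative(R_pos))               # inscribed circle: none
print(circle_point(1.0, math.pi / 2))    # close to (2/3, 2/3, -1/3)
```

Sampling the angle is enough here because the most negative component on each circle is reached at an angle that belongs to the chosen grid.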

Maximization with constraints
We would like to maximize the entropy $S_L$ (minimize the information) with a constraint. This is analogous to the statistical mechanics problem of finding the equilibrium probability distribution that maximizes entropy for given energy, which yields the Maxwellian distribution if one uses the Shannon-Von Neumann entropy. Let us call $X$ the constrained variable, which has the mean value $m \equiv \langle X \rangle = \sum_i p_i X_i$. The functional $F$ to be minimized is given by the information $I$ augmented by two constraints on the total probability and the average of $X$: $$F = \sum_i p_i^2 - \lambda \sum_i p_i + \mu \sum_i p_i X_i,$$ where $\lambda$ and $\mu$ are Lagrange multipliers. Setting the variation of $F$ to zero, i.e. $$\frac{\partial F}{\partial p_i} = 2 p_i - \lambda + \mu X_i = 0,$$ one gets $$p_i = \frac{\lambda - \mu X_i}{2}.$$ The Lagrange multipliers are determined by using the constraints $\sum_i p_i = 1$ and $\sum_i p_i X_i = m$. As an example, we take again $n = 3$ and $X_i = (-1, 0, 1)$. This choice yields $\lambda = 2/3$, $\mu = -m$, and the "equilibrium" probability distribution: $$p = \left( \frac{1}{3} - \frac{m}{2},\ \frac{1}{3},\ \frac{1}{3} + \frac{m}{2} \right).$$

The total information is $$I = \sum_i p_i^2 = \frac{1}{3} + \frac{m^2}{2}.$$ As it must be smaller than or equal to unity, we have a constraint on the maximum mean value allowed for the variable $X$: $m \le 2/\sqrt{3} \equiv m_{\max} \approx 1.15$. For $m = m_{\max}$, we obtain the pure state $$p = \left( \frac{1}{3} - \frac{1}{\sqrt{3}},\ \frac{1}{3},\ \frac{1}{3} + \frac{1}{\sqrt{3}} \right), \qquad (11)$$ for which $p_1 < 0$. Indeed, $p_1$ is negative whenever $2/3 < m \le m_{\max}$. For $m < 2/3$ all probabilities become positive and for $m = 0$ we recover the maximally mixed state with all probabilities equal to 1/3. Similar considerations apply for the symmetric cases with negative $m$.
The above situation can be viewed as that of a die with three faces. For $m = 0$ the die is even, and all faces are equally probable. Hence, $|m|$ may be interpreted as an index of unevenness of the die. Classically, i.e. only allowing positive probabilities, the most uneven die is obtained for $m = 2/3$, yielding the state $p = (0, 1/3, 2/3)$ (for $m = -2/3$ the roles of $p_1$ and $p_3$ are interchanged), which has information $I = 5/9$. But if we admit negative probabilities, $m$ can be increased up to $m_{\max} = 2/\sqrt{3}$, which gives the state of equation (11), with information $I = 1$.
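The constrained minimization above can be reproduced for any list of values $X_i$. Because the stationarity condition makes each $p_i$ a linear function of $X_i$, it suffices to solve a two-by-two linear system for the two coefficients. The sketch below does this; the function name and the coefficients `a`, `b` are our own notation (they are combinations of the two Lagrange multipliers), not symbols of the original derivation.

```python
import math

def min_information_distribution(X, m):
    """Minimize I = sum p_i^2 subject to sum p_i = 1 and sum p_i X_i = m.

    Stationarity gives p_i = a + b * X_i; a and b follow from the constraints.
    """
    n = len(X)
    s1 = sum(X)
    s2 = sum(x * x for x in X)
    det = n * s2 - s1 * s1           # nonzero unless all X_i coincide
    a = (s2 - m * s1) / det
    b = (n * m - s1) / det
    return [a + b * x for x in X]

X = [-1, 0, 1]
for m in (0.0, 2.0 / 3.0, 2.0 / math.sqrt(3.0)):
    p = min_information_distribution(X, m)
    info = sum(x * x for x in p)
    print(round(m, 4), [round(x, 4) for x in p], round(info, 4))
```

For the three-faced die this reproduces the even die at $m = 0$, the most uneven classical die at $m = 2/3$, and the pure state with one negative probability at $m = 2/\sqrt{3}$.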

Interpretation
The existence of negative probabilities induces some nonstandard properties that are reminiscent of the paradoxes encountered in quantum physics. For example, let us consider a pure state $p$ with $n = 3$ and assimilate the three possible outcomes to the colors of marbles drawn from a bag: red (R), blue (B) and green (G). As for all pure states, the probability to get the same color in two consecutive draws is $I = 1$, while the probability to get different colors is $S_L = 0$. Let us suppose that we draw a number of marbles, but do not look at their colors for the moment (Fig. 3a). Then we look at the second and third marble and observe that they have the same color (as they should), namely red. Subsequently, we look at the sixth and seventh marble and notice they are both blue (Fig. 3b).
So far, all is in agreement with our expectations. But what would have happened if we had first looked at marbles number 3 and 6 (Fig. 3c)? According to the previous "experiment", they should be of different colors (red and blue), but this is not allowed by the probability distribution of a pure state. Hence, we should find that they have the same color, which is in contradiction with the experiment (b) on the figure. We are forced to conclude that the marbles do not have a predefined color prior to the observation, something that is typical for quantum objects [29-31].
As a second example, let us consider two probability distributions $p = (2/3, 2/3, -1/3)$ and $q = (-1/3, 2/3, 2/3)$, which can be visualized as two different bags containing red (R), blue (B) and green (G) marbles in different proportions. They are both pure states and orthogonal to each other, $p \cdot q = \sum_i p_i q_i = 0$. The latter property means that the outcomes of the two bags are perfectly anticorrelated, i.e. if the outcome of the first bag is R then that of the second bag must be not R (denoted $\bar{R}$). We draw pairs of marbles from each bag. From the second bag, the probability of drawing a pair of red marbles is $\mathrm{Prob}_q(RR) = q_1^2 = 1/9$. Since the outcome of the first bag is perfectly anticorrelated with that of the second bag, this number should also represent the probability of not drawing a pair of red marbles from the first bag. However, if we compute the same probability using the distribution $p$ of the first bag, we obtain $\mathrm{Prob}_p(\bar{R}\bar{R}) = 1 - p_1^2 = 5/9$, which is manifestly different. This example shows that the following two procedures are mutually exclusive: (i) drawing one marble from bag 1 and another from bag 2, which gives perfectly anticorrelated results; (ii) drawing two marbles from either bag, which yields perfectly correlated results. If two experimentalists draw a marble from each bag and then communicate their results, they always observe anticorrelation. However, once they have done so, they cannot use this knowledge to predict their next draw by using the correlation property of each bag, because the latter is valid only if pairs of marbles are observed together.
(Remember that the logical entropy quantifies distinctions between two draws, but says nothing about single draws. Indeed the outcome of a single draw is meaningless, as its probability can be negative; only pairs of consecutive draws are meaningful.) Similarly, if one experimentalist observes BB in one bag and communicates this result to the second experimentalist, the latter cannot use it to predict that her next draw will be BB, because the anticorrelation property holds only as long as both elements of the draw are still unknown.
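The arithmetic of the two-bag example can be checked exactly with rational numbers. The sketch below uses the two distributions of the example, ordered as (R, B, G); the helper `dot` is an illustrative name of ours.

```python
from fractions import Fraction as F

p = (F(2, 3), F(2, 3), F(-1, 3))   # first bag, ordered as (R, B, G)
q = (F(-1, 3), F(2, 3), F(2, 3))   # second bag, ordered as (R, B, G)

def dot(a, b):
    """Euclidean scalar product of two distributions."""
    return sum(x * y for x, y in zip(a, b))

print(sum(p), sum(q))        # both bags are normalized
print(dot(p, p), dot(q, q))  # information of each bag: pure states
print(dot(p, q))             # orthogonality of the two bags
print(q[0] ** 2)             # pair of red marbles from the second bag
print(1 - p[0] ** 2)         # "not a pair of reds" from the first bag
```

The last two printed values differ, which is the mismatch discussed in the text.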

Dynamics
The probability distribution $p(t)$ should evolve in a way that preserves both the total probability (of course) and the total information or entropy. In 3D this is possible only if the vector $p$ performs a rotation around the axis perpendicular to the plane $\pi$ and going through the origin $O$ (see Fig. 2a). This can be viewed as a rotation around a vector $\omega$ directed along $(1, 1, 1)$, which yields the evolution equation $$\frac{dp}{dt} = \omega \times p, \qquad (12)$$ where $\times$ denotes the standard 3D cross product. However, the representation using the vector product cannot be readily extended to dimensions $n > 3$, so it is more useful to write the above equation in matrix form: $$\frac{dp_i}{dt} = \sum_{j=1}^n M_{ij}\, p_j, \qquad (13)$$ where $M = \{M_{ij}\}$ is an antisymmetric matrix satisfying $M_{ij} = -M_{ji}$ and $\sum_i M_{ij} = \sum_j M_{ij} = 0$. The latter conditions guarantee that the total probability and the total information are indeed conserved during the evolution.
The above matrix form of the evolution equation (13) is readily adapted to higher dimensions, and will be generalized to infinite-dimensional systems (continuum) in the next section.
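For $n = 3$ the rotation can be made explicit. The sketch below writes the generator of rotations about the $(1, 1, 1)$ axis as a matrix, which is antisymmetric with vanishing row and column sums, and applies a finite rotation through Rodrigues' formula (our choice of method, used here instead of integrating the evolution equation in time). The function name and the sampled angles are illustrative.

```python
import math

# Matrix of the cross product (1, 1, 1) x p: antisymmetric, and every
# row and column sums to zero.
M = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]

def rotate(p, angle):
    """Rotate p by `angle` about the unit vector (1, 1, 1)/sqrt(3)."""
    k = 1.0 / math.sqrt(3.0)
    kxp = [k * sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
    kdotp = k * sum(p)
    c, s = math.cos(angle), math.sin(angle)
    # Rodrigues' formula: p cos(a) + (k x p) sin(a) + k (k.p) (1 - cos(a))
    return [p[i] * c + kxp[i] * s + k * kdotp * (1.0 - c) for i in range(3)]

p0 = [1.0, 0.0, 0.0]                       # certainty for the first outcome
for step in range(7):
    p = rotate(p0, step * math.pi / 9)
    total = sum(p)                          # conserved total probability
    info = sum(x * x for x in p)            # conserved information
    print([round(x, 4) for x in p], round(total, 12), round(info, 12))
```

Starting from the state $(1, 0, 0)$, a rotation by $\pi/3$ lands on $(2/3, 2/3, -1/3)$, and a rotation by $2\pi/3$ on $(0, 1, 0)$: the negative-probability pure states are the intermediate points between the classical certainty states.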

Infinite-dimensional spaces

Generalities
The logical entropy and information can be generalized to an infinite-dimensional system, i.e. in the continuum.
We define the probability density $f(z)$, with $z \in \mathbb{R}$, normalized so that $$\int f(z)\,dz = 1.$$ Then the logical entropy and the information are defined as follows [4]: $$S_L = 1 - h \int f^2(z)\,dz, \qquad I = h \int f^2(z)\,dz,$$ where the constant $h$ has the same dimensions as $z$, and $f$ has the dimensions of $h^{-1}$. The so-defined information is basically the square of the $L^2$ norm in the space of real square-integrable functions.
Given the arbitrariness of the constant $h$, it is not automatic that $0 \le S_L \le 1$: some very peaked functions of $z$ may yield an entropy that is negative, or equivalently an information greater than unity. Hence, we require that $0 \le S_L \le 1$, and restrict the space of allowed probability densities to those whose entropy satisfies this condition.
A useful bound on $f(z)$, which is reminiscent of the bound on Wigner functions [23], can be obtained as follows. Let us consider pure states ($I = 1$) and write This can be reformulated as Setting the integrand equal to zero yields: which is an integral equation for $f(z)$. Finally, using the Cauchy-Schwarz inequality, we get from which we deduce the bound $$\max_z |f(z)| = \frac{\sqrt{2}}{h}. \qquad (15)$$ Obviously, the above bound limits the peakedness of $f(z)$ for a given value of $h$. For a mixed state with information $I < 1$, the bound becomes $\max |f| = \sqrt{2I}/h$. For instance, if the probability density is a Gaussian with standard deviation $\sigma$, $f(z) = e^{-z^2/2\sigma^2}/(\sqrt{2\pi}\,\sigma)$, and we require that $I = 1$, we obtain $\sigma = h/(2\sqrt{\pi})$. For this value, the peak $f(0) = \sqrt{2}/h$ is exactly the maximum of equation (15), showing that the bound is saturated for a Gaussian distribution of unit information (pure state). For $\sigma > h/(2\sqrt{\pi})$, we have $I < 1$, i.e. a mixed state. All this is similar to a bound that can be obtained on the quantum Wigner function $w(x, p)$ [23], where $x$ and $p$ are respectively position and momentum: $\max_{x,p} |w(x, p)| = 2/h$, where here $h$ is Planck's constant. The additional factor $\sqrt{2}$ is due to the fact that the maximization is done in the 2D phase space $(x, p)$ instead of the 1D space $(z)$ considered above. These considerations establish a suggestive link between the present results and the properties of quantum mechanics, on which we will further elaborate in the forthcoming subsections.
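The Gaussian example lends itself to a direct numerical check. The sketch below integrates $f^2$ with a simple midpoint rule and compares the peak of the density with $\sqrt{2}/h$; the value $h = 1$, the integration window and the number of steps are arbitrary assumptions made only for illustration.

```python
import math

def gaussian(z, sigma):
    """Normalized Gaussian density with standard deviation sigma."""
    return math.exp(-z * z / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def information(sigma, h, half_width=12.0, steps=20000):
    """I = h * integral of f^2 dz, midpoint rule on [-half_width*sigma, half_width*sigma]."""
    start = -half_width * sigma
    dz = 2.0 * half_width * sigma / steps
    return h * dz * sum(gaussian(start + (k + 0.5) * dz, sigma) ** 2
                        for k in range(steps))

h = 1.0                                        # arbitrary units (assumption)
sigma_pure = h / (2.0 * math.sqrt(math.pi))    # width giving unit information

print(information(sigma_pure, h))              # close to 1 (pure state)
print(gaussian(0.0, sigma_pure) * h)           # close to sqrt(2)
print(information(2.0 * sigma_pure, h))        # close to 0.5 (mixed state)
```

Doubling the width halves the information, in line with the analytical value $I = h/(2\sqrt{\pi}\,\sigma)$ for a Gaussian.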

Dynamics
The time evolution of the probability density $f(z, t)$ must preserve both the total probability and the entropy, hence it has to be a rotation in the appropriate functional space. In analogy with the finite-dimensional case, see equations (12) and (13), we write the general evolution equation for $f(z, t)$ as $$\frac{\partial f(z, t)}{\partial t} = \int M(z, z')\, f(z', t)\,dz', \qquad (16)$$ where $M$ must be antisymmetric: $M(z, z') = -M(z', z)$. In order to preserve the total probability in time, one should also have $\int M(z, z')\,dz = 0 = \int M(z, z')\,dz'$, which follows immediately upon integrating (16) over $z$. Further, by multiplying equation (16) by $f(z, t)$ and integrating, we obtain $$\frac{1}{2} \frac{d}{dt} \int f^2(z, t)\,dz = \iint f(z, t)\, M(z, z')\, f(z', t)\,dz\,dz' = 0.$$ The last equality follows because the function $u(z, z') \equiv f(z, t) M(z, z') f(z', t)$ is such that $u(z, z') = -u(z', z)$, hence it is odd with respect to the diagonal of the $(z, z')$ plane, and integration over the whole plane yields zero. The two-variable function $M(z, z')$ can be conveniently written as $M(z, z') = m(z - z')$, where $m(\zeta)$ is a single-variable odd function: $m(\zeta) = -m(-\zeta)$. The so-constructed $M(z, z')$ satisfies all the properties mentioned in the preceding paragraph. Hence, we rewrite: We have included explicitly the constant $h$ in the evolution equation for further comparison with Wigner functions. With this choice, $m$ has the dimensions of an inverse time.
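The two conservation properties can be illustrated on a discretized version of the continuum evolution. The sketch below makes several simplifying assumptions of our own: a periodic grid of $N$ points with unit spacing, a kernel that depends only on the (circular) difference of the two indices through an odd function, and arbitrary sample values for that function and for $f$. It then evaluates the instantaneous rates of change of $\sum_i f_i$ and $\sum_i f_i^2$.

```python
import math

def build_kernel(N, m_half):
    """Kernel M[i][j] = m[(i - j) mod N] built from an odd function m.

    m_half lists m(1), m(2), ...; oddness fixes m(-d) = -m(d) and m(0) = 0.
    Requires len(m_half) < N / 2.
    """
    m = [0.0] * N
    for d, value in enumerate(m_half, start=1):
        m[d] = value
        m[N - d] = -value
    return [[m[(i - j) % N] for j in range(N)] for i in range(N)]

def rates(M, f):
    """Rates of change of sum(f) and sum(f^2) under df_i/dt = sum_j M_ij f_j."""
    n = len(f)
    dfdt = [sum(M[i][j] * f[j] for j in range(n)) for i in range(n)]
    return sum(dfdt), 2.0 * sum(fi * gi for fi, gi in zip(f, dfdt))

N = 16
M = build_kernel(N, [0.7, -0.3, 0.1])            # arbitrary odd kernel
f = [math.exp(-0.5 * ((i - 5) / 1.5) ** 2) - 0.2 * math.cos(i) for i in range(N)]

print(rates(M, f))    # both rates vanish up to rounding errors
```

The sample density deliberately takes some negative values: the conservation of both quantities relies only on the antisymmetry of the kernel and on its vanishing row and column sums, not on the sign of $f$.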
We now write $m(\zeta)$ in terms of its Fourier transform If $\hat{m}(k) = -\hat{m}(-k)$ then it follows that $m(\zeta)$ is indeed an odd function. As the Fourier transform of an odd real function is purely imaginary, we also have that $m(\zeta)$ is real, as intended. Note that $k$ is dimensionless. Let us now write the odd function $\hat{m}(k)$ as follows, without loss of generality: where $a$ is a constant. Inserting all these definitions into the evolution equation (17), one gets In the next section we will show that this equation is basically identical to the quantum evolution equation of the Wigner function.

Relationship to quantum mechanics
Equation (19) was built purely on the two assumptions that the total probability and the logical entropy should be conserved in time. It is therefore striking that this equation bears a close resemblance to the evolution equation for the Wigner function $w$ in quantum mechanics [15,23], as will be discussed shortly.
The Wigner formalism is a representation of quantum mechanics in the classical position-momentum phase space $(x, p)$, which is strictly equivalent to the more usual Schrödinger or Heisenberg pictures. The state of a quantum system, either pure or mixed, is defined by a real function $w(x, p, t)$. The Wigner function is constructed from the wave function for a pure quantum state or from the density matrix for a mixed state. The Wigner function possesses many of the properties of standard probability distributions. For instance, it can be used to compute the average of a phase-space variable $A(x, p)$ as $\langle A \rangle = \iint w(x, p) A(x, p)\,dx\,dp$, where we have assumed the normalization $\iint w(x, p)\,dx\,dp = 1$. However, $w$ can take negative values, which precludes the possibility of interpreting it as a true probability density.
The Wigner function evolves in time according to an integro-differential equation that reads as: where $V(x)$ is the potential energy. Interestingly, the above evolution equation preserves in time both $\iint w\,dx\,dp$ and $\iint w^2\,dx\,dp$, but not higher powers of $w$. This fact has motivated choosing the logical entropy as the natural definition of entropy in Wigner's quantum mechanics [4]. Now, we consider a Wigner function concentrated near a position $x = a$ and write $w(x, p, t) = w(p, t)\,\delta(x - a)$, where $\delta$ is the Dirac delta function. We also define $\Omega(x) \equiv 2\pi V(x)/h$, which has the dimensions of an inverse time. Substituting into equation (20) and integrating over $x$ yields which is identical to equation (19) with the correspondence $z \leftrightarrow p$.
It is quite remarkable that, based on the sole assumption that the probability density $f(z, t)$ preserves the total probability and the information (or entropy), we were able to construct an evolution equation (19) that is identical to the evolution equation of the Wigner function. In other words, the quantum evolution appears to stem uniquely from the property of conservation of the logical entropy (apart from the trivial conservation of total probability). This fundamental role played by the quantity $\iint w^2\,dx\,dp$ had already been noticed in earlier works [4,32]. An important caveat is that the probability density $f(z, t)$ depends on the single variable $z$ (plus time), whereas the Wigner function depends on the two phase-space variables $x$ and $p$. For that reason, we had to consider a Wigner function that is localized in space (Dirac delta function) in order to establish the equivalence with the Wigner evolution equation. This is a significant difference, because it means overlooking a crucial feature of quantum physics, namely the existence of conjugate variables like position and momentum, whose simultaneous measurement is forbidden by the Heisenberg uncertainty principle.
In order to recover the full Wigner equation, we should work with probability distributions which, in the finite-dimensional case, depend on two indexes, such as $p_{ij}$, i.e., a matrix or tensor. The appropriate norm here appears to be the Frobenius norm $\|p\| = \sqrt{\sum_{i,j} p_{ij}^2}$, with the information defined as $I = \|p\|^2$. Then, in order to establish an evolution equation that preserves the norm, one would need to define a rotation of the tensor $p_{ij}$ in the appropriate space. The generalization to an infinite-dimensional space should lead to an evolution equation for a two-variable probability density $f(z_1, z_2, t)$, which will have to be compared to the full Wigner equation (20) for $w(x, p, t)$. This extension is left for future work.

Conclusions
In this work, we made use of the definition of logical entropy and information to extend the notion of probability to negative values. Although negative probabilities have been considered extensively in the past (and often dismissed as unphysical), we argued that they fit nicely within the framework of the logical entropy. Indeed, rejecting negative probabilities would appear as rather arbitrary and odd if one trusts the definition of logical entropy.
Our strategy was to posit that all normalized probability distributions $\{p_i\}$ for which the logical entropy lies in the interval $[0, 1]$ are allowed, irrespective of the sign of the $p_i$. Of course, the constraint on the entropy limits the absolute negative values that can be taken by the probabilities.
We also pointed out that the logical information has a straightforward interpretation as the square of the Euclidean norm of the probability vector in $\mathbb{R}^n$, or of the $L^2$ norm in the case of a continuous probability density. This simple geometric property is extremely fruitful to derive various interesting properties. In particular, the set of allowed probability distributions may be seen as the intersection of a hypersphere and a hyperplane in $\mathbb{R}^n$.
In order for the total probability and entropy to be conserved in time, the probability vector must rotate in the appropriate space, and this rotation is defined by an antisymmetric matrix. We next generalized this rotation to the infinite-dimensional case (continuum). Quite remarkably, this leads to an evolution equation for the probability density $f(z, t)$ that is virtually identical to the Wigner equation for a quantum system, at least when one considers only the momentum variable. These findings highlight the fundamental role played by the logical entropy in the mathematical structure of quantum mechanics.
Our future program is to prove that the full Wigner formulation of nonrelativistic quantum mechanics may be deduced from just two simple postulates: (i) conservation of the total probability $\iint w(x, p, t)\,dx\,dp$ and (ii) conservation of the logical information $h \iint w^2(x, p, t)\,dx\,dp$. For this, one should extend the present derivation to probability densities that depend on two variables, namely position and momentum. Once realized, this program would establish an alternative axiomatic foundation to nonrelativistic quantum mechanics.

Figure 1. Schematic representation of the case $n = 2$. The total probability constraint (6) is represented by the dashed straight line, while the entropy constraint (7) is represented by the blue quarter circle of radius $R$. Solutions are given by their intersections.

Figure 2. Schematic representation of the case $n = 3$. (a) The solutions of equations (6) and (7) lie on the circle given by the intersection of the sphere of radius $R$ (here represented for the pure state with $R = 1$) and the plane $\pi$ passing through the points A, B and C. The dashed triangle ABC, lying on the plane $\pi$, has all sides equal to $a = \sqrt{2}$. The inner circle is inscribed into the triangle. (b) View of the plane $\pi$ with the circumscribed and inscribed circles of radii $r_e$ and $r_i$, which correspond, respectively, to values $R = 1$ and $R = 1/\sqrt{2}$. The thin red circle is an intermediate case where positive and negative probabilities coexist (the latter lie outside the triangle). (c) Circular cone with apex at the origin $O$ and base circle of radius $r_i$. The height $k$ of the cone is the distance between the origin and the plane $\pi$.

Figure 3. (a) Nine particles are drawn from a probability distribution $p$, corresponding to a pure state, but they are not yet observed. (b) We look at particles 2 and 3, which turn out to be both red, and then look at particles 6 and 7, which turn out to be both blue. (c) Had we looked at particles 3 and 6, we would have expected them to be of the same color, but this is in contradiction with the "experiment" of row (b).