Propagation of spiking moments in linear Hawkes networks

The present paper provides exact mathematical expressions for the high-order moments of spiking activity in a recurrently-connected network of linear Hawkes processes. It extends previous studies, which explored the case of a (linear) Hawkes network driven by deterministic intensity functions, to the case of stimulation by external inputs (rate functions or spike trains) with arbitrary correlation structure. Our approach describes the spatio-temporal filtering induced by the afferent and recurrent connectivities (with arbitrary synaptic response kernels) using operators acting on the input moments. This algebraic viewpoint provides intuition about how the network ingredients shape the input-output mapping for moments, as well as for cumulants. Using numerical simulation, we also show that our results hold for neurons with refractoriness implemented as self-inhibition, provided the corresponding negative feedback only mildly alters each neuron's mean firing probability.


Introduction
Immense efforts in neuroscience have been invested in measuring neuronal activity as well as the detailed connectivity between neurons. Such studies have too often been conducted separately, despite the fact that neuronal activity and synaptic connectivity are deeply intertwined. Indeed, the synaptic connectome determines the neuronal activity, while the latter reshapes the connectome through activity-dependent plasticity. To better understand the intricate link between activity and connectivity at the neuronal level, it is important to build tractable network models that relate one to the other. This paper examines how the neuronal activity is determined by the synaptic connectivity in a network. More precisely, we investigate how the spiking statistics (described via statistical moments or cumulants) propagate from an input population of neurons to an output population of recurrently-connected neurons, see Fig. 1A. Their firing probability depends on upstream neurons, as represented in Fig. 1B. To formalize this relationship, one needs to decide on a model for the neuronal dynamics. From the large class of existing neuronal models, we chose the simplest possible model in order to remain tractable. Indeed, detailed biophysical models such as the Hodgkin-Huxley model or conductance-based models nicely describe the membrane potential dynamics around the action potential, but are harder to study when embedded in a network. Their complexity often requires numerical simulation for their study, and optimization strategies are still under debate, see for example (Brette, Rudolph, Carnevale, Hines, Beeman, Bower, Diesmann, Morrison, Goodman, Harris, Z Lai and de Kamps 2017). It can be argued that only the timing of the action potentials, rather than their detailed waveform, matters for the interaction between neurons, which motivates the description of spiking activity as a point process.

Results
Let us consider an input population of m neurons whose spiking activity is denoted by the vector of functions x(t) = (x_1(t), · · · , x_m(t)), where x_j(t) is a superposition of Dirac deltas at the spike times, i.e. x_j(t) = Σ_f δ(t − t^x_{j,f}), with t^x_{j,f} the f-th firing time of input neuron j. As illustrated in Fig. 1A, this input population together with some driving intensity function λ(t) = (λ_1(t), · · · , λ_n(t)) feeds a network (output) population of n neurons whose activity is denoted by y(t) = (y_1(t), · · · , y_n(t)), which is also a superposition of Dirac deltas, i.e. y_i(t) = Σ_f δ(t − t^y_{i,f}). A Hawkes process formalizes how the output spikes y(t) are generated from the history of input spikes F^x_t = {x(s) | s < t}, the history of output spikes F^y_t = {y(s) | s < t} and the driving intensity λ(t). In the literature, Hawkes processes are often defined via an n-dimensional counting process N^y(t) = (N^y_1(t), . . . , N^y_n(t)), where N^y_i(t) gives the number of spikes of network neuron i from 0 to t, i.e. N^y_i(t) = ∫_0^t y_i(t′) dt′, or equivalently y_i(t) = dN^y_i(t)/dt. The heart of Hawkes' theory lies in the conditional intensity ν_i(t) that determines the probability of an event (here a spike) via the increment dN^y_i(t) = N^y_i(t + dt) − N^y_i(t) of the counting process for neuron i in an infinitesimally small bin dt:

Prob{ dN^y_i(t) = 1 | F^x_t, F^y_t, λ(t) } = ν_i(t) dt .    (1)

Alternatively, the increment dN^y_i(t) at each time can be seen as resulting from a Poisson process with the conditional intensity ν_i(t):

dN^y_i(t) ∼ Poisson( ν_i(t) dt ) .    (2)

For further detail, see (Brémaud and Massoulié 1996) or (Daley and Vere-Jones 1988, Section 6.3) for the general theory of related point processes constructed using conditional intensities.

Figure 1: Overview of the present study. A: The schematic diagram at the top represents a Hawkes network, where nodes are the individual neurons that emit (or fire) spikes, borrowing the terminology of neuroscience. The afferent and recurrent connectivities are described by the kernel functions γ_ik and ǫ_ij, respectively. The goal of the present work is the characterization of the mapping between the moments of the input and output spike trains (i.e. their correlation structure). They are represented by the matrices and cubes, respectively corresponding to the second- and third-order moments, which are formally tensors with "spatial" coordinates (over neurons) and temporal variables. The dashed gray arrows represent cross-order contributions from the input to the output moments. B: This diagram depicts the average firing intensity of the downstream neurons in the output population following a spike fired by the dark gray neuron (assuming that ǫ_ij = 0 for i ≠ j + 1). The black curves represent the increase in the average conditional intensity ν_i of the light gray neurons following the spike of neuron 1, which is given by the convolution of the synaptic kernels ǫ_ij of the corresponding connections. C: Similar diagram to panel B for a neuron with a self-connection with kernel ǫ (thick solid black curve). The effective recurrent kernel ǭ (dashed gray curve) is given by the superposition of ǫ with its self-convolutions (thin solid gray curves). Generalizing, the effective recurrent kernel can be calculated in the multivariate case for interconnected neurons; it corresponds to the Green function of the network in the context of linear dynamics.

A consequence of this property is that ⟨dN^y_i(t)⟩ = ν_i(t) dt at all times, hence ⟨y_i(t)⟩ = ⟨dN^y_i(t)/dt⟩ = ν_i(t). Here the conditional expectation denoted by the angular brackets is taken over the increment dN^y_i(t) (or equivalently over y_i(t)) at time t, given the history of x and y and the value of λ at time t (as in Eq. (1)). In the following, we denote the conditioning variables as subscripts of the angular brackets, for instance ⟨y_i(t)⟩_{y,x,λ}.
The interested reader is referred to (Daley and Vere-Jones 1988, Section 7.2, Example 7.2) for further detail. In this paper we use the following definition for the Hawkes process:

Definition 1 (Hawkes process) The Hawkes process is an n-dimensional point process y(t) whose conditional intensity ν : R → R^n_+ is driven by a time-dependent intensity λ : R → R^n_+ and depends upon both the past input spiking activity F^x_t and its own past spiking activity F^y_t:

ν_i(t) = λ_i(t) + (γ_ik ∗ x_k)(t) + (ǫ_ij ∗ y_j)(t) ,    (3)

where γ = {γ_ik}^{n,m}_{i,k=1} : R → R^{n×m}_+ is a matrix of "synaptic" kernels γ_ik : R → R_+ that describe the causal effect of input neuron x_k on network neuron y_i. These functions are equal to zero for all t ≤ 0. Similarly, ǫ = {ǫ_ij}^{n,n}_{i,j=1} : R → R^{n×n}_+ is a matrix of kernels ǫ_ij : R → R_+, each corresponding to the recurrent interaction from neuron y_j to neuron y_i.
The definition in Eq. (3) implicitly assumes that the right-hand side is always nonnegative, which is satisfied for our choice of γ and ǫ here. Recall that this is not guaranteed in general when these kernels take negative values. In such cases one can apply a rectifying function to ensure that ν_i(t) ≥ 0, which brings nonlinearity into the formalism (Brémaud and Massoulié 1996; Galves and Löcherbach 2016; Ocker, Josić, Shea-Brown, and Buice 2017; Gao and Zhu 2018; Raad, Ditlevsen, and Löcherbach 2018). Note that the convolution operator ∗ is a matrix convolution (see Eq. (6) below). Note also that in this paper we omit the summation symbol over repeated indices, in line with Einstein's convention for tensor calculus.
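As a concrete illustration of Definition 1, the following sketch simulates a single self-excited Hawkes neuron (n = 1, no input population) in discrete time bins. It is our own minimal, illustrative implementation: the exponential kernel and all parameter values are assumptions, not taken from the paper. Spike counts per bin are drawn from a Poisson law with mean ν(t) dt, as in Eq. (2).

```python
import numpy as np

# Minimal discrete-time sketch of Definition 1 for a single neuron with a
# self-connection and a constant driving intensity; all parameter values
# are illustrative choices. The conditional intensity follows
# nu(t) = lambda + (eps * y)(t), cf. Eq. (3), and the spike count per bin
# is Poisson with mean nu(t) * dt, cf. Eq. (2).
rng = np.random.default_rng(0)

dt = 0.01                               # bin size
T = 100_000                             # number of bins
lam = 0.5                               # constant driving intensity
tau, g = 0.1, 0.5                       # kernel time constant and integral (g < 1)
L = int(10 * tau / dt)                  # truncation length of the kernel
s = np.arange(1, L + 1) * dt
eps = (g / tau) * np.exp(-s / tau)      # causal exponential kernel, integral ~ g

counts = np.zeros(T)
for t in range(T):
    past = counts[max(0, t - L):t][::-1]          # most recent bins first
    nu = lam + np.dot(eps[:len(past)], past)      # conditional intensity
    counts[t] = rng.poisson(nu * dt)

rate = counts.sum() / (T * dt)
# for a stationary linear Hawkes process, the mean rate is lambda / (1 - g)
print(rate)
```

For these (assumed) values the theoretical stationary rate is λ/(1 − g) = 1, which the empirical rate approaches up to sampling fluctuations.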
Remark 1 (Atomic contributions and contraction of indices) Note that, for an infinitesimally small dt, the increment dN^y_i(t) for the output neuron i and dN^x_k(t) for the input neuron k can take only 2 values: 0 or 1. In that case, we have for any p ∈ N_+

(dN^x_k(t))^p = dN^x_k(t) and (dN^y_i(t))^p = dN^y_i(t) .    (4)

It follows that atomic contributions arise from the point-process nature of spike trains when taking expectations of products of input spike trains x_k(t) or output spike trains y_i(t), for all possible redundancies in the time variables together with the "spatial" coordinates.
Remark 1 can be directly used to compute moments for independent Poisson neurons. For example, the second-order moment for an input population of independent Poisson neurons with intensities μ_k yields

⟨x_{k_1}(t_1) x_{k_2}(t_2)⟩_x = μ_{k_1}(t_1) μ_{k_2}(t_2) + δ_{k_1 k_2} δ(t_2 − t_1) μ_{k_1}(t_1) ,    (5)

where we use Eq. (4) for the second term, corresponding to t_1 = t_2 and k_1 = k_2. Note that the second term of Eq. (5), which involves the Dirac delta function δ(t_2 − t_1), dominates the first term; the Dirac delta arises from the limit dt → 0 in the denominator that remains after simplifying the product of increments using Eq. (4). In the remainder we refer to terms involving Kronecker deltas δ_{k_1 k_2} and Dirac deltas δ(t_2 − t_1) as contractions of indices (here for 1 and 2). We extend the standard convolution to a matrix form, which involves a matrix multiplication as in Eq. (3).
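The two contributions in Eq. (5) can be checked empirically. The sketch below (our own illustrative code, with arbitrary parameter values) estimates the second-order moment of a homogeneous Poisson spike train over two time bins: off the diagonal the moment approaches μ², while on the diagonal the atomic contribution μ δ(t_2 − t_1) appears as an extra μ/dt in discrete bins.

```python
import numpy as np

# Empirical second-order moment of a homogeneous Poisson spike train
# (Eq. (5) with constant rate mu). With bin size dt, the binned spike
# density x = counts / dt has <x(t1) x(t2)> ~ mu^2 off the diagonal and
# mu^2 + mu / dt on the diagonal (the atomic contribution).
rng = np.random.default_rng(1)

mu, dt, n_trials = 2.0, 0.01, 1_000_000
counts = rng.poisson(mu * dt, size=(n_trials, 2))   # two time bins per trial
x = counts / dt

off_diag = np.mean(x[:, 0] * x[:, 1])   # expect ~ mu^2 = 4
diag = np.mean(x[:, 0] ** 2)            # expect ~ mu^2 + mu / dt = 204
print(off_diag, diag)
```

The diagonal value diverges as dt → 0, which is the discrete signature of the Dirac delta in Eq. (5).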
Definition 2 (Matrix convolution) For the kernel matrix ǫ and vector y, the i-th element of the matrix convolution is given by

(ǫ ∗ y)_i(t) = (ǫ_ij ∗ y_j)(t) = ∫ ǫ_ij(t − t′) y_j(t′) dt′ ,    (6)

with an implicit summation over the repeated index j.

Notation 1 (Moments of order p) Let k = (k_1, · · · , k_p) denote a set of p coordinates k_r ∈ I_m = {1, · · · , m}. The moment of order p of the input population evaluated at times t = (t_1, · · · , t_p) is defined as

X^p_k(t) = ⟨ ∏_{r=1}^{p} x_{k_r}(t_r) ⟩_x .    (7)
Similarly, the moment of order p of the output population for the coordinates i = (i_1, · · · , i_p) ∈ I^p_n and the time variables t = (t_1, · · · , t_p) is defined as

Y^p_i(t) = ⟨ ∏_{r=1}^{p} y_{i_r}(t_r) ⟩_{y,x,λ} .    (8)

Note that the mathematical expectation covers three sources of stochasticity, as indicated by the subscript. Note also that, due to the recurrent connectivity, the dependency of y on itself also concerns its own past activity.
Remark 2 (Symmetry of moments) The moments X^p_k(t) and Y^p_i(t) have many symmetries. Taking the input moments as an example, any permutation Π of I_p applied jointly to the coordinates, Π(k) = (k_{Π(1)}, · · · , k_{Π(p)}), and to the times, Π(t) = (t_{Π(1)}, · · · , t_{Π(p)}), leaves X^p_k(t) invariant:

X^p_{Π(k)}(Π(t)) = X^p_k(t) .    (9)

Definition 3 (Generalized spatio-temporal delta function) Let δ̄_k(t) be the generalized delta function defined for the set of coordinates k = (k_1, · · · , k_p) and times t = (t_1, · · · , t_p), which combines the Kronecker and Dirac delta functions as

δ̄_k(t) = ∏_{r=2}^{p} δ_{k_1 k_r} δ(t_r − t_1) .    (10)

Note that for p = 2, one recovers the product of the standard Kronecker delta with the Dirac delta: δ̄_{k_1,k_2}(t_1, t_2) = δ_{k_1,k_2} δ(t_2 − t_1). Note also that when the lower index k is omitted, we will assume that k_1 = · · · = k_p (i.e. the single-neuron case).
Example 1 (Moment for a single spike train with oscillatory intensity) Before presenting the general result, we provide an illustrative example to fix ideas and familiarize the reader with the concepts and notation.

Case p = 2: For a single (input) neuron driven by a deterministic intensity function μ, the contraction in the 2nd-order moment corresponds to the condition t_1 = t_2, without "spatial" coordinates here, simplifying Eq. (5):

⟨x(t_1) x(t_2)⟩_x = μ(t_1) μ(t_2) + δ(t_2 − t_1) μ(t_1) .    (11)

Case p = 3: The 3rd-order moment for a single spike train is given by

⟨x(t_1) x(t_2) x(t_3)⟩_x = μ(t_1) μ(t_2) μ(t_3) + δ(t_2 − t_1) μ(t_1) μ(t_3) + δ(t_3 − t_1) μ(t_1) μ(t_2) + δ(t_3 − t_2) μ(t_2) μ(t_1) + δ(t_2 − t_1) δ(t_3 − t_1) μ(t_1) .    (12)

This expression exhibits two "extreme" cases: one where all time variables are equal, t_1 = t_2 = t_3, corresponding to the double Dirac delta δ(t_2 − t_1) δ(t_3 − t_1) = δ̄(t_1, t_2, t_3) for the partition {{1, 2, 3}}; and one where they are all distinct, giving μ(t_1) μ(t_2) μ(t_3) for {{1}, {2}, {3}}. In addition, the three remaining terms involve a contraction for 2 out of the 3 variables.

Numerical simulation: Fig. 2 illustrates the moments for p = 2 and 3 with a single spike train driven by an oscillatory intensity. Note that the "spatial" coordinates k in the above equations are simply ignored, together with the Kronecker deltas. Fig. 2B, C and E highlight the atomic contributions along the various "diagonals" where the time variables coincide. Away from those subspaces, the spike densities are much lower, as can be seen in the scaling of values in the middle and right plots of Fig. 2C and E. Note that the main diagonal for p = 3 is slightly larger than that for p = 2, as autocorrelation effects cumulate.
Proposition 1 (Moments for inputs driven by deterministic intensity functions) Let P_p = P(I_p) denote the set of all partitions Φ of the set I_p = {1, · · · , p}. If the input neurons are independent from one another and driven by intensities μ_k(t), then the input moment of order p with coordinates k = (k_1, · · · , k_p) at times t = (t_1, · · · , t_p) can be expressed as

X^p_k(t) = Σ_{Φ ∈ P_p} ∏_{S ∈ Φ} δ̄_{k_S}(t_S) μ_{k_Š}(t_Š) ,    (13)

where S spans the disjoint subsets of Φ whose union is I_p, with k_S = {k_r, r ∈ S} and t_S = {t_r, r ∈ S}. In addition, each driving intensity μ appears with a representative index, here taken as the minimum Š = min(S). Recall the convention δ̄_{k_S}(t_S) = 1 when S is a singleton.
Remark 3 The grouping of indices from a given subset S in Eq. (13) is a direct consequence of the contraction highlighted in Remark 1, resulting in an atomic contribution where the paired spatial coordinates and temporal variables related to S are involved in the generalized delta function δ̄.
Proof of Proposition 1: Eq. (13) can be obtained from the moment generating function via its p-th derivative for order p, as was done in previous work in similar contexts (Ocker, Josić, Shea-Brown, and Buice 2017; Daley and Vere-Jones 1988, Section 5.2). Here we provide a proof by induction, which highlights the key observation that every combination of contractions can be described by a partition. Let us assume that Eq. (13) is valid for all orders 2 ≤ p′ ≤ p − 1. Now considering the order p with given coordinates k and time variables t in X^p_k(t), we denote by S* the set of order indices in I_{p−1} whose coordinates and times are identical to their counterparts for p, namely S* = {r ∈ I_{p−1}, k_r = k_p and t_r = t_p}. Using the probabilistic independence as before, we can write the moment by contracting the indices in S* ∪ {p} together and applying the induction hypothesis to the remaining indices.

Figure 2: Input moments for a spike train. A: Spike raster (top plot) for 50 simulations using a driving oscillatory intensity (bottom plot). B: 2nd-order moment (left plot) averaged over 10000 simulations, where darker pixels indicate a higher spike density. The middle and right diagrams illustrate the decomposition into a contribution due to rate correlation (co-fluctuations) and atomic contributions (diagonal in thick black), respectively. Each contribution corresponds to a partition of I_2 = {1, 2}, as indicated below. C: Example slices of the moment in panel B as indicated by the solid/dotted lines in the diagrams, along the diagonal t_1 = t_2 (left diagram and plot) and for a fixed t_1 (right diagram and plot; here the atomic contribution is not represented for the theoretical prediction). The prediction curves (dashed) are calculated using Eq. (13). Note the difference in scaling for the y-axis. D: Decomposition of the 3rd-order moment using the partitions of I_3 = {1, 2, 3}, similar to panel B. The thick black lines indicate the diagonal planes and the diagonal line for all possible contractions.
E: The left diagram and plot correspond to the main diagonal of the 3rd-order moment with t 1 = t 2 = t 3 . The right diagram and plot correspond to the diagonal plane with t 1 = t 2 .
In this development, we have used the induction hypothesis for order p − |S*|, where |S*| is the number of elements in S*, applied to the indices that are not in S*, as well as the contraction for all elements in S* ∪ {p} using Eq. (4). The resulting expression is valid for each S* ⊆ I_{p−1}, which is determined by k and t. We conclude by observing that the above dichotomy of partitions Φ actually spans the whole set P(I_p) = P_p, which accounts for all possible configurations of k and t. This is also related to the decomposition of the Bell number (giving the number of partitions Φ ∈ P_p) into the sum of the Stirling numbers of the second kind s_{p,q} (giving the number of partitions Φ that have q groups). They satisfy the relationship s_{p,q} = s_{p−1,q−1} + q s_{p−1,q} for all 2 ≤ q ≤ p − 1 (corresponding to the above dichotomy), as well as the "boundary" condition s_{p,q} = 1 when q = 1 or q = p. The autocorrelation terms can also be represented using diagrammatic representations (Shchepanyuk 1995).
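The Stirling recursion above can be checked directly. The short sketch below (our own illustrative code, not from the paper) builds s_{p,q} from the recursion and sums them into Bell numbers.

```python
from functools import lru_cache

# Stirling numbers of the second kind via the recursion
# s(p, q) = s(p-1, q-1) + q * s(p-1, q), with s(p, 1) = s(p, p) = 1.
@lru_cache(maxsize=None)
def stirling2(p: int, q: int) -> int:
    if q == 1 or q == p:
        return 1
    if q < 1 or q > p:
        return 0
    return stirling2(p - 1, q - 1) + q * stirling2(p - 1, q)

def bell(p: int) -> int:
    # Bell number: total number of partitions of I_p = {1, ..., p}
    return sum(stirling2(p, q) for q in range(1, p + 1))

print([bell(p) for p in range(1, 6)])   # [1, 2, 5, 15, 52]
```

For instance, bell(3) = 5 reproduces the five partitions of {1, 2, 3} enumerated in Example 1.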

Network with afferent connectivity
Now that we have introduced the definitions and concepts useful to characterize the high-order moments, we turn to the case of a network with afferent connections, but no recurrent connections. The following theorem is the first of our two core results. We denote by ν^{ǫ=0}_i the total driving intensity of the network neurons, which lumps together the driving intensity and the input influx:

ν^{ǫ=0}_i(t) = λ_i(t) + (γ_ik ∗ x_k)(t) .    (16)

Notation 2 (Moment for the driving intensities λ) To account for possibly stochastic functions λ (e.g. a Cox process), we define the corresponding moment of order p for the coordinates i ∈ I^p_n at times t = (t_1, · · · , t_p) as

Λ^p_i(t) = ⟨ ∏_{r=1}^{p} λ_{i_r}(t_r) ⟩_λ .    (17)

Notation 3 (Moment for the filtered input) As with λ, we define the moment of order p of the filtered input x (with afferent kernels γ) for the coordinates i ∈ I^p_n at times t = (t_1, · · · , t_p) as

Γ^p_i(t) = ⟨ ∏_{r=1}^{p} (γ_{i_r k} ∗ x_k)(t_r) ⟩_x .    (18)

Definition 4 (Tensor convolution operator) Let α_ij : R → R^{n×m} be a matrix of kernels. We define the 2p-dimensional tensor that replicates the matrix α for all pairs of indices (i_r, j_r):

α^p_{ij}(t) = ∏_{r=1}^{p} α_{i_r j_r}(t_r) ,    (19)

with i = (i_1, · · · , i_p) ∈ I^p_n, j = (j_1, · · · , j_p) ∈ I^p_m and t = (t_1, · · · , t_p). For a p-order tensor T^p_j with coordinates j, the tensor convolution ⊛ between α^p_{ij} and T^p_j evaluated at times t gives the following tensor of order p:

(α^p_{ij} ⊛ T^p_j)(t) = ( α_{i_1 j_1} ¹∗ · · · α_{i_p j_p} ᵖ∗ T^p_j )(t) .    (20)

The second expression is a reformulation to stress that the convolutions of α are applied on each of the p dimensions of the tensor T^p (as indicated above each asterisk), followed by the summation for the tensor product (similar to a matrix product), in line with Eq. (6).
In essence, this convolution operator involves the same joint "multiplication" on paired spatial and temporal dimensions (related to the coordinates k_r and time variables t_r, the temporal convolution being seen as a function multiplication operator) as the matrix convolution in Eq. (6), but extended to all dimensions of the tensor. In particular, this operation is linear.
Property 1 By using the tensor convolution operator defined above, the moments of the filtered inputs can be conveniently expressed as

Γ^p_i(t) = (γ^p_{ik} ⊛ X^p_k)(t) ,    (21)

with i = (i_1, · · · , i_p) ∈ I^p_n, k = (k_1, · · · , k_p) ∈ I^p_m and t = (t_1, · · · , t_p).
Proof of Property 1: Let x^p_k(t) = ∏_{r=1}^{p} x_{k_r}(t_r) be the p-th order tensor associated with the input spike train x. The moments of the filtered input (see Eq. (18)) can be expressed as

Γ^p_i(t) = ⟨ ∏_{r=1}^{p} (γ_{i_r k_r} ∗ x_{k_r})(t_r) ⟩_x = ⟨ (γ^p_{ik} ⊛ x^p_k)(t) ⟩_x = (γ^p_{ik} ⊛ X^p_k)(t) ,    (22)

where the last equality is obtained from the linearity of the convolution operator and from the definition of the tensor convolution operator in Eq. (20).
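As an illustration of Property 1 in the simplest case m = n = 1 and p = 2, the following sketch (our own illustrative code, with assumed rate and kernel functions) applies the tensor convolution by filtering a discretized 2nd-order moment along each time axis; for a factorized moment X²(t_1, t_2) = μ(t_1)μ(t_2), the result factorizes into the outer product of the filtered rate γ ∗ μ with itself.

```python
import numpy as np

dt = 0.01
t = np.arange(0, 2, dt)
mu = 1.0 + 0.5 * np.sin(2 * np.pi * t)          # illustrative rate function
gamma = np.exp(-t / 0.05) / 0.05                # illustrative causal kernel

def conv_axis(kernel, tensor, axis):
    # causal convolution along one time axis of the tensor: one "asterisk"
    # of the tensor convolution in Eq. (20), on a regular grid of step dt
    def conv1d(v):
        return np.convolve(kernel, v)[:len(v)] * dt
    return np.apply_along_axis(conv1d, axis, tensor)

X2 = np.outer(mu, mu)                            # rate part of the 2nd-order moment
G2 = conv_axis(gamma, conv_axis(gamma, X2, 0), 1)

gm = np.convolve(gamma, mu)[:len(t)] * dt        # filtered rate gamma * mu
# for a factorized moment, the tensor convolution factorizes: G2 ~ outer(gm, gm)
```

This factorization is exactly the "naive" co-fluctuation contribution discussed for Fig. 2B; the atomic (diagonal) contribution would add a delta term to X2 before filtering.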
Theorem 1 (Input-output mapping for afferent connectivity) Consider an uncoupled Hawkes network (Definition 1) whose neurons are excited by both inputs x (via afferent connections) and driving intensities λ, which are probabilistically independent. The moment Y^{p,ǫ=0}_i of order p of the network population depends on all smaller-order input moments X^q of the input population as well as on the moments Λ^r of the driving intensities (with 0 ≤ q, r ≤ p):

Y^{p,ǫ=0}_i(t) = Σ_{Φ ∈ P_p} [ ∏_{S ∈ Φ} δ̄_{i_S}(t_S) ] Σ_{A ⊆ Φ̌} Γ^{|A|}_{i_A}(t_A) Λ^{|Φ̌\A|}_{i_{Φ̌\A}}(t_{Φ̌\A}) ,    (23)

where the moments Γ and Λ are given in Eqs. (21) and (17), respectively, with the convention Γ^0 = Λ^0 = 1. Here we have defined Φ̌ = {Š, S ∈ Φ}, the set of minima Š = min(S) over all groups S in the partition Φ. Note that the superscript of the moment indicates the situation where the network population is uncoupled (i.e. ǫ = 0).
Proof of Theorem 1: Provided the statistics of the inputs x and the driving intensities λ are known, the spiking activity of the network neurons is determined by the intensity function ν^{ǫ=0}_i in Eq. (16). Similarly to Eq. (13) in Proposition 1, the Poisson nature of the spiking of the network neurons gives the following expression for the unconnected neurons with spike trains y:

Y^{p,ǫ=0}_i(t) = Σ_{Φ ∈ P_p} [ ∏_{S ∈ Φ} δ̄_{i_S}(t_S) ] ⟨ ∏_{Š ∈ Φ̌} ( λ_{i_Š}(t_Š) + (γ_{i_Š k} ∗ x_k)(t_Š) ) ⟩_{x,λ} .    (24)

In this expression, the contractions basically extend the moment of smaller order |Φ| ≤ p for the intensity ν^{ǫ=0} to the order p. Note that the terms in the angular brackets are obtained using the expression of ν^{ǫ=0} in Eq. (3) with ǫ_ij = 0 for all i and j. The product involving the sums λ_{i_Š} + γ_{i_Š k} ∗ x_k gives 2^{|Φ̌|} terms, with |Φ̌| the number of elements in Φ̌. Now we develop this product to isolate the contributions originating from the input moments on the one hand, and from the driving intensities on the other hand, using the fact that they are statistically independent. To this end, we use the following expression that converts a product of sums into a sum of products:

∏_{c ∈ C} (a_c + b_c) = Σ_{A ∪ B = C, A ∩ B = ∅} ∏_{c ∈ A} a_c ∏_{c ∈ B} b_c ,    (25)

where A and B can be empty sets. In our case, A ⊆ Φ̌ is the subset of indices of Φ̌ that concern the input neurons in Eq. (24), while B = Φ̌ \ A is the subset of indices that concern λ. Because the random variables x and λ are independent, this gives

Y^{p,ǫ=0}_i(t) = Σ_{Φ ∈ P_p} [ ∏_{S ∈ Φ} δ̄_{i_S}(t_S) ] Σ_{A ⊆ Φ̌} Γ^{|A|}_{i_A}(t_A) Λ^{|Φ̌\A|}_{i_{Φ̌\A}}(t_{Φ̌\A}) .    (26)

Note that the expression in Eq. (26) can be rewritten by grouping together the moments of orders |A| = q and |B| = r, using an operator A^p that accounts for all possible combinations with the delta functions, see Appendix A.
Remark 4 When the network of unconnected neurons is not driven by an external intensity (λ = 0), Eq. (26) simplifies to

Y^{p,ǫ=0}_i(t) = Σ_{Φ ∈ P_p} [ ∏_{S ∈ Φ} δ̄_{i_S}(t_S) ] Γ^{|Φ|}_{i_Φ̌}(t_Φ̌) .    (28)

Even in this simpler case, the output moment Y^{p,ǫ=0} of order p is an intricate function of the input moments of orders r ≤ p, i.e. it depends on X^r via Γ^r with r = |Φ|. This contrasts with the fact that the moment Γ^p only depends on the corresponding moment X^p of the same order, see Property 1. Conversely, in the absence of spiking inputs (γ = 0) and when the driving intensities λ_i(t) are deterministic, the moments Λ simply reduce to products of the intensity functions:

Λ^p_i(t) = ∏_{r=1}^{p} λ_{i_r}(t_r) .    (29)

Network with recurrent connectivity
The last step is to consider the recurrent connections determined by ǫ between the network neurons, which constitutes the second half of our core result.
Definition 5 (Effective recurrent kernel) Let ǭ : R → R^{n×n} denote the effective recurrent kernel, defined as

ǭ_ij(t) = δ_ij δ(t) + Σ_{q≥1} (ǫ^{∗q})_ij(t) ,    (31)

where ǫ^{∗q} = ǫ ∗ · · · ∗ ǫ (q factors) is the q-th order matrix convolution of ǫ with itself; ǭ is thus the superposition of the identity with all self-convolutions of ǫ.
Recall that the convolution is defined for kernel matrices, see Eq. (6). Because ǫ_ij(t) = 0 for t ≤ 0 and all pairs (i, j) (due to the causality requirement), ǭ_ij(t) = 0 as well for t < 0, apart from the delta at t = 0. This effective recurrent kernel is the equivalent in the time domain of the matrix inverse of the identity minus the "spatio-temporal" connectivity in the Fourier domain (Hawkes 1971a).
Property 2 The effective recurrent kernel ǭ satisfies the following self-consistency equation:

(κ_ik ∗ ǭ_kj)(t) = δ̄_ij(t, 0) ,    (32)

where κ_ij(t) = δ̄_ij(t, 0) − ǫ_ij(t). Therefore, ǭ can be thought of as the inverse of κ for the convolution operator.
Proof of Property 2: By convolving the kernel ǫ with the effective recurrent kernel ǭ, we find (omitting the time variables)

ǫ_ik ∗ ǭ_kj = ǫ_ij + Σ_{q≥2} (ǫ^{∗q})_ij = Σ_{q≥1} (ǫ^{∗q})_ij = ǭ_ij − δ̄_ij ,

which we reorganize as ǭ_ij = δ̄_ij + ǫ_ik ∗ ǭ_kj, i.e. κ_ik ∗ ǭ_kj = δ̄_ij, obtaining Eq. (32).
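The geometric-series structure of ǭ can be illustrated numerically for a single neuron (n = 1). The sketch below (our own illustrative code and parameter choices) iterates the self-consistency relation ǭ = δ + ǫ ∗ ǭ of Property 2 on a discrete time grid, and checks that the integral of ǭ approaches 1/(1 − g), where g = ∫ǫ.

```python
import numpy as np

# Effective recurrent kernel for a single neuron on a discrete grid:
# bar_eps = delta + eps * bar_eps (Property 2), solved by fixed-point
# iteration; this converges since the kernel integral g < 1. The values
# of dt, tau and g are illustrative choices.
dt, T = 0.002, 2.0
t = np.arange(0, T, dt)
tau, g = 0.1, 0.5
eps = (g / tau) * np.exp(-t / tau)           # causal kernel, integral ~ g

delta = np.zeros_like(t)
delta[0] = 1.0 / dt                          # discrete Dirac delta at t = 0

def conv(a, b):
    # causal convolution on the grid: (a * b)(t) ~ sum_s a(s) b(t - s) dt
    return np.convolve(a, b)[:len(t)] * dt

bar_eps = delta.copy()
for _ in range(100):                         # fixed-point iteration
    bar_eps = delta + conv(eps, bar_eps)

residual = np.max(np.abs(bar_eps - (delta + conv(eps, bar_eps))))
integral = bar_eps.sum() * dt                # should approach 1 / (1 - g) = 2
print(integral, residual)
```

Each iteration adds one more term of the series Σ_q ǫ^{∗q}, so the fixed point reproduces Definition 5 on the grid.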
Example 2 (Single neuron with self-connection and driving intensity λ) We first present an illustrative version of our proof by induction for a single neuron with self-feedback, driven by a deterministic intensity λ, in the cases 1 ≤ p ≤ 3. In this example ⟨· · ·⟩ = ⟨· · ·⟩_y, as there is no other source of stochasticity. Note that p = 2 corresponds to Hawkes' results (Hawkes 1971a), with moments instead of (auto)covariances. The motivation is to provide a concrete case for stepping from order p to p + 1, which is formalized in the proof below.

Cases p = 1 and p = 2: The first-order moment for p = 1 corresponds to the mean firing rate and can be calculated from the driving intensity function λ by solving the self-consistency equation given by Eq. (3), using the equality ⟨y(t)⟩ = ⟨ν(t)⟩ for the intensity:

⟨y⟩(t) = λ(t) + (ǫ ∗ ⟨y⟩)(t) , whose solution is ⟨y⟩(t) = (ǭ ∗ λ)(t) .    (34)

For the second order, the point is to take into account the effects of spikes upon the future spiking probability via the self-feedback loop. Assuming t_1 ≤ t_2 (gray semi-plane in Fig. 3A), we can develop y(t_2) in ⟨yy⟩(t_1, t_2) using Eq. (3). This holds because the intensity function ν(t_2) requires the knowledge of the past spiking activity y(u) with u < t_2, as illustrated by the dark gray arrow in Fig. 3A, moving toward the diagonal t_1 = t_2. This development gives

⟨yy⟩(t_1, t_2) = ⟨y(t_1) ν(t_2)⟩ + ⟨y(t_1)⟩ δ(t_2 − t_1)
             = (ǫ ²∗ ⟨yy⟩)(t_1, t_2) + ⟨y⟩λ(t_1, t_2) + ⟨y⟩δ̄_21(t_1, t_2) .    (35)

Note that ν is inside the angular brackets on the right-hand side of the first line, because ν(t_2) and y(t_1) are not independent when the difference of the time variables lies within the range of ǫ. The second line is a rewriting using a compact notation for multivariate functions, which indicates the order of the factors with respect to the time variables and will be useful in this example: for instance, ⟨y⟩λ(t_1, t_2) := ⟨y⟩(t_1) λ(t_2). In addition, we use the notation introduced in Eq. (20), where ²∗ indicates the convolution performed on the second time variable t_2, and the Dirac delta δ_21(t_2) := δ(t_2 − t_1) is understood as a function of t_2 that keeps the information about t_1. The solution ⟨yy⟩(t_1, t_2) must satisfy Eq. (35) for all t_1 ≤ t_2, which is a Wiener-Hopf equation. The atomic contribution (Dirac delta) acts as a "boundary condition" when t_2 → t_1. Our strategy is the following: we propose a solution for the moment of order p = 2 and verify that it satisfies Eq. (35). As the proposed solution is fully symmetric in t_1 and t_2, this implies that it is also valid on the complementary space t_2 ≤ t_1, being eventually valid for all (t_1, t_2) ∈ R². The putative 2nd-order moment is

⟨yy⟩ = ǭ ¹∗ ǭ ²∗ ( λλ + ⟨y⟩δ̄_21 ) ,    (36)

where the notation without time variables allows for compact writing. We use the equality in Eq. (32) on ǭ ²∗ to obtain

⟨yy⟩ = ǫ ²∗ [ ǭ ¹∗ ǭ ²∗ ( λλ + ⟨y⟩δ̄_21 ) ] + ǭ ¹∗ ( λλ + ⟨y⟩δ̄_21 )
     = ǫ ²∗ ⟨yy⟩ + ⟨y⟩λ + ⟨y⟩δ̄_21 .    (37)

For the first term of the right-hand side in the upper line, the convolution by ǫ ∗ ǭ on the second variable t_2 has been rewritten by moving ǫ out, while the rest is in fact ⟨yy⟩ in Eq. (36).
In the second term, the convolution on t_2 reduces to an evaluation because of the Dirac delta, and we obtain two terms involving (ǭ ∗ λ)(t_1) = ⟨y⟩(t_1), see the solution for the 1st-order moment in Eq. (34). Together, these three terms form the right-hand side of Eq. (35), which is thus satisfied. Note also that ǭ(t) = 0 for t < 0 (reflecting the causality of the overall "feedback" kernel), which implies that the operator ǭ ¹∗ ǭ ²∗ applied on the 2-dimensional function only "spreads" the function mass towards the future (see Fig. 3B).

Case p = 3: Following the previous case, we extend the calculations to p = 3 in order to prepare the generalization to arbitrary p ≥ 2. As with p = 2, we consider the ordering t_1 ≤ t_2 ≤ t_3 (gray subspace in Fig. 3C), which allows the development of the third time variable as was done in Eq. (35):

⟨yyy⟩(t_1, t_2, t_3) = (ǫ ³∗ ⟨yyy⟩)(t_1, t_2, t_3) + ⟨yy⟩λ(t_1, t_2, t_3) + ⟨yy⟩δ̄_32(t_1, t_2, t_3) ,    (38)

with ⟨yy⟩λ(t_1, t_2, t_3) := ⟨yy⟩(t_1, t_2) λ(t_3) and ⟨yy⟩δ̄_32(t_1, t_2, t_3) := ⟨yy⟩(t_1, t_2) δ(t_3 − t_2).

Figure 3: Schematic diagrams supporting the calculations for the 2nd- and 3rd-order moments. A: The development in Eq. (35) consists in expressing y(t_2) as a function of the past history. This requires that t_2 > t_1, as illustrated by the gray upper triangle of the plane. The dark gray arrow indicates the "direction" of the development towards the past network activity (related to the convolution by ǫ), which is necessary to evaluate the firing probabilities involved in the moment. B: Schematic representation of the twofold (multivariate) convolution involved in Eq. (36) for the calculation of the second-order moment, in particular for the autocorrelation term. The Dirac delta corresponds to a function that is non-zero on the diagonal t_1 = t_2 only, as represented by the gray dashed line. The effect of the first convolution on t_1 "spreads" the diagonal function towards the "future" in the horizontal direction. Then, the convolution on t_2 "spreads" the whole towards the "future" in the vertical direction, resulting in a symmetric function. Note that the result is distinct from the outer product of the time vectors (ǭ ∗ λ)(ǭ ∗ λ). C: Similar diagram to panel A, indicating the subspace for the condition t_1 ≤ t_2 ≤ t_3 and representing the development of the moment for p = 3 in Eq. (38).

The Dirac delta in Eq. (38) corresponds to the "boundary condition" when t_3 → t_2, i.e. the "lower" tilted plane of the gray subspace to which the dark gray arrow points in Fig. 3C. Note that this involves only the atomic contribution δ_32 (δ_21 is contained in ⟨yy⟩ for the variables (t_1, t_2)); the contribution δ_31 alone is not possible in this subspace. See also the discussion in Example 1 for the second-order input moments. We now pursue the calculations without the time variables as arguments, as before for p = 2. The putative symmetric solution, Eq. (39), involves the contractions for all partitions of {1, 2, 3}, in a similar fashion to Eq. (23). We use again Eq. (32), as in Eq. (37), to extract the convolution of ǫ with ⟨yyy⟩ on t_3, and regroup the other terms, for which the convolution on t_3 vanishes because of the Dirac deltas, in order to use the expression of the 2nd-order moment in Eq. (36), namely ǭ ¹∗ ǭ ²∗ (λλ + ⟨y⟩δ̄_21) = ⟨yy⟩; this leads to Eq. (40). What remains to be checked is that the condition t_1 ≤ t_2 ≤ t_3 implies that δ_31 never appears alone: when t_1 = t_3, we in fact have t_1 = t_2 = t_3, which corresponds to δ_21 δ_31. This means that the last term in Eq. (40) vanishes and Eq. (38) is satisfied. The symmetry argument ensures the validity over all (t_1, t_2, t_3), as will be formalized below.

Numerical simulation: The upper plot in Fig. 4A illustrates that the rhythm of the output spiking is altered by the recurrent self-connection. This comes from the fact that, for an excitatory self-connection, output spikes momentarily increase the firing probability, as can be seen when comparing the green curve with the dotted black curve in the bottom plot. The output first-order moment in Fig. 4B (solid gray curve for the simulation and dashed black curve for the prediction) lies above the input first-order moment related to the underlying driving intensity λ (dotted black curve). Note also the shift towards later times.
The decomposition of the second-order moment in Fig. 4C illustrates that the effect of autocorrelations (right plot) spreads away from the diagonal due to the self-connection. The main diagonal for p = 2 in Fig. 4D has larger values than the curve for p = 1 in Fig. 4B. In Fig. 4E, the main diagonal for p = 3 (gray curve in the left plot) is even larger, indicating that effects due to autocorrelation cumulate (as for the input moments in Fig. 2). The slice of the output third-order moment (right matrix in Fig. 4E) has smaller values, but note the high spike density along its diagonal due to the spreading of atomic contributions by the effective recurrent kernel ǭ.
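The first-order prediction ⟨y⟩ = ǭ ∗ λ of Eq. (34) can be reproduced with a few lines of code. The sketch below (our own illustrative parameters, loosely mimicking the setting of Fig. 4) simulates many trials of a self-excited neuron driven by an oscillatory intensity and compares the trial-averaged rate with the prediction obtained by solving the self-consistency equation ⟨y⟩ = λ + ǫ ∗ ⟨y⟩ by fixed-point iteration.

```python
import numpy as np

rng = np.random.default_rng(2)

dt = 0.01
t = np.arange(0, 10, dt)
lam = 0.5 * (1.0 + np.sin(2 * np.pi * 0.5 * t))   # oscillatory driving intensity
tau, g = 0.1, 0.5
L = int(10 * tau / dt)
eps = (g / tau) * np.exp(-np.arange(1, L + 1) * dt / tau)  # self-connection kernel

# prediction: solve <y> = lam + eps * <y> by fixed-point iteration (Eq. (34))
m = lam.copy()
for _ in range(100):
    m = lam + np.convolve(eps, m)[:len(t)] * dt

# simulation: discrete-time Hawkes dynamics, vectorized over trials
n_trials = 2000
counts = np.zeros((n_trials, len(t)))
for k in range(len(t)):
    past = counts[:, max(0, k - L):k][:, ::-1]     # most recent bins first
    nu = lam[k] + past @ eps[:past.shape[1]]       # conditional intensity per trial
    counts[:, k] = rng.poisson(nu * dt)

emp = counts.mean(axis=0) / dt                     # trial-averaged firing rate
w = np.ones(25) / 25                               # light smoothing for comparison
corr = np.corrcoef(np.convolve(emp, w, "same"), np.convolve(m, w, "same"))[0, 1]
rel_err = abs(emp.mean() - m.mean()) / m.mean()
print(corr, rel_err)
```

As in Fig. 4B, the predicted rate lies above λ and is shifted towards later times by the causal filtering.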
Theorem 2 (Input-output mapping for recurrent connectivity) The moment Y^p_i of order p of the Hawkes process (Definition 1) for the network population can be expressed as Eq. (41), where the effects of the recurrent connectivity on the input moments are determined by the spatio-temporal filtering described by the effective recurrent kernel ǭ^p, defined similarly to Eq. (19), applied to the moment for uncoupled neurons in Eq. (23).
Proof of Theorem 2: Compared to Example 2, we consider the general case where inputs and/or external intensities drive the network neurons via ν ǫ=0 in Eq. (16). Let introduce the conditional moment M p i (t) of order p defined as  Output moments for a single neuron with self-connection. A: Spike raster (top plot) for 50 simulations for a neuron, similar to Fig. 2. In the bottom plot, the driving oscillatory intensity λ (dotted black curve) is compared with the firing intensity ν (green curve), which is affected by the neuron's firing. B: 1st-order moment (solid gray curve) with theoretical prediction (dashed black curve). The dotted black curve indicate the driving intensity λ. C: The two left plots represent the Input-output mapping for the 2nd-order moment, averaged over 10000 simulations (darker pixels indicate a higher spike density). The two right plots illustrate the decomposition into a contribution due to rate correlation (cofluctuations, "naive" contribution) and that due to autocorrelation. Note that the right plot corresponds to Fig. 3B. The equations above refer to the terms in Eq. (37). D: Simulation (gray curve) and theoretical prediction (dashed black curve) of the diagonal of the matrix for the output moment in panel C. E: Example slices for the 3rd-order moment, as indicated by the left diagram (color coded). All prediction curves are calculated using Eqs. (37) and (40).
where the conditioning is over the input activity x and the driving intensities λ. Note that the statistical averaging over x and λ of the conditional moment gives the (unconditional) moment defined in Eq. (8). To demonstrate Eq. (41), we prove by induction the following result on M p i (t), which straightforwardly leads to the expression in Theorem 2 by taking the same statistical averaging over x and λ as done above: where the conditional moment of order p in the absence of recurrent coupling (ǫ = 0) is defined as M p,ǫ=0. In Eq. (43) the effect of the past spiking activity of y due to the recurrent connectivity is taken care of by the kernel ǫ, considering ν ǫ=0 to be "deterministic" from the viewpoint of y provided x and λ are known. The conditional moment M p i (t) in Eq. (43) must obey the constraints imposed by the dynamics in Eq. (3). Under the condition t 1 ≤ · · · ≤ t p on the time variables, we can expand y ip (t p ) using the past activity of (y 1 (t), · · · , y p (t)) for t < t p and the intensity ν ǫ=0 ip (t p ). Let i = (i 1 , · · · , i p ) denote the coordinates and t = (t 1 , · · · , t p ) the time variables. The p-th order correlation of the output population can be expressed as Note that the generalized delta corresponds to the "boundary condition" t p = t p−1 , as in the above examples for moments. A similar condition for the time lag was used in the case of covariances (Hawkes 1971a; Gilson, Burkitt, Grayden, Thomas, and van Hemmen 2009b). Using the expansion ν i (t) = (ǫ ij ∗ y j ) (t) + ν ǫ=0 i (t), see Eqs. (3) and (16), and setting i ′ = (i 1 , · · · , i p−1 ) to contain the first p − 1 elements of i, and similarly t ′ = (t 1 , · · · , t p−1 ), we have where the conditional moment of order p − 1 appears in the right-hand side. Therefore, we can use Eq. (43) for the order p − 1: Note that the tensor convolution p−1 ⊛ applies to the first p − 1 indices j ′ = (j 1 , · · · , j p−1 ) of the tensor of dimension p.
In the third line of Eq. (47), we only retain the partitions that contribute to the summation under the condition t 1 ≤ · · · ≤ t p . To do so, we define the subset P 0 p−1 ⊂ P p−1 of ordered partitions Φ, where each group S ∈ Φ consists of all successive indices between Š = min(S) and max(S) (equal for singletons). Next, we incorporate the elements in the square brackets into the sum by augmenting the partitions Φ ∈ P 0 p−1 to partitions in P 0 p . Note that the passage from the second line to the fifth line in Eq. (47) also corresponds to taking ǫ = 0 in Eq. (45). Going back to Eq. (46), we isolate M p i (t) on the left-hand side: where κ ij (t) = δ ij (t, 0) − ǫ ij (t) (see Prop. 2). Using the property of ǫ in Eq. (32), we obtain Note that so far we have only established the validity of this result for t 1 ≤ · · · ≤ t p . As said above, this is equivalent to considering only the ordered partitions P 0 p . The generalization to an arbitrary t = (t 1 , · · · , t p ) is obtained by noting that it can be mapped to an ordered version by a permutation, say Π(t) = t ′ = (t ′ 1 ≤ · · · ≤ t ′ p ). The partition set P 0 p is thus replaced by {Π(Φ), Φ ∈ P 0 p }, with Π(Φ) being the partition of the image indices via Π. Note that this covers the entire set of partitions P p when considering all possible permutations. This concludes the proof by induction.
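The combinatorial step above, covering all partitions P p by permuting ordered partitions P 0 p , can be checked numerically. The sketch below (our own helper names, not from the paper) enumerates the partitions of I p into blocks of consecutive indices and verifies that permutations of the indices span the full partition set:

```python
from itertools import permutations

# Ordered partitions: partitions of (0, ..., p-1) whose blocks are runs of
# consecutive indices, i.e. the set P0_p used in the proof.
def ordered_partitions(p):
    if p == 0:
        yield []
        return
    for k in range(1, p + 1):            # size of the first block
        for rest in ordered_partitions(p - k):
            yield [list(range(k))] + [[i + k for i in block] for block in rest]

def canon(partition):
    """Order-independent representation of a set partition."""
    return frozenset(frozenset(block) for block in partition)

p = 4
ordered = {canon(phi) for phi in ordered_partitions(p)}
print(len(ordered))                      # 2**(p-1) = 8 ordered partitions

# Applying every permutation of the indices to every ordered partition
# recovers the whole partition set P_p.
all_parts = {canon([[pi[i] for i in block] for block in phi])
             for phi in ordered_partitions(p)
             for pi in permutations(range(p))}
print(len(all_parts))                    # Bell(4) = 15: all partitions covered
```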
Remark 5 (Large population size) In the limit of large population size (n → ∞) and in the absence of the driving intensities (λ = 0), the output moment of order p can simply be approximated by the single dominating term. This term corresponds to the partition Φ = {I p } and has a contribution of order n p , whereas all other partitions Φ ′ ≠ Φ give a contribution of order n p−|Φ ′ |+1 ≪ n p , which is negligible.
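A quick enumeration illustrates the scaling argument: each partition Φ of I p contributes at order n^(p − |Φ| + 1), so only the single-block partition reaches n^p. This is a toy check with our own helper function, not taken from the paper:

```python
# Toy check of the large-n scaling: enumerate set partitions and compare the
# order of their contributions, n**(p - |Phi| + 1), with the dominant n**p.
def set_partitions(elems):
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

p, n = 3, 10_000
orders = [n ** (p - len(phi) + 1) for phi in set_partitions(list(range(p)))]
dominant = max(orders)
print(dominant == n ** p)                                   # single-block partition
print(sum(o for o in orders if o < dominant) / dominant)    # O(1/n), negligible
```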

Further examples
Example 3 (Interplay between afferent and recurrent connections) Here we consider three cases of a single output neuron where the amplification determined by the weights is the same, namely the integral of γ ∗ ǫ is identical across the three cases. In this way, the output neuron has the same firing rate in all cases and the comparison focuses on its spike-correlation structure. We rescale unitary kernels by the weights w aff and w rec for γ and ǫ, respectively. The simulation results in Fig. 5 show that the distinct types of connectivity strongly influence the output correlation structure. The combination of afferent and recurrent connections leads to a spreading of the density of the 2nd- and 3rd-order moments. Although the afferent connection does not amplify the driving input in the right configuration, the autocorrelation of the input neuron contributes to a stronger correlation structure for the output neuron. Recall that, because all configurations have the same firing rate, the difference lies in the temporal distribution of the spikes, which are more bursty due to the recurrent connectivity. In other words, the presence of the input neuron with the afferent connection further strengthens the bursting.

Figure 5: Comparison between the output moments for various connectivity configurations with a single output neuron. The top matrix row corresponds to the 2nd-order moment and the bottom row to a diagonal plane of the 3rd-order moment. The same driving oscillatory intensity is fed to the neuron on the left of each diagram, in particular directly to the output neuron for the middle column. We compare the following configurations. Left column: feedforward network with w aff = 2 (and w rec = 0). Middle column: single neuron with a self-connection, w rec = 0.5 (and w aff = 0). Right column: network with both connections, w aff = 1 and w rec = 0.5. As before, the plots show results averaged over 10000 simulations.
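The rate-matching across the three configurations can be sketched with the stationary linear rate relations (our own shorthand, not the paper's notation): a feedforward weight multiplies the input rate, while a recurrent weight amplifies the drive by 1/(1 − w rec):

```python
# Back-of-envelope check (our shorthand) that the three configurations of
# Example 3 share the same stationary output rate in the linear Hawkes model.
lam = 10.0   # mean of the driving oscillatory intensity (spikes/s), assumed

def output_rate(w_aff, w_rec, direct_drive):
    """Stationary rate of the output neuron in the linear regime."""
    drive = lam if direct_drive else w_aff * lam   # input neuron fires at rate lam
    return drive / (1.0 - w_rec)                   # recurrent amplification

left = output_rate(w_aff=2.0, w_rec=0.0, direct_drive=False)   # feedforward only
middle = output_rate(w_aff=0.0, w_rec=0.5, direct_drive=True)  # self-connection only
right = output_rate(w_aff=1.0, w_rec=0.5, direct_drive=False)  # both combined
print(left, middle, right)   # 20.0 20.0 20.0
```

All three configurations give the same output rate, so any difference between them must lie in the correlation structure, as the example argues.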
Up to now, we have used synaptic kernels with non-negative values, which ensures that the firing intensity in Eq. (3) is always non-negative, as required by our calculations. Now we consider negative synaptic connections to examine how well our calculations hold when this assumption is violated.
Example 4 (Two neurons with refractory self-connections and mutual excitation) For the two neurons with self-inhibition in Fig. 6A, each output spike triggers a temporary decrease of the neuron's firing intensity, thereby implementing relative refractoriness. Fig. 6B compares the time courses of the two synaptic kernels, for excitation (as used until now) and for the self-inhibition implementing refractoriness. In order to prevent a "negative firing intensity" (corresponding to the dotted gray curves below the horizontal line at 0 in Fig. 6C), we used a rectification such that the firing intensity remains non-negative, namely ν ≥ 0 (solid gray curve). This translates into a non-linear, non-negative-valued function on the right-hand side of Eq. (3), here taken as ν i (t) = [· · · ] + . As mentioned earlier, this situation violates the hypothesis behind our calculations and is expected to result in errors between the predicted moments and their empirical counterparts. As an example, Fig. 6D displays the 1st-order moments for the two neurons with w refrac = 0.1 (as in Fig. 6C), which match the theory well. In contrast, the discrepancy with the theory is larger for w refrac = 0.3 in Fig. 6E. Note that the 1st-order moments differ between the two neurons because of the distinct excitatory weights that connect them. When increasing the refractory weight w refrac , the firing rate obtained in the simulation deviates from its expected value calculated from the theory: the predicted value is lower because our calculations allow a "negative firing intensity", as shown for the mean firing rate over the two neurons in Fig. 6F. Similarly, the accuracy of the moments decreases for larger w refrac , corresponding to larger errors in Fig. 6G. Note that the error in Fig. 6G includes finite-size effects, i.e. it involves the empirical error due to the simulation over a limited period of time. In general, such errors may also be influenced by the specific choice of the synaptic kernel.
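The effect of rectified self-inhibition can be illustrated with a minimal discrete-time simulation (our own toy simulator and parameter choices, not the paper's code). For a weak refractory weight, the empirical rate stays close to the linear prediction λ/(1 + w refrac), slightly above it because the rectification clips the "negative" part of the intensity:

```python
import numpy as np

# Toy single-neuron simulation with a rectified intensity and an inhibitory
# self-kernel w_refrac * exp(-t/tau) / tau (unit integral times w_refrac).
# The linear theory, which ignores the rectification, predicts a mean rate
# lam / (1 + w_refrac).
rng = np.random.default_rng(0)
dt = 0.001                     # 1 ms bins
n_bins = 500_000               # 500 s of simulated time
lam = 20.0                     # constant driving intensity (spikes/s)
tau, w_refrac = 0.0015, 0.1    # refractory time constant (s) and weight
decay = np.exp(-dt / tau)

state = 0.0                    # exponentially filtered spike feedback
n_spikes = 0
for _ in range(n_bins):
    nu = max(lam - w_refrac * state, 0.0)   # rectification [.]_+ keeps nu >= 0
    if rng.random() < nu * dt:
        state = state * decay + 1.0 / tau
        n_spikes += 1
    else:
        state *= decay

rate = n_spikes / (n_bins * dt)
print(rate, lam / (1 + w_refrac))   # empirical rate close to (above) the prediction
```

Consistently with Fig. 6F, the empirical rate exceeds the linear prediction, and the gap grows with w refrac.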

Relationship with cumulants
We end by relating our results to previous work (Jovanović, Hertz, and Rotter 2015; Ocker, Josić, Shea-Brown, and Buice 2017) that described the activity in Hawkes networks using cumulants instead of moments. These studies were limited to the case of neurons driven by deterministic intensities and focused on cumulants because of the theoretical tools that they applied to the present problem, respectively the Hawkes branching process and field theory. Cumulants and moments are two ways of describing the spiking statistics, and the relationship between them follows from their generating functions (Balakrishnan, Johnson, and Kotz 1998; Daley and Vere-Jones 1988, Section 5.2). Let E α (ζ, k, t) be the moment generating function for the multivariate random variable α k (t) = (α k1 (t 1 ), · · · , α kp (t p )): where ζ = (ζ 1 , · · · , ζ p ) T . This moment generating function can be used to express the p-th order moment over the coordinates k and times t: The cumulant generating function for the random variable α is given by

Figure 6: B: The excitatory kernel (top) corresponds to a double exponential (exp(−t/τ decay ) − exp(−t/τ rise ))/(τ decay − τ rise ) with τ rise = 1 ms and τ decay = 5 ms, as used before in other figures. The inhibitory kernel used for refractoriness (bottom) is a simple decaying exponential exp(−t/τ refrac )/τ refrac with τ refrac = 1.5 ms. C: Example firing intensities for the two neurons (1 at bottom and 2 at top), where spikes are indicated by crosses. Here the refractory weight is set to w refrac = 0.1. The driving oscillatory intensity λ is represented by the dotted black curve. D: 1st-order moment (solid gray curve) with theoretical prediction (dashed black curve) for the two neurons and w refrac = 0.1, averaged over 10000 simulations. E: Same as panel D for stronger refractoriness with w refrac = 0.3. F: Predicted (dashed black curve) and empirical (solid gray curve) mean firing rate as a function of the refractory weight w refrac . As before, the results correspond to the average over 10000 repetitions of the same network. G: Normalized error for the 1st- to 3rd-order moments when varying w refrac . It corresponds to the difference between the theoretical and empirical curves in panel C, squared and integrated over the 100 ms period. The normalization consists in dividing by the firing rate. The non-zero normalized error at w refrac = 0 comes from the finite number of simulation repetitions.
Notation 4 (cumulant) The cumulants of order p for the indices k = (k 1 , . . . , k p ) at times t = (t 1 , . . . , t p ) for the inputs x, for the driving intensities λ and for the filtered inputs γ ∗ x are defined respectively as X̄ p k (t) =

Property 3 The formal relationship between the moment X p k (t) of order p and the cumulants X̄ p ′ k ′ (t ′ ) of order p ′ ≤ p -here presented for the inputs- is given by where the Φ are the partitions of I p composed of disjoint subsets S.

Proof of Property 3:
The present proof -inspired by previous work (Daley and Vere-Jones 1988; Balakrishnan, Johnson, and Kotz 1998)- relies on the following general result for the (partial) derivative of exp(f) with respect to the variables ζ = (ζ 1 , · · · , ζ p ), for an arbitrary function f without specified arguments: which involves all partitions Φ ∈ P p and the partial derivatives ∂ |S| f /∂ζ S of order |S| with respect to the variables ζ r whose indices r ∈ S. For p = 1 with ζ 1 , we have the univariate case To demonstrate Eq. (59), we assume the expression to be valid for p − 1 and derive it for p, using a proof by induction. Separating ζ p from the remaining variables ζ ′ = (ζ 1 , · · · , ζ p−1 ), we use Eq. (59) for p − 1: where the derivative with respect to ζ p applied to the product yields two terms. The second term corresponds to Eq. (60), which can be assimilated to the partition Φ ′ ∈ P p such that Φ ′ = Φ ∪ {{p}}. The first term actually gives |Φ| terms, one for each subset S of the product, depending on the actual partition Φ. For each Φ, we construct |Φ| partitions Φ ′ ∈ P p by adding the index p to one of the subsets S ∈ Φ. Because a partition Φ ′ ∈ P p can only be of one of these two types, we end up with where Q p = {Φ ∈ P p , {p} ∈ Φ} is the set of all partitions of I p that contain the singleton {p}.
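The partition formula of Property 3 is easy to check numerically in the univariate case (generic symbols, not the paper's network quantities): summing, over all set partitions, the product of cumulants per block recovers the classical Gaussian moments:

```python
from math import prod

# Univariate instance of the partition formula: E[X^p] is the sum over all
# partitions Phi of I_p of the product over blocks S of the cumulant of
# order |S|.
def set_partitions(elems):
    """Enumerate all set partitions of the list elems."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

def moment_from_cumulants(p, kappa):
    """kappa[q] is the cumulant of order q (missing orders are zero)."""
    return sum(prod(kappa.get(len(S), 0.0) for S in phi)
               for phi in set_partitions(list(range(p))))

mu, s2 = 1.5, 2.0   # Gaussian: cumulants (mu, s2, 0, 0, ...)
print(moment_from_cumulants(3, {1: mu, 2: s2}))   # mu^3 + 3*mu*s2 = 12.375
```

The same call with p = 4 returns mu^4 + 6 mu^2 s2 + 3 s2^2, the Gaussian fourth moment.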
Corollary 1 A direct corollary of Property 3 is that, when the input neurons are independent and driven by intensities µ k (t k ), the cumulant of order p of the input population x is given by X̄ The proof simply consists in identifying the terms in Eq. (13) with the cumulants, where k and t are respectively replaced by k S and t S for each subset S. Now we examine the general situation of a network with afferent and recurrent connectivities, corresponding to the combined theorems for moments -see Eqs. (23) and (41).
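As a sanity check of this structure (again with generic symbols rather than the paper's quantities), for a Poisson count all cumulants equal the mean, so the partition formula reduces the moment of order p to a sum of λ^|Φ| over all partitions:

```python
# Poisson sanity check: every cumulant of a Poisson count equals its mean
# lam_, so the partition formula gives E[N^p] = sum over partitions Phi of
# lam_ ** (number of blocks of Phi).
def set_partitions(elems):
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

lam_ = 2.0
moment3 = sum(lam_ ** len(phi) for phi in set_partitions([0, 1, 2]))
print(moment3)   # lam_ + 3*lam_**2 + lam_**3 = 2 + 12 + 8 = 22.0
```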
Theorem 3 (Mappings for cumulants) The cumulants are related by the following mappings:

Proof of Theorem 3: Eq. (65a) simply comes from the linearity of the filtering by γ. Another way to prove it is to decompose the moment in terms of cumulants, as we do now to demonstrate Eq. (65c).
By rewriting Eq. (41) in terms of cumulants using Eq. (58), we have As before, we identify the terms for each S and Φ.
In contrast, Eq. (65b) is not straightforward and comes from the spiking nature of y, driven by an intensity function ν ǫ=0 that possibly has high-order correlations (for example, a Cox process). Basically, it extends cumulants of smaller orders by delta functions for all possible partitions of each time variable of the smaller-order cumulant. For simplicity, we only show the result for Γ̄; note also that the additivity of the cumulant ensures the complete result. We rewrite Eq. (28) -that is, the equivalent of Eq. (23) in the absence of λ- in terms of cumulants using Eq. (58): In Eq. (67) the cumulants Γ̄ involve indices from distinct subsets S of the partition Φ, as they "combine" the minima in Φ̌ according to Φ ′ . We now reorganize the expression to obtain an expression similar to the left-hand side, where the terms in the product over S have a generic expression with indices only in S. The product of generalized delta functions can be moved inside the sum over Φ ′ , yielding For each pair of partitions Φ and Φ ′ , we construct a partition Ψ ∈ P p whose subsets T are the unions of the subsets S corresponding to the same S ′ ∈ Φ ′ : In addition, we define a partition Ψ ′ T ∈ P(T ) for each T ∈ Ψ that splits T into the original subsets S ∈ Φ: The correspondence between the partitions is represented in Fig. 7 for a schematic example. Using Eq. (70) with S = T ′ ∈ Ψ ′ T for each T , the first product in Eq. (68) can be rewritten as Because each S ′ ∈ Φ ′ = Φ̌ is the subset of minima Ψ̌ ′ T for the corresponding T = S, we similarly reformulate the second product We can thus factorize the two products in the right-hand side of Eq. (68) to obtain Last, the key observation is that each pair Φ ∈ P p and Φ ′ ∈ P(Φ) is uniquely associated with another pair made of a partition Ψ ∈ P p and its corresponding set of partitions {Ψ ′ T ∈ P(T )} T ∈Ψ :

Figure 7: One-to-one mapping between partitions. The partition Ψ is constructed from the pair Φ and Φ ′ . To each element T ∈ Ψ corresponds a partition Ψ ′ T that recovers the original subsets in Φ. See Eqs. (69) and (70) in the main text for the mathematical construction. Here the subsets are indexed as in Eq. (75) and the partitions Ψ ′ T are represented using different gray contrasts.
As a consequence, the double summation over Φ and Φ ′ in Eq. (68) can be expressed as a summation over Ψ and over its corresponding sub-partitions, namely with the explicit enumeration of T r ∈ Ψ. With this substitution, Eq. (68) can be expressed as: Once again, we conclude by identifying the terms for each T and Ψ in the right-hand side and S and Φ in the left-hand side of Eq. (76).
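A necessary consequence of this one-to-one mapping is a counting identity: the number of pairs (Φ, Φ ′ ) equals the number of pairs (Ψ, {Ψ ′ T }). The following sketch (our own verification script, not from the paper) checks this for p = 4:

```python
from math import prod

# Both sides of the bijection enumerate "partitions of partitions" of I_p,
# so their counts must match: sum over Phi of Bell(|Phi|) (choose Phi, then
# Phi' in P(Phi)) versus sum over Psi of the product of Bell(|T|) over its
# blocks (choose Psi, then an inner partition Psi'_T of every block T).
def set_partitions(elems):
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
        yield [[first]] + part

def bell(n):
    """Bell number: count of set partitions of n elements."""
    return sum(1 for _ in set_partitions(list(range(n))))

p = 4
parts = list(set_partitions(list(range(p))))
lhs = sum(bell(len(phi)) for phi in parts)
rhs = sum(prod(bell(len(T)) for T in psi) for psi in parts)
print(lhs, rhs)   # 60 60
```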
Note that Eq. (65b) for cumulants resembles its counterpart Eq. (23) for moments, but in the case where the neurons are stimulated by both inputs and driving intensities, the corresponding cumulants are simply summed, whereas moments appear in a product.

Discussion
In this paper we analytically computed the statistics of neuronal activity in a recurrent network -described via moments and then transposed to cumulants- from the statistics of the input neuronal population. An important contribution of our study is the description of the propagation of spiking moments in feedforward networks (Theorem 1) and recurrently-connected networks (Theorem 2), which had not been explored before. Theorem 3 established the equivalent mappings for cumulants. Compared to recent studies for cumulants (Jovanović, Hertz, and Rotter 2015; Ocker, Josić, Shea-Brown, and Buice 2017), an important advantage of the operator viewpoint taken here is that it provides intuition about the spatio-temporal filtering induced by both afferent and recurrent connectivities. In particular, Fig. 5 shows that the combination of afferent and recurrent connectivities can lead to a strong output correlation structure, hinting at nonlinear effects on the distribution of spikes. This can be explained by the recurrent connectivity, as well as the interplay between moment orders (here from low-order to high-order moments). Another interesting point is that moments do not require a stationarity assumption to derive their consistency equation, in contrast to covariances in Hawkes' original formalism.
The main technical challenge comes from the spiking nature of neurons, which forces us to consider all possible contractions, see Eq. (4). For rate-based neurons -still interacting through spatio-temporal kernels- or, equivalently, assuming that the population size is very large such that individual spikes have negligible effects, our results can be expressed in a much simpler way (see Remark 5). In this case, the output moments can be approximated by a nested convolution: a first convolution of the input moments with the feedforward kernel, followed by a second convolution with the effective recurrent kernel. Quantifying the deviations from this approximation for neuronal populations of finite size is left for future work. At the heart of the tractability in this study is the linearity of the Hawkes process considered here, as the firing probability is proportional to the membrane potential, which simply sums the synaptic inputs. This allows us to calculate exact results that do not rely on, e.g., the diffusion approximation. However, the linearity assumption obviously imposes limitations on the scope of the theory presented here. In particular, refractoriness (i.e. the reduction of spiking probability during the few milliseconds that follow an action potential) as well as inhibition, which is ubiquitous in the brain, cannot be exactly modeled in this linear framework. Discrepancies with the theory will of course depend on many factors, such as the shapes and amplitudes of the synaptic kernels (especially for inhibitory connections) as well as the properties of the filtered inputs and driving intensities -which may interplay. Nonetheless, we have found that our formalism still holds when such mechanisms, modeled by negative connectivity weights, are not too strong (Fig. 6).
To circumvent those limitations, several studies included various forms of nonlinearities in the Hawkes process (Brémaud and Massoulié 1996; Galves and Löcherbach 2016; Chevallier 2017; Gao and Zhu 2018; Ferrari, Galves, Grigorescu, and Löcherbach 2018; Raad, Ditlevsen, and Löcherbach 2018). One can assume that the firing intensity explicitly depends on the time elapsed since the previous spike in order to make the Hawkes process age-dependent (Raad, Ditlevsen, and Löcherbach 2018). This approach ensures stability independently of the two classical stability conditions, namely (a) the α-Lipschitz condition on the nonlinear intensity function and (b) the requirement that the integral of the absolute value of the recurrent kernel be smaller than 1/α (Brémaud and Massoulié 1996). However, the computation of moments and cumulants in this context is expected to be much harder. Mean-field approximations lead to analytical results (Toyoizumi, Rad, and Paninski 2009), but they are only valid in the limit of weak coupling. Another possibility is to rely on the path-integral formulation and the related Feynman diagram formalism (Shchepanyuk 1995; Ocker, Josić, Shea-Brown, and Buice 2017; Chen, Shojaie, Shea-Brown, and Witten 2018), but it brings an additional complexity that requires further analysis to obtain an intuitive understanding of the combined effect of the feedforward and recurrent kernels in propagating spiking moments. In that respect, the linear Hawkes process already leads to complex crossovers between cumulants -as can be seen in Eq. (65b)- and many more are expected to appear in the nonlinear case. Note that gaining insight into this cross-talk between moment orders is important for investigating a network driven by correlated inputs or by driving intensities with a correlation structure, such as Cox processes (Lechnerová, Helisová, and Beneš 2008; Laier, Prokesova, and Jensen 2008).
In the context of neuroscience, our results can be applied to the field of synaptic plasticity under two conditions: first, the output neurons should be in the linear regime; second, the learning rule should have a small learning rate (i.e. learning operates at a much slower time scale than the neuronal activity).

The important point here is to understand that the construction of Φ̌ from A and B exactly spans the whole set of partitions P p . Note also that A and B can be empty sets. Then we simply group the subsets A of the same size q, and similarly the subsets B of the same size r: which gives a reformulation of Eq. (23) using the operator A p in Eq. (77).