FISHER INFORMATION MATRIX FOR SINGLE MOLECULES WITH STOCHASTIC TRAJECTORIES

Abstract. Tracking of objects in cellular environments has become a vital tool in molecular cell biology. A particularly important example is single molecule tracking which enables the study of the motion of a molecule in cellular environments by locating the molecule over time and provides quantitative information on the behavior of individual molecules in cellular environments, which were not available before through bulk studies. Here, we consider a dynamical system where the motion of an object is modeled by stochastic differential equations (SDEs), and measurements are the detected photons emitted by the moving fluorescently labeled object, which occur at discrete time points, corresponding to the arrival times of a Poisson process, in contrast to equidistant time points which have been commonly used in the modeling of dynamical systems. The measurements are distributed according to the optical diffraction theory, and therefore, they would be modeled by different distributions, e.g., an Airy profile for an in-focus and a Born and Wolf profile for an out-of-focus molecule with respect to the detector. For some special circumstances, Gaussian image models have been proposed. In this paper, we introduce a stochastic framework in which we calculate the maximum likelihood estimates of the biophysical parameters of the molecular interactions, e.g., diffusion and drift coefficients. More importantly, we develop a general framework to calculate the Cramér-Rao lower bound (CRLB), given by the inverse of the Fisher information matrix, for the estimation of unknown parameters and use it as a benchmark in the evaluation of the standard deviation of the estimates. There exists no established method, even for Gaussian measurements, to systematically calculate the CRLB for the general motion model that we consider in this paper. We apply the developed methodology to simulated data of a molecule with linear trajectories and show that the standard deviation of the estimates matches well with the square root of the CRLB. We also show that equally sampled and Poisson distributed time points lead to significantly different Fisher information matrices.

1. Introduction. The ability to track objects of interest, e.g., subcellular organelles and molecules, in cellular environments plays an important role in studying biological systems. In particular, single molecule tracking, which enables following subcellular processes at the single molecule level, has become a vital tool in cell biology [30,29,28]. Traditionally, microscopy studies were bulk studies and the information from such studies reflected the behavior of ensembles of molecules as opposed to individual ones [24]. Single molecule microscopy techniques have revolutionized the field of microscopy by providing quantitative information on the behavior of individual molecules in cellular environments, which were not available before through bulk studies [22,25]. In biological studies, single molecule tracking methods have been used to study intracellular trafficking of fluorescently labeled antibodies, e.g., prostate-specific membrane antigen (PSMA) antibodies [14,12,1], by analyzing the velocity and path of the fluorescent molecules.
In general, the motion of an object in cellular environments is subject to different types of forces, e.g., deterministic forces due to the environment and random forces due to random collisions with other objects [32,7]. It has been shown that the motion of a moving object in such environments can be modeled by stochastic differential equations (SDEs) [27]. In particular, in many biological applications, solutions of linear SDEs are good fits to experimental single molecule trajectories [10,9,8]. In a basic fluorescence microscope, a fluorescently labeled object of interest is imaged by a detector which detects the photons emitted by the object during the acquisition time. Since the detection process of the emitted photons is inherently a random phenomenon, the acquired measurements are stochastic in nature. These measurements, according to optical diffraction theory, can be modeled by different distributions. For example, a typical distribution for an in-focus molecule is an Airy profile [11], whereas, three dimensional distributions, e.g., Born and Wolf [6], are used instead for out-of-focus molecules. In some cases, it is possible and computationally beneficial to approximate these complex profiles with simple Gaussian models.
In many dynamical systems, the time points of the measurements are assumed to be uniformly distributed. However, the time points of detection of the photons are corresponding to the arrival times of a Poisson process [22,25]. This gives rise to nonuniform sampling of the continuous-time stochastic process that describes the motion of the object. This randomized non-uniform sampling also has significant fluctuations in parameter estimation.
In recent years, many methods have been developed to analyze the trajectories of a molecule in cellular environments. In most of these methods, the model for the motion of the molecule is assumed to be limited to a Brownian motion (pure diffusion) model described only by the diffusion coefficient, and only few of the available methods consider more general motion models. The methods developed to analyze pure diffusion models are mostly based on the mean square displacement approach [23], in which the diffusion coefficient is estimated by a linear regression of the mean square displacement of the Gaussian distributed observed locations of the molecule as a function of the time lag [5,20,19]. Mean square displacement-based methods are not the only approaches used to estimate the diffusion coefficient from a set of measurements. For example, Relich et al. [26] have proposed a method for the maximum likelihood estimation of the diffusion coefficient, with an information-based confidence interval, from Gaussian measurements. In all of these methods, the motion of a molecule is assumed as a pure diffusion model, and the measurements are modeled by an independent and identically distributed Gaussian random variable [17].
However, in general, the motion of a molecule is not limited to the pure diffusion model, and the diffusion coefficient is only one of the parameters that play role in the motion of the molecule. Also, the Gaussian assumption for the measurements is problematic in practice due to the fact that the Gaussian model is often not an accurate analytical model. In [2], Ashley and Andersson have proposed a simultaneous localization and parameter estimation algorithm for more complex motion models, such as confined [27] and tethered motions [21], which employs the expectation maximization algorithm in conjunction with sequential Monte Carlo methods [31]. Briane et al. [7] have developed a method for classifying the object trajectories in living cells into three types of diffusion: Brownian motion, subdiffusion (diffusion in a closed domain or in a crowded area) and superdiffusion (diffusion in a specific direction). In [10,9,8], the motion of a moving object has been described more generally by a linear SDE, and the parameters of the model has been estimated using a maximum likelihood estimation method. However, they do not consider randomness of the time points at which the measurements occur. Their proposed framework also does not allow for non-Gaussian measurements.
In this paper, we address the above limitations by considering a more general dynamical system with arbitrary distributed measurements, which occur at Poisson distributed time points, that allows for more general motion models for an object of interest. Here, the motion of an object in cellular environments is modeled by stochastic differential equations (SDEs), and the measurements are the detected photons emitted by the moving fluorescently labeled object. As mentioned earlier, these measurements can be modeled by non-Gaussian distributions. We develop a stochastic framework in which we calculate the maximum likelihood estimates of the biophysical parameters of the molecular interactions, e.g., diffusion and drift coefficients.
According to a well-known result from estimation theory, assuming that the estimator is unbiased, its standard deviation, is then at best equal to the square root of the CRLB, which is given by the inverse of the Fisher information matrix [22,25,11]. More importantly, in order to evaluate the performance of our proposed estimation method, we develop a general framework to calculate the Fisher information matrix of the unknown parameters of the general motion model. There are some cases that Gaussian approximations of measurements are very useful due to, for example, the ability of using computationally efficient algorithms in linear systems or Kalman filter formula. In particular, for Gaussian measurements, we calculate the Fisher information matrix by taking advantage of its relationship with the Kalman filter formula through a computationally efficient algorithm. To the best of our knowledge, even for Gaussian measurements, there currently exists no systematic methodology to evaluate the standard deviations of the estimates using the CRLB for the general motion model considered here.
To assess the performance of the proposed estimation method, we apply it to simulated data sets comprising linear two-dimensional (2D) trajectories of a molecule with Gaussian, Airy and classical model of Born and Wolf measurements. The results show that there is no systematic bias associated with the method. In addition, we show that the mean of the predicted distributions of the locations of the molecule is able to follow the true locations of the molecule for the all different types of measurements. In particular, for data sets comprising repeat trajectories of a molecule with Gaussian measurements, it is shown that the standard deviations of the diffusion and drift estimates are close to the square roots of their corresponding CRLBs. We also show that, in case that we have one detected photon, the Fisher information matrices obtained for an Airy and its corresponding approximating Gaussian profile are different from each other, and therefore, the use of the Gaussian approximation can be problematic in some applications. We show that equally sampled time points, which have been commonly used in most dynamical systems, and Poisson distributed time points can lead to significantly different Fisher information matrices. We further show that even the results obtained for different realizations of a Poisson process can vary notably. This paper is organized as follows. In Section 2, we present the statistical description of the acquired data, and derive a general formula for the likelihood function of the described data model. Section 3 is devoted to introduce linear stochastic trajectories and calculate the likelihood function in case that the object undergoing this type of trajectories. In Section 4, we propose a mathematical framework to calculate the maximum likelihood estimation of the parameters of interest, such as the parameters of the motion model of the molecule. Section 5 is devoted to calculate general expressions for the CRLB and Fisher information matrix relating to the parameter estimation problem.
In this paper, we use the following notation C l × R l [t] := {(r 1 , · · · , r l , τ 1 , · · · , τ l ) |r 1 , · · · , r l ∈ C, t 0 ≤ τ 1 < · · · < τ l ≤ t} , (1.1) where C := R 2 , t 0 ∈ R, and l = 1, 2, · · · . If there is no bound on t, we denote the set in Eq. (1.1) by C l × R l [∞] . 2. Fundamental data model. A basic setup of an optical system considered here is shown in Fig. 1, where an object is in the object space and its image is captured by a planar detector in the image space. In the fundamental data model, we assume that the microscopy image data is acquired under ideal conditions. It assumes the use of an image detector that has an unpixelated and infinitely large photon detection area. The detection of a photon is intrinsically random in terms of both the time and the location on the detector at which the photon is detected. In general, the temporal part of the detection of the emitted photons can be modeled as a counting process {N (τ ), τ ≥ t 0 }. Here, we assume that {N (τ ), τ ≥ t 0 } is a Poisson process referred to as the photon detection process that is characterized by the intensity function Λ(τ ), τ ≥ t 0 , referred to as the photon detection rate. The spatial component of the photon detection process is specified by random variables, referred to as the photon location variables, that describe the locations at which photons emitted by the object of interest are detected. Fig. 1. Schematic of an optical microscope. An object located in the object (focal) plane is imaged by an optical lens system and the image of the object is acquired by the planar detector in the image space. A 2D random variable X θ (τ ), τ ≥ t 0 , describes the location of the object in the object plane at time τ .
In the following definition, we define a spatio-temporal process referred to as the image detection process, which models the acquired data, for two different acquisition methods, one when the time interval over which photons are detected is given and the other when the total number of detected photons is given. For the fixed acquisition time, due to the stochastic nature of photon emission, the total number of detected photons varies for every image, while in the other case, the number of detected photons remains the same.
Definition 2.1. Let C := R 2 denote a non-pixelated detector of infinite size. Let Θ denote a parameter space that is an open subset of R n and let t 0 ∈ R. Let the one-dimensional (1D) random variables T 1 , T 2 , · · · , describe the time points of detection of the photons that impact the detector C, which are arrival times points associated with a Poisson process with intensity function Λ(τ ), τ ≥ t 0 . Let U 1 , U 2 , · · · , be 2D random variables that describe the locations of detection of the photons that impact the detector C. For l = 1, 2, · · · , let U l := (U 1 , · · · , U l ) T , U 0 = ∅, and T l := (T 1 , · · · , T l ) T , T 0 = ∅. Assume that the current location of the detected photon, given the current and previous time points, is independent of the future time points, i.e., for r ∈ C and t 0 ≤ τ 1 < τ 2 < · · · , p U l |T k r|τ 1 , · · · , τ k = p U l |T l r|τ 1 , · · · , τ l , for all k, l = 1, 2, · · · , k ≥ l, where, for random vectors X and Y , the conditional probability density function of X, given Y , is denoted by p X|Y .
1. For a fixed acquisition time interval [t 0 , t], an image detection process , t] is defined as a spatio-temporal process whose temporal part T [t] and spatial part U [t] describe the time points and the locations of detection of the photons that impact the detector C in the time interval [t 0 , t], respectively, i.e., for ω ∈ Ω, where Ω is the sample space, Given a fixed number L = 1, 2, · · · , of photons, an image detection process G L (U L , T L ) , C, Θ for a fixed number L of photons is defined as a spatio-temporal process whose temporal and spatial parts describe the time points and the locations of detection of the L photons that impact the detector C, respectively. Moreover, given T L = (τ 1 , · · · , τ L ) , t 0 ≤ τ 1 < τ 2 < · · · < τ L , G τ1,··· ,τ L (U L , T L ) , C, Θ is referred to as the image detection process at fixed time points τ 1 , · · · , τ L .
In Theorem 2.2, we state expressions for the probability/probability density functions of image detection processes for a fixed time interval and for a fixed number of photons in terms of the predicted distributions of the locations of the detected photons, given the previous locations and the current and previous time points of the detected photons in a recursive manner. We further show that each of these predicted distributions can be expressed in terms of a scaled and shifted version of the image of the object and the predicted distribution of the location of the object, given the previous locations and time points of the detected photons. All the proofs in the paper are placed in Appendix. We drop the parameter vector θ when it is clear from the context.
, C, Θ and G L (U L , T L ) , C, Θ be image detection processes for a time interval [t 0 , t] and for a fixed number L of photons, respectively. Let 1. Then, the probability of D [t] = ∅ and N (t) = 0 is given by and the probability density function p [t] of D [t] and N (t) is given by [t] , K = 1, 2, · · · , and p U l |T l ,D l−1 denotes the conditional probability density function of U l , given T l , D l−1 , with p U1|T1,D0 r 1 |τ 1 , d 0 := p U1|T1 r 1 |τ 1 .
2. Moreover, the probability density function p L of D L is given by Proof. See Section 6.1 in Appendix.
Note that, as can be seen in the above theorem, the probability density function of an image detection process for a time interval [t 0 , t] depends on the integral of the photon detection rate Λ(τ ), τ ≥ t 0 , over the time interval [t 0 , t], and the probability density function of an image detection process for a fixed number L of photons depends on the integral of the photon detection rate over the time interval [t 0 , τ L ], where τ L denotes the time point of the L th (last) detected photon.
The probability density function of the location at which a photon emitted by the object of interest is detected, is referred to as the image profile of the object. So far we have made no assumptions about the specific functional form of the image profile of the object. In many practical cases, the image profile can be described as a scaled and shifted version of the image function. In such cases, an image function describes the image of an object on the detector plane at unit lateral magnification. Also, in general, the trajectory of the object can be described by a random process. In the following definition, we define image detection processes driven by a stochastic trajectory of the object and the image function for a fixed time interval and for a fixed number of photons.
, C, Θ and G L (U L , T L ) , C, Θ be image detection processes for a time interval [t 0 , t] and for a fixed number L of photons, respectively. Let X(τ ), τ ≥ t 0 , denote a 2D random process that describes the 2D stochastic trajectory of the object. Also, let {f x } x∈R 2 defined on the detector C, be a family of image profiles of an object located at x ∈ R 2 in the object space. Assume that the current location of the detected photon, given the current location of the object, is independent of the previous locations and previous and current time points of the detected photons, i.e., for all x ∈ R 2 , ,T l ,D l−1 is the conditional probability density function of U l , given X(T l ), T l , D l−1 , and p U l |X(τ l ) denotes the conditional probability density function of U l , given X(τ l ). Assume that there exists an image function q: R 2 → R, such that for an invertible matrix M ∈ R 2×2 and , q, C, Θ and G L X, (U L , T L ) , q, C, Θ driven by the stochastic trajectory X and image function q for a time interval [t 0 , t] and for a fixed number L of photons are defined as the spatio-temporal processes G [t] and G L , respectively.
In Theorem 2.2, we expressed the probability density functions of image detection processes in terms of conditional probability densities p U l |T l ,D l−1 , l = 1, 2, · · · , of the locations of the detected photons, given the previous locations and the current and previous time points of the detected photons in a recursive manner. In particular, for an object with a deterministic trajectory or a stationary object, the conditional probability densities p U l |T l ,D l−1 , l = 1, 2, · · · , can be simplified. For an object with deterministic trajectory X(τ ) ∈ R 2 , τ ≥ t 0 , we have Also, for a stationary object with position X 0 ∈ R 2 , we have We next illustrate specific image functions that describe the image of a point source. According to optical diffraction theory, when a point source is in focus with respect to the detector, the intensity distribution of the image of the point source is described by an Airy profile given by [25] (see Fig. 2 where n a denotes the numerical aperture of the objective lens, λ denotes the emission wavelength of the molecule, and J 1 denotes the first order Bessel function of the first kind. The 2D Gaussian profile, on the other hand, has been widely used to approximate the Airy profile as where σ > 0. For an out-of-focus point source, the image function can be obtained by the classical Born and Wolf model given by [6] q z0 (x, y) = 4πn 2 a λ 2 where J 0 is the zeroth-order Bessel function of the first kind, n o is the refractive index of the objective lens immersion medium, and (0, 0, z 0 ), z 0 ∈ R, is the location of the object in the object space.
We next calculate p U l |T l ,D l−1 , l = 1, 2, · · · , for more general cases. In the following corollary to Theorem 2.2, by describing these conditional probability density functions in terms of the image function, we derive expressions for the probability density functions of the image detection processes driven by the stochastic trajectory X and image function q for a time interval [t 0 , t] and for a fixed number L of photons.
, q, C, Θ ) be an image detection process driven by the stochastic trajectory X and image function q for a time interval [t 0 , t] (or for a fixed number L of photons), respectively. The conditional probability density function p U l |T l ,D l−1 , l = 1, 2, · · · , in Eq. (2.1) (or in Eq. (2.2)) of Theorem 2.2 is given by , p pr l := p X(T l )|T l ,D l−1 denotes the predicted probability density function of the location of the object, and p pr1 x o |τ 1 , d 0 := p pr1 x o |τ 1 .
Proof. See Section 6.2 in Appendix.
As can be seen in the above corollary, the expression of the probability density function of the image detection process depends on the predicted probability density function p pr l , l = 1, 2, · · · , of the location of the object, given the previous locations of the detected photons and the current and previous time points. In the following section, we introduce linear stochastic trajectories and calculate p pr l , l = 1, 2, · · · , for them.
3. Linear stochastic trajectories. In general, the motion of an object in cellular environments is subject to different types of forces, e.g., deterministic forces due to the environment and random forces due to random collisions with other objects [32,7], and can be modeled as X(τ l+1 ) = h(X(τ l )) + W (τ l , τ l+1 ), l = 1, 2, · · · , τ 0 := t 0 ≤ τ 1 < · · · < τ l , (3.1) where the 2D random variable X(τ l ) denotes the location of the object at time τ l , h: R 2 → R 2 is a deterministic mapping function, and {W (τ l , τ l+1 ), l = 1, 2, · · · } is a sequence of 2D random variables with probability density functions p W (τ l ,τ l+1 ) . Also, it has been shown that the continuous-time motion of a moving object in such environments can be modeled by stochastic differential equations [27]. Here, we assume the following stochastic differential equation where the 2D random process X(τ ) describes the location of the object at time τ ≥ t 0 , F ∈ R 2×2 and G ∈ R 2×r are continuous matrix time-functions, and {B(τ ) ∈ R r , τ ≥ t 0 } is a random process [4]. For example, in case of anomalous diffusion [18,39], or single molecule radiation in disordered media [3], the model for the motion of the molecule is related to the Lévy stochastic processes.
In particular, in many biological applications, solutions of linear stochastic differential equations are good fits to experimental single-molecule trajectories [27,10,9,8].
1. Image detection processes , Φ, C, Θ and G L (X, W, Z) , (U L , T L ) , Φ, C, Θ driven by a stochastic trajectory X modeled by a linear system with the family of state matrix time-functions Φ, the process noise W , and the measurement process Z for a time interval [t 0 , t] and for a fixed number L of photons are defined as the spatio-temporal processes G [t] and G L , respectively. 2. Let {W g (τ l , τ l+1 ) := W (τ l , τ l+1 ), l = 0, 1, · · · } be a 2D white Gaussian sequence with mean zero and covariance matrix Q g (τ l , τ l+1 ) ∈ R 2×2 , Q g (τ l , τ l+1 ) > 0, and Z(X(T l )) = M X(T l ) + Z g,l , l = 1, 2, · · · , (3.5) where M ∈ R 2×2 is an invertible measurement matrix used in the definition of the image function (Eq. (2.3)), and {Z g,l , l = 1, 2, · · · } is a measurement noise sequence of independent 2D Gaussian random variables with mean zero and the same covariance matrix Σ g ∈ R 2×2 , Σ g > 0. Also, assume that the initial location X(t 0 ) of the object is Gaussian distributed with mean x 0 ∈ R 2 and covariance matrix P 0 ∈ R 2×2 , P 0 > 0. It is assumed that the initial location of the object and noise sequences are independent from one another. The image detection processes C, Θ and G L (X, W g , Z g ) , (U L , T L ) , Φ, M, C, Θ driven by the linear stochastic trajectory X, the family of state matrix time-functions Φ, the measurement matrix M , the Gaussian process noise W g and the Gaussian measurement noise Z g for a time interval [t 0 , t] and for a fixed number L of photons are defined as the spatio-temporal processes G [t] and G L , respectively.
In Corollary 2.4, we calculated the probability density function of the image detection process in terms of the image function q and the predicted probability density function p pr l , l = 1, 2, · · · , of the location of the object, given the previous locations of the detected photons and the current and previous time points. In the following theorem, for a linear stochastic trajectory and Gaussian process and measurement noise, we calculate these predicted distributions p pr l , l = 1, 2, · · · , recursively. Also, for a more general Markov motion model described by a first order system with arbitrary distributed process and measurement noise, we calculate these distributions recursively.
be an image detection process driven by a stochastic trajectory X modeled by a linear system with the family of state matrix time-functions Φ, the process noise W , and the measurement process Z for a time interval [t 0 , t] (or for a fixed number L of photons). Let D k := (U k , T k ) , k = 0, 1, · · · , and , be the predicted probability density function of the location of the object, and p pr1 x|τ 1 , d 0 := p pr1 x|τ 1 .
1. Then, p pr l , l = 1, 2, · · · , can be calculated through the following recursive formula where the filtered probability density function p f i l x|d l := p X(T l )|D l x|d l of the location of the object is given by in which f X(τ ) (r) = p Z(X(τ )) (r) , τ ≥ t 0 , r ∈ C, and the first predicted probability density p pr1 is given by be an image detection process driven by the linear stochastic trajectory X with a Gaussian initial condition, the family of state matrix time-functions Φ, the measurement matrix M , the Gaussian process noise W g , and the Gaussian measurement noise Z g for a time interval [t 0 , t] (or for a fixed number L of photons). Then, for l = 0, 1, · · · , ppr l+1 x|d l , τ l+1 = 1 2π det(P l l+1 ) 2.2. Moreover, the conditional probability density function p U l |T l ,D l−1 is given by Proof. See Section 6.3 in Appendix.

Maximum likelihood estimation.
The main purpose of the presented materials in the previous section is to provide a mathematical framework to estimate the parameters of interest, such as the parameters of the model that describes the motion of a moving object with stochastic trajectories, from the acquired data. In this paper, we use the maximum likelihood estimation approach as follows. For a general parameter estimation problem, denoting the acquired data byd ∈ R m , m = 1, 2, · · · , the maximum likelihood estimateθ mle of θ ∈ Θ is given bŷ where L denotes the likelihood function. In our specific problem, the acquired data for the fixed time interval [t 0 , t] acquisition case is denoted byd , C, Θ is given by, according to Theorem 2.2 (see also [33,34]), for θ ∈ Θ, 1) and the likelihood function L L of G L (U L , T L ) , C, Θ is given by , L = 1, 2, · · · . In Section 6.4, we provide an example to illustrate our results for the specific case that the motion model is described by a simple linear stochastic differential equation and the parameter vector contains the drift and diffusion coefficients.
In the following, we present and discuss the results of the proposed maximum likelihood estimation method when applied to simulated data sets containing linear stochastic trajectories of a single molecule.

Simulated parameters.
To analyze the performance of the proposed maximum likelihood estimation method, we simulated different data sets using parameters commonly used in single molecule experiments. Unless otherwise stated, the images of in-focus and out-of-focus molecules were generated with Airy and Born and Wolf profiles (Eqs. (2.6) and (2.8)), respectively, where n a = 1.4, λ = 520 nm, n o = 1.515, and z 0 = 1 µm. For the Gaussian measurement case, the image of a molecule in the image space was generated with a zero-mean Gaussian measurement noise with the probability density function given by Eq. (2.7), where σ = 70 nm, which is related to the corresponding Airy profile.
Furthermore, a measurement (magnification) matrix M = 100I 2×2 was assumed to map the object space to the image space.

Estimation results.
Using simulated data sets, we first examine the performance of the maximum likelihood estimation method used to estimate the parameters of the linear motion model of a moving molecule in terms of the bias of the method. The bias is assessed by the average of the deviations of the estimated parameter from the true value. For this purpose, we simulated 100 data sets, each containing a trajectory of an out-of-focus molecule simulated using Eqs. (6.12) and (3.4), with the Born and Wold measurement noise (Eq. (2.8)) and the parameters given in  , an example of a molecule trajectory in the object space and its image in the image space are shown. For these data sets, we calculated the maximum likelihood estimates of the diffusion and drift coefficients, separately. Therefore, we need to obtain the predicted distributions in the likelihood function expressions (Eq. (2.9)) through Eqs. (3.6) -(3.8), which in general is a computationally expensive problem. We approximate the predicted distributions using a sequential Monte Carlo algorithm proposed in [31]. The overall approach is explained in Section 6.5 in detail. In Figs. 3(c) and 3(d), the differences between the maximum likelihood estimates of the diffusion and drift coefficients and the true values are plotted. As can be seen, the deviations of the estimates from the ground truth are, overall, centered around 0 nm, which suggests that there is no systematic bias associated with our proposed method (the average of the diffusion coefficient deviations and the drift coefficient deviations are -0.0319 µm 2 /s and 0.0307 1/s, respectively).
We further investigate the predicted distribution p pr l , l = 1, 2, · · · , of the location of the molecule, given previous observations, for the molecule trajectory shown in Figs. are shown in Fig. 4(a) and 4(b). We also show the measurements transformed from the image space to the object space. For a better visual comparison, the means of the predicted distributions of locations of the molecule and the true locations for x-and y-coordinates are also shown over a shorter time interval in Figs. 4(c) and 4(d). As can be seen, the predicted locations are able to track the true locations of the molecule for both x-and y-coordinates. We also show the difference between the means of the predicted distributions of the locations of the molecule and the true locations of the molecule in Fig. 13 (see Section 6.19 in Appendix). We also applied the proposed method to trajectory data of an in-focus molecule simulated using an Airy profile, with the same standard deviation as the Born and Wolf data, and obtained the similar results (see Figs. 11, 12 and 14 in Section 6.6).
As mentioned, in some applications, it is useful to approximate the point spread function of an optical system with a Gaussian profile. We analyzed the error of the estimates for simulated data sets with Gaussian measurement noise, with the same standard deviation as the Born and Wolf data, and obtained the similar results (see Figs. 5, 6 and 15). In order to calculate the predicted locations of the molecule for Gaussian measurements, we took advantage of the relationship between the likelihood function and Kalman filter formulas (see Theorem 3.2). It improved the computational efficiency significantly.

Fisher information matrix and CRLB.
In any estimation problem, the performance of the estimator can be evaluated by calculating their standard deviations from the true parameter values. According to the Cramér-Rao inequality, the covariance matrix of any unbiased estimatorθ of an unknown vector parameter θ is bounded from below by the inverse of the Fisher information matrix I(θ), i.e., Cov(θ) ≥ I −1 (θ). Therefore, a benchmark on the standard deviation of estimates can be obtained by the square root of the inverse of the Fisher information matrix. Note that the Fisher information matrix only depends on the statistical nature of the acquired data and it is independent of the applied estimation technique. We first define the Fisher information matrix of an image detection process at fixed time points in Definition 5.1, and use it to calculate the Fisher information matrix of image detection processes for the fixed time interval and for the fixed number of photons in Theorem 5.2.

1.2.
Assume that the photon detection rate Λ is independent of θ. Then, I [t] can be calculated as where the Fisher information matrix I τ1,··· ,τ K of the image detection process at fixed time points τ 1 , · · · , τ K G τ1,··· ,τ K (U L , T L ) , C, Θ is given by (5.3) in which the Fisher information matrix I τ1,··· ,τ l U l |T l ,D l−1 calculated with respect to the conditional probability density function p θ U l |T l ,D l−1 at fixed time points T l = τ 1:l is given by with r 1:l := (r 1 , · · · , r l ) , τ 1:l := (τ 1 , · · · , τ l ), and I τ1 U1|T1 given by 2.1. The Fisher information matrix I L of G L is given by , and p θ L denotes the probability density function of D L . 2.2. Assume that the photon detection rate Λ is independent of θ. Then, I L can be obtained as Proof. See Section 6.7 in Appendix.
We next derive expressions for the Fisher information matrices of the image detection processes driven by the stochastic trajectory X and image function q for a time interval [t 0 , t] and for a fixed number L of photons in the following corollary to Theorem 5.2.
As mentioned in Section 2, for special cases of an object with a deterministic trajectory and a stationary object, the probability density function of the image detection process G τ1,··· ,τ K at fixed time points t 0 ≤ τ 1 < · · · < τ K is simplified as given by Eqs. (2.4) and (2.5), respectively. We next in Corollary 5.5 to Theorem 5.2 calculate the Fisher information matrix for these special cases, and show that the obtained results are consistent with the results presented in [22,37,35,36].
The material presented in Theorem 5.2 and Corollary 5.3 provides a mathematical framework to calculate the Fisher information matrix of image detection processes for a fixed time interval and for a fixed number of photons for a moving object with a general stochastic motion model. As mentioned before, in many biological applications, the motion of a small object in subcellular environments can be modeled by a linear stochastic differential equation. The solution of this linear stochastic differential equation can be modeled by a first order system with a Gaussian noise. In Corollary 5.6 to Theorem 5.2, we obtain recursive expressions for the Fisher information matrices for both image detection processes for a fixed time interval and fixed number of photons, in case that the dynamical system is described by a first order system with Gaussian process and measurement noise.
be an image detection process driven by a linear stochastic trajectory X with a Gaussian initial condition, the family of state matrix timefunctions Φ, the measurement matrix M , the Gaussian process noise W g , and the Gaussian measurement noise Z g for a time interval [t 0 , t] (or for a fixed number L of photons). Assume that the photon detection rate Λ, M and Z g are independent of θ. Let where P l−1 θ,l is obtained through Eqs. (3.10) and (3.11). Then, the Fisher information matrix I τ1,··· ,τ K in Eq. (5.2) (or Eq. (5.6)) of Theorem 5.2 can be calculated as where, for θ = (θ 1 , · · · , θ n ) ∈ Θ and l = 1, · · · , K, the i, j th , i, j = 1, · · · , n, entry θ,l C T , (5.14) with C := 0 2×2 M , where 0 2×2 denotes the 2 × 2 zero matrix.
Proof. See Section 6.10 in Appendix.
In Section 6.11, we provide an example to illustrate our results for calculating the Fisher information matrix for the specific case of a linear trajectory described in the example provided in Section 6.4.

CRLB and standard deviation of estimates for different photon counts.
We next evaluate the performance of the method in terms of the standard deviation of estimated parameters of the linear motion model for the sets of repeat trajectories. These data sets differ by the mean photon count which ranges from 250 to 1250. For a given data set, the time points of the detected photons are drawn from a Poisson process and are the same for the all trajectories. For these data sets, we calculated the maximum likelihood estimates of the diffusion and drift coefficients, separately. Also, for the given data set and time points, we obtained the CRLBs for the diffusion and drift coefficient by calculating the square root of the inverse of their corresponding Fisher information matrices at the fixed time points. It can be seen in Fig. 7(a) that the standard deviations of the estimates are close to the square roots of their corresponding CRLBs, and when the mean number of photons increases, the standard deviation of the estimates decreases. Also, the percentage differences between the standard deviations and the square roots of the CRLBs are shown in Fig. 7(b). The percentage difference is the difference between the standard deviation of the estimates and the square root of the corresponding CRLB, expressed as a percentage of the square root of the corresponding CRLB. As can be seen in Fig. 7(b), the differences between the standard deviations of the estimates and their respective CRLBs are at most around 10% of the limits of accuracy.

Fisher information matrix for non-Gaussian measurement noise.
So far, for computational purposes and taking advantage of the Kalman filter formulation, we have focused on computing the Fisher information matrix and CRLB only for Gaussian measurements. Although the Gaussian assumption is very useful in some applications, there are many cases for which this assumption can be problematic in practice due to the fact that the Gaussian model is often not a suitable approximation for an analytical image profile. As mentioned earlier, from optical diffraction theory, a typical point spread function for an in-focus molecule is given by the Airy profile. Also, for the out-of-focus scenario, the image function is given by a classical model of Born and Wolf [6]. Here, we computed the Fisher information matrix of both drift and diffusion coefficients for the Airy measurements case and compared the results with the Fisher information matrix obtained for the case that the Airy profile is approximated by a 2D Gaussian profile. The typical approximation of the Airy profile with α := 2πn a /λ by a 2D Gaussian profile with standard deviation σ yields a value of σ = 1.323/α [22]. We only focused on the one photon case, since computing the integrals of the Fisher information expression for the Airy profile case numerically requires a large number of samples and it is computationally expensive (see Section 6.12 in Appendix for the detailed computational procedure). As shown in Fig. 8, the difference between the Fisher information matrices of these two different profiles can be significant.

CRLB and Fisher information matrix for different sets of time points.
To examine further the CRLB on parameter estimation for a moving single molecule with a stochastic trajectory, we calculated the square root of the CRLB for the simulated trajectories with the same parameters as in Fig. 7, and different time points drawn from a Poisson process with the same mean value, which ranges from 250 to 1250. As can be seen in Fig. 9, especially for the drift coefficient, the square root of the CRLBs for different realizations of a Poisson process can vary significantly.
for G L ), p pr l := p X(T l )|T l ,D l−1 , l = 1, 2, · · · , denotes the predicted probability density function of the location of the object, and p pr1 x o |τ 1 , d 0 := p pr1 x o |τ 1 , in which we have used the assumption of Definition 2.3. T L ) , Φ, C, Θ ) be an image detection process derived by a stochastic trajectory X modeled by a linear system with the family of state matrix timefunctions Φ, the process noise W , and the measurement process Z for a time interval [t 0 , t] (or for a fixed number L = 1, 2, · · · , of photons). Let D k := (U k , T k ) , k = 0, 1, · · · , and

Proof of Theorem
, be the predicted probability density function of the location of the object, and p pr1 x|τ 1 , d 0 := p pr1 x|τ 1 .
1. Then, p pr l can be calculated through the following steps: Step 1. For l = 0, Eq. (3.3) becomes Then, by conditioning the both sides of the above equation on T 1 , the conditional probability density function p X(T1)|T1 is given by where x ∈ R 2 , and * denotes the convolution operator. Then, . . .
Step 2l. For l = 1, 2, · · · , let i.e., Since initial location of the object, observation noise, and process noise are mutually independent, according to Eq. (3.4) and Theorem 2.7 of [16], we have Therefore, by substituting Eq. (6.9) into Eq. (6.8) (note that we calculated p pr l in the previous step), Step 2l + 1. By conditioning the both sides of Eq. (3.3) on T l+1 and D l , we have , for l = 1, 2, · · · , which, according to the independency of W (T l , T l+1 ) and U l , T l−1 , becomes or equivalently (note that we calculated p f i l in the previous step), 2.2. By conditioning the both sides of Eq. (3.5) on T l and D l−1 , l = 1, 2, · · · , we have (note that Z g,l is independent of D l−1 ), and r i ∈ C denotes a running variable in the image space. Given the predicted density function p pr l of the location of the object by Eq. (3.9), according to Lemma 6.2 (see Section 6.14), we have
If the only unknown parameter is the diffusion coefficient D, i.e., θ = D, then, 6.5. Sequential Monte Carlo method. We approximate the predicted and filtered probability density functions as where p X(T l+1 )|X(T l ) is the conditional probability density function of X(T l+1 ), given X(T l ) (transition density), δ denotes the Dirac delta function, and the samplesx i l and their corresponding weights w l i , i = 1, · · · , N , are given through the following algorithm [31] 1. Draw initial samples x i 0 N i=1 according to p X(t0) (x 0 ), i.e., (6.20) and set l = 1.
2. Draw independent and identically distributed samples as, for f X(τ ) (r l ) = p Z(X(τ )) (r l ), τ ≥ t 0 , r l ∈ C, 4. Resample new particles x j l , j = 1, · · · , N , such that the probability of x j l =x i l is equal to w i l , i.e., P x j l =x i l = w i l , i = 1, · · · , N.
5. Increment l → l + 1 and return to step 2. 6.6. Estimation results for Airy measurements. Here, we analyze the error of the estimates and predicted locations of the molecule from simulated data sets with an Airy measurement noise, with the same standard deviation as the Born and Wolf and Gaussian data presented in Figs. 3-6, and obtained the similar results (see Figs. 11 and 12). We also show the differences between the means of the predicted distributions of the locations of the molecule and the true locations of the molecule in Fig. 14 (see Section 6.19). , D k := (U k , T k ) , k = 0, 1, · · · . Assume that the conditional probability density functions p θ U l |T l ,D l−1 , l = 1, 2, · · · , of U l , given T l and D l−1 , satisfy the following regularity conditions, for θ = (θ 1 , · · · , θ n ) ∈ Θ, (a) ∂p θ U l |T l ,D l−1 r l |τ l ,d l−1 ∂θi exists for i = 1, · · · , n, for G L , and p θ r 1 |τ 1 , d 0 := p θ r 1 |τ 1 .

Then, the Fisher information matrix I [t] (θ) of G [t] is given by
where d l ∈ C l × R L [t] , K = 1, 2, · · · , and L denotes the likelihood function. By substituting the expression of the likelihood function L [t] of G [t] (Eq. (4.1)) into Eq. (6.21), according to [33,34], we have 22) where P θ N (t) = 0 is the probability of N (t) = 0 and p θ [t] denotes the probability density function of D [t] and N (t).
6.13. Joint probability distribution of arrival time points for a Poisson process.
6.14. Product of multivariate Gaussian distributions.