Mathematical foundation of Capon’s method for planetary magnetic field analysis

Minimum variance distortionless projection, the so-called Capon method, serves as a powerful and robust data analysis tool when working on various kinds of ill-posed inverse problems. The method has not only successfully been applied to multi-point wave and turbulence studies in the context of space plasma physics, but also is currently being considered as a technique to perform the multipole expansion of planetary magnetic fields from a limited data set, such as Mercury’s magnetic field analysis. 5 The practical application and limits of the Capon method are discussed in a rigorous fashion by formulating its linear-algebraic derivation in view of planetary magnetic field studies. Furthermore, the optimization of Capon’s method by making use of diagonal loading is considered.


Introduction
Nonlinear and adaptive filter technique has a wider range of applications in geophysical and space science studies to find the 10 most likely parameter set describing the measurement data or to decompose the data into a set of signals and noise. Above all, the minimum variance distortionless projection introduced by Capon (1969) (hereafter, Capon's method) has successfully been applied to multi-point data analyses for the waves, turbulence fields, and current sheets (Motschmann et al., 1996;Glassmeier et al., 2001;Narita et al., 2003Narita et al., , 2013Contantinescu et al., 2006;Plaschke et al., 2008). The strength of Capon's method lies in the fact that the method performs a robust data fitting even when the spatial sampling or data amount is limited in the 15 measurement (e.g., successfully applied to four-spacecraft data (Motschmann et al., 1996)). Capon's method is currently being considered for planetary magnetic field studies in which the data (i.e., magnetic field samples) are more limited (e.g., non-ideal orbits), and has recently been applied to Mercury's magnetic field data in view of the BepiColombo mission (Toepfer et al., 2020). 20 From a theoretical point of view there exist several origins for the derivation of the method. The first derivation of Capon's method (Capon, 1969), constructed for the analysis of seismic waves, is based on the estimation of frequency-wavenumber spectra. Later on, this approach has been reformulated in terms of matrix algebra (Motschmann et al., 1996). In the light of mathematical statistics Capon's estimator can be regarded as a special case of the maximum likelihood estimator (Narita, 2019). In this work the linear-algebraic formulation of the method (Motschmann et al., 1996) with specific attention to the 25 magnetic field analysis is extended and the application of diagonal loading is discussed to improve the quality of data analysis with more justified applications and limits.

Motivation of Capon's method
The analysis of planetary magnetic fields is of great interest and one of the main tasks in space science. Here we pay special attention to the analysis of Mercury's internal magnetic field which is one of the primary goals of the BepiColombo mission 30 (Benkhoff et al., 2010). The magnetometer on board the Mercury Planetary Orbiter (MPO) (Glassmeier et al., 2010) measures the magnetic field vectors b i = b i x , b i y , b i z T ∈ R 3 at N data points x i , i = 1, ..., N along the orbit in the vicinity of Mercury.
The magnetic fields around Mercury are considered as a composition or superposition of internal fields generated by the dynamo process, crustal and induced fields, which are mainly dominated by dipole and quadrupole fields and external fields generated by the currents flowing in the magnetosphere. For Mercury the external fields contribute a significant amount to the 35 total magnetic field within the magnetosphere (Anderson et al., 2011) and therefore, a robust method is required for seperating the internal fields from the total measured field. Yet, each component has to be properly modeled and parameterized when decomposing the field.
For example, when only data in current-free regions are analyzed, the planetary magnetic field is irrotational and can be 40 parameterized via the Gauss representation (Gauss, 1839). Whithin these current-free regions, the internal magnetic field can be expressed as the gradient of a scalar potential Φ, which can be expanded into a set of basis functions. Considering the expansion into spherical harmonics, the potential for the planetary dipole and quadrupole fields results in where planetary centered coordinates with radius r i ∈ [R M , ∞), azimuth angle λ i ∈ [0, 2π] and polar angle θ i ∈ [0, π] are 45 chosen. R M indicates the radius of Mercury and P m l are the Schmidt-normalized associated Legendre polynomials of degree l and order m. Within this series expansion there occurs a set of expansion coefficients g m l and h m l , named internal Gauss coefficients. By constructing the Gauss coefficients in a vectorial fashion and defining the true coefficient vector as g = g 0 1 , g 1 1 , h 1 1 , g 0 2 , g 1 2 , h 1 2 , g 2 2 , h 2 2 T , the magnetic field vectors at every data point x i and the expansion coefficients are related where the terms of the series expansion are arranged in the matrix H i r i , θ i , λ i , called shape matrix. The vector v i describes the parts that are not parameterized by the underlying model, e.g. the external parts, closing the void between the parameterized field and the measurement noise of the sensors, which is symbolized by the vector n i . Especially, the measurement noise is 55 neither correlated with the parameterized part H i g nor with the non-parameterized part v i .
Summarizing the magnetic field measurements for all N data points into a vector B = b 1 T . . . b N T T ∈ R 3N , the field can be written as Within Eq. (4) the magnetic field vector B and the shape matrix H, given by the underlying model, are known. The coefficient vector g is to be determined by data fitting. Since in most applications the number of known magnetic data points is 65 much larger than the number of wanted expansion coefficients (G 3N ), H is a rectangular matrix in general. Furthermore, the non-parameterized parts of the field and the noise are unknown. Therefore, the direct inversion of Eq. (4) is impossible and g has to be estimated. In this case, Capon's method establishes a robust and useful tool to find the estimated solution for the expansion coefficients in Eq. (4). Since the method does not require the orthogonality of the basis functions, it has a wider range of applications when decomposing the measured data into a set of superposed signals, especially when the number of 70 data points is limited.

Derivation of Capon's method
The following derivation of Capon's method is based on the linear-algebraic formulation (Motschmann et al., 1996) which was formerly applied to the analysis of plasma waves in the terrestrial magnetosphere. Now we are focussing on the analysis of planetary magnetic fields.

75
As illustrated in the previous section, the magnetic field b i measured at data point x i in the vicinity of Mercury and the wanted expansion coefficients g are related via or in a compact form for all N data points where the shape matrix H describes the underlying model.
For every data point x i the noise vector n i is assumed to be Gaussian with variance σ n and zero mean, so that n i = 0 and n i • n i = σ 2 n I, where I is the identity matrix. The angular brackets indicate averaging over an ensemble, e.g., different samples, realizations or measurements. Therefore, b i = H i g + v i holds and equivalently since the model H and the true coefficient vector g are not affected by the averaging.
Because H is not always a square matrix but is in general a rectangular matrix with different sizes between rows and columns, the direct inversion of Eq. (7) is not guaranteed. Let us ignore for the moment the non-existence of H −1 and write 90 down the equation Despite its simplicity it is obviously incorrect. As H −1 does not exist, let us look for another matrix w called filter matrix, which follows the structure of this equation and fulfills in principle the resolution of Eq. (7) with respect to g. Capon's method is just the procedure to construct the filter matrix w and to calculate or better to say estimate g. To do so, some helpful quan-95 tities are introduced. In order to distinguish between the true coefficient vector g and the estimated solution, in the following Capon's estimator will be symbolized by g C .
For the inversion of Eq. (4) the data covariance matrix M and the coefficient matrix P are introduced as follows: Here Q indicates the number of measurements and the circle '•' symbolizes the outer product, which is defined by for any pair of vectors x ∈ R n , y ∈ R m . The dagger † indicates the Hermitian conjugate and the dot stands for the multiplication of the matrices x ∈ R n×1 and y † ∈ R 1×m . Therefore, the diagonal of the matrix P contains the quadratic averaged 105 components of the wanted estimator. It is important to note here that the covariance matrix M must be statistically averaged, otherwise the matrix is singular with a vanishing determinant and the further analysis cannot be achieved.
If the non-invertibility of the matrix B α • B α is neglected, for every measurement α = 1, . . . , Q an estimator g α C for the true coefficient vector g can be determined, so that is valid. Thereby, each estimator deviates from the true coefficient vector by an error vector ε α = g − g α C . Note that because of the non-invertibility of the matrix B α • B α the single estimator g α C cannot be calculated. Since the invertibility is solely given by the averaging over Q measurements, only the averaged estimator 115 with its related error is available.
In contrast to the estimator, the true coefficient vector is a theoretical given vector, that is not affected by the averaging 120 (g ≡ g , Eq. (7)). This property directly links the estimator to the true coefficient vector, which can be rewritten as Averaging over Q measurements and using g ≡ g results in and analogously for the second order moments.
For the further evaluation of Eq. (4) the definition of the outer product (Eq. 11) is utilized. Matrix multiplication of Eq. (4) with its Hermitian adjoint and averaging yields 135 and therefore assuming, that n is Gaussian with variance σ n and zero mean ( n = 0). By means of the limit ε → 0 in Eq. (19) the unknown matrix g • g and the true coefficient vector g can be approximated by Capon's estimator, resulting in 140 Taking into account that n • n = σ 2 n I and using the above defined abbreviations, Eq. (22) can be rewritten as Since Eq. (23) cannot be directly solved for P, the goal is to find the best estimator g C for g, so that P = g C • g C is obtained as an approximate solution of Eq. (23). Therefore, a filter matrix w is constructed, that separates the parameterized field from the noise by projecting the measured data onto the parameter space and simultanously truncates the non-parameterized parts, i.e.
Applying the filter matrix to the average of the non-parameterized parts of the field in Eq. (4)

150
where the true coefficient vector has been replaced by Capon's estimator (Eq. 16) and taking into account that n = 0 leads to the distortionless constraint This equation is one of the important constraints for the construction of the wanted filter matrix w but it is not enough.
Let us look for another criterion. The basic idea is that in Eq. (4) there may be contributions v in the data B which are not 155 caused by the internal magnetic field and thus, these contributions are not modeled by H g. Although the filter matrix w is already designed to truncate these parts, i.e. w † v = 0, their contributions to the data are unknown. This yields the following procedure.
Conferring to Eq. (24) the average output power, which is defined as the sum of the quadratic averaged components of the estimator, can be rewritten as (Pillai , 1989) 160 where tr g C • g C is the trace of the matrix g C • g C .
Using Eq. (28), the coefficient matrix P can be expressed by the weight w and the data covariance M as 165 Since the amount of the data's noise and the amount of the non-parameterized parts are unknown, we assume here, that a large part of the data are influenced by the noise and the non-parameterized parts, that have to be truncated by the matrix w and the underlying model keeps the minimal contribution to the data. This minimal contribution has to be extracted.
Therefore, the output power P = tr P has to be minimized with respect to w † , subject to the distortionless constraint w † H = 170 I or equivalently H † w = I . Using the Lagrange multiplier method this minimization problem can be formulated as where Λ are the related Lagrange multipliers and the minimum is taken with respect to w. Since the components w ij and w † ij of the matrix w and w † respectively, are independent of each other, Eq. (30) can be expanded as 175 or equivalently with related additional Lagrange multipliers Γ. Taking the derivatives with respect to w ki and w † ij results in and and therefore

190
Because P = tr P is a convex function, Λ and Γ are realizing the minimal output power.
Due to the ensemble averaging the matrix M is invertible and M −1 exists. Multiplying Eq. (36) with H † M −1 and again considering the distortionless constraint yields

195
By means of Eq. (34) the filter matrix results in and therefore Capon's estimator is given by Regarding the expensive derivation, this compact formular for Capon's estimator is surprising. The same expression can be 200 derived by treating Capon's method as a special case of the maximum likelihood estimator (Narita, 2019).
Since Capon's method has formerly been applied to the analysis of waves, the existing derivations treat the non-parameterized parts of the field as Gaussian noise. Considering the analysis of planetary magnetic fields this assumption is indefensible. For example, the external parts of the field are systematic noise and cannot be modeled by a Gaussian distribution. Therefore, the 205 above presented derivation generalizes the previous derivations of Capon's method.

Diagonal Loading
The filter matrix w is the key parameter to distinguish between the parameterized and the non-parameterized parts of the field.
Conferring to Eq. (23) the ratio of these parts defines the input signal-noise-ratio The filter matrix is applied to the disturbed data for estimating the output power, that is related to Capon's estimator. Thus, the output signal-noise-ratio can be expressed as (Haykin, 2014;Van Trees, 2002) since the filter fulfills the distortionless constraint w † H = I and truncates the non-parameterized parts of the field, i.e. w † v = 0. The ratio of the output and the input signal-noise-ratio is the so-called array gain (Van Trees, 2002) which is controlled by tr w † w .
The input signal-noise-ratio is given by the data and the underlying model and therefore, SN R i = const. in Eq. (44). In contrast to the input signal-noise-ratio, the output signal-noise-ratio depends on the filtering. When tr w † w is large, the where G again indicates the number of wanted expansion coefficients and I ∈ R G×G is the identity matrix, so that holds. Thus, Eq. (45) can be rewritten as It should be noted that the matrix T can be chosen arbitrarily as long as it is independent of w and w † .
Conferring to the previous section and considering the additional quadratic constraint, the filter matrix can be calculated by with respect to w, where σ 2 d is the related additional Lagrange multiplier. Carrying out the same procedure as described in section 3, the constrained minimizer is given by The comparison of Eq. (40) with Eq. (50) shows that the quadratic constraint results in the addition of the constant value σ 2 d to 240 the diagonal of the data covariance matrix M, which is known as diagonal loading (Van Trees, 2002). Consequently, the filter matrix is designed for a higher Gaussian background noise than is actually present (Van Trees, 2002).
In Figure 1 tr w † w is displayed with respect to σ = σ 2 d + σ 2 n . For σ d → 0, tr w † w is large resulting in a small output signal-noise ratio, which can cause signal elimination. For increasing values of σ d , tr w † w decreases and for σ d → ∞ 245 Capon's filter converges to the least square fit filter or equivalently that treats all data equally. Figure 1. Sketch of the trace of the filter matrix with respect to σ = σ 2 d + σ 2 n . For increasing values of the diagonal loading parameter σ d the output signal-noise ratio increases and Capon's filter converges to the least square fit filter, if the diagonal loading parameter is large.
Since tr w † w is a monotonically decreasing function, σ d → ∞ might be the best choice for the diagonal loading parameter.
To check this expectation we have to take a look at the output power P d for the synthetic increased noise, which can be estimated by replacing M → M + σ 2 d I in Eq. (39), resulting in (Pajovic, 2019;Richmond et al., 2005) Thus, P d is an increasing function of σ d . Since the output power has to be minimized one would expect that σ d = 0 is the best choice, which is in contradiction to the argumentation above. Therefore, the maximization of the array gain is not equivalent to the minimization of the output power (Van Trees, 2002). Since σ d cannot be directly calculated within the minimization procedure (Eq. 49), it is not clear how to choose the optimal diagonal loading parameter σ opt. , that lies somewhere between those extrema.

260
In the literature there exist several methods for determining the optimal diagonal loading parameter (Pajovic, 2019). In contrast to the analysis of waves, we favor measurements that are stationary up to the Gaussian noise for the analysis of planetary magnetic fields. Comparing the measurement times with planetary geology time scales this assumption is surely valid for the internal magnetic field. For the external parts of the field this can be realized by choosing data sets of preferred situations, 265 e.g. calm solar wind conditions. By virtue of the stationarity, the data covariance matrix M = B • B + σ 2 n I contains only one non-trivial eigenvalue λ 1 = B 2 + σ 2 n and λ i = σ 2 n , for i = 2, . . . , 3N elsewhere. Therefore, estimators for the diagonal loading parameter, that are related to eigenvalues corresponding to interference and noise (Carlson, 1988) or estimators taking into account the standard deviation of the diagonal elements of the data covariance matrix (Ma and Goh, 2003) cannot be applied. When simulated data are analyzed the deviation between Capon's estimator and the true coefficient vector, implemented 270 in the simulation, is a useful metric for estimating the optimal diagonal loading parameter (Pajovic, 2019;Toepfer et al., 2020).
But when the method is applied to real spacecraft data no true coefficient vector is available and therefore, another estimation method for the diagonal loading parameter, that solely depends on the data and the underlying model is required.
The additional quadratic constraint (Eq. 45), resulting in diagonal loading, bounds the trace of the filter matrix, that is the 275 solution of the minimization procedure (Eq. 49). To prevent signal elimination, tr w † w and P d have to be minimized simultanously. Since tr w † w is decreasing for higher values of σ d (cf. Figure 1), P d is an increasing function and thus, they act as competitors. At the optimal value σ opt. the two competitors compromise. Therefore, the diagonal loading of the data covariance matrix is equivalent to the Tikhonov regularization for ill-posed problems and the optimal diagonal loading parameter can be estimated analogously by the method of the L-curve (Hiemstra et al. (2002)). The L-curve arises by plotting lg tr w † w 280 versus lg tr w † M + σ 2 d I w for different values of σ d and is displayed in Figure 2. The optimal value σ opt. is located in the vicinity of the L-curve's knee, which is defined by the maximum curvature of the L-curve (Hiemstra et al., 2002). When Capon's method is applied to the reconstruction of Mercury's internal magnetic field (Toepfer et al., 2020), several 285 computationally burdensome matrix inversions are necessary to calculate the estimator Thus, for the practical application of Capon's method it is useful to rewrite the method in terms of the least square fit method.
Capon's estimator inherits the matrix operation structure of the least square fit estimator, which is given by (Haykin, 2014) 290 Substituting H → M −1/2 H and B → M −1/2 B in Eq. (55), the least square fit estimator may be converted into Capon's estimator.
The least square fit method minimizes the deviaton between the disturbed measurements B and the underlying model H g, 295 measured in the Euclidian norm ||.|| 2 , so that Conferring to the above mentioned substitutions, Capon's method can be interpreted as measuring the deviation in Eq. (56) in a different norm 300 so that Capon's estimator is given by Thus, Capon's method can be regarded as a special case of the least square fit method or more precisely of the weighted least square fit method, where the data and the model are measured and weighted with the inverse data covariance matrix.
This property is useful for the practical application of Capon's method. In contrast to the computationally burdensome matrix 305 inversions in Eq. (41), the usage of minimization algorithms, such as gradient descent or conjugate gradient method, for solving Eq. (58) is more stable and computationally inexpensive.
To illustrate the above presented mathematical foundations, Capon's method is applied to simulated magnetic field data in order to reconstruct Mercury's internal magnetic field. As a proof of concept, the underlying model is restricted to the internal dipole and quadrupole contributions to the magnetic field as discussed in section 2.

310
For the reconstruction of Mercury's internal dipole and quadrupole field the internal Gauss coefficients g 0 1 = −190 nT and g 0 2 = −78 nT (Anderson et al., 2012;Wardinski et al., 2019), defining the non-vanishing components of the true coefficient vector g are implemented in the hybrid code AIKEF (Müller et al., 2011) and the magnetic field data resulting from the plasma interaction of Mercury with the solar wind are simulated. The data are evaluated along selected parts of the prospective 315 trajectories of the BepiColombo mission on the night side of Mercury within a distance of 0.2 R M up to 0.4 R M from Mercury's surface. Since simulated data are analyzed, the deviation between the true coefficient vector g and Capon's estimator g C can be used as a metric to verify the estimation of the optimal diagonal loading parameter by making use of the L-curve technique.
The optimal diagonal loading parameter results in σ opt. ≈ 800 nT which corresponds with the vicinity of the L-curve's knee.
The reconstructed Gauss coefficients for the internal dipole and quadrupole field are presented in Table 1. The deviation g C − g between Capon's estimator and the true coefficient vector results in 30.2 nT or 14.7 %, respectively, for the optimal diagonal loading parameter. When the magnetic field data are evaluated at an ensemble of data points with a distance of 0.2 R M from Mercury's surface this deviation is of the same order (Toepfer et al., 2020). Extending the underlying model by a parameterization of the external parts of the magnetic field improves Capon's estimator especially when the magnetic field data are evaluated in some distance above Mercury's surface.
Capon's method is a robust and useful tool for various kinds of ill-posed inverse problems, such as Mercury's planetary magnetic field analysis. The derivation of the method can be regarded from different mathematical perspectives. Here we revisited the linear-algebraic matrix formulation of the method and extended the derivation for Mercury's magnetic field analysis.
Capon's method becomes even more robust by incorporating the diagonal loading technique. Thereby, the construction of a 330 filter matrix is vital to the derivation of Capon's estimator.
Especially the trace of the filter matrix determines the array gain, which is defined as the ratio of the output and the input signal-noise-ratio. If the trace is large, the output signal-noise-ratio can decrease resulting in signal elimination and thus, the performance of Capon's estimator degrades. Bounding the trace of the filter matrix results in diagonal loading of the data 335 covariance matrix, which improves the robustness of the method. The main problem of the diagonal loading technique is that in general it is not clear how to choose the optimal diagonal loading parameter. Since for the analysis of planetary magnetic fields we prefer measurements that are stationary up to the Gaussian noise, estimators for the diagonal loading parameter, that are related to eigenvalues of the data covariance matrix corresponding to interference and noise cannot be applied. Making use of the L-curve's technique enables a robust procedure for estimating the optimal diagonal loading parameter.

340
For the calculation of Capon's estimator several computationally burdensome matrix inversions are necessary. Interpretation of Capon's method as a special case of the least square fit method enables the usage of numerically more stable and less burden minimization algorithms, e.g. gradient descent or conjugate gradient method, for calculating Capon's estimator.

345
It should be noted that the parameterization of Mercury's internal magnetic field via the Gauss representation, as mentioned in section 2, is only one of several possibilities for modeling the magnetic field in the vicinity of Mercury. The underlying model can be extended by other parameterizations, for example the Mie representation (toroidal-poloidal decomposition) or magnetospheric models and Capon's method can be applied to estimate the related model coefficients. Besides the analysis of Mercury's internal magnetic field, the extention of the model also enables the reconstruction of current systems flowing in 350 the magnetosphere. Concerning the BepiColombo mission this work establishes a mathematical basis for the application of Capon's method to analyze Mercury's internal magnetic field in a robust and manageable way.
Data availability. The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions. All authors contributed conception and design of the study; ST, YN and UM wrote the first draft of the manuscript; All authors contributed to manuscript revision, read and approved the submitted version.