Frequency Estimation of Irregularly Sampled Data Using a Sparsity Constrained Weighted Least-Squares Approach

In this paper, a new method for frequency estimation of irregularly sampled data is proposed. In contrast to previous sparsity-based methods, in which the sparsity constraint is applied to a least-squares fitting problem, the proposed method is based on a sparsity-constrained weighted least-squares problem. The resulting problem is solved iteratively, so that the solution obtained at each iteration can be used to determine the weights of the least-squares fitting term at the next iteration. This weighting of the least-squares fitting term enhances the performance of the proposed method. Simulation results verify that the proposed method can detect the spectral peaks using a very short data record. Compared to the previous method, it is less likely to miss the actual spectral peaks or to exhibit spurious peaks.

Keywords: basis pursuit; sparse representation; overcomplete dictionaries; irregular sampling; spectrum estimation


I. INTRODUCTION
Spectral analysis of irregularly sampled data is ubiquitous in science and technology. The range of applications for this branch of signal processing includes, among others, the estimation of Doppler frequency in radars [1], astronomical data analysis [2], the analysis of heart rate variability in medical sciences [3], and turbulent velocity analysis in measurement systems [4].

Let us denote the vector of $K$ data samples taken at the irregular sampling instants $t_1, t_2, \ldots, t_K$ by

$$\mathbf{y}=[\,y(t_1),\,y(t_2),\,\ldots,\,y(t_K)\,]^T. \qquad (1)$$

The frequency grid on which the spectrum is to be computed is denoted by

$$\mathbf{f}=[\,f_1,\,f_2,\,\ldots,\,f_M\,]^T, \quad |f_m|\le f_{\max}, \;\; m=1,\ldots,M, \qquad (2)$$

for a predefined maximum frequency $f_{\max}$. Also, let us define an overcomplete dictionary of complex exponentials, which is the $K\times M$ matrix

$$\mathbf{W}=\exp\!\left(j2\pi\,\mathbf{t}\,\mathbf{f}^T\right), \qquad (3)$$

where $\mathbf{t}=[\,t_1,\,t_2,\,\ldots,\,t_K\,]^T$ and $\exp(\mathbf{A})$ for a matrix $\mathbf{A}$ is defined as a matrix of the same size as $\mathbf{A}$ whose elements are the exponentials of the corresponding elements of $\mathbf{A}$. From (3) it can be inferred that the $m$th column of $\mathbf{W}$ is a complex exponential of frequency $f_m$ sampled at the sampling instants $\mathbf{t}$.
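As an illustration, the following minimal Python sketch (NumPy assumed; the helper name build_dictionary and the variable names are ours) builds such a dictionary for arbitrary sampling instants and an arbitrary frequency grid:

import numpy as np

def build_dictionary(t, f):
    """Complex-exponential dictionary W = exp(j*2*pi * t * f^T), eq. (3).

    t : (K,) array of irregular sampling instants (seconds)
    f : (M,) array of grid frequencies (Hz)
    Returns a (K, M) complex matrix whose m-th column is the complex
    exponential of frequency f[m] sampled at the instants t.
    """
    return np.exp(1j * 2 * np.pi * np.outer(t, f))

# Example: K = 24 samples on an irregular pattern, grid from -10 Hz to 10 Hz in 0.05 Hz steps
rng = np.random.default_rng(0)
t = np.cumsum(rng.exponential(scale=1.0, size=24))
f = np.arange(-10.0, 10.0 + 0.05, 0.05)
W = build_dictionary(t, f)      # shape (24, 401)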
When the data set is known to be sinusoidal (which is the case in most applications), one of the most powerful techniques of spectrum estimation is to represent the data set $\mathbf{y}$ as a linear combination of the columns of $\mathbf{W}$, i.e., to obtain a vector $\hat{\mathbf{s}}$ so that $\mathbf{y}=\mathbf{W}\hat{\mathbf{s}}$. If so, the nonzero elements of $\hat{\mathbf{s}}$ indicate the amplitudes of the spectral components at the corresponding frequencies in $\mathbf{f}$.

The redundancy of $\mathbf{W}$ gives rise to infinitely many solutions $\hat{\mathbf{s}}$, which opens the way to enforcing additional constraints on the solution. The simplest of these are the least-squares fitting methods, e.g., the Lomb-Scargle periodogram [5]. More sophisticated methods penalize the least-squares fitting term with a sparsity-enforcing term [6-8].

In [2, 9, 10], the sparsity constraint is formulated as the $\ell_1$ norm of the solution. This approach is widely known in the literature as Basis Pursuit [11]. These methods are capable of estimating the spectral lines from very limited data records and allow super-resolution frequency estimation [9, 10]. In [9], the dictionary $\mathbf{W}$ is defined as a collection of cisoids instead of complex exponentials. It is shown in [2, 10] that sparse approximation in the complex exponential dictionary outperforms the method of [9]. In this paper, we focus on the method of [10], which is the most recent among the above methods.
The sparse approximation of $\mathbf{y}$ in $\mathbf{W}$ is the solution to the following optimization problem [12]:

$$\hat{\mathbf{s}}=\arg\min_{\mathbf{s}}\;\|\mathbf{y}-\mathbf{W}\mathbf{s}\|_2^2+\lambda\|\mathbf{s}\|_1, \qquad (4)$$

where $\lambda$ is called the regularization parameter and $\|\mathbf{s}\|_1$ is the $\ell_1$ norm of $\mathbf{s}$ defined as

$$\|\mathbf{s}\|_1=\sum_{m=1}^{M}|s_m|. \qquad (5)$$

The cost function in (4) is the sum of two terms. The first enforces the least-squares fit of the solution to the data set, and the second promotes the sparsity of the solution. This problem can be solved in polynomial time using a variety of methods [10]. It has been shown in [10] that if $\hat{\mathbf{s}}$ is a solution to (4) and $\|\hat{\mathbf{s}}\|_0$ (the number of nonzero elements in $\hat{\mathbf{s}}$) is smaller than half the spark of $\mathbf{W}$, then it is the unique sparsest solution.

Roughly speaking, the minimizer of (4) is a sparse vector that fits the data set in the least-squares sense. $\hat{s}_m$ is thus an estimate of the amplitude of the signal component at frequency $f_m$. Problem (4) is known in the literature as Basis Pursuit De-Noising (BPDN) [10, 11].
In this paper, instead of the sparsity-constrained least-squares problem of (4), another cost function is proposed which can be interpreted as a sparsity-constrained weighted least-squares problem and is solved iteratively. In order to refine the fit to the data set, at each iteration the temporary solution obtained so far is utilized to weight the least-squares term of the next iteration. Since the proposed sparsity-based method solves a weighted least-squares problem in an iterative manner, it falls into the class of methods known as the Iterative Adaptive Approach (IAA) [13-15]. This paper is organized as follows: in Section II, the proposed method is described in detail; in Section III, the proposed method is compared to the BPDN method of [10] formulated in (4) using simulation results; the paper is concluded in Section IV.

II. THE PROPOSED METHOD

A. Problem Formulation
Problem (4) can be rewritten in the following component-wise form, in which the $m$th component is updated while the other components are held fixed:

$$\hat{s}_m=\arg\min_{s_m}\;\Big\|\mathbf{y}-\sum_{r\neq m}\hat{s}_r\mathbf{w}_r-s_m\mathbf{w}_m\Big\|_2^2+\lambda|s_m|. \qquad (6)$$

The Iterative Coordinate Descent (ICD) method described in [10] can be used to solve this scalar problem considering the following equivalence:

$$\hat{s}_m=\mathrm{soft}\!\left(\frac{\mathbf{w}_m^H\mathbf{e}_m}{\|\mathbf{w}_m\|_2^2},\;\frac{\lambda}{2\|\mathbf{w}_m\|_2^2}\right), \qquad (7)$$

where $\mathbf{w}_m$ is the $m$th column of $\mathbf{W}$, $\mathbf{e}_m$ is defined as

$$\mathbf{e}_m=\mathbf{y}-\sum_{r\neq m}\hat{s}_r\mathbf{w}_r, \qquad (8)$$

and for any complex number $x=|x|e^{j\theta}$ the complex soft-shrinkage function is defined as [10]

$$\mathrm{soft}(x,\alpha)=\max(|x|-\alpha,\,0)\,e^{j\theta}. \qquad (9)$$

It is necessary to mention again that the cost function in (4) and (6) is the sum of two terms: a least-squares data-fitting term $\|\mathbf{y}-\mathbf{W}\mathbf{s}\|_2^2$ and a sparsity-enforcing $\ell_1$ norm term $\|\mathbf{s}\|_1$. The value of the regularization parameter $\lambda$ determines the relative emphasis on each of these two terms. A large $\lambda$ should be interpreted as a tendency to obtain a very sparse solution, while with a small $\lambda$ the sparsity of the solution is sacrificed in favor of its fidelity to the data set. The regularization parameter therefore makes a trade-off between the two important properties of sparsity and fidelity to the data set. The level of this trade-off improves as the number of data samples $K$ increases. When $K$ is large enough, the solution to (6) can be both sparse and well fitted to the data set. The question is: if $K$ is limited, is it possible to somehow improve the trade-off level? One possible answer is to be "parsimonious" in spending the resources on the minimization of the cost function by "wise" fitting to the data set, i.e., by using a weighted least-squares fit to the data set instead of the simple least-squares fit.
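For reference, one ICD sweep of (6)-(9) for the unweighted BPDN problem (4) can be sketched in Python as follows; the function and variable names are our own and this is only a minimal illustration, not the exact implementation of [10]. This is the baseline that the weighted scheme developed below modifies.

import numpy as np

def soft(x, alpha):
    """Complex soft-shrinkage, eq. (9): shrink the magnitude of x by alpha."""
    mag = np.abs(x)
    return np.where(mag > alpha, (1.0 - alpha / np.maximum(mag, 1e-300)) * x, 0.0)

def icd_sweep(y, W, s, lam):
    """One Iterative Coordinate Descent sweep over all components of s (complex array).

    Each coordinate is updated by (7)-(8) with the other coordinates fixed;
    the residual e = y - W s is maintained incrementally.
    """
    e = y - W @ s                      # full residual
    for m in range(W.shape[1]):
        w_m = W[:, m]
        e_m = e + s[m] * w_m           # residual excluding the m-th component, eq. (8)
        z = np.vdot(w_m, e_m) / np.vdot(w_m, w_m).real
        s[m] = soft(z, lam / (2.0 * np.vdot(w_m, w_m).real))
        e = e_m - s[m] * w_m           # restore the full residual with the updated s[m]
    return s

In practice the sweep is repeated until the change in s becomes negligible.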
Let us temporarily assume that, for a specific $m$ ($1\le m\le M$), the solution $\hat{s}_r$ to the optimization problem is available for all $r\neq m$. Inspired by [13], we define the covariance matrix for the $m$th component of $\hat{\mathbf{s}}$ as

$$\mathbf{Q}'_m=\sum_{r\neq m}|\hat{s}_r|^2\,\mathbf{a}_r\mathbf{a}_r^H, \qquad (10)$$

where $\mathbf{a}_r$ is defined as

$$\mathbf{a}_r=\big[\,e^{j2\pi f_r t_1},\,e^{j2\pi f_r t_2},\,\ldots,\,e^{j2\pi f_r t_K}\,\big]^T, \qquad (11)$$

i.e., the $r$th column of $\mathbf{W}$. For any nonzero vector $\mathbf{u}\in\mathbb{C}^K$ we have

$$\mathbf{u}^H\mathbf{Q}'_m\mathbf{u}=\sum_{r\neq m}|\hat{s}_r|^2\,\big|\mathbf{a}_r^H\mathbf{u}\big|^2\ge 0. \qquad (12)$$

Thus $\mathbf{Q}'_m$ is nonnegative definite. Notice that, according to (10), $\mathbf{Q}'_m$ is a sum of rank-one matrices, and the number of nonzero terms in this sum depends on the number of nonzero elements of $\hat{\mathbf{s}}$ (and on whether $\hat{s}_m=0$ or $\hat{s}_m\neq 0$). Therefore, it may or may not be of full rank. In particular, since $\hat{\mathbf{s}}$ is expected to be sparse, a rank-deficient $\mathbf{Q}'_m$ is likely. The following is a revised definition of the covariance matrix which has full rank:

$$\mathbf{Q}_m=\mathbf{Q}'_m+\varepsilon\mathbf{I}, \qquad (13)$$

where $\varepsilon$ is a small constant and $\mathbf{I}$ is the $K\times K$ identity matrix. Since $\mathbf{Q}_m$ is nonnegative definite and full-rank, it is Positive Definite (PD) and of course invertible. The optimization problem (6) is then replaced by the following $\ell_1$-regularized weighted least-squares problem, whose solution is used to update the $m$th component:

$$\min_{\mathbf{s}}\;\big(\mathbf{y}-\mathbf{W}\mathbf{s}\big)^H\mathbf{Q}_m^{-1}\big(\mathbf{y}-\mathbf{W}\mathbf{s}\big)+\lambda\|\mathbf{s}\|_1. \qquad (14)$$

Problem (14) obviously cannot be solved directly, since one has to know its solution in order to compute $\mathbf{Q}_m$ using (13).
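A minimal NumPy sketch of (10)-(13), with hypothetical function and variable names, could look as follows:

import numpy as np

def component_covariance(W, s_hat, m, eps):
    """Covariance matrix Q_m of (13) for the m-th component.

    Builds Q'_m = sum_{r != m} |s_hat[r]|^2 a_r a_r^H as in (10), with a_r the
    r-th column of W, and adds eps*I to guarantee positive definiteness.
    """
    K = W.shape[0]
    p = np.abs(s_hat) ** 2
    p[m] = 0.0                                  # exclude the m-th component
    Q_prime = (W * p) @ W.conj().T              # sum_r p[r] * a_r a_r^H
    return Q_prime + eps * np.eye(K)

For any eps > 0 the returned matrix is Hermitian positive definite, so its inverse and Cholesky factor, needed in the next subsection, always exist.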

B. How to Solve It?
As was shown in Part II.A, the matrix $\mathbf{Q}_m$ is PD, which means that $\mathbf{Q}_m^{-1}$ is also PD. Therefore, there is a unique Cholesky factor $\mathbf{R}_m$ such that

$$\mathbf{Q}_m^{-1}=\mathbf{R}_m^H\mathbf{R}_m. \qquad (15)$$

Substituting (15) into (14) yields

$$\min_{\mathbf{s}}\;\|\mathbf{y}'_m-\mathbf{W}'_m\mathbf{s}\|_2^2+\lambda\|\mathbf{s}\|_1, \qquad (16)$$

where $\mathbf{y}'_m$ and $\mathbf{W}'_m$ are defined as

$$\mathbf{y}'_m=\mathbf{R}_m\mathbf{y}, \qquad \mathbf{W}'_m=\mathbf{R}_m\mathbf{W}. \qquad (17)$$

Problem (16) can be solved using the ICD method (7)-(9), assuming that $\hat{s}_r$ is known for all $r\neq m$, so that $\mathbf{Q}_m$, $\mathbf{R}_m$, and therefore $\mathbf{y}'_m$ and $\mathbf{W}'_m$ can be computed. To cope with the unavailability of $\hat{s}_r$, an iterative scheme can be used: starting from an initial guess $\hat{\mathbf{s}}^{(0)}$, at iteration $i$ the estimate $\hat{\mathbf{s}}^{(i-1)}$ obtained at the previous iteration is used to compute $\mathbf{Q}_m^{(i)}$, $\mathbf{R}_m^{(i)}$, $\mathbf{y}'_m$, and $\mathbf{W}'_m$ for $m=1,\ldots,M$, and the new estimate $\hat{\mathbf{s}}^{(i)}$ is then obtained by applying (7)-(9) to (16).
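The following Python sketch (again with our own, hypothetical names; NumPy assumed) shows one iteration of this scheme: for each m, the weighted problem (14) is converted to the ordinary form (16) through the Cholesky factor of $\mathbf{Q}_m^{-1}$ and the m-th coefficient is updated by the soft-shrinkage rule (7)-(9). Note that, for a single-coordinate update, only the m-th transformed column and the transformed residual are needed, so the full product $\mathbf{R}_m\mathbf{W}$ of (17) is not formed explicitly here.

import numpy as np

def soft(x, alpha):
    """Scalar variant of the complex soft-shrinkage function, eq. (9)."""
    mag = np.abs(x)
    return x * (1.0 - alpha / mag) if mag > alpha else 0.0 * x

def weighted_iteration(y, W, s_prev, lam, eps):
    """One iteration of the proposed weighted scheme, given s_prev = s^(i-1) (complex array)."""
    K, M = W.shape
    s_new = s_prev.copy()
    power = np.abs(s_prev) ** 2
    Q_full = (W * power) @ W.conj().T + eps * np.eye(K)       # sum over all r, plus eps*I
    for m in range(M):
        w_m = W[:, m]
        Q_m = Q_full - power[m] * np.outer(w_m, w_m.conj())   # exclude the m-th term, eqs. (10), (13)
        L = np.linalg.cholesky(Q_m)                           # Q_m = L L^H, so R_m = inv(L) satisfies (15)
        y_p = np.linalg.solve(L, y)                           # y'_m = R_m y, eq. (17)
        w_p = np.linalg.solve(L, w_m)                         # m-th column of W'_m = R_m W
        e_p = y_p - np.linalg.solve(L, W @ s_new - s_new[m] * w_m)   # transformed residual excluding s_m
        z = np.vdot(w_p, e_p) / np.vdot(w_p, w_p).real
        s_new[m] = soft(z, lam / (2.0 * np.vdot(w_p, w_p).real))
    return s_new

Running I such iterations, starting for instance from the BPDN solution of (4), reproduces the scheme summarized in Table I; the per-iteration cost is dominated by the K x K factorizations and solves, in line with the complexity analysis of Section II.C.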
The proposed algorithm is summarized in Table I.

TABLE I. SUMMARY OF THE PROPOSED ALGORITHM

Input: data vector $\mathbf{y}$, dictionary $\mathbf{W}$, regularization parameter $\lambda$, parameter $\varepsilon$, initial guess $\hat{\mathbf{s}}^{(0)}$.
For $i=1,2,\ldots,I$:
1. Compute $\mathbf{Q}_m^{(i)}$, $m=1,\ldots,M$, from $\hat{\mathbf{s}}^{(i-1)}$ using (10), (11), and (13).
2. Compute $(\mathbf{Q}_m^{(i)})^{-1}$ and its Cholesky factor $\mathbf{R}_m^{(i)}$ as in (15).
3. Compute $\mathbf{y}'_m=\mathbf{R}_m^{(i)}\mathbf{y}$ and $\mathbf{W}'_m=\mathbf{R}_m^{(i)}\mathbf{W}$ as in (17).
4. Compute $\hat{s}_m^{(i)}$, $m=1,\ldots,M$, using the ICD method (7)-(9) applied to (16).
5. If $i<I$, set $i\leftarrow i+1$ and return to Step 1; otherwise output $\hat{\mathbf{s}}=\hat{\mathbf{s}}^{(I)}$.
It is noteworthy that an excellent initial guess to start the iterations is to use the solution of (4).

C. Computational Complexity
We now compute the complexity order of one iteration of the proposed algorithm summarized in Table I. The computation of the outer products $\mathbf{a}_r\mathbf{a}_r^H$ can be neglected, since it is performed only once at the beginning and does not vary during the whole process. Also, let us define the matrix

$$\mathbf{Q}^{(i)}=\sum_{r=1}^{M}\big|\hat{s}_r^{(i-1)}\big|^2\,\mathbf{a}_r\mathbf{a}_r^H+\varepsilon\mathbf{I}.$$

Having computed $\mathbf{Q}^{(i)}$ only once, all the matrices $\mathbf{Q}_m^{(i)}$, $m=1,\ldots,M$, are obtained as $\mathbf{Q}_m^{(i)}=\mathbf{Q}^{(i)}-|\hat{s}_m^{(i-1)}|^2\,\mathbf{a}_m\mathbf{a}_m^H$. The order of computations for the five steps of the algorithm is as follows:

1. Step 1 includes one single computation of $\mathbf{Q}^{(i)}$, of order $MK^2$, and the computation of the matrices $\mathbf{Q}_m^{(i)}$, each of order $K^2$. The overall cost of this step is of order $MK^2$.
2. This step includes the computation of the inverse of $\mathbf{Q}_m^{(i)}$ and then its Cholesky decomposition (each of order $K^3$) for $m=1,\ldots,M$. The overall cost is of order $MK^3$.
3. The complexity of this step is dominated by the matrix multiplication $\mathbf{W}'_m=\mathbf{R}_m^{(i)}\mathbf{W}$, which is of order $K^2M$ for each $m$. The overall cost is thus of order $K^2M^2$.
4. The computational cost of Steps 4 and 5 is obviously negligible compared to that of Steps 2 and 3.

Therefore, the dominant steps are Step 2, of order $MK^3$, and Step 3, of order $K^2M^2$. Since the dictionary is redundant, $M$ is much larger than $K$, and therefore the overall computational complexity of the proposed method is of order $K^2M^2$.
The complexity of the ICD method for solving the BPDN problem (4) is of order $MK$, which implies that the proposed method offers improved performance at the price of a higher computational complexity.

III. SIMULATION RESULTS
In this section, the proposed method is compared with the BPDN method of [10] formulated in (4). To do so, we consider a signal of the following form:

$$y(t)=\sum_{k=1}^{3}A_k\cos\!\big(2\pi f_k t+\varphi_k\big)+\nu(t), \qquad (18)$$

where the $\varphi_k$ are i.i.d. random variables uniformly distributed on $[0,2\pi)$ and $\nu(t)$ is the additive noise. A data set containing $K$ samples is then formed by sampling this signal according to an irregular sampling pattern. The two methods are applied to such data sets in several independent runs so that the two frequency estimation methods can be compared reliably.
The simulated signal consists of three sinusoidal terms with frequencies of 1, 1.1, and 3 Hz and amplitudes of 3, 4, and 1, respectively. The additive noise is assumed to be zero-mean white Gaussian with a standard deviation of $\sigma=0.2$.

The sampling pattern shown in Figure 1 is generated such that the difference between two adjacent sampling instants follows an exponential distribution with a mean of 1 s. The number of data samples is $K=24$. The frequency grid $\mathbf{f}$ is limited to $\pm f_{\max}=\pm 10$ Hz and therefore covers the interval from $-10$ to $10$ Hz with a step size of 0.05 Hz, and only $I=5$ iterations of the proposed algorithm were performed. The regularization parameter is chosen as $\lambda=\rho\lambda_{\max}$, where $\lambda_{\max}$ is the smallest value of $\lambda$ for which the solution to (4) is identically zero and $\rho$ is a constant parameter [10]. For $\rho\ge 1$ the solution to (4) is therefore identically zero [10], so $\rho$ is assumed to be less than 1.
In practice, $\rho=0.05$ seems to be a good choice [10]. In the simulations, we use $\rho=0.05$ for both the proposed method and the BPDN method. The value of $\varepsilon$ is assumed to be 0.001.
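A Python sketch of this simulation setup is given below. The variable names are ours, and the BPDN zero-threshold $\lambda_{\max}=2\|\mathbf{W}^H\mathbf{y}\|_\infty$ used to set $\lambda=\rho\lambda_{\max}$ is the standard value for the formulation in (4), stated here as an assumption rather than quoted from [10].

import numpy as np

rng = np.random.default_rng(1)

# Irregular sampling: exponential inter-sample intervals with mean 1 s, K = 24 samples
K = 24
t = np.cumsum(rng.exponential(scale=1.0, size=K))

# Test signal: three sinusoids (1, 1.1, 3 Hz; amplitudes 3, 4, 1) plus white Gaussian noise
freqs = np.array([1.0, 1.1, 3.0])
amps = np.array([3.0, 4.0, 1.0])
phases = rng.uniform(0.0, 2.0 * np.pi, size=3)
y = sum(a * np.cos(2 * np.pi * fk * t + ph) for a, fk, ph in zip(amps, freqs, phases))
y = y + 0.2 * rng.standard_normal(K)

# Frequency grid from -10 Hz to 10 Hz in 0.05 Hz steps, and the dictionary W of (3)
f = np.arange(-10.0, 10.0 + 0.05, 0.05)
W = np.exp(1j * 2 * np.pi * np.outer(t, f))

# Regularization and algorithm parameters: lambda = rho * lambda_max, eps = 0.001, I = 5
rho, eps, n_iter = 0.05, 1e-3, 5
lam = rho * 2.0 * np.max(np.abs(W.conj().T @ y))

With these quantities, the BPDN estimate can be obtained by repeated ICD sweeps and then refined by the weighted iterations sketched in Section II.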
The results of 20 independent simulations are shown together in Figure 2. As can be seen from Figure 2(a), the spectrum estimated by BPDN is full of spurious peaks, and the amplitudes of many of these spurious peaks are higher than that of the actual peak at 3 Hz. Figure 2(b) shows that the spurious peaks produced by the proposed method are very rare and have very small amplitudes. Moreover, the proposed method is less likely to miss the small actual peak at 3 Hz. To illustrate this, the area around 3 Hz in Figure 2 has been zoomed in and shown in Figure 3. Counting the circles (detected peaks) at 3 Hz shows that the number of peaks with nonzero amplitude in Figure 3(a) is 15, while in Figure 3(b) it is 20. This means that BPDN missed this peak in 5 of the experiments, while the proposed method never missed it.

Figure 2(b) also shows that the amplitudes of the spectral peaks suggested by the proposed method are not exact. However, this should not be a serious problem: after the spectral peaks have been detected, their amplitudes can be corrected by a least-squares fit to the data set y. The result of fitting the 3 spectral peaks detected by the proposed method to y is shown in Figure 4, where the actual values of the amplitudes are also indicated by cross signs. As seen, the estimated values are very close to the actual ones.

Fig. 4. Estimated amplitudes of the spectral peaks detected by the proposed method after least-squares fitting to the data set. The actual values are shown by cross signs.
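This amplitude correction amounts to an ordinary least-squares fit restricted to the detected frequencies. A minimal sketch, with hypothetical names and NumPy assumed, is:

import numpy as np

def refit_amplitudes(y, W, support):
    """Least-squares re-fit of the amplitudes on the detected support.

    y       : (K,) data vector
    W       : (K, M) complex-exponential dictionary
    support : indices of the grid frequencies detected as spectral peaks
    Returns the complex amplitudes minimizing ||y - W[:, support] a||_2.
    """
    Ws = W[:, support]
    a, *_ = np.linalg.lstsq(Ws, y, rcond=None)
    return a

# Example usage: re-fit the peaks detected by the proposed method
# support = np.flatnonzero(np.abs(s_hat) > 0)   # indices of nonzero estimated coefficients
# amps = refit_amplitudes(y, W, support)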
IV. CONCLUSION

An iterative sparsity-based method was proposed for the estimation of the spectral lines of nonuniformly sampled signals. The proposed method enforces the sparsity constraint on a weighted least-squares problem. It was shown via computer simulations that the method is capable of estimating the spectral lines of a signal even when only a very limited number of data samples, taken with an arbitrary sampling pattern, is available. The proposed method appears to be sensitive to the choice of the parameter $\varepsilon$; therefore, the selection of an appropriate value for this parameter remains an open problem.