Monday, February 20, 2017

fft - What statistic is used to determine presence of a signal in noise?


I believe this is a detection problem:


I am being stumped by what appears to be a simple problem. Basically, I have a band of interest. If signal energies exist within this band of interest, then I perform operation X on my signal.


My problem is that I am not sure exactly how to go about 'deciding' whether a signal exists or not. After I perform an FFT, I can look for peaks.


But now what?




  • Is the usual approach to compare this peak to the surrounding mean of the spectrum, or is some other statistic used?

  • What statistical measure do I use to simply determine if a signal is present, and go from there?

  • How do I set this value? Simple thresholding?


EDIT Based on feedback:


For this simple case, I am assuming a tone in white Gaussian noise. What I am trying to get a handle on is:




  1. How exactly does one generate a ROC curve? Does one have to label all the data first, and then compute the true-positive and false-positive rates for a multitude of thresholds?





  2. How does decreasing the SNR affect the ROC curve? Does it move it towards the diagonal?




  3. What does adaptive thresholding do to a given ROC curve that was otherwise generated without an adaptive threshold?


    3a. What are some common adaptive thresholding techniques I can look at?





Answer




This is one of the oldest signal processing problems, and a simple form is likely to be encountered in an introduction to detection theory. There are theoretical and practical approaches to solving such a problem, which may or may not overlap depending upon the specific application.


A first step toward understanding the approaches to the problem is understanding how you would measure the performance of your signal presence detector. There are two important and related metrics used to quantitatively measure how good a detector is: its probability of detection $P_d$ and its probability of false alarm $P_{fa}$.


$P_d$ is specified as the probability that your detector will indicate the presence of the signal of interest, given that the signal is actually there. Conversely, $P_{fa}$ is the probability that your detector will indicate the presence of the signal of interest, given that the signal is not there. As you might expect, then, in a perfect world, we would design a system that yields $P_d = 1$ and $P_{fa} = 0$ and call it a day. As you might also expect, it's not that easy. There is an inherent tradeoff between the two metrics; typically if you do something that improves one, you will observe some degradation in the other.


A simple example: if you are looking for the presence of a pulse against a background of noise, you might decide to set a threshold somewhere above the "typical" noise level and decide to indicate presence of the signal of interest if your detection statistic breaks above threshold. Want a really low false-alarm probability? Set the threshold high. But then, the probability of detection might decrease significantly if the elevated threshold is at or above the expected signal power level!
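
To make that tradeoff concrete, here is a minimal Monte Carlo sketch in Python. It assumes the simplest possible model (a single complex sample of amplitude $A$ in complex white Gaussian noise with per-component standard deviation $\sigma$), and the amplitude, noise level, and fixed threshold $T$ below are all arbitrary choices for illustration; the detection statistic is just the sample magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

n_trials = 100_000   # number of Monte Carlo trials (arbitrary)
A = 2.0              # assumed tone amplitude
sigma = 1.0          # noise std dev per real/imag component
T = 3.0              # fixed detection threshold (arbitrary)

# Complex AWGN, reused for both hypotheses
noise = rng.normal(0, sigma, n_trials) + 1j * rng.normal(0, sigma, n_trials)

# Detection statistic: magnitude of the observed sample
stat_h0 = np.abs(noise)        # H0: noise only
stat_h1 = np.abs(A + noise)    # H1: tone plus noise

# Empirical false-alarm and detection probabilities for this threshold
p_fa = np.mean(stat_h0 > T)
p_d = np.mean(stat_h1 > T)
print(f"P_fa ~ {p_fa:.4f}, P_d ~ {p_d:.4f}")
```

Raising $T$ in this sketch drives the empirical $P_{fa}$ down and the empirical $P_d$ down with it, which is exactly the tradeoff described above.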


To visualize the $P_d$ / $P_{fa}$ relationship, the two quantities are often plotted against one another on a receiver operating characteristic curve. Here's an example from Wikipedia:


[Figure: example ROC curves, from Wikipedia]


An ideal detector would have a ROC curve that hugs the top of the plot; that is, it could provide guaranteed detection for any false alarm rate. In reality, a detector will have a characteristic that looks like those plotted above; increasing the probability of detection will also increase the false alarm rate, and vice versa.
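
This also answers how such a curve is generated in practice (question 1 above): because you simulate or label the data yourself, you know which trials actually contain the signal, so you can sweep a threshold and record the empirical detection and false-alarm rates at each value. Here is a sketch under the same simple tone-in-AWGN assumptions as before, with an arbitrary amplitude and noise level:

```python
import numpy as np

rng = np.random.default_rng(1)

n_trials = 100_000
A, sigma = 1.5, 1.0   # assumed tone amplitude and per-component noise std dev

noise = rng.normal(0, sigma, n_trials) + 1j * rng.normal(0, sigma, n_trials)
stat_h0 = np.abs(noise)        # labeled "signal absent" trials
stat_h1 = np.abs(A + noise)    # labeled "signal present" trials

# Sweep the threshold over the observed range of the statistic
thresholds = np.linspace(0, stat_h1.max(), 200)
p_fa = [(stat_h0 > T).mean() for T in thresholds]
p_d = [(stat_h1 > T).mean() for T in thresholds]

# Each (p_fa, p_d) pair is one operating point on the ROC curve;
# lowering the SNR (smaller A / sigma) pulls the whole curve toward the diagonal.
for Pfa, Pd in list(zip(p_fa, p_d))[::40]:
    print(f"P_fa = {Pfa:.3f}  ->  P_d = {Pd:.3f}")
```

The comment about SNR is the answer to question 2: as the SNR drops, the two distributions of the statistic overlap more, and the ROC curve sags toward the chance diagonal.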


From a theoretical perspective, therefore, these types of problems boil down to selecting some balance between detection performance and false-alarm probability. How that balance is described mathematically depends upon your statistical model for the random process that the detector observes. The model will typically have two states, or hypotheses:


$$ H_0: \text{no signal is present} $$ $$ H_1: \text{signal is present} $$


Typically, the statistic that the detector observes has one of two distributions, according to which hypothesis is true. The detector then applies some sort of test to decide which hypothesis is true and therefore whether the signal is present. The distribution of the detection statistic under each hypothesis is a function of the signal model that you choose as appropriate for your application.



A common signal model is the detection of a pulse-amplitude-modulated signal against a background of additive white Gaussian noise (AWGN). While that description is somewhat specific to digital communications, many problems can be mapped to that or a similar model. Specifically, if you are looking for a constant-valued tone localized in time against a background of AWGN, and the detector observes the signal magnitude, that statistic will have a Rayleigh distribution if no tone is present and a Rician distribution if one is present.
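
As an illustration of that particular model, the sketch below evaluates the two magnitude distributions with scipy. It assumes a tone amplitude $A$ and per-component noise standard deviation $\sigma$ (both arbitrary here) and uses scipy's parameterization of the Rice distribution (shape $b = A/\sigma$, scale $\sigma$):

```python
import numpy as np
from scipy.stats import rayleigh, rice

A = 2.0      # assumed tone amplitude
sigma = 1.0  # noise std dev per real/imag component

x = np.linspace(0, 6, 7)

# H0: magnitude of complex AWGN is Rayleigh(scale=sigma)
pdf_h0 = rayleigh.pdf(x, scale=sigma)

# H1: magnitude of tone + complex AWGN is Rice(b=A/sigma, scale=sigma)
pdf_h1 = rice.pdf(x, b=A / sigma, scale=sigma)

for xi, p0, p1 in zip(x, pdf_h0, pdf_h1):
    print(f"|X| = {xi:.1f}:  pdf under H0 = {p0:.3f},  pdf under H1 = {p1:.3f}")
```

The farther apart these two densities sit (i.e., the larger $A/\sigma$), the easier it is to find a threshold that separates them.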


Once a statistical model has been developed, the detector's decision rule must be specified. This can be as complicated as you desire, based on what makes sense for your application. Ideally, you would want to make a decision that is optimal in some sense, based on your knowledge of the distribution of the detection statistic under both hypotheses, the probability of each hypothesis being true, and the relative cost of being wrong about either hypothesis (which I'll talk more about in a bit). Bayesian decision theory can be used as a framework for approaching this aspect of the problem from a theoretical perspective.


In the simplest practical case, the detector might trigger a detection if the detection statistic exceeds a fixed threshold $T$. In a more complicated and practical case, the detector might have some criteria for setting an adaptive threshold $T(t)$ and trigger a detection at time $t$ if the detection statistic breaks the threshold value at that instant. In your description of your problem, you hit on one common method for setting such an adaptive threshold: calculate the neighborhood mean to estimate the "background level", then set a detection threshold some amount above that mean. This can work for some applications, and there are many other ways to arrive at such a threshold.
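
Here is a rough sketch of that neighborhood-mean idea, in the spirit of cell-averaging CFAR (constant false-alarm rate) detection, which is one common family of adaptive-threshold techniques worth looking up for question 3a. The synthetic spectrum, window size, guard size, and scale factor below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic magnitude spectrum: a noise floor plus one strong bin (for illustration)
n_bins = 256
spectrum = np.abs(rng.normal(0, 1, n_bins) + 1j * rng.normal(0, 1, n_bins))
spectrum[100] += 6.0   # injected "tone"

window = 16   # neighborhood half-width (arbitrary)
guard = 2     # guard cells around the bin under test (arbitrary)
scale = 3.0   # threshold multiplier above the local mean (arbitrary)

detections = []
for k in range(n_bins):
    # Neighborhood indices, excluding the cell under test and its guard cells
    left = np.arange(max(0, k - window), max(0, k - guard))
    right = np.arange(min(n_bins, k + guard + 1), min(n_bins, k + window + 1))
    neighbors = np.concatenate([spectrum[left], spectrum[right]])
    if neighbors.size == 0:
        continue
    threshold = scale * neighbors.mean()   # adaptive threshold for this bin
    if spectrum[k] > threshold:
        detections.append(k)

print("bins flagged as containing signal:", detections)
```

Because the threshold rides on the local background estimate, a slowly varying noise floor does not change the false-alarm behavior the way a single fixed threshold would, which is the usual motivation for this kind of scheme (question 3).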


Given a statistical model for the detector's input and the decision rule used to map that statistic to detection conclusions, one can then calculate the detector's theoretical performance metrics. In the design phase, you would typically calculate these metrics as functions of the free design parameters that you have (for instance, the threshold $T$ above). You can then evaluate the inherent tradeoffs: "if I set $T=5$, then I get $P_d = 0.9999$, but $P_{fa} = 0.01$. That's too high of a false alarm rate, so I better increase the threshold."
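
Under the Rayleigh/Rician magnitude model sketched above, that design-phase evaluation can be done in closed form. The sketch below tabulates $P_{fa}$ and $P_d$ as functions of the threshold $T$, again with arbitrary $A$ and $\sigma$:

```python
import numpy as np
from scipy.stats import rayleigh, rice

A, sigma = 2.0, 1.0   # assumed tone amplitude and per-component noise std dev

# For the magnitude detector under this model:
#   P_fa(T) = P(|noise| > T)          -> Rayleigh survival function
#   P_d(T)  = P(|tone + noise| > T)   -> Rician survival function
for T in np.arange(1.0, 6.0, 1.0):
    p_fa = rayleigh.sf(T, scale=sigma)           # exp(-T^2 / (2 sigma^2))
    p_d = rice.sf(T, b=A / sigma, scale=sigma)   # Marcum Q_1(A/sigma, T/sigma)
    print(f"T = {T:.1f}:  P_d = {p_d:.4f},  P_fa = {p_fa:.6f}")
```

Scanning such a table (or the equivalent curves) is exactly the "if I set $T=5$, then..." reasoning described above.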


Where you eventually decide to sit on the performance curve is up to you, and is an important design parameter. The right performance point to choose depends upon the relative cost of the two types of possible failures: is it worse for your detector to miss an occurrence of the signal when it happens, or to register an occurrence of the signal when it hasn't happened? An example: a fictitious ballistic-missile-detector-with-automatic-strikeback-capability would be best served to have a very low false-alarm rate; starting a world war because of a spurious detection would be unfortunate. An example of the converse situation would be a communication receiver used for safety-of-life applications; if you want to have maximum confidence that it doesn't fail to receive any distress messages, it should have a very high probability of detection of the transmitted signal.

