Monday, April 30, 2018

Detecting direction of sound using several microphones


First of all, I've seen a similar thread, but it is a bit different from what I'm trying to achieve. I am constructing a robot which will follow the person who calls it. My idea is to use 3 or 4 microphones, e.g. in the following arrangement, in order to determine from which direction the robot was called:


[Figure: arrangement of the source S and the microphones A, B and C]


Where S is the source and A, B and C are microphones. The idea is to calculate the phase correlation of the signals recorded from the pairs AB, AC and BC, and based on that construct a vector that points at the source using a kind of triangulation. The system does not even have to work in real time because it will be voice activated: signals from all the microphones will be recorded simultaneously, the voice will be sampled from only one microphone, and if it fits the voice signature, phase correlation will be computed over the last fraction of a second in order to compute the direction. I am aware that this might not work too well, e.g. when the robot is called from another room or when there are multiple reflections.
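The per-pair delay estimate described above can be sketched as follows. This is a minimal sketch using plain time-domain cross-correlation, which is a simpler stand-in for true phase correlation (that one normalises the cross-spectrum in the frequency domain). The function name `best_lag` and its parameters are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the lag, in samples, that maximises the cross-correlation of
   x and y, searched over [-max_lag, +max_lag].
   A positive result means y is a delayed copy of x. */
int best_lag(const double *x, const double *y, size_t n, int max_lag)
{
    assert(n > 0 && max_lag >= 0 && (size_t)max_lag < n);

    int best = 0;
    double best_score = 0.0;
    int first = 1;

    for (int lag = -max_lag; lag <= max_lag; ++lag) {
        double score = 0.0;
        for (size_t i = 0; i < n; ++i) {
            long j = (long)i + lag;          /* index into y for this lag */
            if (j < 0 || j >= (long)n) {
                continue;                    /* outside the recorded window */
            }
            score += x[i] * y[j];
        }
        if (first || score > best_score) {
            best_score = score;
            best = lag;
            first = 0;
        }
    }
    return best;
}
```

Calling `best_lag` once per pair (AB, AC, BC) gives three sample lags. At a 22 kHz sample rate one sample of lag corresponds to roughly 15.6 mm of extra path length (343 m/s divided by 22000 samples/s), which sets the coarsest resolution this integer-lag search can deliver.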



This is just an idea I had, but I have never attempted anything like this and I have several questions before I construct the actual hardware that will do the job:



  1. Is this a typical way of doing this? (e.g. is it what phones use for noise cancellation?) What are other possible approaches?

  2. Can phase correlation be calculated between 3 sources simultaneously somehow? (e.g. in order to speed up the computation)

  3. Is a 22 kHz sample rate and 12-bit depth sufficient for this system? I am especially concerned about the bit depth.

  4. Should the microphones be placed in separate tubes in order to improve separation?



Answer



To extend Müller's answer,





  4. Should the microphones be placed in separate tubes in order to improve separation?




  4. No. You are trying to identify the direction of the source, and adding tubes will only make the sound bounce around inside the tube, which is definitely not wanted.

    The best course of action would be to make them all face straight up. That way they all receive similar sound, and the only thing that differs between them is their physical placement, which directly affects the phase. A 6 kHz sine wave has a wavelength of $\frac{\text{speed of sound}}{\text{sound frequency}}=\frac{343\text{ m/s}}{6\text{ kHz}}=57.2\text{ mm}$. So if you want to uniquely identify the phases of sine waves up to 6 kHz, which covers the typical frequencies of human speech, then you should space the microphones at most 57.2 mm apart (and ideally no more than half that, about 28.6 mm, to avoid spatial aliasing between a pair of microphones). Here is one item whose diameter is comfortably below that limit. Don't forget to add a low-pass filter with a cut-off frequency at around 6-10 kHz.
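The wavelength arithmetic is easy to check in code. A minimal sketch, assuming a speed of sound of 343 m/s; the function names are hypothetical:

```c
#include <assert.h>

/* Wavelength in millimetres for a given speed of sound (m/s)
   and frequency (Hz). */
double wavelength_mm(double speed_m_per_s, double frequency_hz)
{
    assert(frequency_hz > 0.0);
    return speed_m_per_s / frequency_hz * 1000.0;
}

/* Half a wavelength: the usual conservative upper bound on the
   spacing of a microphone pair if phase is to stay unambiguous. */
double half_wavelength_mm(double speed_m_per_s, double frequency_hz)
{
    return wavelength_mm(speed_m_per_s, frequency_hz) / 2.0;
}
```

For example, `wavelength_mm(343.0, 6000.0)` evaluates to about 57.17, and `half_wavelength_mm(343.0, 6000.0)` to about 28.58.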





Edit


I felt that this #2 question looked fun so I decided to try to solve it on my own.





  2. Can phase correlation be calculated between 3 sources simultaneously somehow? (i.e. in order to speed up the computation)



If you know your linear algebra, then you can imagine that you have placed the microphones in an equilateral triangle where each microphone is 4 mm away from the other two, making each interior angle $60^\circ$.


So let's assume they are in this configuration:


          C
         / \
        /   \
       /     \
      /       \
     /         \
    A - - - - - B

I will...



  • use the nomenclature $\overline{AB}$ which is a vector pointing from $A$ to $B$

  • call $A$ my origin

  • write all numbers in mm


  • use 3D math but end up with a 2D direction

  • set the vertical position of each microphone to the instantaneous value of its waveform. So these equations are based on a sound wave that looks something like this.

  • calculate the cross product of the vectors between these microphones based on their position and waveform, then ignore the height information from this cross product and use arctan to come up with the actual direction of the source.

  • call $a$ the output of the microphone at position $A$, call $b$ the output of the microphone at position $B$, call $c$ the output of the microphone at position $C$


So the following things are true:



  • $A=(0,0,a)$

  • $B=(4,0,b)$

  • $C=(2,\sqrt{4^2-2^2}=2\sqrt{3},c)$



This gives us:



  • $\overline{AB} = (4,0,a-b)$

  • $\overline{AC} = (2,2\sqrt{3},a-c)$


And the cross product is simply $\overline{AB}\times\overline{AC}$


$$ \begin{align} \overline{AB}\times\overline{AC}&= \begin{pmatrix} 4\\ 0\\ a-b\\ \end{pmatrix} \times \begin{pmatrix} 2\\ 2\sqrt{3}\\ a-c\\ \end{pmatrix}\\\\ &=\begin{pmatrix} 0\cdot(a-c)-(a-b)\cdot2\sqrt{3}\\ (a-b)\cdot2-4\cdot(a-c)\\ 4\cdot2\sqrt{3}-0\cdot2\\ \end{pmatrix}\\\\ &=\begin{pmatrix} 2\sqrt{3}(b-a)\\ -2a-2b+4c\\ 8\sqrt{3}\\ \end{pmatrix} \end{align} $$


The Z information, $8\sqrt{3}$, is just junk and of zero interest to us. As the input signals change, the cross vector will swing back and forth along the line through the source. So half of the time it will point straight at the source (ignoring reflections and other parasitics), and the other half of the time it will point 180 degrees away from the source.


What I'm talking about is $\arctan\left(\frac{-2a-2b+4c}{2\sqrt{3}(b-a)}\right)$, which can be simplified to $\arctan\left(\frac{a+b-2c}{\sqrt{3}(a-b)}\right)$ and then converted from radians into degrees.



So what you end up with is the following equation:


$$\arctan\Biggl(\frac{a+b-2c}{\sqrt{3}(a-b)}\Biggr)\frac{180}{\pi}$$




But half the time the information is literally 100% wrong, so how should one make it right 100% of the time?


Well if $a$ is leading $b$, then the source can't be closer to B.


In other words, just make something simple like this:


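#include <math.h>

/* a, b, c: simultaneous samples from microphones A, B and C.
   Returns the estimated direction in degrees, in [0, 360). */
double source_direction_deg(double a, double b, double c)
{
    const double pi = acos(-1.0);
    double source_direction = atan2(a + b - 2*c, sqrt(3.0)*(a - b))*180/pi;
    double possible_center_direction;

    if (a > b) {
        if (b > c) {            // a>b>c
            possible_center_direction = 240; // A is closest, then B, last C
        } else if (a > c) {     // a>c>b
            possible_center_direction = 180; // A is closest, then C, last B
        } else {                // c>a>b
            possible_center_direction = 120; // C is closest, then A, last B
        }
    } else {
        if (c > b) {            // c>b>a
            possible_center_direction = 60;  // C is closest, then B, last A
        } else if (a > c) {     // b>a>c
            possible_center_direction = 300; // B is closest, then A, last C
        } else {                // b>c>a
            possible_center_direction = 0;   // B is closest, then C, last A
        }
    }

    // if the source is out of bounds (more than 60 degrees from the sector
    // centre, measured the short way round), then rotate it by 180 degrees.
    double offset = fmod(source_direction - possible_center_direction + 540, 360) - 180;
    if (fabs(offset) > 60) {
        source_direction += 180;
    }
    return fmod(source_direction + 360, 360);
}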

And perhaps you only want to react if the sound source is coming from a specific vertical angle: if people talk above the microphones => 0 phase change => do nothing. People talk horizontally next to it => some phase change => react. The length of the horizontal part of the simplified cross product measures that phase change, so compare it against a threshold:


$$ \begin{align} |P| &= \sqrt{P_x^2+P_y^2}\\ &= \sqrt{3(a-b)^2+(a+b-2c)^2}\\ \end{align} $$


So you might want to set that threshold to something low, like 0.1 or 0.01. I'm not entirely sure, depends on the volume and frequency and parasitics, test it yourself.


Another reason to use the absolute value equation is zero crossings: there might be a brief moment when the direction points the wrong way. It will only be for about 1% of the time, if even that, but you might still want to attach a first-order low-pass filter to the direction.


true_true_direction = true_true_direction*0.9+source_direction*0.1;

And if you want to react to a specific volume, then just sum the 3 microphones together and compare that to some trigger value. The mean value of the microphones would be their sum divided by 3, but you don't need to divide by 3 if you increase the trigger value by a factor 3.
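As a sketch of that volume trigger: the function name is hypothetical, and taking the absolute value of the sum is an assumption on my part, since raw samples are signed.

```c
#include <math.h>

/* Returns 1 when the mean of the three samples exceeds the trigger
   level in magnitude. Comparing the sum against 3 * trigger is the
   same test without the division. */
int is_loud_enough(double a, double b, double c, double trigger)
{
    return fabs(a + b + c) > 3.0 * trigger;
}
```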





I'm having issues with marking the code as C/C#/C++ or JS or any other, so sadly the code will be black on white, against my wishes. Oh well, good luck on your venture. Sounds fun.


Also there is a 50/50 chance that the direction will be 180 degrees away from the source 99% of the time. I'm a master at making such mistakes. The correction, though, would be to just invert the if statement that decides when 180 degrees should be added.

