Good afternoon everyone, this is Matthew Wall here, Director of Technology at Influxis. Today I am writing my first blog post, and I sure do hope this is a useful one! This post is all about echo cancellation in Flash using Speex. Many Flash applications benefited from the enhanced microphone released with Flash Player 10.3 and AIR 2.7, and it was received with much enthusiasm. When our customers began to implement it, though, they told us how poor the echo cancellation seemed to be; in many cases, any improvement was unnoticeable. For a while we recommended push-to-talk functionality, using audio detection to mute the non-talking parties, or implementing a speaker token that is passed around. Not anymore.
So what are the techniques that replace those workarounds? Well, let’s first go into the causes of the echo loopback. The echo, as most readers will likely already know, is caused when the listener’s microphone captures enough of the audio coming out of the listener’s own speakers to send it back, audibly, to the speakers of the person talking. In effect the talker hears himself through his own speakers, with a slight delay (the latency of the round trip). Most of the time the talker’s microphone will then capture some of that returned sound and loop it right back. This causes a series of amplifications and growing loops that only ceases when one’s patience for such noise does. Well, there is one way to stop it: a headset. This is 2014, and the fact that to this day you generally need a headset to have a good conversation over Flash makes me uncomfortable.
I set out to see what I could do to improve the echo cancellation. A few days later, after we implemented the changes in our two-to-many application, Faces, and tested them, we realized we had succeeded in eliminating the headset requirement.
Okay, okay, now let’s get to the good stuff: how to do it.
First, you must decide which gain level you want to use for the application. This is important, as gain is different from volume. Volume represents the amount of amplification of a signal, whereas gain represents the degree to which the microphone will capture the sound around it. As you approach 100 percent gain, the microphone will begin to capture more and more ambient noise and sound, and then amplify it. This will cause the microphone to capture more of the return sound from the speakers than is desirable. I recommend keeping gain below 70 percent, but of course you should test other ranges in order to perfect it for your application!
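A minimal ActionScript 3 sketch of the gain setting, assuming you obtain the microphone through Microphone.getEnhancedMicrophone(). The value 60 is only an illustrative starting point under the 70 percent mark, not a tested figure:

```actionscript
import flash.media.Microphone;

// Ask Flash Player for the enhanced (echo-cancelling) microphone.
// The call returns null if enhanced audio fails to initialize.
var mic:Microphone = Microphone.getEnhancedMicrophone();

if (mic != null) {
    // Gain runs from 0 to 100, with 50 as the default.
    // 60 is an assumed example value; tune it by testing in your own app.
    mic.gain = 60;
}
```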
The other changes rely on setting some properties on the MicrophoneEnhancedOptions class. There are two main areas to look at: the echoPath and mode properties. The echoPath property accepts two values, 128 and 256, measured in milliseconds (you don’t specify the unit in the code; this is just for context). The default setting is 128, and setting it to 256 enables a much more CPU-intensive acoustic echo cancellation algorithm that works much better. The Influxis cloud can easily accommodate this additional CPU overhead, so if you’re an Influxis customer and are reading this, feel free to implement these changes immediately. If not, be sure to keep a watch on your Media Server’s CPU resources.
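As a sketch of the echoPath change, the snippet below builds a MicrophoneEnhancedOptions object, sets the longer echo path, and assigns it to the microphone. Variable names are illustrative, and any option not set here keeps its default:

```actionscript
import flash.media.Microphone;
import flash.media.MicrophoneEnhancedOptions;

var mic:Microphone = Microphone.getEnhancedMicrophone();

if (mic != null) {
    var options:MicrophoneEnhancedOptions = new MicrophoneEnhancedOptions();

    // Only 128 (the default) and 256 are accepted; the unit is milliseconds.
    options.echoPath = 256;

    // The options take effect once they are assigned to the microphone.
    mic.enhancedOptions = options;
}
```

In a real application you would likely set every option you care about on the same object before assigning it, so that one assignment does not reset another back to its default.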
The default mode for all USB-connected microphones is half duplex, and full duplex for non-USB microphones. These days a lot of microphones are USB based, and very few of them lack the capability to use full duplex. You can set the mode property to MicrophoneEnhancedMode.FULL_DUPLEX to increase the fluidity of the conversation. Using full duplex also seems to enhance the acoustic echo cancellation performance, though this was not extensively tested.
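Forcing full duplex might look like the following sketch, which uses the MicrophoneEnhancedMode constant rather than a raw string:

```actionscript
import flash.media.Microphone;
import flash.media.MicrophoneEnhancedMode;
import flash.media.MicrophoneEnhancedOptions;

var mic:Microphone = Microphone.getEnhancedMicrophone();

if (mic != null) {
    var options:MicrophoneEnhancedOptions = new MicrophoneEnhancedOptions();

    // Override the per-device default so both parties can talk at once,
    // even on a USB microphone that would otherwise start in half duplex.
    options.mode = MicrophoneEnhancedMode.FULL_DUPLEX;

    mic.enhancedOptions = options;
}
```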
Beyond that there is a property called nonLinearProcessing, which defaults to true. When it is true, the time-domain nonlinear processing technique is used. This should always be left alone unless a music source is being used. If there is music playing while a Flash client records a stream (perhaps someone is dancing to it), the recording will benefit greatly from linear processing, so set the nonLinearProcessing property to false in order to take advantage of this.
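One way to express that rule in code is a small helper that picks the setting from a flag. This is only a sketch: optionsFor and isMusicSource are hypothetical names, and your application would have to decide for itself when music is present:

```actionscript
import flash.media.Microphone;
import flash.media.MicrophoneEnhancedOptions;

// Hypothetical helper: isMusicSource is a flag supplied by your application.
function optionsFor(isMusicSource:Boolean):MicrophoneEnhancedOptions {
    var options:MicrophoneEnhancedOptions = new MicrophoneEnhancedOptions();

    // Keep nonlinear processing on (the default) for speech,
    // and turn it off when a music source is being captured.
    options.nonLinearProcessing = !isMusicSource;

    return options;
}

var mic:Microphone = Microphone.getEnhancedMicrophone();

if (mic != null) {
    // Example: a stream that is recorded with music playing.
    mic.enhancedOptions = optionsFor(true);
}
```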
If you’re having quality issues after these changes, you can override the framesPerPacket property of the Microphone class by increasing it from its default of 2 frames per packet. Increase it in multiples of 2, in order to prevent misalignment of A/V packets going into the Flash/Adobe Media Server.
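A sketch of that adjustment, assuming the Speex codec is in use (framesPerPacket counts Speex frames). The value 4 is simply the next multiple of 2, not a measured optimum:

```actionscript
import flash.media.Microphone;
import flash.media.SoundCodec;

var mic:Microphone = Microphone.getEnhancedMicrophone();

if (mic != null) {
    // Set the codec explicitly so the Speex frame setting below applies.
    mic.codec = SoundCodec.SPEEX;

    // Default is 2 frames per packet; step up in multiples of 2.
    // 4 is an example value only; test to find what suits your application.
    mic.framesPerPacket = 4;
}
```

Each Speex frame carries 20 ms of audio, so a larger value means fewer, larger packets at the cost of a little extra latency.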
Here’s a picture of the settings implemented in the application I link to below (Faces).
If you would like to test these settings out, you can apply them in our Faces application or click on the link below. Be warned, this is a one-to-one-to-many application, so you might meet someone new if you hop on! Two people can be streaming at the same time, and when one relinquishes control, another viewer who is not publishing can go ahead and take the seat. This application is hosted in our San Jose, California data center.
I hope this little blog post has helped, and thank you for taking the time to read it! Please feel free to post any questions or comments below – feedback is always welcome and so very much appreciated!