It’s late Tuesday afternoon, and you get a call from one of your show’s producers. There’s an expert guest on the other coast, and he’s planned a long-form interview with her tonight. The studio she’s in has an IP audio codec. So do you. “Let’s do this!” says the producer. What could go wrong?
It seems like this should be simple. Give the far-end studio your IP codec’s address, and enjoy the sweet wideband audio flowing with low delay back and forth, right? Nobody needs to know she hasn’t flown in to talk to you.
But when her studio tries to connect, nothing happens. The interview is conducted by phone. The producer is livid. You call the far-end studio in the morning for the post-mortem. It turns out that you have brand X codec and they have brand Y.
Why can’t different brands interconnect? The answer is: They can. But it’s not simple. Both the codec and your network need to be specially configured, and the setup needs to be tested ahead of time. There are also a couple of layers of brand-specific “gotchas” that can throw a wrench in things.
Sometimes this is enough for a busy engineer to say “to heck with it.” But if you’re willing to spend a little time configuring things, here are some tips on how to make it happen.
WHY IS IT SO?
Interoperability requires standards that developers can use to create common protocols that connect together. In the case of IP codecs, one standard does exist: EBU Tech 3326. It’s imperfect, but it does define a way professional-grade codecs should interoperate.
One main issue with the standard is that it was defined after many of the codecs on the market were developed or were already heavily in development. Several manufacturers had substantial sales of codecs before 3326 was published, and had done a lot of development work on perfecting their own protocols. So 3326 was added as an alternate mode of operation on most devices, with the proprietary modes still being the default.
Also, the EBU chose to make the 3326 standard use the same VoIP (Voice-over-IP) protocol gaining use in the telephone industry. Known as SIP, for Session Initiation Protocol, it was used by virtually no audio hardware codecs at the time. By default, most IP codecs use a single network socket to send and receive both their “handshaking” (call setup) information and their audio media. SIP does things differently, and as we’ll see, that complicates the IT configuration required to use it.
Finally, the IP audio codec market is competitive. This competition drives innovation. Manufacturers have added things like error correction layers, presence/traversal services and call security features that aren’t well-defined in the standard.
ABOUT SIP
In the world of VoIP, SIP is virtually always set up as a client-server protocol. A SIP endpoint (which can be a phone, a software client or a PBX) registers with a server upstream, and creates a “keep-alive” signaling channel between the client and server. The server can be a PBX (serving SIP endpoints) or a cloud-based server providing VoIP services directly to PBXs or endpoints. Fig. 1 shows this. In this figure, the PBX is both a client (registering with the cloud provider) and a server (providing registration services to the phones).
This has a big advantage if routers with NAT and firewalls exist between the client and the server. Because the initial registration request is made outgoing from the client, a socket is created and can be kept active with keep-alive messages. If an unsolicited message must be sent from the server to the client (e.g., incoming call alert), the open socket can be reused, and the unsolicited information is allowed to pass back to the client through the NAT router.
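The behavior described above can be sketched with a toy model. This is only an illustration of the idea, not how any particular router works: the `ToyNat` class, the addresses and the port numbers are all made up for the example, and real routers also expire mappings after a timeout (which is why the keep-alive messages matter).

```python
class ToyNat:
    """Toy model of a NAT router: inbound traffic is allowed only on a
    mapping that earlier outbound traffic created. Timeouts are omitted."""

    def __init__(self):
        self.mappings = set()

    def outbound(self, inside, remote):
        # An outgoing packet opens a return path for this exact pair.
        self.mappings.add((inside, remote))

    def inbound_allowed(self, remote, inside):
        # Inbound packets pass only if the pair is already mapped.
        return (inside, remote) in self.mappings


nat = ToyNat()
codec = ("192.168.1.50", 5060)      # hypothetical LAN address of the codec
registrar = ("203.0.113.10", 5060)  # hypothetical SIP server
stranger = ("198.51.100.7", 5060)   # hypothetical peer, never contacted

before = nat.inbound_allowed(registrar, codec)
nat.outbound(codec, registrar)      # REGISTER or keep-alive leaves the LAN
after = nat.inbound_allowed(registrar, codec)
unsolicited = nat.inbound_allowed(stranger, codec)

print(before, after, unsolicited)
```

Run as written, this prints `False True False`: the server can reach the codec only after the codec has spoken first, and a peer the codec has never contacted is still blocked.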
The channel that is kept open is strictly for signaling, and no audio media is ever passed on it. SIP dictates that a separate RTP socket gets opened between client and server in order to pass media.
This is the fact that makes SIP complicated in IT environments. Somehow, this independent media stream must also bridge the same NAT routers and firewalls as the signaling channel. In the client/server model (where it’s assumed no NAT or firewall exists on the server side) this can be done simply by ensuring the client creates a stream first. The server can respond on the same socket and be reasonably sure things will flow unimpeded both ways. This is shown in Fig. 2.
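It may help to see where the separate media port comes from. In SIP, the call setup messages normally carry a Session Description Protocol (SDP) body, and its `m=` line announces the port on which the sender expects RTP. The sketch below pulls that port out of a sample SDP body. The addresses and port 9000 are invented for the example, and `media_ports` is a hypothetical helper, not part of any codec's software.

```python
EXAMPLE_SDP = """\
v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 9000 RTP/AVP 9
a=rtpmap:9 G722/8000
"""


def media_ports(sdp):
    """Return (media type, port, transport) for each m= line in an SDP body."""
    found = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, proto, *_formats = line[2:].split()
            # The port field may be written as "port/count"; keep the base port.
            found.append((media, int(port.split("/")[0]), proto))
    return found


SIP_SIGNALING_PORT = 5060
ports = media_ports(EXAMPLE_SDP)
print("signaling:", SIP_SIGNALING_PORT, "media:", ports)
```

The point of the example is that the media port (9000 here) has nothing to do with the signaling port, so a firewall rule or NAT mapping that covers 5060 does nothing for the audio.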
PEER-TO-PEER SIP
Having cloud servers to route calls is convenient in the VoIP application of SIP, but often doesn’t fit well into the workflow of connecting hardware codecs together. It is possible, however, and SIP registration servers are available for free from several providers.
More often, codec users are accustomed to connecting to a target IP address. This is allowed in SIP. But when both peers are sitting behind routers and firewalls, you lose the NAT traversal advantages of the client/server model. This is the number one failure point of SIP peer-to-peer connections and is shown in Fig. 3.
Professional codec users are accustomed to the fact that in order to receive incoming calls, they will need to open an incoming port on their network. This is only a single port, and is well-defined by the manufacturer, so while IT may grumble about it, this is usually possible.
SIP complicates things. In addition to the “normal” IP codec port, the user must open the SIP signaling channel (standardized as UDP 5060) and two separate ports for RTP media. There are no standard ports defined for SIP RTP media, so you must research which ports your codec uses. Here is where the IT department’s grumbling usually increases in volume.
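When you make the request to IT, it can help to write the list down explicitly. The sketch below builds that checklist. Only UDP 5060 comes from the standard. The media port, the proprietary port and its protocol are placeholders you would replace with the values from your codec's documentation, and treating the second media port as "the first one plus one" is an assumption based on the common RTP/RTCP pairing convention, not something every codec is guaranteed to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    purpose: str
    protocol: str
    port: int


def required_forwards(rtp_port, proprietary=None, sip_port=5060):
    """List the inbound ports to open for peer-to-peer SIP calls.

    rtp_port: first media port (look this up for your codec).
    proprietary: optional (protocol, port) for the codec's own protocol.
    """
    rules = []
    if proprietary is not None:
        protocol, port = proprietary
        rules.append(Forward("codec's normal protocol", protocol, port))
    rules.append(Forward("SIP signaling", "udp", sip_port))
    rules.append(Forward("media port 1 (RTP)", "udp", rtp_port))
    # Assumption: the second media port is the next port up (RTCP).
    rules.append(Forward("media port 2 (assumed RTCP)", "udp", rtp_port + 1))
    return rules


# Hypothetical values for illustration only.
rules = required_forwards(rtp_port=6014, proprietary=("udp", 9000))
for rule in rules:
    print(f"{rule.protocol.upper():4} {rule.port:<6} {rule.purpose}")
```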
An alternative to opening RTP ports for incoming streams is to enable a “SIP ALG” function in your router. Implementations vary, but in theory, your router should become aware of an attempted incoming SIP call (by reading the signaling channel) and allow incoming connections on the correct RTP ports. This could reduce the heartache of opening multiple ports for SIP, but the signaling channel (UDP 5060) must still be opened.
None of these IT configuration changes is much fun to make when you’re trying to set up a call between manufacturers in a hurry. It’s worth taking time to configure and test these arrangements long before they are needed. But IT configurations aren’t the only potential issue you’ll encounter.
ENCODERS AND DECODERS
On the codecs themselves, you must be sure your box is enabled to receive SIP/3326 calls. On Comrex codecs, this is done in the systems settings menu as shown in Fig. 4a.
The outgoing caller must also configure the call to use a SIP channel, rather than the manufacturer’s proprietary protocol. In Comrex codecs, this is a two-step process where the user will create an outgoing “profile” that uses SIP, then create an outgoing peer entry in the “phone book” that has this profile assigned. See Fig. 4b.
The final catch in compatibility is that encoders must be specified that are compatible on both ends of the link. At Comrex, we recommend connections use AAC encoders for the best combination of audio quality and compatibility. Opus is also a good choice, if both sides support it. G.722 exists as a “lowest common denominator” and is almost universally compatible, albeit at its lower audio bandwidth of 7 kHz. Our testing shows good compatibility between Comrex and Tieline using these encoders. Our last testing against Telos products had issues with AAC encoders, so we recommend G.722 when connecting to those.
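The selection logic above amounts to "take the first encoder in order of preference that both ends support." A minimal sketch of that rule follows. The preference order mirrors the recommendation in this article, but the function name and the encoder lists are hypothetical, and the names are plain labels: they do not distinguish between AAC variants or bit rates, which you would still need to match by hand.

```python
PREFERENCE = ["AAC", "Opus", "G.722"]  # order recommended in this article


def pick_encoder(local, remote, preference=PREFERENCE):
    """Return the most preferred encoder both ends support, or None."""
    common = {name.lower() for name in local} & {name.lower() for name in remote}
    for name in preference:
        if name.lower() in common:
            return name
    return None


ours = ["AAC", "Opus", "G.722"]

# Hypothetical far-end capability lists.
print(pick_encoder(ours, ["AAC", "G.722"]))   # both ends have AAC
print(pick_encoder(ours, ["Opus", "G.722"]))  # no AAC at the far end
print(pick_encoder(ours, ["G.722"]))          # lowest common denominator
```

If AAC is known to be troublesome with a particular far-end unit, leaving it out of that unit's list makes the same function fall back to G.722.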
WRAPPING IT UP
Compatibility between IP codec manufacturers isn’t typically “plug and play,” but with a little advance configuration and testing, you can make your codec open to connections with other professional hardware codecs.
Setting up for SIP calls has side advantages, since you’ll now be able to send and receive calls to an increasing number of VoIP devices, many of which are now implementing wideband audio encoders. Taking the time to learn about how these connections work can change your codec into a hub to receive many types of calls from many devices.
Tom Hartnett is technical director at Comrex.