

Layer 2 and 3 Considerations for AoIP

Learn the differences between the IP layers to improve audio performance

This article originally appeared in TV Technology.

When it comes to audio over IP, one area that seems to cause a substantial amount of confusion is whether a technology works at Layer 2 or Layer 3. Most of us know that one of these layers restricts us to a single network segment and the other doesn’t, but understanding how each of them functions is important when making decisions about AoIP technology and the network it will run on.

The differences between the two layers come down to the protocols running on them and how they communicate with other devices. They are just two of the seven layers that make up the Open Systems Interconnection (OSI) model, but they're critically important to us because they ensure successful data transmission between source and destination devices.

To make sense of the OSI model (or "stack"), I prefer to flip it on its side and view it more like traditional signal flow. Virtually all network processes involve two-way communication, so the stack usually processes data in two directions: as it is received and as it is sent. The stack itself is simply a set of protocol groups, each handling one of the specialized tasks required to send and receive data, or Protocol Data Units (PDUs), over a network. Structuring the stack into seven layers means protocols can be updated or replaced as more efficient or improved versions come along without disrupting the rest of the stack.
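The "signal flow" view can be sketched in a few lines of Python. The layer names and numbers are the standard OSI ones; the `OsiLayer`, `send_path` and `receive_path` names are hypothetical and only illustrate that outbound data travels down the stack while inbound data travels back up it.

```python
from enum import IntEnum


class OsiLayer(IntEnum):
    """The seven OSI layers, numbered as in the standard model."""
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


def send_path() -> list[OsiLayer]:
    """Outbound data flows down the stack: Application (7) to Physical (1)."""
    return sorted(OsiLayer, reverse=True)


def receive_path() -> list[OsiLayer]:
    """Inbound data flows back up the stack: Physical (1) to Application (7)."""
    return sorted(OsiLayer)


if __name__ == "__main__":
    print(" -> ".join(layer.name for layer in send_path()))
    print(" -> ".join(layer.name for layer in receive_path()))
```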

OSI stack signal flow


The Physical Layer (1) converts Layer 2 data frames to and from the transmission type — electrical, optical, or RF — required by the network. The Data Link Layer (2) passes data between devices within the same network segment and handles the task of getting the data to and from the physical layer. The job of ensuring that data can be successfully exchanged with devices on any network falls to the Network Layer (3). The Transport Layer (4) segments and reconstructs data and maintains conversations between host devices and network applications.

As data moves down the stack, it is encapsulated with additional information in the form of headers and trailers, including the addressing and error checking added at Layers 3 and 2. This encapsulation process is reversed when data is received: the information is read, verified and removed on the way back up. The final three layers, Session (5), Presentation (6) and Application (7), all prepare data for use by the receiving host. Applications, as defined in the OSI stack, are still just network protocols, but they drive the network-savvy user applications that anyone working in broadcast will be familiar with; examples include FTP, HTTP and IMAP.
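A toy model makes the wrap-and-unwrap symmetry concrete. This is only a sketch: the text headers below are hypothetical stand-ins (real headers are binary structures with many more fields), Layer 1 is not modeled, and `zlib.crc32` stands in for the Layer 2 error check. Ethernet's frame check sequence is also a CRC-32, but this sketch does not reproduce its exact on-the-wire format.

```python
import zlib


def encapsulate(payload: bytes, src_ip: str, dst_ip: str,
                src_mac: str, dst_mac: str) -> bytes:
    """Wrap application data on the way down the stack (hypothetical formats)."""
    segment = b"L4|" + payload                                  # Transport header
    packet = f"L3|{src_ip}>{dst_ip}|".encode() + segment         # Network header: logical addresses
    frame_body = f"L2|{src_mac}>{dst_mac}|".encode() + packet    # Data Link header: physical addresses
    fcs = zlib.crc32(frame_body).to_bytes(4, "big")              # Data Link trailer: error check
    return frame_body + fcs


def decapsulate(frame: bytes) -> bytes:
    """Verify and strip each layer's additions on the way back up."""
    frame_body, fcs = frame[:-4], frame[-4:]
    if zlib.crc32(frame_body).to_bytes(4, "big") != fcs:
        raise ValueError("frame check failed; frame would be discarded")
    packet = frame_body.split(b"|", 2)[2]   # drop the Layer 2 header
    segment = packet.split(b"|", 2)[2]      # drop the Layer 3 header
    return segment.split(b"|", 1)[1]        # drop the Layer 4 header
```

A corrupted frame fails the check at Layer 2 and never reaches the upper layers, which mirrors the "read, verified and removed" order described above.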

Layer 2 provides core networking functions, including moving data frames to and from the Physical Layer and letting network devices communicate with each other using one another's unique physical address, usually the Media Access Control (MAC) address of the device's network interface card (NIC). Physical addressing is what keeps Layer 2 devices bound to a single network: MAC addresses are only meaningful on the local segment, and devices outside it aren't looking for them. In the AoIP world, an all Layer 2 network should be easy to build and maintain, but its size will be limited, and there is no way to expand it once it reaches capacity. Moving beyond the local network requires Layer 3 devices, which use Internet Protocol (IP) logical addresses instead of physical addresses to send data between devices, while also providing the ability to associate Layer 2 physical addresses with IP addresses.
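Here is a minimal sketch of how a Layer 2 switch moves frames using nothing but MAC addresses. The `LearningSwitch` class and the MAC addresses in the example are hypothetical, and real switches also age out entries, handle VLANs and much more. The point is that no IP address appears anywhere: the switch only knows which port each MAC address was last seen on, and floods anything it doesn't recognize to the rest of the segment.

```python
class LearningSwitch:
    """Minimal sketch of Layer 2 forwarding by MAC address on one segment."""

    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port it was last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the list of ports the frame is sent out of."""
        # Learn: the sender's MAC is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the arrival port; nothing to forward
        return [out_port]
```

Usage: after device A (port 1) sends to device B, the switch floods; once B replies from port 2, later frames from A to B go out port 2 only.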

Just like Layer 2 devices, each Layer 3 device needs a unique address on the network for identification, in this case an IP address. Layer 3 also brings the ability to route data to other networks by way of a default gateway router. The default gateway maintains a routing table listing the networks it can reach, both those attached directly to it and remote networks learned from other routers, and it keeps track of active devices on its local network by associating their IP addresses with their MAC addresses. It does not typically know about individual devices on remote networks.
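In practice, routers key their routing tables on network prefixes rather than on individual remote hosts, and pick the most specific prefix that contains the destination address (longest-prefix match). The sketch below assumes a hypothetical gateway with two directly attached networks, two learned routes and a default route; the addresses are illustrative only.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop.
# "direct" means the network is attached to one of the router's own interfaces.
ROUTES = {
    ipaddress.ip_network("192.168.10.0/24"): "direct",
    ipaddress.ip_network("203.0.113.0/30"): "direct",
    ipaddress.ip_network("10.20.0.0/16"): "192.168.10.2",
    ipaddress.ip_network("10.20.30.0/24"): "192.168.10.3",
    ipaddress.ip_network("0.0.0.0/0"): "203.0.113.1",  # default route
}


def next_hop(dst: str) -> str:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

Note that 10.20.30.40 matches both 10.20.0.0/16 and 10.20.30.0/24; the /24 wins because it is more specific.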

When a local device wants to communicate with a remote device, it sends a PDU to its local default gateway router. The PDU carries the sender's own MAC and IP addresses and the IP address of the remote destination device; its Layer 2 destination is the gateway's MAC address rather than the destination's, since the sender doesn't know the MAC address of a device on another network.
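A sketch of the sending device's decision, assuming a hypothetical host at 192.168.10.20/24 whose default gateway is 192.168.10.1. The `Frame` type, `address_frame` function and ARP cache contents are illustrative only. If the destination is on the local subnet, the frame goes straight to that device's MAC address; otherwise it is addressed to the gateway's MAC address while the IP destination stays the remote device.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str


def address_frame(my_if: str, my_mac: str, gateway_ip: str,
                  arp_cache: dict[str, str], dst_ip: str) -> Frame:
    """Choose the Layer 2 destination for an outgoing IP packet."""
    iface = ipaddress.ip_interface(my_if)  # e.g. "192.168.10.20/24"
    if ipaddress.ip_address(dst_ip) in iface.network:
        next_hop_ip = dst_ip       # same segment: deliver directly
    else:
        next_hop_ip = gateway_ip   # remote: hand it to the default gateway
    # The destination MAC is always the next hop's; a KeyError here means
    # the host would first have to send an ARP request to learn it.
    dst_mac = arp_cache[next_hop_ip]
    return Frame(my_mac, dst_mac, str(iface.ip), dst_ip)
```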

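The local router looks up the destination IP address in its routing table and forwards the packet to the next router on the path toward the destination network. At each hop, the router writes a new Layer 2 header with fresh source and destination MAC addresses, while the source and destination IP addresses stay the same. When the packet reaches the gateway router on the destination network, that router, which knows (or can look up) the MAC addresses of the devices on its own network, delivers the frame to the destination device. Any Layer 4 communication session, such as a TCP connection, is then carried on directly between the two end devices.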
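The hop-by-hop behavior can be sketched as follows, assuming conventional IP forwarding with no network address translation along the path. The router and host labels used as MAC addresses are hypothetical placeholders. Each hop rewrites only the Layer 2 addresses and decrements the IP time-to-live (TTL); the Layer 3 source and destination never change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    ttl: int


def forward_along(frame: Frame, hops: list[tuple[str, str]]) -> list[Frame]:
    """Simulate a packet crossing a chain of routers.

    hops: for each router in path order, (MAC of its outgoing interface,
          MAC of the next device on that link).
    """
    seen = [frame]
    for out_mac, next_mac in hops:
        # The router discards the old Layer 2 header and writes a new one;
        # the Layer 3 addresses are untouched and only the TTL drops.
        frame = replace(frame, src_mac=out_mac, dst_mac=next_mac, ttl=frame.ttl - 1)
        seen.append(frame)
    return seen
```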

As you can see, Layer 3 networks are a bit more complex than the Layer 2 variety, but that complexity buys global expandability. The ability to go beyond the local network and interact with devices anywhere in the world allows Layer 3 networks to grow quite large.

While Layer 3 delivers us from small, single-zone networks, we cannot overlook its dependency on Layer 2 because, after all the routing and IP addressing is done, the hardware address ends up being used for actual device communication.

When planning an AoIP rollout, whether building an audio-only network, adding audio as part of a larger IP video plant installation, or doing something no more complicated than hanging some audio devices on existing network infrastructure, it is important to keep in mind the eventual scope of the network as well as the implications of building a large one.

The various AoIP technologies are also likely to have their own inherent scaling and bandwidth limitations that should be factored in. Finally, networks are far more complicated than this high-level look at two network layers suggests, so there are many more considerations to weigh before rolling out a network, the most important being traffic prioritization.
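To put rough numbers on bandwidth, here is a back-of-the-envelope sketch. It assumes an AES67-style stream (RTP over UDP over IPv4 over Ethernet, 48 kHz sampling, 24-bit samples, 1 ms packet time) with no VLAN tag and no IP options; other AoIP technologies and settings will differ. The function name is hypothetical.

```python
# Assumed per-packet overhead in bytes: RTP over UDP over IPv4 over Ethernet,
# with no VLAN tag and no IP options.
RTP_HEADER = 12
UDP_HEADER = 8
IPV4_HEADER = 20
ETH_HEADER_AND_FCS = 14 + 4
ETH_PREAMBLE_AND_GAP = 8 + 12  # occupies the wire even though it carries no data


def stream_bandwidth_bps(channels: int, sample_rate: int = 48_000,
                         bytes_per_sample: int = 3,
                         packets_per_second: int = 1_000) -> int:
    """On-the-wire bit rate of one audio stream (1,000 packets/s = 1 ms packet time)."""
    samples_per_packet = sample_rate // packets_per_second
    payload = samples_per_packet * channels * bytes_per_sample
    bytes_on_wire = (payload + RTP_HEADER + UDP_HEADER + IPV4_HEADER
                     + ETH_HEADER_AND_FCS + ETH_PREAMBLE_AND_GAP)
    return bytes_on_wire * 8 * packets_per_second


if __name__ == "__main__":
    per_stream = stream_bandwidth_bps(channels=8)
    print(f"8-channel stream: {per_stream / 1e6:.2f} Mb/s")
    print(f"Streams per 1 Gb/s link, ignoring headroom: {1_000_000_000 // per_stream}")
```

Under these assumptions an 8-channel stream needs 9.84 Mb/s on the wire, so a 1 Gb/s link could carry roughly 100 such streams before any headroom is reserved for other traffic. Shortening the packet time raises the packet rate, and with it the share of bandwidth spent on headers.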

Jay Yeary is a broadcast engineer and consultant who specializes in audio. He is an AES Fellow and a member of SBE, SMPTE, and TAB. He can be contacted through TV Technology or at