At a recent Linux video storage installation my colleague Georg and I encountered really strange problems that we had never seen before. Our initial goal was to connect five Windows 7 machines running Avid Media Composer 7 to separate NICs on the new storage system. As it was impossible to install new cables we simply added a second IP address to the LAN connection to ensure correct traffic routing and to fulfil the bandwidth requirements. What we did not know was that this change caused the NetBIOS name resolution and discovery to fail – a not that unusual issue. For me this would have been just a minor annoyance however the client had several workflows that depended on it. Hence we had to revert everything and find another workaround.
In the end we agreed to simply connect all NICs to the client’s unmanaged core switch and assigned a dedicated IP to each of them. The image below gives a simplified network and traffic overview. We believed to have solved the NetBIOS issue while still being able to guarantee the bandwidth. If only we had known!
The first issue we encountered was that we were unable to connect to the internet but only from the new Linux system. We did not investigate this further as the client reported several other issues with their firewall configuration. We proceeded with our performance test that yielded miserable results!
After several hours we finally found out that the Windows clients correctly sent their traffic to their designated NIC on the central Linux storage however the corresponding replies were all sent from one single NIC limiting the total read performance to the bandwidth of this single link. The picture below illustrates the traffic flow we observed.
Given this, we found several articles, like “Routing packets back from incoming interface“, describing this behaviour. Linux, by design, considers packages individually for routing purposes. This means that the routing decision is only based on the package itself and not if it’s a response of some sort.
This also explained why we were not able to connect to the internet. Although we had only one interface with a default gateway configured, the entries in the routing table were not correct. What the means is, we sent the Ethernet frame over say NIC0 with the source IP address of NIC1 to the standard gateway. The firewall on it however detected that the MAC of the incoming frame did not match the MAC of NIC1 and thereby suspected an ARP spoofing attack. Logically it was blocked!
Luckily we can use additional custom routing tables to build a workaround. Novell’s KB entry “Reply packets are sent over an unexpected interface” explains in detail how to do it. With that help we could solve our issue and now all NICs get utilized as expected. As we did not know about this behaviour before it was a quite unexpected for us. However from a developer’s point of view this design makes absolute sense.