June 14, 2022

Unsupported QinQ

My work ordered a pair of ExpressRoute circuits to connect our DC to Azure. I was not involved in the sales or requirements discussions, only the “Here’s two circuits, make them go” part.

We have a pair of Nexus 9k switches for this location. Azure Private on the MS side is set for VLAN 350. Configure my ports as routed ports on VLAN 350, /30s, etc. No BGP. No ping. No ARP. Hmm.

Work with the carrier, who points at MS. Work with MS, who points at the carrier. Eventually I dig up a design guide on the carrier’s site explaining that ExpressRoute is delivered as a QinQ service, which makes sense because we could, in theory, have public and private peering over the same circuits.
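
For anyone who hasn't looked at QinQ on the wire, here is a rough Python sketch of a double-tagged frame. The VLAN IDs are the ones from this post (outer 300 from the carrier, inner 350 for Azure Private). The function names and the MAC address are made up for illustration, and I'm assuming both tags use TPID 0x8100; some carriers use 0x88A8 on the outer tag.

```python
import struct

TPID_DOT1Q = 0x8100  # assumed for both tags; the outer tag may be 0x88A8 on some services


def vlan_tag(vid, tpid=TPID_DOT1Q, pcp=0):
    """Return a 4-byte VLAN tag: 16-bit TPID, then 3-bit PCP, 1-bit DEI, 12-bit VLAN ID."""
    if not 0 <= vid <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    return struct.pack("!HH", tpid, (pcp << 13) | vid)


def build_qinq_frame(dst, src, outer_vid, inner_vid, ethertype, payload):
    """Build an Ethernet frame carrying an outer tag followed by an inner tag."""
    return (dst + src + vlan_tag(outer_vid) + vlan_tag(inner_vid)
            + struct.pack("!H", ethertype) + payload)


def read_vlan_tags(frame):
    """Return the VLAN IDs in wire order, outermost first."""
    vids = []
    offset = 12  # skip destination and source MAC addresses
    while True:
        (tpid,) = struct.unpack_from("!H", frame, offset)
        if tpid not in (0x8100, 0x88A8):
            break  # reached the real EtherType
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        vids.append(tci & 0x0FFF)  # low 12 bits are the VLAN ID
        offset += 4
    return vids


if __name__ == "__main__":
    # A broadcast ARP-sized frame as the carrier would hand it to us.
    arp = build_qinq_frame(b"\xff" * 6, b"\x02\x00\x00\x00\x00\x01",
                           300, 350, 0x0806, bytes(28))
    print(read_vlan_tags(arp))
```

The first tag a port sees is 300, not 350, so a routed port that only matches a single tag of 350 never picks the frame up. That lines up with the no ARP, no ping symptom.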

Unfortunately, since no one ever mentioned this, we don’t have any gear lying around that can route at gigabit speeds and has native support for QinQ. So after a few days of research and testing, I implemented my workaround.

ExpressRoute comes into the switch on Eth1/38. This port is set:
switchport
switchport mode trunk
switchport trunk allowed vlan 300 ### The outer VLAN from the carrier ###
switchport trunk native vlan 999 ### Something unused ###

Port 1/35 is configured:
switchport
switchport mode dot1q-tunnel
switchport access vlan 300 ### Outer VLAN ###
spanning-tree bpdufilter enable

Port 1/35 is looped to port 1/36, configured:
Eth1/36.350
encapsulation dot1q 350
vrf member azureprivate ### VRF for Azure ###
ip address a.b.c.d/30
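
To make the path through the three ports easier to follow, here is a toy Python model of what happens to the tags on the way in from the carrier. It is only a sketch of my understanding of the behavior, not anything from Cisco: the `Frame` class and function names are invented, and a frame is reduced to its list of VLAN tags.

```python
from dataclasses import dataclass
from typing import List, Optional

OUTER_VLAN = 300  # carrier's outer VLAN, as on the Eth1/38 trunk
INNER_VLAN = 350  # Azure Private VLAN, as on Eth1/36.350


@dataclass
class Frame:
    tags: List[int]  # VLAN tags on the wire, outermost first


def trunk_ingress(frame: Frame, allowed: int) -> Optional[Frame]:
    """Eth1/38: accept the frame only if its outer tag is the allowed VLAN.

    The outer tag picks the switch VLAN; the inner tag rides along as payload.
    """
    if not frame.tags or frame.tags[0] != allowed:
        return None
    return frame


def tunnel_egress(frame: Frame, access_vlan: int) -> Frame:
    """Eth1/35 (dot1q-tunnel): send frames in the access VLAN without the outer tag."""
    if frame.tags[0] != access_vlan:
        raise ValueError("frame is not in the tunnel port's access VLAN")
    return Frame(tags=frame.tags[1:])


def subif_ingress(frame: Frame, encap: int) -> bool:
    """Eth1/36.350: the routed subinterface matches a single tag equal to its encapsulation."""
    return frame.tags == [encap]
```

Feeding `Frame(tags=[300, 350])` straight into `subif_ingress(..., 350)` gives no match, which is the original failure. Passing it through `trunk_ingress` and then `tunnel_egress` first leaves only tag 350, and the subinterface accepts it. The return path is the mirror image: 1/36.350 sends a frame tagged 350, the tunnel port on 1/35 drops it into VLAN 300 untouched, and the trunk on 1/38 adds the outer 300 tag on the way out.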

With this in place, 1 out of 2 circuits came up: the one on our ’02’ switch. Moved both circuits to that switch (with a different outer VLAN on each), and both came up. Moved both to the ’01’ switch, and neither came up. Worked with TAC on why, and determined that for some reason the CPU was getting the double-tagged frame before it looped out 1/35 and in 1/36. Slapped a static MAC different from the BIA onto 1/36.350 and bam, the circuit came up. Did the same on the other switch for consistency.

Was told by TAC that though they can’t find looping a 9k to itself for this purpose in any design guide, they also can’t see anything that says it is unsupported or shouldn’t work.