OK, I know that the title of this post is a bit of an oxymoron.  ECMP Active/Passive?  Isn’t ECMP about active/active/active/active/…?  Yes it is, but imagine a scenario where you are building your NSX-v deployment across two campuses, with datacenter firewalls upstream.  That makes it important to keep data flow predictable and to keep asymmetric routing from becoming an issue: ingress through datacenter A, egress through datacenter A.  Roie Ben Haim has a fantastic write-up on why this is important, so I’ll leave you with this link as a primer.

Welcome back.  Now that we are on the same page about why we need to control datacenter ingress/egress, let’s further imagine that we have some incredibly demanding north-south requirements that force us to lay down the maximum ECMP configuration of 8 ESGs.  Does that mean you can’t have passive, lower-weighted ESGs on the secondary site of the DLR?  Does it mean you have to fail the ESGs over to the secondary site?

These questions hit me today, so I set out to answer them.

Prior to answering these questions I needed to do some work in my lab to enable the testing, as I only had a single DLR and two ESGs.  To solve this I whipped up the following PowerShell code to deploy 7 more ESGs and configure BGP on them.  A few notes to understand the snippet:

  • is my network core.  The ESGs uplink to here and peer with a pair of VyOS routers.
  • is my DLR transit.
  • AS 64513 is my VyOS routers
  • AS 64514 is my ESGs
  • AS 64515 is my DLR
  • The snippet may not be perfect; it was a hack job to get the test going.

The heavy lifting done, I logged into my VyOS routers and added the new ESGs as BGP peers, then gave everything a few minutes to settle.  One thing to note: the script as written configures the DLR to peer with 9 ESGs with equal weights.
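To make the resulting peering layout concrete, here is a small Python sketch of the neighbour list the DLR ends up with.  This is not the PowerShell from the lab, just an illustration.  The AS numbers and the weight of 50 come from this post, and the last octets are the nine that show up in the tests further down.  The `<dlr-transit>` prefix is a placeholder, and `dlr_neighbours` is a made-up helper name.

```python
# AS numbers as listed in the notes above.
VYOS_AS = 64513
ESG_AS = 64514
DLR_AS = 64515

# Last octets of the nine ESGs as they appear later in the post.
ESG_LAST_OCTETS = [65, 66] + list(range(73, 80))


def dlr_neighbours(weight=50):
    """Build the BGP neighbour entries the DLR would carry in this lab.

    "<dlr-transit>" is a placeholder for the transit subnet, not a real prefix.
    """
    return [
        {
            "neighbour": f"<dlr-transit>.{octet}",
            "local_as": DLR_AS,
            "remote_as": ESG_AS,  # different AS on each side, so these are eBGP sessions
            "weight": weight,
        }
        for octet in ESG_LAST_OCTETS
    ]


for entry in dlr_neighbours():
    print(entry)
```

Nine entries, all with the same weight, is exactly the "overloaded" starting point for the tests that follow.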

So how does the DLR handle having 9 equal-weighted paths?  Let’s run show ip bgp and find out.  The DLR successfully peers with all upstream ESGs with nary a complaint.

Now what about actual path selection?  Up next, show ip route.

Only 8 selected paths.  Well, that’s a bummer, but not entirely unexpected since the advertised ECMP limit for NSX is 8-way.

Let’s update the weight on .79 and make it a member of the “passive” datacenter and repeat our show ip route command.  Our expectation here is that .79 will drop from the route table, and be replaced by .78.

Excellent, this looks great.  I will spare you the screenshot, but as expected .79 still shows up in show ip bgp.  I still have two questions, though.

  • If all members of the 8-way ECMP (.65, .66, .73-.78) fail, what does the DLR do with .79?
  • If I revert .79 back to a weight of 50 with the others, what does the DLR do if one of the selected ESGs fails?

Since .79 is currently configured as the passive site ESG, let’s fail edges 1-8.

And again it behaves exactly as expected, using .79 as the proper ESG!

Okay, now for our final test.  We will go power up edge1-8 and return .79 to a weight of 50, which as we saw earlier will result in 9 possible BGP paths, and NSX selecting 8 of them to use for ECMP.  Once everything is up and stable, we will fail edge7 and observe what happens (.66 was the one left out).

After .77 failed, we can see that the DLR added .66 to the ECMP route tables.


So in today’s screenshot-laden post we examined the behavior of an NSX-v DLR when ECMP is utilized but you overload it with BGP peers.  What did we learn today?

  • It is possible to overload a DLR with more than 8 BGP peers of equal weight.  When this occurs the DLR will select 8 of the peers and use them for ECMP.  If one of the peers fails, it will add one of the “overloaded” peers to the pool.  This is interesting because it allows you to provide N+1 8-way ECMP by deploying extra ESGs for situations where the bandwidth provided by 8 ESGs is critical to your environment.
  • It is possible to overload a DLR configured for 8-way ECMP with one or more ESGs of lower weight for scenarios like datacenter failover.
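Both findings can be captured in a toy Python model of the selection behavior observed in the lab.  This is only a sketch under stated assumptions, not how the DLR is actually implemented.  The higher weight wins, at most 8 peers tied at the best weight are installed, and ties beyond the cap are broken by name order.  That tie-break is my assumption, since the tests above do not establish how the DLR picks which peer to leave out.  The passive weight of 40 is a made-up value; `Peer`, `select_ecmp_paths`, `fail`, and `set_weight` are hypothetical names.

```python
from dataclasses import dataclass, replace

MAX_ECMP_PATHS = 8  # the 8-way ECMP limit discussed in the post


@dataclass(frozen=True)
class Peer:
    name: str           # last octet of the ESG, as written in the post
    weight: int         # BGP neighbour weight; the higher value wins
    alive: bool = True  # False once the ESG has been failed


def select_ecmp_paths(peers, max_paths=MAX_ECMP_PATHS):
    """Return the peer names this toy model would install as ECMP next hops."""
    candidates = [p for p in peers if p.alive]
    if not candidates:
        return []
    best = max(p.weight for p in candidates)
    # Only live peers tied at the best weight are eligible for ECMP.
    eligible = sorted(p.name for p in candidates if p.weight == best)
    # Assumption: ties beyond the cap are broken by name order.
    return eligible[:max_paths]


def fail(peers, names):
    """Return a copy of peers with the named ESGs marked as down."""
    return [replace(p, alive=False) if p.name in names else p for p in peers]


def set_weight(peers, name, weight):
    """Return a copy of peers with one ESG's weight changed."""
    return [replace(p, weight=weight) if p.name == name else p for p in peers]


OCTETS = [".65", ".66"] + [f".{n}" for n in range(73, 80)]
equal = [Peer(name, 50) for name in OCTETS]

# Nine equal-weight peers: only eight are installed.
print(select_ecmp_paths(equal))

# Active/passive: .79 gets a lower weight (40 is a made-up value) and is skipped.
passive = set_weight(equal, ".79", 40)
print(select_ecmp_paths(passive))

# All eight actives down: the passive peer is the only path left.
print(select_ecmp_paths(fail(passive, set(OCTETS) - {".79"})))

# N+1: equal weights again, one selected peer fails, the spare is promoted.
print(select_ecmp_paths(fail(equal, {".77"})))
```

In this model the passive peer can never be mixed into the active pool, because eligibility is decided by the best weight among live peers before the 8-path cap is applied.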

While we explicitly tested this with an NSX-v DLR, I suspect this same behavior can be found in NSX-v ESGs, any other Vyatta-style router, and possibly even classical hardware routers.  I am really at the neophyte level when it comes to routing and BGP behavior, so I don’t want to overpromise here.

Hopefully these two pieces of information are helpful to you in making informed decisions when planning out your NSX and possibly other routing deployments.

3 thoughts on “NSX-v ECMP Active/Passive configuration”

  1. Quick question on this. So let’s say you have 8 ESGs in the Active/Primary DC, and 8 ESGs in the Passive/Standby DC (in case of a DR/failure of the Active DC). The 8 ESGs in the passive DC are rendered inactive by prepend or local pref until the upstream router(s) in the Active DC fail and the ranges are advertised to the Passive DC.

    What happens if you lose 1 ESG in the Active DC and the passive DC ESGs are added to the ECMP pool? Does that not pose a risk of traffic blackholing?

    • I think to answer my own question, because the weight is (or should be) different on the passive DC ESGs, these ESGs will never be utilised until ALL Active DC ESGs fail (including any hot standbys that you suggest by overloading the pool with ESGs).

      Great write up.

      • Thanks!

        You are correct on your thought process; until all members of the active pool are down, the passives “should” not see any traffic.
