Using VMware Cloud DR to protect VMs in an SDDC (Part 1 – Deploying vCDR)

VMware Cloud Disaster Recovery is currently the solution that draws the most interest from my customers and partners. It offers the best way to deliver an on-demand, easy-to-use, cost-effective DRaaS solution for workloads running on-premises.

A few months ago, it added the ability to protect cloud workloads running in VMware Cloud on AWS with inter-region DR failover (i.e. failover from one region to another).

Very recently, the solution gained the ability to protect cloud workloads running in VMware Cloud on AWS with intra-region DR failover (i.e. failover into a different availability zone (AZ) within the same region).

Let’s see how we can leverage this to protect workloads.

Deploying vCDR in VMC on AWS

Accessing the vCDR dashboard

First of all, I need to access the CSP Console and locate the VMware Cloud DR tile under the My Services menu.

Clicking the tile brings me to the VMware Cloud DR landing page.

As you can see, it looks very similar to the CSP page. The development team has done a great job integrating vCDR into the Services portal.

Currently the dashboard shows the health and capacity of the service, in particular the number of protected VMs and protection groups, as well as the replication direction of each of your protected sites and the recovery SDDC.

In my current demo environment, there are 3 protected on-premises sites and one recovery SDDC (CSLAB-M17).

A fan-in architecture model is supported: 3 sites and 1 SDDC, currently protected to CSLAB-M17

The cloud backup site, where the Scalable Cloud Filesystem resides, is CSA-VCDR-SCFS.

On the left, the replication events and any recent alarms and alerts are displayed.

Adding the SDDC as a new Protected site

In this lab, the Scalable Cloud File System has already been deployed, so we can jump directly into deploying the vCDR connector on my VMC on AWS SDDC by clicking the Set up a protected site link.

Here I choose VMware Cloud on AWS and click Next.

The list of SDDCs in my organization is then displayed. I can see that only an SDDC in a different AZ from my SCFS can be used, so I picked the SDDC in the US East (N. Virginia) region.

Here I am presented with two choices: manually create the Gateway Firewall rules or let vCDR add the right rules automatically. The DRaaS Connector is a VM that has to be deployed on a compute segment in the SDDC. I chose Automatic and picked the default segment of my SDDC. Of course, you can choose another, dedicated segment instead.

If you are not sure which option to select, see Network Considerations for a Protected SDDC for more information.

To finish the site creation I clicked Setup.

After a few seconds, the SDDC (JG-LAB-TEST) appears as a Protected Site.

Deploying the DRaaS Connector in the newly protected SDDC

Once the site is configured, the next step is to deploy the DRaaS Connector, which enables the SaaS orchestrator to communicate with the protected SDDC's vCenter. Refer to the documentation for the VM CPU and network requirements.

This process is quite straightforward. Just click the Deploy button.

You will be presented with a screen that explains each step.

First, you have to download the virtual appliance that enables connectivity from the SDDC to the cloud filesystem; second, you connect to its console to finish setting up the IP and enter the Cloud Orchestrator FQDN.

Make a note of the console credentials, which you need to log in to the VM console: admin/vmware#1. Also copy (or write down) the Orchestrator Fully Qualified Domain Name (FQDN), which you need when you configure the connector in the VM console.

A few things you need to know:

  • Do not name the DRaaS Connector VM using the same naming conventions you use to name VMs in your vSphere environment.
  • Avoid giving the DRaaS Connector VM a name that might match the VM name pattern you use when you define protection groups.
  • If you are deploying the DRaaS Connector to a VMware Cloud SDDC with more than one cluster, you must choose a cluster to deploy the connector VM on. Each cluster in your SDDC must have the connector VM deployed on it in order for the VMs running on the cluster to be added to protection groups and replicated to a cloud backup site.
  • Do not use non-ASCII characters for the connector name label.

After downloading the OVA from the URL, I uploaded it to a Content Library in my SDDC and started the deployment.

I gave it a name.

The only Resource pool that I can choose is the Compute-ResourcePool.

The Storage datastore can only be WorkloadDatastore.

I have chosen the default compute segment (sddc-cgw-network-1).

I am then presented with the final page of the wizard, and I click Finish to launch the deployment.

After a few seconds, the connector virtual machine appears in the inventory. I started the VM in order to continue the setup.

Finishing configuring the Cloud Connector in the SDDC

The second phase of the deployment is setting up the networking.

Once the VM had started, I opened a console from vCenter to finish the configuration and connected with the credentials presented in the last window: admin/vmware#1.

I typed 'a' to start static IP allocation and entered a new IP address and subnet mask, plus a DNS IP address (I picked Google's).

Next step is to enter the Cloud Orchestrator FQDN.

Then, to complete the configuration, the site-specific passcode…

and the site label (I kept the same name as the VM).

After a few seconds, I received a Success message informing me that the setup was complete.

To finish this phase, I checked that the right firewall rules had been created in my SDDC.

With the newly added rules, the segment where the Cloud Connector runs has access to the Cloud Orchestrator in the cloud over SSH and HTTPS, to the SDDC vCenter, and to the auto-support server over HTTPS. Finally, it also has access to the Scalable Cloud File System on TCP port 1759.

That’s conclude the first part of this very long series of post on vCDR.

In my next post I am going to show you how to start protecting VMs in the Protected SDDC!

NSX Manager Standalone UI for VMC on AWS

Today I want to focus on a new feature of the M16 release that enables customers to access the NSX Manager UI directly.

This is an interesting capability to me, especially because it gives access to a more familiar interface (at least for customers that already utilise NSX-T), and it also reduces the latency introduced by the CSP Portal reverse proxy.

In addition, it enables access to NSX-T Traceflow, which is very helpful for investigating connectivity issues.

Let’s have a look at this new Standalone UI mode.

Accessing the standalone UI

There are two ways to access the NSX Manager Standalone UI in VMC on AWS:

  • Via the Internet, through the reverse proxy IP address of NSX Manager. No particular rule is needed on the MGW.
  • Via the private IP of NSX Manager. This is the option you will take if you have configured a VPN or a Direct Connect. An MGW firewall rule is needed in that case.

In order to choose the type of access that fits our needs, we select it in the Settings tab of the VMC on AWS CSP console.

There are two ways to authenticate to the UI when leveraging the Private IP:

  • Log in through VMware Cloud Services: log in to NSX manager using your VMware Cloud on AWS credentials
  • Log in through NSX Manager credentials: log in using the credentials of the NSX Manager Admin User Account (to perform all tasks related to deployment and administration of NSX) or the NSX Manager Audit User Account (to view NSX service settings and events)

Both accounts have already been created in the backend, and their user names and passwords are accessible below the URLs.

I have chosen the Private IP as I have established a VPN to my test SDDC.

So prior to accessing NSX Manager, I had to create a Management Gateway firewall rule to allow source networks in my lab to access NSX Manager over HTTPS (the predefined group NSX Manager is used as the destination).

Navigating the standalone UI

I started by clicking on the first URL here:

After a few seconds, I am presented with the NSX Manager UI:

Networking tab

Menu from Networking Tab

This tab gives you access to configuring the Connectivity options, Network Services, Cloud Services, IP Management, and Settings.

Basically, the settings can be accessed in read-only or read/write mode.

Keep in mind you will not have more rights or permissions to modify settings than if you were editing it from the CSP Console.

The VPN and NAT options are accessible with the same capabilities as in the CSP console.

The Load Balancing option is there, but it is usable only if you have Tanzu activated in your cluster.

For example, for Direct Connect you can change the BGP ASN or enable VPN as a backup.

For Transit Connect, you can have a look at the list of Routes Learned or Advertised.

The Public IPs menu allows you to request new public IP addresses for use with HCX or a NAT rule.

Let see what’s possible to do from the Segments menu.

From here you can see a list of all your segments. You can also create a new segment, modify an existing one, or delete segments.

I was able to edit the DHCP configuration of one of my segments.

I was also able to edit my Policy Based VPN settings.

All the other options reflect what we can already do in the CSP Console.

Security tab

This Menu is divided into two parts:

  • East-West Security, which gives access to the Distributed Firewall rules and the Distributed IDS/IPS configuration,
  • North-South Security, which covers the Gateway Firewall rule settings for traffic entering and leaving the SDDC.

Nothing really new here; it's pretty much the same as in the CSP Console, as you can see here:

On the Distributed IDS/IPS page, I can review the results of the penetration testing I did in my previous post.

Inventory tab

This tab is covering:

  • Services: this is where you configure the new protocols and services you want to leverage in firewall rules
  • Groups: groups of virtual machines for Management gateway and Compute Gateway firewall rules
  • Context Profiles: you can add new FQDNs (useful for the DFW FQDN filtering feature) and App IDs for Context-Aware firewall rules, and set up Context Profiles.
  • Virtual Machines: lists all the VMs attached to segments, with their status (Stopped, Running, …)
  • Containers: shows Namespaces and Tanzu clusters.

Plan and Troubleshoot tab

The tab is covering:

  • IPFIX: this is where you configure flow export, so that network flow records can be sent to a collector for analysis
  • Port Mirroring: lets you set up a target collector VM and then replicate and redirect all traffic from a logical switch port to it for analysis purposes
  • Traceflow: a very nice feature to monitor and troubleshoot a traffic flow between two VMs and analyze the path it takes.

The last one is a feature that doesn't exist in the current VMC on AWS CSP Console and is, in my opinion, worth a closer look.

Traceflow option

Let’s have a look more deeply into what this brings onto the table in my next post.

Stay tuned!

NSX Advanced Firewall Add On for VMware Cloud on AWS (Part 3)

In my previous post, I talked about the FQDN filtering feature, which is one of the new capabilities of the Advanced Firewall add-on.

In Part 3 of this multi-part blog series, let's focus on the latest feature, Distributed IDS/IPS, which is part of the newly announced NSX Advanced Firewall for VMware Cloud on AWS.

Introduction to Distributed IPS/IDS

With NSX Distributed IDS/IPS, customers gain protection against attempts to exploit vulnerabilities in workloads running on VMware Cloud on AWS.

Distributed IDS/IPS is an application-aware deep packet inspection engine that can examine and protect traffic inside the SDDC. Customers can detect and prevent lateral threat movement within the SDDC using the intrinsic security capabilities of Distributed IDS/IPS.

Like the DFW, Distributed IDS/IPS is built into the hypervisor, and inspection can be performed on all traffic coming into or leaving the VM. Since the inspection is performed on all the hypervisor hosts in a distributed manner, there is no single inspection bottleneck that chokes the traffic flow.

Enabling Distributed IDS/IPS

The first thing we will do is activate and configure the Distributed IDS/IPS feature in the VMC on AWS SDDC.

If you don’t have already activated the NSX Advanced Firewall add-on, please do so otherwise you will get this message:

Remember, in the first post of this series I showed you how to activate the NSX Advanced Firewall add-on for VMware Cloud on AWS.

Once you have activated the add-on, click the Networking & Security tab in the browser, then click Distributed IDS/IPS, located in the Security section.

IDS/IPS is disabled by default, so you have to enable it per cluster. Here I have only one cluster.

Just move the slider to enable the feature, confirm that you want to enable it for the cluster, and you are ready to test it!

Once it’s enabled you can choose to regularly update the Signatures by selecting the Auto Update new versions button.

NSX Distributed IDS/IPS utilizes the latest threat signature sets and anomaly detection algorithms to identify attempts at exploiting vulnerabilities in applications. It is integrated with the NSX Threat Intelligence Cloud Service to always remain up to date on the latest threats identified on the Internet.

You can check the other versions that have been present in the environment by clicking the View and change versions link.

This launches a new window with historical details. Here we can see that the first default signature set was installed on Jun 17th, 2021, and additional signatures were pushed on Oct 20th and Nov 12th.

By clicking on the new signatures, I can dive deep into the details of each of them and access really good information on which signatures have been disabled, updated, and so on.

We are going to go ahead and use the latest versions.

If you don’t have access to Internet from your NSX Manager, you also download the IDS/IPS signatures from the Network Threat Intelligence services page and be able to upload them manually.

Now it’s time to finish configuring the feature and launch some real test attacks by leveraging both Kali Linux and the infection Monkey tooling to simulate some attacks!

Configuring Distributed IDS/IPS profile & rule

Create a Profile for IDS/IPS

In this section, I will create a default profile to use with an IDS/IPS rule.

NB: We can configure up to 25 profiles.

On the Profiles tab, under Distributed IDS/IPS in the Security section, I clicked ADD PROFILE and created the ChrisIDSProfile profile:

I accepted the default settings, but you can customise the profile to meet your requirements. You can, for instance, select only intrusion attacks with a severity of Critical, or of High and Critical only.

You can also tweak it by selecting specific CVSS scores or attack types.

I clicked save to finish creating the Profile.

We can see that the profile has been successfully created.

After a few seconds it appears in green:

Create a Policy with rules for IDS/IPS

Now let’s create a Policy.

For that, I went to the Rules tab and added a specific IDS/IPS policy called ChrisIDS-Policy.

I selected the check box next to the name of the policy, then clicked Add Rule.

To finish the configuration, I selected the profile created previously.

I also changed the source from Any to my SDDC subnets.

Please note that I left the Destinations and Services columns set to Any and the Applied To field set to DFW.

I also left the Mode set to Detect Only. In production, it's better to change this setting and switch to Detect & Prevent.

Now that I am done with the setup, I just need to click Publish.

Now it’s time to go for some tests of attacks and exploits.

Testing the IDS/IPS

In order to test the IDS/IPS feature, I used my favorite security scanning tools to generate some attacks and try to exploit vulnerabilities on one particular server.

Basically, I will launch the exploits against an OWASP Broken Web Application server, a test server full of vulnerabilities that I deployed in my SDDC. In a nutshell, OWASP stands for the Open Web Application Security Project®, a nonprofit foundation that works to improve the security of software. It's a very good way to test the level of security of your environment.

This OWASP server is going to be the target for all the vulnerability scanning coming from my two different tools.

Scanning tools

The first one is the Kali Linux distribution in a virtual machine, which has a multitude of security tools preinstalled. I love it!

The second one is the Infection Monkey virtual appliance from Guardicore, a platform with a graphical interface that you can leverage to launch the exploits.

Infection Monkey is an open source breach and attack simulation (BAS) platform that allows organisations to discover security gaps and fix them. You can simply infect a random machine with the Infection Monkey and automatically discover your security risks, testing different scenarios: credential theft, compromised machines, and other security flaws.

Deploying Kali Linux

It’s a simple process as you can install it from a ISO CD or download a virtual image directly from here.

I chose to install it from the ISO, as that gives more flexibility to tweak the VM settings.

Once the VM is deployed there is nothing more to do.

Deploying Monkey Island VM

First, I deployed the Monkey Island VM from the OVA downloaded from the Infection Monkey website. This is an Ubuntu Linux VM with a small footprint of only 2 vCPUs and 2 GB of RAM.

Once it’s been installed, I have just started the VM.

My VM is up and running very quickly and I can connect to it from the web console on port 5000:

Once I am logged in with the default username (monkeyuser) and password, I can set up the system.

I start by clicking on Configure Monkey.

I need to click the Network tab and change the Target list IP address to the IP address of the OWASP VM running in the App segment (172.11.11.115).

Then I clicked Run Monkey on the left and selected From Island.

At that moment the tool launches the exploits automatically.

Launching the attacks and exploits

With Kali Linux tools

In my environment, the Kali Linux server address is 172.11.11.107.

And the OWASP Broken Web Application has the following address: 172.11.11.115.

In this first stage, I used Kali Linux with nmap to scan the OWASP web server.

As you can see, there are 9 open ports on the machine. The nmap command is able to output the name and version of the services behind those ports.
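For reference, the scan was along these lines (a minimal sketch; the exact flags I used aren't visible in the screenshot, and -sV enables service and version detection):

    # Scan the OWASP VM from Kali and fingerprint the services behind each open port
    nmap -sV 172.11.11.115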

In this next step, I leveraged the nikto command to scan for vulnerabilities on the server.

Multiple vulnerabilities are displayed, mainly affecting the Apache server, but also the version of Python, which is outdated.
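The nikto run looked like this (a minimal sketch; -h specifies the target host):

    # Scan the OWASP web server for known vulnerabilities and misconfigurations
    nikto -h http://172.11.11.115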

The result of the exploits is now visible in the CSP Console, as you can see on the screen below. At the top, there is a representation of the attempts to compromise the server, spread over a time range with a slider that can be adjusted as needed.

The attacks have triggered a lot of Emerging Threats (ET Scan) alerts with Medium, High and Critical severity levels.

The Medium alerts indicate that the HTTP service on the web server has exploitable vulnerabilities. The response here is just "Detect". You can see the CVE number and CVSS classification of the vulnerabilities on the right.

When I click on VMs Affected, a list of the VMs affected by the vulnerabilities is displayed:

In addition, clicking the purple bar displays a detail window:

With Monkey Island tools

As I said before, the scanner starts automatically after the setup is finished. Once it has finished its scanning operations, Monkey Island makes all the results accessible through a web page.

It also displays a map of the devices that have been scanned by the tool.

On the right of the page, there is a tab called ATT&CK report that helps you understand which exploits have been tried and which succeeded.

On the VMC on AWS Console, the results are displayed the same way as before with the Kali Linux tool:

The alert displayed here is an Apache Struts remote code execution attempt.

Conclusion

This new Advanced Firewall add-on IDS/IPS feature is really interesting, as today it's the only way to prevent attackers from exploiting vulnerabilities from inside the SDDC.

That’s conclude the post, I hope this has given you a better understanding on how this feature is powerful.

NSX Advanced Firewall Add On for VMware Cloud on AWS (Part 2)

In my previous post, I introduced you to the new Advanced Firewall add-on in VMware Cloud on AWS.

I also covered the Context-Aware Firewall feature, which filters connections based on the application ID and not only the port number.

In this post, I am going to cover Distributed Firewall FQDN filtering, which gives applications that communicate outside the SDDC layer 7 protection.

Introducing the FQDN Filtering feature

This feature allows users to access only specific domains, by whitelisting and/or blacklisting FQDNs. In many high-security environments, outgoing traffic is filtered using the Distributed Firewall. When you want to access an external service, you usually create IP-based firewall rules. In some cases, you don't know which IP addresses hide behind a domain. This is where domain filters come in handy.

Because NSX-T Data Center uses DNS Snooping to obtain a mapping between the IP address and the FQDN, you must set up a DNS rule first, and then the FQDN allowlist or denylist rule below it.

SpoofGuard should be enabled across the switch on all logical ports to protect against the risk of DNS spoofing attacks, where a malicious VM injects spoofed DNS responses to redirect traffic to malicious endpoints or bypass the firewall.

You can define specific FQDNs that are allowed and apply them to DFW policies. Conversely, you can define specific FQDNs that are denied access to applications in the SDDC. The DFW maintains the context of VMs when they migrate. You can then increasingly rely on application profiling and FQDN filtering to reduce the attack surface of your applications to designated protocols and destinations.

Configuring DFW with FQDN filtering

In this section, I will show you how to set up an FQDN Context Profile and a firewall policy to limit access to specific URLs from VMs.

Creating an FQDN Context Profile

First things first! Let's create the Context Profile.

Under Networking and Security, in the Inventory section, click Context Profile.

Click FQDNs Tab

Click ACTIONS –> Add FQDN

Enter the Domain: *.yahoo.com, and then Click SAVE.

Create a second FQDN with *.google.com.

Click the Context Profile Tab, and Click ADD CONTEXT PROFILE

Give it a Name: Allowed FQDNs, Click Set

Click ADD ATTRIBUTE –> Domain(FQDN) Name

Select the following domains: *.yahoo.com, *.office.com, *.google.com and Click ADD.

Click APPLY, Click SAVE. We now have a Context Profile setup.

Creating a Firewall rule and a Policy

I have created a Group called MyDesktops which includes a segment with my Windows VMs.

Now I am going to set up a firewall policy including this Context Profile. I will limit the VMs in the MyDesktops group to accessing only the allowed FQDNs. I also limit access from this group of VMs to specific DNS servers (8.8.8.8, 8.8.4.4).

I also add a Drop rule at the end to limit access to only the FQDNs that were whitelisted.

Now I can access google.com and yahoo.com, but I can no longer connect to the vmware.com site.
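Here is a quick way to verify the behaviour from one of the desktops (a minimal sketch; the domains and DNS servers come from the rules above, and the last command is expected to time out because of the final Drop rule):

    # DNS resolution through one of the allowed resolvers
    nslookup www.google.com 8.8.8.8
    # Whitelisted FQDN: expect an HTTP status line back
    curl -sI https://www.yahoo.com | head -n 1
    # Not whitelisted: the connection should hang and time out
    curl -sI --max-time 10 https://www.vmware.com | head -n 1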

This concludes the post on FQDN Filtering. In my final post, I will cover the Distributed IDS/IPS feature.

NSX Advanced Firewall Add On for VMware Cloud on AWS (Part 1)

VMware Cloud on AWS already offers a robust set of networking and security capabilities that enables customers to run production applications securely in the cloud.

The M16 release introduces new advanced firewall features as an add-on.

This includes the following new security capabilities:

  • L7 Distributed (Context-Aware) Firewall with application ID – With the L7 (Context-Aware) firewall, you can go beyond simple IP/port level layer 4 security to complete stateful layer 7 controls and filtering.
  • L7 Distributed Firewall with FQDN Filtering – Applications that communicate outside the SDDC also gain layer 7 protection using the Distributed Firewall FQDN filtering capability. Customers can define specific FQDNs that are allowed or denied access to applications in the SDDC. The DFW maintains the context of VMs when they migrate. Customers increasingly rely on application profiling and FQDN filtering to reduce the attack surface of their applications to designated protocols and destinations.
  • User Identity Firewall – You can create groups based on user ID and define DFW rules to control access to virtual desktops and applications in the SDDC. Per-user/per-session access control limits the amount of time and exposure users have to desktops or applications. Integration with Active Directory/LDAP enables the DFW to continuously curate user access to applications. User ID-based rules are enforced by the DFW at the source, delivering pervasive, intrinsic security throughout the SDDC.
  • Distributed IDS/IPS – With NSX Distributed IDS/IPS, customers gain protection against attempts to exploit vulnerabilities in workloads on VMware Cloud on AWS. Distributed IDS/IPS is an application-aware deep packet inspection engine that can examine and protect traffic inside the SDDC.

Let’s try them to see how it works!

Enabling the NSX Advanced Firewall Add-On

The NSX Advanced Firewall add-on adds layer 7 firewall protection, Identity Firewalling, Distributed IDS/IPS, and FQDN filtering to the VMC on AWS SDDC. This feature is an add-on, priced in addition to the standard VMC on AWS subscription.

Before any of these features can be used, you must first enable the add-on on your SDDC. In the following section, I am going to walk you through the steps of enabling the NSX Advanced Firewall functionality on your SDDC.

  1. On your SDDC tile, click View Details
  2. Click the Add-Ons tab
  3. In the NSX Advanced Firewall Tile, click Activate

Click Activate

Click OPEN NSX ADVANCED FIREWALL (This will take you to the Networking & Security Tab)

At this point, the NSX Advanced Firewall add-on has been enabled. To make use of the functionality it provides, you must configure each feature individually.

In the upcoming sections, we will configure and test each of these features.

Configuring L7 Distributed Context Aware Firewall

With L7 (Context-aware) firewall, it’s possible to go beyond simple IP/ port level layer 4 security to complete stateful layer 7 controls and filtering. This will avoid for instance someone from changing Port number to bypass a firewall rule.

Extremely powerful!

Deep packet inspection (DPI) built into the Distributed Firewall enables you to allow only the intended application / protocols to run, while denying all other traffic at the source. This enables you to isolate sensitive applications by creating virtual zones within the SDDC.

Distributed Firewall (DFW) layer 7 policies are enforced at the hypervisor (vNIC) level and can migrate with the VM when they move from host to host in the SDDC, ensuring there are no gaps in enforcement.

Let’s see how to configure and use the feature.

Configuring a standard L4 FW rule

In my example, I have two VMs (webserver01, webserver02) running in my SDDC which are part of a group called Web Tier.

Here are the IPs of the VMs:

They can communicate with each other over any protocol, as this is the default setting in the Distributed Firewall, as we can see here:

First let’s create a traditional L4 firewall rule to block SSH traffic between the two VMS.

Now if I try to SSH from webserver01 to webserver02, it's blocked:
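A minimal sketch of the test (the user name, and webserver02 resolving by name, are assumptions from my lab):

    # From webserver01: SSH to webserver02 on the default port 22
    ssh user@webserver02
    # The L4 DFW rule drops the traffic, so the attempt times out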

What if SSH was listening on another port, however? What if some nefarious person (knowing SSH on port 22 is being blocked) changed the port the server listens on and attempted to SSH to the server on this new port; what happens then?

To test that, I edited sshd_config on the webserver02 VM and changed the port to 2222:

I have then restarted the ssh service on the VM:

We can see the SSH server is now running on port 2222:
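Here is a minimal sketch of those three steps on webserver02 (assuming a Debian-style layout with systemd):

    # Move the SSH daemon from port 22 to port 2222
    sudo sed -i 's/^#\?Port 22$/Port 2222/' /etc/ssh/sshd_config
    # Restart the service so the change takes effect
    sudo systemctl restart ssh
    # Confirm sshd is now listening on 2222
    sudo ss -tlnp | grep sshd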

Let's see what happens now that SSH listens on a non-standard port.

Now if I try to connect back, but on port 2222, it works!

Unfortunately, the L4 DFW doesn’t block it. As mentioned earlier the firewall is looking for SSH on port 22, not port 2222, so I was able to bypass the firewall policy.

Configuring Context Aware Firewall rule

The NSX Context-Aware Firewall rule (L7) enhances visibility at the application level and helps overcome the problem of port-based rules being bypassed. Visibility at the application layer helps you monitor the workloads better from a resource, compliance, and security point of view.

In order to switch to the Context-Aware firewall, I just have to remove SSH from the Service field of the DFW rule and add SSH to the Context Profile field:

The rule is now changed:

Let’s try to connect again to port 2222:

Now the attempt to connect to the modified port is blocked. That's much better! This is because the DFW now assesses the packet at layer 7, identifies the heuristics of the packet as SSH, and does not allow the traffic through.

With Context-Aware Firewalling you can enforce security protocol versions/ciphers, reduce attacks by only allowing traffic matching the application fingerprint, and enforce port-independent rules.

In the next post I will introduce you to the L7 Distributed Firewall with FQDN filtering. Stay tuned!

Leveraging Log Insight Cloud for VMC logging and auditing

In this blog post I will talk about how vRealize Log Insight Cloud can be leveraged to monitor the audit logs of VMware Cloud on AWS.

Introduction to vRLIC

vRealize Log Insight Cloud (vRLIC) is able to ingest data coming from multiple sources, including physical infrastructure and VMware Cloud on AWS, as well as cloud provider platforms or applications running in any cloud. It gives customers a choice of multiple ingestion options:

  • Remote Cloud Proxy: a small virtual appliance deployed on-premises to collect data from any physical or virtual on-premises hardware, also used when redirecting logs from vRLIC to an external SIEM (like Splunk)
  • VMC integration: any subscribed customer gets audit and security logs automatically redirected to a cloud instance of vRLIC; the NSX-T firewall logs, by contrast, need to be activated
  • REST API integration: allows pushing logs from any log agent (like Fluentd or Logstash) running with applications anywhere; the vRLI agent and Cloud Proxy can also be used (see the curl sketch after this list)
  • Cloud-native AWS, GCP or Azure integration: for instance, AWS CloudWatch or CloudTrail logs can be redirected to vRLIC through a Lambda function. Azure or GCP can send several log types over HTTPS.
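To give a feel for the REST API option, here is a hedged curl sketch. The ingestion endpoint URL, header, and payload shape are assumptions on my part (check the vRLIC documentation for the exact endpoint for your region), and API_KEY stands for an API key generated in the vRLIC console:

    # Push a single JSON log event to the vRLIC ingestion API (endpoint assumed)
    curl -X POST 'https://data.mgmt.cloud.vmware.com/le-mans/v1/streams/ingestion-pipeline-stream' \
         -H "Authorization: Bearer $API_KEY" \
         -H 'Content-Type: application/json' \
         -d '[{"text": "test message from my app", "source": "demo-host"}]'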

There is nothing to configure with VMC on AWS: once customers get access to vRealize Log Insight Cloud, they automatically receive all the audit and firewall logs coming from VMC.

The Cloud Proxy

The Cloud Proxy is basically responsible for sending logs coming from on-premises to vRealize Log Insight Cloud.

It can be installed from the vRLIC instance by going to Cloud Proxies, available under the Configuration menu.

Once you click, you will be redirected to a pop-up window with a link to download the proxy OVA directly. A specific key is also generated; it has to be used when you deploy the OVA on-premises, in order to register the proxy with the vRLIC instance.

Content Packs

vRealize Log Insight Cloud leverages the concept of Content Packs to add preconfigured dashboards and alert settings that enrich the default configuration.

Content packs are a bundling of dashboards, queries, alerts, and extracted fields to make it easier to consume log data coming from your applications and infrastructure. vRealize Log Insight Cloud offers content packs for VMware products such as VMware Cloud on AWS, NSX-T, NSX-V, vSphere, vSAN, vRealize Operations, vRealize Network Insight, and Workspace One just to name a few.

There are content packs for infrastructure like firewalls and switches, and applications such as Apache, MySQL, SQL Server, and Active Directory.

There is also a content pack for popular AWS cloud services such as S3 and RDS; I will cover this topic in a future post. Support for Azure and Google Cloud has also been added recently.

There are currently around 77 Content Packs available in vRealize Log Insight Cloud, which makes it a very relevant platform when it comes to monitoring cloud-native applications, virtual infrastructure, and on-premises compute and network devices.

Log Sources

There are a lot of different log sources, ranging from infrastructure, applications, development tools, and middleware to AWS, Azure, and GCP.

The first thing to do when you want to add a new log source is to check the setup steps for that source. To do so, just go to the Log Sources page of vRLIC:

and select the relevant solutions you want to collect logs from.

Let’s say you want to add AWS CloudTrail logs into Log Insight Cloud. What you need to do is just select the CloudTrail Log sources and you will be guided by the configuration guide listing all the required steps. Very easy!

Out of the Box content for VMC on AWS

There is a lot of interesting content that customers are entitled to when subscribing to the VMC on AWS service, including VMC on AWS-specific dashboards, queries, and alerts.

The following 3 Content Packs are enabled by default, and they can be consumed as soon as logs start flowing into vRLIC.

As of now, the latest versions of the Content Packs are:

  • 4.4 for the General
  • 2.0 for the Audit Events for VMware Cloud on AWS
  • 4.0 for the NSX-T for VMware Cloud on AWS

In the free edition, customers are not able to activate any other Content Packs.

What logging content comes with VMware Cloud?

When a customer subscribes to the VMC on AWS service, the following content is enabled by default:

  • Audit Logs: a core service for VMC customers that monitors VMC deployments for potential security breaches or internal misuse of infrastructure, and gives visibility into the VMC deployment, including which user is doing what and when, which users created or deleted a VM, or who accessed the CSP portal,
  • Firewall Logs: give visibility into NSX-T Gateway Firewall and DFW events and let you monitor traffic being allowed or dropped. This mainly helps customers tune the firewall security policies and monitor the dropped packets or the traffic flows needed for security.

Please note that the Firewall Logs are only available in the paid version.

Be aware that, as a customer, you will not have access to all the messages coming from VMC on AWS like you would from your on-premises infrastructure. We have an agreement with AWS that we won't expose any of the underlying hardware and infrastructure to customers.

But don’t worry ! That’s not really of a problem as VMC on AWS is a managed service where VMware is taking care of the infrastructure including the hosts and the clusters. So we do have SREs in the back end that have access to the logs to make sure everything is ok.

What are the AUDIT logs for VMC?

Basically, what is collected as log messages is well documented by VMware, through the specific filters that are applied at the VMC level, and can be checked in the documentation here.

It includes the raw log messages coming from ESXi, vCenter, and user-driven activity events, as well as the NSX-T audit and packet log events. All the VMC Notification Gateway events are also redirected.

Audit Events

The Audit Events for VMware Cloud on AWS Content Pack comes with 11 dashboards.

The Activity dashboard is all about the activity logs from VMC; it includes the major changes made to your SDDC, classified by type, by user, or by resource type.

The one I find the most interesting is the VMC Overview dashboard, as it gives a unique view of the multiple events that affect the SDDC and the alarms that have been triggered.

This is where you can list the number of VMs created and deleted, for instance.

In the Dashboard workbench, you can edit the dashboard and adapt the visualisation to your needs.

Editing a widget in a dashboard lets you modify the look and feel, switch between multiple visualisations, and change colors.

You can also remove or add widgets by dragging and dropping a widget from the left to the right.

Once you have modified a dashboard, you can save it under a new name.

Be aware that Content Pack dashboards are read-only and need to be cloned prior to editing.

NSX-T events

The NSX-T Content Pack comes with 5 dashboards covering Distributed Firewall events (overview and traffic), Gateway Firewall log messages, NSX-T audit events, and a specific one for DHCP events.

It covers all the major changes made to the Gateway and distributed Firewall inside the SDDC.

Here is an example of the type of logs you can get from the Distributed Firewall – Traffic dashboard, including the traffic traversing the firewall, such as permitted sources or application ports:

or firewall destinations:

Exploring Logs for VMC on AWS

The Log Explorer is where you can see the log stream, log types, and alerts. This is where you can filter on specific criteria, view the details of the log messages, and create queries for custom dashboards and alerts.

You can also extract fields from the Log Explorer view. The Log Explorer shows, on the right-hand side, the fields, which are strings of text found in the log messages.

Extracted fields are really powerful, as they can be leveraged in filters to narrow down log messages based on a particular value, for example to see logs for a particular host or application.

In order to access the Log Explorer, just select Explore Logs from the drop-down menu on the left.

If you don’t filter on anything all the log messages coming from all the hosts will be displayed.

There are different ways to view messages by leveraging several filtering options. Common-language search is supported, so you don't have to be an expert in regular expressions (regex is supported too, however).

Whenever you create a query using a query filter, you can create an alert from it and get an email notification when a particular log message is detected.

Once you have created your queries, you can favorite, save, compare, and pin them, and come back later to the specific queries that you have saved.

All VMware Cloud on AWS logs have an associated SDDC_ID. If multiple SDDCs exist, each has a unique ID, so you can limit the view to a specific SDDC by entering the SDDC ID as a filter, for example.

The time frame is 5 minutes by default, and it must sometimes be extended to include older log messages and avoid empty dashboards. You can specify a custom time range if you want to filter on a larger time frame, like multiple hours, days, or months. Even when increasing the time window, some widgets will remain empty; that is completely normal in a healthy configuration.

There are different ways to view messages:

  • Event Types displays the most frequently occurring events
  • Alerts focuses specifically on the alerts present in log messages
  • Event Trends is there to help you better understand the trends in the environment.

So you have the option to view log messages by event type and to limit the view to alerts present in the log messages.

Event Trends delivers information on the frequency (events per hour) of specific events. It does a log comparison against the user-defined window (time frame) by grouping log messages into categories, leveraging AI-driven log categorization to group related log messages into log types.

To access the information on the event frequency, just hover over the related events as displayed below:

Enabling SDDC alerts

Something to call out is that, by default, none of the VMC alerts that come with the VMC Content Packs are enabled.

In order to enable some of the main alerts for VMC, you first need to switch to the Alert Definitions menu and toggle the switch on the left so that it appears green.

You can always disable an alert for a certain amount of time if you wish.

There is also a Triggered Alerts area where you can view the alerts that were triggered over a specific time, represented in a graph on which you can adjust the time frame.

You can also view the details of the Alerts in a log explorer at the bottom of the window like here:

and you can also update an alert definition if you created it.

Adding Notification to an Alert

Once you have activated an alert, you might want to change the notification options and add specific emails or webhooks (like Slack channels) as recipients.

There are several other attributes that you can update in the alert settings; here is a list of them:

  • Query: the search criteria for generating the alert
  • Trigger Conditions and Severity
  • Notify: this is where you choose the specific method to get the notification event; it can be email or a webhook (Slack, PagerDuty, Datadog)
  • Metadata: key-value pairs to be sent as payload to the webhook
  • Tags: add a tag to an alert to easily search for alerts of the same type

Please note that for Content Pack alerts you can only add tags, add metadata, or change the notification recipient.

Conclusion

vRLIC is really a valuable product for Day 2 operations in VMC on AWS. In my next posts I will show you some examples of how we can leverage the logging and alerting features of the product.

Stay tuned!

HCX MON Policy routes

I recently have had a question from a customer asking me how the default route is managed into HCX when Mobility Optimized Network is enabled.

Basically when extending multiple VLANS from on-premise to VMware Cloud on AWS by leveraging HCX, you may still want to send egress traffic to an on-premise security device in order for the workloads running in the SDDC to be protected.

I decided to check the way HCX is managing default gateway in my own on-premise lab.

Testing the default gateway with MON

I have a VMware vSphere lab environment with a 3-node vSAN cluster, linked via the Internet to an SDDC that I deployed with Terraform in our internal CSA organisation. I have already established site pairing between both sites:

And created a service mesh:

Then I extended 3 networks to the VMware Cloud on AWS SDDC: APP (VLAN 1712), DB (VLAN 1713), and WEB (VLAN 1711).

Two of the extended networks are MON-enabled: APP and WEB.

On the APP network (VLAN 1712), I deployed one VM (DEB10-APP01), which runs in my on-premises 3-node vSAN cluster. This network extension doesn't have the MON feature enabled.

I also set up a second subnet called WEB (VLAN 1711). I have another VM there (DEB10-WEB01), which has been migrated to a new cluster in VMware Cloud on AWS. This extended network is MON-enabled.

I opened a shell session and tried to ping the on-premises VM DEB10-APP01 from DEB10-WEB01. The traffic flows over the Internet to my on-premises site through the service mesh:

This is a ping from my VM in the SDDC (WEB, 172.11.11.105) to the on-premises VM (172.11.12.107, APP)

Now I wanted to check where the default route points in the SDDC. Is it the IGW through the T0 router, or is it the on-premises gateway?

To check, I ran a traceroute to 8.8.8.8.
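Here is a minimal sketch of the two checks from DEB10-WEB01 (IP addresses from my lab):

    # Ping the on-premises VM on the APP segment across the service mesh
    ping -c 3 172.11.12.107
    # Trace the path of Internet-bound traffic to see which gateway it egresses through
    traceroute -n 8.8.8.8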

It is not using my on-premises gateway for egress traffic; instead, default traffic goes out through the T0 router and the Internet gateway of the SDDC in AWS.

So now, how can I make traffic on a MON-enabled network egress via on-premises?

How the HCX MON Policy routing works

The way HCX MON policy routing works is very simple. When MON is enabled on an extended network segment, HCX adds the gateway IP with a /32 netmask to the SDDC Compute Gateway. For each created or migrated virtual machine that has MON enabled, a /32 route is also injected, so whenever a virtual machine on a different segment in the SDDC wants to reach that VM, it is reachable via the SDDC Compute Gateway.

There is a default setting in the Policy routing that is evaluated whenever a destination is not within the SDDC:

If the destination IP is matched in the policy and allowed by it, the traffic is forwarded to the on-premises gateway. If the destination IP is not listed in the policy settings, then the traffic is sent to the T0 router in the SDDC and routed accordingly.

Changing the HCX MON Policy routes

This setting can, however, be changed through a menu available from the ADVANCED tab in the console:

The default MON policy setting, as displayed, allows only RFC 1918 subnets to be routed back to the on-premises gateway:

Only private subnet traffic is forwarded to the on-premises router. Internet egress is sent to the SDDC Compute Gateway.

To change the default policy routing so that default egress traffic is routed to on-premises, you simply have to add a 0.0.0.0/0 route to the list as allowed.

The default route is now displayed in the policy Routes setting:

The default route means that Internet traffic will flow over the Interconnect from the SDDC to the on-premises gateway.

Let’s check it by launching a traceroute again from my VM in the SDDC:

It now shows that the traffic is sent to my on-premises default gateway (192.168.2.1).

Conclusion

When you have MON enabled with the default route policy settings and you are advertising 0.0.0.0/0 into the SDDC, HCX ignores this and still sends traffic out of the IGW.

Whenever you advertise the default route to the SDDC through BGP, you must change the default policy to avoid egress traffic being sent to the Tier-0 router; otherwise you will experience asymmetric routing.

HCX Migration Event Details

As mentioned in my previous post, I have updated my HCX server to the very latest version, 4.0. Now it's time to test some of the new features, including the Mobility Migration Event Details.

Introduction

Before we jump into a demonstration let’s quickly recap what the Mobility Migration Event Details feature provides:

  • It provides detailed migration workflow information: it shows the state of a migration, how long a migration remains in a certain state, and how long ago a migration succeeded or failed.
  • It provides detailed information for individual virtual machines being migrated from the source to the destination site. It works for Bulk Migration, HCX RAV, and OS Assisted Migration.

So now let’s see how the procedure works.

Procedure and Tests

I decided to migrate two VMs back from my on-premises lab to VMC on AWS. The two VMs are deb10-App01 and deb10-web01.

As you can see, I have already created a site pairing between my source on-premises lab and the VMC on AWS datacenter.

I have also established a Service Mesh between my on-premises lab environment and one of the lab SDDCs.

The Service Mesh is established between my lab and VMC. I have enabled Application Path Resiliency and TCP Flow conditioning

The Service Mesh is a construct that associates two Compute Profiles.

Now let’s select the VMs for migration. For this exemple, I have chosen to migrate two VMs with the Bulk Migration option.

In order to launch the migration, I have to click on Migrate.

Next I am going to select the VMs I want to migrate.

I clicked the Add button to move the VMs to a Mobility Group.

I now have to provide the information for Transfer and Placement. Basically, I selected the only possible resource pool (Compute-ResourcePool) and datastore (WorkloadDatastore). I also switched the migration profile to Bulk Migration and set the folder to VMs migrated to Cloud.

The next step is to validate the migration options by selecting the two VMs and clicking the Validate button.

The result displays a few warnings, only related to the VMware Tools I have installed in my Debian 10 VMs, but the validation is successful.

So I am going to go ahead and start the migrations by clicking the green Go button,

and confirm it by clicking Validate.

In the beginning, it shows only 0% Base Sync as the process starts.

We can click on the group info to see more information.

If I click on the group itself, I can see the list of virtual machines being migrated.

After a few seconds, we start to see the first events in the window. If I click on individual VMs, I can see the detailed events happening as the migration takes place.

In the lower part of the window, there is a separate section that provides the event information.

This section is divided into multiple parts. Currently we see the Transfer Events section. There is specific color coding to distinguish tasks running on-premises from those running on the destination; the darker blue shows the information collected on the target site.

The list of events can be refreshed regularly by clicking EVENTS.

As the Base Sync is initiated, we can see the time remaining to transfer the virtual machine. This is really handy for very large VMs, to be aware of the time remaining to complete the transfer.

As the transfer events finish, meaning the transfer of the VM is completed, we now see a Switch Over Events section. This is visible for all of the virtual machines.

We can confirm from the first line that the switchover is ongoing.

After the switchover is finished, the last events are Cleanup Events.

If I go back to the Group Info, it shows me that one migration is finished and the other one is ongoing.

All the details of the events are now listed in all sections.

All my virtual machines are now migrated, and we saw detailed events and the real state of the migration of individual VMs.

This concludes this post; thank you for reading.

HCX 4.0 is now available

A new version of my favorite solution for migrating to the cloud has been released recently, and I can't wait to test it.

Let’s have a look at it…

What’s new in HCX 4.0

First of all, what is on the list of new features for this version? There are multiple great enhancements around migration, Network Extension, Service Mesh configuration, and usability.

Migration Enhancements

  • Mobility Migration Events details: The HCX Migration interface will display detailed event information with time lapse of events from the start of the migration operation.
  • NSX Security Tag Migration: Transfers any NSX Security tags associated with the source virtual machine when selected as an Extended Option for vSphere-to-vSphere migrations. See Additional Migration Settings.
  • Real-time Estimation of Bulk Migration – HCX analyzes migration metrics and provides an estimation of the time required to complete the transfer phase for every configured Bulk migration. The estimate is shown in the progress bar displayed on the Migration Tracking and Migration Management pages for each virtual machine migration while the transfer is underway. For more information, see Monitoring Migration Progress for Mobility Groups.
  • OS Assisted Migration Scaling – HCX now supports 200 concurrent VM disk migrations across a four Service Mesh scale out deployment. A single Service Mesh deploys one Sentinel Gateway (SGW) and its peer Sentinel Data Receiver (SDR), and continues to support up to 50 active replica disks each. In this Service Mesh scale out model for OSAM, the HCX Sentinel download operation is presented per Service Mesh. See OS Assisted Migration in Linux and Windows Environments.
  • Migrate Custom Attributes for vMotion  – The option Migrate Custom Attributes is added to the Extended Options selections for vMotion migrations. 
  • Additional Disk Formats for Virtual Machines – For Bulk, vMotion, and RAV migration types, HCX now supports these additional disk formats: Thick Provisioned Eager Zeroed, Thick Provisioned Lazy Zeroed. 
  • Force Power-off for In-Progress Bulk Migrations – HCX now includes the option to Force Power-off in-progress Bulk migrations, including the later stages of migration.

Network Extension Enhancements

  • In-Service Upgrade – The Network Extension appliance is a critical component of many HCX deployments, not only during migration but also after migration in a hybrid environment. In-Service upgrade is available for Network Extension upgrade or redeploy operations, and helps to minimize service downtime and disruptions to on-going L2 traffic. See In-Service Upgrade for Network Extension Appliances
  • Note: This feature is currently available for Early Adoption (EA). The In-Service mode works to minimize traffic disruptions from the Network Extension upgrade or redeploy operation to only a few seconds or less. The actual time it takes to return to forwarding traffic depends on the overall deployment environment.
  • Network Extension Details – HCX provides connection statistics for each extended network associated with a specific Network Extension appliance. Statistics include bytes and packets received and transferred, bit rate and packet rate, and attached virtual machine MAC addresses for each extended network. See Viewing Network Extension Details.

Service Mesh configuration Enhancements

HCX Traffic Type Selection in Network Profile – When setting up HCX Network Profiles, administrators can tag networks for a suggested HCX traffic type: Management, HCX Uplink, vSphere Replication, vMotion, or Sentinel Guest Network. These selections then appear in the Compute Profile wizard as suggestions of which networks to use in the configuration. See Creating a Network Profile.

Usability Enhancements

  • HCX now supports scheduling of migrations in DRAFT state directly from the Migration Management interface.
  • All widgets in the HCX Dashboard can be maximized to fit the browser window.
  • The topology diagram shown in the Compute Profile now reflects when a folder is selected as the HCX Deployment Resource.
  • In the Create/Edit Network Profile wizard, the IP Pool/Range entries are visually grouped for readability. 

Upgrading from 3.5.3 to 4.0

The process is straightforward. You just have to go to the HCX console and check System Updates.

Click the version in the list of available versions to start updating. This begins with updating the HCX Manager.

Once you have upgraded the HCX Manager, the HCX Network Extension appliances and Interconnect Appliances have to be upgraded.

For that, we have to switch to the Service Mesh and select the Appliances tab. In the 'Available Version' column on the right, you will see the latest available build number, 4.0.0-17562724, with a small NEW! flag.

Now you just have to select the appliances (don't select WAN Optimization, as it is updated differently) and click UPDATE APPLIANCE.

I selected the Interconnect (IX) appliance and chose UPDATE APPLIANCE.

A confirmation window is displayed with a warning message recapping the process. Just select Force Update appliance and click the UPDATE button.

A message confirming the launch of the upgrade is displayed.

This message confirms that the process has started and also that the remote appliances will be upgraded as well.

After a few seconds, the process starts updating the appliance, and you can see the successive tasks and operations in the vCenter Recent Tasks pane.

First of all, it deploys the new appliance from the OVF.

Right after, you will see that the appliance is reconfigured.

Step 2 is reconfiguring the appliance

Next the system will finish the update job and power on the newly deployed appliance.

The Tasks tab of the Service Mesh in the HCX console also details in real time all the steps the system follows to upgrade the appliance.

This window shows the Network Extension tasks

At the end of the operation, you can confirm that all tasks completed successfully if they have a green arrow on the left.

The final result for the IX (Interconnect) appliance, with all steps successful

A confirmation that the process finished successfully will also appear in the console.

You can confirm the process has been successful by reading the Upgrade complete message

Keep in mind that when you start updating your appliances, the upgrade also applies to the remote peer sites (which could be a cloud target like VMC on AWS or AVS, or a target vCenter environment like VCF).

All my appliances are now upgraded, and the tunnels show as Up in the console.

Now that I have finalised the upgrades, it’s time to evaluate the cool new features. I invite you to discover them in my next post!

Introducing Multi-Edge SDDC on VMC on AWS

The latest M12 release of the SDDC (version 1.12) came with a lot of interesting storage features, including vSAN compression for i3en and TRIM/UNMAP (which I will cover in a future post), as well as new networking features like SDDC Groups, VMware Transit Connect, Time-based Scheduling of DFW rules, and many more.

The one that stands out for me is the Multi-Edge SDDC capability.

Multi-Edge SDDC (or Edge Scaleout)

By default, any SDDC is deployed with a single default Edge (actually a pair of VMs) whose size is based on the SDDC sizing (Medium by default). This Edge can be resized to Large when needed.

Each Edge has three logical connections to the outside world: Internet (IGW), Intranet (TGW or DX Private VIF), and Provider (Connected VPC). These connections share the same host Elastic Network Adapter (ENA) and its limits.

The latest M12 version of VMC on AWS adds Multi-Edge capability to the SDDC. This gives customers the ability to add capacity for North-South network traffic by simply deploying additional Edges.

The goal of this feature is to allow multiple Edge appliances to be deployed, thereby removing some of the scale limitations by:

  • Using multiple host ENAs to spread network load for traffic in/out of the SDDC,
  • Using multiple Edge VMs to spread the CPU/Memory load.
The Edge Scale-out feature consists of creating an additional pair of Edges to which specific traffic types can be steered

To enable the feature, additional network interfaces (ENAs) are provisioned in the AWS network and additional compute capacity is created.

It’s important to mention that you need additional hosts in the management cluster of the SDDC to support it, so this feature comes with an additional cost.

Multi-Edge SDDC – Use Cases

The deployment of additional Edges allows for higher network bandwidth in the following use cases:

  • SDDC to SDDC connectivity
  • SDDC to native VPCs
  • SDDC to on-premises via Direct Connect
  • SDDC to the Connected VPC

Keep in mind that for the first three, VMware Transit Connect is mandatory to get the increased network capacity from deploying those multiple Edges. As a reminder, Transit Connect is a high-bandwidth, low-latency, resilient connectivity option for SDDC-to-SDDC communication in an SDDC group. It also enables high-bandwidth connectivity to SDDCs from native VPCs. If you need more information on it, my colleague Gilles Chekroun has an excellent blog post here.

Multiple Edges permit steering specific traffic sets by leveraging Traffic Groups.

Traffic Groups

Traffic Groups are a new concept, similar in a way to source-based routing. Source-based routing selects which route (next hop) to follow based on the source IP address, which can be an individual IP or a complete subnet.

With this new capability, customers can now choose to steer certain traffic sets to a specific Edge.
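To make the idea concrete, here is a minimal, purely illustrative sketch of the selection logic, assuming a made-up traffic group and prefixes (this is not how NSX implements it internally):

```python
import ipaddress

# Hypothetical mapping of a traffic-group Edge to its prefix list
# (the Edge name and prefixes are invented for the example)
TRAFFIC_GROUPS = {
    "NSX-Edge-0-tg-1": ["192.168.10.0/24", "192.168.20.15/32"],
}
DEFAULT_EDGE = "NSX-Edge-0"

def select_edge(source_ip: str) -> str:
    """Return the Edge whose prefix list matches the source IP;
    any unmatched traffic stays on the default Edge."""
    ip = ipaddress.ip_address(source_ip)
    for edge, prefixes in TRAFFIC_GROUPS.items():
        if any(ip in ipaddress.ip_network(p) for p in prefixes):
            return edge
    return DEFAULT_EDGE

print(select_edge("192.168.10.7"))  # -> NSX-Edge-0-tg-1
print(select_edge("10.0.0.5"))      # -> NSX-Edge-0
```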

When you create a traffic group, an additional active Edge (with a standby Edge) is deployed on a separate host. All Edge appliances are deployed with an anti-affinity rule to ensure only one Edge per host, so there need to be 2N+2 hosts in the cluster (where N is the number of traffic groups).
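As a quick sanity check on the host math, here is a trivial sketch based only on the 2N+2 rule above:

```python
def min_hosts(traffic_groups: int) -> int:
    """Minimum management-cluster hosts so that each Edge VM gets its
    own host under the anti-affinity rule: the default active/standby
    pair plus one active/standby pair per traffic group (2N + 2)."""
    return 2 * traffic_groups + 2

for n in (1, 2, 3):
    print(f"{n} traffic group(s) -> {min_hosts(n)} hosts")
```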

Each additional Edge then handles the traffic for its associated network prefixes. All remaining traffic is handled by the default Edge.

Source-based routing is configured with prefixes defined in prefix lists that can be set up directly in the VMC on AWS Console.

To ensure proper ingress routing from the AWS VPC to the right Edge, the shadow VPC route tables are also updated with the prefixes.

Multi-Edge SDDC requirements

The following requirements must be met in order to leverage the feature:

  • SDDC M12 version is required
  • Transit Connect for SDDC-to-SDDC, SDDC-to-VPC, or SDDC-to-on-premises traffic
  • SDDC resized to Large
  • Enough capacity in the management cluster

A Large SDDC means that the management appliances and Edges are scaled up from Medium to Large. This is now a customer-driven option that no longer involves Technical Support, as it’s possible to upsize an SDDC directly from the Cloud Console.

A Large SDDC means a higher number of vCPUs and more memory for the management components (vCenter, NSX Manager, and Edges), and there is a minimum of 1 hour of downtime for the upscaling operation to finish, so it has to be planned during a maintenance window.

Enabling a Multi-Edge SDDC

This follows a three-step process.

First of all, we must define a Traffic Group, which creates the new Edges (as a pair). Each Traffic Group creates an additional active/standby Edge pair. Remember also that the Traffic Group Edges are always Large form factor.

Immediately, you will see that two additional Edge nodes are deployed. The new Edges have a name suffix containing tg.

The next step is to define a prefix list with specific prefixes. It will contain the source IP addresses of the SDDC virtual machines that will use the newly deployed Edge.

After a few minutes, you can confirm that the Traffic Group is ready:

NB: NSX-T configures source-based routing with the prefixes you define in the prefix list on the CGW as well as on the Edge routers, to ensure symmetric routing within the SDDC.

You just need to click Set to enter the prefix list. Enter the CIDR range; it can be a /32 if you just want to use a single VM as the source IP.

NB: Up to 64 prefixes can be created in a Prefix List.

When you are done entering the subnets in the prefix list, click Apply and then Save to create the Prefix List.
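If you maintain these prefix lists outside the console, a quick client-side check can save a round trip. This is only a sketch using the two constraints mentioned in this post (valid CIDR entries, at most 64 of them):

```python
import ipaddress

MAX_PREFIXES = 64  # limit per Prefix List mentioned above

def validate_prefix_list(prefixes):
    """Reject lists with more than 64 entries and normalise each CIDR;
    ipaddress.ip_network() raises ValueError on malformed entries."""
    if len(prefixes) > MAX_PREFIXES:
        raise ValueError(f"a Prefix List is limited to {MAX_PREFIXES} entries")
    return [str(ipaddress.ip_network(p)) for p in prefixes]

print(validate_prefix_list(["192.168.10.0/24", "192.168.20.15/32"]))
```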

The last step is to associate the Prefix List with the Traffic Group through an Association Map. To do so, click Edit.

Basically, we now need to tell the Traffic Group which prefix list to use. Click ADD IP PREFIX ASSOCIATION MAP:

Then we need to enter the Prefix List and give the Association Map a name.

Going forward, any traffic that matches that prefix list will use the newly deployed Edge.

Monitoring a Multi-Edge SDDC

Edge nodes cannot be monitored from the VMC Console, but you can always visualise the network rate and consumption through the vCenter web console.

When we look at the list of Edges in vCenter, the default Edge has no “-tg” in its name, so NSX-Edge-0 is the default. Once the new traffic group is added, the new Edge takes over the matching traffic and relieves the load on this default Edge.

NSX-Edge-0-tg-xxx is the new one, and we can see an increase in traffic consumption on it now because of the new traffic flowing over it:

Also, after the new scale-out Edge is deployed, the prefixes in the prefix list use the new Edge as their next hop going forward. This is propagated to the internal route tables of the default Edge as well as to the CGW route table.

All of these features are exposed through the API Explorer: the Traffic Group definition lives in the NSX AWS VMC Integration API, and the prefix list definition in the NSX VMC Policy API.
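For automation, here is a minimal sketch of how you might reach those APIs with Python. The CSP token exchange endpoint is the standard one, but the traffic-groups resource path and the reverse-proxy URL parameter are assumptions on my side; verify the exact paths in the API Explorer before relying on them:

```python
import requests

# Standard CSP endpoint for exchanging a refresh token for an access token
CSP_AUTH = ("https://console.cloud.vmware.com/csp/gateway/am/api/"
            "auth/api-tokens/authorize")

def get_access_token(refresh_token: str) -> str:
    # Exchange a CSP API refresh token for a short-lived access token
    r = requests.post(CSP_AUTH, params={"refresh_token": refresh_token})
    r.raise_for_status()
    return r.json()["access_token"]

def list_traffic_groups(nsx_proxy_url: str, token: str) -> dict:
    # Assumed resource path under the NSX AWS VMC Integration API root
    r = requests.get(
        f"{nsx_proxy_url}/cloud-service/api/v1/infra/traffic-groups",
        headers={"csp-auth-token": token},
    )
    r.raise_for_status()
    return r.json()
```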

In conclusion, remember that a Multi-Edge SDDC doesn’t increase Internet, VPN, or NAT capacity, and that there is a cost associated with it because of the additional hardware requirements.