Recently, I shared a post from my colleague Nathalie D’Hers about enabling remote work at Microsoft. D’Hers is a leader on our Microsoft Core Services Engineering and Operations (CSEO) team, the internal IT team that builds and operates the systems that run Microsoft. Every day, tens of thousands of our employees connect to our network using a virtual private network (VPN). And it’s CSEO’s job to make sure that VPN performs reliably, even when we experience a spike in usage. Here, I’m sharing a post from the team detailing how they achieve that. I think you’ll find it useful as you consider your own organization’s VPN platform.
Enhancing VPN performance at Microsoft
CSEO has redesigned our VPN platform, using split-tunneling configurations and new infrastructure that supports up to 500,000 simultaneous connections. The new design uses Windows 10 VPN profiles to allow auto-on connections, delivering a seamless experience for our users.
Modern workers are increasingly mobile and require the flexibility to get work done outside of the office. Every weekday, an average of 45,000 to 55,000 Microsoft employees use a virtual private network (VPN) connection to remotely connect to the corporate network. On weekends and during non-peak hours, that number only dips slightly to 25,000 to 35,000. Microsoft Core Services Engineering and Operations (CSEO), as part of our overall Zero Trust Strategy, has redesigned the VPN infrastructure at Microsoft—simplifying the design and consolidating access points. We have increased capacity and reliability, while also reducing reliance on VPN by moving services and applications to the cloud.
- Providing a seamless remote access experience
Remote access at Microsoft relies on the VPN client, our VPN infrastructure, and public cloud services. The VPN service inside Microsoft has gone through several iterative designs. In the past, regional weather events sharply increased the number of employees working from home, which heavily taxed the VPN infrastructure and called for a completely new design. Three years ago, we built an entirely new VPN infrastructure: a hybrid design that uses Azure load balancing and Microsoft Azure Active Directory (Azure AD) identity services with gateway appliances across our global sites.
The key to our success in the remote access experience was our decision to deploy a split-tunneled configuration for the majority of employees. We have migrated nearly 100 percent of previously on-premises resources into Azure and Office 365. Our continued efforts in application modernization are reducing the traffic on our private corporate networks as cloud-native architectures allow direct internet connections. The shift to internet-accessible applications and a split-tunneled VPN design has dramatically reduced the load on VPN servers in most areas of the world.
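To make the split-tunneling idea concrete, here is a minimal sketch of the routing decision a split-tunneled client makes: traffic bound for corporate address ranges goes through the VPN, and everything else goes straight to the internet. The prefixes and the `route_for` helper are hypothetical and chosen only for illustration; they are not taken from the Microsoft configuration.

```python
import ipaddress

# Hypothetical corporate prefixes (assumption for this sketch only).
CORPORATE_PREFIXES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def route_for(destination: str) -> str:
    """Return 'vpn' for corporate destinations and 'direct' for all others."""
    address = ipaddress.ip_address(destination)
    if any(address in prefix for prefix in CORPORATE_PREFIXES):
        return "vpn"
    return "direct"


if __name__ == "__main__":
    for dest in ("10.20.30.40", "52.96.0.1"):
        print(dest, "->", route_for(dest))
```

Because only corporate-bound traffic enters the tunnel, each application that moves to an internet-accessible endpoint takes its traffic off the VPN gateways.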
- Using VPN profiles to improve the user experience
We use Microsoft Endpoint Manager to manage our domain-joined and Azure AD–joined computers and mobile devices that have enrolled in the service. In our configuration, VPN profiles are replicated through Microsoft Intune and applied to enrolled devices; these include certificate issuance that we create in Configuration Manager for Windows 10 devices. We support Mac and Linux device VPN connectivity with a third-party client using SAML-based authentication.
We use certificate-based authentication (public key infrastructure, or PKI) and multi‑factor authentication (MFA) solutions. When employees first use the Auto-On VPN connection profile, they are prompted to authenticate strongly. Our VPN infrastructure supports Windows Hello for Business and Multi-Factor Authentication. It stores a cryptographically protected certificate upon successful authentication that allows for either persistent or automatic connection.
For more information about how we use Microsoft Intune and Endpoint Manager as part of our device management strategy, see Managing Windows 10 devices with Microsoft Intune.
- Configuring and installing VPN connection profiles
We created VPN profiles that contain all the information a device requires to connect to the corporate network, including the supported authentication methods and the VPN gateways that the device should connect to. We created the connection profiles for domain-joined and Microsoft Intune–managed devices using Microsoft Endpoint Manager.
Installing the VPN connection profile
The VPN connection profile is installed using a script on domain-joined computers running Windows 10, through a policy in Endpoint Manager.
For more information about how we use Microsoft Intune as part of our mobile device management strategy, see Mobile device management at Microsoft.
- Conditional Access
We use an optional feature that checks the device's health and corporate policies before allowing it to connect. Conditional Access is supported by connection profiles, and we’ve started using this feature in our environment.
Rather than just relying on the managed device certificate for a “pass” or “fail” for VPN connection, Conditional Access places machines in a quarantined state while checking for the latest required security updates and antivirus definitions to help ensure that the system isn’t introducing risk. On every connection attempt, the system health check looks for a certificate confirming that the device is still compliant with corporate policy.
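The quarantine-then-release behavior can be sketched as a small state decision. This is an illustrative model only: the `DeviceHealth` fields and the `evaluate_connection` function are hypothetical names, and the real checks are performed by the Conditional Access service rather than by code like this.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    QUARANTINED = "quarantined"
    CONNECTED = "connected"


@dataclass
class DeviceHealth:
    # Hypothetical fields mirroring the checks named in the text.
    has_required_security_updates: bool
    antivirus_definitions_current: bool
    has_valid_compliance_certificate: bool


def evaluate_connection(health: DeviceHealth) -> State:
    """Start quarantined; release only when every health check passes."""
    state = State.QUARANTINED
    checks = (
        health.has_required_security_updates,
        health.antivirus_definitions_current,
        health.has_valid_compliance_certificate,
    )
    if all(checks):
        state = State.CONNECTED
    return state


if __name__ == "__main__":
    print(evaluate_connection(DeviceHealth(True, True, True)))
    print(evaluate_connection(DeviceHealth(True, False, True)))
```

Defaulting to the quarantined state means a device that fails or skips any check never reaches the connected state by accident.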
- Certificate and device enrollment
We use an Azure AD certificate for single sign-on to the VPN connection profile. We currently use Simple Certificate Enrollment Protocol (SCEP) and Network Device Enrollment Service (NDES) to deploy certificates to our mobile devices via Microsoft Endpoint Manager. The SCEP certificate we use is for wireless and VPN. NDES allows software on routers and other network devices running without domain credentials to obtain certificates based on SCEP.
NDES performs the following functions:
- It generates and provides one-time enrollment passwords to administrators.
- It submits enrollment requests to the certificate authority (CA).
- It retrieves enrolled certificates from the CA and forwards them to the network device.
When a device-compliance–enabled VPN connection profile is triggered (either manually or automatically):
- The VPN client calls into the Windows 10 Azure AD Token Broker on the local device and identifies itself as a VPN client.
- The Azure AD Token Broker authenticates to Azure AD and provides it with information about the device trying to connect. A device check is performed by Azure AD to determine whether the device complies with our VPN policies.
- If the device is compliant, Azure AD requests a short-lived certificate. If the device isn’t compliant, we perform remediation steps.
- Azure AD pushes down a short-lived certificate to the Certificate Store via the Token Broker. The Token Broker then returns control back over to the VPN client for further connection processing.
- The VPN client uses the Azure AD–issued certificate to authenticate with the VPN gateway.
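The steps above can be sketched as a small simulation. Everything here is a stand-in: the function names, the in-memory certificate store, and the 60-minute certificate lifetime are assumptions made for illustration, not the Azure AD or Windows Token Broker APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Set


@dataclass
class Certificate:
    subject: str
    expires_at: datetime

    def is_valid(self, now: datetime) -> bool:
        return now < self.expires_at


def identity_service_check(
    device_id: str,
    compliant_devices: Set[str],
    now: datetime,
    lifetime: timedelta = timedelta(minutes=60),  # assumed lifetime
) -> Optional[Certificate]:
    """Stand-in for the compliance check: issue a short-lived cert or nothing."""
    if device_id not in compliant_devices:
        return None
    return Certificate(subject=device_id, expires_at=now + lifetime)


def token_broker_request(
    device_id: str,
    compliant_devices: Set[str],
    cert_store: Dict[str, Certificate],
    now: datetime,
) -> bool:
    """Stand-in for the token broker: store the issued cert, report success."""
    cert = identity_service_check(device_id, compliant_devices, now)
    if cert is None:
        return False
    cert_store[device_id] = cert
    return True


def vpn_connect(
    device_id: str,
    compliant_devices: Set[str],
    cert_store: Dict[str, Certificate],
    now: datetime,
) -> str:
    """Stand-in for the VPN client: call the broker, then present the cert."""
    if not token_broker_request(device_id, compliant_devices, cert_store, now):
        return "remediation required"
    cert = cert_store[device_id]
    return "connected" if cert.is_valid(now) else "certificate expired"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    store: Dict[str, Certificate] = {}
    print(vpn_connect("laptop-1", {"laptop-1"}, store, now))
    print(vpn_connect("laptop-2", {"laptop-1"}, store, now))
```

Keeping the certificate short-lived means a device that falls out of compliance loses its ability to authenticate to the gateway once the certificate expires, without anyone needing to revoke it.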
- Remote access infrastructure
At Microsoft, we have designed and deployed a hybrid infrastructure to provide remote access for all the supported operating systems—using Azure for load balancing and identity services and specialized VPN appliances. We had several considerations when designing the platform:
- The service needed to be highly resilient so that it could continue to operate if a single appliance, site, or even large region failed.
- As a worldwide service meant to be used by the entire company and to handle the expected growth of VPN, the solution had to be sized with enough capacity to handle 200,000 concurrent VPN sessions.
- Homogenized site configuration. A standard hardware and configuration stamp was a necessity both for initial deployment and operational simplicity.
- Central management and monitoring. We ensured end-to-end visibility through centralized data stores and reporting.
- Azure AD–based authentication. We moved away from on-premises Active Directory and used Azure AD to authenticate and authorize users.
- Multi-device support. We had to build a service that could be used by as much of the ecosystem as possible, including Windows, macOS, Linux, and appliances.
- Being able to programmatically administer the service was critical. It needed to work with existing automation and monitoring tools.
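The resiliency and capacity considerations interact: if the service must keep working when a large region fails, the capacity that remains has to cover the session target on its own. The sketch below shows one way to check that. The regional capacity figures are hypothetical, and treating the 200,000-session target as a requirement that must hold during a regional failure is an assumption of this example, not a statement of how the service was actually sized.

```python
from typing import Mapping

TARGET_CONCURRENT_SESSIONS = 200_000  # sizing target stated in the text


def surviving_capacity(region_capacity: Mapping[str, int]) -> int:
    """Capacity left if the single largest region fails."""
    return sum(region_capacity.values()) - max(region_capacity.values())


def meets_target(
    region_capacity: Mapping[str, int],
    target: int = TARGET_CONCURRENT_SESSIONS,
) -> bool:
    """True when the remaining regions alone can carry the target load."""
    return surviving_capacity(region_capacity) >= target


if __name__ == "__main__":
    # Hypothetical per-region session capacities.
    plan = {"americas": 120_000, "emea": 100_000, "apac": 100_000}
    print(surviving_capacity(plan), meets_target(plan))
```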
When we were designing the VPN topology, we considered the location of the resources that employees were accessing when they were connected to the corporate network. If most of the connections from employees at a remote site were to resources located in central data centers, more consideration was given to bandwidth availability and connection health between that remote site and the destination. In some cases, additional network bandwidth infrastructure has been deployed as needed.