Find out What's Going On
VMware is glad to see that the Microsoft Exchange Server (and Performance) teams appear to have identified the prevalent cause of performance-related issues in an Exchange Server 2013 infrastructure. We have been aware for several years that Microsoft’s sizing recommendation for Exchange Server 2013 is the number one cause of the performance issues that have been reported to VMware since the release of Exchange Server 2013, and it is gratifying that Microsoft is acknowledging this as well.
In May of 2015, Microsoft released a blog post titled “Troubleshooting High CPU utilization issues in Exchange 2013” in which Microsoft acknowledged (for the first time, to our knowledge) that CPU over-sizing is one of the chief causes of performance issues on Exchange Server 2013. We wish to highlight the fact that the Exchange 2013 Server Role Requirements Calculator is the main culprit in this state of affairs. One thing we noticed with the release of Exchange Server 2013 and its accompanying “Calculator” is the increase in the compute resources it recommends when compared to similar configurations in prior versions of Exchange Server.
We at VMware have been drawing our customers’ attention to this anomaly and have been educating them to not take the “Calculator’s” recommendation as gospel. Unfortunately, not many customers like to buck Microsoft, especially in the face of strident claims of “This is our product and we are the experts”. Sadly, customers who have moved to Exchange Server 2013, using the Calculator’s recommendation (or the equally disruptive “Preferred Architecture” design from Microsoft) have been invariably hurt by the unsound recommendation.
Fortunately for everyone concerned, Microsoft appears to be moving in the right direction, if the recent blog post from the Exchange Server’s Principal PM is an indication of things to come. In his “Ask the Perf Guy: How big is too BIG?” post, Jeff Mealiffe expounded on the revelation in the “Troubleshooting High CPU utilization issues in Exchange 2013” blog post, and provided a handy chart of recommended Exchange Server 2013 Server CPU and Memory sizing:
In a nutshell, we recommend not exceeding the following sizing characteristics for Exchange 2013 servers, whether single-role or multi-role (and you are running multi-role, right?).
|Recommended Maximum CPU Core Count|| |
|Recommended Maximum Memory|| |
If you have upgraded your Exchange infrastructure to Exchange Server 2013, you will do yourself a lot of good if you find the opportunity to completely read the discussion presented in the links above.
A corresponding (and hopefully, more reasonable) “Calculator” has also been released to accompany this new recommendation – Exchange 2013 Server Role Requirements Calculator.
While we are glad that Microsoft has evolved in some ways and the Exchange team is now more open in discussing the inherent defects in Exchange Server 2013, we cannot but notice that Jeff et al. continue to push the “Combined Role” design recommendation, in spite of the fact that such a design unnecessarily complicates performance troubleshooting and hinders fault domain isolation. We at VMware once wondered what necessitated Microsoft’s change in design prescription around the time of Exchange Server 2013’s release (Microsoft previously championed a separated-roles design, with the exception of the CAS/HT roles). Our (speculative) conclusion was that it was the ONLY reasonable design option that the Microsoft Exchange Server team could propose in order to continue to justify the “Exchange is Better on Physical” design proposition favored by Microsoft. The “Better on Hardware” mindset is the basis of the Preferred Architecture.
One of the major issues addressed in Jeff’s and the previous blog posts is the way Exchange pre-allocates memory based on the number of CPU cores that it “sees”. We suspect that this is the main reason why the Exchange Team is not virtualization-friendly. Perhaps on some hypervisors, the virtualized Exchange server “sees” ALL the CPUs that the parent sees; hence, if the parent host sees 64 CPU cores, Exchange Server will count all 64 cores as needing to be accounted for in memory allocation, even if the Exchange VM itself has only been allocated, say, 8 vCPUs. We speculate. But this would be a logical rationale for Microsoft’s insistence on a proliferation of “itsy-bitsy-sized” siloed physical hardware for Exchange Server. For the avoidance of doubt, this is NOT a problem on the vSphere platform: the virtualized Exchange Server does NOT “see” more than the number of CPUs that it has been allocated, regardless of the size of the ESXi Host’s physical hardware.
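For readers who want to check what a guest actually “sees”, here is a minimal sketch in Python, run inside the VM, that compares the logical processor count reported by the operating system with the number of vCPUs you believe you allocated. The function names and the value of 8 vCPUs are illustrative assumptions, not part of any VMware or Microsoft tooling, and this is not how Exchange itself counts cores.

```python
import os


def visible_cpu_count() -> int:
    """Return the number of logical processors the guest OS reports."""
    count = os.cpu_count()
    if count is None:
        # os.cpu_count() may return None when the count cannot be determined.
        raise RuntimeError("The operating system did not report a CPU count")
    return count


def matches_allocation(visible: int, allocated_vcpus: int) -> bool:
    """True when the guest sees exactly the number of vCPUs it was allocated."""
    return visible == allocated_vcpus


if __name__ == "__main__":
    allocated_vcpus = 8  # hypothetical allocation, taken from the example above
    seen = visible_cpu_count()
    print(f"Guest OS reports {seen} logical processors")
    if not matches_allocation(seen, allocated_vcpus):
        print(f"Expected {allocated_vcpus}; review the VM's configuration")
```

If the reported count equals the allocation, the guest is not being exposed to the host's full core count, which is the behavior described above for vSphere.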
We would like to enthusiastically echo Jeff’s conclusion in his blog post:
It’s a fact that many of you have various constraints on the hardware that you can deploy in your datacenters, and often those constraints are driven by a desire to reduce server count, increase server density, etc. Within those constraints, it can be very challenging to design an Exchange implementation that follows our scalability guidance and the Preferred Architecture. Keep in mind that in this case, virtualization may be a feasible option rather than a risky attempt to circumvent scalability guidance and operate extremely large Exchange servers. Virtualization of Exchange is a well understood, fairly common solution to this problem, and while it does add complexity (and therefore some additional cost and risk) to your deployment, it can also allow you to take advantage of large hardware while ensuring that Exchange gets the resources it needs to operate as effectively as possible.
One more item of importance: PLEASE do not rely on the CPU/RAM sizing recommendation of the Exchange Calculator as the sole determinant of how much compute resources you allocate to your VIRTUALIZED Exchange servers. In addition to the Calculator’s recommendation being intentionally generous, its conservative maximum utilization target for the allocated resources is VERY PROBLEMATIC in a virtual environment. One of the major tenets of virtualization is resource sharing. In order to ensure equitable sharing of the pooled resources in a virtualized environment, it is important to ensure that a VM does not unnecessarily hog resources. A VM should have adequate access to the compute resources that it NEEDS whenever it needs them. However, the VM should not have more compute resources than it needs; otherwise it will contravene the principles of “equitable sharing” and “fairness” in the virtual environment.
Here is a sample of the recommended compute resources that the Exchange Server team released with the latest Exchange Server 2013 Calculator:
At its most extreme, the Exchange servers shown in the image above are NOT expected to exceed a 46% utilization threshold. In steady state operation, the target is 28% of allocated resources. In any scenario, these numbers would be considered gross wastage, with the type of ROI that gives a CFO persistent ulcers. In a virtualized environment, such gross under-utilization will be noticeably detrimental to the virtualized workloads. In our experience, baselining your virtualized Exchange workload at 70% of the Calculator’s recommended sizes has always been a prudent choice for our customers. One of the benefits of virtualization is that adjusting this allocation upwards is a trivial exercise that does not take more than 5 minutes of scheduled downtime. That is a much better proposition than oversizing and running into not just the issues described in the blog posts above, but also having the VM under-perform because it was not able to judiciously use its allocated monster-size resources. This is why we caution our customers against basing their Exchange Server virtualization projects on the prescriptions of the Microsoft Preferred Architecture (PA). The Preferred Architecture assumes that Exchange will be hosted on physical servers, so it has no notion of the “fairness” doctrine described above. Trying to retrofit a Preferred Architecture design onto a virtual environment invariably leads to severe performance issues.
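To make the 70% baseline concrete, here is a rough Python sketch of the arithmetic. The input values (20 cores, 96 GB) are hypothetical and are not taken from the Calculator output referenced above; the function names are our own, and the utilization projection assumes that CPU load scales linearly with core count, which is a simplification.

```python
def baseline_allocation(calc_cores: int, calc_memory_gb: int, percent: int = 70):
    """Scale the Calculator's recommendation to a starting VM allocation.

    Integer arithmetic is used so that results are rounded up to whole units
    without floating-point surprises.
    """
    cores = max(1, (calc_cores * percent + 99) // 100)
    memory_gb = max(1, (calc_memory_gb * percent + 99) // 100)
    return cores, memory_gb


def projected_utilization(calc_utilization_pct: float,
                          calc_cores: int,
                          allocated_cores: int) -> float:
    """Estimate CPU utilization on the smaller allocation.

    Assumes the same workload spread over fewer cores, with linear scaling.
    """
    return calc_utilization_pct * calc_cores / allocated_cores


if __name__ == "__main__":
    calc_cores, calc_memory_gb = 20, 96  # hypothetical Calculator output
    cores, memory_gb = baseline_allocation(calc_cores, calc_memory_gb)
    print(f"Baseline VM: {cores} vCPUs, {memory_gb} GB RAM")

    steady = projected_utilization(28, calc_cores, cores)
    peak = projected_utilization(46, calc_cores, cores)
    print(f"Projected steady-state CPU: {steady:.1f}%")
    print(f"Projected worst-case CPU: {peak:.1f}%")
```

Under these assumptions, the Calculator's 28% steady-state and 46% worst-case targets become roughly 40% and 66% on the reduced allocation, which still leaves headroom while wasting far less of the shared pool. Treat the output as a starting point to validate against your own monitoring data, not as a sizing rule.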