After upgrading to VMware ESX 3.5 and VirtualCenter 2.5, the SYSTEM process on our file server showed a high CPU load (100%). This occurred especially when we were copying files from the file server. Killing the SYSTEM process did not lower the CPU load.
The problem had the following characteristics:
- After a virtual machine was migrated to another ESX server, the SYSTEM process constantly took 100% CPU
- Killing the SYSTEM process did not help
- Rebooting the server did not lower the SYSTEM CPU usage
- Rebooting the VMware ESX server solved the problem only for a while
- VMware Distributed Resource Scheduler (DRS) is enabled on VMware ESX 3.5
After some research I found the solution (copied from the VMware site):
Starting with ESX Server 3.5 and VirtualCenter 2.5, VMware DRS applies a cap to the memory overhead of virtual machines to control the growth rate of this memory. This cap is reset to a virtual machine specific computed value after VMotion migrates the virtual machine. Afterwards, if the virtual machine monitor indicates that the virtual machine requires more overhead memory, VMware DRS raises this cap at a controlled rate (1MB per minute, by default) to grant the required memory until the virtual machine overhead memory reaches a steady-state and as long as there are sufficient resources available on the host.
For VirtualCenter 2.5, this cap is not increased to satisfy the virtual machine’s steady-state demand as expected. Thus, the virtual machine operates with an overhead memory that is less than its desired size, which in turn may lead to higher observed virtual machine CPU usage and lower virtual machine performance in a VMware DRS-enabled cluster.
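To get a feel for the controlled growth rate described above, here is a minimal sketch of the arithmetic. The function name and the sample figures (a 100 MB cap and a 130 MB requirement) are made up for illustration; only the default rate of 1 MB per minute comes from the text.

```python
import math


def minutes_to_reach(required_mb: float, cap_mb: float,
                     rate_mb_per_min: float = 1.0) -> int:
    """Whole minutes until a cap growing at a fixed rate covers the requirement.

    Hypothetical helper: it models only the linear growth described in the
    article, not the actual DRS algorithm.
    """
    if rate_mb_per_min <= 0:
        raise ValueError("growth rate must be positive")
    shortfall = max(0.0, required_mb - cap_mb)
    return math.ceil(shortfall / rate_mb_per_min)


if __name__ == "__main__":
    # Illustrative numbers only: a cap of 100 MB and a requirement of 130 MB.
    print(minutes_to_reach(required_mb=130, cap_mb=100))
```

With these assumed numbers the virtual machine would run short of overhead memory for about half an hour after a migration even when the cap grows as intended. When the cap does not grow at all, as described for VirtualCenter 2.5, the shortfall never closes.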
Diagnosing the issue
To diagnose the issue:
- Log in to VirtualCenter with Virtual Infrastructure Client as an administrator.
- Right-click your cluster from the inventory.
- Click Edit Settings.
- Disable VMware DRS.
- Click OK and wait for 1 minute.
- In the Virtual Infrastructure Client, note the virtual machine’s CPU usage from the Performance tab and the virtual machine’s memory overhead from the Summary tab.
- Right-click your cluster from the inventory.
- Click Edit Settings.
- Re-enable VMware DRS.
- Use VMotion to migrate a problematic virtual machine to another host.
- Note the virtual machine CPU usage and memory overhead on the new host.
- Disable VMware DRS on the cluster again, as noted above, and wait for 1 minute.
- Note the virtual machine CPU usage and memory overhead on the new host.
- If the virtual machine's CPU usage after the VMotion migration is higher than the first reading (taken with DRS disabled), and it drops back to roughly that first reading once DRS is disabled again, together with an observable increase in the overhead memory, this indicates the issue discussed in this article.
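The comparison at the end of the diagnosis can be sketched as a small check over the three readings you noted. This is only an illustration: the `Reading` type, the function name, and the 10-point CPU tolerance are assumptions, not part of the VMware article.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    """One manual observation from the Virtual Infrastructure Client."""
    cpu_percent: float
    overhead_mb: float


def indicates_overhead_cap_issue(baseline: Reading,
                                 after_vmotion: Reading,
                                 after_drs_off: Reading,
                                 cpu_tolerance: float = 10.0) -> bool:
    """Return True when the readings follow the pattern described above.

    baseline:      noted with DRS disabled, before the migration
    after_vmotion: noted on the new host with DRS enabled
    after_drs_off: noted after DRS was disabled again
    cpu_tolerance: assumed margin (percentage points) for "similar" CPU usage
    """
    cpu_rose = after_vmotion.cpu_percent > baseline.cpu_percent + cpu_tolerance
    cpu_recovered = (
        abs(after_drs_off.cpu_percent - baseline.cpu_percent) <= cpu_tolerance
    )
    overhead_grew = after_drs_off.overhead_mb > after_vmotion.overhead_mb
    return cpu_rose and cpu_recovered and overhead_grew
```

For example, readings of 20% CPU before the migration, 95% after it, and 22% after disabling DRS again, with the overhead memory rising in the last step, would match the pattern.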
You do not need to disable DRS to work around this issue.
Working around the issue
To work around this issue:
- Log in to VirtualCenter with Virtual Infrastructure Client as an administrator.
- Right-click your cluster from the inventory.
- Click Edit Settings.
- Ensure that VMware DRS is shown as enabled. If it is not enabled, check the box to enable VMware DRS.
- Click OK.
- Click an ESX Server from the Inventory.
- Click the Configuration tab.
- Click Advanced Settings.
- Click the Mem option.
- Locate the Mem.VMOverheadGrowthLimit parameter.
- Change the value of this parameter to 5 and click OK.
Note: By default this setting is set to -1.
Verifying the workaround
To verify the setting has taken effect:
- Log in to your ESX Server service console as root from either an SSH Session or directly from the console of the server.
- Type less /var/log/vmkernel.
A successfully changed setting displays a message similar to the following and no further action is required:
vmkernel: 1:16:23:57.956 cpu3:1036)Config: 414: “VMOverheadGrowthLimit” = 5, Old Value: -1, (Status: 0x0)
If changing the setting was unsuccessful, a message similar to the following is displayed:
vmkernel: 1:08:05:22.537 cpu2:1036)Config: 414: “VMOverheadGrowthLimit” = 0, Old Value: -1, (Status: 0x0)
Note: If you see a message changing the limit to 5 and then changing it back to -1, the fix is not successfully applied.
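If you would rather not page through the log by hand, the check above can be approximated with a short script. This is a sketch under assumptions: the regular expression is derived only from the two sample messages shown here, the function names are made up, and the real log format may differ between builds.

```python
import re
from typing import Iterable, Optional

# Matches entries such as: "VMOverheadGrowthLimit" = 5, Old Value: -1
# The quote after the name is optional and may be straight or curly.
LIMIT_CHANGE = re.compile(
    r'VMOverheadGrowthLimit["\u201c\u201d]?\s*=\s*(-?\d+),\s*Old Value:\s*(-?\d+)'
)


def last_limit(lines: Iterable[str]) -> Optional[int]:
    """Return the most recent value the limit was set to, or None if never set."""
    value = None
    for line in lines:
        match = LIMIT_CHANGE.search(line)
        if match:
            value = int(match.group(1))
    return value


def workaround_applied(lines: Iterable[str], expected: int = 5) -> bool:
    """True only if the last logged change left the limit at the expected value."""
    return last_limit(lines) == expected


if __name__ == "__main__":
    sample = [
        'vmkernel: 1:16:23:57.956 cpu3:1036)Config: 414: '
        '"VMOverheadGrowthLimit" = 5, Old Value: -1, (Status: 0x0)',
    ]
    print(workaround_applied(sample))
```

Because only the last matching entry counts, a change to 5 that is later followed by a change back to -1 is reported as not applied. To run it against the real log, pass an open file object for /var/log/vmkernel instead of the sample list.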
If the fix is unsuccessful, try the following:
- Create a new cluster and move the ESX Server hosts to this cluster.
- Check to see if the fix has been implemented successfully.
To fix multiple ESX Server hosts
If this parameter needs to be changed on several hosts (or if the workaround fails for an individual host), use the following procedure to implement the workaround instead of changing every server individually:
- Log on to the VirtualCenter Server Console as an administrator.
- Make a backup copy of the vpxd.cfg file (typically it is located in C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg).
- In the vpxd.cfg file, add the following configuration after the <vpxd> tag:
<cluster><VMOverheadGrowthLimit>5</VMOverheadGrowthLimit></cluster>
This configuration provides an initial growth margin, in MB, for virtual machine overhead memory. You can increase this amount to larger values if doing so further improves virtual machine performance.
- Restart the VMware VirtualCenter Server Service.
Note: When you restart the VMware VirtualCenter Server Service, the new value for the overhead limit should be pushed down to all the clusters in VirtualCenter.
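If you prefer to script the vpxd.cfg change rather than edit the file by hand, something along these lines might work. It is a sketch, not a tested procedure: the function name is made up, it assumes the file is well-formed XML containing a `<vpxd>` element, and it works on the file's text so you stay in control of reading, backing up, and writing the file.

```python
import xml.etree.ElementTree as ET


def add_overhead_limit(config_xml: str, limit_mb: int = 5) -> str:
    """Return the config text with <cluster><VMOverheadGrowthLimit> set.

    Hypothetical helper: inserts the <cluster> element as the first child
    of <vpxd> unless one is already present.
    """
    root = ET.fromstring(config_xml)
    vpxd = root if root.tag == "vpxd" else root.find(".//vpxd")
    if vpxd is None:
        raise ValueError("no <vpxd> element found in the configuration")

    cluster = vpxd.find("cluster")
    if cluster is None:
        cluster = ET.Element("cluster")
        vpxd.insert(0, cluster)  # place it directly after the <vpxd> tag

    limit = cluster.find("VMOverheadGrowthLimit")
    if limit is None:
        limit = ET.SubElement(cluster, "VMOverheadGrowthLimit")
    limit.text = str(limit_mb)

    return ET.tostring(root, encoding="unicode")
```

Running the function twice does not create a second `<cluster>` block; it only updates the value. Note that ElementTree's default parser drops XML comments and may change whitespace, so compare the result with your backup copy before restarting the service.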
This issue will be addressed in a future VMware VirtualCenter update release. The workarounds will not be needed in the update release and in any subsequent releases of VirtualCenter.
Hi, I have the same problem.
This is my environment:
Host: VMware Server 2: On Windows Server 2008 Standard x64
Guest1: Windows Server 2008 Standard x32
Guest2: Windows Server 2008 Standard x32
Both guests have a .NET web application running on them.
CPU usage is at 100% in each VM. The process is w3wp.exe, and it starts when I use the web application.
Any ideas?
Thanks
Enzote
Hi, this is Satish. To run a health check on the server, there is a free tool called vscope. You just need to install it and add the ESX or vCenter server; after opening the application, you can see the health status of the virtual machines.