Recently, I ran into a strange problem with one specific virtual server running an application and a database. This server tended to crash with a blue screen every day around lunchtime.
The server was a fully patched, up-to-date Windows Server 2008 R2 machine. The hypervisor was a fully patched, up-to-date Hyper-V server.
Using BlueScreenView, I found that the crashes had something to do with memory allocation, but the exact cause was still very unclear:
13122016 10:49:10 PAGE_FAULT_IN_NONPAGED_AREA 0x00000050 fffff680`003beb00 00000000`00000000 fffff800`016f3fec 00000000`00000002 ndistapi.sys ndistapi.sys+4009f50 NDIS 3.0 connection wrapper driver Microsoft® Windows® Operating System Microsoft Corporation 6.1.7600.16385 (win7_rtm.090713-1255) ntoskrnl.exe+70400
8-12-2016 13:13:11 IRQL_NOT_LESS_OR_EQUAL 0x0000000a fffff6fb`40001dd8 00000000`00000000 00000000`00000000 fffff800`016b8fec ntoskrnl.exe ntoskrnl.exe+70400 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.1.7601.23569 (win7sp1_ldr.161007-0600) ntoskrnl.exe+70400
1-12-2016 11:30:18 MEMORY_MANAGEMENT 0x0000001a 00000000`00003452 00000000`773cb000 fffff700`010c0400 02700007`c9d6d424 ntoskrnl.exe ntoskrnl.exe+70400 NT Kernel & System Microsoft® Windows® Operating System Microsoft Corporation 6.1.7601.23569 (win7sp1_ldr.161007-0600) ntoskrnl.exe+70400
Poolmon and RAMMap did not point to a memory leak in any of the installed applications. I had also updated the Integration Services to the latest version. Still, the problem remained.
Then, somewhat by luck, I stumbled upon the page file settings. A long time ago, a custom size (initial size 1024 MB, maximum size 32768 MB) had been set for the D: drive. Over the years, however, the D: drive had been filling up, and it now had only 20000 MB of free space available.
After I changed the paging file setting to “System managed size”, the system never rebooted unexpectedly again. The blue screens apparently had something to do with that page file. When I analysed the D: drive, I noticed that no page file (pagefile.sys) existed. The OS may have noticed the lack of free space and then decided not to create a page file at all.
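A quick way to catch this kind of misconfiguration is to compare the configured page file with what is actually on disk. The sketch below is a minimal Python example, assuming the configuration is stored in the `PagingFiles` value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`, with each entry a string such as `D:\pagefile.sys 1024 32768` (initial and maximum size in MB, `0 0` meaning system managed). The helpers `parse_entry` and `check_entry` are hypothetical names, and flagging a custom maximum that exceeds the free disk space is my own heuristic, not a rule Windows documents.

```python
from __future__ import annotations

import os
import shutil
import sys
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass
class PagingFileEntry:
    path: str
    initial_mb: int
    maximum_mb: int

    @property
    def system_managed(self) -> bool:
        # "0 0" (or no sizes at all) lets Windows choose the size itself.
        return self.initial_mb == 0 and self.maximum_mb == 0


def parse_entry(raw: str) -> PagingFileEntry:
    """Parse one PagingFiles string, e.g. 'D:\\pagefile.sys 1024 32768'."""
    parts = raw.split()
    initial = int(parts[1]) if len(parts) > 1 else 0
    maximum = int(parts[2]) if len(parts) > 2 else 0
    return PagingFileEntry(parts[0], initial, maximum)


def check_entry(entry: PagingFileEntry, free_mb: int, file_exists: bool) -> list[str]:
    """Return human-readable warnings for one page file entry (heuristic)."""
    warnings = []
    if not file_exists:
        warnings.append(f"{entry.path} is configured but does not exist")
    # Heuristic: flag a custom maximum larger than the free space on the drive.
    # Note that free_mb does not include space an existing page file already uses.
    if not entry.system_managed and entry.maximum_mb > free_mb:
        warnings.append(
            f"{entry.path}: maximum {entry.maximum_mb} MB exceeds {free_mb} MB free"
        )
    return warnings


def main() -> None:
    if sys.platform != "win32":
        print("This check only runs on Windows.")
        return
    import winreg  # Windows-only standard-library module

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        # PagingFiles is a REG_MULTI_SZ value, returned as a list of strings.
        raw_entries, _ = winreg.QueryValueEx(key, "PagingFiles")
    for raw in raw_entries:
        if not raw.strip():
            continue
        entry = parse_entry(raw)
        if entry.path.startswith("?"):
            print(f"{raw}: page file size managed automatically for all drives")
            continue
        drive = os.path.splitdrive(entry.path)[0] + "\\"
        free_mb = shutil.disk_usage(drive).free // MIB
        for warning in check_entry(entry, free_mb, os.path.exists(entry.path)):
            print("WARNING:", warning)


if __name__ == "__main__":
    main()
```

With the settings described above (a 32768 MB maximum on a drive with 20000 MB free, and no pagefile.sys on disk), `check_entry` would report both problems.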
My humble suspicion is as follows:
– The application running on the server reads the page file configuration from the system settings and assumes it can write to D:\pagefile.sys. It is not aware that the page file does not exist.
– At some point, the OS runs out of virtual memory.
– The application tries to write data to page file memory, which obviously is not possible. Result: a blue screen.
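To see whether a server is heading towards the second step of this theory, one option is to watch how much of the commit limit (roughly physical RAM plus page file size) is already in use. Below is a minimal Python sketch using the Win32 `GlobalMemoryStatusEx` call; per its documentation, `ullTotalPageFile` is the current committed memory limit (system or process, whichever is smaller) and `ullAvailPageFile` is how much the current process can still commit. The helper names and the 90% warning threshold are hypothetical choices for illustration.

```python
from __future__ import annotations

import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    # Layout of the Win32 MEMORYSTATUSEX structure (DWORD = 32-bit unsigned).
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def commit_usage_percent(commit_limit: int, commit_available: int) -> float:
    """Share of the commit limit that is already committed, in percent."""
    return 100.0 * (commit_limit - commit_available) / commit_limit


def near_commit_limit(
    commit_limit: int, commit_available: int, threshold: float = 90.0
) -> bool:
    # Hypothetical alert threshold; tune it for your own servers.
    return commit_usage_percent(commit_limit, commit_available) >= threshold


def read_commit_bytes() -> tuple[int, int]:
    """Return (commit limit, commit still available) in bytes (Windows only)."""
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    # Despite their names, these fields describe commit, not the page file alone.
    return status.ullTotalPageFile, status.ullAvailPageFile


def main() -> None:
    if sys.platform != "win32":
        print("This check only runs on Windows.")
        return
    limit, available = read_commit_bytes()
    pct = commit_usage_percent(limit, available)
    print(f"Commit: {pct:.1f}% of a {limit // (1024 * 1024)} MB limit in use")
    if near_commit_limit(limit, available):
        print("WARNING: close to the commit limit (virtual memory nearly exhausted)")


if __name__ == "__main__":
    main()
```

Without any page file, the commit limit is essentially just physical RAM, so a machine like this one would reach the warning threshold much sooner.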