Low pool memory

Basically, the two types are Non-Paged Pool Memory and Paged Pool Memory.

Overview

Two types of memory are used by the kernel and could run out. One is called “Non-Paged Pool” (“NPP”) memory, and the other is called “Paged Pool” memory.

[#npp32bit]: Problems for 32-bit operating systems

On 32-bit operating systems using pre-Vista kernels (that is, the 32-bit versions of Windows Server 2003 and Windows XP), Non-Paged Pool memory is limited enough that some systems will use it all up. This can cause software to malfunction, including critical parts of the operating system. The following symptoms may occur, quite possibly in this order:

  • On a web server using IIS, HTTP.SYS may become unable to operate as needed, so web pages fail to be delivered.
  • Backups may be affected.
  • As the memory becomes even more scarce, more issues may occur: Terminal Server sessions fail to start, and other login types also fail to start.
  • Already-logged-in sessions are unable to operate well: visible windows may not fully draw their pixels (especially in remote terminal sessions).
  • Remote management using RPC programs may fail.
  • The system cannot even shut down cleanly, and once it is restarted, the event logs may be misleading about when a reboot occurred.

The MSDN Blog post by David Wang about IIS6 failing to accept connections notes “the machine will blue screen.”

TechNet Blog: “Ask the Performance Team” blog entry about the MmSt Pool Tag lists some impacts.

[#iflowplm]: Detecting whether these memory types are too low

In a nutshell, at the time of this writing (in the year 2011 A.D.), the answer for a 64-bit Microsoft Windows operating system is generally the same: No. In such an operating system, so much memory is available to the kernel that computers are likely to run out of other types of memory first. Vista and newer may also be less likely to encounter the problem. Mark Russinovich's page on Paged and NPP shows this: XP and Server 2003 32-bit may be limited to 256MB, while 32-bit Vista and newer can use up to 2GB, and 64-bit (XP, 2003, Vista or newer) may use 128GB.
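As a rough sketch, the maximum NPP sizes quoted above can be kept in a small lookup table and compared against a measured value. The platform labels and the function name are hypothetical; the sizes are the approximate figures summarized above (treating “MB” and “GB” as binary units), and real maximums also vary with RAM, boot switches, and registry settings.

```python
# Approximate maximum Non-Paged Pool sizes, as summarized above.
# Actual limits depend on RAM, boot switches (such as /3GB), and registry settings.
MIB = 1024 * 1024
GIB = 1024 * MIB

NPP_MAX_BYTES = {
    "32-bit XP/2003": 256 * MIB,
    "32-bit Vista or newer": 2 * GIB,
    "64-bit": 128 * GIB,
}


def npp_percent_used(used_bytes: int, platform: str) -> float:
    """Return NPP usage as a percentage of the platform's approximate maximum."""
    return 100.0 * used_bytes / NPP_MAX_BYTES[platform]


if __name__ == "__main__":
    for platform in NPP_MAX_BYTES:
        pct = npp_percent_used(185_000_000, platform)
        print(f"{platform}: {pct:.2f}% of maximum")
```

With 185 million bytes in use, the 32-bit XP/2003 entry comes out at roughly 69% of its ceiling, while the 64-bit entry is a small fraction of one percent, which is why the question is mostly moot on 64-bit systems.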

Microsoft KB 912376 says, “The amount of available paged pool memory depends on several factors. These factors include boot switches such as /USERVA and /3GB, registry settings, and physical RAM.” Other factors may include the operating system version.

[#iflownpp]: NPP Reporting Threshold

Perhaps 190 million bytes will be a good number to report on. Otherwise, a slightly higher amount, like 200 million bytes, may be a good limit for alerting. This is a generalization: some systems may have problems even if substantially less memory is used. Here is how that general number was determined:

Looking at machines using 32-bit Windows Server 2003 or 32-bit WinXP

Note that not all machines will be able to function properly if 185 million bytes are used. Treating 185 million bytes as okay is based on the idea of about 256 MB of NPP memory being available, which is true for Windows 2000 systems using 1.5GB (or more) of memory. TechNet “Ask the Performance Team” Blog: “Understanding Pool Resources” (in the “Memory Management” section) shows, for Windows 2003 SP1, 254MB max NPP on a 1.5GB system and 252MB max on a 2GB system. (It seems odd that a 2GB system would have less NPP memory than a 1.5GB system.) Systems with less RAM may have noticeably less NPP memory. Systems with the /3GB switch active in \BOOT.INI may have only 128MB of NPP memory. The alerting threshold described here is unlikely to cause false alarms on any computer using these older operating systems, but lower alerting thresholds may be more useful on some systems.

TechNet: When “Nonpaged Pool is over the warning threshold” discusses that the /3GB switch may cause issues to occur if Non-Paged Pool usage is 100MB or more.

TechNet: When “Nonpaged Pool is over the warning threshold” says, “On a healthy Exchange server, unless a backup or restore is occurring, there should be no more than 85 MB of non-paged pool memory being used.” Experience has shown that number may be a bit on the low side: perhaps the quote was meant to apply to a server that only runs Exchange and no other operations such as AD, file serving, etc. Checking real-world systems running Windows Server 2003 may show that a normal system may, somewhat commonly, use nearly 185 million bytes, without reaching or exceeding that amount.

Some web pages mention 80% of pool memory being used as a problem point. Examples are Microsoft KB 312362 (and, for additional reading, the similar Microsoft KB 304101), which mention changing a threshold from 80% to 60%. Also, CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served) says, “Sometimes connections get refused when it appears like there is more than 20MB of NPP left.  So it may be more like 20% of the total NPP memory rather than 20MB.  Perhaps when NPP has become 80% depletion, http.sys will begin to refuse client connections.”

256MB = 268,435,456 bytes, and 80% of that would be 214,748,364.8 bytes. (Huh? .8 of a byte? That would be 6.4 bits...) 214,748,364.8 - 185,000,000 = 29,748,364.8, so there seems to be a margin of about 28MB. Since 185,000,000 was determined by reviewing actual systems from various organizations, a threshold partway into that margin seems reasonable: adding half of the margin to the lower number, 185 million bytes, results in a value of about 199.9 million bytes (just under 200 million).

TechNet: When “Nonpaged Pool is over the warning threshold” notes that some systems will use an alerting threshold of 200 MB, which would be 209,715,200 bytes. However, that is only 4.8MB away from the problematic value of 80% of 256MB. If a bit earlier warning can be given, why not use it?
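The arithmetic in the last two paragraphs can be checked with a few lines of Python. This only restates the numbers above; “MB” here means binary megabytes (1,048,576 bytes), as in the text, and the variable names are just labels for this example.

```python
MIB = 1024 * 1024

npp_max = 256 * MIB                  # 268,435,456 bytes
problem_point = 0.80 * npp_max       # 80% of NPP: 214,748,364.8 bytes
observed_normal = 185_000_000        # highest "normal" usage seen on real systems

margin = problem_point - observed_normal       # 29,748,364.8 bytes
midpoint = observed_normal + margin / 2        # about 199.87 million bytes

technet_threshold = 200 * MIB                  # 209,715,200 bytes
gap_to_problem = problem_point - technet_threshold  # 5,033,164.8 bytes

print(f"80% of NPP:         {problem_point:,.1f} bytes")
print(f"Margin:             {margin:,.1f} bytes ({margin / MIB:.2f} MB)")
print(f"Midpoint threshold: {midpoint:,.1f} bytes")
print(f"200 MB threshold is {gap_to_problem / MIB:.1f} MB below the 80% point")
```

The midpoint lands just under 200 million bytes, and the gap between a 200 MB threshold and the 80% point works out to exactly 4.8 MB.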

In addition to TechNet: When “Nonpaged Pool is over the warning threshold”, there is also TechNet: When “Nonpaged Pool is over the error threshold”.

Parallels: KB on troubleshooting paged and nonpaged memory pool shortage says, “Values higher than 220,000 KB are considered dangerous for the system stability (x86 platform only).”

So, to summarize, an alerting threshold of at least 185 million bytes, and perhaps no more than 200 MB (using traditional binary megabyte measurements) or even 200 million bytes, may be ideal.
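A monitoring script could apply the numbers from this section roughly as follows: report at about 190 million bytes and alert at about 200 million bytes. The function name and the “ok”/“report”/“alert” labels are hypothetical, and the defaults are only the general figures discussed above; systems with less NPP (for example, those using /3GB) would need lower values.

```python
REPORT_BYTES = 190_000_000   # "a good number to report on"
ALERT_BYTES = 200_000_000    # suggested alerting threshold


def classify_npp(used_bytes: int,
                 report_bytes: int = REPORT_BYTES,
                 alert_bytes: int = ALERT_BYTES) -> str:
    """Classify a Non-Paged Pool reading (in bytes) as 'ok', 'report' or 'alert'."""
    if used_bytes >= alert_bytes:
        return "alert"
    if used_bytes >= report_bytes:
        return "report"
    return "ok"


if __name__ == "__main__":
    for reading in (120_000_000, 195_000_000, 236 * 1024 * 1024):
        print(f"{reading:>12,} bytes: {classify_npp(reading)}")
```

Keeping the thresholds as parameters makes it easy to lower them per machine without changing the logic.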

Considering 64-bit machines and those using newer versions of Windows

A sensible starting point for an alerting threshold may be to match the starting threshold used for 32-bit operating systems. Microsoft KB Q294418: Comparison of 32-bit and 64-bit memory architecture for 64-bit editions of Windows XP and Windows Server 2003 shows higher limits for 64-bit Windows. One might wonder why to bother alerting at 200 million bytes if the limits are not nearly as likely to be exhausted.

Simple: even such a low alerting threshold is probably not too low if it produces no false positives. If a machine normally doesn't exceed that amount of memory and then starts to, that is, well, a change. Even if it isn't threatening system stability, knowing about and understanding changes may be worthwhile, so an alert that prompts investigation may be worthwhile. Also, if a driver is wasting a limited resource, even when the limit is large enough that system stability isn't affected, knowing about the improper memory handling may be nicer than being ignorant of it. MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” says, “64bit(x64&ia64) machines have less of a problem here due to their larger address space but there are still limits and thus no free lunch.”

NPP usage during backups

Note, though, that NPP usage may commonly increase during backups, and if that routinely happens on a 64-bit operating system then there is NOT a problem. In such a case, using a much higher alerting threshold (or even abandoning this specific monitoring) may be perfectly sensible.
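The reasoning about alerting on a change from a machine's normal behavior, while tolerating known backup activity, might be sketched like this. The baseline peak, the backup window, the allowance, and both function names are hypothetical; a real check would take them from the monitoring system's own history and schedule.

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_alert(used_bytes: int, baseline_peak_bytes: int, now: time,
                 backup_start: time, backup_end: time,
                 backup_allowance_bytes: int = 0) -> bool:
    """Alert when NPP use exceeds the machine's usual peak.

    During the backup window an extra allowance is added, because NPP use
    may routinely rise while backups run.
    """
    limit = baseline_peak_bytes
    if in_window(now, backup_start, backup_end):
        limit += backup_allowance_bytes
    return used_bytes > limit


if __name__ == "__main__":
    # Hypothetical machine: usual peak 190 million bytes, backups 22:00 to 04:00.
    print(should_alert(210_000_000, 190_000_000, time(14, 0), time(22, 0), time(4, 0)))
    print(should_alert(210_000_000, 190_000_000, time(23, 0), time(22, 0), time(4, 0),
                       backup_allowance_bytes=50_000_000))
```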

[#msdnsnpp]: DNS server

If the issue seems to be the DNS server...

TechNet: Configuring the (DNS) socket pool shows that 18MB for sockets plus 72MB for buffers may use up 90MB. That could easily be the largest user of memory when an alert indicates that at least 195MB is being used. Either accept the situation and adjust the alerting, or modify the behavior of the DNS server. For information on checking and/or modifying this memory usage, see Microsoft DNS Server: Socket Pool/Size.
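To put those DNS figures in proportion, a quick calculation (treating MB as binary megabytes; the 195MB reading is just the example level from the paragraph above):

```python
MIB = 1024 * 1024

socket_pool = 18 * MIB              # DNS socket pool, per the TechNet figures above
buffers = 72 * MIB                  # associated buffers
dns_total = socket_pool + buffers   # 90 MB

alert_level = 195 * MIB             # NPP use reported by the alert in this example
share = dns_total / alert_level     # fraction of the reading attributable to DNS

print(f"DNS server: {dns_total // MIB} MB, {share:.0%} of {alert_level // MIB} MB")
```

Nearly half of such a reading could come from the DNS server alone, which is why it is worth checking before suspecting a leaking driver.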

Paged Pool Memory (Alerting Threshold)

This may not be quite as critical to alert on (compared to NPP), as issues from Paged Pool Memory exhaustion/depletion may be more rare.

This might not be quite as precise as the NPP guidance, but here are some pointers to information for anybody interested in more details: Microsoft KB Q294418: Comparison of 32-bit and 64-bit memory architecture for 64-bit editions of Windows XP and Windows Server 2003 shows 32-bit versions of those operating systems may have a limit of 470MB. MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” mentions that problems may exist around 200MB instead of “around 460MB”, and provides information about how to get the actual maximum for a system.

TechNet: Paged pool is over the warning threshold provides some maximum limits: “In Windows 2000 Server and Windows 2000 Advanced Server, the maximum value for paged pool memory is 470 MB. When the /3GB switch is added to a computer running Windows 2000 Advanced Server, the maximum value for paged pool memory is 192 MB. On a computer running Windows Server 2003 without the /3GB switch, the maximum value for paged pool memory is 491 MB. On a computer running Windows Server 2003 with the /3GB switch, the maximum value for paged pool memory is 256 MB.”

TechNet “Ask the Performance Team” Blog: “Understanding Pool Resources” (in the “Memory Management” section) seems to show some lower numbers; as low as 160MB for Win2K systems that do use Terminal Services and which haven't had a specific registry change made.

MS KB 912376 states, “Under standard load, there should be approximately 50 MB of available paged pool memory. If you have less than 30 megabytes free, you should take immediate steps to reduce the load on the server.”
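Combining the TechNet maximums with the KB 912376 guidance gives a simple way to judge free paged pool. This is a sketch under stated assumptions: the maximums are the TechNet figures quoted above (treated as binary MB), the real maximum on a given system may be lower (as low as 160MB in the Terminal Services case mentioned above), and the function name and status labels are hypothetical. When the actual maximum is known (for example, from Process Explorer with symbols configured), it is the better number to use.

```python
MIB = 1024 * 1024

# Paged pool maximums quoted from TechNet above, keyed by (OS, /3GB in use).
PAGED_POOL_MAX = {
    ("Windows 2000 Server", False): 470 * MIB,
    ("Windows 2000 Advanced Server", False): 470 * MIB,
    ("Windows 2000 Advanced Server", True): 192 * MIB,
    ("Windows Server 2003", False): 491 * MIB,
    ("Windows Server 2003", True): 256 * MIB,
}


def paged_pool_status(os_name: str, uses_3gb: bool, used_bytes: int) -> str:
    """Compare free paged pool with KB 912376: about 50 MB normal, under 30 MB urgent."""
    free = PAGED_POOL_MAX[(os_name, uses_3gb)] - used_bytes
    if free < 30 * MIB:
        return "act now"
    if free < 50 * MIB:
        return "watch"
    return "ok"


if __name__ == "__main__":
    print(paged_pool_status("Windows Server 2003", True, 210 * MIB))
```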

Determining current values
Graphical methods
Task Manager
On the “Performance” tab of Task Manager (TaskMgr.exe) is a section called “Kernel Memory”. (The “Total” may simply be the sum of the Paged and Nonpaged values. However, the displayed Paged and Nonpaged values are apparently truncated, since the Total might be 1 higher than the sum of the displayed Paged and Nonpaged values.)
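Here is a small illustration of how truncation could make the displayed Total one higher than the sum of the displayed parts. The byte counts are made up for the example, and the sketch assumes each figure is converted from bytes to kilobytes by dividing by 1024 and dropping the remainder, which matches the behavior described above but is not documented here.

```python
paged_bytes = 1_264_230      # hypothetical example values
nonpaged_bytes = 581_325

paged_k = paged_bytes // 1024                      # 1234 (1234.6 truncated)
nonpaged_k = nonpaged_bytes // 1024                # 567 (567.7 truncated)
total_k = (paged_bytes + nonpaged_bytes) // 1024   # 1802 (1802.3 truncated)

# The two dropped fractions (0.6 + 0.7) add up to more than one whole kilobyte,
# so the separately truncated parts sum to one less than the truncated total.
print(paged_k, nonpaged_k, total_k, paged_k + nonpaged_k)
```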
Performance Logs/Alerts
...
Process Explorer

(See information on Process Explorer by Sysinternals.)

Either go to the View menu and choose System Information, or get to the same System Information screen by pressing Ctrl-I. There is a “Kernel Memory” section.

Potentially even better than Task Manager, Process Explorer may be able to show useful information in the “Paged Limit” and “Nonpaged Limit” fields. However, it can only do that if symbols are available, and they probably aren't. To get them, obtain the symbol files (which might be available by obtaining a file located after visiting MSDN: Download and Install Debugging Tools for Windows), and then in Process Explorer, set the location of the symbol files by going to the Options menu and choosing “Configure Symbols...”.

Debugging software
...
Text mode/command line
WMIC or Poolmon
See info about what individual processes are using up pool memory. (Then add the values?)
Debugging Tools
KB 970054 shows that using !vm may “confirm pool resource depletion.”
Need review/clean-up:

If things are operating well enough to determine how much NPP is being used (e.g. if Task Manager is running well), check out the amount of NPP memory being used.

Need review/clean-up: Microsoft KB Q918643: How to troubleshoot a memory leak or an out-of-memory exception in the BizTalk Server process notes, “However, the /3GB switch allows for only 1 GB of addressable memory for kernel mode operations. Additionally, this switch may increase the risk of running out of pool memory.” MS KB 815372: section about the /3GB switch enabled in the \boot.ini file says “Using this switch reduces the memory available in” the Nonpaged Pool, the Paged Pool, and the system pool called “System Page Table Entries” (“PTEs”).

Need review/clean-up: There may also be an option involving Performance Log and Counter. This might not be the quickest way to get information when there is a known problem, but it may be a very good way to automatically detect when a problem is starting to get out of hand. (Further information would be good to have here...)

[#rsplwplm]: Responding to the issue(s) of low pool memory

MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” says, “This issue is commonly misdiagnosed, however, 90% of the time it is actually quite possible to determine the resolution quickly without any serious effort at all!”

Perhaps Tate meant that no significant effort taking many hours is needed. Following the steps in this guide may take 15 minutes to an hour, but the guide discusses what is generally the best way to handle this sort of situation to minimize current and future downtime.

[#nppresp]: Responding to the issue(s) from NPP (and perhaps Paged too?)

(Note: This section might now also suitably cover Paged Pool Memory usage.)

[#nppdonow]: Short term: Actions to take quickly
Log In

This is perhaps specific to low NPP memory (and less of an issue when paged pool memory is low): create a login session as soon as possible, because creating one may already be difficult and may become more difficult later. (So if the issue is currently happening, create the login session, even before reading the rest of this text!)

If the server is remote, then logging into the server will be desired. Try to do this quickly so that a login session is successfully created, before such a login session cannot be successfully created. If one type of login session (such as a Terminal Services session from a remote location) cannot be created, then trying to use another type of login session might work.

The error text that may exist is shown by Microsoft KB Q272568: “User Environment
Windows cannot logon you because the profile cannot be loaded. Contact your network administrator.
DETAIL - Insufficient system resources exist to complete the requested service.”

It might be possible to regain a bit of this type of memory by using RPC to stop non-critical services. If RPC is failing, trying again shortly (within 30 seconds to a few minutes) might work (possibly due to changes occurring on the server). If so, once RPC works, immediately try to identify services on the machine which may be stopped without causing the entire machine to become unresponsive to remote maintenance. CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served) says, “Since the usual cause of an NPP leak is an outdated driver, try stopping various services one at a time (such as antivirus services or backup software services)”. (Although it is generally not recommended to shut off anti-malware software, keep in mind that this NPP exhaustion may be affecting how well programs work, so both malware protection software and (happily) malware might be unable to work as designed. Stopping a backup process may cause a failed backup job, which may be frowned upon, but if these sorts of issues are occurring then the backup job may not complete successfully anyway.) Once as much software as can be stopped has been stopped, try again to create a login session. Even if that login session works, there isn't an abundance of free memory in the exhausted pool, so work quickly to perform the following critical steps.

Be the hero: eliminate noticeable troubles
Low NPP memory
Helping IIS

First, a note for IIS web servers. Immediate relief might be quickly available. The relief may be (very) temporary unless additional steps are taken to prolong the relief and to prevent additional major malfunctions on the computer. In a nutshell, in the registry key called HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters\ proceed to set a DWORD value called EnableAggressiveMemoryUsage to use the data of 1, and then stop the HTTP service and then restart IIS services. All of this may be done by running the following commands:

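REM Save the current value (if any) so the change can be reversed later
REG QUERY HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters /v EnableAggressiveMemoryUsage >> oldcfg.txt
REM Replace the value with DWORD 1 (/f suppresses the confirmation prompts)
REG DELETE HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters /v EnableAggressiveMemoryUsage /f
REG ADD HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters /v EnableAggressiveMemoryUsage /t REG_DWORD /d 1 /f
REM Restart HTTP.SYS and IIS so the new setting is read
net stop http /y
iisreset /restart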

Now, if that worked, realize that only a small amount of time may have been gained. HTTP.SYS may have been refusing connections because available NPP memory was below the limit it uses. Adjusting this limit may get the web server to start serving web pages again.

This limit may be an artificial limit which may be adjustable. MS KB820129: Http.sys registry settings for IIS says, “By default, the HTTP service stops accepting connections when less than 20 megabytes (MB) of non-paged pool memory is available.” CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served) says, “Sometimes connections get refused when it appears like there is more than 20MB of NPP left.  So it may be more like 20% of the total NPP memory rather than 20MB.  Perhaps when NPP has become 80% depletion, http.sys will begin to refuse client connections.” (Microsoft KB 945977 refers to 30MB.)

However, according to the same documentation that cites the 20MB limit, the adjustment above changes the limit so that things start breaking only when less than 8MB is free. (This new limit of 8MB instead of 20MB is documented by Microsoft KB820129: Http.sys registry settings for IIS and by Microsoft KB 934878: Users receive a “The page cannot be displayed” error message, and “Connections_refused” entries are logged in the Httperr.log file on a server that is running Windows Server 2003, Exchange 2003, and IIS 6.0.) On a system with 256 MB of NPP memory (a common value for some operating systems), things break under the default limit once 236MB is in use. If that happened because NPP memory was being used up rapidly, adjusting the limit so that things break only after 248MB is used up is... temporary relief, at best.
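The headroom gained by the adjustment is easy to work out. This sketch assumes 256 MB of NPP (binary megabytes) and the 20MB and 8MB free-memory limits cited above; the variable names are only labels for the example.

```python
MIB = 1024 * 1024
npp_max = 256 * MIB

default_floor = 20 * MIB      # HTTP.SYS default: refuse connections below 20 MB free
aggressive_floor = 8 * MIB    # limit documented for EnableAggressiveMemoryUsage = 1

breaks_at_default = npp_max - default_floor        # 236 MB in use
breaks_at_aggressive = npp_max - aggressive_floor  # 248 MB in use
extra = breaks_at_aggressive - breaks_at_default   # 12 MB of extra room

print(breaks_at_default // MIB, breaks_at_aggressive // MIB, extra // MIB)
```

Only 12 MB of extra room is gained, which a leak that has already consumed most of the pool may use up quickly.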

Time will still be fairly short, but at least for a small business that uses the web server fairly lightly, this may end the user-noticeable downtime caused by the web server not operating right. So, adjust that limit, but then work quickly so more, substantial problems don't crop up.

Turning off TCPChimney?

Note: This advice came from what appears to be a reliable source, although following it may be time-consuming and it hasn't seemed to help. Unlike much of the rest of this guide, the following advice should be considered rather unverified, but potentially a quick and easy way to resolve a problem (without requiring an immediate server reboot). CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served) says, “If the server is on SP2 for Windows 2003, it may be a very good idea to try disabling the TCPChimney. This can be done without a reboot per kb 945977. Try disabling the TCPChimney, run an IISRESET, wait a few seconds, and test” (for improvement). “If this ‘trick’ works, perhaps your NIC drivers need to be updated and the TCPChimney can be enabled later.”

To try this, perhaps the following might work? First, see if the current settings may be adjusted:

netsh int ip show /?
netsh int ip set /?

Then, Microsoft KB 945977 would suggest running:

netsh int ip set chimney DISABLED

Finally, try restarting the Internet services and follow the rest of the advice from CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served).

Set the EnableTCPChimney value to 0 (MS KB 912222 documents that this should be a DWORD value). The /f switch overwrites an existing value without prompting:

REG ADD HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v EnableTCPChimney /t REG_DWORD /d 0 /f
Other causes

Other than the above scenario(s), and unlike scenarios involving exhausted Paged Pool Memory, there may not be many ways to get temporary relief from exhausted Non-Paged Pool Memory. The best bet may be to just take care of the underlying issue. MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” says, “NonPaged pool size is not configurable other than the /3GB boot.ini switch which lowers NonPaged Pool’s maximum.”

Helping with low Paged Pool Memory

MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” says, “Paged Pool size is often able to be raised to around its maximum manually via the PagedPoolSize registry setting”. Microsoft KB 312362 provides, for 32-bit Windows Server 2003, an option as well as a warning about a danger to this approach in the rare case of having 32-bit Windows Server 2003 on a system with 64 GB of RAM: Misuse may result in continuous reboots. Microsoft KB 304101 provides similar details and also more details about older operating systems.


Get details while they are available

Generally, it is best to try to gather information before rebooting the server. If services are not operational, there may be substantial pressure to get things operational quickly (even if that means rebooting the server). Although rebooting does tend to make things temporarily operate again, the problems are typically caused by situations that will repeat themselves (possibly hours, or perhaps months) down the road.

Prevent re-occurrences: The next step is generally to try to gather information to identify what is causing the leak, so that the issue may be effectively resolved in the future.

While gathering the information, don't be afraid to consider what is being seen. If a program is clearly causing problems, restarting just that program may help. However, if no problems are readily apparent, the gathered information can be analyzed later. At least gather the information so that it is available for manual review (possibly after things are working). Here are some of the details that can really help, so start storing them in files (which may be reviewed later if needed):

Programs using large amounts of (Non-)Paged Pool Memory

Determine what programs are using large amounts of pool memory. Solutions may include:

WMIC

Perhaps:

WMIC Process Get Caption,CommandLine,Description,Handle,HandleCount,ParentProcessId,PeakWorkingSetSize,ProcessId,QuotaNonPagedPoolUsage,QuotaPagedPoolUsage,QuotaPeakNonPagedPoolUsage,QuotaPeakPagedPoolUsage,ThreadCount,VirtualSize,WorkingSetSize
WMIC server get PoolNonpagedBytes,PoolNonpagedFailures,PoolNonpagedPeak,PoolPagedBytes,PoolPagedFailures,PoolPagedPeak
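To find the largest consumers, the per-process output can be saved in CSV form and sorted. A minimal sketch, assuming the output was produced with something like `WMIC Process Get Caption,ProcessId,QuotaNonPagedPoolUsage,QuotaPagedPoolUsage /format:csv > procpool.csv` (the file name is hypothetical, as are both function names). Redirected WMIC output is often UTF-16, so the reader checks for a byte order mark. Note that these per-process quota figures only cover pool memory charged to processes; drivers can allocate pool memory that is not charged to any process, so adding up the values may not match the system-wide totals.

```python
import csv
import io


def read_wmic_output(path: str) -> str:
    """Read a saved WMIC file; redirected WMIC output is often UTF-16 with a byte order mark."""
    with open(path, "rb") as handle:
        data = handle.read()
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return data.decode("utf-16")
    return data.decode("utf-8-sig", errors="replace")


def top_pool_users(csv_text: str, column: str = "QuotaNonPagedPoolUsage", count: int = 10):
    """Return (Caption, ProcessId, value) tuples sorted by the given pool column, largest first."""
    # WMIC CSV output typically starts with a blank line; drop empty lines first.
    lines = [line for line in csv_text.splitlines() if line.strip()]
    rows = csv.DictReader(io.StringIO("\n".join(lines)))
    results = []
    for row in rows:
        value = (row.get(column) or "").strip()
        if value.isdigit():
            results.append((row["Caption"], int(row["ProcessId"]), int(value)))
    results.sort(key=lambda item: item[2], reverse=True)
    return results[:count]
```

Column order does not matter because the header row is used to find each column, and passing `column="QuotaPagedPoolUsage"` ranks processes by paged pool instead.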
TaskMgr
On the Processes tab, see if there is a column for the desired type(s) of memory, such as “NP Pool” (for NPP) and “Paged Pool” (for Paged Pool Memory). If not, select View, “Select Columns...”, and then find the related options: making the “NP Pool” column appear is done with the checkbox labelled “Memory - Non-paged Pool”. Similarly, the “Paged Pool” column is enabled with the checkbox labelled “Memory - Paged Pool”. (While on this screen, also add columns for the following if they are available, using these exact names or anything similar: “PID (Process Identifier)”, Handles, Threads, and “Command Line”.)
User mode dump heap (Umdh)

(This information has not been heavily tested by this guide, but is being provided for reference.)

Microsoft KB 268343: Umdhtools.exe: How to use Umdh.exe to find memory leaks

Further info to review: TechNet Blog by YongRheeNC: How to troubleshoot the use of paged pool kernel memory (event id 2020) in Windows Server 2003, ...
Other resources
Handles

In addition to checking for high NPP usage, also record how many “Handles” are used by each process.

Services and PIDs
While details are being gathered, go ahead and save a copy of information about which Microsoft Windows services are running in each svchost.exe PID. This may be less likely to be useful, but record it now, since it may not be so easily available later. To do so, run:

TaskList.exe /SVC >> pidsvchs.txt
find /V "N/A" "pidsvchs.txt"
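If the saved file needs to be searched later (for example, to see which services share a svchost.exe process that shows high pool usage), the text could be parsed. A sketch that assumes the usual `tasklist /svc` layout, where a row of = characters under the header marks the column widths and long service lists wrap onto indented continuation lines; that layout is assumed rather than documented here, so check it against the actual file. The function name is hypothetical, and because the command above appends with >>, it reads only the first snapshot in the file.

```python
import re


def parse_tasklist_svc(text: str) -> dict:
    """Map each PID to (image name, [services]) from `tasklist /svc` text output."""
    lines = text.splitlines()
    # Column positions come from the row of '=' characters under the header.
    sep_index = next(i for i, line in enumerate(lines) if line.startswith("="))
    spans = [m.span() for m in re.finditer(r"=+", lines[sep_index])]
    (name_start, name_end), (pid_start, pid_end), (svc_start, _) = spans[:3]

    result = {}
    current_pid = None
    for line in lines[sep_index + 1:]:
        if not line.strip():
            break  # a blank line starts the next appended snapshot (or ends the file)
        name = line[name_start:name_end].strip()
        pid_text = line[pid_start:pid_end].strip()
        services = [s.strip() for s in line[svc_start:].split(",") if s.strip()]
        if pid_text.isdigit():
            current_pid = int(pid_text)
            if services == ["N/A"]:
                services = []
            result[current_pid] = (name, services)
        elif current_pid is not None:
            # A wrapped line continues the previous process's service list.
            result[current_pid][1].extend(services)
    return result
```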
Noting the Pool Tags being used
[#getpltag]: Finding which Pool Tags are used

First, “Tag Mode” needs to be enabled.

One way to get this information is to use Poolmon.

Identifying which memory tags are using up Non-Paged Pool Memory by using Poolmon

Poolmon may be a great resource if it is available, as it will not only show the memory use but also the memory tags. (It does not directly show the filenames, though, so using other tools may gather information that is more quickly useful.) However, it often is not installed by default.

Poolmon's Requirements

Poolmon Remarks says “To see the entire Poolmon display, the command window screen must be at least 80 characters wide (width=80) and at least 53 rows high (height=53); and the command window screen buffer must be at least 500 characters wide (width=500) and at least 2000 rows high (height=2000). Otherwise, the display might be truncated.”

Poolmon Requirements lists poolmon.exe, msdis130.dll, msvcp70.dll, msvcr70.dll, and pooltag.txt.

However, pooltag.txt is likely optional. On a Vista x64 machine, simply extracting the poolmon.exe file (which needed to be renamed) resulted in a perfectly working program.

Obtaining PoolMon

If PoolMon is not included in the operating system, obtain it.

  • One way to get it may be to obtain the Support Tools for an operating system. For example, see the Download Details page (which requires verification before downloading) for the Win XP SP2 Support Tools (which include Poolmon).

  • Win2K/XP/2003: See \Support\Tools folder on CD-ROM. Win NT 4.0: See the resource kit.

  • WDK Kit info lists a WDKPath\tools\amd64\poolmon.exe, and a forum posting does say an x64 version comes with the WDK. (The download page says “This release of the Windows Driver Kit is available only as a DVD ISO image.”) Rather than use that download link, check the above version, because this software might be updated frequently. However, note that if this release is only available as a DVD ISO image, chances are other versions are distributed that way too.

    If the ISO image is obtained, take a look at the \WDK\generaltools_*fre*.* files.

    When installing (using KitSetup.exe), choose “Tools” underneath the “Full Development Environment” section. Then perhaps look under C:\WinDDK\*\Tools\Other\.

  • Older releases are available from Debugging Tools for Windows 64-bit Version (which now redirects to MSDN: Debugging Tools for Windows 64-bit Version). Be sure to get the Native x64 release if that is what is desired. However, note that version 6.11.1.404.msi does not seem to have Poolmon.
Using PoolMon's Command Line parameters

Using command line parameters may be nice because the output won't require scrolling, and a text file of output may be searched without the screen updating regularly. Use:

poolmon -?
poolmon -b -n poolsnap.log

The parameter after -n may be optional and defaults to the filename poolsnap.log. If the file already exists, early tests indicate that the file time gets updated but the contents don't seem to be overwritten.

Actually, superior results (meaning results that provide more useful information) are given by also using -g pooltag.txt, but only if the specified file is available. (This has been seen on 32-bit operating systems.)

To show only non-paged pool memory, either use:

poolmon -b -n poolsnap.log
find /v " Paged" poolsnap.log

... or use ...

poolmon -b -p -n npp.log

To show only paged pool memory, either use:

poolmon -b -n poolsnap.log
find " Paged" poolsnap.log

... or use ...

poolmon -b -p -p -n pagdpool.log

Although the latter options are more precise and may be better for automation, the earlier method may be nicer for creating a text file that can be more useful later.
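Filtering and sorting a snapshot file can also be scripted instead of using find. A minimal sketch, assuming each data line of poolsnap.log starts with the tag, then the pool type (Nonp or Paged), then the Allocs, Frees, Diff, and Bytes columns, as in the Poolmon display without the change columns. That layout is an assumption, so check it against a real file; tags containing internal spaces would not match this simple pattern, and the function name is hypothetical.

```python
import re

# Tag, Type, Allocs, Frees, Diff, Bytes (assumed column order in the snapshot file).
LINE_RE = re.compile(r"^\s*(\S+)\s+(Nonp|Paged)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\d+)")


def top_tags(snapshot_text: str, pool_type: str = "Nonp", count: int = 10):
    """Return (tag, bytes) pairs for one pool type ("Nonp" or "Paged"), largest first."""
    entries = []
    for line in snapshot_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2) == pool_type:
            entries.append((match.group(1), int(match.group(6))))
    entries.sort(key=lambda entry: entry[1], reverse=True)
    return entries[:count]
```

Header lines do not match the pattern, so only data lines are counted.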

Technet: Windows Server 2003 Technical Reference: “Poolmon Remarks: Core Services” says, “You can add the /n parameter to a command to save a snapshot of the Poolmon output to a file. Because the data saved to the file is static, the columns that show the change in values between updates are not included in the file.”

TechNet: Windows Server 2003 Technical Reference: “Poolmon Examples: Core Services” says, “While poolmon is running, you can use the parameters in the running syntax to change the display.” Speculation: Based on Example 3 where it indicates that pressing P twice will show Paged Pool, and then B to show Bytes, but earlier text shows /p will use Non-Paged bytes, perhaps using /n /p /p /b will show both (Non-Paged and Paged) sorted by bytes of memory?

MSDN: Windows Driver Kit: PoolMon examples may show similar/identical examples.

Using PoolMon interactively

Microsoft KB Q177415: Using Poolmon lists supported keystrokes (M, T, E, A, F, S, and the following), although the most commonly used ones may be these: Press P to cycle between showing Paged, Non-Paged, or mixed. (Press P again to change the selection again.) This way NonP (NPP) memory may be easily seen. Then press B to sort by the Bytes column. After copying any desired data (and pasting it into a Notepad window), press P as needed to repeat the process (pressing B and copying the desired data) for paged pool memory. Finally, press Q to quit.

A nice thing about an interactive use is that the software will update its display and highlight memory tags that have changed information. However, the updates may continue to happen, even if trying to use the mouse to copy and paste the text of the display. That may be annoying. MSDN: PoolMon Display says, “PoolMon updates its display every five seconds. You cannot change the update rate.”

MSDN: PoolMon Display may have information about the columns shown.

What's this?

One of the potentially most useful parts of Poolmon is to show the Pool Tags being used.

MS KB Q177415: How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks (previously at http://support.microsoft.com/support/kb/articles/Q177/4/15.ASP)

Dump file

Post-mortem debugging generally isn't as preferred, but a dump file could allow that. If a dump file does exist, be sure that data is sufficiently saved. (This may be overly paranoid, but it might be sensible to copy the file so that it is stored in a location that doesn't automatically handle dump files.) (Manually creating a dump file/crash report might not be convenient/worthwhile to generate. Further investigation may be needed to determine if such a process may often be able to generate details if other methods don't reveal the actual causes.)

Help in the future
[#pltagmod]: “Tag Mode”

Having this enabled can help with debugging pool memory issues. The good news for users of Windows Server 2003, Vista, and later versions of Microsoft Windows is that this step can be skipped. WDK info on Poolmon says “On Windows Server 2003 and later versions of Windows, pool tagging is permanently enabled.” (Similarly, Q177415: Using Poolmon notes about the Gflags.exe utility: “Because pool tagging is permanently enabled in Windows Server 2003, the Enable Pool Tagging check box in the Global Flags dialog box is dimmed and commands to enable or disable pool tagging fail.”)

For Windows XP and other versions of Windows earlier than Windows Server 2003, go ahead and enable pool tagging in case that helps down the road. However, the bad news for users of these older operating systems is that if Tag Mode isn't enabled, enabling it requires a reboot. That reboot will probably also temporarily fix the memory issue, so further research will not be possible until the issue recurs after the reboot.

This process may be referred to by documentation as enabling “Tag Mode” or “pool tagging” (e.g. TechNet: Windows Server 2003 Technical Reference: “Poolmon Examples: Core Services”).

As noted by Q177415: Using Poolmon, there are multiple ways to get Pool Tag Mode enabled. One is to set a registry value under the key “HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager”. Record the current value of “GlobalFlag” (as instructed by Q177415) so the change can be reversed later. Then change it to 0x00000400 hexadecimal. (It is not clear why “Microsoft KB Q177415: How to use Memory Pool Monitor (Poolmon.exe) to troubleshoot kernel mode memory leaks” says “It is important to add all of the leading zeros or some of the Poolmon information will not display on the output screen.” That advice indicates that the value should be entered as 0x00000400, not 0x400, even though it may afterwards be displayed as 0x400.) Once the registry value has been modified, the change doesn't take effect until the computer is restarted.
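Numerically, 0x00000400 and 0x400 are the same value; the leading zeros only affect how it is written. The sketch below illustrates that arithmetic and a check of whether the pool tagging bit is set. It assumes the bit's documented GFlags symbolic name is FLG_POOL_ENABLE_TAGGING (the “ptg” abbreviation used with gflags); the helper names are hypothetical, and the code does not read or write the registry.

```python
# FLG_POOL_ENABLE_TAGGING: the "Enable pool tagging" (ptg) global flag bit.
FLG_POOL_ENABLE_TAGGING = 0x00000400


def format_global_flag(value):
    """Render a GlobalFlag value as 0x followed by eight hex digits."""
    return f"0x{value:08X}"


def pool_tagging_enabled(value):
    """Return True if the pool-tagging bit is set in a GlobalFlag value."""
    return bool(value & FLG_POOL_ENABLE_TAGGING)


if __name__ == "__main__":
    previous = 0x0  # hypothetical: the GlobalFlag value recorded before the change
    print("record for later:", format_global_flag(previous))
    print("new value:", format_global_flag(FLG_POOL_ENABLE_TAGGING))
    print("same number as 0x400:", FLG_POOL_ENABLE_TAGGING == 0x400)
```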

Another method, as shown by TechNet: Windows Server 2003 Technical Reference: “Poolmon Remarks: Core Services” may be to run GFlags with command line parameters:

gflags /r +ptg

After doing that, a restart of the computer will be needed before it takes full effect.

A third method is to check a checkbox in the “Global Flags Editor” (Gflags.exe) from the \Support\Tools folder of the Win2K, XP, and Svr 2003 CD-ROMs and from the NT 4.0 Resource Kit. There is an “Enable Pool Tagging” checkbox. After making the change in this program and then selecting “Apply” and “OK”, the change doesn't take full effect until the computer is restarted.

Judgement call: Intentionally making things stop working, to prevent further troubles and/or to provide further temporary relief

A common process that may use substantial NPP is an actively running backup job. If there is either a backup job running on the machine having problems, or if there is a backup job running on another machine which is getting information from a machine with problems, consider whether aborting the backup job may be a good step to take. (Note that even if the backup job was allowed to continue running, the backup might not successfully complete anyway.)

At this point, there may be more of a judgement call to make. If things are still not working and this is negatively affecting people, it may be best to just reboot the server so that things seem to start working again fairly soon; trying to identify the cause of the problem may take some time. However, if things are operational (perhaps because the only noticeable problem was that HTTP.SYS wasn't working), then it may be desirable to try to find and address the issue. In some cases, a large temporary improvement might be achieved just by identifying which software is using the memory and then restarting that specific software. That may avoid rebooting the server, which may take longer (and may be more disruptive for other reasons as well). This would probably be the most desirable approach, as it causes the least additional downtime. The risk, however, is that the system can currently be restarted cleanly with relative ease, but might soon lose that ability. If a shortage of NPP memory exists, the problem will probably keep getting worse until something is done to resolve it.

Resolving the issue in the long term
Having enough information

Having information about what was/is using non-paged pool memory may be useful. So, if the problem still exists, perform the recommended steps to respond to low NPP, including gathering information.

If the information is available, possibly from logs created at the time or by interacting with a not-yet-crashed system, then identify what was/is using up the non-paged pool memory. If this information is no longer available, possibly because the system was restarted (perhaps from not knowing a better way to handle the situation, or because the system really wasn't stable enough to gather information), then the best bet is to gather information now. What will be most useful is information about what is using up non-paged pool memory after some time has passed. If the system was recently rebooted, it may help to record some baseline values while the system is working well. Then follow up at a later time to gather the same information and see how things have changed. Ideally, that later information gathering will occur after more NPP has been used up again, but before so much NPP is used that the system becomes unstable (causing additional problems, and perhaps so many problems that the information again cannot be gathered). The right interval may range from hours to months. The process for gathering this information should be in the section describing recommended steps to respond to low NPP.
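Comparing a baseline against a later snapshot is mostly subtraction and sorting. The sketch below assumes the per-tag byte counts have already been collected (for example, copied from Poolmon output) into dictionaries keyed by (tag, pool type); that input format, the “Nonp”/“Paged” type labels, the function name, and the sample numbers are all hypothetical.

```python
def pool_growth(baseline, later, top=5):
    """Rank pool tags by how many bytes they grew between two snapshots.

    baseline and later map (tag, pool_type) -> bytes in use, for example
    ("Thre", "Nonp") -> 1234567. Tags absent from the baseline count as 0.
    Tags that shrank or stayed the same are left out.
    """
    deltas = []
    for key, later_bytes in later.items():
        growth = later_bytes - baseline.get(key, 0)
        if growth > 0:
            deltas.append((growth, key))
    deltas.sort(reverse=True)  # largest growth first
    return [(key, growth) for growth, key in deltas[:top]]


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    before = {("Thre", "Nonp"): 1_000, ("MmSt", "Paged"): 5_000}
    after = {("Thre", "Nonp"): 91_000, ("MmSt", "Paged"): 6_000}
    for (tag, pool_type), grew in pool_growth(before, after):
        print(f"{tag} {pool_type}: +{grew} bytes")
```

A tag that grows steadily between snapshots, rather than one that is merely large, is usually the more interesting lead.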

If a dump file was saved, post-mortem debugging may be usable. MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” mentions post-mortem debugging info. MSDN: Pool Tag Driver Tips for Pool Memory Allocations mentions the debugger extensions !pool, !poolfind, and !poolused, as well as “!verifier 0x3”. The page also notes that “Debugging Tools for Windows” (see Download and Install Debugging Tools for Windows, which has information about the software beyond just how to download and install it) has information about !pool, !poolfind, !poolused, and “!verifier”.

Warning about another problem

The severe problems on 32-bit machines that are lacking NPP are seen most frequently on servers that get rebooted less than once a month. Servers that are restarted monthly, which is routinely done after installing operating system updates, are noticeably less likely to encounter problems. As a generalization, servers should run recent operating system versions and should have operating system updates fully applied regularly, so that the number of known security vulnerabilities is minimized. Fully applying these updates typically involves rebooting the server. Since operating system updates for Microsoft Windows are typically released at least once a month, 32-bit servers that experience these severe problems might not be getting updated regularly. This won't always be the case (a regularly rebooted machine may also encounter these issues), but as a generalization, problems from an NPP shortage may be a reason to check whether the system is being updated properly.
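The once-a-month reasoning can be turned into a simple uptime check. This is a minimal sketch: approximating “a month” as 31 days is an assumption, the function name is hypothetical, and obtaining the last boot time is left to the caller (for example, reading it off the output of systeminfo).

```python
from datetime import datetime, timedelta

# Assumption: "monthly" approximated as 31 days, since Windows updates are
# typically released at least once a month and installing them usually
# involves a reboot.
MONTHLY = timedelta(days=31)


def overdue_for_restart(last_boot, now=None, limit=MONTHLY):
    """Return True if the machine has been up longer than `limit`.

    A True result hints that monthly updates may not be getting applied.
    """
    if now is None:
        now = datetime.now()
    return now - last_boot > limit


if __name__ == "__main__":
    # Hypothetical boot time, for illustration only.
    print(overdue_for_restart(datetime(2011, 1, 1), now=datetime(2011, 3, 1)))
```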

(For some documentation about a restart being needed, Microsoft KB 912376 states, “If you do not use the /3GB switch, it is likely that Exchange Server services will have to be restarted periodically to defragment virtual memory. Trading off paged pool kernel memory for additional application memory is a worthwhile tradeoff. However, this tradeoff means that you must monitor the use of paged pool memory more closely. For more information about memory tuning for Exchange Server,” see KB 815372: How to optimize memory usage in Exchange Server 2003.)

Identifying the software

The basic process is to start by identifying what software is using up too much memory.

A process shown using a large amount of non-paged pool memory may simply be the one using that memory heavily. Very often, though, the situation is not that simple. For instance, if software is using up many handles, it may be causing a part of Windows to use quite a bit of NPP. If no piece of software clearly identifies itself (e.g. as a Task Manager process) as using up a lot of NPP, consider some of the other known culprits (see whether anything is using a high number of handles or threads), or resort to using pool tags.

Using pool tags
Applicability

This process might not work with absolutely all software, but it is likely to work with most software: “MSDN: Six Tips for Efficient Memory Use” says, “Drivers should use the tagged versions of the pool allocation routines instead of the nontagged versions, which are obsolete. WinDbg and numerous testing tools use the tags to track memory allocation. Tagging pool allocations can help you more easily find memory-related bugs.” Archived White Paper by Microsoft: “Low Pool Memory and Windows XP” discusses some other software development considerations. So, hopefully the newer recommendations are being followed by coders. For other software, MSDN: PoolMon Display says, “If the driver does not assign a tag value” ... “Windows still creates a tag, but it assigns the default tag value None. As a result, you cannot distinguish the statistics for that driver's allocations from that of other pool allocations” (by software that isn't using a tag).

Since most software supports this, plan to use this method if it will probably help. (However, realize there may be some possibility that some software might not be using this tagging.)

Getting the used pool tags
This is discussed elsewhere, in the section about Finding that Pool Tags are used.
Finding out which software uses a pool tag

The basic process may be to find out what file, or driver, uses a Pool Tag. (From there, the process is to figure out what software uses that file or driver.) There are multiple available methods to find out what file, or driver, uses a Pool Tag.

Localtag.txt
MSDN: Pool Tag Driver Tips for Pool Memory Allocations (and MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019”) states, “For 32-bit versions of Windows, use poolmon /c to create a local tag file that lists each tag value assigned by drivers on the local machine (%SystemRoot%\System32\Drivers\*.sys). The default name of this file is Localtag.txt.”

Creating and using the localtag.txt
Creating the file

Poolmon Requirements says “The /c parameter, which creates a localtag.txt file of pool tags used by drivers on the local machine, is supported only on 32-bit versions of Windows.” Poolmon Examples says about an example with the /c parameter: “If you do not specify a local tag file and Poolmon cannot find a Localtag.txt file on a 32-bit system, it creates one”, however, the page goes on to say “(Poolmon cannot generate a local tag file on 64-bit systems.)” Poolmon Syntax says “Poolmon cannot generate a Localtag.txt file on 64-bit versions of Windows Server 2003. As a result, the /c parameter and its functionality are available only on 32-vit versions of Windows.” (That is likely a typo, and so the end of that quoted part of Microsoft's documentation should have said “32-bit versions of Windows.” The error was in the originally quoted text.) The -c and -z options simply don't exist when running the X64 version of Poolmon.exe.

Poolmon Syntax page shows that /c “scans the drivers on the local computer (C:\Windows\System32\Drivers\*.sys) and generates a Localtag.txt file.”

Poolmon examples says “The right-most column in the display, Mapped_Driver, shows that the memory was allocated by Ntfs.sys, the driver for the NTFS file system. In this case, the display is even more specific, because Pooltag.txt includes the source files for Ntfs allocations.” The example shows that ntfs.sys's *.c source code files are individually identified by the Pool Tag.

Using the file

If the file already exists, it may be used with the -c localtag.txt parameter.

Pooltag.txt

This file is found after installing the software from Debugging Tools for Windows 64-bit Version (in the triage\ subdirectory of the installation directory).

MSDN page about Poolmon says this is “a file installed with PoolMon and with the Debugging Tools for Windows packages. Occasionally, Microsoft updates this file. To check for updates, go to the Microsoft support website and search for "pooltag.txt."”

Perhaps less useful, but perhaps more convenient, is an archived web page with data from pooltag.txt.
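Looking a tag up in pooltag.txt can be automated. The sketch below assumes each entry is a line shaped like “Thre - nt!ps - Thread objects” (tag, binary, description separated by “ - ”) and that comment lines begin with “rem” or “//”; verify that against the actual file before relying on it. The function name and the install path in the usage block are hypothetical. Tags are kept case-sensitive and untrimmed, because a tag can end in a space (for example “Irp ”).

```python
def load_pool_tags(lines):
    """Build a map from pool tag to [(binary, description), ...].

    Assumes lines like "Thre - nt!ps        - Thread objects". Blank lines
    and comment lines ("rem ..." or "// ...") are skipped. The same tag may
    appear more than once, so each tag maps to a list.
    """
    tags = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue
        if stripped.split(None, 1)[0].lower() == "rem":
            continue
        parts = line.split(" - ", 2)
        if len(parts) < 2:
            continue  # not in the assumed format
        tag = parts[0]  # not stripped: tags can end with a space
        binary = parts[1].strip()
        description = parts[2].strip() if len(parts) == 3 else ""
        tags.setdefault(tag, []).append((binary, description))
    return tags


if __name__ == "__main__":
    from pathlib import Path
    path = Path(r"C:\Debuggers\triage\pooltag.txt")  # hypothetical install location
    if path.exists():
        with path.open(encoding="ascii", errors="replace") as f:
            print(load_pool_tags(f).get("Thre"))
```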

Finding the tag

Microsoft KB Q298102: How to find pool tags that are used by third-party drivers mentions methods of looking for the tag inside all *.sys files that exist in certain directories. (Directions on how to do this are provided in this text after describing more details on what to look for, and where.) The search should be case-sensitive. “If you receive multiple files, try to reduce the amount of files returns by adding the letter "h" to the tag before you run the search. This is mainly useful when the tag is comprised of three letters.” Specifically, from the examples shown, the letter “h” is prepended to the tag. Examples shown are hTCPt and hCPnp.

The directories to search include %SYSTEMROOT%\System32\drivers (the most preferred directory to start looking in), followed by the %SYSTEMROOT%, %ProgramFiles%, %SystemDrive%, and %ProgramData% directories.

The command shown by Microsoft KB Q298102: How to find pool tags that are used by third-party drivers searches one directory at a time:

cd %SYSTEMROOT%\System32\drivers
findstr /m /l TagT *.sys

In that example, TagT represents the tag text to search for. If that doesn't find a match, change the part after the word cd to another one of the recommended directories and search again.
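The findstr search can also be reproduced in a script that walks the recommended directories in order. This is a minimal sketch, not Microsoft's method: it does a case-sensitive byte search of each *.sys file (non-recursively, like the findstr command), optionally with the “h” prefix from Q298102. The function name is hypothetical, and os.path.expandvars is relied on to expand %VAR% names, which it does on Windows.

```python
from pathlib import Path


def find_tag_in_drivers(tag, directories, prefix_h=False):
    """List *.sys files whose bytes contain the pool tag, like findstr /m /l.

    The comparison is case-sensitive. With prefix_h=True the search string
    becomes "h" + tag, the narrowing trick described in KB Q298102.
    Directories that don't exist are skipped.
    """
    needle = (("h" + tag) if prefix_h else tag).encode("ascii")
    matches = []
    for directory in directories:
        folder = Path(directory)
        if not folder.is_dir():
            continue
        for sys_file in sorted(folder.glob("*.sys")):  # not recursive, like findstr without /s
            try:
                if needle in sys_file.read_bytes():
                    matches.append(sys_file)
            except OSError:
                continue  # unreadable file; skip it
    return matches


if __name__ == "__main__":
    import os
    # Preferred search order from the text; %VAR% names are expanded on Windows.
    dirs = [os.path.expandvars(d) for d in (
        r"%SystemRoot%\System32\drivers", "%SystemRoot%", "%ProgramFiles%",
        "%SystemDrive%\\", "%ProgramData%")]
    for hit in find_tag_in_drivers("TagT", dirs):  # TagT: placeholder tag
        print(hit)
```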

A method using a graphical interface may also be available. In Windows 2000, use the Start button and locate a “Search” option. Search “For Files or Folders”. “In the Search for files or folders named box, type *.sys”, and then locate a box labelled “Containing text”. In that box, “type the pool tag you want to search for.” Then in a box called “Look in”, “type the path to the system root drivers”. In Windows 2000, this may be C:\WINNT\system32\drivers, while other versions of Windows may use other directory names. When the needed data is filled out, “click Search Now.” If this does not show any results, try some of the other directories mentioned above. (With the graphical interface, it may be necessary to manually expand the environment variables into actual directory names.)

DriverVerifier

This software probably isn't the first choice for many users. MSDN: Pool Tag Driver Tips for Pool Memory Allocations notes “Driver Verifier is provided with Windows and documented in the Windows DDK.” MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” mentions some approaches, including using DriverVerifier to help with pool tag identification. See also Microsoft KB Q244617: Using Driver Verifier to identify issues with Windows drivers for advanced users.

TechNet Blog by Mike Lagase: How to monitor and troubleshoot the use of Nonpaged pool memory in Exchange Server 2003 or in Exchange 2000 Server may have details (e.g. pool tags).

Known specific (misleading) examples
Services

If the offending software appears to be svchost.exe then clarify further. (See the results of Matching services to a PID (in Microsoft Windows).)

Threads

When many handles are used, memory may be consumed by a part of the operating system called “Thread objects” (according to pooltag.txt, as noted by MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019”), which is identified by the pool tag “Thre”. If that tag seems to be what is using up memory, check for a high number of used handles. (It may also make sense to check for software using a very large number of threads.) Perhaps seventy thousand (70,000) might be a reasonable threshold for deciding whether software is using too many handles. Normally processes use far fewer handles than that (3-4 digits), although an offending program may use many, many more (possibly millions?). There is nothing especially magical about the number 70,000: MSDN Blog: Tate's article about “Understanding Pool Consumption and Event ID: 2020 or 2019” says, “typically we should raise an eyebrow at anything over 5,000 or so. Now that's not to say that over this amount is inherently bad, just know that there is no free lunch and that a handle to something usually means that on the other end there is a corresponding object stored in NonPaged or Paged Pool which takes up memory.” (However, some software that isn't badly out of control may actually approach 5,000.)
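The two thresholds above (Tate's rough 5,000 and the stricter, admittedly arbitrary 70,000) can be applied to a list of per-process handle counts. In this sketch the input is a hypothetical dictionary, for example typed in from Task Manager's Handles column; the function name and the “suspect”/“watch” labels are also hypothetical.

```python
# Thresholds from the text: raise an eyebrow above roughly 5,000 handles;
# 70,000 is a stricter cutoff with nothing magical about it.
EYEBROW_HANDLES = 5_000
SUSPECT_HANDLES = 70_000


def classify_handle_counts(handle_counts):
    """Split {process_name: handle_count} into "suspect" and "watch" lists.

    "suspect" holds processes above SUSPECT_HANDLES; "watch" holds those
    above EYEBROW_HANDLES but not above SUSPECT_HANDLES. Each list is
    sorted by handle count, highest first.
    """
    ranked = sorted(handle_counts.items(), key=lambda item: item[1], reverse=True)
    suspect = [(name, n) for name, n in ranked if n > SUSPECT_HANDLES]
    watch = [(name, n) for name, n in ranked if EYEBROW_HANDLES < n <= SUSPECT_HANDLES]
    return {"suspect": suspect, "watch": watch}


if __name__ == "__main__":
    # Hypothetical snapshot, for illustration only.
    snapshot = {"svchost.exe": 1_800, "leaky.exe": 450_000, "busy.exe": 6_200}
    print(classify_handle_counts(snapshot))
```

If an svchost.exe process lands in either list, the services inside it still need to be matched to the PID, as described under Services.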

MmSt

If paged pool memory is low and the related pool tag is MmSt, the Memory Manager may be causing the issue because of a large number of files being opened. See Microsoft KB 312362 (and, for additional reading, the similar Microsoft KB 304101). TechNet Blog: “Ask the Performance Team” blog called “Network Stored PST files ... don't do it!” says about MmSt, “This tag represents Mm section object prototype PTEs - a memory management-related structure used for mapped files. Put a different way, this is the pool tag that is used to map the OS memory used to track shared files. MmSt issues often manifest as Paged Pool depletion (Event ID 2020).”

Further information might be available at: TechNet Blog: “Ask the Performance Team” blog entry about the MmSt Pool Tag.

LSwn

TechNet Blog: “Ask the Performance Team” blog called “Network Stored PST files ... don't do it!” mentions this tag.

Responding

See if there are updates to the software. One possible way to resolve the issue is to remove/replace the software that is causing problems, and the most preferable way to do that may be simply updating the software so that the old problematic version is replaced by a newer version of the software. CTHaun's archived page about NPP Memory Depletion (causing web pages to stop being served) states, “Root cause often points to an outdated third-party driver which, when updated or uninstalled, solves the NPP leak.” (On a side note, which might be a little unfair: at least older versions of Symantec products have been known to cause this: Microsoft KB Q272568 mentions this for Symantec's Norton AntiVirus Corporate Edition versions 7.00 - 7.03 and 8.0. BackupExec seemed to cause this as well.)

If there isn't an update that addresses the problem, the ideal outcome would be for one to become available. One option is to contact the software vendor and see whether it is possible to work with a software developer to resolve the issue. This may take more of an end user's time than some other resolutions, but it may also help other users of the software, so this option may be altruistically preferred. (See the section about providing information to a developer.)

Check the \BOOT.INI file for a /3GB switch. If one is present, find out why, and whether it is necessary. If it is not, perhaps just remove that switch. The section about determining whether NPP is too low discusses the impact that switch has on NPP.
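Checking BOOT.INI for the switch can be done by eye, but on many servers a script may be handier. This minimal sketch assumes the usual layout, with boot entries listed under an “[operating systems]” section and switches written as whitespace-separated tokens beginning with “/”. It treats switch names case-insensitively; the function name and the C:\BOOT.INI location in the usage block are assumptions.

```python
def os_entries_with_3gb(boot_ini_text):
    """Return the [operating systems] entries in BOOT.INI text that use /3GB.

    Switch names are compared case-insensitively (/3GB and /3gb both match).
    Lines in other sections, such as [boot loader], are ignored.
    """
    entries = []
    in_os_section = False
    for raw in boot_ini_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_os_section = line.lower() == "[operating systems]"
            continue
        if in_os_section and line:
            switches = [token.lower() for token in line.split() if token.startswith("/")]
            if "/3gb" in switches:
                entries.append(line)
    return entries


if __name__ == "__main__":
    from pathlib import Path
    boot_ini = Path(r"C:\BOOT.INI")  # assumed location on XP / Server 2003
    if boot_ini.exists():
        for entry in os_entries_with_3gb(boot_ini.read_text(errors="replace")):
            print(entry)
```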

Another option involves restarting that software (which can be done by restarting the entire server).

Upgrading to a 64-bit operating system (perhaps as part of a process of replacing a 32-bit CPU with a 64-bit CPU) may make the problem become insignificant. Software that is using up too much NPP may still have the problem that memory is not being handled correctly, but the impact may become negligible.

Another way to respond may be to move the software to another machine. (Moving the software to another machine may fix the issue. More likely, doing so will simply move the problem to another computer. However, that may be preferable so that the problems don't keep affecting a computer that also performs other important functions.) This may or may not be feasible depending on what the software is. There may be some advantages to running software on a specific machine.

Another option may be to replace the software (with a competing software solution), or if the software is no longer needed, to just stop using the current software.

TechNet: Exchange Server: Improve Kernel Memory discusses kernel memory, and says, “One consideration for freeing up server resources is to look at the user mailbox load on the server. Consider moving user mailboxes to another server to reduce server load and see if the problem stops.” (TechNet: Exchange Server: Kernel memory depletion may also have information and/or hyperlinks to other resources.)

[#npphlpdv]: Providing information to a developer

See the section about Troubleshooting memory leaks in Microsoft Windows. Specifically, some information about proper handling of Pool memory is provided by MSDN: Pool Tag Driver Tips for Pool Memory Allocations. Half of the memory-related tips from “MSDN: Six Tips for Efficient Memory Use” relate to using pool memory.

Here is some other documentation that may help some developers: PREfast appears to be a static analysis tool that catches issues compilers may not catch. TechNet blog: “Ask the Performance Team” blog about “Using Special Pool to find out who is allocating a Pool Tag” provides “some very advanced troubleshooting steps” that may help in extreme cases.