BugCheck/STOP/BSODs

WDK, WDK info (WDK Visual Studio 8.5 info?)
System restarts

There are fewer reasons for BugCheck codes to be shown than for a system to unexpectedly restart. Often a BugCheck code will result in a system being largely unresponsive.

There can be multiple causes for unwanted system restarts.

Microsoft Windows
If the system does boot back up successfully...

The following info was written using Windows Vista. Variations may be appropriate for other operating systems.

Gather info, before it is lost!

This text may sound borderline paranoid. That is simply because the author of this text doesn't like having useful details getting deleted, especially when the deletion is a side effect of actions where deleting the useful data is not an obvious repercussion.

You might start to think that this is a bit unnecessary, because information doesn't get deleted quite as quickly as this text indicates. However, I've found that the point when the useful data gets deleted can be inconsistent. (Perhaps the difference was a matter of different operating systems.) The safest route is to get useful details copied right away, as described by this guide.

The system may show some error messages before the user is logged in. Often these are recorded in “Application Popup” entries in the System OS log, and often there may be more than one identical popup. However, these generalizations are not guaranteed.

The first thing one may need to do is to log in. Depending on the system, the system might prompt the user to provide a reason for the restart. Of course, a detailed and accurate description may not be readily available if a technician who can help identify the details hasn't yet logged in to the system. However, before entering some generic details, record any information that the screen provides. In a remote access session, this might be as easy as creating a screenshot and pasting the screenshot into a local copy of Paint, so that the information is stored somewhere other than a clipboard and other than on a machine which might reboot before the data is processed.

Once logged in, the first thing to do is to gather what information can be obtained before that information gets cleaned up. The order in which information is gathered matters, because some information may be lost along the way.

First, if information shows up in a balloon, try to capture that information. It may disappear once the balloon disappears on its own, or when another balloon appears. Many times, the text in a balloon will appear in an OS log, although not always. Jotting down the information in a balloon is not always feasible, as it may disappear before the entire note is jotted down. A way to deal with that is to press Print Screen and paste the captured image into a Paint window. (This works especially well in a remote session, when the Paint window is on a machine other than the one being examined.) If a second balloon message of interest appears, and the first balloon's text has already been pasted into a window, then place the new message in the clipboard. Then either paste it into a new Paint window, or paste it into the same Paint window (and count on Undo to restore the first message when things are less hectic). Beyond capturing the text, though, sometimes it is useful to click on a balloon message. Whether this is wise to do immediately or later is something that may not have a single, universal, consistently correct answer, so judgement calls may need to be made.

One approach: vistrbt1.png This might be WerFault.exe, described as “Windows Problem Reporting” in Task Manager (or perhaps Control Panel). (Possibly see: Windows Components.)

Naturally, a recommended step is going to be to get more info. Choosing to show more details will provide... some more details.

vistrbt2.png vistrbt3.png vistrbt4.png

If the system is probably going to be working fairly well, then do not close Windows Error Reporting yet. There are some things to do first.

Make a copy of all of the files that are shown by Windows Problem Reporting. Where they are copied to does not matter much, and chances are that the files won't be used. However, chances are even better that they will be deleted shortly, so copy them first. In the above example, a username of Sysop was used. So, copy the WER*.* files from %TEMP%\ (a.k.a. %LOCALAPPDATA%\Temp\, a.k.a. %USERPROFILE%\AppData\Local\Temp\, a.k.a. C:\Users\%USERNAME%\AppData\Local\Temp\; it isn't clear which of these paths is used).

Carefully check that the files were successfully copied. (UAC may prevent the *.xml file from being copied using some copy methods.)

Also, if there is sufficient disk space, copy C:\Windows\Minidump\*.* as well. (If checking from the command line, and if that folder looks empty, be sure to check via a GUI in case UAC is hiding the files.)
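The copy-then-check steps above can be sketched in Python. This is only an illustration, not a tool this guide depends on: the function names are hypothetical, and comparing SHA-256 digests is just one assumed way to "carefully check" a copy. Note that Python's glob treats the dot in WER*.* literally, so a file with no dot in its name would not match.

```python
import glob
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(pattern, dest_dir):
    """Copy every file matching pattern into dest_dir and verify each copy.

    Returns (copied, failed): two lists of source paths. A file lands in
    'failed' if it could not be read or written (for example because of
    permissions) or if the copy's digest differs from the original's.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied, failed = [], []
    for src in sorted(glob.glob(pattern)):
        if not os.path.isfile(src):
            continue  # skip sub-folders that happen to match
        dst = os.path.join(dest_dir, os.path.basename(src))
        try:
            shutil.copy2(src, dst)  # copy2 also keeps the timestamps
            ok = sha256_of(src) == sha256_of(dst)
        except OSError:
            ok = False
        (copied if ok else failed).append(src)
    return copied, failed
```

A call might look like copy_and_verify(os.path.join(os.environ["TEMP"], "WER*.*"), r"D:\CrashEvidence"), followed by a second call with r"C:\Windows\Minidump\*.*" as the pattern. The destination folder D:\CrashEvidence is a made-up example. Anything returned in the failed list deserves a manual look.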

Still, just ignore the Windows Error Reporting window: do not close it yet.

Next, check the system logs. (Info on viewing those logs: Viewing the Windows Event Logs.) Identify the most recent time that an EventLog entry indicates that the system was in the process of booting up. (In other words, look for entries in the event log that have “EventLog” as the “Source”.) Then, for the time being, ignore all older log entries: only look at the newer log entries. Look for an entry from the Source BugCheck (Event ID 1001, Task Category None). There might be other resources to check (such as SaveDump, perhaps?). If no such log entry exists, then go ahead and look for such an entry before the last reboot: if the bugcheck entry exists before the reboot, that suggests that the system rebooted again after the first reboot.
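The selection logic in the paragraph above can be written down as a small Python sketch. It assumes the System log entries have already been exported into a list of dictionaries with the hypothetical keys "time", "source" and "event_id"; how that export is done is left out. It also assumes that the boot-time EventLog entry is Event ID 6005 ("The Event log service was started"), which the text above does not specify.

```python
from datetime import datetime

BOOT_SOURCE = "EventLog"
BOOT_EVENT_ID = 6005          # assumption: "The Event log service was started"
BUGCHECK_SOURCE = "BugCheck"
BUGCHECK_EVENT_ID = 1001


def find_bugcheck_entries(events):
    """Pick the BugCheck entries worth looking at first.

    events: list of dicts with keys 'time' (datetime), 'source', 'event_id'.
    Returns (entries, after_last_boot). after_last_boot is True when the
    entries were logged since the most recent boot. When it is False and
    entries were still found, they come from before the last boot, which
    suggests the system rebooted again after the crash.
    """
    boots = [e["time"] for e in events
             if e["source"] == BOOT_SOURCE and e["event_id"] == BOOT_EVENT_ID]
    bugchecks = sorted(
        (e for e in events
         if e["source"] == BUGCHECK_SOURCE and e["event_id"] == BUGCHECK_EVENT_ID),
        key=lambda e: e["time"])
    if not boots:
        return bugchecks, False   # no boot marker found: return everything
    last_boot = max(boots)
    newer = [e for e in bugchecks if e["time"] >= last_boot]
    if newer:
        return newer, True
    # Nothing since the last boot: fall back to the most recent older entry.
    return bugchecks[-1:], False
```

The second return value is what tells the two cases in the text apart: True means the BugCheck entry belongs to the current boot, False means it was only found by looking further back.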

Now, for each such log entry, if there are any files referenced, copy those files to a safe location (unlikely to be touched by automation).

The BugCheck entry is likely to show five numbers: the “BugCheck Code”, and the four “BugCheck Parameters”. Chances are that the BugCheck entry will include a lot of the same information as the Windows Error Reporting program.

Now, compare those five numbers with the ones shown by the Windows Error Reporting program, as well as with the numbers shown in the prompt that appears when logging into the system. If they are identical, and if the logs are likely to remain intact, then there is probably little reason to copy that data.
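Comparing five long hexadecimal numbers by eye is error-prone, so here is a minimal Python sketch of the comparison. It makes two assumptions about the text being compared, which should be checked against what the system actually shows: that the event log message lists the code and parameters as 0x-prefixed values, and that Windows Error Reporting labels them BCCode and BCP1 through BCP4 without the 0x prefix. The function names are hypothetical.

```python
import re

HEX_PREFIXED = re.compile(r"0x[0-9a-fA-F]+")
WER_FIELD = re.compile(r"^\s*(BCCode|BCP[1-4])\s*:\s*([0-9A-Fa-f]+)\s*$",
                       re.MULTILINE)
WER_ORDER = ("BCCode", "BCP1", "BCP2", "BCP3", "BCP4")


def parse_event_text(text):
    """Return (code, p1, p2, p3, p4) from the first five 0x-prefixed
    values in an event log message, or None if fewer than five exist."""
    values = [int(match, 16) for match in HEX_PREFIXED.findall(text)]
    return tuple(values[:5]) if len(values) >= 5 else None


def parse_wer_text(text):
    """Return (code, p1, p2, p3, p4) from BCCode/BCP1..BCP4 lines,
    or None if any of the five fields is missing."""
    fields = {name: int(value, 16) for name, value in WER_FIELD.findall(text)}
    if not all(name in fields for name in WER_ORDER):
        return None
    return tuple(fields[name] for name in WER_ORDER)


def same_bugcheck(event_text, wer_text):
    """True only if both texts parse and all five numbers agree."""
    from_event = parse_event_text(event_text)
    from_wer = parse_wer_text(wer_text)
    return from_event is not None and from_event == from_wer
```

Converting to integers before comparing means that differences in letter case or leading zeros (0x0000007e versus 7E) do not count as a mismatch. If same_bugcheck returns False, keep copies of both sources rather than trusting either one.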

Copy any other data which Windows Error Reporting might be showing which might be needed to effectively troubleshoot the issue.

Only at this time is it considered safe to allow Windows Error Reporting to proceed, by either closing it or clicking the “Check for solution” box. vistrbt5.png

Next, look for any other error messages. If a dialog box with only one button, an OK button, shows up, see if the message also appears in the System log. If so, the information is captured and the window may be closed.

Gather more info

The first things to check are the visible error messages and logs. They might provide clear indications of what occurred.

In environments where machines are well maintained professionally, there will hopefully be some recorded history of prior incidents. Check those before spending a lot of time troubleshooting, as the results may have been figured out before.

Check scheduled tasks, including tasks run by the Task Scheduler (the counterpart of cron in Unix), tasks created using the AT command, and tasks initiated by running services. The most common running service with an associated task may be backup software. If tasks might be initiated by a remote connection, such tasks will hopefully be documented: check whether any tasks were running at the times of the restarts.

If info is still not found, perhaps check the official documentation on the BugCheck code. The info might be overly technical to be easily useful, but it is official documentation and will hopefully be applicable and accurate, unlike some search engine results (which might point to a forum with incorrect information on it). Making full sense of this documentation will likely require the first bugcheck parameter, and maybe another one. Often the meaning may be the following: a part of the operating system encountered an error. In such a case, the actual cause could be an issue with the operating system, but often the error is caused by another source.

Checking for the meaning of Bugcheck codes

Options may include:

Online info

See:

help file from debugging tools

e.g. C:\Program Files (x86)\Debugging Tools for Windows (x86)\debugger.chm

(if the Debugging tools have been installed. Info on that: crash handling.)

Recommended at 16min27sec into Defrag Tools: WinDbg/Bugchecks. “All of the bugcheck codes are defined in the help file for the Windows debugger. This is probably the single best place to see what they all are.”

If documentation for a specific Bug Check Code number isn't being found online, Bug Check Code Reference says more information can be obtained by using “!analyze -show 0x#” (customizing the part after the 0x). That would be done using Debugging software like WinDbg. (See: crash handling.)
