Crash Reporting

Setting memory/crash dump settings

See: Enabling dump files.

[#mkcrshrp]: Intentionally creating a “crash report”
Creating data in Microsoft Windows
Creating a dump/minidump file using the keyboard (driver)
MSDN: Forcing a System Crash from the Keyboard (previously at http://msdn.microsoft.com/en-us/library/cc266483.aspx ), Microsoft KB Q244139: Windows feature lets you generate a memory dump file by using the keyboard
Using NMI to make the dump file

For 2K/XP/Server 2003/Vista/2008: KB 927069 (English version). This guide involves using an NMI switch (or an “Integrated Lights Out” (“iLO”) port, which uses an RJ-45 style jack like an Ethernet port does), and does not go into much detail about how to do that. The main point of the article is to show how to make a crash dump, which involves setting a specific registry entry to a specific value. That may be done by running:

REG QUERY HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl /v NMICrashDump /z >> oldreg.log
REG DELETE HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl /v NMICrashDump
REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl /v NMICrashDump /t REG_DWORD /d 1
More options

TechNet blog: capturing application crash dumps mentions UserDump, ADPlus, and more. For instance, in Vista and newer, Task Manager has a “Create Dump File” option for running processes. (On the Processes tab, access a context menu for a running process by right-clicking on the name of the process.)

Getting debug/dump information in OpenBSD

To create dump information, use the command described by OpenBSD's manual page for the savecore command.

If there is a desire to run the debugger, Reporting a problem with OpenBSD (“How to create a problem report” section, item #5) has some details, as does OpenBSD's Man Page for ddb. Try one or more of the following techniques:

  • sudo sysctl ddb.trigger=1

    or use another number. The operating system will go into DDB, then sysctl will show some output showing that it made the change, and then the value will (silently) be set back to zero. Any attempt to set the value may start ddb, even if the value is set to zero (so no real change is made). Attempting to do this remotely may not work: instead an error may show “sysctl: ddb.trigger: value is not available”.

  • If using a serial console, send a BREAK signal. (Perhaps the ddb.console should be set to 1?)
  • Otherwise, be out of X. (If quitting X may be inconvenient, read the upcoming information about checking a sysctl value to see whether quitting X now will even be worthwhile.) Press Ctrl-Alt-Esc.

    Note: This might work, and might launch the debugger on ttyC0, even if another console such as ttyC1 is active. Terminal switching may be disabled while the debugger is running, so the user may not be able to see the debugger. The debugger may still respond to keystrokes, even if the display does not seem to be responsive. With no output apparent, and perhaps other effects such as networking no longer seeming operational, the system may appear to have locked up. However, pressing Enter (or pressing the c key, in order to run the continue command, and then pressing Enter) might work to leave the debugger without needing a reboot.

If that doesn't place the machine into ddb, check if the “ddb.console” sysctl is already set to a value of one. It probably is not, and instead has a value of zero. That will prevent this from working.

However, remedying this problem isn't as easy as just running sysctl. The OpenBSD man page for securelevel notes that, “Because securelevel can be modified with the in-kernel debugger” called ddb, “a convenient means of locking it off (if present) is” in effect if the securelevel is set to 1 or higher. This is generally going to be the case (on most kernels) except for when the system is restarting. Therefore, the method is to have the value set during startup: back up the /etc/sysctl.conf file, and then run:

echo ddb.console=1 | sudo -n tee -a /etc/sysctl.conf

Then reboot.
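The backup-and-append step above can be sketched as a small shell function. This is only an illustration: ensure_line is a hypothetical helper name (not an OpenBSD tool), and the sketch assumes it is run with enough privilege to write the file. It appends the setting only if the exact line is not already present, so running it twice does not duplicate the entry.

```shell
# ensure_line FILE LINE  (hypothetical helper, not part of OpenBSD)
# Append LINE to FILE unless an identical line already exists.
# A backup copy (FILE.bak) is made before the file is changed.
ensure_line() {
    file=$1
    line=$2
    # -x matches whole lines only, -F treats LINE as a fixed string.
    if [ -f "$file" ] && grep -qxF -- "$line" "$file"; then
        return 0    # already configured; nothing to do
    fi
    # The file might not exist yet; only back it up when it does.
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak"
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (as root): ensure_line /etc/sysctl.conf ddb.console=1
```

The reboot is still needed afterwards, since the file is only read while the system starts.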

At least when the problem resulted in the operating system experiencing a “panic”, once in ddb, follow the steps in using ddb to gather details about the crash.

Creating a panic in FreeBSD

FreeBSD Developers Handbook: Kernel Debug Online: On-Line Kernel Debugging Using DDB

Obtaining more useful information in response to a crash
[#obsdcrsh]: Gathering data about crashes in OpenBSD

(This section is hyperlinked to, as part of a broader topic on handling OpenBSD panics.)

Gathering data
Getting information from OpenBSD's debugger

See: using ddb. Especially, always do this if OpenBSD shows a ddb> prompt (unless it is known that the system cannot be interacted with due to a lack of a functional keyboard).

That guide to using ddb has its own separate section about “Gathering data”. Follow that fully.

In addition to the information provided by the debugger, the information in the following sections may be helpful.

X data
Reporting a problem with OpenBSD (“How to create a problem report” section, item #6) suggests including “the full /var/log/Xorg.0.log file”. (Surely this only applies if X Windows was running or if there was any manual or automated attempt to start using X Windows.)
objdump info

The recommendation is, before sending the bug report, to work with the source code and use objdump. This does NOT need to be done immediately; if the system may be rebooted and placed in a more stable state, then doing that first makes good sense. However, there might (or might not?) be some advantage to doing this work on the system that had an issue (or at least a system using the same platform).

Details are noted by OpenBSD FAQ 2: section on Reporting Bugs (OpenBSD FAQ 2.4)

A requirement for this to work is to have the “Stopped at” message.

This involves:

  • Getting the source. (Information on this is currently in the guide about how to install OpenBSD updates/patches: Getting OpenBSD Source Code)
  • Compiling with debug info
  • Then disassembling that debug info with objdump.

Locate the name of the function that had some difficulty. (If the author of this text is understanding things correctly, the example at OpenBSD FAQ 2: section on Reporting Bugs (OpenBSD FAQ 2.4) is a somewhat unfortunate example, since the function name in the first column, which is what we're seeking, matches the location name at the very end (after the word “at”).)

What we use these traces for is to take the information after the word "at", compare it to the results of the "trace" command of each processor, and hope there's only one match. (If not, we may have an ambiguous situation, and need to follow the remaining steps for each case.)

Then we look at the first part of that trace output, where it shows a function name before the "(".

Use grep to search the disassembly for the function name that is listed in the first line of the trace command (pf_route in the example at OpenBSD FAQ 2: section on Reporting Bugs (OpenBSD FAQ 2.4)). In a multi-processor system with multiple traces, refer to the "Stopped at" line. The expectation here is that grep will give the offset of the function (e.g. 00007d88).
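As a rough sketch of that search: the function below assumes the disassembly was saved to a text file and that function headers appear in the usual objdump form, such as “00007d88 <pf_route>:”. The name func_offset and the file name pf.dis are assumptions made for this illustration.

```shell
# func_offset DISASM NAME  (hypothetical helper)
# Print the start offset of function NAME, taken from a header line
# such as "00007d88 <pf_route>:" in saved objdump output.
func_offset() {
    sed -n "s/^\([0-9a-fA-F][0-9a-fA-F]*\) <$2>:.*/\1/p" "$1" | head -n 1
}

# Example: func_offset pf.dis pf_route
```

If nothing is printed, the name was not found in that file, which may mean the wrong object file was disassembled.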

Now, the “Stopped at” message showed the name of the function (which in this case was pf_route) plus an offset (0x263). Find the relevant disassembly code: add the offset of the function (0x7d88) to the offset of the offending line (0x263) to arrive at 0x7feb. Naturally this involves knowing a way to add hexadecimal.
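One way to add hexadecimal numbers is to let the shell do it. This is a minimal sketch, assuming a shell whose arithmetic expansion accepts the 0x prefix; add_hex is a hypothetical helper name.

```shell
# add_hex A B  (hypothetical helper)
# Add two numbers (given with a 0x prefix) and print the sum in hexadecimal.
add_hex() {
    printf '0x%x\n' "$(( $1 + $2 ))"
}

# Function offset plus the offset from the "Stopped at" message:
add_hex 0x7d88 0x263    # prints 0x7feb
```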

Then view the disassembly and locate the instruction at the location that was just calculated. That instruction in the disassembly should match the instruction shown in the “Stopped at” message.

Once that instruction is seen in the disassembly, scroll back up (probably a small number of lines) until the disassembly references the file path and source code line that generated that assembled code.
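The scrolling back can also be roughly automated. The sketch below assumes the disassembly was produced with objdump's line-number output, so that markers of the form “path:line” appear on their own lines before the instructions they describe, and that instruction lines begin with the address followed by a colon. The name source_for_addr is hypothetical.

```shell
# source_for_addr DISASM ADDR  (hypothetical helper)
# ADDR is the hexadecimal address without a 0x prefix, in lower case
# (e.g. 7feb). Prints the most recent "path:line" marker seen before
# that instruction, then the instruction line itself.
source_for_addr() {
    awk -v addr="$2" '
        # Marker lines start in column one and contain ":<digits>".
        /^[^ \t].*:[0-9]+/ { src = $0 }
        # Instruction lines have "<addr>:" as their first field.
        $1 == (addr ":")   { print src; print $0; found = 1; exit }
        END                { exit !found }
    ' "$1"
}

# Example: source_for_addr pf.dis 7feb
```

It is still worth viewing the surrounding disassembly by hand, since the printed instruction should be compared against the one shown in the “Stopped at” message.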

That shows the source code line that generated the code that produced the crash. Note that this line might not be the true cause of the issue. For example, if a line of C source code looks something like “x=y/z;”, and if that line seems to crash because of a divide-by-zero error, the problem may be that z was incorrectly set to zero at a previous location. So identifying the line that seems to cause a problem isn't necessarily the end of the whole debugging process, but this process may help a programmer to know what line of source code generated the assembled code that triggered the crash. That may be a starting point for the programmer to review things and find out what led to the problem so that the crash could later trigger.

Helping whoever debugs

If reporting bugs, OpenBSD FAQ 2: section on Reporting Bugs says, “If you provide both the ddb> trace output and the relevant objdump section, that's very helpful.” The relevant objdump section shows the source code file and the line number, as well as the machine code. It would seem sensible to also include the relevant source code, as a convenience for people. (One reason that may be helpful is because other people may not conveniently have the relevant version of the relevant source code file.) Surely there is a way to cut out just that line of source code: perhaps using head -n line_number file | tail -n 1 (head would show the first line_number lines of the file, tail would show only the last line of the output of head). Or using awk.
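A short sketch of both ideas follows: one function prints just the line in question, and another prints a few numbered lines around it, which may be more useful to whoever reads the report. The names show_line and show_context are hypothetical.

```shell
# show_line FILE LINE  (hypothetical helper)
# Print only line number LINE of FILE.
show_line() {
    sed -n "${2}p" "$1"
}

# show_context FILE LINE [N]  (hypothetical helper)
# Print LINE plus N lines (default 3) on each side, with line numbers.
show_context() {
    awk -v target="$2" -v n="${3:-3}" '
        NR >= target - n && NR <= target + n { printf "%6d  %s\n", NR, $0 }
    ' "$1"
}

# Example: show_context /usr/src/sys/net/pf.c LINE_NUMBER 5
```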

Once all this information is gathered, the important first steps of being a useful problem reporter have been done. The next step toward getting this problem fully resolved may be to review OpenBSD FAQ 2: section on Reporting Bugs. The remaining steps of being a hero include being able and willing to re-create the problem (preferably in a test/debug environment) and performing requests made by anyone helping to troubleshoot the problem. (Of course, any such requests should be considered before being acted on, especially to make sure that no problems like a security breach or important downtime will result.) Discovering and implementing a solution, especially if it is complex (possibly due to requiring great programming skill), is also helpful, and especially heroic when the results are shared and end up helping other people.

[#wnerrrep]: Windows Error Reporting (“WER”) (previously “Online Crash Analysis” (“OCA”))
Used by Microsoft Windows. May have been used with some versions of MS Office?
Other crash reporting systems
Mozilla Crash Reporter, Wikipedia's article on “Crash Reporter” (and, naturally, debugging info)
ddb
[#ddbgetez]: Information is currently in the section about ddb: details on gathering information during a system panic.