Many errors generate log files and/or error messages, so some of the first sections of this website to check are the sections on log files and error messages.
- System restarts
- [#syspanic]: Operating System Panics
One may wish to enable and configure automatic dumping, which may provide information for later reference. One may also wish to log information related to dump file creation; among other uses, this can lead to an alert being created if the system is set up to report events. Beyond these activities, which may help trace problems later, immediate actions that may be taken are to automatically run debugging software and/or automatically reboot. Details may be on the page about handling operating system panics.
- [#crashndl]: Crash handling
- Gather info
- (See the page about handling crashes to get more details about what types of information to gather in response to software (and/or hardware?) becoming unresponsive.)
- [#crashrep]: Crash Reporting
- [#wnerrrep]: Windows Error Reporting (“WER”) (previously “Online Crash Analysis” (“OCA”))
- Mozilla Crash Reporter, Wikipedia's article on “Crash Reporter” (and, naturally, debugging info)
- Debugging/analyzing crash info
See: crash handling page for a section about “Debugging/analyzing crash info”. That section has information about using software such as ddb or WinDbg.
- Sharing issues
Permission denied because resource in use
- File in use
- TCP port, or UDP port, is in use
- See: what process is using a TCP/UDP port.
- Security permissions
- Resource exhaustion
See very similar content: Permissions/Sharing Issues. For example, a resource (like a TCP/UDP port number on a specific IP address) may be exhausted because a single process is using that port exclusively, at which point the issue is that the resource has been allocated for exclusive use by that one program.
To some extent, all resource exhaustion fits that category: there wouldn't be memory exhaustion if bits weren't reserved for a specific process. However, this section is also about simply not having enough resources, or resources being used excessively (such as an infinitely recursive function that uses additional resources with each recursion).
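The exclusive-port case described above can be sketched in a few lines. This is only an illustration, not a diagnostic tool: the function name second_bind_error is made up for this example, and it assumes a TCP/IPv4 loopback address is available and that neither socket sets an address-reuse option.

```python
import errno
import socket


def second_bind_error(host: str = "127.0.0.1") -> int:
    """Bind one TCP socket, then try to bind a second one to the same port.

    Returns the errno raised by the second bind, or 0 if it unexpectedly
    succeeds. (Hypothetical helper, for illustration only.)
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))      # port 0 asks the OS to pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port: already taken
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_error()
    print(code == errno.EADDRINUSE, errno.errorcode.get(code))
```

The port itself is not scarce here; it is simply held by one socket, which is why finding the process that holds the port is usually the next step.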
- Resource limits
Some resource limits may be hardware-induced, such as disk space. Others may be software-induced, such as quotas of how much disk space is used by certain user accounts.
The section on used resources has information about handling some commonly encountered limits.
There may be some other artificial limits that can cause software to not have all of the resources that a computer has. For instance, Unix may have user limits specified by
Here, quickly, are a few pointers to handling some common resource exhaustions.
- What is using up all the disk space?
See: Finding out what is using up disk space and Dealing with having too little available disk space where it is needed. (Some additional information used to be here, but has been moved to the section about finding out what uses up space.)
- What is using all the memory?
- Realize that when an error message relates to running out of memory, the information provided and/or needed may relate to physical memory. Or, the error message may be referring to all memory, including virtual memory.
- What is causing the system to be so slow?
- Checking CPU usage or other things that sap a system of speed
- Unkillable program
- Some text for debugging: Mark Russinovich's archived info
- In Unix
- [#zombiest]: A defunct process in a Zombie state
Sometimes a process ends up in a situation where it looks unusable, and its status is what is called a “defunct” state, which is also known as a “zombie state”. A process in this state may be called a “defunct process” or a “zombie process”.
Check if the program shows a capital “Z” in the process state column (labeled “STAT” in many implementations) of the ps output. If so, then this “zombie” process is “dead”. However, the zombie process has not been “reaped”. So, the creator of the dead has not yet reviewed the log of what happened during the lifespan of the now-perished.
Okay, that was some colorful terminology. Now, time to explain it with some actually useful technical details. The “dead” process has already had its memory de-allocated. (This means that any memory that was reserved for the program to be able to operate has been un-reserved, and the operating system may be using that memory for another purpose, such as providing the memory to a different process.) Stack Overflow: commentary on a defunct process describes the process of “reaping”: a parent process calls a subroutine such as “wait” (or perhaps “wait3”), and this causes the parent process to read the return/exit code of the now-defunct process.
A process identifier (“PID”) continues to be allocated for the defunct program, but the issue is not with the zombie process which has already completed its task. The reason the process is still in a zombie state, instead of having already been cleaned up, is because the assigned parent task has not yet accepted the return/exit/error code that was generated by the child process. If any interaction is required at this point, the resolution involves interacting with the parent task. The process in a defunct state cannot provide satisfying interaction. (It is already “dead”).
A zombie is not going to be consuming substantial CPU time or memory. In such a state, the zombie has no expected future life. The zombie process only remains for the purpose of being able to participate in the reaping process. During reaping, the information retained about this zombie process is used to supply the return/exit code to the parent process. Once that is done, the operating system can really start to forget about the zombie process (and can then do things like re-using the PID number).
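The life cycle described above (child exits, lingers as a zombie, parent reaps it and receives the exit code) can be reproduced with a short sketch. It assumes a Unix-like system where os.fork is available; the function name reap_child and the delay value are made up for illustration.

```python
import os
import time


def reap_child(exit_code: int = 7, delay: float = 0.2) -> int:
    """Fork a child that exits at once, leave it unreaped briefly, then reap it.

    Returns the exit code that the parent reads while reaping.
    (Hypothetical helper, for illustration only.)
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately, skipping the interpreter's normal cleanup.
        os._exit(exit_code)

    # Parent: once the child has exited, and until waitpid() is called, the
    # child remains in the process table as a zombie. Running ps from another
    # terminal during this pause would typically show it with a "Z" state.
    time.sleep(delay)

    # Reaping: waitpid() hands the child's status to the parent, after which
    # the operating system can release the PID.
    reaped_pid, status = os.waitpid(pid, 0)
    assert reaped_pid == pid
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(reap_child(7))
```

A second waitpid() call on the same PID would fail, since nothing is left to reap.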
So, how to clear up any defunct processes? The first steps are to determine which process is in a zombie state, and which process is acting as the parent of this zombie. Then, sending a SIGCHLD signal to the parent may resolve this. The CHLD signal may be sent with the kill command (by using “kill -CHLD” or perhaps “kill -s CHLD”, followed by the PID of the parent). (OpenBSD's signal listing identifies the CHLD signal as meaning “Child exited”.) (If the CHLD signal doesn't work, perhaps sending a SIGCLD signal with kill -CLD may work in some operating system environments?)
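For scripted use, the same signal can be sent from Python. This is a minimal sketch, assuming a Unix-like system; the function name send_sigchld is made up for illustration. Note that the signal only helps if the parent actually has a handler that responds to it by reaping; a parent that ignores SIGCHLD will not change its behavior.

```python
import os
import signal


def send_sigchld(parent_pid: int) -> bool:
    """Send SIGCHLD to parent_pid, much like running kill -CHLD on that PID.

    Returns False if the process does not exist or may not be signaled.
    (Hypothetical helper, for illustration only.)
    """
    try:
        os.kill(parent_pid, signal.SIGCHLD)
    except (ProcessLookupError, PermissionError):
        return False
    return True


if __name__ == "__main__":
    # Signaling our own process is harmless here: without a handler
    # installed, the default action for SIGCHLD is to ignore it.
    print(send_sigchld(os.getpid()))
```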
It may be useful to figure out if any processes identify the zombie process as a parent. (Source: Stack Overflow comment #356899.) However, addressing the children is said (third paragraph of Stack Overflow comment #629855) to be “unlikely to help unless” a child process is “somehow related to” a specific, “particular bug you are seeing.”
Note that having a small number of zombie processes is not necessarily a terrible thing. This may just mean that the operating system has given a low priority to the clean-up processes, and that may indicate that task scheduling/multitasking is being done fairly efficiently (giving low priority to a low priority task). (Stack Overflow comment 356841 cites them appearing to be harmless.) However, having many zombie processes (sometimes thousands have been sighted on a single machine) may be an indication that a parent process is not properly reaping the children. This can be caused by some poorly written software. (Device drivers have been known to cause this.) Updating the software that causes this may resolve the issue. (The software causing this would be the parent process, not the individual zombie process that appears to be the problem.) The number of zombie processes may easily be found (the top command may report this; an operating system's logon message, which was probably a default, uncustomized one, has been known to report this).
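For contrast with a parent that never reaps, here is a sketch of what a well-behaved parent might do periodically (or from a SIGCHLD handler): collect every child that has already exited, without blocking. It assumes a Unix-like system; the function name reap_exited_children is made up for illustration.

```python
import os


def reap_exited_children() -> list:
    """Reap all children that have already exited, without blocking.

    Returns a list of (pid, exit_code) pairs; exit_code is None for a child
    that was ended by a signal. (Hypothetical helper, for illustration only.)
    """
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG means "return at once if no
            # child has exited yet".
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped
```

Software that forks many children and never calls something like this (or a blocking wait) is the kind that accumulates zombies.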
Another task that may help in handling the situation is to figure out which processes are responsible for running other processes (and which processes are getting run by a given parent process). The pstree program may help to accomplish that. Otherwise, the ps command may provide some of this information. Unfortunately, the syntax of the ps command varies quite a bit among implementations, so any specific details given may only work in some operating systems. (In OpenBSD, ps -l will show additional information including PPID data, and using ps -k may show info about kernel threads. In some variations, ps -e may be the same thing as ps -A, which shows all processes, and ps -f may show a “full-format listing” while ps -F shows an “extra full format”.)
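Matching zombies to their parents can be scripted by parsing ps output. The sketch below is hedged accordingly: it assumes a ps that accepts -A together with -o and the pid, ppid, and stat keywords, which is not true of every implementation. The function names find_zombies and list_zombies are made up for illustration.

```python
import subprocess


def find_zombies(ps_output: str) -> list:
    """Return (pid, ppid) pairs for rows whose state field starts with "Z".

    Expects rows shaped like "PID PPID STAT"; header lines and malformed
    rows are skipped. (Hypothetical helper, for illustration only.)
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header row or something unexpected
        if fields[2].startswith("Z"):
            zombies.append((int(fields[0]), int(fields[1])))
    return zombies


def list_zombies() -> list:
    """Run ps and report zombies with their parent PIDs."""
    # Assumption: this ps supports -A and -o with these column keywords.
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,ppid,stat"],
        capture_output=True, text=True, check=True,
    )
    return find_zombies(result.stdout)


if __name__ == "__main__":
    for pid, ppid in list_zombies():
        print(f"zombie {pid} is waiting to be reaped by parent {ppid}")
```

The second number in each pair is the process worth investigating, since the parent is the one that has not yet reaped.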
The “Caveats” section of the OpenBSD manual page for ps notes that the ps command may show a status such as “<defunct>” (or perhaps “<exiting>”). That manual page also has some information about the reliability of information reported.
Another possible approach is to see if closing the parent process is feasible, as that may cause the zombie to become a child of the process which is called init (init will then typically reap the process quickly, which will resolve the issue). (This information is largely taken from Wikipedia's article on Zombie processes. For further short reading, see FAQS.org Unix FAQ part 3 section 13: ridding persisting zombie processes.)
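The re-parenting that happens when a parent goes away can be observed with a small experiment: an intermediate process starts a grandchild and exits, and the grandchild reports which process it now sees as its parent. This is a sketch assuming a Unix-like system; the function name adopted_parent_pid is made up for illustration. On a traditional system the new parent is init (PID 1), but some environments designate a different process to adopt orphans, so the sketch only reports the value instead of assuming it.

```python
import os
import time


def adopted_parent_pid(timeout: float = 5.0):
    """Orphan a grandchild; return (old_parent_pid, new_parent_pid).

    (Hypothetical helper, for illustration only.)
    """
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Intermediate process: start a grandchild, then exit right away.
        os.close(read_fd)
        my_pid = os.getpid()
        if os.fork() == 0:
            # Grandchild: wait until the reported parent changes.
            deadline = time.time() + timeout
            while os.getppid() == my_pid and time.time() < deadline:
                time.sleep(0.01)
            os.write(write_fd, str(os.getppid()).encode())
            os._exit(0)
        os._exit(0)

    # Original process: reap the intermediate so it does not linger as a
    # zombie, then read the grandchild's report until the pipe closes.
    os.close(write_fd)
    os.waitpid(middle, 0)
    data = b""
    while True:
        chunk = os.read(read_fd, 64)
        if not chunk:
            break
        data += chunk
    os.close(read_fd)
    return middle, int(data)


if __name__ == "__main__":
    old_parent, new_parent = adopted_parent_pid()
    print(f"parent changed from {old_parent} to {new_parent}")
```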
(In Microsoft Windows, Sysinternals' Process Explorer may be a convenient graphical method. See information on Sysinternals' Process Explorer. Otherwise, using WMIC may also show details about parent PIDs. See the section on viewing what is running.)
The term “zombie state” was one of the terms featured on the fun-looking cover of Andrew S. Tanenbaum's book called “Modern Operating Systems” (“Second Edition”).