Troubleshooting info

Scope

This section was originally meant to focus on processes for determining what a problem is, and then to point to subsections, other sections of documentation, or third-party documentation that explain how to deal with the problem effectively. However, solutions to problems have crept into this section, so this page also contains, or refers to, information about how to resolve problems.

There are more details than may be evident from this main page alone. For example, the subsection about errors includes a crash handling section that describes how to use dedicated “debugging” software (even when a program's source code is not available).

Art

The art of troubleshooting can involve performing some very specific technical processes. However, part of the process involves making judgement calls.

OpenBSD FAQ: Reporting bugs says, “Before crying "Bug!", please make sure that is really what you are dealing with.” “If this is your first” time obtaining any “experience” with the software being used, then “be realistic: you probably did not discover an unknown bug. Also note that faulty hardware can mimic a software bug”.

Troubleshooters (people who troubleshoot) are often happy to find what appears to be the cause of a problem. However, any apparent diagnosis should be analyzed to determine whether the suspected problem is really a plausible initial cause of the symptoms.

Know that finding a problem does not necessarily identify the cause of the problems being experienced. As an example, when BitTorrent became popular, several network drivers turned out to have bugs that hadn't caused problems until computers started creating a much larger number of network connections to different computer systems. Before BitTorrent was used, many computers never created that many connections, so those bugs went unnoticed until BitTorrent changed the kind of network communication that was occurring. Using less buggy drivers would certainly be nice, but a system administrator who discovered these bugs would not have accurately identified the recent change that led to the recent problems on the network.

In fact, fixing the problem could cause more problems. Fixing a broken network driver might make a computer operate more stably, which could let BitTorrent make many more network connections. Those connections might use up resources (like the limited amount of traffic that a router can handle within a certain amount of time), so fixing one problem may cause other, more significant problems. This is simply an example of why a person should not immediately jump to a conclusion (like, “the network drivers are causing the recent problems”) just because they discover a new problem. Even if bugs do exist, they might only be surfacing because of some unusual behavior.

Keep in mind that recent results tend to have recent causes. Trying to identify what has changed recently can often help to identify what is causing issues.
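As a rough illustration of looking for recent changes, the following Python sketch lists files under a directory that were modified within the last N hours, newest first. It only looks at file modification times (which an intruder, or a careless tool, can alter), and `recently_modified` is a hypothetical helper, not part of any existing tool.

```python
import os
import time
from pathlib import Path


def recently_modified(root, hours=24.0, now=None):
    """Return (path, mtime) pairs for files under `root` changed within `hours`.

    Results are sorted newest first. Unreadable directories and files that
    vanish during the scan are skipped silently.
    """
    now = time.time() if now is None else now
    cutoff = now - hours * 3600
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                mtime = path.stat().st_mtime
            except OSError:
                continue  # permission denied, broken symlink, or file removed
            if mtime >= cutoff:
                found.append((str(path), mtime))
    found.sort(key=lambda item: item[1], reverse=True)
    return found
```

For example, `recently_modified("/etc", hours=48)` may show configuration files that changed shortly before a problem started.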

More information about troubleshooting steps is in the troubleshooting guide.

[#logs]: Logs

The section about types of log files includes further information about various types of log files, including:

Unix log files
(See the section about types of log files for further details.)
[#wnevtlog]: Windows Event Logs

If you are looking for information about a specific event found in the log files, see Windows Event Messages. These log entries can be created as described in the section about “reporting events”. To learn more about the log files themselves, see the sections about Windows event logs.

Specific error messages
Windows Bugcheck/STOP errors (a.k.a. “blue screen of death” (“BSOD”))
This is covered in the section about system panics. (See also: Specific Windows Event Log field entries?)
Errors

Many errors are recorded in log files and/or produce error messages, so some of the first sections of this website to check are the sections on log files and error messages.

See the Errors section for details about how to respond to certain specific error messages, such as a system panic (including, in Microsoft Windows, a “BugCheck”/“STOP Error”/“Blue Screen of Death” (“BSOD”)).

System reboots/restarts, Bugcheck.

See the sections about system reboots/restarts, and Bugchecks.

Intrusions
Intrusions are documented in more detail in the separate section about break-ins.
[#breakin]: Intrusions/Break-ins

When it seems that an intruder has gained unauthorized access, the best first response might not be to take direct action to stop the intrusion. The following are some possible initial approaches. The order in which they are taken may vary between environments.

  • Report the incident to somebody with more IT access, such as an IT staff member. The best move by an IT staff member might be to report the incident to a supervisor.

    If criminal activity is involved, there may be a decision about whether legal authorities (like law enforcement and/or judges) are likely to get involved. The section on providing professional services discusses “Computer Forensics”, and reading it may be helpful for someone who is going to get legal authorities involved. In the case of an insignificant attack from a foreign nation located in two remote hemispheres (e.g. the Southern and Eastern hemispheres, instead of the Northern and Western hemispheres), the likelihood of getting authorities involved may be remote (in the sense of probability, not just location).

    However, in other cases, the generally recommended course of action may be to avoid making changes to files, so that evidence is preserved.

  • Gather information, such as:
    • Where the attack is originating: Attempt to find out what IP address is connected to the system. Record that information, at the very least. Especially if the connection seems to be coming from a loopback interface, record who is logged in and, even better, a list of all the processes that are running, and then respond appropriately to the intruder's unauthorized access.
      • If the unauthorized access involves a network connection, attempt to see where the network connection leads. This could be the most time-sensitive thing to check, since the active network connection may go away for many reasons (such as a power outage affecting networking equipment at some location). Also, if possible, find out more about the network connection, such as what local process is using that connection and what user account owns that connection and/or that local process.

      See: Finding out what is using TCP and UDP ports

    • See what users are logged in, and what processes are running on the system. Ideally, also record which processes are “owned”(/“controlled”) by which user accounts.
    • Determine what files are open, and which user has them open (and by what method, such as via file sharing or via a process on the local system). (Often, the way to figure out which user has a file open is to find what process is using the file, and then find what user account that process is running under.)
  • Assess what is being done, or is likely to be done, by the intruder, and how damaging that is likely to be. That may determine what priority is given to stopping the attacker, compared to other possible courses of action such as gathering more information about what the attacker is trying to do. In the case of a honeypot, the goal might actually be to allow the current attack, and future attacks, to continue in order to gather more information.
  • Determine how the unauthorized access had been granted, and what can be done to stop such unauthorized access from occurring again.
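On Linux, the kernel exposes IPv4 TCP connections in `/proc/net/tcp`, with addresses written in hexadecimal. The following Python sketch assumes a Linux system (other Unix-like systems and Windows expose this differently; see the section about finding out what is using TCP and UDP ports) and decodes that file to list the remote peers of established connections. The helper names are hypothetical. The `uid` value can be turned into a username with `pwd.getpwuid()`, and the `inode` value can be matched against the `socket:[inode]` links under `/proc/<pid>/fd/` to find the owning process.

```python
import socket
import sys

# Connection states as numbered in the Linux kernel's tcp_states.h.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_address(field, byteorder=sys.byteorder):
    """Turn a field such as '0100007F:0277' into ('127.0.0.1', 631)."""
    hex_ip, hex_port = field.split(":")
    # The 32-bit address is printed as a number in the host's byte order.
    packed = int(hex_ip, 16).to_bytes(4, byteorder)
    return socket.inet_ntoa(packed), int(hex_port, 16)


def parse_proc_net_tcp(text, byteorder=sys.byteorder):
    """Parse the contents of /proc/net/tcp (IPv4 only) into a list of dicts."""
    conns = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 10:
            continue
        conns.append({
            "local": decode_address(fields[1], byteorder),
            "remote": decode_address(fields[2], byteorder),
            "state": TCP_STATES.get(fields[3].upper(), fields[3]),
            "uid": int(fields[7]),
            "inode": int(fields[9]),
        })
    return conns


def remote_peers(conns):
    """Remote (ip, port) pairs of connections that are currently established."""
    return [c["remote"] for c in conns if c["state"] == "ESTABLISHED"]
```

On a live system: `with open("/proc/net/tcp") as f: print(remote_peers(parse_proc_net_tcp(f.read())))`.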

Did the intruder add files to the system? Is there a history of the commands the intruder executed? If logs are not already routinely and automatically stored in a location that is still safe, salvage them (storing them on another system, if appropriate): move them, do not delete them. This may include files in a user's account. Other files, such as executables, may need to be saved for possible analysis. However, if the unauthorized access is to be stopped, it is generally a good idea to move such files to a secured location: because files are often re-used (often automatically) by being accessed at known locations, moving them can help stop them from being used. On the other hand, moving files to a location visible to the attacker may help the attacker (by revealing what steps are being taken, and/or by providing convenient access to whatever attack tools are available).
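The “move, don't delete” advice can be sketched in Python: a file is hashed, moved into an evidence directory, re-hashed to confirm it arrived intact, and recorded in a `manifest.jsonl` file. This is a minimal sketch under stated assumptions: the evidence directory, the manifest name, and the `preserve` function are all hypothetical, and the evidence directory should ideally be on another system, or at least somewhere the attacker cannot see.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to handle large logs."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve(path, evidence_dir):
    """Move `path` into `evidence_dir` (never delete it) and record its hash."""
    src = Path(path)
    original = str(src.resolve())
    before = sha256_of(src)
    info = src.stat()  # capture size and mtime before anything changes
    dest_dir = Path(evidence_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    if dest.exists():
        raise FileExistsError(str(dest))  # never overwrite earlier evidence
    shutil.move(str(src), str(dest))
    if sha256_of(dest) != before:
        raise OSError(f"contents changed while moving {original}")
    entry = {
        "original_path": original,
        "stored_as": str(dest),
        "sha256": before,
        "size": info.st_size,
        "mtime": info.st_mtime,
        "recorded_at": time.time(),
    }
    with open(dest_dir / "manifest.jsonl", "a") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry
```

The recorded hash makes it possible to show later that a preserved file has not changed since it was collected.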

System usage
Seeing how a system is used may help determine a problem. Possibly useful resources include the sections about reporting events and about used resources.
Physical issues
[#smoke]: Smoke
Smokin' Experiences
Power supply issues
[#psubulge]: Power supply shape incorrect

If a side of a power supply is somewhat rounded (especially if swollen outward, rather than dented inward), the PSU is likely already damaged to the point that something broke, and it should not be treated as a part that is safe to use. It most likely bulged for one, or more likely both, of two reasons: the metal is flimsy, and the parts inside bulged.

The latter, a situation involving bulging parts, is a death sentence: the bottom half of JonnyGuru.com's “The Bargain Basement Power Supply Roundup Review”: Page 4 shows testing of a Topower TOP-300SSA unit. (The voltage graph speaks volumes, as can be understood from the page's commentary and a comparison to the voltage graph shown on page 2.) Granted, this may not be a fair test of that line of units, as the PSU supplied may already have been damaged, but the review is useful anyway, as some of its comments mention a bulging capacitor and what the results were.

For such a PSU, prevent it from being used: if you have the authority to do so, declare that it shouldn't be used and take it away. Cutting wires so that it is unusable may help ensure that nobody decides to use it anyway. Any attempt to use such a power supply as a functioning unit is likely dangerous and should be avoided.

Power supply won't power on
Fan isn't spinning

Did the computer get unplugged? Is it plugged into a UPS or a surge protector that is turned off (easily remedied with its power switch)? Check the switch on the power supply. An “All-in-One” book by Mike Meyers of Total Seminars, about the CompTIA A+ Exam (probably 7th Edition?), on page 391 (which is in Chapter 10) stated, “Power supplies break in computers more often than any other part of the PC except the floppy disk drives.”

[#psuallof]: Power supply's switch automatically flips off

This problem is when the power supply's power switch, physically located on the power supply, stays in the “off” position: if the switch is pressed into the “on” position, it automatically flips back to “off” as soon as it is no longer held down.

Long story short: this has been witnessed with multiple power supply units that had permanently failed. Use a better power supply unit. (Some identifying traits of good power supplies are in the Hardware section's section about power supply units.)

This may be an automatic response to a problem with a broken power supply. Check whether it is bulging on any side. It may even be worthwhile to remove the power supply and see whether it starts to bulge once some pressure (like being pressed up against the side of the case) is released.

If the issue goes away on its own, be concerned. Getting a new power supply may be a good idea, particularly when the budget easily allows for taking care of problems. If the issue comes back an additional time with the same unit, definitely stop using the power supply unit at that point. That suggests there may have been some internal swelling, which might have been relieved while the unit was stopped for a while, but the fact that it happened again suggests that the PSU cannot reliably handle the desired task.

A new power supply may be an unwanted expense, but having a very high chance of damaging other components (whether immediately or not) is simply not likely to be worth the risk in this sort of situation. If data absolutely must be copied off, try to use another computer to copy the data.

[#hrdwrgon]: Hardware isn't visible to software
Generic steps

Which step to perform first may depend on circumstances. For instance, if the power is off and the case is open, then checking internal wiring may be more convenient than seeing whether a driver successfully loads. If the computer is powered on, but placed in a bad location without convenient access to wires, then a quick check of what the software, or the CMOS setup, detects may be more convenient than re-verifying wiring. (This text calls a place with little access to wires a “bad location”, rather than the more neutral “inconvenient location”, because a place with little access to wiring probably has little airflow, which is generally bad.)

If the CMOS is likely to show some information about the device (true for storage media and many components built into the motherboard), ensure the device is enabled and see what the CMOS detects, if anything. See whether the device shows up in the OS. If available, try connecting to a different port/jack (or, for an expansion card, plugging it into a different slot).

The section about detecting hardware may be helpful.

Check the wiring. Perhaps a technician forgot to plug the device in, or accidentally knocked a power or data cable loose. Make sure the wiring connects to both the device and the motherboard (either directly, via a port or set of pins on the motherboard, or indirectly, such as by connecting to a port that is on the case or on a card). If connecting to an external connector on the case, such as a USB port, make sure that port has a wire that successfully connects it to the motherboard. (An easy mistake is to try to supply power using a molex connector that comes from something like a splitter used to power a fan, when that splitter might itself be disconnected. This may not be very common, but if there are multiple such cables, then the mistake can be very easy to make. So, tracing the cables may reveal something that could be missed just by checking both ends.)

Of course, the process of checking the connections is meant to also involve making sure that any removable device is plugged in as needed. If a drive supports removable media or a slot supports a removable card, make sure anything that is removable, but required, is properly connected to whatever it needs to be connected to.

In general, try not to power more than two hard drives from one set of molex connectors, and have the video card share its power cable with as few other devices (especially long term storage devices) as possible.

Some more specific examples
Drive isn't visible

This section is basically about hardware not being visible. For help with a similar situation, see the section on dealing with “Drive's Filesystem/Volume is not accessible”.

For some types of drives, if the system is powered on (or very convenient to power on), it is generally worth checking whether the drive is visible in the BIOS/CMOS setup program. If it is not, then spending time in the operating system is generally unnecessary. This isn't quite as true for, say, USB drives, but it generally is true for PATA and SATA drives that connect directly to the motherboard.

Drive not visible in System Startup (e.g. BIOS)

Is the BIOS setup configuration in the CMOS set to disable the drive? Some (particularly older) BIOS setup programs that are new enough to support auto-detection may require the user to manually start an Auto-Detection. If the BIOS setup program is so old that it pre-dates Auto-Detection, then make sure that the drive's geometry is correct. (All zeroes will not work. Incorrect values other than all zeroes may cause the drive to not work properly, which may even involve disk corruption.)

Check PATA master/slave settings. If there is just one optical drive, and that drive can write to media, it is sometimes better for it to be a master, so that the drive takes priority. It is generally preferable for it to be the secondary master, so that the bootable long term storage device can be the primary master. Beyond that, it is generally best to make the newer hard drive the master, as it is more likely to be compatible with any older addressing methods used by the older hard drive, and its newer circuitry may handle higher speeds better.

The nicest way to jumper things is generally to use Cable Select on all equipment, relying on cable positions to determine which drive is the master. This way, no jumpers need to be adjusted if a single drive is removed, or if another drive using Cable Select is added later. The main reasons not to use Cable Select on drives that support it are if cable length becomes a factor (being unable to reach one of the drives with the desired connector), or if compatibility is affected. Some equipment may have compatibility issues with Cable Select: if such problems are actually encountered, including when at least one drive on that primary/secondary channel doesn't support Cable Select, then relying on Master/Slave settings may be a worthwhile step, at least for troubleshooting. For other systems, perhaps mainly laptops, there may be an increased chance of things working well when Cable Select is used. For many drives, Master and Single are the same jumper setting (or the drive will work in either mode, even if there are separate documented settings).

While discussing IDE-level technology: DebianInstaller Wiki on errors reading data (particularly a CD image) stated, “A tightly folded IDE cable can also cause read errors, try repositioning the cable.”

Check to see if the drive may use some sort of hard drive technology which isn't fully supported by the BIOS.

Drive not seen in some software

If the drive isn't visible to some software, try other software.

One example of this: see if a hard drive partition program can see the drive. If it can, and if it says there are no partitions on the drive, that may be why other software is not showing the mounted partition.

Another example: if the drive isn't visible in some software, check whether it is visible to the operating system. (For example, if some CD-writing software doesn't show a drive, perhaps it cannot write to the drive because it lacks drivers needed to write to that particular type of drive, even though Windows can detect the drive. Checking whether the device shows up in “(My) Computer” and/or “Device Manager” may be informative.) If a network drive doesn't seem to be visible, check whether other forms of network connectivity work. (If the network drive is based on SMB, see whether connecting with SMB to a different path on the same computer works.)

HDD Controller failure

Generally, this error means that the controller did not indicate that a hard drive was connected. Usually, the reason the HDD controller did not present the hard drive to the BIOS is an issue with the hard drive. (This is particularly true starting with ATA hard drives that used “integrated drive electronics”, a.k.a. IDE, which basically refers to the fact that the hard drive controller circuitry is part of the physical hard drive.) See the section about hard drives not being detected, as that is by far the most likely place to find the solution. (This includes checking some things other than the hard drive, such as data cables being plugged into the motherboard and the BIOS setup having sensible configuration options stored in the CMOS. Particularly for older systems where the BIOS does not automatically detect geometry, incorrect settings for the hard drive, such as all zeroes, may be the problem, so also check the BIOS configuration that is stored in the CMOS.)

Hardware doesn't seem to be responding

See if device is visible (in the operating system, not just the BIOS/CMOS setup program). (If not, see the section on hardware not being visible.)

See if the software is configured to use the correct device. A classic example is if sound does not work because the operating system is sending output to another sound “device” object, which might actually be an internal/virtual device that doesn't correspond to any actual speaker ports. (For Microsoft Windows, this can be fixed in the control panel applet's Playback tab, and this may happen more with Windows Server 2003 and XP and earlier operating systems.)

Verify intensity. This especially refers to making sure that sound volume is high enough and is not muted. (Just don't increase the intensity so much that, once the problem is fixed, a later test produces an unpleasantly loud sound.) There may be multiple volume and/or mute controls: the software may have its own, the driver or operating system may provide its own, speakers may have their own, headphones may have their own, and sometimes they may be found part-way down an audio cable (rather than on a device at the end of an audio cable). In addition to basic volume level, check if there is a separate option for muting the sound, or if there are multiple controls (e.g. one for master volume, and one for Wave output). (Sometimes some of the audio controls may be hidden, so check if that is the case.) For monitors, make sure brightness is not at zero.

Verify connections. Fully verify connections, including internal cables, cords going to another outlet, and whether that outlet works: if the outlet is a splitter that requires power, make sure that it has power and appears to be working. For wired Ethernet, check for a “link light”, which is almost always included on both ends of the connection. (Checking either end should work.)

Check whether the device has been turned off by software. (At least in theory, this may be more common on portable devices, where power saving is more highly valued.) For Microsoft Windows, some devices will have a setting visible in Microsoft's Device Manager. Another potential option, perhaps more commonplace but less standardized in implementation, is to find a solution using software that is specific to the device. To locate this software, find out what company manufactured the device and see if there is some software by that company. Such software may be in the “System Tray”/“Notification Area” (which is the better first place to look) or on the Start Menu. Some portable devices may have a physical power switch (particularly devices with antennas), so check for that. Laptops might also have a keyboard shortcut that involves holding a key called Fn and pressing another key, most commonly a function key (F1 through F12).

NirSoft's DevManView is similar to Device Manager. One benefit of using this software is that it accepts command line parameters.

Unusually, Wikipedia's article about ntdetect.com states, “Though it has the .COM extension, it is not actually a DOS application.” Some “debug” versions may be available. Microsoft KB 927229: Win2KRK Tools for Administrative Tasks refers to an ntdetect.com which is “a debug version of Startup Hardware Detector to use for troubleshooting hardware detection issues.” See: Installer for ntdetect.com. For later operating systems, Wikipedia's page about ntdetect.com: “Troubleshooting” section mentions ntdetect.chk included with Windows Support Tools.

[#unmtedfs]: Drive's Filesystem/Volume is not accessible

If a drive is not accessible, there might be various possible causes. Know what kind of drive this is. If the “drive” is actually a filesystem volume on a local hard drive, check that the data storage device itself is visible. (If the drive is a remote “network drive”, the problem could be caused by network issues.) For filesystem volumes on a local hard drive, check software that reports local devices (like “Device Manager” in Microsoft Windows, or boot logging), and/or software that can report disk layout details, and/or software that understands any RAID hardware being used. Such software will generally report which drives are visible. If the hardware isn't being seen, that explains why the filesystem volume won't mount; in that case, see the section about “Hardware isn't visible to software”.
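To check whether the Linux kernel sees a disk at all (roughly the Linux counterpart of looking in Device Manager), `/proc/partitions` lists every block device and partition the kernel has detected, with sizes in 1 KiB blocks. The following Python sketch parses it. It assumes Linux (on OpenBSD, `sysctl hw.disknames` gives a similar list), and the function names are hypothetical.

```python
def visible_block_devices(text):
    """Parse /proc/partitions text into {device_name: size_in_KiB}."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like "   8        1     524288 sda1";
        # the header line and blank lines are skipped by this check.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            devices[fields[3]] = int(fields[2])
    return devices


def read_partitions(path="/proc/partitions"):
    """Read and parse the live table (Linux only)."""
    with open(path) as f:
        return visible_block_devices(f.read())
```

If an expected device name (such as a hypothetical `sdb`) is missing from `read_partitions()` after attaching a drive, the problem is below the filesystem level, and the hardware section applies.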

In Unix (noticed with OpenBSD), errors when mounting a drive might produce a rather ambiguous, possibly misleading error message. For instance, the destination directory may be output, followed by words like “Invalid argument” (e.g.: “mount_cd9660: /dev/vnd1c on /media/cdrom: Invalid argument”). That can have various causes, such as invalid permissions, an incorrect filesystem type being specified, or the drive physically not responding. If the syntax has been triple-checked and really looks right, ignore the “Invalid argument” wording and check other causes.
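Before chasing an “Invalid argument” message, a few basic causes can be ruled out mechanically. The Python sketch below checks that the device node exists and is readable by the current user, and that the mount point is an existing directory with nothing already mounted on it. It does not attempt the mount itself, and `premount_problems` is a hypothetical name; if it returns an empty list, the filesystem type or the drive itself becomes a more likely suspect.

```python
import os


def premount_problems(device, mountpoint):
    """Return human-readable problems that could explain a vague mount error.

    An empty list means these basic checks passed.
    """
    problems = []
    if not os.path.exists(device):
        problems.append(f"{device} does not exist (wrong device name?)")
    elif not os.access(device, os.R_OK):
        problems.append(f"{device} is not readable by this user (permissions?)")
    if not os.path.isdir(mountpoint):
        problems.append(f"{mountpoint} is missing or is not a directory")
    elif os.path.ismount(mountpoint):
        problems.append(f"something is already mounted on {mountpoint}")
    return problems
```

For example, `premount_problems("/dev/vnd1c", "/media/cdrom")` checks the two paths from the error message quoted above.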

Resources being used
File

See: section about File in use for open files (or perhaps files that are, for some other reason, considered to be “busy”/unavailable).

Disk space
Seeing how much space is free

See the section about Checking for amount of free disk space.
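For a quick scripted check, Python's standard `shutil.disk_usage()` reports total, used, and free bytes for the filesystem containing a given path. This is a small sketch with hypothetical helper names. On Unix, the `free` figure is the space available to unprivileged users, which can be less than total minus used because of blocks reserved for root.

```python
import shutil


def human_bytes(n):
    """Format a byte count using binary (1024-based) units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n} B" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024


def free_space(path="."):
    """Report total/used/free bytes for the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    percent_free = 100.0 * usage.free / usage.total if usage.total else 0.0
    return {
        "total": usage.total,
        "used": usage.used,
        "free": usage.free,
        "percent_free": percent_free,
    }
```

For example, `print(human_bytes(free_space("/")["free"]))`.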

Finding out what is using up disk space

See the section about Finding out how much disk space is used.
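Along the lines of `du -s *`, the following Python sketch totals file sizes under each entry of a directory and sorts the results, largest first. It sums apparent sizes (`st_size`), which can differ from the allocated space that `du` reports for sparse or compressed files; it does not follow symbolic links, and unreadable files are skipped. The function names are hypothetical.

```python
import os


def tree_size(top):
    """Sum of apparent file sizes under `top`, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(top, onerror=lambda err: None):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # vanished or unreadable file
    return total


def usage_by_entry(root):
    """[(name, bytes), ...] for each entry directly inside `root`, largest first."""
    totals = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                totals[entry.name] = tree_size(entry.path)
            else:
                try:
                    totals[entry.name] = entry.stat(follow_symlinks=False).st_size
                except OSError:
                    totals[entry.name] = 0
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Running `usage_by_entry` again on the largest subdirectory narrows down where the space went.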

Dealing with having too little available disk space where it is needed
See the section about Dealing with having too little available disk space where it is needed.
Registry in Microsoft Windows
See section about: Registry in Microsoft Windows
Disk space/responsiveness/time
Unix
OpenBSD

See: OpenBSD: Disk space/responsiveness/time which may contain topics such as:

Heavy disk usage

See: OpenBSD: Heavy disk usage

Unnecessary slowness

See: OpenBSD: Unnecessary disk slowness.

[#cpuusage]: CPU usage
Finding what is using the CPU
Microsoft Windows
Using information from command line programs
See: Finding what is using the CPU using a command line in Microsoft Windows.
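As a command-line sketch, the output of `tasklist /v /fo csv` can be parsed to list the processes that have used the most CPU time. This assumes an English-language Windows installation (column names such as “CPU Time” are localized), and note that this column is cumulative CPU time since each process started, not current CPU usage. The helper names are hypothetical.

```python
import csv
import io
import subprocess


def cpu_seconds(value):
    """Convert tasklist's 'H:MM:SS' CPU Time into seconds (0 if unparsable)."""
    try:
        hours, minutes, seconds = (int(part) for part in value.split(":"))
    except ValueError:
        return 0
    return hours * 3600 + minutes * 60 + seconds


def top_cpu(csv_text, count=5):
    """(image, pid, cpu_seconds) for the processes with the most CPU time.

    PID 0 ("System Idle Process") is skipped because its time is idle time.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    procs = [
        (row["Image Name"], int(row["PID"]), cpu_seconds(row["CPU Time"]))
        for row in rows
        if row.get("PID", "0") != "0"
    ]
    procs.sort(key=lambda proc: proc[2], reverse=True)
    return procs[:count]


def tasklist_csv():
    """Run tasklist (Windows only) in verbose CSV mode and return its output."""
    result = subprocess.run(
        ["tasklist", "/v", "/fo", "csv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On Windows: `print(top_cpu(tasklist_csv()))`. Taking two snapshots a few seconds apart and comparing the CPU Time values gives a rough idea of what is busy right now.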
Using a graphical interface
See: Finding what is using the CPU using graphical tools in Microsoft Windows. (Contains information about using Task Manager and Resource Monitor.)
Verifying that the CPU is being heavily used

Information about seeing what is running may be helpful. There might be more targeted, efficient methods of checking CPU usage in the section about how CPU is being used.

The section about verifying CPU usage (in Microsoft Windows) has information about verifying CPU usage (in Microsoft Windows) using text-mode applications and verifying CPU usage (in Microsoft Windows) using graphical interfaces.
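The text above concerns Microsoft Windows. As a Unix-side counterpart, on Linux the first line of `/proc/stat` holds cumulative CPU time counters, and sampling it twice shows how busy the CPU was in between. The following Python sketch assumes Linux and treats idle plus iowait time as “not busy”; the function names are hypothetical.

```python
import time


def _idle_and_total(cpu_line):
    """(idle, total) ticks from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in cpu_line.split()[1:]]
    # user nice system idle iowait irq softirq steal; guest time is already
    # counted inside user/nice, so later columns are left out of the total.
    fields = fields[:8]
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    return idle, sum(fields)


def busy_percent(before, after):
    """CPU busy percentage between two samples of the 'cpu' line."""
    idle1, total1 = _idle_and_total(before)
    idle2, total2 = _idle_and_total(after)
    elapsed = total2 - total1
    if elapsed <= 0:
        return 0.0
    return 100.0 * (elapsed - (idle2 - idle1)) / elapsed


def read_cpu_line(path="/proc/stat"):
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


def measure(interval=1.0):
    """Sample twice, `interval` seconds apart, and return the busy percentage."""
    first = read_cpu_line()
    time.sleep(interval)
    return busy_percent(first, read_cpu_line())
```

A value close to 100 from `measure(5)` confirms that the CPU really is heavily used.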

Controlling CPU usage
Fixing issues with high CPU usage
There may not be a simple universal approach. Some approaches may be covered in the section about controlling CPU usage. Such approaches may include having the operating system (or other software, like a driver) forcibly control multitasking so that certain software cannot use as much CPU, and debugging to see how the CPU is being used.
Maxing a CPU
e.g.: using WMI, playing games (which can be an effective method, but can also be a very ineffective one), or using specialized applications.
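As one example of a specialized approach, the following Python sketch runs a busy loop in one process per logical CPU for a fixed number of seconds. Separate processes are used because Python threads running pure Python code do not spread across cores. The function names are hypothetical, and since maxing a CPU is a thermal stress test, it is worth watching temperatures while it runs.

```python
import multiprocessing
import os
import time


def burn(seconds):
    """Keep one CPU core busy for roughly `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def max_all_cores(seconds, workers=None):
    """Run one busy loop per logical CPU, each in its own process."""
    workers = workers or os.cpu_count() or 1
    with multiprocessing.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

Call `max_all_cores(30)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that start new processes by spawning (such as Windows).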
[#lowmem]: Memory

See: low memory.

[#netwktrb]: Network
Communications are not working

Perhaps some of the following tips may help:

See if name resolution is related to the issue. If the same computer can be reached quickly by using its IP address, that indicates the issue is not caused by physical hardware (and may instead involve name resolution).
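Name resolution time can be measured separately from everything else. The following Python sketch times `socket.getaddrinfo()` for a name; comparing a host name against its numeric IP address (which needs no lookup) may show whether resolution is the slow part. The function name is hypothetical, and results may be affected by local caching.

```python
import socket
import time


def time_resolution(name):
    """Return (seconds_taken, sorted_addresses) for resolving `name`.

    Raises socket.gaierror if the name cannot be resolved.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(name, None)
    elapsed = time.perf_counter() - start
    # Each result's sockaddr starts with the address string.
    return elapsed, sorted({info[4][0] for info in infos})
```

For example, compare `time_resolution("example.com")[0]` with `time_resolution("93.184.216.34")[0]`, using whatever address the first call returned.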

Try to localize the issue:

See if other systems, including the “default gateway” and/or the local firewall, can be reached quickly. If they cannot, the issue is usually not with the Internet service being provided (by the Internet Service Provider). This isn't meant to suggest that an attack from the Internet couldn't be slowing things down, but it may indicate that the issue is local. If there is a “modem” used for a DSL, cable, or wireless connection, and it has a default gateway, see if connections to that system are fast. If so, see if other sites on the Internet are responding quickly.
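A crude way to compare the gateway with outside hosts is to time a TCP connection to each. The following Python sketch does that with a timeout. It is a substitute for `ping` rather than a replacement (sending ICMP directly usually needs raw sockets and elevated privileges), and many devices will not accept TCP connections on whatever port is chosen. If a name rather than an IP address is given, lookup time is included. The function names are hypothetical.

```python
import socket
import time


def connect_time(host, port, timeout=3.0):
    """Seconds to complete a TCP handshake, or None on failure or timeout."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - start
    except OSError:  # refused, unreachable, or socket.timeout
        return None


def localize(targets, timeout=3.0):
    """targets: {label: (host, port)} -> {label: seconds or None}."""
    return {
        label: connect_time(host, port, timeout)
        for label, (host, port) in targets.items()
    }
```

For example, `localize({"gateway": ("192.168.1.1", 80), "internet": ("example.com", 443)})`, where both targets are placeholders to replace with the local gateway and a known site.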

See: IP network troubleshooting.

Speed
See: Hardware testing: Warning and then Network speed testing.
More network testing approaches/software

No strong recommendations of specific software are being made. Here are some options, broken down by operating system. (This list is currently very preliminary and far from complete; there may well be better options than the ones mentioned in this section.)

Unix
StressLinux software provides a list of some software, some of which may be related to networking.
Microsoft Windows
NetDiag

The Windows Support Tools may come with a NetDiag utility. This isn't really meant so much for detecting slowness, but rather for detecting broken things. However, broken things can lead to slowness...

Download Details for NetDiag for Win2K provides a hyperlink to Redirection Page to NetDiag Setup/Installer, Microsoft KB 321708: Using Network Diagnostics Tool (NetDiag.exe) in Win2K, TechNet article about NetDiag, TechNet article about Win2K RK's NetDiag.

[#slowsys]: Time/Responsiveness (slow computer)

Especially when a person is running low on time, low responsiveness can be unpleasant. Here are some tips on how to get some of that time back.

This section may still be a bit incomplete. What this section is, or will be, about:

The first thing to do is to try to identify the cause of slowness. Run some tests: some all-in-one solutions try to report on multiple tests, or individual tests may be performed. Possible causes of a computer seeming slow include CPU usage, the computer being busy accessing “long term storage” (a.k.a. a disk, or “hard drive”) (perhaps due to the use of virtual memory when memory is low), or network slowness. Checking interrupts may, or might not, help to identify hardware that is actively demanding a lot of CPU time.

Sometimes the cause may be less than obvious. For instance, a computer might be attempting to connect to a remote resource, holding up an interactive process until the attempt completes. Completing the attempt may require waiting for a timeout or, worse, user interaction. (Consider reviewing tasks, switching to other tasks, or minimizing a foreground window, to check whether a password prompt ended up in a background window that the foreground window is waiting on.)

NirSoft WhatIsHang might help.

As a generalization, when something is very wrong, a computer may retry a process repeatedly, which may cause the system to crawl. Checking logs may help, and so may Performance Logs and Alerts. (For Microsoft Windows, further details about “Performance Logs and Alerts” may be helpful.) Microsoft KB Q927229: Win2KRK Tools for Administrative Tasks mentions Relog, software that converts these logs to one of three other formats: CSV, tab-delimited, or another binary form. The Installer for Relog may be of some use.
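Once Relog (or `typeperf`) has produced a CSV file, a short script can summarize it. The following Python sketch assumes the usual layout, in which the first column is the timestamp and each remaining column header is a counter path, and reports the minimum, average, and maximum of each counter; blank samples (often written as a single space) and trailing status lines are skipped. The function name is hypothetical.

```python
import csv
import io
import statistics


def summarize_pdh_csv(text):
    """{counter_path: {"min", "avg", "max", "samples"}} from a PDH-style CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    summary = {}
    for col, counter in enumerate(header[1:], start=1):
        values = []
        for row in data:
            if col < len(row):
                try:
                    values.append(float(row[col]))
                except ValueError:
                    pass  # blank (" ") or non-numeric sample
        if values:
            summary[counter] = {
                "min": min(values),
                "avg": statistics.mean(values),
                "max": max(values),
                "samples": len(values),
            }
    return summary
```

For example, after `relog perf.blg -f csv -o perf.csv` (the file names are placeholders), pass the contents of `perf.csv` to `summarize_pdh_csv` and look for counters whose maximum is far above their average.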

Additional software tools for checking performance
Resource Monitor

Other resources may include Resource Monitor (built into newer versions of Microsoft Windows). e.g., in Windows 7, Task Manager has a Performance tab with a “Resource Monitor...” button.

Process Explorer

See information about Process Explorer.

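Go to View, “Select Columns...”, and add some columns, perhaps from the “Process I/O” tab, a more specific tab like “Process Disk”, or the “Page Faults” check box on the “Process Memory” tab. Sorting by one of those columns may then show which process is doing the most I/O or causing the most page faults.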

More programs

Microsoft KB Q927229: Win2KRK Tools for Administrative Tasks mentions showperf.exe, “Performance Data Block Dump Utility”. “This GUI tool lets developers dump and display raw performance data as it is read from the Windows 2000 performance registry. ShowPerf reads the performance data from the registry and then displays the unformatted and unsorted output in a list.” Get at: Showperf installer/setup.

Quite often, a huge factor that people use to judge a computer's speed is how quickly programs start up, or how quickly the operating system becomes fully started. If the computer seems to be quite a bit slower than when it was initially purchased, check what programs are running. Program startup can often be sped up by freeing memory, such as by removing “Quick Start” programs and other unnecessary objects. Also, try disabling browser add-ons/toolbars/“Browser Helper Objects” (“BHOs”). Other completely unnecessary programs may also be running; the best way to find those might just be to manually review what gets started automatically.

Diagnostics-Performance logs

This guide does not currently explain how to interpret specific entries in these logs, but they contain a fair amount of information that may be useful.

In Event Viewer, under “Applications and Services Logs” (not under “Windows Logs”), under “Microsoft\Windows\Diagnostics-Performance”, there may be a log called “Operational”. (This was found in Windows 7, by noticing the “View performance details in Event log” option under Control Panel, Performance Information and Tools, Advanced Tools.)

Some performance statistics may be obtainable by studying the events in the “Diagnostics-Performance” log. Event ID 100 may be the most general summary of a system startup, and 300 discusses resuming from standby. Event IDs appear to follow a pattern: x01 (e.g. 101, 201, 301) relates to applications (being slow), x02 to drivers (being slow), and x03 to services (being slow), while 1xx relates to system startup, 2xx to shutdown, and 3xx to resuming/standby.

106 is “Background optimizations (prefetching)” taking longer to complete (affecting startup). 107 is slowdown from “Application of machine policy” (applying the policy), while 108 is a similar slowdown from “Application of user policy”. 109 is a device taking a long time to initialize (affecting system startup).

302 and 303 are related to standby (and fit the standard patterns: x02 is drivers and x03 is services). 304 is slowness creating a hibernation file. (Also related to the hibernation file is 352.) 307 and 310 are “Preparing” [something] “for sleep was slower than expected”: 307 is for Winlogon and 310 is for “system worker threads”. 350 is BIOS initialization taking longer than permitted by a program that governs permission to use a specific logo. (The time limit for that is 250 ms.) 351 is a driver being slow to resume. 352 is slowness reading the hiber-file.

500 describes “The Desktop Window Manager is experiencing heavy resource contention.” 501 may have the same basic description, but then have additional details.
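The Event ID patterns above can be captured in a small lookup. This Python sketch only restates the observations in this section (it is not an official Microsoft list); describe_event is a hypothetical name, and IDs that match neither a specific entry nor the x01/x02/x03 pattern are reported as unknown.

```python
# Observed patterns: hundreds digit = phase, last digit = kind of component.
PHASES = {1: "system startup", 2: "shutdown", 3: "standby/resume"}
KINDS = {1: "application", 2: "driver", 3: "service"}

# Individual IDs described in this section.
SPECIFIC = {
    100: "general system startup summary",
    106: "background optimizations (prefetching) slow",
    107: "application of machine policy slow",
    108: "application of user policy slow",
    109: "device slow to initialize",
    300: "resume from standby",
    304: "slow hibernation file creation",
    307: "Winlogon slow preparing for sleep",
    310: "system worker threads slow preparing for sleep",
    350: "BIOS initialization slower than the 250 ms logo-program limit",
    351: "driver slow to resume",
    352: "slow reading of the hiber-file",
    500: "Desktop Window Manager heavy resource contention",
    501: "Desktop Window Manager heavy resource contention (with details)",
}


def describe_event(event_id):
    """Return a short description of a Diagnostics-Performance event ID."""
    if event_id in SPECIFIC:
        return SPECIFIC[event_id]
    phase, kind = divmod(event_id, 100)
    if phase in PHASES and kind in KINDS:
        return f"slow {KINDS[kind]} during {PHASES[phase]}"
    return "unknown"
```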

For some information that may be specific to Microsoft Windows, or maybe even more specific to Microsoft Windows computers that are part of an Active Directory domain, “Ask the Directory Services Team: So you have a slow logon...? (Part 1)” provides some tips (as does “Ask the Directory Services Team: So you have a slow logon...? (Part 2)”).

[#devaddr]: Device Addresses

The section about Device Addresses includes details about:

Device names
Modern operating systems provide “names” for devices. These names, which often look like filenames (or paths that include something that looks like a filename, possibly looking similar to a network-based path like a UNC), are often how a lot of modern software will refer to a device. (This information may be something that technicians do come across.)
Hardware addresses
These have generally become non-concerns with modern hardware, so most technicians won't need to be interacting with these much. Examples include I/O port addresses, IRQs, and DMAs.
Currently running software
[#whatruns]: Seeing what is running
Currently running software: Seeing what is running includes information about getting details on running processes, like a current PID (process ID) assigned to a specific running instance/copy of the software.
[#adjrunsw]: Adjusting what is running
Currently running software: adjusting running software includes information about altering software that is running, such as stopping software that may be running.
Scheduling tasks

Information about scheduling tasks should be in the behind-the-scenes section.

Specific problems/messages/errors/warnings/etc.

Problematic messages

Problems logging in
Keyboard not working as expected

Make sure the “Lock” keys are set appropriately. This especially means “Caps Lock”, but also “Num Lock” (especially on shrunken keyboards as found on some laptops).

If possible, make sure typing is working as expected. Typing a username may show results more visibly than typing in a password field.

If typing isn't working as expected (or if it is not certain whether typing is working as expected), perhaps the computer thinks that a key is being held down. Pressing and releasing that key will often fix this, so try pressing each Ctrl key, each Alt key, and each Shift key, one at a time. On laptops, also press the Fn key if there is one. If there is a Start button key (which may show a Microsoft Windows logo), press each such key. Also press the Menu/“Shortcut menu”/Context/“Right-click” key.

Lost password

If the computer cannot be logged into, see the following section (about logging into a computer).

If the password is for a device, most devices have a password override option. This may involve holding down a button labelled “Reset” for some time (perhaps 5 seconds, or 30 seconds) while the device is powered on. (This is not recommended for a computer that has a button labelled “reset”.) If the device seems to keep resetting itself, this technique might not work, or the button might be getting held down for too long.

Using an Internet search engine may help. In at least one known case, a password for RAID controller software is documented to be in a file on the hard drive. Any administrator could simply rename that file. Also, another account, which is not using administrator privileges/rights, might be able to rename the file if an administrator hadn't previously used filesystem security to prevent such a change. This sort of possibility was clearly documented on the manufacturer's website, so it is well-known and probably intentional. (The general expectation/hope is that administrators will not only set a good password, but also properly secure the file.)

Logging into a computer

The following steps apply mostly to logging into an operating system, and less to user databases maintained by third-party applications.

Using the right authentication source

Make sure the authentication is going to the right computer. For logging into Microsoft Windows, this may require putting the domain name or the computer name, followed by a backslash, before the username (e.g. DOMAIN\username). For Microsoft Windows Server 2003 and XP (and perhaps earlier?), there may be a dropdown box showing a list of known accepted authentication providers. Make sure that is set appropriately.
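The AUTHORITY\username form can be illustrated with a small parser. This Python sketch is hypothetical: split_logon_name, the example names CORP, PC1, and alice, and the fallback when no authority is typed are all assumptions (what a real logon screen falls back to depends on its configuration). It also treats “.\username” as the local computer, a shorthand that many Windows versions accept.

```python
def split_logon_name(logon_name, local_computer, default_authority=None):
    """Split "AUTHORITY\\user" into (authority, user).

    local_computer: name of this computer (used for ".\\user").
    default_authority: assumed fallback when no backslash is present.
    """
    authority, sep, user = logon_name.partition("\\")
    if not sep:
        # No authority typed; fall back to the assumed default.
        return (default_authority or local_computer), logon_name
    if authority == ".":
        # ".\user" is commonly accepted as shorthand for the local computer.
        return local_computer, user
    return authority, user
```

For example, split_logon_name(r"CORP\alice", "PC1") targets the CORP domain, while split_logon_name(r".\alice", "PC1") targets the local account on PC1.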

Simple workarounds, requiring preparation but not difficulty

Know that Microsoft Windows may have some alternative methods: a password hint, or a “password reset disk”, or a well-known back door account that is left enabled on many machines. Details are in the section about user authentication.

Using known credentials for an Administrator

This step may be lossy (stopping an old password from working, possibly affecting other users/devices that do have the old password and may even be using the old password for a task like checking E-Mail). Try using an Administrator account, either locally or perhaps using a remote method to maintain user accounts, and using that empowered account to reset the password of the account that is not logging in successfully. (See the section about user authentication.)

[#lostpwd]: When it is established that no password is known
Using a known backdoor account
The section about Dealing with a lost password in XP Home mentions an Administrator account being available in Windows XP Home machines, even though it may not be visible in the default login screen.
[#comnpwds]: Using well-known passwords

Trying various passwords can sometimes pay off. Then again, even such small-scale “brute force” attempts might be enough to trigger intrusion detection, or, even less conveniently, intrusion prevention. This may actually reduce the ability to successfully log in (at least in the short term). Weigh the likelihood of such a problem before making guesses.

Here are some likely candidates for passwords that might have been intended to be entered (such as for a new or recently reset device that is using a default password). Try a blank password. If that does not work, try: password, P@ssw0rd, n3wp4ss!, P@ssw0rd!, Password, admin, 1234, letmein, iloveyou, N3wp4ss!, pw, pwd, 1236 (especially on 3x4 grids of numeric keys, where the 6 is just below the 3), the username (e.g. “admin”), the username backwards (especially if the username is “root”), or user. If the user's actual name is visible/known/obtainable, then a fully lowercase version of that name, or well-known common variations of the name (e.g. “Chuck” or “Charlie” instead of “Charles”), may frequently work. For any of those that are longer than 8 characters, try using just the first eight characters of the password. There are many others, but the above are some of the most well-known, and shared, passwords amongst IT staff or, in some cases, the general population.

These are passwords which may appear in quite a bit of documentation, class instruction, and/or default passwords. The password P@ssw0rd is rather popular because its length is sufficient to meet some requirements (including a length of at least eight characters), and it includes all four categories of common characters: uppercase letter(s), lowercase letter(s), number(s), and punctuation mark(s). (Also, the punctuation mark used, the at sign, doesn't tend to alter command lines (in MS-DOS, Unix, or successors/derivatives) as much as many other unescaped punctuation marks.)
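The four character categories named above can be checked mechanically. This Python sketch counts which categories a password uses and applies a hypothetical policy (at least eight characters and all four categories); real systems' policies vary, so both thresholds are assumptions.

```python
import string


def character_categories(password):
    """Return the set of common character categories present in password."""
    found = set()
    for ch in password:
        if ch in string.ascii_uppercase:
            found.add("uppercase")
        elif ch in string.ascii_lowercase:
            found.add("lowercase")
        elif ch in string.digits:
            found.add("number")
        elif ch in string.punctuation:
            found.add("punctuation")
    return found


def meets_typical_policy(password, min_length=8, min_categories=4):
    # Hypothetical policy; the default thresholds are assumptions.
    return (len(password) >= min_length
            and len(character_categories(password)) >= min_categories)
```

With these defaults, P@ssw0rd passes (eight characters, all four categories), which is part of why it shows up so often.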

For usernames, try Administrator, root, admin, Owner, user, or username.

For devices other than computers, know that some devices have well-known default credentials, such as American Comcast cable modems that may use a username of cusadmin and a password of highspeed. Don't hesitate to use a search engine to look up the device's model name along with the phrase “default password”.

If the credentials are known to be entirely numeric (because this is for a device that does not have an alphabetic keyboard), the most popular passwords are probably going to be 1234, 123, or 12345. Other popular ones may be 1236 or 7896, due both to the placement of these numbers on many numeric pads and to their length often being sufficient to meet minimum requirements imposed on such devices.

For websites, a very common password is simply the base name of the website (typically the domain name without its TLD, e.g. “example” for example.com), or perhaps the topic or other reason for going to the website (e.g. email, work).

There are various other wordlists, some much longer (20 passwords, or perhaps even hundreds), available on the Internet. However, working through such lists often amounts to truly using “brute force”, rather than hoping to stumble across something that was (possibly intentionally) left super-vulnerable. In many (legitimate) cases, taking more aggressive steps to change the password (without using the credentials that the user had been using) may be less time consuming and/or challenging. Trying to take more aggressive steps to “crack” a password may tip-toe along, or outright cross, the border into black-hat attacking techniques that are generally beyond the scope of this (more introductory) troubleshooting text.

[#chchospw]: More challenging steps to change a lost password for logging into an operating system

This is currently partially redundant with userauth.htm's section on changing passwords.

section about user authentication

If all else fails, it may be useful to start the computer in a way that provides access to it without requiring that the user can log in. This may involve booting into a “safe mode” or a “single user” mode. If that is not possible, booting off of different bootable media may be helpful.

Using the same operating system as the one that is not allowing a login may seem most convenient, and the process may be most familiar, but it is not necessarily required. A different operating system might work just fine, or might cause some nasty incompatibilities. If incompatibilities (like different versions of filesystem drivers) do cause trouble, the problems could be quite difficult to handle. This might apply mainly to NTFS versions, and problems might occur when using some older Linux drivers. A guide on Petri.co.il says, “if you lost your password on NT - install a new instance of NT, not Windows 2000, as doing so will ruin your old NT installation (because of the difference between the NTFS versions). Same goes for W2K, XP and Windows Server 2003. Always install the same OS.”

If there is a desire to try using a different operating system, note that some requirements will need to be met. The operating system needs to be able to write to the filesystem volume that holds the user database. It also needs to support the data storage being used, including any complex storage configurations (such as RAID that requires special software support; RAID implemented purely in hardware might be completely transparent and not require special software support). Finally, it needs to come with, or be able to easily obtain and install, software that can modify the user database file(s).

Note that the above warnings are not meant to completely discourage the user from using a solution that involves a different operating system. Using a Linux-based solution to reset a Microsoft Windows Active Directory domain Administrator password has been a fairly common method of resolving that need.

Examples of how to get into a “safe”/“single user” mode:

Microsoft Windows Safe Mode
Try holding, or perhaps rapidly pressing, F8 and/or F5 and/or Ctrl when the computer is restarting.
OpenBSD

Reboot. If booting from the hard drive, type boot -s at the “ boot> ” prompt. Booting from a CD may also work, but it might be much nicer to boot the kernel from the hard drive if possible, for instance by typing boot hd0a:/bsd -s.

This whole process is documented quite nicely by OpenBSD FAQ 8: Handling a forgotten root password (FAQ 8.1).

On Unix machines, this may commonly be doable by interacting with the boot manager to send a parameter to the kernel. Otherwise, this is generally an option as long as the user has some sort of bootable removable media.

Something to keep in mind: make sure that the password being reset is for an account that has superuser/Administrator/root privileges. (In Unix, the account named “root” generally qualifies. In Microsoft Windows, both for local accounts and for Active Directory, the account named Administrator might be the most common account that fits that description.)

Once booted, the data on the disk holding the user database will need to be fully accessible (readable and writable). In Unix, booting a machine in some sort of crippled (recovery/“single user”) mode may leave the disk initially not writable (especially if the machine was shut down without going through the proper safe shutdown process), so this will probably need to be dealt with. For details on making the main (root) filesystem writable, and on how to proceed further with the password change, see: changing the credentials for simple logins: Username and basic passphrase.

Restoring Data

Another way may be to restore, from backup, older versions of whatever files hold the user data. With Microsoft Windows Active Directory, this may mean restoring the “System State”. Note that this may incur some data loss, and so may not be a very good approach. In the case of a simple user database, this could undo recent changes to users (such as newly created users, a user account that had a password changed, or other recently changed properties of a user account). In the case of a Microsoft Windows Active Directory System State, even more data may end up getting reverted, so alternative options might be better to pursue. If the system is running, some software might, at least in theory, allow pre-authorized remote software to initiate restoration of backed up data. In other cases, restoring the data might require the ability to run a desired program, and so this approach might not be significantly less complicated than getting a password reset performed.

Other approaches

Rather than changing the password, if one has the ability to write to the hard drive, one can try replacing a file that the computer will be automatically executing. As an example of this approach, a guide on Petri.co.il describes replacing LOGON.SCR (by first renaming the existing file, so that it may be easily restored later) with a copy of CMD.EXE in Windows NT and in certain versions (service-pack levels) of Windows 2000. (This hasn't been tested by the creator of this text, but is simply being passed on as an alternative that reasonably sounds like it would functionally work.) Then the screensaver functionality will execute the LOGON.SCR file. At least in these older operating systems, permissions may then be suitable for using a command line technique to change a password (as described in the section about user authentication).

Events
The Events page discusses some specific incidents that have occurred. They describe how some very significant problems were handled.
Understanding the screen
ID Responsibility for pixels

Wondering what program is drawing the pixels that you're looking at? Here are some techniques that may help.

Microsoft Windows

Usually, in Microsoft Windows, the active foreground application will be, in some way, highlighted on the task bar. (Granted, in Windows 7, that can be a bit hard to notice, but the box around the foreground application is shown.) However, sometimes something might show up on the screen that is not from the program identified as the registered foreground application.

Process Hacker window identification

Some software called “Process Hacker” can help with this. This technique does seem to require using a rodent. Next to the easier-to-notice “Find Window and Kill” icon (which is a red X), there is the “Find Window and Thread” button. Place the rodent over that button, and then hold down the primary rodent button (often referred to as the “left mouse button”). This will cause the “Process Hacker” window to be placed in the background. With the mouse button still held, hover over the window in question, and then release the button; Process Hacker should then show the process and thread that own that window.

Unfortunately, this does not help to identify which program is responsible for a part of the task bar, including the “system tray”/“message notification area”.

Process Explorer window identification

Sysinternals Process Explorer has an icon for this. This “Find Window's Process (drag over window)” icon looks like a circle with a cross (perhaps meant to suggest a sniper's scope), and is to the right of the “Find Handle or DLL (Ctrl-F)” icon, which looks like a set of binoculars.

Finding a process in X Windows

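I believe... there is a program similar to xkill but less destructive. Perhaps xwininfo, which reports details about a window that gets clicked on? (xdpyinfo reports on the display as a whole, rather than on one window.) (More research needed...)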
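Another tool that may help is xprop, a standard X utility that lets the user click a window and then prints that window's properties. Asking it for _NET_WM_PID typically prints a line such as _NET_WM_PID(CARDINAL) = 1234, but only if the program set that property, and the PID is only meaningful for programs running on the same machine as the display. This Python sketch parses that output; the function names are hypothetical, and pick_window_pid assumes xprop is installed and an X display is available.

```python
import re
import subprocess

# Matches lines such as "_NET_WM_PID(CARDINAL) = 1234".
_PID_LINE = re.compile(r"^_NET_WM_PID\(CARDINAL\)\s*=\s*(\d+)\s*$", re.MULTILINE)


def pid_from_xprop(output):
    """Return the PID found in `xprop _NET_WM_PID` output, or None."""
    match = _PID_LINE.search(output)
    return int(match.group(1)) if match else None


def pick_window_pid():
    # xprop waits for a click on a window, then prints the requested property.
    result = subprocess.run(["xprop", "_NET_WM_PID"],
                            capture_output=True, text=True, check=False)
    return pid_from_xprop(result.stdout)
```

A returned PID can then be looked up with ordinary process tools (such as ps) before deciding whether anything needs to be stopped.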

Warnings about running files
Microsoft Windows
User Account Control

In Windows Vista and newer, a feature named “User Account Control” may be used. See: user account control.

Attachment Manager

Attachment Manager Zone File Checking

Starting with Windows XP SP2 and Windows Server 2003 SP1, an “Open File - Security Warning” dialog box may show up. MS KB 883260 - Attachment Manager

Disabling warnings

One way is to have the file be signed. A software developer can do this. See: home page for Microsoft's File Signing Tool (Signcode.exe) for *.exe and *.DLL files. Perhaps see also: MS KB 247257: Steps for signing a .cab file

Or, the feature can be disabled. (See the next section about disabling features.)

Disabling feature

Note: obviously reducing this security check has some potential of reducing security. This guide is not recommending this process, but simply showing how to do it.

Local files
Single file/run fix

Phil's comment to Blorgbeard's SuperUser.com question on “Open File - Security Warning” speculates that this can be caused by data in one of the file's alternate data streams (“ADS”) on an NTFS drive. That might be right; MS KB 883260 - Attachment Manager does make it clear that NTFS is needed for this feature (and that FAT32 does not support the feature). Removing that data can make the file appear to be from a local source, instead of downloaded. To do that, he suggests:

move filename.exe > tempfile
move tempfile > filename.exe

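(Hmm... does that really work? MOVE does not use > like that, and moving a file within the same NTFS volume is just a rename, which keeps the file's streams. The following seems like it would work better...)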

type filename.exe > tempfile
del filename.exe
ren tempfile filename.exe
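For the NTFS stream mentioned above, Windows commonly stores the zone in a small INI-style stream named Zone.Identifier, containing a [ZoneTransfer] section with a ZoneId value (3 is the Internet zone). The following Python sketch, which assumes that layout, interprets the stream's text; read_zone_identifier assumes Windows and an NTFS volume (where "filename.exe:Zone.Identifier" names the stream), and both function names are hypothetical.

```python
import configparser

# Standard Internet Explorer security zone numbers.
ZONE_NAMES = {
    0: "Local machine",
    1: "Local intranet",
    2: "Trusted sites",
    3: "Internet",
    4: "Restricted sites",
}


def zone_from_identifier(stream_text):
    """Return (zone_id, zone_name) parsed from Zone.Identifier text."""
    # interpolation=None: other lines (such as URLs) may contain "%".
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(stream_text)
    zone_id = parser.getint("ZoneTransfer", "ZoneId")
    return zone_id, ZONE_NAMES.get(zone_id, "unknown")


def read_zone_identifier(path):
    # Windows/NTFS only: the stream is opened as "<file>:Zone.Identifier".
    with open(path + ":Zone.Identifier", encoding="utf-8-sig") as stream:
        return zone_from_identifier(stream.read())
```

If read_zone_identifier raises FileNotFoundError after the TYPE/DEL/REN steps, the stream is gone, which is a quick way to confirm those steps worked.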

Another way, via the GUI, may be to check the file's properties. On the “General” tab, below the file attributes, may be a “Security:” field that says “This file came from another computer and might be blocked to help protect this computer.”, along with an “Unblock” button. (MSDN “We know IE!” blog : How to bypass the security warning "Unknown Publisher" with the checkbox "Always Ask Before Opening this File" shows an example: Sample properties from downloaded copy of PSPad executable.) Going through that process is likely *more* work than just getting the dialog box once, but it only needs to be done once, so if an executable file is being run multiple times, this approach may help by not needing to experience the dialog box each time. Note that this option might not even be shown: MS KB 883260 - Attachment Manager documents HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments\HideZoneInfoOnProperties, which can hide it (and which can be set in Group Policy under “User Configuration\Administrative Templates\Windows Components\Attachment Manager”).

To just affect a file for a single execution:

SET SEE_MASK_NOZONECHECKS=1
filename.exe
SET SEE_MASK_NOZONECHECKS=

(This is essentially the process taken by MS KB 889815, which uses VBScript.)
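The SET / run / SET pair above can be mirrored from Python, as a sketch only: the variable is set in the current process just around a call to os.startfile (which exists only on Windows and opens the file through the shell, where the Attachment Manager check is assumed to read this variable), and the previous value is restored afterwards. The function names are hypothetical.

```python
import contextlib
import os


@contextlib.contextmanager
def zone_checks_suppressed():
    """Temporarily set SEE_MASK_NOZONECHECKS=1 in this process's environment."""
    previous = os.environ.get("SEE_MASK_NOZONECHECKS")
    os.environ["SEE_MASK_NOZONECHECKS"] = "1"
    try:
        yield
    finally:
        # Restore whatever was there before (or remove it if it was unset).
        if previous is None:
            os.environ.pop("SEE_MASK_NOZONECHECKS", None)
        else:
            os.environ["SEE_MASK_NOZONECHECKS"] = previous


def open_without_zone_check(path):
    # os.startfile is Windows-only and goes through the shell.
    with zone_checks_suppressed():
        os.startfile(path)
```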

Feature disabling

Note: A lot of this is currently untested, and based on documentation. Test this before fully trusting it...

Environment

Note: This documentation has not been fully tested. Per-user environment variables are stored under HKCU\Environment (where SETX writes by default), while system-wide environment variables are stored under "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" (where SETX /M writes); a "HKCU\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" key is not where Windows looks for them.

via command line

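This may need to be UAC-elevated (see: user account control), since the /M switch changes the system-wide setting. Use SETX, as recommended by Seven Forums: “Windows 7: Open File Security Warning - Enable or Disable” : Comment #5 (by Cavaldi). The REG QUERY lines then show whether a per-user value and a system-wide value exist.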

SETX SEE_MASK_NOZONECHECKS 1 /M
REG QUERY "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /V SEE_MASK_NOZONECHECKS

via registry
REG QUERY "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /V SEE_MASK_NOZONECHECKS

To change the setting for one user:

REG ADD "HKCU\Environment" /V SEE_MASK_NOZONECHECKS /T REG_SZ /D 1

or

SETX SEE_MASK_NOZONECHECKS 1
REG QUERY "HKCU\Environment" /V SEE_MASK_NOZONECHECKS

To instead rely on a setting that affects all users on the system (but which could be overridden by the per-user setting above), use this:

REG QUERY "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
REG DELETE "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
REG ADD "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /V SEE_MASK_NOZONECHECKS /T REG_SZ /D 1

or

REG QUERY "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
REG DELETE "HKCU\Environment" /V SEE_MASK_NOZONECHECKS
SETX SEE_MASK_NOZONECHECKS 1 /M
REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /V SEE_MASK_NOZONECHECKS
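To check which of these settings is actually in effect, the per-user value (under HKCU\Environment, where SETX without /M writes) and the system-wide value (where SETX /M writes) can be compared. This Python sketch uses the standard winreg module, so read_env_value only works on Windows; it assumes that a per-user variable overrides a system-wide variable of the same name (true for most variables, though Path is a notable exception because the two are combined). The function names are hypothetical.

```python
# Registry locations Windows uses for environment variables.
ENV_LOCATIONS = {
    "user": ("HKEY_CURRENT_USER", r"Environment"),
    "system": ("HKEY_LOCAL_MACHINE",
               r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"),
}


def read_env_value(scope, name):
    """Return the stored value of `name` for "user" or "system", or None."""
    import winreg  # Windows-only standard module
    hive_name, subkey = ENV_LOCATIONS[scope]
    try:
        with winreg.OpenKey(getattr(winreg, hive_name), subkey) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:  # key or value not present
        return None


def effective_value(user_value, system_value):
    # A per-user variable takes precedence over a system-wide one.
    return user_value if user_value is not None else system_value


# Usage on Windows:
# effective_value(read_env_value("user", "SEE_MASK_NOZONECHECKS"),
#                 read_env_value("system", "SEE_MASK_NOZONECHECKS"))
```

Programs that were already running (including an open command prompt) generally keep their old environment, so a new session may be needed before a change shows up.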
via VBScript

MS KB 889815 has an example.

Per file type via registry

Another approach is to affect handling by file type. To undo this approach (removing these policy values, so that the default behavior returns):

REG DELETE HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments /V SaveZoneInformation
REG DELETE HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Associations /V LowRiskFileTypes
REG DELETE HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments /V SaveZoneInformation
REG DELETE HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Associations /V LowRiskFileTypes

To apply this approach (suppressing the warning) for one user, start with...

REG ADD HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments /V SaveZoneInformation /T REG_DWORD /D 1

... and also, use this...

(Following is untested; based on info that was found online, but syntax is unverified.)

REG ADD HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\Associations /V LowRiskFileTypes /T REG_SZ /D ".exe;.vbs;.msi;"

(This example set of extensions came from MSDN “We know IE!” blog : How to bypass the security warning "Unknown Publisher" with the checkbox "Always Ask Before Opening this File". For other examples/references, you may see: Comment by Oleksuy on a SuperUser.com question, and forum post about Open File Security Warning.)

To apply this approach for all users (though the per-user registry entries above can override it), use:

REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Attachments /V SaveZoneInformation /T REG_DWORD /D 1

... and also...

(Following is untested; based on info that was found online, but syntax is unverified.)

REG ADD HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\Associations /V LowRiskFileTypes /T REG_SZ /D ".avi;.bat;.com;.cmd;.exe;.htm;.html;.lnk;.mpg;.mpeg;.mov;.mp3;.msi;.m3u;.rar;.reg;.txt;.vbs;.wav;.zip;"

(This example set of extensions came from Seven Forums: Windows 7: Open File Security Warning - Enable or Disable.)
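The LowRiskFileTypes value is a single semicolon-separated string of extensions, each with a leading dot. This Python sketch splits such a string and checks a filename against it, assuming (as a simplification) that Windows compares only the final extension and ignores case; parse_low_risk_types and is_low_risk are hypothetical names.

```python
import os


def parse_low_risk_types(value):
    """Split a LowRiskFileTypes string into a set of lowercase extensions."""
    return {ext.strip().lower() for ext in value.split(";") if ext.strip()}


def is_low_risk(filename, value):
    # Compare only the final extension (including its dot), ignoring case.
    ext = os.path.splitext(filename)[1].lower()
    return bool(ext) and ext in parse_low_risk_types(value)
```

For example, with the per-user value ".exe;.vbs;.msi;" a file named Setup.EXE would be treated as low risk, while notes.txt would not.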

Remote files

Files opened from an SMB/CIFS network share may trigger a prompt for an additional reason: the security zone that the share is considered to be in.

Note: It is currently speculated that the information about local files may also be applicable/helpful.

Under Internet Options (via Control Panel, or Microsoft Internet Explorer's “Tools” menu, “Internet Options” menu item), find: Security\Local Intranet\Sites\Advanced.

Choose “Advanced”, and try adding a network drive (e.g. “R:\” or \\smbserver\smbshare ). (Note: This affects just network shares; not removable drives.)

Or, from Internet Options, go back just one screen (to Security\Local Intranet\Sites, without pressing the “Advanced” button). Unchecking the “Include all network paths (UNCs)” check box may be helpful.

Group Policy

Under “User Configuration\Administrative Templates\Windows Components\Attachment Manager”, look for the settings “Do not preserve zone information in file attachments”, “Inclusion list for low file types”, and “Default risk level for file attachments”. (See: Seven Forums: Windows 7: Open File Security Warning - Enable or Disable, MSDN “We know IE!” blog : How to bypass the security warning "Unknown Publisher" with the checkbox "Always Ask Before Opening this File".)

SmartScreen

The term “SmartScreen” seems to refer to some protective technology by Microsoft, and is used by:

Windows SmartScreen

Windows 8

This may be altered by changing a registry entry.

Go to Control Panel. Get to “Action Center”, which may require choosing the “System and Security” section. In the left frame, choose “Change Windows SmartScreen settings” (which requires UAC permissions if UAC is enabled).

More info: Eight Forums: Windows SmartScreen.

SmartScreen Filter (from Internet Explorer)

Intelligent Message Filter

May be related to Microsoft Exchange: Microsoft Exchange IMF. Microsoft Antispam technologies has been quoted to say, “Based on SmartScreen technology, Exchange Server 2003 IMF provides” [protection].

Other perhaps/misc/related:

"HKCU\Software\Microsoft\Internet Explorer\Download\CheckExeSignatures"
data: "no"

"HKCU\Software\Microsoft\Internet Explorer\Download\RunInvalidSignatures"
data: 1

Under:
HKLM\SOFTWARE\Microsoft\Internet Explorer\AdvancedOptions\CRYPTO\
the CHECK_SIG\ may have checkexesignatures
and RUN_INV_SIG\ may have runinvalidsignatures