Troubleshooting Guide

Note: This guide started out as a few points jotted down, and then expanded. This guide should be reviewed later, and may be updated rather substantially.

New technicians often desire to identify some additional steps that may be taken for troubleshooting problems. This guide is mostly designed around some steps that can be taken to try to figure out the cause of a problem.

Knowing how to fix the problem might be an entirely different story, and might require additional knowledge and research. This guide doesn't necessarily provide all of the details for fixing problems. However, fully understanding a problem plays an important role in troubleshooting. Many problems are simply challenging to identify. Once the cause of a problem is identified, at least on a basic level, that success can simplify the task of getting more details about how to properly fix the problem.

Winning the race isn't quite as important as completing the race in a fairly impressive (quick) fashion, while not having any problems (like injuries in a race, or lost data when troubleshooting computers). Knowing how to proceed is immensely useful. Experience will help a person to have a better feel for which approaches are most likely to produce successful results. Good technicians will constantly be evaluating such ideas, and will think about whether a resolution is more likely to be reached by changing a course of action. In some cases, some benefit may be achieved by completing a course of action which is unlikely to produce success, because abandoning the attempt partway may mean that it later needs to be restarted (often from scratch), which can be even more costly in time.

Know that even some of the most skilled troubleshooters available may have different initial hunches, and sometimes one expert may be right sooner while another expert may be quicker at coming up with the right solution a different time. Coming up with answers quickly is a bit impressive. However, simply coming up with the correct answers consistently is much more valued.

General troubleshooting process
The first steps

The following are the first steps to take:

Determine the actual issue

One way to waste a lot of resources, including at least one person's time, is to fix a perceived “problem” that is not affecting anybody.

The primary reason that this must be done early on is so that resources do not get spent on attempts to try to take care of a different and less important issue. Such a problem is particularly bad if time is wasted on an issue that does not need to have time spent on it. Usually the most important issues are the ones that are actively bothering people (with “security” perhaps being the key exception, because that may be even more important). So, interact with people enough to really figure out just what is actually making people less happy.

Without accurately knowing what the perceived problem is, there may be no way to reliably know when an issue is resolved. After making an adjustment, a technician should wrap up the troubleshooting effort by making sure that impacted people know the problem is resolved. However, if somebody is struggling with a different issue, and that issue is not resolved, then the technician may look quite unglamorous.

The right people

Consider who should be troubleshooting an issue. In many cases, the answer will be the same person who has been alerted to the issue. However, some situations may be treated in special ways by specialized staff members. If another person should be handling an issue, determine whether that person is available to be working on the issue, and determine what should be done. Even if a technician isn't going to personally work on an issue, alerting a person about a problem that they don't know about may be extremely helpful (or quite annoying, depending on factors like just what the issue is).

In some cases, certain issues may be considered to be more “sensitive”/“confidential”, and there may be certain specialized staff who are assigned to the task of handling those issues. When a technician is determining what an issue is, that technician should keep in mind whether the issue should be getting resolved by that technician. If the technician figures out that a specific computer is related to the problem, and knows that another staff member resolves all problems related to that specific computer, then fully figuring out the details of the problem might be inappropriate. That would be because fully understanding the problem might require obtaining information that should not be obtained. On the other hand, a technician might be able to gather some details that are not quite so sensitive, and those details may be helpful for the technician who does end up resolving the issue later. A technician should use wisdom/discernment to properly determine just how much information to gather.

In a team-based approach, consider whether other members of a team are likely to be working on an issue at the same time. For example, check if there are any other tickets that seem like they may be related (like they involve the same server, or perhaps the same customer). If there are potentially related issues, see if anyone else is working on those issues. If not, go ahead and start working on those issues. The first thing to do will be to make a note that other people will be likely to see if they decide to start working on the same issue or a related issue, so that they know that work is being actively done. (This might be done by posting a note on an electronic board, sending a message to a chat channel, or using software to change the status of a ticket to a state that indicates that active work is being done.) Make sure that people can also tell who is actively working on the ticket (which might be fairly easy, if logs indicate who either wrote a message or caused some other type of update).

Note: this guide recommends that the first two steps are to figure out what the problem is, and to figure out who should take care of the problem. There are some indications that CompTIA expects troubleshooting staff to start by fully identifying the problem. (If a test/exam by CompTIA asks about what step must be performed first, that is the official answer that is being sought.) In most common cases, that is going to be the correct first step. If CompTIA was just trying to focus on the common cases, that would not be surprising. There may be some exceptions, which is why a second step is also mentioned by this guide.

Errors

Check any error messages that show up on the screen. (Programs that are not being interacted with are less likely to show error messages on the screen, so this step may be less useful for them.)

If at all possible, try to make sure that all potentially relevant information from an error message is captured. For short messages, that may be done by simply typing the information. Make sure to double-check the accuracy of any typed information where unnoticed mistakes are easy to make, like numbers. For longer messages, a less time-consuming method may be to take a screenshot of the error message. (Taking a screenshot of the entire screen could also work.) Using a problematic computer to take a screenshot might not be a good idea if there is a suspicion that saving the data to the hard drive may cause further problems. In many cases, though, it is a good alternative.

Additional resource(s) related to error messages may include:

Check logs

If the program creates logged messages, then check the log(s) that the program uses to record messages.

If those logs do not reveal problems, check some more generalized logs, such as logs used by the operating system. Such logs might indicate problems with hardware, or any other task that the operating system handles, such as denying permissions.

Checking other logs, like authorization logs, may provide details about why something is not working. (If a user's password is not being accepted, the authorization logs might indicate that.)

For more information about logs, see:

One thing to consider looking for is if an activity keeps repeating at a certain time, like 2pm every other Friday. If so, then the consistency in time is very often related to something else that consistently happens. For example, if an error in a log file keeps occurring at a similar time, check whether there is any sort of activity scheduled to occur at the same time. Modern systems will often have scheduled activities be started by a centralized scheduling method (cron in Unix, at in Unix or Microsoft Windows, or Task Scheduler in Microsoft Windows), or by a running service. There are, of course, other possibilities, like a person who sits down at a particular desk at the same time each day, or a person who unplugs a mysterious device from the wall every other Thursday (perhaps so the person can heat up some food, after which time the person dutifully plugs back whatever was unplugged). If the software on a computer doesn't cause the problem, other possibilities include activities that are performed by other computers, or actions that people routinely do.
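As a rough illustration of spotting that kind of repetition, the following Python sketch counts error timestamps by day of the week and hour. The timestamps are made-up sample data, and the function name and the threshold of three occurrences are assumptions chosen only for this example; real logs would first need their timestamps parsed.

```python
from collections import Counter
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def recurring_slots(timestamps, min_count=3):
    """Return ((day name, hour), count) pairs seen at least min_count times."""
    counts = Counter((DAY_NAMES[ts.weekday()], ts.hour) for ts in timestamps)
    return [(slot, n) for slot, n in counts.most_common() if n >= min_count]

# Hypothetical error times taken from a log file (not real data).
errors = [
    datetime(2024, 3, 1, 14, 2),    # a Friday, shortly after 2pm
    datetime(2024, 3, 15, 14, 5),   # two weeks later
    datetime(2024, 3, 29, 14, 1),   # two weeks later again
    datetime(2024, 3, 20, 9, 30),   # an unrelated one-off error
]

print(recurring_slots(errors))
```

Any slot that is reported is a hint to check cron, at, Task Scheduler, or people's routines for something that happens at that time.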

Communications

Attempt to verify whether necessary communications are working. Network troubleshooting may be needed.

Following are some notes that were made relatively quickly. There is another available guide, IP route check traffic, which might be largely redundant with this guide; content should be getting merged.

Some people prefer top-down, while others prefer bottom-up. Which works best can vary. If the problem is with a remote system, a quick “ ping 8.8.8.8 ” can quickly establish working network connectivity on the local subnet, and allow many troubleshooting steps to be skipped. This is often done by experienced technicians, but that test may end up providing very little useful information if the problem is on the local subnet. In that case, information might be gathered most quickly by starting with gathering network settings, and then trying to ping a device on the local subnet.

Experienced troubleshooters know about both approaches, and typically just try to make an educated guess about which approach may save the most time. The success ratio of that type of guess may be increased by using good listening skills, asking intelligent questions (like whether any co-workers also seem to be affected), and analyzing the sparse information that is initially available.

Attempt to verify that all necessary software is actively running, and that necessary hardware is functional (including being powered on).

If communications are not working, try to figure out why. One of the most basic tests is to verify what communications do work, and what communications do not work.

Using ICMP(v6)

The most common communications protocol used for testing network communications will typically be ICMP(v6). These protocols (ICMPv6 and ICMP/IPv4) can generally be used by running a command called ping, or perhaps a command called ping6 on some computers. This tool may be described more by the section describing ping.

Also, ICMP(v6) is used by the program called “TraceRoute”, which may be used by running a command called traceroute on some computers, and tracert on other systems. (The abbreviated name tracert is well known from MS-DOS systems where commands did not exceed eight characters in length. Newer Microsoft Windows systems may still use the same shorter command name, since continuing to use the same name provides compatibility and lets people rely on familiarity.)

Other tools may also be useful. See TraceRoute (and similar), which also discusses the mtr program and the PATHPING command.

Misleading results

Realize that some firewalls (including dedicated firewall devices, and “host-based” firewalling software) may prevent responses from ICMP(v6) packets. This practice is probably more common with ICMP/IPv4, rather than ICMPv6. When such ICMP(v6) packets are lost, that does not necessarily mean other functionality is also broken. For instance, an E-Mail server might respond to SMTP traffic, but not ICMP(v6) traffic. So, do not automatically assume that there is a problem just because ICMP(v6) failed.

Likewise, working ICMP(v6) does not guarantee that other communications work. If ICMP(v6) works, but another protocol fails, then programs may be experiencing problems.

Despite the potential for misleading results, the ICMP(v6) protocols can be quite useful. These tests can frequently confirm that some types of communications do work, or confirm that other types of communications don't work. For instance, if ICMP(v6) communication to a router on a remote subnet is working, that strongly indicates that all of the network infrastructure (including any wires or antennas) needed to reach that router is functioning. Chances are that the problem is related to a system on the remote subnet. On the other hand, if ICMP(v6) communication fails with other devices on the local subnet, and if such ICMP(v6) communication probably worked earlier, then there may be an issue with a device on the local subnet. An issue with a server on a remote network seems like a less probable problem.

The ICMP(v6)-based tests may produce clear results that have less clear meanings. However, in many cases, these ICMP(v6) tests can help by quickly changing the likelihood of certain types of problems. Strongly raising or lowering the probability of certain problems can make effective troubleshooting go much more quickly.

The other great advantage is that software using ICMP(v6) is frequently available. The “server” code for these protocols is generally built into the TCP/IP stacks, which means that this functionality is typically enabled without requiring any effort except for making sure that firewall software doesn't prevent this from working. Even dedicated network infrastructure devices, like hardware-based firewalls and WAPs, may often provide a graphical configuration option (often using HTTP) which provides the ability to initiate ICMP(v6) connections and report results. Very old computers capable of TCP/IPv4 typically included support for basic ping and TraceRoute commands. This includes systems that ran Unix, BSD (which was made to be similar to Unix), and versions of Microsoft Windows released in the late 1990s (and newer). Many older operating systems did not support any version of TCP/IP in a default installation, but support for TCP/IPv4 could be added. When adding such support, basic “ping” and “TraceRoute” software was typically installed.

Things to remember

Do not make more changes than what is documented or memorized. Try very hard to not over-rely on memorization. The easy solution to this is to document things. Documentation can help to reverse actions that caused additional problems, and documentation can help to communicate what approaches have already been attempted. This may even mean communicating those details to the person who has been working on a problem. If no progress is being made and a person is feeling rather stuck, then reading about earlier findings may help to suggest a new course of action.

Problems can occur because of a problem with the sender, or a problem with the receiver. Problems can occur because of issues in between the computers (like cabling problems, or interference affecting airwaves). Problems can occur with outgoing traffic, or incoming traffic. This is not just referring to the direction of the original packet: reply packets may be handled differently, and they might not reach their destination.

The most likely cause of a problem can often be narrowed down quickly by considering what is affected. If just one computer is having problems, but other computers are capable of performing a task, then knowing that fact reduces the likelihood of a problem with centralized infrastructure, such as a server that is used. Brand new equipment, which might not have been heavily tested yet, is highly suspect. Even if the parts did not physically malfunction, some sort of incorrect setting may be causing a problem.

Remain flexible. Just because something worked once doesn't mean that it still works. If seemingly impossible conclusions are being drawn, consider quickly performing some tests of prior results to verify whether identical results still occur.

Steps to check:
IP addressing information

Make sure that the system has an IP address that makes sense. A simple typo in an IP address could break things substantially.

Also check that the default gateway is correct.

While at it, check that the system is using a correct value for the subnet size. The setting for the subnet size may be called a CIDR “prefix length”. The setting has also been called a “subnet mask” for TCP/IPv4. In fact, for TCP/IPv4, that was the most common terminology.
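To show how the two names for the subnet size relate, here is a small Python sketch that uses the standard ipaddress module to turn an IPv4 address and “subnet mask” into a CIDR prefix length. The function name and the sample addresses (from the documentation range 192.0.2.0/24) are illustrative assumptions, and the “usable host” check assumes an ordinary subnet rather than a /31 or /32.

```python
import ipaddress

def describe_interface(address, netmask):
    """Summarize an IPv4 address and subnet mask in CIDR terms."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    network = iface.network
    reserved = (network.network_address, network.broadcast_address)
    return {
        "prefix_length": network.prefixlen,
        "network": str(network),
        # The network and broadcast addresses should not be given to a host.
        "is_usable_host": iface.ip not in reserved,
    }

print(describe_interface("192.0.2.10", "255.255.255.0"))
```

A mistyped mask that is not a valid subnet mask makes ip_interface raise a ValueError, which is itself a useful finding.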

Check communication with the subnet

Check communication with the device on the local subnet that is used for communication. If the desired destination is not on the local subnet, then all traffic will need to be routed, typically through a device called the “default gateway”. Figure out the appropriate “default gateway”. (There may be one “default gateway” for IPv6, and one for IPv4.) That “default gateway” must be on the same subnet. So, there should be some device on the same subnet to communicate with.
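The requirement that the default gateway be on the same subnet can be checked on paper, or with a short Python sketch like the one below. The function name and the addresses are assumptions for illustration. One extra assumption is labeled in the code: an IPv6 link-local gateway address (fe80::/10) is treated as reachable on the local link, since IPv6 routers are commonly announced that way.

```python
import ipaddress

def gateway_is_on_link(interface_cidr, gateway):
    """Return True if the gateway address lies on the interface's own subnet."""
    iface = ipaddress.ip_interface(interface_cidr)
    gw = ipaddress.ip_address(gateway)
    if gw.version != iface.version:
        return False  # an IPv4 gateway cannot serve an IPv6 address, or vice versa
    if gw.version == 6 and gw.is_link_local:
        return True   # assumption: link-local gateways are on the local link
    return gw in iface.network

print(gateway_is_on_link("192.0.2.10/24", "192.0.2.1"))     # same subnet
print(gateway_is_on_link("192.0.2.10/24", "198.51.100.1"))  # different subnet
```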

If communication with the default gateway works, that generally reduces the likelihood of some possibilities (and might even entirely rule out some other possibilities). Network cabling is likely good enough for at least some communications. An incorrect subnet mask might be possible, but a vastly incorrect subnet mask is less probable.

If communication on a subnet fails
Check link state

For details, see: media sense. See also: network “link light”.

Neighbor cache

If media state doesn't appear to be an issue, then check if the remote system's physical address (e.g. MAC-48 address) is being detected. This will indicate whether ARP or NDP requests work, which can sometimes be quite useful in substantially narrowing down the troubleshooting.

This guide uses the term “neighbor cache” to refer to both an ARP cache when troubleshooting IPv4, and an NDP neighbor cache when troubleshooting IPv6.

Checking ARP cache for IPv4 communications

For IPv4, this involves running “ arp -a ” (which works for both Microsoft Windows and also root users in Unix).

In Unix, the arp command might not be in the path of non-root users. Note that checking the ARP cache does not likely require root access (although other functionality with the arp command might require root access). The issue is simply that the command might not be in the default path of non-root users. In that case, the following may work: “ $( sudo which arp ) -a ”. (In Debian, the command has been found in /usr/sbin/.)

Determining addresses at different layers, Neighbor addresses.

Note that the goal is not just to see if there is a cache. The goal is to make sure there is a cached entry. First, make sure that the IP address shows up in the neighbor cache.
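As a sketch of checking for a specific cached entry, the Python below pulls IPv4-to-MAC pairs out of text that resembles “arp -a” output. The sample text is hypothetical and deliberately mixes a Windows-style layout with a Unix-style layout; real output varies by operating system and version, so the regular expressions are assumptions that may need adjusting.

```python
import re

MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{1,2}[:-]){5}[0-9A-Fa-f]{1,2}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def parse_arp_output(text):
    """Map each IPv4 address to its MAC address, for lines that show both."""
    entries = {}
    for line in text.splitlines():
        ip = IPV4_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:  # headers and incomplete entries have no MAC address
            entries[ip.group()] = mac.group().lower().replace("-", ":")
    return entries

# Hypothetical output, not captured from a real system.
SAMPLE = """\
Interface: 192.0.2.10 --- 0xb
  Internet Address      Physical Address      Type
  192.0.2.1             00-11-22-33-44-55     dynamic
? (192.0.2.7) at <incomplete> on eth0
? (192.0.2.8) at 66:77:88:99:aa:bb [ether] on eth0
"""

cache = parse_arp_output(SAMPLE)
print("192.0.2.1" in cache, "192.0.2.7" in cache)
```

An address that is absent from the result, like 192.0.2.7 in the sample, has no usable entry even though a cache exists.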

If NDP fails or ARP fails

First of all, double-check that the destination system is on the same subnet. If not, then NDP neighbor detection and ARP are expected to fail. However, if they are on the same subnet, then the relevant protocol (NDP neighbor identification for IPv6, and ARP for IPv4) should be working.

One approach that usually causes no harm, and might fix the issue, is to flush the neighbor cache. (This may be especially likely if there are intermittent issues.) First, determine if there is any old information that is incorrect (including corrupt information, like showing a MAC address of ??-??-??-??-??-?? instead of hexadecimal digits). If so (especially if the information is showing an incorrect hexadecimal address), record the old information before flushing it away. That old information might be helpful in tracing down some other problems, like an IP address conflict.

How to flush ARP

Flushing the ARP cache is not likely to affect IPv6 communications. Do this when trying to troubleshoot IPv4 communications.

For IPv4, the neighbor cache may be flushed by using “arp -d badInfo”. The word “badInfo” represents a part of the command line that must be customized in a way that is about to be described.

For Microsoft Windows, this is often easy: try just using an asterisk. So, “arp -d *” may work well.

For Unix, using an asterisk is not recommended, because the shell will likely interpret that as a wildcard that represents filenames. Instead, an ARP flush might look something like: “ sudo arp -d 192.0.2.1 ”. Note that root-level access may be needed for this. There may be a way to delete all ARP entries, if desired. That way might be to just leave off the IPv4 address, or might also require using the -a switch. Don't hesitate to do a quick check of the man page, e.g. the OpenBSD manual page for the arp command.
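For anyone scripting this step, the following Python sketch only builds the command line described above; it does not run anything. The function name is made up for this example. It assumes that the Windows form accepts an asterisk, that the Unix form is run through sudo, and that every non-Windows system is handled the Unix way. It refuses to guess a delete-everything form for Unix, because that differs between systems.

```python
import platform

def arp_flush_command(ipv4=None, system=None):
    """Build (but do not run) an argument list for deleting ARP entries."""
    system = system or platform.system()
    if system == "Windows":
        # Passed as a list, the asterisk reaches arp without shell expansion.
        return ["arp", "-d", ipv4 or "*"]
    if ipv4 is None:
        raise ValueError("name the IPv4 address to delete; see the arp man page "
                         "for how this system deletes all entries")
    return ["sudo", "arp", "-d", ipv4]

print(arp_flush_command(system="Windows"))
print(arp_flush_command("192.0.2.1", system="Linux"))
```

The resulting list could be handed to subprocess.run once the command has been reviewed.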

After flushing a neighbor cache, try to use ping again.

If flushing the neighbor cache did not fix the problem

Well, flushing the neighbor cache is usually a quick process, so it was worth a shot.

Realize that all IPv6 communications to a system will require NDP neighbor detection, and similarly all IPv4 communications will require ARP to be working as expected. Until the correct “physical address” (e.g., MAC-48 address) shows up, typical communications will not be working.

As an example of what does not need to be troubleshot: The problem is not related to firewall settings that block only IP communications. The broken communications appear to be a problem that is actually unrelated to IP. (IP occurs at the third layer of the OSI Model. The neighbor discovery (NDP or ARP) occurs at the second layer of the OSI Model.) The reason that the remote system's firewall is not blocking an incoming ICMP(v6) packet is because the sending system needs the neighbor cache to work in order to be able to properly send the IP packet. So, if the packet is not being properly sent, obviously the packet doesn't get received. If firewalls are an issue, that is because the firewall is affecting basic neighbor discovery (NDP or ARP), or lower layers of networking communication. Since such firewall configurations are fairly rare (because firewalls more commonly drop traffic based on IP protocol details), physical connectivity issues are more likely.

One of the most common reasons (that people do not tend to instantly think of) is incorrect network settings. Having the wrong size of a subnet might be affecting communications, and asking for the wrong IP address could cause things to not work as expected. Make sure that both devices have correct IP addresses and subnet sizes. Unix users may wish to make sure that the interface is “UP” (as reported by ifconfig).

Other than that, the most common issue is typically an issue with the “physical” layer of networking. This could involve cabling problems, or issues with an antenna (which could be physically broken), or issues with airwaves (which might have some sort of interference). Really check those settings, and consider replacing cables if other troubleshooting isn't resolving the issue.

For those who are familiar with the OSI Model, troubleshooting efforts should be focused on the Layer 1 and Layer 2 levels. Those issues must be resolved before the Layer 3 communications may stand a chance.

Old text

This communication is quite useful. For IPv4, if ARP requests don't work, then what is also likely to be broken is any communications requiring Layer 3 or higher. Similarly, if NDP is not able to provide the needed MAC addresses, all IPv6 is likely to be broken. In such cases, all troubleshooting efforts involving IPv4 or IPv6 addresses may cease until these Layer 2 addresses are able to be seen.

If flushing the neighbor cache did fix the problem

Great! Almost.

The problem is, this sort of problem is frequently one that repeats itself. So, try to address that possibility.

Try to determine what caused the issue. If a system changed its IP address to use an IP address that was previously used by another device, then there is a good chance that the issue is totally resolved.

Otherwise, try to check if there is an IP address conflict. In theory, that could be done by sending out an ARP/IPv4 “WHOHAS” request and checking for multiple responses. If those responses have different MAC-48 addresses, then there is probably an IP address conflict. If those responses have the same MAC-48 address, then there are either two (or more) devices using the same MAC-48 address (which can, but never should, happen), or (probably more commonly) traffic is being duplicated (perhaps due to a switching loop).
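The reasoning about multiple responses can be written down as a small decision function. This Python sketch assumes that the MAC addresses from the replies have already been collected by some other tool (for example, a packet capture); the function name and the sample addresses are illustrative only.

```python
def classify_arp_replies(macs):
    """Interpret the MAC addresses seen in replies for a single IPv4 address."""
    normalized = [m.lower().replace("-", ":") for m in macs]
    if not normalized:
        return "no reply"
    if len(normalized) == 1:
        return "single reply"
    if len(set(normalized)) > 1:
        return "probable IP address conflict"
    # Several replies carrying one MAC: a duplicated MAC or duplicated traffic.
    return "duplicate replies from one MAC address"

print(classify_arp_replies(["00-11-22-33-44-55", "66:77:88:99:aa:bb"]))
print(classify_arp_replies(["00-11-22-33-44-55", "00:11:22:33:44:55"]))
```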

In practice, though, checking for multiple responses to an ARP/IPv4 “WHOHAS” request might not be super trivial to do. An alternative, which might unnecessarily introduce downtime (and so may be undesirable), but which might be faster (and so may be desirable), is to change the IP address that is suspected to be duplicated, then flush the neighbor cache on the other system, and then try to ping the suspected address. See if the neighbor cache gets information. If so, then an IP address conflict has been pretty well confirmed. Fix the issue (by changing the IP address of the other machine, or removing it from the network). After the issue is fixed, do remember to end the downtime by making sure that the first device has its IP address returned from the temporary address to the desired value.

If neighbor cache has the right information

If the communication involves using a TCP port, then make sure that the receiving system is listening on the appropriate TCP port. Likewise, if the communication involves using UDP, then make sure that the receiving system is listening on the appropriate UDP port.
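One way to test a TCP port from another machine is simply to try connecting to it. The Python sketch below does that with the standard socket module; the function name and the two-second timeout are assumptions for illustration. It only covers TCP: UDP has no connection handshake, so a UDP port cannot be checked this way. A failed connection may also mean that a firewall is blocking the traffic, not only that nothing is listening.

```python
import socket

def tcp_port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, tcp_port_is_open("192.0.2.25", 25) would report whether a hypothetical mail server at that address accepts connections on the SMTP port.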

If no software is listening on the expected TCP port or UDP port

Try starting the software. If no error messages appear, then check if the problem is resolved. If the problem is not resolved, look for logs, and double-check configuration settings.

If software is listening on the right TCP port(s) and/or UDP port(s)

Check to see if network communication may be getting blocked by firewall software. Check if network communication may be getting blocked by anti-malware software.

Having the wrong program listening on the correct TCP or UDP port number is a less common error, but is a possibility that is sometimes seen. Verify that the program listening on a port is the right program. See: identifying which program is using a TCP or UDP port.

If communication on the subnet works

If communication to the destination device is working, then the issue may be resolved. If the issue is that only certain types of communication are failing, then that is a different issue. In that case, at least basic communication has been confirmed, which rules out several possibilities (like a physically broken cable).

Try communicating with a remote subnet. For example, try sending ICMP(v6) packets to one of the well-known public DNS servers. (There is no reason why the remote system needs to be a DNS server, because this test should not be relying on the results of DNS communications. However, the well-known public DNS servers typically do respond to ICMP(v6). Also, many of them have relatively easy-to-remember IP addresses.)

If communication with a system on the Internet works, then a technician can realize that “Internet access” is working. A lot of people assume that DNS is part of a working Internet setup, so do check that as well. If DNS also works, then there's really no reason that “Internet access” should be getting blamed. Either the issue has been resolved, or there is actually another issue. Figure out what program the end user has actually been having a problem with.
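The order of checks described in this part of the guide can be summarized as a small decision function. The Python sketch below is only a summary of that flow, with a made-up function name; the three inputs are results that a technician has already gathered by hand, and a failed ICMP(v6) test may still be a misleading result caused by a firewall.

```python
def suggest_next_step(gateway_ok, remote_ok, dns_ok):
    """Suggest where to look next, given three test results (True or False)."""
    if not gateway_ok:
        return "Check link state, the neighbor cache, and IP settings on the local subnet."
    if not remote_ok:
        return "Run TraceRoute to find which remote subnet stops responding."
    if not dns_ok:
        return "Investigate DNS; basic Internet connectivity is working."
    return "Internet access works; find out which program is actually failing."

print(suggest_next_step(gateway_ok=True, remote_ok=True, dns_ok=False))
```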

If a remote system has issues

If the problem appears to be an issue with a site that is not on the local subnet, try to determine what subnet is having the problem. One program to do that is TraceRoute. Starting a new command prompt window, or using a terminal multiplexer, might be handy because the most important results of TraceRoute will probably be figured out substantially before the completion of a test.

Run TraceRoute, and wait until there are two rows of asterisks. Then, copy the results and paste them in a document that may be referenced later. Also, sharing the document might be desirable (perhaps to help someone else with troubleshooting). Usually once there is a single row of asterisks, all the remaining rows will also be asterisks. However, in some fairly uncommon cases, a network device might not respond to TraceRoute traffic but might relay such traffic.

Once asterisks start appearing on the second row, consider moving on with any other troubleshooting steps that might remain. When the TraceRoute does actually complete, make sure the document is updated with the full results. If there are many rows of asterisks, consider removing many of the redundant rows (perhaps by replacing them with an ellipsis).
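Tidying up saved TraceRoute results can be automated. The Python sketch below finds the last hop that responded and replaces repeated rows of asterisks with a single ellipsis line. The sample output is hypothetical, and the rule used to detect a response (a row containing a time in “ms”) is an assumption that fits common traceroute and tracert layouts but may not fit every implementation.

```python
import re

HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")

def summarize_traceroute(text):
    """Return (last responding hop number or None, condensed list of rows)."""
    last_responding = None
    condensed = []
    in_silent_run = False
    for line in text.splitlines():
        match = HOP_RE.match(line)
        if not match:
            continue  # skip header lines and blank lines
        hop = int(match.group(1))
        responded = "ms" in match.group(2).split()
        if responded:
            last_responding = hop
            condensed.append(line.strip())
            in_silent_run = False
        elif not in_silent_run:
            condensed.append(line.strip())  # keep the first silent row
            in_silent_run = True
        elif condensed[-1] != "...":
            condensed.append("...")         # stand-in for the repeated rows
    return last_responding, condensed

# Hypothetical output, not captured from a real network.
SAMPLE = """\
traceroute to 203.0.113.9 (203.0.113.9), 30 hops max, 60 byte packets
 1  192.0.2.1 (192.0.2.1)  0.512 ms  0.480 ms  0.455 ms
 2  198.51.100.1 (198.51.100.1)  4.210 ms  4.101 ms  4.330 ms
 3  * * *
 4  * * *
 5  * * *
"""

last_hop, rows = summarize_traceroute(SAMPLE)
print(last_hop)
print("\n".join(rows))
```

Keeping the untouched original alongside the condensed copy is safer, in case a later hop does respond.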

There may be a desire (perhaps especially if the problem looks like it is a third party site) to repeatedly run TraceRoute. Actually, some alternate software might be the nicest way to handle that. Check out mtr, which might involve downloading a program.

Figure out the last site that is responding to the TraceRoute traffic. If that equipment is controlled by you, then try to perform troubleshooting by using tools (like ping and ARP/IPv4 or NDP/IPv6) from the connection from that device. If that device is not owned by you, but if you control other equipment (like the subnet at the destination), try also troubleshooting communication from the remote end. That troubleshooting effort may end up creating a similar document of TraceRoute results. After gathering all of the data that is quickly available and likely to be useful, try to contact the organization that runs the equipment in the middle. Alternatively, contacting a local ISP might be another option, as they might be willing to do the work of coordinating with the other Internet providers to help make sure that successful Internet communications can occur.

When other organizations get involved, troubleshooting can take a bit more time. Also, getting status updates can be a greater challenge. There may be a length of time where no progress is apparent. Do figure out which company seems to be dropping the ball, and keep contact information (like phone numbers) handy. If speaking to someone over the phone, make sure to ask for a technical tracking ticket number, if one is available. Keep that number handy, alongside the contact information. Big organizations do sometimes fail to provide the communications that people would like to see working, but they are usually on top of problems well enough that the problems don't commonly last for more than a day. When problems do last that long, the big companies typically are able to identify the problem and a potential solution, and so they can provide people with an estimated time of repair.

OLD TEXT
However, some systems might block such communications. Determine the IP addresses of the default IPv6 gateway and the default IPv4 gateway, and see which of these types of communications work.

Searching for answers
Consider the source

Perform a search using a major Internet search engine. See what information comes up. Do not necessarily trust that the information is accurate, unless the information comes from a reliable source. (The Techn's section on the ][CyberPillar][ website tries to be pretty accurate. RFC documentation is generally pretty high quality. Official technical documentation from the vendor that created software is often useful information. The ][CyberPillar][ list of sites lists several sites, most of which are quality sites, like the sites related to the organization named “Stack Exchange”.)

Do not just blindly trust all information found on the Internet! Even some information on vendor websites might be questionable. For example, Microsoft has had a site called “social.microsoft.com” where people can ask questions, much like a user forum. Information from that site can be useful, and can also be unreliable. So, feel free to review such information, but realize there may be a high chance that an unqualified person may provide wrong information.

If an internal help database is available, check that. For example, if a “help desk” uses a ticketing system, perform a quick automated search through prior tickets to see if key words show up in those tickets.
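A quick automated search through prior tickets could look something like the following Python sketch. It assumes the tickets have already been exported as a mapping of ticket ID to text; the function name, ticket IDs, and ticket text are all hypothetical, and a real ticketing system will likely offer its own search feature that should be preferred.

```python
def search_tickets(tickets, keywords):
    """Return IDs of tickets whose text contains every keyword (case-insensitive)."""
    wanted = [keyword.lower() for keyword in keywords]
    matches = []
    for ticket_id, text in tickets.items():
        lowered = text.lower()
        if all(keyword in lowered for keyword in wanted):
            matches.append(ticket_id)
    return matches

# Hypothetical export of prior tickets.
PRIOR_TICKETS = {
    "T-1001": "Printer on 2nd floor offline after DHCP lease expired",
    "T-1002": "User cannot log in to VPN",
    "T-1003": "DHCP server not handing out addresses to printer VLAN",
}

print(search_tickets(PRIOR_TICKETS, ["dhcp", "printer"]))  # ['T-1001', 'T-1003']
```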

There are some resources that may describe some types of situations that may be fairly common, and provide generalized techniques that may be helpful for many of those situations. For example, ][CyberPillar]['s guide to handling crashes might be helpful for some types of situations. ][CyberPillar]['s section about troubleshooting covers an even more generalized topic. People looking for more help with troubleshooting may wish to check whether any of the sections seem related to an issue which is still unresolved.

Trying additional resources

Check ][CyberPillar]['s Troubleshooting section.

Determine if ][CyberPillar][ contains other information related to the current topic. To do this, perform a Google search for:

“site:cyberpillar.com topic”

(Do not include the quotation marks. Do replace the word “topic” with a customized string describing what is being searched for. For example, searching for “site:cyberpillar.com DHCP” should bring up a list of pages that mention DHCP.)
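For those who prefer to build such a search as a link, here is a minimal Python sketch. The function name is hypothetical, and the sketch assumes that the search engine accepts the query in a “q” parameter on its “/search” page, which is how Google's search URLs have commonly looked.

```python
from urllib.parse import urlencode

def site_search_url(site, topic):
    """Build a search URL that limits results to one site."""
    query = f"site:{site} {topic}"
    # urlencode handles the escaping of the colon and the space.
    return "https://www.google.com/search?" + urlencode({"q": query})

print(site_search_url("cyberpillar.com", "DHCP"))
# https://www.google.com/search?q=site%3Acyberpillar.com+DHCP
```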

Longer shots

Some of these techniques might not be worthwhile to try right away. Instead, getting more specialized expertise might be more useful.

Checking very unrelated logs is less likely to find a resolution, although there might be some potential that such logs might reveal something. For example, logs of anti-malware software might show that some activity has led to some actions that might be causing problems. If the issue does not appear to be malware-related, then this may be a long shot.

Creating a summary

Prepare to discuss the likely impact of any existing problem. In production environments, a key thing to consider is how customers are likely to be impacted by any ongoing problem. Will other technical staff members be likely to be affected by the issue?

If none of the prior steps have clearly led to the situation being resolved, prepare a question. Determine what problem or issue exists, what steps seem likely to be helpful in resolving the issue, what significant changes have been made so far, and what resources look like they might be helpful in quickly resolving the situation.

Also consider what impact(s) are likely. If the problem description accurately describes the likely consequences, then there may not be any further commentary that is needed. (For example: a situation described as “all file servers are offline” could clearly have major consequences. Mentioning that this breaks some other software might be information that is easily implied, and so is not necessary. A ticket like “microwaves are disrupting Wi-Fi communications in the cafeteria” might sound like it is describing a nuisance. However, an added note that “the credit card scanner requires wireless transactions” will help other staff to easily notice that this could be leading to a loss of sales, and so may be an issue that is extremely important to some people.)
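One way to keep such a summary organized is to fill in a small, fixed set of fields before asking for help. The following Python sketch is only an illustration: the class name, the field names, and the steps and changes listed in the example are hypothetical, and a real ticketing system will have its own fields.

```python
from dataclasses import dataclass, field

@dataclass
class HelpRequest:
    """Hypothetical container for the details gathered before asking for help."""
    problem: str
    impact: str = ""
    likely_helpful_steps: list = field(default_factory=list)
    changes_made: list = field(default_factory=list)
    helpful_resources: list = field(default_factory=list)

    def summary(self):
        """Return a plain-text summary, skipping any section that is empty."""
        lines = [f"Problem: {self.problem}"]
        if self.impact:
            lines.append(f"Impact: {self.impact}")
        sections = (
            ("Steps likely to help", self.likely_helpful_steps),
            ("Changes made so far", self.changes_made),
            ("Potentially helpful resources", self.helpful_resources),
        )
        for title, items in sections:
            if items:
                lines.append(f"{title}:")
                lines.extend(f"  - {item}" for item in items)
        return "\n".join(lines)

request = HelpRequest(
    problem="Microwaves are disrupting Wi-Fi communications in the cafeteria",
    impact="The credit card scanner requires wireless transactions",
    likely_helpful_steps=["Try a different Wi-Fi channel or band"],
    changes_made=["Moved the access point away from the microwaves"],
)
print(request.summary())
```

Stating the impact as its own field makes it harder to forget the detail that turns an apparent nuisance into an urgent issue.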

Submitting a help request

Determine the appropriate course of action. In a highly interactive classroom, this may involve raising your hand to locate help. If a system is used to keep track of technical tracking tickets, then the appropriate course of action could involve updating an existing ticket, or creating a new ticket.

Also, consider the urgency of the issue.

Determine who should be informed of an ongoing problem. The answer might be a supervisor, an external organization's support team (like a vendor's official support), a customer, or multiple of those types of people.

If the issue is of normal importance, then updating a ticket may be appropriate. However, if the issue is actively causing significant problems, such as costing a company substantial amounts of money, then the most appropriate response might also involve pointing the problem out to a supervisor. This might involve walking to the supervisor's office if the supervisor's door is opened, sending an E-Mail to the supervisor, knocking on a door to get the supervisor's attention, or interrupting a meeting and informing the supervisor of the critical issue. Which of these approaches is most appropriate may depend on factors such as the supervisor's general preferences about getting contacted, how critically urgent the issue is, and (in the last example) how important a meeting is.

Finally, another issue to consider is how much time to spend on a task before getting a supervisor involved. The desired length of time can vary significantly. Some organizations might not want supervisors to be bothered for issues that get resolved in about half a day. Others may recommend that technicians have a task reviewed by a supervisor if it looks like more than 60 minutes of time will be spent on a project. Other organizations might place that threshold at 30 minutes, and still others might think even that is too long.

Additional information:

Wrapping up issues
Resolving the issue:
Ensure resolution

Make sure the issue is resolved.

Just because one thing is fixed does not mean that everything is fixed. Consider all directly and indirectly affected systems. Think about what methods can be used to verify that those systems are working. In most cases, verification is rather feasible: required equipment is available to be used, any required people are reachable and available, and verification is fairly quick. So, in such cases, verify that the systems are working.

Communications about resolved issues

Try to make sure that users who have experienced the problem know that the issue is resolved. If multiple people were affected, this may mean contacting another person, like a supervisor, who will relay the message by whatever method the supervisor prefers. Many organizations do not trust many of their non-supervisory employees to relay messages, so do be careful about relying on non-supervisors to be relaying messages to other employees. (Proper communication should occur. Pinning this task onto the wrong other person may be considered to be inappropriate handling of that responsibility.)

Remember: Dissatisfaction among end users is undesirable. Therefore, checking with an end user is typically a step that should be performed in order to make sure that an issue is resolved. There may be exceptions, such as when an end user needs functionality but the end user is unavailable and so will not be able to receive the communication. In most cases where there is a problem that one user has reported, effort should be made to make sure that person agrees that the problem is resolved.

A common situation of this not being done is when a technician has identified a technical problem, and fixed the technical problem, and believes that the issue is resolved. However, the end result might look a bit different than what an “end user” expects. Therefore, technicians should verify that “end users” understand that the issue appears to be resolved, and attempt to determine whether the “end users” consider the end results to be satisfactory. If the “end user” is not satisfied with the end results, then the technician should determine what options may exist to satisfy the “end user”.

Documentation

Document what happened. This documentation can be useful in many ways:

  • If a problem re-occurs, then remembering what tactic worked may be useful. This may be particularly useful in case a different approach isn't working. Simply remembering, “I know that I fixed this once before, but I do not remember what actions led to easy success last time” is not very useful.
  • Documentation might even be useful for resolving different, but similar, problems. For example, if a problem on one computer ends up happening on a different computer, then the problem is a bit different, but there may be similarities that allow the documentation to be useful for troubleshooting.
  • If anybody disagrees with what was done, having a documented experience can be useful. Many people (possibly including yourself) might be more ready to believe the documentation that you made at the time, rather than how your memory perceived an incident months later.
  • Documentation can help show your worth. The fact that a technician appears to be working hard does not prove that the person is effectively accomplishing much. Many people may think that a relaxed and unstressed employee does not appear to be working very hard, and having a record of accomplishments can help to offset that impression. Having documented information can be useful when trying to submit facts that might be useful when new policy, including an updated budget, is getting decided.

Clean-up

Consider what temporary resources may be in a less-than-ideal state. For example, if there was a screenshot taken of an error message, that screenshot might not be needed anymore. The file will increase disk usage by only a negligible amount, and the time required to back up a disk by a negligible amount, but it also increases clutter and the risk that someone might waste time trying to figure out whether that file contains any important detail. Depending on what is contained in that temporary information, there may be some potential of a security risk. Even if that risk is small, there is no advantage to keeping a completely unnecessary risk which provides no benefits.

Perhaps some effort should be applied to resolving some other sort of temporary situation, like a workaround or unplugged equipment (because the equipment wasn't working anyway). If notes have a section describing situations that should be temporary, then resolve the remaining work.

(If the remaining work is rather insignificant, but will be time consuming, consider whether that work should be split off as a new and minor task. Doing that might allow a more significant task to be categorized as being handled, so that a larger task does not need to keep being tracked as an active and ongoing issue.)