Handling Data

Handling directories/folders
[#mkdir]: Creating directories/folders

mkdir dirname works in both Unix and DOS. In DOS environments, a shorter md command, built into the command line interpreter, does the same thing as the internal mkdir command.

Creating subdirectories

In some cases, the mkdir command will not make a subdirectory of a directory that does not yet exist. In that case, a Unix command like mkdir a/b or a DOS command like mkdir A\B won't work until the A directory already exists. One way to work around that is to make sure the needed directory exists first. However, there may be a way to get mkdir to create all the required parent directories itself.

With Unix, try using mkdir -p a/b

With Windows XP, the bundled CMD.exe file's internal mkdir supports “command extensions”, which add features to some internal commands. One added feature is that mkdir can make all parent directories as needed. Command extensions are likely enabled by default, and may be manually enabled by running “ CMD.exe /e:on ” (or disabled by running “ CMD.exe /e:off ”). If that parameter is not specified, the command shell looks for a REG_DWORD registry entry named EnableExtensions under Software\Microsoft\Command Processor under either HKLM or HKCU. A value of one means the command extensions are enabled, and zero means they are disabled. (This is documented by MSDN: XP Pro Product Documentation.)
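If in doubt, the current setting may be checked from a command prompt. A minimal sketch, assuming the reg command that comes with Windows XP and newer (the value may be absent, in which case extensions follow the default behavior):

reg query "HKCU\Software\Microsoft\Command Processor" /v EnableExtensions
reg query "HKLM\Software\Microsoft\Command Processor" /v EnableExtensions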

Some implementations may support an /S parameter. For instance, JP Software products allow using mkdir /s a\b.
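To illustrate the Unix difference (a sketch; a and b are just placeholder names):

mkdir a/b      # fails if the a directory does not exist yet
mkdir -p a/b   # creates a first (if needed), and then a/b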

[#rmdir]: Removing directories/folders

rmdir dirname is one option.

This command might only be able to remove empty subdirectories. In DOS environments, a shorter rd command, built into the command line interpreter, does the same thing as the internal rmdir command.

Commands to remove files might also have the ability to remove directories (including subdirectories that may have content in them).
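For example, in Unix, the rm command can remove a directory along with its contents (a sketch; dirname is a placeholder, and on many systems this removes data without further prompting, so use it carefully):

rm -r dirname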

Referencing the Microsoft Windows platform, McAfee advice on removing a COM4 directory discusses the effect of a piece of malware that created a directory whose name starts with “com4.{”. The solution which worked was running: “ rd "\\.\%appdata%\com4.{241D7C96-F8BF-4F85-B01F-E2B043341A4B}" /S /Q ” (Further discussion)

[#lsdir]: Listing contents of directories (a.k.a. folders)
Basic display(s) of files
Unix

ls, which is often used with a -l parameter to produce a long-format listing.

Note that filesystem entries (files and directories) whose names start with a period are often hidden from the default view. This may be worked around using techniques described in the section about Unix asterisks. (Users familiar with other operating systems should read the warning in the section about Unix asterisks.) Alternatively, there may be command line options to help show those entries. Using “ ls -la ” is one method. This will also include the directories . and .., which are rather pointless to report since they are always present. Using “ ls -lA ” may show all filesystem entries, including “hidden” content, except for those two relative directories.

Another option which can be nice to use is -F which causes special markers to appear next to the name of every directory, executable file, and at least some special types of filesystem objects (such as, for example: symlinks).

To search for a specific filename, simply place the filename at the end of the command line, e.g. ls -laF filename. The ls command does not (need to) support wildcards: the shell takes care of those, so wildcards may typically be used freely and will work as expected.

If a directory is provided on the command line, then ls will usually list the non-hidden contents of the directory (similar to running “ ls dirname/* ”). For the DOS-like behavior of actually showing the directory itself, instead of its contents, add the -d command line parameter (e.g. “ ls -Flad dirname/ ”).
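A brief sketch of the difference (dirname is a placeholder):

ls -lF dirname/    # shows the contents of the directory
ls -lFd dirname/   # shows the directory entry itself, like DOS's dir would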

DOS

DOS: dir /a shows files, including those with the hidden attribute. This command does support the common DOS wildcards, so listing all files that start with the letter Z may be done with dir /a z*. Like other commands in DOS, this is case-insensitive, so newer operating systems that support lowercase filenames will match those filenames even if the wrong case is specified.

Users of MS-DOS may find that its dir command does not support nearly as many command line parameters as newer versions of the command (such as the one bundled with Microsoft Windows 7). Such users are encouraged to find the legal freeware version of 4DOS.

Microsoft Windows

One option is to use the command line, and run the same sort of command that DOS would use. (There may have been some historical issues, with Windows 2000 and NT, with the command line showing directories and/or filenames that would not be valid names in DOS. Running CMD instead of COMMAND might help.)

For a graphical approach, Microsoft Windows generally/always comes with a program called File Manager or Explorer (a.k.a. “Windows Explorer”; this is different from Microsoft's “Internet Explorer”).

Cisco IOS

First, the standard Cisco IOS warning is worth mentioning before people delve into plans to try using this option.

Perhaps dir or show, as shown by Cisco IOS file handling: viewing files.

(Similar to other Cisco IOS commands, typing “dir ?” ought to be an option. Getting help from the operating system, like that, is discussed more in the Cisco IOS basic usage guide.)

[#lssubdir]: Listing subdirectories
Unix
ls -R will show non-hidden filesystem objects, recursing into subdirectories. ls -a -R will show “hidden” filesystem objects as well; that simply means filesystem objects (both files and directories) whose filenames start with a period.
DOS

dir/s (or, perhaps for older versions of DR-DOS, xdir/s) will show what is using a directory, including all of its subdirectories. To see hidden files, don't forget the /a switch, e.g. dir/a/s

[#customls]: Customizations to the display of file listings (that show what files are in a folder)

See also: lswcolor (which may be a bit duplicated with information here)

For commands that are run from a command line, there may be additional options. (Try running the command with a -? or /? or --help or -h parameter, or check standard documentation such as using the man or help or info commands.)

Unix

ls -F (or something similar, like ls -lF for a long format) may show additional information about file system objects that are not standard files with standard permissions.

Colorized output is also an option. This may require using third party software, or the built-in software may have some options for displaying color.

FreeBSD's ls command supports a -G option which causes the program to honor environment variables named CLICOLOR, CLICOLOR_FORCE, and COLORLS. OpenBSD Ports: “colorls” added similar functionality.

One option, which may involve using “third party” software that is not considered part of the core operating system, may be gls (which may be in a package called gnuls). In other operating systems, the GNU ls command may be the standard ls command. (The GNU ls command is frequently included as part of a package named “coreutils”.)

Another option may be a third party program called colorls.

Perhaps see also: man dir_colors(5), man dircolors(1)

The remainder of this section may need review.

http://blog.wompom.org/index.php/2008/01/26/ls-color-considered-harmful/ calls colorized ls useful but harmful, and has applicable warnings.


Also, a comment says: Searching for the Ubuntu man page related to dircolors shows various versions, and offers information about the files /etc/DIR_COLORS and ~/.dir_colors, which are for “Slackware, SuSE and RedHat only; ignored by GNU dircolors and thus Debian.” The /etc/DIR_COLORS file is described as a “System-wide configuration file”, and the ~/.dir_colors file as a “Per-user configuration file.” Also, dircolors -print-directory doesn't work on Debian.

DOS/Windows
DOS

In DOS, the %DIRCMD% environment variable may be set and then used by default. (This is one of the rare cases where, for seemingly no compelling reason, the default behavior of JP Software products is incompatible with the command line interpreter built into the mainstream operating system that the JP Software product is designed to work in. JP Software's online help for DIRCMD discusses this topic and provides a workaround using something such as alias dir=`*dir %dircmd%`.)

For DOS and similar platforms, the dir command is an excellent demonstration of the power of JP Software's products. While a newer CMD.EXE (using Windows Vista's as an example) may have 15 configurable switches, and older versions of “CMD.EXE”/“COMMAND.COM” presumably may have fewer, JP Software's advanced dir command may have 30 command line switches. For those looking to cram maximum functionality into a small space (particularly if using DOS and boot floppies), including one JP Software product (especially if using executable compression) may provide more functionality in less space than a standard command line interpreter plus the additional software that JP Software's product can effectively replace, such as the operating system's XCOPY*.* file(s).

JP Software products also have the ability to show a description of files. The description gets stored in a file called DESCRIPT.ION and uses a very simple format: a filename, then a space, then a description. (To be able to describe files that may have a space in their filename, the line may start with a quotation mark that matches another quotation mark after the filename.) It was extraordinarily rare for software to come with these descriptions, but they could be custom-made with ease using the built-in describe command. On platforms with long filenames enabled, these descriptions might not be shown unless the /Z command line switch is used. (The file is also used by Oliver Fromme's QPNG/QPV/QPEG graphics viewer, which uses an ASCII character that allows its data to be stored in a way that won't be displayed by JP Software products.) (It would be nicer if these descriptions could be automatically extracted from certain known formats of files, like Windows executables.)
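As a hedged illustration, the contents of a DESCRIPT.ION file may look something like the following (the filenames and descriptions are made up for this example):

README.TXT Notes about the files in this directory
"MY FILE.TXT" A description for a file that has a space in its name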

Perhaps some versions of DOS may have an xdir command?

Windows

Competent technicians commonly prefer to have files and folders be visible. To stop hiding things like file extensions: Go to Windows Explorer, and press Alt-T to select the Tools menu. Then select “Folder Options...”. From there, choose the “View” tab. In the “Advanced settings:” area, in the subarea called “Hidden files and folders”, change the default from “Do not show hidden files and folders” to “Show hidden files and folders”. Also uncheck the next checkbox(es) near the next option(s) about hiding things (such as extensions for known file types, and protected operating system files).

FTP clients

The FTP protocol allows for the NLST and LIST commands. FTP software may allow such commands to be sent using an internal SITE command. The FTP protocol's LIST command is often supported by clients through internal “dir” and “ls” commands, though client implementations may vary. An FTP server's response to the FTP protocol's LIST command may vary in format. The typical way that graphical clients (which typically show individual filenames) handle this is to try to support multiple formats of file listings that servers commonly output, and generally to automatically detect the format of the file listing text. Related RFC information may include RFC 959: FTP protocol (page 32) and the next page, and RFC 1123 page 31 (section 4.1.2.7).

[#pwd]: Reporting the command prompt's current directory
In Unix
Methods that work
  • pwd
  • With some shells, the following may work: echo $PWD
A method not to try:
Using cd with no parameters. An implied parameter (of $HOME and/or ~?) may send the command prompt to the user's home directory. This approach, which does not report the current directory in Unix, is mentioned because the same approach does work in other operating systems. If this is accidentally attempted, then running “ cd $OLDPWD ” may restore the desired location.
In DOS
Running the cd command with no parameters should report the current directory. Many people also like to use a prompt of $P$G. (It would admittedly be nice if a hyperlink at this location pointed to some further details.)
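For example, this command sets such a prompt ($P expands to the current drive and directory, and $G to a greater-than sign):

prompt $P$G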
Handling files and the data within files

This is largely covered in a subsection: handling files and their data.

Overview of files
Describes a data stream and metadata
Copying files
...
Removing files
...
Moving/Renaming Files
...
File links (Symbolic links/junctions)
...
Filesystem attributes/permissions/ownerships
...
Backing up, restoring, and verifying data
...
Modifying data
Editing/copying a binary file
Modifying/Editing a text file
Viewing files
...
Working with lines of text
...
Comparing files
...
Patching Files
...
Appending Files
...
Handling data on storage devices
Types of data storage “devices”
A local device

e.g. fixed disks and removable drive/media.

Network share/drive/location/site

See: filesystem provided over networking (and perhaps also the section about file transfers).

“RAM drive”/“memory file system”

Information about using a “RAM drive” (also known as a “memory filesystem”) is at: filesystem formats: “RAM drive”/“memory filesystem”.

[#dsklaout]: Disk layout

MBR-based partition scheme/layout and similar such things.

Filesystem operations
(This section may cover creating, defragmenting, checking/scanning for errors, etc.)
[#mntpoint]: Mount Points
See the section about Mount Points.
[#optmzvol]: Volume optimization

This may sometimes be referred to as “disk optimization”, although more specifically the process involves optimizing a filesystem volume/instance.

[#defrag]: Defragmentation

Fragmentation can lead to disks being unnecessarily slow. A huge factor in whether fragmentation is likely to be a noteworthy issue is which filesystem is used. For instance, FAT16 in DOS (and FAT32 in Windows 95) may have been rather prone to speed degradation caused by heavy fragmentation. Other filesystems and/or implementations (filesystem drivers) may be designed to resist noticeable fragmentation. An example of how this could be done is for the software to identify how much data is likely to be written, and to use a large chunk of free space when writing that data.

As an example, OpenBSD's manual page for tunefs notes that in many cases, “fragmentation is unlikely to be problematical”. (Err... [sic] as that should probably be the word “problematic”.)

A forum post about defragmentation states that Linux filesystem volumes “get fragmented (though admittedly not *that much* as in windows). They key point are I/O schedulers, that reorder operations in a smart manner. So the whole point is not the lack of fragmentation, but the smart scheduling of the I/O which prevents all the stress associated typically to fragmented devices making the performance penalty pointless. However this can vary from fs to fs. Reiserfs has known problems with fragmentation, but the rest of the fs's should be fine unless the disk is almost full” ... “Even fat will perform ok on linux, unlike windows xp and” at least “previous versions” of Microsoft Windows.

Some operating systems really may not come bundled with software that significantly helps reduce whatever fragmentation does exist, and at least for many such systems, some fragmentation likely does exist. There may be ways to reduce such fragmentation, such as copying all of the files to a new hard drive (and copying them back, if desired), since such a copying task may be designed to minimize fragmentation on the destination drive. Some programs may exist to help implement such strategies. (One may be named shake. Perhaps xfs_fsr is another? An ArchLinux bug report mentioned possible corruption with xfs_fsr usage. Perhaps defragfs may be helpful.) However, such techniques may often not be worthwhile; techniques other than running specialized defragmentation software are widely considered unworthwhile.

The benefits that may exist could include a reduction in lost speed (so, if quite a bit of speed was being lost, this could cause a speed increase, although there may be very little impact if not much speed was being lost in the first place), and perhaps some reduction in a drive's wear and tear during regular use. (However, defragmenting itself does cause some wear and tear, so defragmenting frequently could certainly offset that benefit.) For many popular filesystem types, fragmentation generally does not cause any loss of available space, and so defragmentation will not result in regaining any space.

Here are some options that may exist for various filesystem formats:

Defragmenting FAT
(Information used to be here, but has now been moved to: FAT: Defragmenting.)
Defragmenting NTFS
(NTFS: Defragmenting.)
Defragmenting Ext2

Forum post says that the e2defrag “program is dangerous to use and any attempts to use it should be stopped.  It hasn't been updated in such a long time that it doesn't even KNOW that it is dangerous (i.e. it doesn't check the filesystem version number or feature flags).” A different forum post describes requirements that would be needed for a safe and useful program that really does a decent job of defragmenting data.

See also: ext2 Filesystem volume fragmentation.

FFS(2)/UFS(2)
[#ffsdefrg]: Adjusting how fragmentation may exist on FFS(2)/UFS(2) drives
Adjusting how fragmentation occurs on FFS(2)/UFS(2) drives

In addition to trying to re-organize data that is already on the disk (a process that is often called “defragmenting”), filesystem drivers may be able to make some choices on how to place new data on the disk. With FFS, it seems this may be controlled by using “ tunefs -o space driveName ” (instead of the default of “ tunefs -o time driveName ”). OpenBSD's manual page for tunefs (in the section about “ -o ”, which takes a parameter that has been named “optimize preference”) states that in some cases, “fragmentation is unlikely to be problematical, and the file system can be optimized for time.” (The manual page references “minfree”, which is related to the “ -m ” parameter to the tunefs command.)

(The tunefs command affects FFS volumes in BSD, while the tune2fs command from e2fsprogs affects Ext2 volumes, whether on BSD or on Linux-based systems. These programs are mentioned further in the section on adjusting/tuning filesystem/volume parameters.)

Defragmenting OpenBSD FFS

Presumably this would also apply to FFS2, UFS, and UFS2 (and be usable in other BSD operating systems). If such capabilities were not available for an original version, such abilities would probably be very easy to adapt to a subsequent release.

The udefrag site on code.google.com says “OpenBSD FFS defragmentation tool. (Thesis)” Perhaps this is simply a research project? It does appear that code is downloadable; see http://code.google.com/p/udefrag/source/checkout for some details.

FFS in NetBSD
Linux Forum Post: FreeBSD/OpenBSD UFS/FFS Defrag
NetBSD Wiki: Defragmentation for FFS
GSOC 2013 proposal by Manuel Wiesinger (a.k.a. “meadow”) for defragmenting FFS (for NetBSD): Defragmenting FFS (in NetBSD) by Manuel Wiesinger states, “As time permits I may continue to work on this project, so that it can do online defragmentation. But this is not part of this year's Google Summer of Code.” The phrase “online defragmentation” likely simply refers to defragmenting a disk that is mounted (and really has nothing to do with computer network communication).
More generalized information

(Speaking rather broadly, this information might be rather generalized, meaning that the information is not just being specific to any one single type of filesystem.)

Some further options may be mentioned by Wikipedia's article on Defragmentation: “Approach and defragmenters by file-system type” section, Wikipedia's list of defragmentation software, Wikipedia's comparison of defragmentation software.

Other optimizing

Optimizations other than just fragmentation may exist, such as trying to place the most commonly used files (like executable files) first in a listing of a directory's contents. Such techniques might be able to be implemented using options with software that supports defragmentation.

Other examples from the Win98 era: Windows 98 Resource Kit Chapter 10: Disks and File Systems mentions WinAlign program, and there is the more limited WAlign program that optimizes just Microsoft Office 7 for Win95 and Office 97. The programs are discussed further in MS KB 191655. This KB says it applies to “Microsoft Windows 98 Standard Edition”.

It seems very likely that these techniques may be ineffective (and, at least theoretically, even detrimental) depending on how the operating system handles the data on the disk. A change in the disk cache implementation could change the effectiveness of the techniques. (Perhaps that is why the KB article about WinAlign applies to just Win98: Standard Edition, and not also Win98SE.)

Interacting rather directly with disk sectors
Searching disk sectors

Possibly useful for data recovery.

Unix can do this by running grep directly on the device node that corresponds to the disk.
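A minimal sketch, assuming GNU-style grep and a placeholder device name (substitute the device node for the actual disk; the -a option asks grep to process binary data as if it were text, and some implementations may lack it):

grep -a "text to search for" /dev/sd0c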

Editing disk sectors

Editing the contents of a disk sector, or perhaps copying some data (such as the data stream of a file) to the block device (at a certain offset) with dd

It is hereby acknowledged that this section does not currently have a lot of information about editing disk sectors. Support for such a thing might be available by some hex/“binary file” editors.

Otherwise, an option (which should technically work, although it may be rather cumbersome and so unlikely to be very worthwhile) is to use a binary file editor as needed to make a file that has the desired contents, and then to place that file's contents into the desired spot on a disk by using dd (and relevant options, like “seek”).
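A hedged sketch of that approach (the file and device names are placeholders; seek=2048 skips 2048 output blocks of bs=512 bytes each, so verify the offset arithmetic carefully before trying anything like this on a real disk):

dd if=desiredcontents.bin of=/dev/sd0c bs=512 seek=2048 conv=notrunc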

[#wipemdia]: Wiping media
[#rcovrwip]: Recoverability of data that's been wiped

If data can be forensically recovered simply by using software, that would imply that the device hadn't been sufficiently “zeroed out”, perhaps because some data failed to be wiped.

Data wiped with software can often be recovered through specialized hardware. The way that data gets stored may have some physical traces that help to indicate what the previously stored data was. A paper called Secure Deletion of Data from Magnetic and Solid-State Memory describes that a bit. Preventing such techniques from recovering data generally involves physically destroying the unit.

A post to Usenet, dating back to July of the year 1991 A.D., discusses this. A question was asked: whether the recommended method to erase data was to overwrite with all bits cleared to zero, then all bits set to one, and then some sort of randomized method. David Hayes, a former site manager of the “U.S. Army AI Center, Pentagon”, responded to Jiro Nakamura with the following story:

That used to be the standard, circa 1983. They changed the standard when
the NSA discovered a way to peel back the current data, and read what
used to be under it.

When I had to declassify disk drives in 1987, the NSA suggested that I
take them out to the parking lot, and run them over with a tank. Seriously.
I told him that the Pentagon parking lot had about 12,000 cars, but no
tanks. His second choice was that we put the drive on top of a research
magnet the Navy had. We went for that. I don't know what the field strength
of that magnet was, but it had big warning signs all over the building.
You had to take off everything metal just to go into the same room.

The magnet consumed 186 volts at 13,100 amps. That's about 2.5 megawatts.
We left it there for about a minute and a half. The field physically
bent the platters on our 14-inch drive. OK, I'll agree that they
finally did erase all the data on that drive. Along with any possibility
that it could ever be used again. We should have tried the tank. It
would have been less destructive.

As a matter of perspective, the 2,436,600 watts used is more than two thousand times as many watts as a 1.1 kilowatt microwave oven.

Reviewing CMRR's findings

University of California San Diego (UCSD) Center for Magnetic Recording Research (CMRR): G.F. Hughes: Secure Erase, as archived on July 5, 2013 by the Wayback Machine @ Archive.org has a document that states, “Many commercial software packages are available using some variation of DoD 5220, some going to as many as 35 overwrite passes. Unfortunately the multiple overwrite approach is not very much more effective than a single overwrite”...

The next question in the document asks, “Does physical destruction of hard disk drives make the data unrecoverable?” Here is the answer provided:

The disks from disk drives can be removed from the disk drives, broken up and even ground to very fine pieces to prevent the data from being recovered. However, even such physical destruction is not absolute if any remaining disk pieces are larger than a single record block in size, about 1/125” in today’s drives (Note that as the linear and track density of magnetic recording increases the resulting recoverable pieces of disk must become ever smaller if all chances of data recovery after physical destruction alone are to be thwarted). Pieces of this size are found in bags of destroyed disk pieces studied at CMRR2. Physical destruction nevertheless offers the highest level of data elimination (although it is more effective if the data is first overwritten since then there is almost no potential signal to recover) because recovering any actual user data requires overcoming almost a dozen independent recording technology hurdles.

So, in a nutshell, it seems that even extreme physical destruction (leaving behind pieces as large as 1/125 of an inch) might leave behind enough indication of data that, if there is sufficient motivation, could allow properly equipped experts to re-assemble the data. (From the phrasing of a “single record block in size”, perhaps this threat of data re-assembly might only be able to recover as little as a half kilobyte of data.) However, for commonplace recovery scenarios, a simple wipe of all of the data is generally sufficient to make recovery exceedingly difficult. The key is to make sure that all of the data gets erased. Many drives may retain some copies of some data, with the intent that these copies may be used for error correction. Often, these copies might not be easily visible to standard operating system code. The effective way to erase such copies may be to utilize a standard that is called “Secure Erase”.

(Canadian) Communications Security Establishment: Clearing and Declassifying Electronic Data Storage Devices (ITSG-06): printed page 7 (PDF page 13) states:

Since about 2001, all ATA IDE and SATA hard drive manufacturer designs include support for the “Secure Erase” standard. However, SCSI and Fibre Channel hard drives do not support the Secure Erase standard and can be overwritten only by using third-party software products.

(Footnote reference removed from quoted text.) The above text is from the year 2006. It seems likely that a drive feature like Secure Erase would not need to depend on the physical connection interface (like SATA or SCSI), and so this may have changed since the time of the publication of the quoted text.

Some Netgear ReadyNAS forum posts have quoted a note, found on a Hardware Compatibility List (or perhaps multiple such lists), which stated, “Seagate disks with SN04 revision of firmware does not handle Secure Erase command properly and therefore will require RAIDiator 4.01+ which has the workaround to handle this problem. Firmware SN03 on these disks will work fine with” an older version. So, whatever “best case” results might be possible with this technology are, just like with any other technology, results that are only likely to be achievable when the technology is implemented properly. Some drives may have improper implementations that don't even achieve the generally expected results.

Additional options for physically destroying data are discussed by (Canadian) Communications Security Establishment: Clearing and Declassifying Electronic Data Storage Devices (ITSG-06): Annex B (PDF page 29).

Slashdot released an article called The Three Ton Hard Drive Destroyer. (Sadly, the name doesn't indicate a destructor of legendary proportions that weighs three tons. Nor is the unit intended to destroy hard drives that weigh three tons. It simply uses a drill powered by three tons' worth of pressure.)

Sometimes people use the term “shred” to refer to a simple process like data wiping, presumably because the data is then generally considered to be as unretrievable as information on paper that a cross-cut shredder has made excessively challenging to re-assemble. Analysis of Google's hard drive shredding provides some insight into a giant shredder visible in a video by Google about some of Google's internal security practices. Google's shredder involves a real piece of equipment that literally mangles the drives into “shreds” (pieces).

For those seeking somewhat less expensive destruction, an old report by Peter Gutmann on recommended ways to wipe a drive had recommended DiskStroyerTM's solution. Evidence (in the form of a refill kit of sanding material being available in the shopping cart) suggests this involves sanding the platters. Instructions to do that with some commonly available equipment are available at DominoPower's guide for destroying a hard drive: section 2 (destroying media) (and section 3), with included photographs.

Of course, even less expensive methods may be available: in some cases simply dropping the media onto the ground may be sufficient to cause the device to stop working. (Or it may leave the device not working easily, making it more difficult to do any additional software-based wiping of every sector, while perhaps still being possible for somebody else, with enough time to retry as needed, to later get it working well enough to retrieve the data from at least some sectors.) The primary things to keep in mind are to always remain safe and to determine whether the damage to the data is significant enough, considering various factors such as budget, inconvenience, apparent need, and so forth. Determining how much damage is “enough” is a judgement call.

Wikipedia's article on “Data recovery”: section on “Overwritten data” states: “Although Gutmann's theory may be correct, there is no practical evidence that overwritten data can be recovered, while research has shown to support that overwritten data cannot be recovered.” Then again, there have been claims of an “electron microscope” being useful to recover data that is not as easily available. A “Secure Erase Q & A” document found at University of California San Diego (UCSD) Center for Magnetic Recording Research (CMRR): G.F. Hughes: Secure Erase, as archived on July 5, 2013 by the Wayback Machine @ Archive.org refers to a “spin stand”. Despite a white paper specifically denying recoverability, noting “[t]he inability to recover data forensically following a single wipe”, there is the old (1983) quote mentioned earlier on this page, which said “the NSA discovered a way to peel back the current data, and read what” “used to be under it.”

Wikipedia's article on “Data archaeology”: section on “Disaster Recovery”, describes scenarios when “hardware was damaged from rain, salt water, and sand, yet it was possible to clean some” disks and save data. So, there are different approaches that may generate different results. Kroll Ontrack's 2007 list of data disasters documents a British scientist drilling a hole into a drive, and then pouring oil into that hole, yet even that scenario led to data being recovered. Critical Data's web page about a drive in oil mentions data being recovered. (These companies are mentioned on the page about Recovering files.)

[#faildwip]: Failing to fully wipe a drive

Failing to sufficiently zero a drive may happen for various reasons, including hardware that made zeroing difficult. (Wikipedia's article on Data remanence: section called “Data on Solid State Drives” refers to some findings from a PDF file that note that visibly “overwriting the entire” (drive) “twice is usually, but not always, sufficient to sanitize the drive.” The reason has to do with some disk locations not being as visible during the wiping process.) Another possible reason, though, may be a wiping process that wasn't thorough, such as one that simply erased a partition table.

Center for Magnetic Recording Research: G.F. Hughes: Secure Erase has documentation that discusses a standard that may be supported by some hard drives. The web page also offers some (DOS-based) software.

Beyond using a process dedicated to securely wiping the data, there isn't any set of generalized instructions that both describes how to use software and also offers a sure guarantee that a drive's contents are erased, particularly if the hardware hides data or does not erase the data when instructed to by the software.

For instance, How Hackers hid for 14 years mentions malware that “rewrote the hard-drive firmware of infected computers—a never-before-seen engineering marvel that worked on 12 drive categories from” at least seven major manufacturers of hard drives. If hardware provides false information about whether information has been deleted, then software may be unable to correctly determine whether, or not, the data is actually gone. (Some further discussion may be found in the section on hidden malware.) So, realize that there is no simple, common method to guarantee that data was removed simply by following a process of using software in a certain way. Basically, software (which is often described as the “blueprints” or “plans”) requires cooperation from hardware (which would be analogous to the “implementation”). If that cooperation isn't provided, then no amount of plans can guarantee anything, because those plans are not being implemented.

The popular disk wiping software called “Darik's Boot and Nuke” (“DBAN”) has a FAQ entry, “Are you absolutely sure that DBAN works properly?”, whose answer starts out by simply saying, “No.” (The FAQ then goes into more details of why that's true.)

[#howwipe]: How to wipe data from a drive

If the only data that is desired to be wiped is a single file, consider removing files and more specifically wiping files. For more information about making data irrecoverable, see the section about the recoverability of wiped data.

Note that failing to wipe a drive can, in some cases, easily happen without notice; in fact, not failing may not be an easy task at all. The following information is not meant to be a resolution for very technically difficult scenarios. Following are some basic methods that may work, at least at a rudimentary level.

Setting HPA

If you're not yet familiar with the concept of a Host Protected Area, see: Host Protected Area.

skrilnetz.net (“Taming the Penguin”) - “The Truth About” “How to Securely Erase a Solid State Drive” mentions how to use variations of “hdparm -N”.
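A minimal sketch of inspecting for an HPA (the device name is a placeholder; actually changing the visible sector count can hide or expose data, so the second, commented-out form is shown only as an illustration of the syntax hdparm's documentation describes):

hdparm -N /dev/sdX
# hdparm -N p<count> /dev/sdX   (a “p” prefix would make a new maximum permanent)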

Zeroing a drive

One of the most common ways to do this is to use the dd command. People who don't generally use any sort of Unix may use Unix just to have easy access to a working dd command for this sort of task.

Zeroing a drive in Unix
Using dd

The biggest advantage of using dd may be that most Unix machines will have this pre-installed.

In Unix, a drive can generally be wiped by copying /dev/zero onto the device, typically by using a dd command such as:

dd if=/dev/zero of=/dev/devname

The output device, called “devname” in the above example, would be customized based on what device is being overwritten.

That will generally run until the destination device is unable to store any additional zeros, at which point the dd command will quit with an error about running out of space on the destination. (The error could be avoided by using the count= command line parameter, but it usually isn't worth calculating the correct size to specify just to prevent that harmless error message.) With some devices, dd may need a block size (perhaps of 512, 1024, or 2048 bytes) specified.
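For example, specifying a larger block size often speeds the process up considerably (a sketch; GNU dd spells the size bs=1M, BSD dd spells it bs=1m, and devname remains a placeholder):

dd if=/dev/zero of=/dev/devname bs=1M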

Zeroing data in Microsoft Windows

At the time of this writing, this guide does not provide confirmed info. However, perhaps see one or more of the following?

Writing random data
In Unix
This section is based on some documentation of the programs, and may need further testing.

An option may be the badblocks program, which tends to come with operating systems that support making new Ext2 filesystems (and possibly the newer “Ext?” file system types: Ext3 and Ext4), and comes in the e2fsprogs package for yet more operating systems. Using “ badblocks -t random -w ” may do the trick. Add “ -s -v ” for more output.

If it is suspected that there are some bad blocks that have been found previously, first use “ dumpe2fs -b /dev/devName >> listOfKnownBadBlocks ”. Then when running badblocks as described above, add the “ -i listOfKnownBadBlocks -o newListofKnownBadBlocks ” command line parameters.

However, note that skipping those bad blocks means there won't be an attempt to overwrite them.
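Putting those pieces together, a hedged sketch using the placeholder names from the text above:

dumpe2fs -b /dev/devName >> listOfKnownBadBlocks
badblocks -w -t random -s -v -i listOfKnownBadBlocks -o newListofKnownBadBlocks /dev/devName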

Using Secure Erase
Overview info

Using Secure Erase seems very promising. This involves using support provided by the manufacturer of the drive, who presumably was very familiar with what would be an effective way to erase the drive.

Not all drives support this. (As noted in a quote found elsewhere on this page), (Canadian) Communications Security Establishment: Clearing and Declassifying Electronic Data Storage Devices (ITSG-06): printed page 7 (PDF page 13) states, “Since about 2001, all ATA IDE and SATA hard drive manufacturer designs include support for the “Secure Erase” standard.” Linux ATA wiki: ATA Secure Erase refers to bugs, including firmware bugs. As noted elsewhere (earlier on this page, with at least one example), “Some drives may have improper implementations that don't even achieve the generally expected results.” skrilnetz.net (“Taming the Penguin”) - “The Truth About” “How to Securely Erase a Solid State Drive” notes, “Various scientific papers proofed that this feature is not always implemented the right way and sometimes the data is not even erased.” “you need to trust the vendor in this case, which is a bad idea in general.”

University of California San Diego (UCSD) Center for Magnetic Recording Research (CMRR): G.F. Hughes: Secure Erase, as archived on July 5, 2013 by the Wayback Machine @ Archive.org had a document which says:

an ATA disk drive user may want to do a "Fast Secure Erase" on a disk drive before disposing of it. ATA disk drives can have a user "password" that is used to access certain features of the disk drive. If a secure erase is started using a user "password" the disk drive must complete the secure erase before it accepting any other command. Even if SE is stopped before completion another user cannot acquire the drive and use the "password" to reactivate the disk drive. The SE must complete before the new user can access the drive.

That sounds like a very fast way to at least get the process started. However, Linux ATA wiki: ATA Secure Erase, “Step 2 - Enable security by setting a user password:” states, “When the user password is set the drive will be locked after next power cycle (the drive will deny normal access until unlocked with the correct password).” The wiki has some references to hdparm's --security-disable parameter. (The author of this text hasn't yet determined if the Linux ATA wiki's text is contradicting the text from the CMRR site, or if they might be talking about some different things.)

Instructions

Instructions for performing a “Secure Erase” in Linux are available at the Linux ATA wiki: ATA Secure Erase, which refers to using hdparm, e.g. with --security-erase. The hdparm program may also support --security-erase-enhanced. Sophit's Security.StackExchange.com question, “What is the difference between ATA Secure Erase and Security Erase? How can I ensure they worked?” seems to call --security-erase “ATA Secure Erase” and --security-erase-enhanced “Security Erase (SE+)”.
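A rough sketch of the sequence the Linux ATA wiki describes (SOMEPASS and the device name are placeholders; an interrupted Secure Erase can leave a drive locked, so review the wiki's warnings before trying this):

hdparm --user-master u --security-set-pass SOMEPASS /dev/sdX
hdparm --user-master u --security-erase SOMEPASS /dev/sdX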

What's the difference between SE and SE+? skrilnetz.net (“Taming the Penguin”) - “The Truth About” “How to Securely Erase a Solid State Drive” indicates this may be manufacturer-specific, and quotes Kingston's answer which indicates that ATA SE uses zeros while SE+ uses random data.

skrilnetz.net (“Taming the Penguin”) - “The Truth About” “How to Securely Erase a Solid State Drive” Gerald's comment noted, “It’s definitely NOT a good idea to overwrite a SSD with zeros. Controller firmware implementation is often ‘intelligent’ enough to detect full-zero data blocks and will not write the zeros, but simply mark the block as ‘all zero’ in order to reduce the number of write accesses and therefore increase disk lifespan.”

Linux ATA wiki: ATA Secure Erase also refers to at least one alternative, which is the DOS program named HDDErase, documented by the resources available from University of California San Diego (UCSD) Center for Magnetic Recording Research (CMRR): G.F. Hughes: Secure Erase, as archived on July 5, 2013 by the Wayback Machine @ Archive.org.

Other disk wiping methods

There are other options in Unix. Some recommendations of ways to wipe a disk are available.

There are some software tools specialized in removing data. e.g. Darik's Boot and Nuke involves booting off of a CD, and then having a drive be wiped (automatically with no required input by the user).

Ways not to try to wipe a disk

Unsafe ways. There are some ways that a person should not try to use to wipe a disk, because they are unsafe.

Dennis O'Reilly's article on destroying a hard drive notes some dangerous methods. “Put it in a fire? There are lots of toxic chemicals in that gadget. Do you really want to be breathing them or otherwise releasing them into the environment? Microwaves are handy for destroying CDs and DVDs, but you'd have to cook a hard drive for a long, long time to blister the drive's platters.” (Actually, this may also cause the microwave oven to become permanently damaged. To be more precise, microwaving such a metallic object will almost certainly have that undesirable effect.)

Dennis's article goes on to say, “Several Web sites suggest soaking the drive in diluted hydrochloric or muriatic acid. This might work, but you run the risk of burning yourself or breathing toxic fumes.”

Physical destruction
[#dskerr]: Handling errors with data

Sometimes a disk/disc may become damaged. Here is some software that is believed to be designed to handle such data:

  • Unstoppable Copier - file copier for Microsoft Windows. Freeware.
  • dd_rescue - For Unix, may help to skip parts of a disk that are not working. Mentioned by Wikipedia's page on dd for Unix: section on “Data recovery”. This may be able to help with phase 2 of Wikipedia's page on data recovery: four phases, which is the phase of creating a disk image. This software might also be useful to erase whatever is reasonably possible for software to erase from a physically damaged drive.
  • ddrescue - The name ddrescue may be referring to the package for the dd_rescue software, or to software in a package called gddrescue which uses GNU code.
  • savehd7 - mentioned, by Wikipedia's page on dd for Unix: section on “Data recovery”, as an alternative
  • BlindWrite - shareware that handles CDs, and includes the code from the author's previous program called BlindRead. The name comes from the software acting “blind” to suggestions by a drive that suggests giving up on accessing a sector.
  • Neil Corlett's zipbit tries to fix zip files by changing the data until the CRC matches, using brute force. Some wild speculation about this approach: This is probably a ridiculously slow process, and is probably susceptible to an incorrect repair in the case of CRC collisions (where an incorrect result was found to satisfy the CRC32 criteria). This approach is not nearly as nice as some other ideas (like restoring from backup, or trying to undelete data) and so should probably be used only as a last resort... but could theoretically work and so could be potentially useful. A program trying to do this would require an understanding of the file format (Zip files, in this case), so substantial changes would be needed to try to support any other file format that may contain data with error correction information. The site's documentation notes, “In practice, zipbit has worked well for me on ZIP files from old floppy disks where the error rate is approximately 1 bit per 100KB.”

Some information on the section on disaster recovery may be helpful.

Viewing/Editing memory contents

(This section is still fairly sparse. Further detail is needed for it to be reasonably complete.)

Deleting memory from RAM

In general, data from RAM is usually lost when a computer is powered off. Exceptions may exist, most notably when a machine first copies the contents of RAM onto a disk as the machine enters a “sleep mode”, with the expectation that the RAM will be restored later when the machine starts to run with more power.

Well, that's conventional wisdom, anyway. Wikipedia's page on Data remanence: section called “Data in RAM” notes data being available for minutes, or even a week, after power was removed from the RAM. That information has been able to help recover encrypted data. However, note that the techniques used are beyond a simple procedure available for most people to easily deploy hours later.

Data may be even less likely to remain useful when the data is overwritten with random data, which is a common activity to occur during memory testing procedures.

[#sysofcfg]: Systems for storing/using configurations
Text files

A great option for a program to store configuration is to use text files. They are easily handled by a wide variety of tools. There are some specific subformats that are commonly seen, such as XML or the “INI” file format.

[#inifile]: INI file format
This might be pronounced as “inny”, although it is often referred to by its letters, as in a reference to an “I-N-I file”. The most famous examples in the late 1990s were definitely the WIN.INI and SYSTEM.INI files from Windows 3.1. Wikipedia's article on “INI file” describes the “INI file format” as “a de facto standard for configuration files.” The syntax uses sections in square brackets, a name=value syntax with one entry per line, and comments on lines starting with semi-colons.
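A small, made-up sample showing that syntax:

; a comment describing this file
[SectionName]
name=value
anothername=another value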
[#osenvvar]: Operating System Environmental Variables (used by the operating system's command line interpreter/shell prompt)
Viewing values

The set command may be able to view all values. That could be combined with another command (grep for Unix, find for MS-DOS) to find a single value. However, a simpler way is often to use the echo command with a reference to the value. The way that values are referenced differs between Unix and MS-DOS. With Unix, they are prefixed with a $, and with MS-DOS they are prefixed with a %. With MS-DOS, they may also be sometimes/optionally suffixed with another % or a space (or the end of the command line). Using the optional character at the end of a variable can be useful when the variable name is followed by another character, including the prefix character of another variable.
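For example, with a placeholder variable named VARNAME, Unix shells would use:

echo $VARNAME

while MS-DOS would use:

echo %VARNAME%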

Extra options in JP Software products

JP Software products can escape the prefix character with the escape character, or with the escape environment pseudovariable, which has been more compatible across different (historical) versions of JP Software products. For example, %=% would be an escaped percent sign and %=` would be an escaped back-quote.

Another neat trick in JP Software products, which may produce results that are desirable (or quite undesirable), is to press Ctrl-X, which may expand all variables on the command line (before executing the command line).

[#setosenv]: Setting variables
Setting environmental values in Unix
Setting a value to be used later

In Unix, the method may vary based on the shell. However, the most compatible method may be to use the following multi-line syntax meant for use with the Bourne Shell:

VARNAME=value
export VARNAME

However, that is mostly meant for older versions of Unix. Newer versions of the Bourne shell may often support the more streamlined syntax which combines those commands into just one command:

export VARNAME=value

Most other shells (such as the Korn Shell, ksh; or bash) will support either the syntax of the Bourne shell or the different syntax used by the C shell, csh. The csh shell uses a command called setenv. This is also believed to be used by tcsh.org's tcsh.
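For example, the csh-style equivalent of the Bourne-style export shown above would be (VARNAME and value being placeholders):

setenv VARNAME value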

Neither of these methods is necessarily compatible with both a Bourne-based shell and csh. It would seem like a solution to increase compatibility could come from some sort of simple wrapper command called setenv, so that using setenv would typically work with a Bourne shell too. However, the more common practice is for automated scripts to just be designed for the Bourne shell.

There may be one other standard way of setting a variable, temporarily. See the later section about temporarily setting a value.

Removing a value that has been set
unset VARNAME
Temporarily setting a value

There is a method that may be available whether a Bourne-compatible shell or csh is used, and that is to use the env command. (If desired, see OpenBSD's man page for the env command.) Note that this command is designed to run another command, and then clean up any temporary variables that it set. For example, with OpenBSD FAQ 5: section on “Building the userland” (section 5.3.5), the following command is shown:

env DESTDIR=/ make distrib-dirs

In this case, the env command sets a temporary variable called DESTDIR to a value of /. (The fact that / is the name of a directory/folder is just a coincidence, and the env command is not giving any special consideration of the fact that it is a directory.) Then the make command (which happens to use the parameter distrib-dirs) will be able to notice the temporary value assigned to DESTDIR. However, after the make is finished, the temporary value assigned to DESTDIR is removed. Therefore, subsequent commands will not be able to access the DESTDIR value.

Misc

It seems that some variables may be able to be set with the stty command.

Setting environmental values in MS-DOS

In MS-DOS, the most common way to set variables is to use the set command from the command line. (Another option is to use the set command in the system configuration file, traditionally \CONFIG.SYS on the boot drive.) When setting a value, after the word set, there is a space, a variable name, an equal sign, and a value. So, the syntax looks like this:

set VARNAME=value

As an example of this:

set DIRCMD=/-W

The limit of how long an environment variable may be is most traditionally 127 characters. However, third party products may be able to effectively extend this: JP Software products may support 255 characters, and the shareware program XSET may support even longer values. JP Software products have a command called set which can be an easy way to edit a value.

To unset a variable, set its value to nothing. That will eliminate the variable. For example:
set VARNAME=

Traditionally, environment variables have always been stored in uppercase, even if they were typed in lowercase on the command line. This may cause some difficulties: for example, in Win9x there was a variable set during the bootup process which was entirely lowercase. That variable, called windir, was not always as easy to edit as other values. (However, that was largely okay because there was also very little to no common need to edit that variable.)

With the software built into most versions of DOS, there is no real way to edit an existing variable. The solution is simply to overwrite the entire value by just setting a value with the new data. This is usually fine for rather automated solutions, and using command line history is commonly the way to modify a complex variable (such as if a mistake was found with a path). If a copy-and-paste operation is available, another option may be to show the existing variable and copy it, modify it, and then run a command to set the pasted value. Users of a command line shell that uses JP Software's command set will have another nice option: an internal command called eset. Such users may use “ eset VARNAME ”. (The command will generate an error, and set _? to a value of 2, if the specified variable name doesn't exist.)

Environmental variables in Microsoft Windows

Environmental variables can be, and are, set in a Microsoft Windows command line using methods similar to MS-DOS. They may also be set based on some registry entries. Variables are sometimes described as being different “types” (using the terminology from TechNet: WSH Primer: Environment Variables) or “levels” (using terminology from MS KB 100843). The different levels refer to whether specific locations in the registry point to certain values.

TechNet: WSH Primer: Environment Variables Table 3.12 (“Types of Environment Variables and Their Storage Locations”) documents these registry locations.

Type name | Registry location | Applies to | Saved between logoffs/restarts?
Process | (None) | Current process, perhaps child processes | No
Volatile | HKCU\Volatile Environment | Current login session | No
User | HKCU\Environment | User currently logged on | Yes
System | HKLM\System\CurrentControlSet\Control\Session Manager\Environment | All users of the computer | Yes
Variable type effect
  • This affects where the variable is in the registry.
  • This affects whether values persist when logging off (or restarting)
  • ss64.com: syntax variables mentions Volatile is “Read-Only”.
Commands to change values

To change the values, you may be able to use the registry (for some types), or use these commands:

Process: SET
User: SETX
System: SETX -m

The SETX command was introduced with the Windows XP Support Tools, where specifying a parameter required using a - (using a / for that purpose was not supported), while newer versions of the program tend to have internal documentation showing / used for specifying a switch.

ss64.com: SETX shows which registry key is altered by the -m switch, and that is the same registry key documented elsewhere as being related to the “SYSTEM” variable type. The letter “m” stands for “machine”.
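As a hedged sketch (MYVAR is a placeholder; as noted above, older SETX releases expect the - form of the switch while newer ones document /):

setx MYVAR "some value"
setx MYVAR "some value" -m

The first line stores a User level variable, while the second stores a System level variable. Note that SETX writes to the registry; it does not alter the Process level variables of the command prompt it runs in.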

More documentation:

  • ss64.com: Syntax: variables also discusses these.
  • Info for WSH coders:
    • MSDN: Environment Property mentions “For Windows95/98/Me, only one strType is permitted - Process.”
    • ss64.com: Visual Basic: Env (Environment) shows how the WScript.Shell's Environment property can be used with different variable levels/types. The supported variable types are named “SYSTEM”, “USER”, “VOLATILE”, and “PROCESS”.
      • Create an object of "WScript.Shell" type. e.g., named oWScShel
      • oWScShel.Environment("Process").Item("VARNAME") is treated like a variable that you can read from, or set. Note that setting a Process variable will be unset when the script ends (because that will end the current process).
Aliases
These are typically handled very similarly to variables, but may use different commands like alias and perhaps unalias.
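For example, in a Unix shell (a sketch; ll is an arbitrary alias name):

alias ll='ls -l'
unalias ll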
[#sysctl]: sysctl values

These are used in BSD systems as a way to perform some configurations.

The section on implementing basic traffic forwarding, and more specifically the subsection about OpenBSD, provides an example on how to work with the sysctl values. (That example is straightforward enough that there seemed little to no point in duplicating the example here. Even if a different sysctl is being worked with, and even if this is done on a different BSD operating system, don't be afraid to refer to that section as an example of how the sysctl values are being handled.)

OpenBSD Manual: Section 3 (Subroutines): page about sysctl (which is a different page than OpenBSD Manual Page for the sysctl command (from section 8 of the manual)) documents many of the values.
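
Still, as a very brief illustration of the general form (using OpenBSD's IP forwarding value as the example; these commands typically require superuser permissions):

    sysctl net.inet.ip.forwarding                       # view the current value
    sysctl net.inet.ip.forwarding=1                     # change it (lost at reboot)
    echo net.inet.ip.forwarding=1 >> /etc/sysctl.conf   # OpenBSD: persist across reboots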

[#wnregsty]: Registry (implemented in Microsoft Windows and similar/compatible environments)

There was a sufficient amount of information about the Windows Registry to move that data to a separate section. See: System Configuration: Windows Registry.

[#wnrgcsav]: Exporting

Information has been moved. See: Exporting

[#viewreg]: Viewing Registry

Information has been moved. See: Viewing Registry

[#wnregedc]: Editing the registry from the command line

Information has been moved. See: Editing the registry from the command line

Historical notes

Information about the Windows Registry in Windows 3.x has been moved to Windows Registry.

Handling network data
To see what is happening on a network, see: “network monitoring” (on the Networking page). To see what would happen if certain traffic occurred, see traffic routing/firewalling. (Load balancing should also be in the traffic routing/firewalling section.) To automatically check for certain types of network data, and report certain things (with an IDS) or act on certain things (with an IPS), see sections such as: setting baselines, and reporting anomalies.
Reporting problems
[#hndloslg]: Operating system logs

This information might now have some details that are redundant with information in the section dedicated to Logs.

Unix
Log rotation

There may be multiple files that contain compressed older logs. For instance, /var/log/ may have a messages file as well as a messages.0.gz file from last week, a messages.1.gz file from the week before that, and a messages.2.gz file from the week before that. This may cap out at some point, such as after 8 weeks (with files named *.0.gz through *.7.gz). This compressing and filename rotating may be handled by a command called newsyslog or logrotate. The first of those commands may use a file named /etc/newsyslog.conf, which may show some other logs that are used and regularly rotated.
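
As a rough illustration, an /etc/newsyslog.conf entry may look something like the following (exact fields and defaults vary between systems; check the newsyslog.conf manual page):

    # logfile name      mode  count  size(KB)  when  flags
    /var/log/messages   644   5      300       *     Z

This would keep five rotated copies of /var/log/messages, rotating whenever the file passes 300 KB, and compressing (“Z”) the rotated copies.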

General logs storing messages from programs
/var/log/* contains some files, including: /var/log/messages (many generic messages), and /var/log/daemon (messages from background processes).
Users/Authentication

A system may use more than one of the following types of logs.

authlog
/var/log/authlog (authentication successes and, probably more interesting, failures).
When users have been added
In OpenBSD, /var/log/adduser shows when new users were created (and which users those were).
wtmp database and more

Some types of activities often performed in attacks, such as adding unauthorized user accounts, changing the system time (which may be done to manipulate a security vulnerability, or just to alter timestamps in various logs), or shutting down the system, may be recorded inside a “wtmp” database. This may consist of multiple files: wtmp and utmp may contain some information, particularly including valid login attempts. Unsuccessful login attempts may be logged to a /var/log/fail*log* file.

There may be some variations in implementing this. Some systems may have a file named /var/log/faillog while other systems may have a failedlogin file. At least some version of OpenBSD 4.0-current had a 304304 byte /var/log/failedlogin file while AIX documentation points to a /etc/security/failedlogin file and a /var/adm/wtmp file. OpenBSD may place both of those types of files directly into /var/log/. OpenBSD's “manual page” for wtmp describes programming code as well as files, noting that /var/log/wtmp files contain logins, logouts, changes to system time/date, and system shutdowns (including reboots). The ac command may be of some use to review data from the binary wtmp* files.
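
For example, on many Unix systems:

    last | head     # recent logins, logouts, and reboots (from the wtmp data)
    last reboot     # show only system reboot records
    ac -p           # total connect time, summarized per user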

[#lgusract]: Logs of other user activity
Processes

To see what has been done with the system: /var/account/*acct* probably does not exist by default, but may exist if process accounting was enabled with accton. These logs may be managed with sa, and a brief summary of recent actions may be found with lastcomm.
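
A rough sketch of enabling and reviewing process accounting (the exact file location varies between systems; the commands other than lastcomm typically require superuser permissions):

    touch /var/account/acct     # the accounting file must exist first
    accton /var/account/acct    # start recording process accounting data
    lastcomm | head             # show recently executed commands
    sa                          # summarize the accumulated accounting data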

~username/.*_history files may provide some details of what was done. This would be most useful if a user ran bash and left behind a .bash_history file. Users of other shells may also use a history file, possibly only if the HISTFILE variable is set. Users of the text editor called nano may have a .nano_history file showing search/replace activities from when editing a file.

If a user has elevated permissions, there may be a log of that. For instance, when running sudo, an entry may be added to the /var/log/secure file.

Printing/plotting logs
pac
Microsoft Windows
Modern versions
The primary operating system logs may be seen with Event Viewer. These may be stored in %windir%\System32\winevt\Logs\*.evt* files.

MSDN: EventLog Class says “Applications and services should write to the Application log or a custom log. Device drivers should write to the System log.” MSDN: EventLogEntry.Source Property says “Applications and services usually write to (and therefore are sources for) the Application log or a custom log. Device drivers usually write to the System log.” The system log may also have some messages from the operating system.

Very serious errors, such as problems writing to a disk (perhaps due to a disk dropping from a RAID array, which might be reported by the device driver) or running out of memory, will likely show up in the system log, so that may be the first log worth checking. Although the Application log will be more likely to have some useful information, finding the serious errors first may explain some of the other errors. Therefore, checking the system log first is recommended, to be able to understand the bigger problem sooner instead of spending time dealing with resulting problems that may be recorded in the application log.

There may also be an EventCreate command. Note that, at least with some versions, EventCreate may silently exit without logging the requested information if the command line is too long.
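
A sketch of using EventCreate (the source name MyTestSource and the ID value are arbitrary; administrative permissions may be required):

    EVENTCREATE /T WARNING /ID 100 /L APPLICATION /SO MyTestSource /D "Example warning message"

This should create a Warning entry visible in Event Viewer's Application log.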
[#wshlog9x]: Windows ME/9x

Data from Windows Script Host may be stored in Wsh.log which may be a standard text file stored in %windir%. There may be some other locations where information is logged (such as C:\boot.log?), but the Wsh.log is probably the most standardized location (which also has log entries that are similar to what is found in the operating system logs of other Microsoft Windows versions).

Programmers may make use of this by using Windows Script Host and a “WScript.Shell” object's LogEvent function. The first parameter to that function is an integer which represents success (0), error (1), warning (2), info (4), audit success (8), or audit failure (16) (after error, each type is represented by a value twice as much as the previous value in that list), and the second parameter is a string. MS KB 257541: “How to write to the Application log in Windows NT, Windows 2000, or Windows XP” (or Win9x) “using the Windows Script Host” mentions only NT-based operating systems in the KB's title, but the text stated, “If you are using this code on Windows 95 or Windows 98, it will write to Wsh.log in the user's Windows directory.” LogEvent Method details of WSH 5.8's WshShell object shows details of a method to write to the logs. (Information on LogEvent() failing in Win95/98/ME (documentation related to WSH 5.8) shows an error if wsh.log is locked.) Perhaps MS KB 301309: How to Log Events from Active Server Pages shows information for Win95/NT4/98/2K with IIS/PWS.
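A minimal sketch (the message text and object name are arbitrary):

    ' Write an informational entry (type 4) using Windows Script Host.
    ' On NT-based Windows this should go to the Application log;
    ' on Win9x/ME it should go to Wsh.log in %windir% (per MS KB 257541).
    Set oWScShel = CreateObject("WScript.Shell")
    oWScShel.LogEvent 4, "Example informational message from a script"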

There might be a minimum version of Windows Script Host for this?

Other log files
Network issues
Down/unresponsive systems

If a system's primary network connection goes down, it may be able to send an alert. For example, it may be able to use a dial-up modem to call a phone number. Perhaps this phone number provides some alternate Internet access so that an E-Mail or text message may be sent to a server operator's phone, or perhaps the system somehow sends a message to a pager which simply shows a numeric message.

However, many systems do not have a secondary network connection to simply fall back onto. There is another approach: a system may regularly send some traffic. This traffic may be ICMP traffic generated by a ping command, or a web request to a specific web page. If the server that receives this traffic notices that the traffic stops arriving, then an alert may be sent. In the case of looking for ICMP traffic, this may be meant to alert on a situation where there is a network outage, or a system that stops working, perhaps due to a substantial hardware failure. If the system that is still working generates an alert that gets responded to, then a human may be able to respond to the system with a problem even though that system isn't sending any alerts about the problem. This might not be a very secure approach, because an attacker could easily mimic the ICMP traffic.

Another approach may involve a system uploading some data, such as details from logs and/or details about how the system is being utilized (including what programs are running and/or what usernames are actively using system resources). Clearly some of this information could be considered sensitive, and such details should only be sent to a trusted system. This may involve encryption, which would also make the check-ins less susceptible to being faked during some sort of attack.
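
As a trivial sketch of the monitoring side of an expected heartbeat (the hostname and E-Mail address are placeholders, and a real deployment would likely need to guard against sending repeated alerts):

    #!/bin/sh
    # Sketch: run periodically (e.g. from cron) on the monitoring system.
    if ping -c 3 host.example.com > /dev/null 2>&1
    then
        :   # heartbeat seen; nothing to do
    else
        echo "host.example.com stopped responding to ping" \
            | mail -s "heartbeat lost" admin@example.com
    fi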

Attacks
Setting up an IDS/IPS. If a system seems to receive activities that are probably an attack, then respond to it. (This may be considered “Misuse” detection, and may be implemented similar to anti-virus software which relies on definitions.) (Another approach may be to compare some current activity to a baseline, and alert on anomalies.)
Setting baselines
file integrity checks
...
Network usage

Increased network usage could mean a huge attack, or something far more positive, such as an increased amount of good activity. Having a baseline can help determine whether things are awry, how much growth has occurred, and what the impact may have been from an event that caused some consequences (such as a short-term surge in awareness/interest leading to people checking the website more often, perhaps shortly after an announcement at a trade show or a press conference).

  • Amount of data transferred (e.g. raw bytes, number of E-Mail messages, HTTP requests).
  • Number of errors. e.g. outgoing E-Mails which end up being failures (within a certain time period, and/or as a percentage of outgoing E-Mail), DNS lookups of domains that don't exist, TCP connections which are started but don't successfully establish, or communications to certain IP addresses, domain names, TCP port numbers, or other IP protocol numbers. If these are out of line, perhaps an infected system is engaging in an attack. Delaying/storing outgoing E-Mails without sending them, and alerting technical staff (so they can end the delay, and cause the E-Mails to be sent, after verifying whether such E-Mails are approved), may be appropriate.
  • Usage of certain terms. For example, zero occurrences of confidential information would be normal: more may be a data leak that should be stopped and/or reported.

Perhaps a scoring system may be helpful: the passage of time, or perhaps something else (like a number of outgoing E-Mails that appear to be successfully delivered), may cause black marks to be erased (perhaps at a different rate than they accumulate). Gaining too many black marks too quickly could be a cause for an alert.

Reporting anomalies
This may involve setting some baselines, testing, and being able to report problems. If there is growth in traffic, there may be a desire to update what is “normal”, so thresholds adapt and growth over time doesn't exceed old thresholds. There may be some sort of criteria used to determine how much growth is considered acceptable/normal/believable/okay. Similarly, a loss of traffic may indicate a technical problem, possibly caused by an attack, or possibly a loss of business because customers are going elsewhere. A loss of business may not be a technical problem that technical staff will solve with technology, but it may still be worthwhile to report, so that others can realize that some changes should probably be made. (Unfortunately for the technical staff, that probably doesn't mean giving raises to all of the technical staff.)
Running automated tests
See if problems are detected.
Encoding Methods: Representing bits with characters

The following look like they may be helpful resources: UTF8 table, UTF-8.

[#ascii]: American Standard Code for Information Interchange (“ASCII”)

Although 8-bit character maps are often referred to as ASCII, they tend to vary in the characters that have the high bit set to 1. ASCII itself defines 128 characters within 7 bits (with the high bit cleared to zero if using an 8-bit representation). Each of these characters is either a letter, a number, or a character/symbol that has a name.

The first 32 characters (ASCII zero through 31 decimal, a.k.a. 00H through 1FH / 0x00 through 0x1F) also have a standard acronym of two or three letters, and are each considered to be a “control character”. Each has a “control sequence” (a way to enter it from the keyboard using the Ctrl key), and as a group these are called the “C0 controls”.

[#ctlkbseq]: Control sequences
  • As a standard that is commonly followed: holding Ctrl and then entering the input sequence for a character from ASCII 64 (decimal, a.k.a. 40H / 0x40) through ASCII 95 (decimal, a.k.a. 5FH / 0x5F) (and then releasing the Ctrl key, as it no longer needs to be held down) represents the corresponding “control character”, which has an ASCII value that is 64 (decimal, 40H / 0x40) less than the ASCII value typically generated by the same input sequence without the Ctrl key.
    • For example, the “[” character has an ASCII value of 91 (decimal, a.k.a. 5BH / 0x5B). Subtracting 64 (decimal, a.k.a. 40H / 0x40) from 91 (decimal, a.k.a. 5BH / 0x5B) results in 27 (decimal, a.k.a. 1BH / 0x1B) which is the character called “escape” (abbreviated “ESC”). Using Ctrl and pressing left square bracket will generate the 27th ASCII code and supporting programs will accept that as character input.
    • As another example: Holding Ctrl and then holding Shift and pressing number 6 (after which point the user may stop holding down the Shift key and (then) stop holding down the Ctrl key) will be understood as Ctrl-^ and this generates the character represented by ASCII code 30 (decimal, a.k.a. 1EH / 0x1E). (The caret symbol (“^”) that is created by using Shift-6 is ASCII code 94.)
  • Holding Ctrl and pressing a lowercase letter (and then releasing the Ctrl key, meaning that it is no longer being held down) is a sequence for the control character with an ASCII value that is 96 (decimal, a.k.a. 60H / 0x60) less than the input character. So Ctrl-a refers to ASCII code 1, which is 96 less than the ASCII code 97 which corresponds to the lowercase letter a.
    • Lowercase letters are an exception to the earlier rule that says the ASCII code is 64 less than the keystroke.
    • Using the lowercase letters a through z is generally more convenient than typing the upper-case letters. (There is no need to uselessly press the Shift key, so people simply use what is easier to type.)
    • Uppercase letters are often what gets shown in output (despite the fact that lowercase letters are typically shown during input). So pressing Ctrl-z (lowercase) may cause the screen to show ^Z (with an uppercase Z) on the screen.
  • There is also one other exception to these general guidelines, which is Ctrl-? (which gets described further after this “bullet point” list).
  • Often these control characters are represented as a caret (“^”) followed by an upper-case letter. So, the seventh ASCII character may be entered as Ctrl-G, but some software may show it as ^G or, less commonly, as ^-G. This has been commonly seen on so many computers that there is even a Wikipedia page on “Caret notation”.

    Another way to input specific ASCII codes, which often works, is to input characters by using Alt codes.

    Control sequence for DEL

    In addition to the first thirty-two characters being considered a “control character”, the 127th ASCII character (the “delete character”) has a standard abbreviation (“DEL”) and may be considered a “control character”, even if it doesn't have a Ctrl key sequence defined in quite the same way as most of the other control characters.

    However, some displays may map that using the Caret notation of ^?. For instance, at a Unix shell, pressing Ctrl-V and then sending keyboard input may result in the shell program outputting a representation of what was input. If the shell receives ASCII code 127, it may simply show “^?”. PuTTY documentation: configuring backspace identifies this as CTRL-?.

    The reason for using the ? character is (presumed to be) because the ? character is at ASCII code 63 (decimal, a.k.a. 3FH / 0x3F). If the standard rule followed for ASCII codes 64-95 were also applied, ^? would map to the ASCII code at position negative one (-1), one spot before position zero. Some logic (that may seem particularly strange to non-programmers) suggests that this could be considered equal to 127: this logic assumes that underflowing the seven bits used for the 128 characters would result in the last character of the set (and not set an underflow flag). The same logic doesn't get applied further simply due to a lack of need: characters 64-126 are generally printable, most of them being uppercase or lowercase letters of the Roman alphabet.

    Note, however, that pressing Ctrl-Shift-/ (in an effort to type Ctrl-?) is unlikely to produce the DEL character (ASCII code 127). Instead, there is a special shortcut for that: Ctrl-\.

    One might think that Ctrl-\ would send ASCII code 28 (since “\” is ASCII code 92, and 92 - 64 = 28), but in this case it doesn't. (ASCII code 28 is the “file separator” (“FS”) character. On some Unix terminals, Ctrl-\ does send that code, where it commonly triggers a SIGQUIT signal.)

    Using Ctrl-\ may effectively send the 127th ASCII code on Unix systems as well as Microsoft systems. If that doesn't work, here is another approach that has been known to send the 127th ASCII character: An ASCII chart shown on MSDN: ASCII Character Codes Chart 1 says, “ASCII code 127 has the code DEL. Under MS-DOS, the code has the same effect as ASCII 8 (BS). The DEL code can be generated by pressing the CTRL + BKSP key.”
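
    One way to verify which ASCII code a key sequence actually produces: at a Unix shell, the bytes can be generated (or captured) and dumped in hexadecimal. For example, the following dumps ESC and DEL (at many shells, typing Ctrl-V before a key sequence inserts the literal character instead of acting on it, which allows testing actual keystrokes the same way):

        printf '\033\177' | od -An -t x1
         1b 7f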

(Upcoming information/reference may provide one or more charts for: decimal number, hexadecimal number, symbol, symbol abbreviation, symbol name, UTF-8 representation, Wikipedia link.)

00 ... NUL null
8-bit character maps
e.g. code page 437
Unicode
...
UTF-8
...
Others

Perhaps the most famous character set that predates ASCII is EBCDIC (“Extended Binary Coded Decimal Interchange Code”). Wikipedia's page on EBCDIC: “Criticism and humor” section notes a reference to EBCDIC as if it was an encryption standard. (Encryption standards are designed to make information difficult to comprehend, which is quite the opposite goal of standards meant for information interchange.)

Jean-Maurice-Émile Baudot (perhaps more commonly referred to as Émile Baudot) created Baudot code. His Baudot code also became known as “International Telegraph Alphabet No 1” (with “No” likely being an abbreviation for the word “Number”). Unshockingly, “International Telegraph Alphabet No 1” was the predecessor of “International Telegraph Alphabet No 2” (“ITA2”).

[#kbaltnum]: Interpreting keyboard input

A lot of this may be covered by the user interface basics. There are some standards which aren't as well known by many typical computer users, but which are very commonly available.

One is to use control sequences. (Details are in the section about ASCII, and more specifically, the sub-section about control (keyboard) sequences.)

Another is to use an “alt code”. For many systems, holding the Alt key and pressing a (single-digit or multi-digit) number on the numpad may generate a character.

  • If the number is less than 256 and does not start with a zero, it may be treated as a character from a standard code page. For many versions of DOS and Windows, this is code page 437 in North America, or code page 850 for Western Europe. This also typically works well with Unix terminals and software implementing functionality similar to remote access communications terminals.
  • If the number starts with a zero, an ANSI code page is used. For most of North America and Western Europe, this is the Windows code page 1252.
  • If the number starts with a plus sign, Unicode is used.
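
The “plus sign” (hexadecimal Unicode) method may require some setup first. It is widely documented as requiring a registry value like the following, followed by logging off and back on; treat this as a sketch to verify rather than a guarantee:

    REG ADD "HKCU\Control Panel\Input Method" /v EnableHexNumpad /t REG_SZ /d 1

After that, holding Alt, pressing + on the numpad, and typing a hexadecimal code may produce the corresponding Unicode character.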

Another way to input some common symbols is to use a “compose key”. (See Wikipedia's page about “compose key”.)

A similar topic is the “Meta key”.

FileFormat.info: How to enter Unicode characters in Microsoft Windows mentions this as “Method 3: Code-page Specific”.

[#datcmprs]: Data compression

Dealing with compressed files may be handled on the page about handling files or bit compression. Developers might find information in the section about coding. Compressing entire volumes (which can be used to compress just about everything on an entire disk) is also an option: see disk/drive data compression.