Handling Files And Data Within Files

Overview of files

A file is basically a collection of related information. The most valued part of that is generally a “stream” of data, even though most computer users probably do not fully understand, if at all, what precisely is meant by a “stream” of data. Basically, a stream is a sequence of bits, and the core content of a file is stored in the file's primary data stream. (Although some file systems allow a file to have more than one stream, this is not a very common practice. There is generally just one stream per file, although a zero-byte file might not need to have a stream.) (Note: a “stream” does not necessarily have to be stored in a file: a sequence of bits sent from one computerized device to another is often referred to as a network data “stream”.)

A file basically consists of a reference to the data stream, a “size” which indicates how long the stream is, a name, a location (although in theory that could be stored separately from the file), and some other information which end users typically interact with less directly: ownerships, permissions, and other miscellaneous file “attributes”/properties/characteristics/settings. (Although the computer may interact with some of this information, such as permissions settings which are checked when an end user opens the file, most end users do not typically interact directly with those settings by reviewing the settings or changing them.) The data related to a file other than the main data stream, such as the filename, is often called “metadata”. The term “metadata” is often used to refer to data that is related to, and possibly describing, other data that provides the value people are interested in.

[#copyfile]: Copying files

A separate section about input/output interfaces may have more information about copying files (with progress indicators). This section focuses more on simply copying files (typically with built-in software). (At the time of this writing, there may be some duplication between these sections.)

Copying files locally
Multi-method, multiplatform

Operating systems, as a standard, provide users with a way to copy files. (Extremely old operating systems, such as ProDOS for the Apple II, might not have a slick interface that is always available, but even this software came bundled with utilities that, if booted, allowed files to be copied.) On any modern system, it is safe to assume that there is a method to copy files, so finding a way to copy files is simply a part of learning the basic functionality of the platform being used.

The built-in tools may be fairly limited, such as not showing an estimated time for copying the remaining data. There had been a reference here to Ultracopier, but reading the section on Ultracopier is recommended before using that software.

Via command line
Unix

Note that filesystem entries (files and directories) which start with a period may often not be included in a default expansion. This may be worked around using techniques described in the section about Unix asterisks. (Users familiar with other operating systems should read the warning in the section about Unix asterisks.)

[#unixcp]: The most well known copy command: Using the cp command:
[#cpreqdst]: WARNING to people familiar with other platforms

Note that the final parameter must be a destination. This is an important difference from MS-DOS and similar copying systems. For example, with MS-DOS, copy A:\*.* implies a destination of “.” (a single period), which means the current directory. This is synonymous with “ copy A:\*.* . ”

With Unix's most well known copy command, cp, this is not how things operate. Using cp /somedir/* will generally cause the shell to expand the wildcard before the cp command sees the command line. The cp command then treats the last name on the command line (after the wildcard has already been expanded) as the destination. If the wildcard expands to exactly two files, the first file is copied over the second. This incorrect usage causes data loss: instead of the last file being one of the source files that gets copied, its contents are completely destroyed. If the wildcard expands to more than two names, cp typically copies everything into the last name when that name is a directory, and otherwise reports an error because the last name is not a directory. (Of course, the data loss doesn't happen if the copy is prevented by something like file permissions.) So those who are used to MS-DOS and similar operating systems, beware that a destination must always, always, always be specified.
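The hazard can be reproduced safely in a scratch directory. The following is only a minimal sketch, assuming a POSIX-style shell and a system that provides mktemp -d; the directory and file names are made up for illustration.

```shell
# Demonstration only: run this in a throwaway directory, never on real data.
scratch=$(mktemp -d)
mkdir "$scratch/somedir"
printf 'contents of a\n' > "$scratch/somedir/a.txt"
printf 'contents of b\n' > "$scratch/somedir/b.txt"

# The shell expands the wildcard first, so cp really receives:
#   cp .../somedir/a.txt .../somedir/b.txt
# which makes b.txt the destination rather than a source.
cp "$scratch/somedir"/*

# b.txt now holds a copy of a.txt; its original contents are gone.
result=$(cat "$scratch/somedir/b.txt")
echo "b.txt now contains: $result"

rm -R "$scratch"
```

Adding an explicit destination directory as the final parameter (for example, a single period for the current directory) avoids the problem.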

Recursive copying (in Unix)

If the only contents being copied are regular files and directories/folders, then the “recursive” option may be all that is needed to accomplish this task. This recursive option is generally specified with a capital R (as noted by the OpenBSD man page for cp). The fact that an option is used is determined by starting a command line parameter with a hyphen. So, for example, the following may work:

cp -R source_data destinationDir

Some versions of cp (such as what is shown by the man page for Debian's cp) may support -r. In the case of Debian, and likely many others, -R also remains an option. As a couple of quick side notes, the rm command also supports both -R and -r to do the same thing, but the mv command supports neither. Using the capital -R may be the nicer habit to get into for recursive functionality because the ls command also provides recursive ability by using -R, but the ls command uses the lowercase -r option for an entirely different purpose (which is to reverse the results of a sort).

As a point of clarification, specifying a directory in the list of what to copy will end up copying that directory. It doesn't just copy the contents of the directory. (This may differ from XCOPY /S used in DOS prompts.) If just the contents of the directory are desired, use something like “ dirName/* ”.
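The difference between naming a directory and naming its contents may be easier to see side by side. This is a minimal sketch using made-up names (dirName, destA, destB) in a scratch directory, assuming a POSIX-style shell with mktemp -d available. It also shows that the default expansion of the asterisk skips names starting with a period.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/dirName/sub" "$scratch/destA" "$scratch/destB"
printf 'x\n' > "$scratch/dirName/file.txt"
printf 'y\n' > "$scratch/dirName/sub/nested.txt"
printf 'z\n' > "$scratch/dirName/.hidden"

# Naming the directory copies the directory itself: destA/dirName/...
cp -R "$scratch/dirName" "$scratch/destA"

# Naming dirName/* copies only the contents: destB/file.txt, destB/sub/...
# The shell's default expansion of * leaves out .hidden.
cp -R "$scratch/dirName"/* "$scratch/destB"

listA=$(cd "$scratch/destA" && find . | sort)
listB=$(cd "$scratch/destB" && find . | sort)
printf 'destA:\n%s\n\ndestB:\n%s\n' "$listA" "$listB"

rm -R "$scratch"
```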

Note that if symlinks are part of what is specified in what is getting copied, specific options may be needed to copy those in the desired fashion. (This isn't an issue if the data does not contain symlinks.)

There are additional command line parameters for the cp command.
[#cpyprgrs]: Watch the progress of a copy command

If dealing with big files on a system which is not being heavily used by other tasks that involve the hard drive, and if the file is being copied from one mount point to another, one option may be to run df on the destination drive. Ideally do this before starting the copy command. Then multitask and check the free space as needed. It might also be interesting to use the time command. (e.g. time cp -p followed by the other needed parameters.)
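The df approach can be automated roughly as follows. This is a sketch under stated assumptions, not a polished tool: report_free is a hypothetical helper name, the scratch file merely stands in for a genuinely large file, and the parsing assumes the POSIX output layout of df -kP (where the fourth column of the second line is the available space) and a filesystem name without spaces.

```shell
# report_free DIR: print the available space (in KB) on the filesystem holding DIR.
report_free() {
    # -k requests 1024-byte units; -P requests the portable one-line layout.
    df -kP "$1" | awk 'NR == 2 { print $4 }'
}

scratch=$(mktemp -d)
mkdir "$scratch/dest"
# Stand-in for a big file (2 MB of zero bytes).
dd if=/dev/zero of="$scratch/big.bin" bs=1024 count=2048 2>/dev/null

before=$(report_free "$scratch/dest")

# Start the copy in the background, then poll the free space while it runs.
cp -p "$scratch/big.bin" "$scratch/dest/" &
cp_pid=$!
while kill -0 "$cp_pid" 2>/dev/null; do
    echo "free KB: $(report_free "$scratch/dest")"
    sleep 1
done
wait "$cp_pid"
status=$?

after=$(report_free "$scratch/dest")
echo "before=$before after=$after cp exit status=$status"

rm -R "$scratch"
```

With a file this small the loop may print only once or not at all; the numbers become meaningful with multi-gigabyte copies between separate mount points.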

Concatenating with a progress bar

This isn't exactly the same thing as copying a file, because this copies the file's main stream of data into a new file. This means that metadata, such as the file's owner information and details like the creation time, does not get copied. That may or may not be acceptable. (Some of that metadata may be fairly easy to restore manually.)

Use the pv command. This command, which stands for “pipe viewer”, may need to be obtained separately. (See the section on installing software and details specific to pv at early/common software: pipe viewer.) This software, however, can be used to provide a status bar for quite a few operations, as shown by Examples of useful tricks with pv.

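The simplest syntax is unverified, but is probably the following. (Note that >> appends to outputFile if that file already exists; a single > would create or overwrite the file instead.)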

pv inputFile >> outputFile
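A slightly fuller sketch of the same idea follows, assuming a POSIX-style shell. Because pv may not be installed, the sketch falls back to cat (which gives no progress bar) and then checks the result with cmp. The file names are placeholders in a scratch directory.

```shell
scratch=$(mktemp -d)
# Stand-in input file (1 MB of zero bytes).
dd if=/dev/zero of="$scratch/inputFile" bs=1024 count=1024 2>/dev/null

if command -v pv >/dev/null 2>&1; then
    # pv sends the data to standard output and draws its progress bar on standard error.
    pv "$scratch/inputFile" > "$scratch/outputFile"
else
    echo "pv not found; falling back to cat (no progress bar)" >&2
    cat "$scratch/inputFile" > "$scratch/outputFile"
fi

# Confirm that the two data streams are byte-for-byte identical.
if cmp -s "$scratch/inputFile" "$scratch/outputFile"; then
    verdict=identical
else
    verdict=different
fi
echo "copy is $verdict"

rm -R "$scratch"
```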

Some other similar software, that may be useful for reporting progress, is CStream.

[#doscpycm]: Copy commands for MS-DOS and similar environments
The default copy command

The copy command will generally allow a file to be copied. The simple syntax is:

copy inputFilespec optionalOutputFilespec

If there is just one filespec, the optionalOutputFilespec is assumed to be a reference to the “current directory”, so this becomes equivalent to adding a space and a period (“ .”) since the current directory is generally able to be referenced by a single period. (Be forewarned that treating the destination filespec as optional is NOT a habit that works with the required syntax of Unix's cp command.)

The source filespec may be a directory, in which case the copy command will copy all of the files in that directory. (However, the copy command does not copy subdirectories unless the copy command has recursive abilities which are supported and which are explicitly specified.) The filespec may also refer to files (though, typically, not directories) using wildcards. Users of more advanced operating systems may wish to note that MS-DOS wildcards are a bit simplistic. As noted by User Interface Basics: Wildcards, * may be equivalent to *. (which may mean all files that do not have a period), and that may be different from *.* (which means all files regardless of the extension, including files that have a blank extension, meaning no extension).

The copy command is typically included inside the command line interpreter. Using a third party command line interpreter, such as one made by JP Software, may provide additional options. To see the options available, use copy /?

In DOS, the %COPYCMD% environment variable may be set and then used by default. The environment variable may specify some command line switches which will be used by default. (This is one of the rare cases where, for seemingly no super compelling reason, the default behavior of JP Software products is incompatible with the command line interpreter built into the mainstream operating system that the JP Software product is designed to work in. JP Software's online help for COPYCMD discusses this topic and provides a workaround using something such as alias copy=`*copy %copycmd%`.)

Recursive copying (in DOS)

For recursive copying, use copy /? to see if a /S command line parameter is available to copy subdirectories. If not, see if there is a command called xcopy. The external xcopy command may have more command line options than the command line interpreter's internal copy command, and may also be designed to use more memory so that copying may go faster. For very advanced file copying, a different command may provide more features. In Windows Vista, xcopy /? includes the text “NOTE: Xcopy is now deprecated, please use Robocopy.” There are other options. JP Software's 4DOS and 4OS2 were once shareware but both have been released as freeware with source code publicly available. More information about those products may be found at TOOGAM's software archive: page related to JP Software.

For DOS and similar platforms, the copy command is an excellent demonstration of the power of JP Software's products. While a newer CMD.EXE (using Windows Vista's as an example) may have 8 configurable switches, and older versions of CMD.EXE/COMMAND.COM presumably may have fewer, JP Software's advanced copy command may have more than 33 command line switches. For those looking to cram maximum functionality into a small space (particularly if using DOS and boot floppies), including one JP Software product (especially if using executable compression) may provide more functionality in less space than using a standard command line interpreter plus additional software that JP Software's product can effectively replace, such as the operating system's XCOPY.* file(s).

[#wincpycm]: Microsoft Windows

The copy commands for MS-DOS and similar environments are generally available in Microsoft Windows environments.

[#robocopy]: robocopy : “Robust File Copy for Windows”

For newer Microsoft Windows operating systems, another command may be available: robocopy, the “Robust File Copy for Windows”. This copy command can be nicer as it typically shows progress as a large file is copied, although there may be some compatibility issues where different versions of Robocopy operate differently. The syntax of robocopy is also a bit different from the standard copy commands. The general syntax is:

robocopy sourceDir destinationDir optionalFileSpec moreOptionalFileSpecs optionalOptions

The default filespec is *.* although a filespec can also name a single file, so robocopy can be used to copy just one file. The following command can be used as an example of how to copy one file.

robocopy C:\. C:\Backup AUTOEXEC.BAT /V /BYTES /ETA /X

The above example will verbosely copy a single file.

When tested with the version built into Windows Vista, the destination folder did not have to pre-exist: as needed, the destination folder was simply created by robocopy. Expect output to scroll beyond the screen size if using a 25-row command prompt session and if neither /NJH nor /NJS is used.

Copying files in Cisco IOS

First, the standard Cisco IOS warning is worth mentioning before people delve into plans to try using this option.

Although there is a copy command, and although that command does use the basic syntax of “copy src dest”, this command might not work quite the way someone expects it to. A key difference, between this command and a similar command in another operating system, is that this command may use shortcuts. Basically, there are multiple different ways to refer to a single file. In some cases, just the first three letters may be treated as equivalent to a well-known filename. This is discussed by the Cisco IOS file handling guide. Before trying to just use standard Unix commands, in hopes that they operate similarly enough, reading that guide is recommended. Particularly when handling data that may be desirable, getting unexpected results, and the subsequent confusion that may result, is a fate worth avoiding.

(Similar to other Cisco IOS commands, typing “copy ?” ought to be an option. That is discussed more in the Cisco IOS basic usage guide.)

Copying files with a GUI
DOS

These options may primarily involve “text mode graphics”.

For MS-DOS users who prefer to use a graphical interface, a command called dosshell came with MS-DOS 5. (Although it does not come with MS-DOS 6.22 directly, it is offered as part of the MS-DOS Supplemental files.)

Additional options include several products with the word “Commander” as part of the product name.

Microsoft Windows
An option is to use Explorer (or, for very old versions of Windows, the predecessor of Explorer, File Manager).
FastCopy

FastCopy (English Page), excellently translated from the original FastCopy home page (in Japanese). This software is for Microsoft Windows only, but does cite compatibility with Windows 98 and NT and 2K and newer (including ME), so it is fairly compatible. Nicely, BSD-licensed. Can be started from a command line that specifies which files to copy (and may specify other options as well). Winner of a review from Gizmo's Freeware: review of file copiers.

The GUI seemed to ask for a directory/folder to copy (not a file). When the GUI is asking to select a directory/folder, it also offers a button to switch to file-selecting mode. On the screen to select a directory/folder, holding Control when pushing OK will cause the selected item to be added to a list of items to copy. Then, in the main window, each item is listed, separated by semi-colons, in the “Source” field.

Options to consider changing

By default, this software also did not show information about estimated time remaining, including not showing a percentage bar. It does, however, show how much data has been successfully written, so there is a form of progress that is available. There is an option in the general settings called “Estimate FinishTime” which is unchecked by default. Apparently this is unchecked because checking the filesizes may take a bit of time, and this program is designed to try to minimize time.

Another setting that may be desirable to change is under “Misc Settings”, and is the “Don't wait for other running FastCopy to finish” option. The refresh rate can also be lowered, causing more frequent updates.

Various options, including running a command or (as a convenient option) playing a sound (optionally playing the sound only if an error existed) can be selected using Option, Post-Process, “Add/Modify/Del...”.

When a file is copying, watch how the “TotalWrite” value increases. For instance, when copying over a network connection, perhaps the number only grows every few seconds and grows by 16MB at a time. (That number, 16MB, was seen in actual practice. Perhaps that was because the General Settings, “I/O Settings”, “Max I/O size (MB)” option was set to 16.) If that's the case, a buffer size of 48MB (triple the amount written at a time) ought to be more than sufficient. (Single might be cutting it close; double might be perfectly sufficient. Triple is likely to be more than sufficient.) So, lowering the buffer from 128MB to 48MB would save 80MB of the system's memory (and would not negatively impact the file copying performance).

The name may come from the multi-thread capabilities: If the destination is detected to be on a different drive than the source, then writing will occur in a separate thread than reading. This allows for more parallel execution than simply pausing the reading while data is written. Such an approach is not used when the software detects that the same hard drive is being used (because this approach is expected to slow things down, instead of speed things up, when the written data is to the same physical drive as the read data).

[#msrchcpy]: RichCopy

Those who would like to use a GUI may want to use RichCopy. However, that utility has attracted some rather unpleasant criticism (as noted by Wikipedia's criticisms of RichCopy): bugs that have been unfixed for years. So, other alternatives may be preferred.

RichCopy 4.0.217 Installer: The installer asks for a destination folder, and creates a HoffmanUtilitySpotlight folder containing installation files. RichCopy is documented by Joshua Hoffman's Utility Spotlight and made by Ken Tamaru of Microsoft.

Perhaps another location to get the software: RichCopy mentioned on a blog of TechNet

Other Robocopy GUI options
Wikipedia's article on Robocopy: section on using a graphical user interface front-end has mentioned SH-Soft's “Robocopy GUI” as an option. More recently, additional options have been mentioned as well.
[#ultrcpir]: UltraCopier and Supercopier

WARNING: Data corruption has been detected with this software. When copying a 7680410 KB file (and pausing once or twice), the resulting file had some bytes set to zero. This included byte 0x228C1C17 and several before and after that. The copy was admittedly complicated, going from Windows over SMB to an OpenBSD machine that used Sharity-Light to copy data to a remote system using a Linux-based platform. There were multiple files involved, but the difference between the successful copy and the unsuccessful copy is believed to be the use of UltraCopier on one of the files. Another file copy solution worked fine, so the unconfirmable blame was placed on UltraCopier 1.0.1.3 for Microsoft Windows. If using this software, make sure to do a binary comparison to make sure the files successfully transferred!

The page's title graphic spells this as UltraCopier, although the web pages vary in whether the C gets capitalized. For instance, News on version UltraCopier 1.0's release has multiple edits, and they show both variations.

UltraCopier is GPLv3. UltraCopier is multi-platform, apparently functioning on Win2K, OSX Snow Leopard, and Linux-based operating systems. SuperCopier appears to simply be an older version of UltraCopier.

UltraCopier download page may provide hyperlinks to “Ultracopier Ultimate Free” ... “+ cgminer”. A Forum post about using cgminer is written in some very broken English, but it seems that Bitcoin mining software has been embedded into the program, and that the purpose is to help the program's author (not the end user).

When running the program, it appears in the system tray (using an icon that looks like a floppy disk). After installing, if less intrusion is desired, perhaps go to Options, General, and uncheck “Replace the default copy and move system”?

Options include Copy, Transfer, and Move. (What is Transfer?)

Wikipedia on Ultracopier describes this as a more advanced GUI program, with support for command line scripting, that can be used on Win2K, OSX Snow Leopard, and Linux-based operating systems.

[#unstpcpy]: Unstoppable Copier
Gizmo's Freeware: review of file copiers cites this as being quite successful at dealing with damaged discs. (For other software dealing with damaged discs, see handling damaged media.)
Reviewing multiple options
Gizmo's Freeware: review of file copiers, referenced by Wikipedia's article on file copying software (section listing software).
Copying files remotely

One option may be to use file sharing. In some cases, a filesystem volume on a remote machine can be set up to look and act in many ways like a local filesystem volume. (In the DOS/Windows platform, this may involve making a local “mount point”/“drive letter”. Assigning a drive letter to a remote resource is often called “mapping” the drive letter.) Further details on setting up this sort of file sharing may be in the section about filesystems provided over networking.

For Microsoft Windows, some software may work with folder names that are specified by UNC paths that point to a shared resource. Software that has been known to provide this functionality, directly supporting any UNC paths that may be specified (on the command line), include the robocopy command and, at least in some versions, even the generic copy command.

For other environments (like Unix), or other details about file sharing methods, see the section on transferring files.

[#remvfile]: Removing files

Warning: Users familiar with other operating systems should read the warning in the section about Unix asterisks before trying to use such a wildcard. (That topic also discusses “hidden” filesystem entries.)

[#wipefile]: Wiping files

(Note: there are separate sections discussing the recoverability of wiped data and wiping media. Since some Unix-like operating systems may treat a file system/partition like a device, the section on wiping a device like media can apply to wiping a file system.)

Wiping files in Unix
Wiping files in OpenBSD
rm has a -P switch, which overwrites regular files before deleting them. See the OpenBSD “manual page” for rm.
Wiping files in Linux-based operating systems
OpenBSD101.com's security page has stated, “Most of the Linux distros ship with a nice file wiping utility called shred.” There is documentation available, including HTML page for the GNU shred command from GNU Coreutils and other options shown by GNU Coreutils documentation.
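As an illustration of shred, here is a minimal sketch that wipes a scratch file. It assumes the GNU Coreutils version of shred, whose -u option removes the file after overwriting it; the file name is made up. If shred is not installed, the sketch only performs an ordinary delete, which is not a wipe. The GNU documentation also cautions that overwriting in place may not be effective on some file systems and storage devices.

```shell
scratch=$(mktemp -d)
printf 'secret data\n' > "$scratch/secret.txt"

if command -v shred >/dev/null 2>&1; then
    # Overwrite the file's contents, then remove the file (-u).
    shred -u "$scratch/secret.txt"
    method=shred
else
    echo "shred not found; performing an ordinary delete (NOT a wipe)" >&2
    rm "$scratch/secret.txt"
    method=rm
fi

if [ -e "$scratch/secret.txt" ]; then
    still_there=yes
else
    still_there=no
fi
echo "method=$method, file still present: $still_there"

rm -R "$scratch"
```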
Wiping files in Windows
[#sdelete]: Using SDelete
SDelete from Sysinternals (available from the home page for Sysinternals SDelete) may be an option. See its home page for some further details.
[#cipherw]: Using Cipher /W

The cipher command built into Windows 2000 Security Rollup Package 1 and Windows XP (and Vista and presumably anything newer than Win XP) is available to wipe the unused (deallocated) space at a mount point. The syntax is:

Cipher /W:mountPoint

The “ cipher /? ” command may specify that a directory can or should be provided. However, that is because mount points (in Windows 2000 and XP and newer) can be attached at locations where they appear like folders. Careful reading of the description shows that this command wipes the unused space of an entire volume, rather than wiping individual files. So, although this has the advantage of being a tool that may come with the operating system (or a security rollup), it isn't nearly as flexible as the downloadable SDelete.

Deleting filesystem entries
DOS

Internal command del or erase. Traditionally these commands might not be able to remove directories, but there are internal commands available to be able to remove subdirectories that are empty. External commands, which may be able to remove subdirectories that still have content in them (which is an action that also removes all of the content of those subdirectories) may include MS-DOS's deltree, or an xdel command. Newer versions of Microsoft Windows may support “ rd /s /Q subdir ” as an option.

Jason Hood's page for Delen, Wipe, and XRD has xrd.

Unix
rm may be used to remove a file. (Subdirectories may be removed using rm -R (to recursively remove), or rmdir which removes subdirectories that are empty.)
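The difference between these commands can be seen in a scratch directory. This is a small sketch with made-up directory names, assuming a POSIX-style shell with mktemp -d available.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/empty" "$scratch/full/sub"
printf 'data\n' > "$scratch/full/sub/file.txt"
printf 'data\n' > "$scratch/single.txt"

# rm removes a single file.
rm "$scratch/single.txt"

# rmdir removes a directory only when that directory is empty.
rmdir "$scratch/empty"
if rmdir "$scratch/full" 2>/dev/null; then
    rmdir_full=removed
else
    rmdir_full=refused
fi

# rm -R removes the directory together with everything inside it.
rm -R "$scratch/full"
if [ -e "$scratch/full" ]; then
    full_after=present
else
    full_after=gone
fi
echo "rmdir on non-empty directory: $rmdir_full; after rm -R: $full_after"

rm -R "$scratch"
```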
Microsoft Windows

For most instances, the instructions for DOS will work well. Simply open up a command prompt (using CMD for newer versions of Microsoft Windows, or Command.com for older versions; some versions contain both), or use PowerShell. Then use the instructions described in the section about DOS.

[#mswpthln]: Microsoft Windows (filename) path length limit

Some versions of Microsoft Windows have a limit that prevents easy access to some actions when handling files. For instance, a file may be able to be created, but then deleting the same file may be challenging. This limit has also been known to affect some other actions, such as using some software designed to handle the task of backing up files.

MSDN: MAX_PATH says, “The shell and the file system have different requirements. It is possible to create a path with the Windows API that the shell user interface is not able to interpret properly.”

MSDN: MAX_PATH identifies the limit as 260 characters, including a terminating NUL character. (Some places document this as 259 characters, not including that NUL character.) The required drive letter (which may vary in its value), a colon, and initial backslash are part of those 259 characters, so there are 256 characters that are part of the path after the first backslash.
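The arithmetic can be written out as follows. This is only an illustrative sketch: the sample path is hypothetical, and the shell's length count is merely an approximation of how Windows counts characters (the two may differ for characters outside plain ASCII).

```shell
max_path=260                   # the documented limit, including the terminating NUL
usable=$((max_path - 1))       # 259 characters once the NUL is excluded
after_root=$((usable - 3))     # 256 left after the drive letter, colon, and backslash

# Hypothetical sample path; single quotes keep the backslashes literal.
path='C:\Users\example\Documents\report.txt'
len=${#path}

if [ "$len" -le "$usable" ]; then
    verdict="within the limit"
else
    verdict="too long"
fi
echo "usable=$usable after_root=$after_root length=$len ($verdict)"
```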

The best way of handling this is to not have such long paths. Experience indicates that such filenames are often unnecessary in the long term, and such files may often be automatically created for some purpose which is intended to be temporary. Such useless files are often given long filenames, but then the files may not be as easy to delete.

Before deleting the file, consider whether the data may be desirable. If so, moving/renaming the file may be more desirable, or copying the file before deleting it. Much of the rest of this text will seem focused on deleting such files.

SuperUser.com question about deleting files with too long of names provides multiple answers, including:

Unfortunately, comments about most of these various approaches have been mixed. Some results are quite positive, while others aren't. It appears that some solutions work fine for some people, but not other people. (Perhaps a key difference is which version of Microsoft Windows was used.) At the time of this writing, other suggestions weren't disputed but also didn't have any comments, including positive comments providing additional verification. Therefore, this guide doesn't have just one answer that is universally recommended. Experimentation with multiple potential solutions may be needed until one works.

The best way to handle this is to strive to simply avoid the issue.

Marking files for deletion

With some interfaces, files may often be moved to a containing object which might have a name such as “Recycle Bin” or “Trash Can”. Such files continue to use up disk space until they are actually removed (when a user decides to “empty” the “Recycle Bin” or other such object).

In Microsoft Windows, this behavior can be noticed by Windows File Explorer (which has been named “Windows Explorer” in some versions of Microsoft Windows) and also the third party software known as 7-Zip's file manager. In both of these cases, using the “Delete” key (one of the keyboard keys) will show a dialog box which leads to attempting to move the file to a “Recycle Bin”, while holding Shift and pressing the Delete key will instead show a dialog box that leads to the file being deleted.

[#mvrenfil]: Moving/renaming files

In some cases, a request to “move” a file may actually be performed by renaming the file, making these two functions nearly identical in nature. In other cases, there may be significant differences between an attempt to rename a file and an attempt to move a file.

Renaming files

The classic file renaming operation simply changes the filename metadata for a file. If this fails, then the old file typically remains unchanged. That failure result can be better than what happens with some implementations of a command to move a file, and so an attempt to rename a file may be safer. Also, renaming can be much faster than an implementation of moving a file that involves copying all of the bytes of the file's stream of data.

However, there is one big reason to use a command to move a file rather than to use a rename command to simply change the filename: sometimes the command to move a file will work while an attempt to simply perform a file rename operation may fail.

Renaming a file typically requires that the file's destination is on the same mounted volume (a.k.a. being on the same device and partition, a.k.a. being on the same “mount point”/“drive letter”). Even more restrictive, the classic ren command from DOS's COMMAND.COM, and even the similar command from the much newer cmd.exe from Microsoft Windows (tested with Vista), is not designed to change which folder a file is in. (A way around the must-be-in-the-same-folder limitation is to use the ren command from a third party shell from JP Software. However, even that implementation will not work if the file's destination is on a different hard drive than the file's original location.)

Renaming files in DOS and similar

Those who try to rename a file across folders in a DOS-like environment may have unexpected results. For example:

C:\MSDOS\>ren C:\COMMAND.COM MYFILE.COM

Unlike a copy command (or perhaps a move command), this will not assume the destination folder is the current folder. It will locate the original file and then rename it to MYFILE.COM in the original folder where it was located.

Users of JP Software products can work around this by specifying a path in the destination, such as .\destfile instead of using something like just destfile as the destination filename. Without specifying the folder, the JP Software product will act compatibly with the implementation expected from a standard command.com or similar/compatible command line interpreter, and so will have the destination file be placed in the location of the original file.

Moving files

Beware that some implementations of a move do NOT really perform any verification that the file was copied before deleting the source file. (This is true of the MOVE.BAT command described below. It has been said (by multiple technicians... It would be nice if there was a citation here to an authorized source...) that this is also true of some Unix mv commands.) A command designed to “move” a file may even output an error message showing that a file failed to be placed in the destination, and yet still be so carefree as to remove the source file, effectively deleting what may be the only copy of the file. Another possibility is that some move commands may allow a syntax that leads to all files being moved, sequentially (one at a time), to a single destination filename. By doing this, the second file that is moved will overwrite the first file. This is generally a mistake leading to data loss. In either case (whether a command failed or whether multiple files try to use the same destination filename), a file renaming operation may be safer for this one simple reason: a failed operation is more likely to leave the original files intact. Finding out what file, if any, was moved or renamed may take a bit of time, but actually having a file being accidentally deleted is less likely with a rename command than a move command.

A move command generally has one significant advantage over a file rename operation: it works in more cases. Furthermore, an intelligent move command may be able to complete in about the same amount of time as a rename operation, because the intelligent implementation of a move command may check whether a file rename operation can be completed and, if so, perform that faster operation.

Generally, one option is to copy a file and then, after verifying that the copy was successful, delete the source file. This can be done if one knows of a copy command and an erase command (and some technique to verify that the data transferred okay). However, that process (copying, and deleting only after possibly verifying that the copy was performed) is often supported by a single command which is called a move command.
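
A minimal sketch of that copy, verify, delete sequence for a Unix shell follows. The function name safe_move and the file names are made up for illustration; cp, cmp, and rm are standard utilities. This is only one way to do it, not a description of how any particular mv command works.

```shell
#!/bin/sh
# safe_move is a hypothetical helper name, not a standard command.
# It copies SRC to DST, confirms the two are byte-identical, and only
# then removes SRC. Any failure leaves SRC in place.
safe_move() {
    src=$1
    dst=$2
    cp -p -- "$src" "$dst" || return 1      # copy failed: keep the source
    cmp -s -- "$src" "$dst" || return 1     # contents differ: keep the source
    rm -- "$src"                            # verified, so delete the source
}

# Demonstration in a scratch directory.
workdir=$(mktemp -d)
printf 'important data\n' > "$workdir/original.txt"
safe_move "$workdir/original.txt" "$workdir/moved.txt" && echo "move verified"

# A copy into a directory that does not exist fails, so the source survives.
printf 'second file\n' > "$workdir/keep.txt"
safe_move "$workdir/keep.txt" "$workdir/no-such-dir/keep.txt" 2>/dev/null \
    || echo "move refused, source kept"
```

The important design choice is the order: the rm line is reached only when both earlier steps reported success.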

The command in Unix is mv.

In versions of MS-DOS prior to 6.0?, which was the first version to have a move.exe, the most popular method may have been to use a move.bat file. It generally consisted of three lines:

@Echo Off
copy %1 %2
del %1

Minor variations may exist (such as using @Copy instead of copy and @Del instead of del and skipping “@Echo Off” entirely, although that variation may not work with very old versions of the operating system).

Note that this batch file version does not check whether the destination file was successfully created. If the copy fails because the destination is too low on disk space, the delete operation still occurs. It is, therefore, recommended to use a safer implementation such as something made by JP Software, included in FreeDOS, or included with an operating system such as MS-DOS 6.22. (Attempting to check for the destination file by using batch file commands may not be as easy as one initially thinks. Although one could try something like if exist %2 this would not take care of situations where the destination is a directory or where the destination is implied rather than stated.)

Linking
Soft links/hard links/junctions
In Unix

Links can be created with the ln command. The syntax:

ln -s srcfile destfile

For example,

ln -s srcfile destone

The above syntax creates a symbolic link called destone that points to srcfile. Note that if more than two names are given, ln treats the last name as an existing directory and creates a link inside that directory for each of the earlier names. So, making two differently named links to the same file takes two separate ln commands.

Leaving off the -s parameter results in a “hard link” instead of a “symbolic link”. Symbolic links are like pointers to other objects on the file system (which are typically files or directories, although something like /dev/cdrom might, in some cases, be a symbolic link to another device object). The “hard links” are cases where multiple files, each with their own names and full file locations, end up pointing to the exact same stream of data.
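
The difference can be observed with a short experiment. The following is a sketch that assumes a scratch directory on a filesystem supporting both kinds of link; the names srcfile, hardlink, softlink, and the helper inode_of are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'shared contents\n' > srcfile

ln srcfile hardlink        # hard link: a second name for the same data
ln -s srcfile softlink     # symbolic link: a pointer to the name "srcfile"

# Hard links share one inode number; the symbolic link has its own.
inode_of() { ls -di -- "$1" | awk '{print $1}'; }
src_inode=$(inode_of srcfile)
hard_inode=$(inode_of hardlink)
soft_inode=$(inode_of softlink)
echo "srcfile=$src_inode hardlink=$hard_inode softlink=$soft_inode"

# Removing the original name leaves the hard link usable, while the
# symbolic link now points at a name that no longer exists.
rm srcfile
hard_contents=$(cat hardlink)
if [ -L softlink ] && [ ! -e softlink ]; then
    soft_state=dangling
else
    soft_state=ok
fi
echo "hardlink still reads: $hard_contents; softlink is $soft_state"
```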

Cross-linked files in MS-DOS (or OS/2 FAT drives)

It is possible to have multiple directory entries point to the same file, much like a hard link in Unix. However, any occurrences of this are considered to be an error in the FAT-based file system, and this may be somehow “corrected” by filesystem repair software including chkdsk and SCANDISK. For instance, such data may have one copy saved (like a regular file), and other “copies” of the file may be deleted.

Windows NT/2K/XP/newer

This may be called a “Junction”. Further details are not currently available right here.

[#fileattr]: File system attributes/permissions/ownerships

As noted in users: basic operations, the permissions used on filesystem objects may often be similar to permissions used for other things. This section focuses mainly on metadata used with filesystem implementations.

[#ownerval]: Ownership

Note: the term “own” is not necessarily meant to suggest any real legal ownership. For instance, a user account may be considered to be the “owner” of a file, although the term “own” should only be used with the intended meaning which is to be used as a technical term. This term simply refers to the name of a setting. The person who typically uses the user account might not, in any way, shape, or form, actually have any sort of legal entitlement to have any control over the data (including being able to access the data).

The same would also be true for any other sort of thing that may have an “owner” setting. Examples may include another filesystem object (like a directory), and/or a process (a running instance of a program), and/or certain operating system settings (such as a setting in the Registry of Microsoft Windows).

In Unix: Ownership and permissions
[#ovunxown]: Overview of Unix ownership/permission metadata

Ownership may refer to UIDs and GIDs. Note the disclaimer about ownership: ownership is not being meant as a legal term.

Unix filesystem permissions assign each filesystem object (like a file, or a directory, or a symlink, or a device in /dev/) an owner, which is a UID, and a group, which is a GID, and also other meta-data such as individual permissions.

When a user authenticates, to determine what permissions are available, the first thing that happens is to place the user into a category. (These “categories” might also be known as “classes”, although it has nothing to do with a user's classification, often abbreviated as “class”, as specified by things like the passwd file and/or login.conf file. So, to minimize any further confusion, this guide will keep using the term “category”.) The three types of categories are most commonly named “user”, “group”, and “others”.

If the user's UID matches the UID of the filesystem object, then the user is called the “user” who is the owner. Otherwise, if the user is a member of the group related to the GID for the filesystem object, then the user is considered to be a member of the object's “group”. Otherwise, the user is considered to be part of the category known as “other”. Sometimes this category called “other” is also known as “world”, so a “world-readable” or “world-writable” file refers to the permissions for this “other” category. (However, despite the term “world” being somewhat common usage, the chmod command's symbolic mode is based on the word “other”, so the term “other” most certainly should not be dismissed as irrelevant.)

Individual permissions are assigned to these different categories: “user”, “group”, and “other”. Those terms are preferred mainly because they match some traditional command line parameters that may be provided to (and accepted by) the chmod command. So, before investigating permissions for a filesystem object, it makes sense to first clarify the ownership settings, as they have a major impact on how the permissions are interpreted.

It is uncommon, but possible, to have the “group” category be given fewer permissions than the “other” category. It is also possible, but even less common, for the “user” category to lack a permission that is provided to another category. Therefore, in Unix, all permission handling needs to start by determining what category a user is in, so the ownership values must always get interpreted first.
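
The category lookup described above can be sketched in a few lines of shell. The function name perm_category is hypothetical, and the sketch ignores the superuser, who is generally not bound by these checks. It relies on “ls -ldn” printing the numeric UID and GID in the third and fourth fields.

```shell
#!/bin/sh
# perm_category is a hypothetical helper: it reports which single category
# ("user", "group", or "other") the current user falls into for a given path.
perm_category() {
    # "ls -ldn" prints numeric IDs: field 3 is the UID, field 4 is the GID.
    file_uid=$(ls -ldn -- "$1" | awk '{print $3}')
    file_gid=$(ls -ldn -- "$1" | awk '{print $4}')
    if [ "$(id -u)" = "$file_uid" ]; then
        echo user                # a UID match wins, and the search stops here
        return
    fi
    for gid in $(id -G); do      # every group the current user belongs to
        if [ "$gid" = "$file_gid" ]; then
            echo group
            return
        fi
    done
    echo other
}

workdir=$(mktemp -d)
touch "$workdir/mine.txt"
own_category=$(perm_category "$workdir/mine.txt")
echo "mine.txt: $own_category"
echo "/: $(perm_category /)"
```

Because the function returns at the first match, a user who owns a file is never treated as part of “group” or “other”, even when one of those categories has more permissions.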

To contrast, in NTFS, ownership may often be a second thought which tends to only be effective if looking at an ACE where it matters, like the “CREATOR OWNER” ACE. (Also, in NTFS, deny permissions may be relevant due to the higher priority given to “Deny” beyond just simply not choosing to “Allow”. In Unix, such an active “deny” (overriding a permission that other permissions settings provide) doesn't happen, simply because Unix will just be applying one set of permissions.) The Unix permission scheme may be somewhat less flexible than NTFS permissions, which allow more complex ACLs that easily allow multiple groups to have quite different permissions (such as giving one group only write access and another group only read access). (A way to provide more flexibility may be to have files be stored remotely, and then to use a more complex permissions scheme offered by a file sharing protocol.)

[#unxhownf]: Interacting with ownership (in filesystems) in Unix

A discussion of how these are impactful is in the overview of ownership settings.

Viewing ownership

Use:

These are typically shown in a “long”/“full” directory listing, such as what is shown when using “ls -l ” (or another long variation, such as “ls -la ”).

In this long format, the first thing that is shown (probably due to being a set width) are permissions settings, although those settings aren't entirely useful to know until considering the value of the UID and, then, if needed, the GID.

The second column of the output may be a number of hard links, which is unrelated to ownerships/permissions. So, feel free to ignore that number.

The third field shows the user who is the owner of the file. The owning user is actually a UID, which is a number. However, typically the program displaying this information (e.g. ls) will attempt to look up a username that is assigned to the UID. If one is found, then the username is shown instead. Note: if the hard drive is viewed from another operating system, possibly booted from a different computer, then the same UID will be used, although a different username may be used.

The next column is the group. Similar to how the information about the owning user is handled, the program displaying information (e.g. ls) will attempt to look up the group, and if a name is found then the group's name, instead of the numeric GID, will be shown.

If the username(/UID) matches the group name(/GID), that is simply coincidental. This may often happen in some operating systems, such as OpenBSD, which may create a group named after each new user. However, even if both of those names look the same when displayed, the operating system will treat the username and the group name as separate data.

Controlling ownership
Setting the owner (UID)

Setting the UID may be done with the chown command. OpenBSD's manual page for chown notes, “Only the superuser is permitted to change the owner of a file.” (Other users may change the owner of a file if they run software as a superuser.) Solaris documentation about “Changing File Ownership” says, “you can enable the owner to use the chown command by” changing the default behavior. So, the precise requirements may vary among different implementations.

chown owner file optionalMoreFilesToChange

The term owner (and later parts of the command line) should be customized. The specified owner may be a username (such as “root”), or a UID (which would be a number, such as 0). The actual data written to the hard drive will be a UID, but for convenience the program allows a username to be specified on the command line. (The username is then looked up, and the UID is then stored.)

A common variation is to also include information on the command line to change the GID. (That is discussed further in the section about changing the GID.)

There are some additional command line parameters that may be specified right after the word chown and before the next command line parameter (which specifies any new value that is going to be set). Those command line options may affect things such as recursion (so that the contents of directories, including all subdirectories, may be affected) and how symbolic links get handled.

Setting the group (GID) for a filesystem object

A user who is the direct owner of a file may change the GID field by using chgrp, a command which is often forgotten or unknown by people who have simply become familiar with the more versatile chown command.

Perhaps more specifically, OpenBSD Manual section 2: chown states, “The owner of a file may change the group to a group of which he or she is a member”. Solaris documentation about “Changing File Ownership” says, “the owner can only use the chgrp command to change the group of a file to a group in which the owner belongs” although this is only the behavior that is enforced “by default”, and it is a behavior which may be changed.

Using chgrp to change a GID

The syntax for chgrp may be identical to using chown except that the group name does not need to (and must not) be prefaced with a special character (like a colon).

Using chown to change a GID

The precise syntax of the chown command may vary a bit. To change both the UID and the GID of a single set of standard files, the following is a syntax that is used with many operating systems:

chown owner:group file optionalMoreFilesToChange

The chown command may be able to set a UID, set a GID, or set both a UID and a GID. This involves using a syntax that identifies groups by including a special character just before the name of the group. In modern implementations, that special character tends to be a colon. In some older implementations, that character may be (and perhaps must be) a “full stop”, which is generally implemented as a period. Debian bug report about chown's separator (Message #22) describes this change: a period is “ambiguous in the presence of usernames containing periods. Since a username on UNIX is never permitted to have a colon due to the format of the passwd file, colon is a better delimiter.”

To set just a GID, include the field-separating character before the group name, and simply leave off any reference to the owner. So, something like the following might work:

chown :wheel file optionalMoreFilesToChange

Similar to the act of just setting a new UID, the command may support command line parameters to affect things like recursion and handling of symbolic links. Also similar to the act of setting a UID, the data that gets stored on the hard drive will simply be a GID, even if a group name was given on the command line.
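
The two ways of setting a group can be tried without superuser access by assigning a file to a group the current user already belongs to. The following sketch assumes a chown that accepts the colon separator with the owner left off (as described above); the file name report.txt is made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/report.txt"

# Use the current user's primary group, given as a numeric GID, so the
# example does not depend on any particular group name existing.
my_gid=$(id -g)

chgrp "$my_gid" "$workdir/report.txt"      # chgrp: no separator before the group
chown ":$my_gid" "$workdir/report.txt"     # chown: separator, with the owner left off

# Field 4 of "ls -ln" is the numeric GID now stored for the file.
stored_gid=$(ls -ln "$workdir/report.txt" | awk '{print $4}')
echo "GID stored for report.txt: $stored_gid"
```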

Affecting mount points

The mount command may use specified permissions for the mount point, instead of just relying on the permissions of the directory. This information may not apply to some filesystems. An example of a filesystem where this may apply is the FAT filesystem, which may be called “fat” or some other name such as “vfat”, “dos”, or “msdos”.

For example, OpenBSD's manual page for mount_msdos mentions being able to use the -u option to specify a UID, the -g option to specify the GID, and the similar/related -m parameter to affect permissions (which is discussed more in the section about permissions).

Permissions in Unix

Permissions are heavily affected by the ownership values. This is discussed more in the section about the overview of Unix ownership settings. To elaborate slightly: Permissions, like the ability to read a file, are assigned to a file's “owner”. So determining which username is the “owner” of a file certainly can impact whether a user is allowed to read a file.

Viewing permissions
The basics

These are typically shown in a “long”/“full” directory listing, such as what is shown when using “ls -l ” (or another long variation, such as “ls -la ”).

The most common attributes, as commonly output by the “ls -l” command, are drwxrwxrwx or, as an even more likely alternative, hyphens replacing one or more of those characters.

The first attribute, the letter d in the above example, specifies that the file is a directory (also known in some graphical environments as being a “folder”). The most common alternative possibility of the first character, generally more common even than being a d, is for the first character to be a hyphen (“-”). Another possibility that may be seen, and which is generally not particularly alarming, is l which specifies a symbolic link. The most common location for filesystem objects with a different first character is in /dev/.

After the first character, the remaining attributes in the example just mentioned refer to permissions.

Note that some attributes may show some other characters. These can affect security and may be worth knowing about. Before discussing such less common settings in length, though, this guide will cover the most commonly seen values.

After the filesystem object's type (denoted by the first character), the next three characters are the permissions related to the “user” who is the “owner” of the filesystem object. The next three characters after that are permissions related to the “group”. The final three characters shown are permissions related to the “other” category, which is sometimes referred to as the “world”.

For each category (user, group, and other), the first of the three characters will typically either be an r or a hyphen (“-”). This indicates whether standard read-permissions are available. The second character will typically either be a w or a hyphen (“-”). This indicates whether the content is writable. The third character will typically either be an x or a hyphen (“-”). For a file, this typically indicates whether the file may be directly executed from the command line. For a directory, this typically indicates whether the directory may be searched, meaning that items inside it may be reached by name; listing the names in a directory is governed by the read permission.
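
As an illustration, the ten characters can be split apart in a shell. This is only a sketch: the file name notes.txt and the variable names are made up, and the mode 640 is chosen just to give each category a different value.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/notes.txt"
chmod 640 "$workdir/notes.txt"

# Keep only the first ten characters: some systems append an extra marker
# (for example for ACLs) after them.
mode=$(ls -ld "$workdir/notes.txt" | cut -c1-10)

type_char=$(printf '%s\n' "$mode" | cut -c1)       # "-" file, "d" directory, "l" symlink
user_perms=$(printf '%s\n' "$mode" | cut -c2-4)    # permissions for the owning user
group_perms=$(printf '%s\n' "$mode" | cut -c5-7)   # permissions for the group
other_perms=$(printf '%s\n' "$mode" | cut -c8-10)  # permissions for everyone else

echo "type=$type_char user=$user_perms group=$group_perms other=$other_perms"
```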

Requirement(s) for running a script

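Note: A script file may be run if a user has read permissions, and may be run directly by name if the user also has execute permissions. If the user has just execute permissions, the user may not be able to copy the script file nor view its contents. For a compiled program that is enough to run it, but on many systems an interpreted script also needs read permission, because the interpreter has to read the file. When direct execution is permitted, the command can be run from the command line using a simple syntax such as: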

somedir/script

If, however, the user just has read access to the file, then the user can copy the file, and view its contents. The user may then run the desired commands. Also, starting a command line with just a period, followed by a space, and then the script's location may cause the script to be executed (at least by Bourne-compatible shells). Only read permissions are needed to run a file in this manner; execute permissions are not needed. The following is an example of what such a command line may look like:

. somedir/script
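
The following sketch shows the difference for a file that is readable but has no execute permission. It assumes a Bourne-compatible shell; the file name script and the variable greeting are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
# A tiny script that only sets a variable; greeting is a made-up name.
printf 'greeting="hello from the script"\n' > "$workdir/script"
chmod 644 "$workdir/script"        # readable, but no execute bits at all

# Direct execution is refused because no execute permission is set.
if "$workdir/script" 2>/dev/null; then
    direct_run=allowed
else
    direct_run=refused
fi

# Sourcing only needs read access: the current shell reads the file and
# runs its commands itself, so the variable becomes visible here.
. "$workdir/script"
echo "direct run: $direct_run; sourced value: $greeting"
```

Note that a sourced script runs inside the current shell, so anything it changes (variables, the current directory) stays changed afterwards.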
Other variations

Other attributes, such as a sticky bit and/or setuid/setgid, may also affect one or more characters in the output. As a generalization, such values are not common, and might be a security vulnerability that may be pretty bad if exploited. Therefore, checking into such values may be very worthwhile. (However, sticky files had to do with memory utilization, and not security. OpenBSD's manual page for “sticky” indicates the sticky attribute may have no effect on files. Sticky directories, on the other hand, may have an effect. It may be common for /tmp/ to be a sticky directory.)

The setuid, setgid, and sticky bits are assigned to filesystem objects, but aren't really related to any of the user categories (of “user”, “group”, or “others”). However, the output of the “ls -l” command does cram this information into the first ten characters of the line of output. Here is how that works:

  • If an s shows up in the display normally used for the user's execute bit, the file has the setuid bit set, and the executable bit is also set.
  • If an S shows up in the display normally used for the user's execute bit, the file has the setuid bit set, and the executable bit is not set.
  • If an s shows up in the display normally used for the group's execute bit, the file has the setgid bit set, and the executable bit is also set.
  • If an S shows up in the display normally used for the group's execute bit, the file has the setgid bit set, and the executable bit is not set.
  • If a t shows up in the display normally used for the other's execute bit, the file has the sticky bit set, and the executable bit is also set.
  • If a T shows up in the display normally used for the other's execute bit, the file has the sticky bit set, and the executable bit is not set.

The above rules are just based on how the information is displayed. It is likely not meant to suggest that the “sticky” bit is somehow more related to the “others” category than another category like “group”. Rather than trying to interpret any specific meaning, it may be best to just accept that the way things were displayed is simply the method that gets used to display this information. In fact, with the chmod mask/absolute mode syntax, the setuid and setgid and sticky bits are assigned values that show up before any of the “user”, “group” or “others” permissions. As a generalization, there isn't a need to combine them, just because combining them is something that ls -l happens to do.
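
The display characters can be checked directly on scratch files. This sketch assumes the user is allowed to set these bits on objects the user owns; the names prog, shared, and show_mode are made up for illustration, and the scratch directory is removed at the end so that no setuid file is left behind.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/prog"
mkdir "$workdir/shared"

# show_mode is a hypothetical helper that prints the ten-character mode string.
show_mode() { ls -ld -- "$1" | cut -c1-10; }

chmod 4755 "$workdir/prog";   suid_with_x=$(show_mode "$workdir/prog")
chmod 4644 "$workdir/prog";   suid_without_x=$(show_mode "$workdir/prog")
chmod 1777 "$workdir/shared"; sticky_with_x=$(show_mode "$workdir/shared")
chmod 1776 "$workdir/shared"; sticky_without_x=$(show_mode "$workdir/shared")

echo "setuid, user execute set:      $suid_with_x"
echo "setuid, user execute not set:  $suid_without_x"
echo "sticky, other execute set:     $sticky_with_x"
echo "sticky, other execute not set: $sticky_without_x"

rm -rf "$workdir"
```

The lowercase letter appears when the underlying execute bit is also set, and the uppercase letter appears when it is not.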

Reviewing all of the possibilities may be covered by the manual pages for ls and/or commands like chmod, or other man pages such as “sticky” (e.g. OpenBSD's manual page for “sticky”). Details may vary among operating systems, which is why this generalized documentation doesn't delve into the details. Be prepared, though, that reading about the details may commonly take a little while (a few minutes).

(This text really hasn't delved into the potential dangers of these bits, such as if a command line shell had a setuid bit that allows a user to gain root access.)

Setting permissions

The recommended way to set permissions is to use chmod after having an understanding about ownerships, discussed in the overview about ownerships, and also an understanding of the permissions basics, which are discussed in the section about viewing permissions.

Symbolic mode

Symbolic mode may be easier to understand when being read. Studying an example will likely be easier to initially understand than studying the syntax, so here is an example:

chmod u+x,go-wx filespec optionalAdditionalFilespecs

The u+x in the above example provides the “user” (who owns the filesystem objects that get specified later) with execute permissions. After the description of some permissions changes that should be set, a comma indicates that there is another set of permissions changes that need to be set. After the comma, the go-wx indicates the following change: for the “group” and “other” categories, the “write” and “execute” permissions will be removed from any filespecs that get provided later.

In this section of the command line, where permissions are being specified, each letter before the arithmetic sign is treated as a single letter with an individual meaning. Each such letter is the first letter of either the word “user” or “group” or “other” or “all”. The word “all” will have the same effect as specifying “ugo” before the arithmetic sign.

English users must keep in mind that the letter “o” stands for “other”, not “owner”. (This is critical to keep straight, because often the permissions meant for the “user” would not be appropriate for the category known as “other” (or “world”).) (Perhaps the reason that the word “other” has continued to be supported instead of “world” is because the initial of the word “world” would match the initial of the word “write”, and having the same initial mean multiple things could make things even less clear.)

(However, to complicate this generalization, the Microsoft Windows command nfsadmin may provide some support for Unix-style permissions, but may use “o” for the user who is the owner, and may use “w” for the world of others. e.g.: TechNet: Specifying default UNIX permissions for new files.)

The plus sign adds a permission which may be missing; the minus sign removes a permission that might exist. Another option may be an equal sign, which removes all permissions (for the category/categories being affected: user, group, and/or others, or all), and then only adds back the permissions that are specified.

The first filespec parameter is required; more may be specified on the command line if desired.
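
The effect of the plus, minus, and equal signs can be watched on a scratch file. In this sketch the names tool and show_mode are made up for illustration.

```shell
#!/bin/sh
workdir=$(mktemp -d)
touch "$workdir/tool"

# show_mode is a hypothetical helper that prints the ten-character mode string.
show_mode() { ls -ld -- "$1" | cut -c1-10; }

chmod a=rw "$workdir/tool"           # "=" sets exactly rw- for all three categories
chmod u+x,go-wx "$workdir/tool"      # add x for user; remove w and x from group and other
after_plus_minus=$(show_mode "$workdir/tool")

chmod g=rx "$workdir/tool"           # "=" replaces the group permissions with exactly r-x
after_equals=$(show_mode "$workdir/tool")

echo "after u+x,go-wx: $after_plus_minus"
echo "after g=rx:      $after_equals"
```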

Mask/absolute mode

Many Unix users prefer this mode as it is shorter to type. Consider the following example:

chmod 0754 filespec optionalAdditionalFilespecs

The first character, a zero, specifies normal permissions, and is typically what is desired. Alternatives may involve things like the Set-user-ID bit, which is not typically desirable. (Details might also vary among operating systems?)

The next character, the digit 7 in this example, affects the permissions for the user who owns the filesystem object.

The next character, the digit 5 in this example, affects the permissions for the group assigned to the filesystem object.

The next character, the digit 4 in this example, affects the permissions for the “other” category, also known as the “world”.

The digits each have a specific meaning which can be created using the following method: The number 4 represents read access. The number 2 represents write access. The number 1 represents the “execute” permission. (The order of those numbers, as just mentioned, matches the order shown by ls -l output.) Those numbers may have boolean inclusive OR logic applied, which basically means that non-repeated numbers may be added together. So, the number 7 (which is 4+2+1) would represent providing read access (4) and also write access (2) and also the “execute” permission (1), while the number 5 (4+1) would represent only read and execute permissions. The number zero would represent not having any such permission.
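
The arithmetic can be sketched as a small shell function. The name triplet_to_digit is hypothetical; it simply adds 4, 2, and 1 for each letter that is present, and the result is then handed to chmod to confirm that the mode string matches.

```shell
#!/bin/sh
# triplet_to_digit is a hypothetical helper: it turns a three-character
# string such as "r-x" into the matching digit by adding 4, 2, and 1.
triplet_to_digit() {
    digit=0
    case $1 in r??) digit=$((digit + 4)) ;; esac   # read
    case $1 in ?w?) digit=$((digit + 2)) ;; esac   # write
    case $1 in ??x) digit=$((digit + 1)) ;; esac   # execute
    echo "$digit"
}

# Leading zero (no special bits), then user, group, and other.
mask="0$(triplet_to_digit rwx)$(triplet_to_digit r-x)$(triplet_to_digit r--)"
echo "rwx r-x r-- -> $mask"

workdir=$(mktemp -d)
touch "$workdir/tool"
chmod "$mask" "$workdir/tool"
resulting_mode=$(ls -ld "$workdir/tool" | cut -c1-10)
echo "$resulting_mode"
```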

Masks

On occasion, some software may have configuration that allows a mask to be specified. Examples may include a program to mount a FAT drive, or supporting a “remote” filesystem (like Samba).

Such a mask would match the formatting style used by chmod's mask/absolute mode. To understand that syntax, understand the overview of permission ownerships, and then how to interpret permissions values that are set, and then see details about how to set permissions using chmod's mask/absolute mode.

Setting permissions for a mount point

The mount command may use specified permissions for the mount point, instead of just relying on the permissions of the directory. This information may not apply to some filesystems. An example of a filesystem where this may apply is the FAT filesystem, which may be called “fat” or some other name such as “vfat”, “dos”, or “msdos”.

For example, OpenBSD's manual page for mount_msdos mentions being able to use the -u option to specify a UID, the -g option to specify the GID, and the similar/related -m parameter to affect permissions.

The parameter to specify after -m is a mask, which would match the formatting style used by chmod's mask/absolute mode.

Inheritance

The permissions of parent directories do matter. (This may be different than NTFS, where the permissions of parent directories might be inherited by the items inside them. In Unix, permissions are not inherited that way, but the parent directories are still checked each time an item is accessed.) OpenBSD's manual page for chmod provides an illustration: “In order to access a file, a user must have execute permission in each directory leading up to it in the filesystem hierarchy.  For example, to access the file /bin/ls,” (the needed) “permission is needed on /, /bin, and, of course, the ls binary itself.”

As for newly created objects on the filesystem, apparently the default may be affected by things like whether a directory has the sticky attribute set. OpenBSD's manual page for “sticky” may provide more insight. In most cases, where the sticky attribute is not set, Wikipedia's page about Filesystem permissions: section about traditional Unix permissions states, “When a new file is created on a Unix-like system, its permissions are determined from the umask of the process that created it.” (Therefore, running software as a superuser can often create files that are less accessible to a standard user that ran sudo unless, of course, the user again performs the process of running software as a superuser.)
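
The effect of the umask can be seen with a short experiment. This sketch assumes a scratch directory without any special default ACLs; the names newfile and newdir are made up. Programs such as touch typically request mode 666 for new files and mkdir requests 777 for new directories, and the bits named in the umask are then removed from the request.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Run in a subshell so the changed umask does not leak into the caller.
(
    umask 027                      # remove write for group, and everything for other
    touch "$workdir/newfile"       # requested as 666, so 640 remains
    mkdir "$workdir/newdir"        # requested as 777, so 750 remains
)

file_mode=$(ls -ld "$workdir/newfile" | cut -c1-10)
dir_mode=$(ls -ld "$workdir/newdir" | cut -c1-10)
echo "file: $file_mode  dir: $dir_mode"
```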

Similar

For some other information that may be related/similar, see: OpenBSD Manual Page for the chflags command, and/or documentation about umask.

Overriding ownerships/permissions

Running software as a superuser is often a method to temporarily work around restrictions that may exist with permissions.

Note that this workaround might not always work. It may be possible for a program to drop (escalated) privileges, and then the program may later be affected by permissions limitations, even if the program was initially started with higher permissions. If this happens, the limitations are by design, and the preferred solution is to make sure that permissions are set as desired.

An example of changing ownerships/permissions may be seen at Getting OpenBSD with source code: section on checking permissions for CVS.

In MS-DOS

FAT drives support at least these five attributes: Directory, Read-Only, System, Hidden, and Archive. For the most part, these attributes may be viewed and modified with a command called ATTRIB. The “Directory” attribute is often treated differently from the other attributes (often by not being listed as one of the attributes).

There is typically no sense of a file's “owner” when the file is on a FAT drive, and for the most part permissions are not handled through a standardized, centralized method. For operating systems like BSD operating systems that may interact with a FAT drive, and which need to act as if files have an owner, every file on the entire drive may be treated as if it had the same owner, which may be temporarily assigned at the time that the drive is mounted. Some versions of Linux supported using files to store this information, which is not typically stored on FAT drives: The name of this implementation was UMSDOS. (For some more information, see: Wikipedia information about UMSDOS.) UMSDOS is mostly not supported by modern distributions.

However, interacting with a file generally requires that the programs have an open “file handle”. At least some systems may have a SHARE command. This command denies a program's request for an open file handle when the specified file is already in use by some other software in a multi-tasking environment. That is the closest thing to standardized “permissions” in this single-user operating system. (If memory serves correctly...) This may have come with an operating system, and/or been included with an add-on such as Microsoft Windows 3.x. Possibly review the following files from Microsoft's old FTP site: WD0659.TXT: Questions About Share.exe, and WW1000.EXE: Updated VSHARE.386 for Windows.

OS/2: filesystem attributes
OS/2 might support more attributes using FAT, by implementing “extended attributes”. There is also the HPFS file system.
NTFS: filesystem attributes

Information was here, but has now been moved to: NTFS: filesystem attributes/ownerships/permissions.

[#icacls]: Information on Icacls has been moved. (See: information on Icacls.)

[#icacls]: Information on Xcacls.vbs has been moved. (See: information on Xcacls.vbs.)

[#icacls]: Information on Xcacls.exe has been moved. (See: information on Xcacls.exe.)

[#icacls]: Information on using smbcacls has been moved. (See: information on smbcacls.)

[#icacls]: Information on using SetACL has been moved. (See: information on SetACL.)

Backing up, restoring, and verifying data

See also: Backup section.

There are generally two main ways to back up data. One is to simply copy the data with a copy command. The other is to use specialized software which is (or will be) covered more in the section about disaster recovery. Simply copying the file is often faster, although it may be easier to forget about that data, and the copy uses up disk space. If the extra copy of the data is stored in a location which is backed up by a backup program, then the data may take up additional space in multiple locations.

Naturally, if the data is backed up with a copy command, the data can likely be restored with an appropriate different command line which overwrites the original file location with the backed up copy. If the data is backed up with specialized backup software, that software typically has a method of restoring the data. (See the disaster recovery section for details.)

Verifying data: Comparing bytes, generating hashes and comparing them (particularly used when the hash can be made locally on both sides but transmitting all the data would be less pleasant). See comparing files.
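
Both approaches can be sketched with standard utilities. The file names and the helper sum_of below are made up for illustration. The cksum utility produces a simple CRC, which catches accidental damage; where a stronger hash program (such as sha256sum) happens to be installed, it could be substituted in the same way, but its availability is an assumption.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'payload\n' > "$workdir/original"
cp "$workdir/original" "$workdir/backup"
printf 'payload changed\n' > "$workdir/damaged"

# Byte-for-byte comparison: cmp -s is silent and reports through its exit status.
if cmp -s "$workdir/original" "$workdir/backup"; then
    bytes_match=yes
else
    bytes_match=no
fi

# Checksum comparison: only the short checksum needs to travel between
# machines. Reading from standard input makes cksum print just the
# checksum and the byte count, without a file name.
sum_of() { cksum < "$1"; }

if [ "$(sum_of "$workdir/original")" = "$(sum_of "$workdir/damaged")" ]; then
    sums_match=yes
else
    sums_match=no
fi

echo "original vs backup, bytes: $bytes_match"
echo "original vs damaged, checksums: $sums_match"
```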

[#datamod]: Modifying data
Editing/copying a binary file
[#editxtfl]: Modifying/Editing a text file

Some files are text files. (These files also fit all of the criteria of being a “binary file”, unless using a definition of “binary file” that specifically refers to all files other than text files.) Some software is designed to work with just these types of files, so here is a specific section to working with these types of files.

This often-simple task has so many different options, some of which are strongly preferred by some people in some environments, that there is now a separate section about “editing a text file”. Following are some of the sub-sections related to this task.

[#usetxted]: Using a “standard text file” editor

(See the earlier information about editing a text file.)

Choosing: Some recommended text editors
Installing a decent text editor
After choosing a text editor, see the section on installing software.
Using the editors
(See the earlier information in the section about editing a text file.)
[#cnvfmtxt]: Converting between Unix format (ASCII), MS-DOS ASCII format, and Unicode
See: Converting between text file formats.
[#hexedit]: “Hex”/“binary” file editing

Some software is designed to work with a file, and easily allow modification. A popular format of displaying the file's data was to show some amount, perhaps sixteen, of bytes in a row, and to display the hexadecimal numbers that correspond to each of those bytes. Such editors became known as “hex editors”, and so editing the binary bytes of a file became known as “hex editing” a file.
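As a rough illustration of that display format, here is a minimal Python sketch that prints sixteen bytes per row as hexadecimal, with an offset column and a printable-text column. The function name hex_rows and the exact column layout are assumptions made for this example; real hex editors vary in how they lay out the rows.

```python
def hex_rows(data: bytes, width: int = 16):
    """Yield one text row per `width` bytes: offset, hex values, printable view."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so that a short final row still lines up.
        yield f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}"


for row in hex_rows(b"Hello, hex editing!\r\n"):
    print(row)
```

The carriage return and line feed at the end of the sample data show up as 0d 0a in the hexadecimal column and as dots in the text column, which is the sort of detail that a plain text viewer tends to hide.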

Some of the more compatible solutions
Text/console and possibly untested
Command line
bined
(Untested:) bined is a Perl script.
Full screen text mode
TDE: the Thomson-Davis Editor

“TDE: the Thomson-Davis Editor” is public domain with available source code, and is available for DOS and Win32 (console), and will apparently work well in Linux (using Ncurses). Its full-screen interface may be nicer than many of the other options that use a text mode. (Help may be reached by pressing Ctrl-\, followed by left arrow.) This software supports regex.

Those are the features and good news. The bad news is that, although it is marketed as being a binary file editor, any real support for hexadecimal is, or appears to be, virtually non-existent. There is a command line option to specify that a file is a binary file, but the search strings do not seem to support specifying hexadecimal characters. Furthermore, the command line can search for a string. If the string is found, the full screen editor will appear showing the string highlighted. However, this seems quite limited in how well it would perform an entire “search and replace” operation using just command line parameters, especially in non-interactive environments.

Various versions exist: The binary executables package for 32-bit Microsoft Windows also includes DOS executables. Run tder.exe for the Real-Mode DOS program, tdep.exe for the Protected-Mode DOS program (which requires a 32-bit platform), or tdew.exe for the Microsoft Windows executable. Running tdv should be roughly equivalent to using the -v parameter with one of the other executables.

Here is an example, which was designed to look for the string “BC” within a file:

tde -b -G B[\:h41x] filename.txt

Actually, it looks like the \:h might refer to the set of hexadecimal characters: 0-9 and A-F. So referencing the hexadecimal character does not appear to be a unique/separate command line option. The [\:h41x] was intended to look for a single character, which has a hexadecimal value of 41. (Note that hexadecimal 41 is a capital letter A in ASCII; a capital letter C is hexadecimal 43.)
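When working out which hexadecimal value corresponds to which character, a quick check can save some confusion. The following is a small Python sketch (not related to TDE itself) that prints the ASCII characters for a few hexadecimal values, and the hexadecimal bytes for the string “BC”.

```python
# Map a few hexadecimal byte values to their ASCII characters.
for value in (0x41, 0x42, 0x43):
    print(f"0x{value:02X} -> {chr(value)}")

# And the reverse: the bytes of the string "BC" written in hexadecimal.
print("BC".encode("ascii").hex())
```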

(The previous home page was at http://adoxa.110mb.com/tde/ but that does seem to have moved.)

HT Editor

The HT Editor exists for Unix-like environments as well as Microsoft Windows. At least in Microsoft Windows, this is a text mode editor that provides a “full screen” interface (meaning that it effectively uses up the full window).

Multiple names?

In some cases, the program's package and executable filename may be named ht. As noted by review of hex editors for Linux and FreeBSD, the program is called “ht” for FreeBSD. (However, the command line program names don't match the package names: the review notes an “ht” package and an “hte” executable for Debian, and an “hte” package and an “ht” executable for FreeBSD. Since that mismatches, something probably got swapped.)

Usage

In Microsoft Windows, the menus can be accessed by holding Alt and pressing the access key. This is similar to CUA expectations (described by user interface basics/details), although just pressing Alt first doesn't work. Under the “Windows” menu, the Tile submenu shows an arrow. Pressing right arrow won't work to expand the sub-menu (but pressing T, or highlighting it and pressing Enter, will). There is a “Find” command available with Ctrl-F, although that does not seem to be on the menus. (Replace, available with Ctrl+E, is on the “Local-Hex” menu.)

The package for OpenBSD 5.1 seems broken.

(Unsorted)
BEAV: Binary Editor and Viewer

BEAV 1.4 might be one of the most compatible options, and documentation states “BEAV source and executable can be freely distributed for non-commercial purposes”. Using Esc followed by ? may provide help. (To quit, try Ctrl-C, or Esc and then Ctrl-C, or Ctrl-X and then Ctrl-C.)

[#phexedit]: Hexedit
This general-sounding name, HexEdit, might be used by multiple programs. The program described by Pixel@Rigaux's page on HexEdit (also mirrored at Pixel@Rigaux's page on HexEdit, and Chez site's page on Pixel@Rigaux's HexEdit (mentioned by the OpenBSD 4.6 package description for hexedit), Pascal Rigaux's hexedit) is available for OpenBSD, which is one indicator that often suggests that a program is fairly portable. A webpage about a patch for hexedit describes an option for searching for “bits at non-byte alignments”.
Hexcurse
Using a text library called ncurses, this software is available for OpenBSD. OpenBSD/i386 4.6 packages section: page about Hexcurse referred to a URL at http://www.jewfish.net/description.php?title=HexCurse (which doesn't seem to work).
bvi

Software called bvi is meant to be like vi and exists for OpenBSD, so it is probably fairly portable software.

GHex

Users of a graphical interface may find GHex - a hex editor for GNOME to be available for some platforms. (The program's home page has a title of “Ghex”, but the web page itself tends to refer to the program using a capital letter H.) A GHex package exists for OpenBSD, so the software probably works on a fair number of Unix-like platforms that use GNOME.

Hex Editors designed to require graphical environments
wxHexEditor
wxHexEditor (which may still be in a beta state) is available as an RPM, and for Mac OSX and 32-bit Microsoft Windows. The program uses the GPLv2 (with the option to use a later version), and code from hashlib++ and Udis86 which are “used under BSD license”. Users of Microsoft Windows may want to get the program by going straight to the files on the SourceForge project page related to wxHexEditor.
Hex editing in 16-bit DOS and 32-bit Microsoft Windows

First, a couple of options that aren't recommended are presented, because these options do come with some operating systems.

MS-DOS Editor 2
The editor, which is an upgrade from MS-DOS Edit, was available with 32-bit versions of Microsoft Windows (but the technical requirements did not require anything more elaborate than 16-bit DOS). The software may provide some rudimentary ability to edit a byte in a file. This is particularly true if using a numeric parameter such as /1021. (Further details may be provided later.) An advantage to this software is that it comes bundled with some operating systems, so no special installation is required. However, this has not (at the time of this writing) been heavily tested by the staff writing this text. It is not known if inserting a byte (and thereby altering how long a single line is) works well every time, or even any time. So, use at one's own risk. TOOGAM's Software Archive: page on Bit Manipulation Tools (Binary File Editors) may discuss this slightly more.
debug.exe
TOOGAM's Software Archive: page on Bit Manipulation Tools (Binary File Editors) may discuss this slightly more.
Third party options
TOOGAM's Software Archive: page on Bit Manipulation Tools (Binary File Editors) lists some options, such as “Hex 5.1a” (the older version of “Hextool”).
Hex Editing in Microsoft Windows

Users of 32-bit Windows may be able to use software in the section described for users of DOS and 16 or 32 bit versions of Microsoft Windows. Some other examples, which may work better for 64-bit versions of Microsoft Windows, may include the following (which may not have been verified):

  • shed (simple hex editor)
  • Free Hex Editor (Frhed) (Wikipedia page of Frhed),
  • Binary EYE (BEYE),
  • Bless,
  • DiskProbe (Dskprobe.exe) (mentioned by TechNet guide: Troubleshooting Disks and File Systems and Microsoft KB Q913964). This comes with the Windows Support Tools (most probably for Windows XP, which the documentation was written for, but probably also for some newer operating systems). A warning, however: the program edits the disk's data directly, rather than editing a file. Therefore, data loss could occur on GPT disks (as noted by the referenced TechNet article). The result is quite condemning: Not just the file is affected, but “Making direct changes to GPT structures could cause the partition table checksums to become invalid, rendering the disk inaccessible.” Microsoft's recommendation (given just a bit earlier in the text) is simply, “Do not use DiskProbe on GPT disks in 64-bit computers.”
“Hex Editors”/“Character/String Replacers” for Unix
“Character/String Replacers” for Unix
tr
...
bbe : binary block editor
bbe : binary block editor may be similar to sed
Unix Hex Editors designed to require graphical environments
Ghex
...
Okteta
(may be a part of KDEUtils: a Microsoft Windows port may be at Windows.KDE.org)
KHexEdit (for KDE)
Other hex editing software/details
Wikipedia's comparison of hex editors may show some other software, and show some details about many pieces of software.
Modifying the contents of a disk
One way is to edit the disk sectors that contain the file's information. This may not be anywhere close to the most convenient way to modify a file, especially if fragmentation must be manually dealt with and most especially if the data is stored with encryption.
Patching
Patches may allow specific changes to be automated. The dd command might also be able to be used. Text editing software may or may not be a method that will work well: in some cases text editing software may end up corrupting non-text files. bsdiff/bspatch exists for OpenBSD, so it is probably fairly portable software, although the home page notes that “bsdiff is quite memory-hungry”.
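As one illustration of automating a specific change, here is a minimal Python sketch that overwrites a few bytes at a known offset without changing the length of the file. The function name patch_bytes and its parameters are hypothetical names chosen for this example; this is not how bsdiff/bspatch works internally, just a simple fixed-offset patch under the assumption that the offset and the original bytes are already known.

```python
def patch_bytes(path, offset, new_bytes, expected=None):
    """Overwrite len(new_bytes) bytes at `offset` without changing the file size."""
    with open(path, "r+b") as f:          # r+b: read/write, no truncation
        f.seek(offset)
        current = f.read(len(new_bytes))
        if len(current) != len(new_bytes):
            raise ValueError("patch would run past the end of the file")
        # Optionally refuse to patch a file that is not the expected version.
        if expected is not None and current != expected:
            raise ValueError("file does not contain the expected original bytes")
        f.seek(offset)
        f.write(new_bytes)
```

Checking the original bytes before writing is a cheap safeguard against applying a patch to the wrong file or to a file that has already been patched. Working on a backup copy first is still a good idea.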
Editing contents of memory, or data stored in long term storage (on a disk)
This is being referenced here since it involves editing data, and so it may be similar in nature to editing text files. However, this section is basically about editing files, so for this sort of related information check out the relevant sections in the handling data page. (A more specific hyperlink may be available in the future.)
[#viewfile]: Viewing files
Some various types of options

One option may be to use a file editor: either a simplistic text editor, or a more advanced program for editing textual content (a word processor), or a binary file editor. Another option may be a file viewer such as a web browser. Such options may be heavier duty than the simpler, smaller, and faster options like those which may be mentioned here, but do know that those may be an alternative.

A warning about terminal-affecting codes

This whole section is about (what is hopefully) a fairly rare scenario, but it is a warning so responsibility does dictate that it should be covered. (Temporarily skipping this section may be good if there is time pressure, and if the environment is fairly trusted, and if the material isn't being reviewed for the purpose of helping to pass a test/examination that may cover this sort of material.) This can be useful information, however, for anyone affected by the subject of this topic.

For those using a text mode environment (including terminal software for remotely accessing a system), be aware that the terminal being used can be affected by how some files are processed. This is because some files may end up having data which the default console interprets as code meant to change how the terminal behaves. This can be done maliciously, such as if an ANSI video driver allows certain types of ANSI escape sequences to remap keystrokes. However, perhaps more common is when a file contains VT terminal-modifying codes (probably more likely to be experienced in Unix-type environments) that modify the terminal.

This whole situation is fairly uncommon, but common enough that it may happen (particularly if viewing a binary file from a command prompt accessed remotely), and can be quite unpleasant when it does. The most commonly used vulnerable commands may be more, cat, and type (all of which are described more in following text about using files). A default telnet client may also do this, and the terminal emulation code included with communications programs may intentionally support certain features that can cause this unpleasantness when the supported codes lead to undesired effects. IRC users have been known to intentionally cause such issues, which has been known as a “flash” attack. (That may have been named after http://dnull.com/unix/flash.c (e.g., some Qodem documentation: section on “Deviations from QMODEM” mentions flash.c).)

There are reasons why some software will simply pass the code to the terminal and expect the terminal implementation to update the screen as needed. This approach simplifies code development by using what is available instead of redundantly implementing things separately, and this sort of simple approach allows for easy compatibility for various hardware and tasks like output redirection. Some software may intentionally not process terminal emulation codes for the simple theory that any processing done by the terminal/console is probably intentional and desirable. These terminal emulation codes were, after all, designed for non-malicious reasons and intentionally implemented by somebody.

The problems from processing terminal emulation codes can often be prevented in various ways, including using software that parses the content and directly displays the content instead of passing the content to the “terminal”/“console”. That will cause such codes to be visibly rendered instead of having the data passed to the direct console output which could cause such code to be processed. Typically whether that happens, or whether the codes end up getting processed, is largely determined simply by what software is being used. Another way to avoid problems is to use a different terminal emulation, such as changing settings that prevent such changes from being made. For example, the ANSI.SYS file from MS-DOS supports keyboard redefinition codes but other display drivers may not support those dangerous codes by default. Less dangerous codes, like supporting colors, can still be enabled and even those dangerous codes can be supported if non-default options are used.
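To illustrate the idea of rendering such codes visibly instead of passing them through, here is a minimal Python sketch. The function name make_visible is a hypothetical name for this example, and the caret notation (such as ^[ for the Escape character) is just one common convention; other software may display control characters differently.

```python
def make_visible(data: bytes) -> str:
    """Return text in which control bytes are shown, not sent to the terminal."""
    out = []
    for b in data:
        if b in (0x0A, 0x09):            # keep newline and tab usable
            out.append(chr(b))
        elif b < 0x20:                   # other control bytes, e.g. ESC -> ^[
            out.append("^" + chr(b + 0x40))
        elif b == 0x7F:                  # DEL
            out.append("^?")
        elif b < 0x7F:                   # printable ASCII
            out.append(chr(b))
        else:                            # high bytes shown as hex escapes
            out.append(f"\\x{b:02x}")
    return "".join(out)


# The escape sequences are printed as text rather than changing colors.
print(make_visible(b"plain \x1b[31mred\x1b[0m text\n"), end="")
```

Because the Escape byte never reaches the terminal, a sequence that would otherwise change colors, remap keys, or switch character sets is simply shown as ordinary characters.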

For MS-DOS and similar environments, there is no real safe series of commands to try to perform without possibly triggering an attack by pressing an affected key. If one suspects keyboard redefinitions occur, the only real way to resolve that is to stop using the affected console. If the console is simply a window in an environment like Microsoft Windows or OS/2 then that may be less painful than a pure DOS session where closing the console involves doing something like rebooting the computer.

For Unix environments, if the only issue is that a VT code caused another code page to become mapped to the terminal, a possible fix is to leave the text file viewer (try pressing q and then holding Ctrl and pressing c and then releasing the Ctrl key), and then run reset and see if that fixes things. If not, and if this is a remote connection, it could be that the client has been effectively remapped and re-connecting the session could help. (If a terminal multiplexer is used, detaching from that terminal multiplexing software could allow one to not need to exit any software that is being run from that session.) If that doesn't work, exiting the login session (by typing exit at the command prompt) may be a good and safe way to go.

older text: Note that some of these options may work better with some files and environments than other options. For example, embedded codes that affect the terminal (such as ANSI keyboard redirection in MS-DOS or VT terminal modifying codes in Unix) may affect the terminal even after the file is viewed, but only if those codes are processed. Some software may simply redirect the codes to the standard output so they get processed, while others may act differently so that the codes are visibly shown, unprocessed, onto the screen.

[#catypefl]: Simplistic command line tools for file viewing

Operating systems based on Unix and those based on DOS (including Microsoft Windows) support a command called more. This program reads from standard input, so a file may need to be redirected using syntax such as the following (which works in both Unix and DOS environments).

more < input.txt

In MS-DOS, a short text file may be viewed with:

TYPE input.txt

In Unix, a short text file may be viewed with:

cat input.txt
Outputting multiple files
Unix

In Unix, this is definitely handled by the cat program. Use:

cat first.txt another.txt additional.txt

The ability to specify multiple files is, in fact, how the cat command got its name. A technical term for placing one data stream (like a file) at the end of another data stream (like a file) is “concatenate”. The name of the cat command is meant as a reference to the second syllable of the word “concatenate”.
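For environments where a cat command is not available, the same idea can be sketched in a few lines of Python. The function name concatenate and its parameters are hypothetical names for this example, and the filenames in the usage note are the sample names used above.

```python
import shutil
import sys


def concatenate(paths, out=None):
    """Write the contents of each file in `paths` to `out`, in order."""
    out = out if out is not None else sys.stdout.buffer
    for path in paths:
        with open(path, "rb") as f:
            shutil.copyfileobj(f, out)   # copies in chunks, so large files are fine
```

Calling concatenate(["first.txt", "another.txt", "additional.txt"]) would write the three files to standard output one after another. Passing an open binary file as the second argument would place the combined data in that file instead.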

DOS

This might not be supported. Of course, this can be accomplished by using the type command twice.

JP Software's products will allow multiple filenames, similar to Unix's cat.

echo a > a
echo b > b
type a b

... yields the following results

C:\sample>type a b
a
b

C:\sample>

If memory serves well, older MS-DOS versions would not accept the syntax of multiple files.

In Windows 7 (64-bit), the following was run as a test.

echo a > a
echo b > b
type a b

... the results were perplexing:

C:\TEMP\SAMPLE>type a b

a


a

b


b

C:\TEMP\SAMPLE>

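These results look confusing at first. A likely explanation is that, when more than one filename is given, this version of the TYPE command prints each filename as a heading (surrounded by blank lines) before that file's contents, so the first “a” and “b” of each pair would be the filenames and the second would be the file contents. (This occurred both with command extensions on and off, using “CMD/E:OFF”.)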

Both MS-DOS and Unix provide a method to redirect the output to a pager.

A long text file in MS-DOS may be viewed with

TYPE input.txt | more

This is very similar to a command that will work in Unix:

cat input.txt | more

Another option very standard among Unix platforms is to be able to run the command specified by the environment variable called $PAGER which might, or might not, be the exact same thing as the more command. If the command used by $PAGER is indeed set up and compatible with receiving standard input for text to display, which is common, then the following may work just as well:

cat input.txt | $PAGER

Other options may include using a full screen program which is designed to edit a text file. Briefly, here are some of the options: In MS-DOS/Windows, a full screen editor may be available with “ EDIT filename.txt ” whereas with Unix a text file editor, which might be a full screen editor, may be available with “ $EDITOR filename.txt ”. (In Unix, “ $VISUAL filename.txt ” may use a full-screen editor when $EDITOR may be a line-based editor. See: Unix variables for editing text files.) These may be just some of the possible implementations of using a text editor.

Surprised that there's so much the same between Unix and DOS/Windows/“OS/2”? There are many other similarities between Unix command lines and Windows command prompts. Examples include:

Note: This is a tangent and this section is probably not the best place for this material. (Find somewhere else.)

date shows the date, for can perform essentially the same task with minor differences in the syntax, and mkdir, “ chdir .. ” (and probably “cd .. ”), rmdir, and exit are all internal commands that tend to work identically. Both use an environmental variable called PATH that may be viewed with the set command, and external commands which will perform the same task with the same command line arguments include more, “ netstat -na ”, “ netstat -nr ”, “telnet remotesys.example 23 ”, “ftp ftp.example.com ”, and “ nslookup remotesys.example ” which will perform a DNS lookup. The ping command is essentially the same (although parameters controlling how many pings occur are different), and Unix's traceroute is similar to TRACERT in Windows. Windows Vista, like Unix, has a shutdown command that takes a t parameter to specify when the shutdown should occur.

Text-mode tools to view files, with paging/scrolling, from the command line
[#pagrmore]: more

more exists in both DOS and Unix. It may be available in a greater number of machines that use DOS, and in Unix it may work with more terminal types than some other options.

Paging in Unix
Environment variable
A standard, which is adopted widely enough that it generally should be followed in order to meet common expectations (possibly by some software), is to point an environment variable called PAGER to a working pager. Therefore, one can run a pager by running $PAGER. If that doesn't work, it may be (probably is) a good idea to fix that situation, by making it work. (See setting environment variables as needed.) For instance, setting the variable to more would work, if the more command exists. (The more command is probably the most commonly available pager, although less may be almost as common and may be nicer to work with.)
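Software that wants to honor this convention typically checks the PAGER variable and falls back to a default when the variable is unset. Here is a minimal Python sketch of that logic. The function name pager_command is a hypothetical name for this example, and the choice of more as the fallback is an assumption based on it being the most commonly available pager.

```python
import os
import shlex


def pager_command(environ=None, default="more"):
    """Return the pager command as an argument list, falling back to `default`."""
    environ = os.environ if environ is None else environ
    value = environ.get("PAGER", "").strip()
    # PAGER may contain options (for example "less -R"), so split it like a shell would.
    return shlex.split(value or default)
```

The resulting list could then be handed to something like subprocess.run, with the text to display supplied on standard input.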
more
The more command is generally available in Unix. Although it may not be the nicest pager, it can be a handy one to remember. Not only is the more command available in Unix, and probably more Unix boxes than other alternatives, but also it is available in some other operating systems.
less

A program called less is often available.

Usability

The main thing to know about using less is how to quit. Pressing the q key is likely to work. Pressing the [Esc] key probably won't.

Setting the LESSSECURE environment variable to a value of 1 (which is a value that is often interpreted as “true” by many pieces of software) may disable some functionality, such as running commands from the pager. This can be desired so that a person might not have easy access to run commands as a different user just because the user is given access to run a pager as that different user.

Humor

The LESSSECURE environment variable's name: the variable is designed to help make things more secure whenever LESSSECURE is set to true.

People who prefer less over more may make a statement of:

less > more

(“less is greater-than more”.) This seems incorrect if being treated as a mathematical statement of inequality, which is perhaps why that statement is liked. Another statement sometimes made is, “less is more, more or less”. (Or perhaps the intended meaning, at least by some people, is: “less is more, more or less”. Whether the first reference to “more” in that statement is an adjective, or a noun, may be ambiguous and not always consistent.)

Re-usability: Compare the OpenBSD manual (“man”) page for less and the OpenBSD manual (“man”) page for more: the files may in fact be hard links to the same inode. Also, the page command may have a hard link to the same inode, and the OpenBSD manual (“man”) page for the page command may go to the man page used for less and more. Apparently these commands may be so similar that there isn't a need to have separate executables.

Other options
Other commands that might exist include most, which may be like more but with backwards and sideways scrolling, or pg.
DOS

The “more” command is probably available. Other pagers might often not be included with the operating system, although there may be a nice text editor in DOS (perhaps called EDIT) which might be full-screen.

There may be some downloadable options: a command called list might occasionally be found as a command that has been added on some DOS machines. The command will typically run either a built-in command for JP Software products, or Vernon D Buerg's shareware. Both varieties are rather similar to each other, and also similar to Unix's less command. (less itself may also be found on some systems.)

(Note: notice of Vernon D Buerg's death documents a reason why this shareware might not be as easy to register as it used to be.)

[#textline]: Working with lines of text

There are occasions when an error message might refer to a specific line number. Being able to identify that line number may be useful. Here are some tidbits dealing with line numbers.

To count lines, one option is to open up the file in a viewer or editor which shows line numbers. Unix has a couple of additional tools that can help with that:

wc -l inputfile
sed -n '$=' inputfile
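Where neither tool is available, counting lines can be sketched in Python. The function name count_lines is a hypothetical name for this example. Like wc -l, this sketch counts newline characters, so a final line that lacks a trailing newline is not included in the count.

```python
def count_lines(path):
    """Count newline characters in a file, which is what `wc -l` reports."""
    count = 0
    with open(path, "rb") as f:
        # Read in chunks so that very large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            count += chunk.count(b"\n")
    return count
```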

In Windows, Notepad will show line numbers if the status bar is shown. This is one way that Notepad can actually be more useful than Wordpad. (Microsoft Word, like Notepad, can show line numbers.) However, Notepad might not show the line numbers if Word Wrap is enabled, so disable Word Wrap if the line numbers aren't showing up.

In less, pressing Ctrl-G may show a line number.

Interacting with a “line” of text is the primary method of some text editors. See the sections for the ed editor, the ex and nex editors, and the edlin editor for some examples.

A note about dealing with some Microsoft software, perhaps most notably Microsoft Internet Explorer: the software may reference a line number which is quite unintuitive. In at least some cases, it seems such software would combine all files (HTML files, JavaScript files, perhaps even CSS files), treat that as one virtual file, allow changes to be made to the virtual file as Dynamic HTML technologies modify the DOM, and then show the line number of that virtual file. When a script error is being reported, the result doesn't even give a real clue as to which filename has the content that led to the error being generated. If there's a small number of files, and if one has easy access to modify the files, one may be able to figure things out a bit by inserting extra blank lines in non-impactful places, and then seeing how different the line number's value is when the page is re-loaded. However, though that may be a fast method on an unprepared machine, it is a bit cumbersome and probably not reasonable for large projects if one doesn't know what the file is. The intended way to deal with this is to use specialized “debugging” software which can figure out what line is being referenced and then display a file with that line highlighted.

[#cmparfil]: Comparing files

This section contains some information about some methods of comparing a couple of files. Keeping track of a larger number of files may be a task described better by terms such as “file integrity verification/checking”.

Eyeballing
If files may be viewed, one can do a comparison of the contents. This may not notably reveal some differences, such as if one file ends with 0x200D0A and is followed by the EOF character used by the operating system, compared to another file which might not have such characters.
Exact Bit by bit (or byte by byte) comparison
Unix
diff -s file1 file2
DOS
FC

Some newer versions of DOS have an FC command.

(Note: The MS-DOS File Compare (FC) command may have nothing to do with the fc command that is internal to some Unix command shells and which is related to command line history.)

FC /B file1 file2

For those who visually like the output of the FC /B command, Neill Corlett's Command-Line Pack has a bincomp command for OS X (and other operating systems, though some or all of those other operating systems may be able to run FC /B just as easily).
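A byte-by-byte comparison is also simple to sketch in Python, which may help on systems where neither of the above commands is handy. The function name first_difference is a hypothetical name for this example. It reports the offset of the first byte that differs, which catches differences (such as trailing spaces or line ending bytes) that are easy to miss when eyeballing.

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact position inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No mismatch in the shared part: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:                    # both files ended at the same point
                return None
            offset += len(a)
```

A result of None means the files have identical contents. Any number is the zero-based position where they first diverge, which can then be inspected with a hex viewer.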

Other/misc
Do older versions of DOS use COMP for text files only?
Using hashes

One option may be to use software generally designed to archive and/or compress and/or backup and/or transfer files. Such software often reports and/or stores information about the files, such as a hash. For example, if two copies of a file exist, one may zip the files and use some unzipping software to view a hash of the uncompressed data. (The compressed data may be different within the two compressed files, even if the files were created at different times by the same software. However, a hash of the uncompressed data should match.) This may not be the most efficient method for automating the task for lots of files, but it may be good enough in some cases where one just wants a fast answer using tools (such as command line programs like zip, unzip, etc.) that might pre-exist, since often such tools may be included in operating systems. When comparing hashes, one only checks whether the hash values are identical, so the system performing the comparison does not need access to the actual files. (This may be nice if the files are not directly readable, or if the files are on different systems connected by a slow network link.)

Other options include dedicated software meant just for file integrity checking.

[#filehash]: File Hashing

Note: In addition to the commands mentioned in this section, there is another section that describes File integrity checking. Such software tends to use hashes, and may provide solutions designed for checking lots of files (like all of the files on a volume).

Although hash algorithms may have weaknesses causing opportunities for collisions that could be intentionally caused by malicious attackers if they have such capability, such collisions are rather unlikely to happen by accidental chance. So, hash comparisons can be a rather quick way to get a rather high degree of confidence that files are likely identical, and comparing a hash value (or even just part of a hash value) can be a whole lot quicker than a full comparison of files. (For instance, if files are in different locations, then doing local calculations and transmitting a hash value may be far faster than transmitting entire files.)
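As a minimal sketch of calculating such a hash locally, the following Python function reads a file in chunks and returns the hash as a hexadecimal string. The function name file_hash is a hypothetical name for this example, and SHA-256 is simply an assumed default; the standard hashlib module is what does the real work.

```python
import hashlib


def file_hash(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Reading in chunks keeps memory use small even for very large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Running this on each system and then comparing the two short strings avoids having to transmit the files themselves.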

File Hashing in Unix

For Unix, see if there is software named after a hash algorithm. Here are some examples of commands that may exist:

“ cksum ”, “ sha1 ”, “ md5 ”, “ rmd160 ”, or “ sum ”.

File Hashing in Microsoft Windows
CertUtil

Some Microsoft Windows operating systems (including Windows 7) include a command called CertUtil which can perform file hashing. Note that some versions of the CertUtil program may not support this functionality. When such functionality is available, this may be the quickest way to get a file hash (going off of the assumption that installing extra software would take more time than simply using the CertUtil command).

Here is a sample:

certutil -hashfile filename SHA512
SHA512 hash of file filename:
01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89 ab cd ef 01 23 45 67 89
CertUtil: -hashfile command completed successfully.

Although a “-v” can be used before the -hashfile parameter, the results did not actually make the output be more verbose. (Presumably that “verbose” option is intended to affect the program's behavior when performing some of the other tasks that the certutil program can do.)
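The sample output above shows the hash with a space between each pair of hexadecimal digits, while many other tools print a hash as one unbroken string. When comparing hashes from different tools, it may help to normalize the text first. The following is a small Python sketch of that; the function names normalize_hash and same_hash are hypothetical names for this example.

```python
def normalize_hash(text: str) -> str:
    """Strip spaces, colons, and line breaks, then lowercase, so hashes compare equal."""
    return "".join(ch for ch in text if ch not in " :\t\r\n").lower()


def same_hash(a: str, b: str) -> bool:
    """Compare two hash strings while ignoring spacing and letter case."""
    return normalize_hash(a) == normalize_hash(b)
```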

Available hash algorithms

According to this screenshot of help built into CertUtil (posted as part of Steven_Lee0510's answer), this is the list of available hash algorithms:

MD2 MD4 MD5 SHA1 SHA256 SHA384 SHA512

(tedr's answer to user64996's SuperUser.com question for a checksum utility provides the same options.)

Steven_Lee0510's post claimed the -hashfile parameter is supported by “Windows 7 and Windows 8.1”. That said, the help text in Windows 7 did not show the available parameter list. TechNet's online help for Certutil's -hashfile behavior documents that a person can specify a “HashAlgorithm” but then doesn't specify what algorithms are available.

However, a quick series of tests has been performed in Windows 7, using of each of the HashAlgorithms that were listed in Windows 8.1's text. This did show that Windows 7 could use all of those same algorithms (and result in a hash, instead of an error message).

Charles Belov [SFMTA]'s comment notes that the name of “the algorithm must be typed in all caps.” Although many people write the abbreviated algorithm names (e.g., “SHA” for “Secure Hash Algorithm”) in lowercase, presumably especially Unix users whose commands tend to have lowercase names, the abbreviations are more properly written in uppercase and, per that comment, that is the only form the CertUtil command accepts when specifying the hash algorithm.

wayferer's comment to tedr's answer to user64996's SuperUser.com question for a checksum utility notes the software is not available in Windows PE. Steven_Lee0510's answer (hyperlinked earlier) says, “It seems that the certutil on Windows server 2003 doesn't support the parameter of hash algorithm.”
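
Where CertUtil is missing or lacks the -hashfile parameter (as reported above for Windows PE and Windows Server 2003), a scripting language may be another route. The following is a minimal sketch in Python, assuming a Python 3 interpreter is installed. The function name file_hash and the chunk size are choices made for this illustration and are not part of any Windows tool.

```python
import hashlib


def file_hash(path, algorithm="sha512", chunk_size=65536):
    """Return the hexadecimal digest of the file at `path`.

    `file_hash` is a made-up helper name for this sketch. The file is read
    in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_hash("filename") would return the SHA-512 digest as one unbroken hexadecimal string, rather than the space-separated bytes shown in the CertUtil sample. Python's hashlib accepts lowercase algorithm names such as "sha256".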

FCIV

For Microsoft Windows 2000, XP, and Server 2003, Microsoft has released a dedicated program called FCIV (File Checksum Integrity Verifier; see MS KB Q841290).

This program is not bundled with Windows 7. However, creator's answer to user64996's SuperUser.com question for a checksum utility indicates it may be built into Windows 8. As it is downloadable, this could also be used for some older operating systems.

Downloadable options

These options use the system's graphical interface.

HashCheck Shell Extension (which has a “BSD-style license”, as described by the software's main page) was highly rated, as shown by Andrew Moore's answer to user64996's SuperUser.com question for a checksum utility. SummerProperties was also described as being open source.

SFC may also be useful: supposedly this has different options in XP than in Vista.

Patching Files
This section may need further expansion to be useful. Unix commands that may be helpful (and that may also be available for other environments) include bdiff, diff, and patch. (The bdiff command might be less commonly included, although it may work better with some files.)
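
As a rough illustration of the kind of output that diff produces (and that patch consumes), here is a small sketch using Python's standard difflib module rather than the Unix commands themselves. The labels old.txt and new.txt and the sample lines are invented for this example.

```python
import difflib

# Two versions of a small text file, as lists of lines (invented sample data).
old_lines = ["alpha\n", "beta\n", "gamma\n"]
new_lines = ["alpha\n", "BETA\n", "gamma\n"]

# unified_diff yields header lines, a hunk marker, and then the
# context/removed/added lines, in the style of "diff -u".
diff_text = "".join(
    difflib.unified_diff(old_lines, new_lines, fromfile="old.txt", tofile="new.txt")
)
print(diff_text)
```

Note that difflib only generates the difference. Applying it to a file would still need a tool such as patch.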
[#apndtofl]: Appending to files
Appending a line of text

To append a single line of text (using the operating system's defined version of a “line” of text, so the line ends with a CR LF in MS-DOS or LF in Unix), use:

echo text >> output.txt

(This works in both Unix and DOS.)
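
For comparison, the same append can be done from a script. This is a minimal sketch in Python, not something the shells above require. The helper name append_line is made up for this example.

```python
def append_line(path, text):
    """Append `text` plus one newline to the file at `path`, creating it if needed."""
    # Mode "a" opens the file for appending. In text mode Python translates
    # "\n" to the platform's own line ending (CR LF on Windows, LF on Unix).
    with open(path, "a") as handle:
        handle.write(text + "\n")
```

Calling append_line("output.txt", "text") twice would leave two lines at the end of the file, much like running the echo command twice.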

Appending the contents of a file
[#unxapndf]: Appending a file in Unix
cat input.txt >> output.txt
[#apndosfl]: Appending a file to another file in DOS and compatible/similar
For pure text files
type input.txt >> output.txt
For binary files

The following might work. (It does work with JP Software products and in Windows Vista; it may not work in some other environments.) Because it may not work precisely in all environments, it is recommended to back up any involved input files. Unless completely safe tests have been thoroughly performed, it is also recommended that the output filename does not exist: do not try to use the output file as an input file (and especially not as the first input file in the list).

To copy three files all onto one destination file, the following syntax might work.

copy /B inputone.txt+inputtwo.txt+third.txt output.txt

Note that there is no space next to any plus signs.

(There is nothing particularly special about using three input files: two input files or four input files are likely to work just as well by simply including plus signs between the files.)

The /B command line parameter specifies binary mode. If that parameter is not supported by the copy command then try leaving it off.

Whenever trying this out in an operating system where this hasn't been previously tested, it is recommended to check the results. If the destination file is the same size as one of the input files, and the other input files were not all zero bytes long, then the command likely did not have desirable results. If the output file is larger, but is not exactly the size of all input files added together, there may also be a problem. Perhaps some bytes were lost because the command treated an input file as ASCII text.
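
The size check described above can also be automated. The following is a minimal sketch in Python that joins files in binary mode and then compares the output size against the sum of the input sizes. The function name concatenate is made up for this example, and it is a separate approach from the copy command rather than a description of how copy works.

```python
import os
import shutil


def concatenate(input_paths, output_path):
    """Join the input files, in order, into a new output file in binary mode."""
    # Mode "xb" refuses to overwrite an existing file, matching the advice
    # to use an output filename that does not exist yet.
    with open(output_path, "xb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

    # The output should be exactly as large as all inputs added together.
    expected = sum(os.path.getsize(p) for p in input_paths)
    actual = os.path.getsize(output_path)
    if actual != expected:
        raise IOError("expected %d bytes, got %d" % (expected, actual))
    return actual
```

For example, concatenate(["inputone.txt", "inputtwo.txt", "third.txt"], "output.txt") would raise an error if output.txt already exists or if the final size does not match.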

[#appndcon]: Appending standard input

There are some similarities between the platforms here. Unix may create a file using “ cat > output.txt ”, with the input ended by holding Ctrl and pressing d (perhaps at the start of a new line). Similarly, MS-DOS (and similar command prompts, such as the one in Windows XP) can use “ COPY CON output.txt ” to start creating a file, and the input may be ended by using Ctrl-Z at the start of a new line. In both of these cases, the character used to stop editing the text file is the EOF character.

Appending standard input in Unix

To copy text from standard input, Unix may use:

cat >> output.txt

Then, in Unix, end the standard input by typing an EOF character at the beginning of a line: For Unix the EOF character is ASCII code 4 (the character representing “end of transmission”/“EOT” control code) and this character can generally be input by holding the Ctrl key and pressing the lowercase letter d (and then releasing the Ctrl key). (Unlike DOS's approach which involves using the Enter key after using the relevant EOF character, entering this EOF character at the start of the line will immediately stop asking for input.)

Note that entering Ctrl-c may end up not having a file be created.
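
The same idea can be sketched in a script: read a stream until it reports end of input, appending everything to a file. This is a minimal Python sketch. The helper name append_stream is made up for this example, and passing sys.stdin.buffer as the stream is an assumption about how it would be used.

```python
def append_stream(stream, path):
    """Append everything readable from a binary `stream` to the file at `path`.

    Returns the number of bytes appended.
    """
    total = 0
    # Mode "ab" appends in binary mode and creates the file if it is missing.
    with open(path, "ab") as out:
        for line in stream:  # iteration stops when the stream reports end of input
            out.write(line)
            total += len(line)
    return total
```

Called as append_stream(sys.stdin.buffer, "output.txt") (after an import sys), this would behave roughly like the cat command shown above.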

Appending standard input in DOS

There may be a simple and straightforward process, but since that may vary based on some factors such as the type of file (whether it is ASCII text or should be treated as binary data) and the specific operating system being used, here is a general process.

First, back up any files that are going to be involved in this process.

Then, create a new file. (Do not specify an existing file.) To do this, run:

copy CON output.txt

The filename CON refers to a device. As a device name in DOS, it can be referenced no matter which directory is the current directory.

After typing the desired new contents to be put at the end of the file, enter the EOF character: The EOF character for this environment is ASCII code 26 (1AH/0x1A, the character representing the “substitute”/“SUB” control code) and this character can generally be input by holding the Ctrl key and pressing the lowercase letter z (and then releasing the Ctrl key). Then press Enter. The actual EOF character, and any characters that were typed after that character (including the Enter), do not end up getting created in the file.

Then, take the file that was just created and use DOS to append the file onto another file. (That is the part of this process which may vary a bit based on factors such as the operating system implementation.)

Other redirection/piping

Topics to be covered include:

  • Appending standard text to file
  • Piping standard text
  • Similar, but for stderr
  • Reading input from a file
Multiple (alternate) file streams

NTFS supports a feature called AFS (alternate file streams; Microsoft's documentation more commonly uses the name “alternate data streams”, or ADS). AFS is a feature that provides opportunities for those who are trying to hide data, such as malware authors. For most other situations, multiple data streams can be stored on a drive by simply using multiple files which are each individually easily recognizable.

JP Software products (or perhaps more specifically some versions of “Take Command” which are 32-bit in nature) may support a /: switch on the dir command.

Creating files filled with NULL bytes
Unix

There is generally a file named /dev/zero which is designed to output zero. Copying multiple bytes from this device will result in multiple bytes filled with bits that are cleared to a value of zero.

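Note that /dev/null is not a substitute for this purpose: reading from /dev/null returns an end of file immediately instead of any bytes, so it can only produce an empty file. (The more common purpose of /dev/null is to be an output file.)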

To make a 3-byte file filled with bits cleared to zero:

dd if=/dev/zero of=output_file bs=3 count=1
Other platforms

Other systems may or may not have something similar to Unix's /dev/zero or /dev/null files. For example, MS-DOS does have the NUL device (which basically acts as if it exists in the current directory, no matter what the current directory is). However, the bigger obstacle may be that dd will often not exist.

Another option may be to locate a command that performs functionality that is exactly the same as, or similar to, some/all of the functionality performed by the dd command. That may often be an option: a dd command may be downloadable for multiple platforms. However, this approach may require that such a null-type of device exists.

Using a “hex”/“binary file” editor to create a small file may be another option. Once a small file is made, it may be appended to itself to double in size. (Copy the file, then append the first copy to the second copy to make a third larger copy.) That may be done repeatedly. Be prepared to keep or create a small file as needed, to be appended to the large file, if an exact filesize is needed and if that size won't be precisely reached by doubling the size of an existing file.

One option may be to just use some skill at creating programming code. Writing a program to output null bytes to a file may be fairly easy. Writing a program to output null bytes to standard output, which may be redirectable, might be even easier.
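
To show how small such a program can be, here is a minimal sketch in Python. The function name write_zero_file and the chunk size are choices made for this illustration.

```python
def write_zero_file(path, size, chunk_size=65536):
    """Create (or overwrite) `path` so that it holds exactly `size` null bytes."""
    chunk = bytes(chunk_size)  # bytes(n) is n bytes, each with a value of zero
    remaining = size
    with open(path, "wb") as out:
        while remaining > 0:
            piece = chunk[:remaining]  # never more than chunk_size bytes
            out.write(piece)
            remaining -= len(piece)
```

Calling write_zero_file("output_file", 3) would produce the same 3-byte file as the dd example shown for Unix, without needing a /dev/zero device.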

Because the program is so simple to make, there are probably many implementations that have been made, many of which are probably downloadable. As one example, Neill Corlett's Command-Line Pack contains a zerofill command which may be an option for some platforms.

[#zbytfile]: Creating a zero-byte file
Overview

If “echo A >> output ” in MS-DOS results in a four-byte file (one byte from the letter “A”, a space after the word, and a two-byte “newline” sequence), one would expect that removing the letter “A” from the command line might reduce the filesize. So, what happens when running:

echo>> output

The output of the latest example command is a file that is more than triple the size of the earlier example's output. (The exact same output happens if an optional series of one or more spaces exist before the redirection signs.) The result is the following text file:

ECHO is on.

(That is the more likely scenario when run from a command prompt. It is also possible for ECHO to be off, resulting in an even larger output file.)
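
The byte counts behind that comparison can be checked with a few lines of Python. This sketch assumes the DOS line ending of CR LF and the exact “ECHO is on.” wording shown above.

```python
# Byte counts assume the DOS line ending of CR LF ("\r\n").
echo_a = b"A \r\n"                # letter, trailing space, CR, LF
echo_status = b"ECHO is on.\r\n"  # what a bare "echo" writes

print(len(echo_a))                         # 4
print(len(echo_status))                    # 13
print(len(echo_status) > 3 * len(echo_a))  # True
```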

Hmm, creating this tiniest of all files might be a bit more challenging than expected. What can be done?

Using a hex editor

Using a “hex”/“binary file” editor is generally one option that can do this successfully. The only real problem with this approach is that this approach may be unavailable on many systems, until software is downloaded.

Outputting a NULL device
Use a simple command line tool for viewing files (one that sends the file's contents to standard output) on the NULL device, then redirect that output to the new file. (This may or may not work well: possibly depending on what software is being used, a key may be to avoid adding any white space when redirecting the output.)
Copying a NULL device
Unix
cp /dev/null output

Do not use /dev/zero as the source file here. That device never reports an end of file, so the copy keeps writing null bytes to the output until the command is aborted (with Ctrl-C) or the disk fills up. Using /dev/null works better: it reports an end of file immediately, so the copy finishes at once and leaves a zero-byte file.

DOS
copy NUL output
Redirection

In Unix:

echo -n >> output

is likely to have the desired effect. With DOS, the echo command does not support that “ -n” parameter.

With the command lines from JP Software products, one may use:

REM >> output

Other command line interpreters may have different results: There may be no output. (It does seem this is because the REM command somehow actually overrides the redirection, and so the command ends up being successfully commented out. Trying to pipe the output won't result in the piped-to command being run.) So, no simple solution is provided that works universally for DOS platforms (other than installing JP Software's products).

The following is based on information found by Oleg's answer to SuperUser question: “Windows 7 batch files: How to write string to text file without carriage return AND trailing space?”, and probably works on some other Microsoft Windows platforms as well:

echo | SET /P ="optionalText" >> output

That example, as shown, creates a text file that says “optionalText”, but taking out the optional text (and simply having two quotation marks next to each other) can create a 0-byte file. Note that the exact syntax shown does work with CMD. For JP Software's mostly-compatible product, leave off the quotation marks and just ignore the error message.

Using the touch command

The following will work with many Unix installations:

touch output
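
Where neither touch nor a suitable redirection trick is available, a script is another option. This is a minimal sketch in Python. The helper name make_empty_file is made up for this example.

```python
from pathlib import Path


def make_empty_file(path):
    """Create `path` as a zero-byte file if it does not already exist.

    Returns the file's size in bytes afterwards.
    """
    # Like the touch command, Path.touch leaves existing contents alone and
    # only updates the modification time when the file is already there.
    Path(path).touch(exist_ok=True)
    return Path(path).stat().st_size
```

A return value of 0 confirms a zero-byte file. A larger value means the file already existed with contents, which were left untouched.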
Speculation: Using DOS's DEBUG command

There is probably a way to do this with DEBUG, which is a built-in tool for DOS. This could be automated by placing the right contents in a text file, and piping the text file's contents to become the input for the DEBUG command. However, this is probably NOT going to be the easy way.

[#filcmprs]: File compression
Compressing a file

TOOGAM's Bit Compression FAQ discusses some various options.

There are various programs to do this. Many programs are mentioned at TOOGAM's Software Archive: File archivers/compressors.

The most widely supported format on the planet would be Zip files. They can take up very little memory to uncompress, and software for unzipping files is available for many platforms. (The Info-Zip group has identified its program as being one of the most ported programs ever. This claim is discussed further on TOOGAM's Bit Compression FAQ.) Modern versions of Microsoft Windows come with Microsoft Compressed Folders.

For the smallest output, there are contenders that make files smaller than zip files. For instance, see MaximumCompression.com's summary for multiple file compression and/or MaximumCompression.com's summary for single file compression. However, the compressors that result in smaller filesize may have disadvantages: new versions may be incompatible with old versions (as seen with older PAQ code, and ZPAQ versions prior to 1.0), and they may take up large amounts of memory to decompress (as well as to compress). They are often not a whole lot better than some of the best Zippers.

The effectiveness of Zippers does vary. 7-Zip's support for the Zip format can often be among the very best options. In Unix, the command line version of the software may be called p7zip. To make a small zip file from the command line, use:

7za a -tzip -mx=9 -mfb=258 -mpass=15 filename.zip filespec

For more information about using that software, see 7-Zip Help.

Other software might be able to create a smaller Zip file than 7-Zip. Details may be available from TOOGAM's Bit Compression FAQ.
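
Zip files can also be created from a script. The following is a minimal sketch using Python's standard zipfile module (Python 3.7 or later, for the compresslevel argument) at its highest Deflate level. It is not 7-Zip: the module has no counterpart to the -mfb or -mpass settings shown above, so the result should not be expected to match 7-Zip's size. The function name make_zip is made up for this example.

```python
import os
import zipfile


def make_zip(zip_path, file_paths):
    """Store the given files in a new Zip archive using maximum Deflate effort.

    Returns the size of the finished archive in bytes.
    """
    with zipfile.ZipFile(
        zip_path, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as archive:
        for path in file_paths:
            # arcname drops directory parts so only the bare filename is stored.
            archive.write(path, arcname=os.path.basename(path))
    return os.path.getsize(zip_path)
```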

Decompressing a file

Software to help do this is mentioned by TOOGAM's Software Archive: File archivers/compressors and/or MaximumCompression.com's list of compressors.

A hint for Microsoft Windows users: Most files made with popular formats can be extracted with 7-Zip.
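
For Zip files specifically, extraction can also be scripted. This is a minimal sketch using Python's standard zipfile module, which handles only the Zip format (unlike 7-Zip, which handles many). The function name extract_zip is made up for this example.

```python
import zipfile


def extract_zip(zip_path, destination):
    """Extract every member of a Zip archive into the `destination` directory.

    Returns the list of member names that were in the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # testzip returns the name of the first corrupt member, or None
        # when every stored checksum matches.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile("corrupt member: %s" % bad_member)
        archive.extractall(destination)
        return archive.namelist()
```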

Software development and compression theory
See code for data compression
Entire partitions

See: disk/drive data compression.