tar (derived from tape archive) is both a file format (in the form of a type of archive bitstream) and the name of a program used to handle such files. The format was created in the early days of Unix and standardized by POSIX.1-1988 and later POSIX.1-2001.
Initially developed to write data to sequential I/O devices for tape backup purposes, tar is now commonly used to collect many files into one larger file for distribution or archiving, while preserving file system information such as user and group permissions, dates, and directory structures.
Rationale
Many historic tape drives read and write variable-length data blocks, leaving significant wasted space on the tape between blocks (for the tape to physically start and stop moving). Some tape drives (and raw disks) only support fixed-length data blocks. Also, when writing to any medium such as a filesystem or network, it takes less time to write one large block than many small blocks. Therefore, the tar command writes data in blocks of many 512 byte records. The user can specify a blocking factor, which is the number of records per block; the default is 20, producing 10 kilobyte blocks (which was large when UNIX was created, but now seems rather small).
Format details
A tar archive consists of a series of file objects. Each file object includes any file data, and is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes. The original tar implementation did not care about the padding and left the buffer unaltered, nowerdays most tar implementations fill the extra space with zeros. The end of an archive is marked by at least two consecutive zero-filled records. (The origin of tar's record size appears to be the 512-byte disk sectors used in the Version 7 Unix file system.) The final block of an archive is padded out to full length with zeros.