Master 512 Technical Guide - Chapter 9: DOS Plus Disc Structure

9: DOS Plus Disc Structure

Introduction

This chapter assumes familiarity with the general operation of the disc filing system and its user facilities in DOS Plus. It therefore details how the data is physically stored and managed on the media, and the relevance of the fields required by DOS to maintain the system.

The standard DOS formats supported by the 512, 360k and 720k, are also common to genuine PCs and clones, while the 640k ADFS based format and the 800k 512 format are peculiar to the 512. Although the number of physical tracks and sectors may vary from one format to another, the view of the disc as seen from a DOS application is constant because of the functions provided for disc management in the operating system.

Provided that disc access is carried out through standard interrupt facilities, most of the implications of differing disc formats can be ignored in user applications. The only obvious exception to this view is the matter of total storage capacity.

The information following explains the principles employed by DOS in managing discs for those who may need to gain direct access to the storage medium at the physical level. The information is provided at a level which should apply to all DOS disc formats. However, a note of caution is called for.

The specifications for the different DOS disc formats were loose enough to allow considerable variation in the detailed interpretation and implementation applied by different PC manufacturers. The result, incompatible discs of supposedly the 'same' format, is well known to most 512 users. A good knowledge of DOS interrupts, together with the precise details of each type of disc format, is needed if discs are to be updated at the physical rather than logical level. A detailed list of various DOS disc format specifications is given in Appendix D.

Directories

The root directory is the highest logical level of organisation on a disc or volume, but regardless of the directory level examined, the type and format of the data stored is similar, with a limited number of minor exceptions in the case of the root directory. These exceptions are identified where appropriate, otherwise all information relates to any directory at any level.

Every complete single entry in a directory is 32 bytes long and usually represents one of only two items (or a single occurrence of a third in the root directory, the volume label). Apart from this, all entries represent either a single file or another directory. The total number of possible directory entries varies with the disc format. For example, it is 192 for the root directory of an 800k 512 disc, 112 in the root directory for both Acorn 640k and IBM 360k. Note that directories can occupy more than one sector on the disc.

Whatever the disc size and format, each directory entry must sufficiently describe the object so that DOS knows all that is required about its characteristics and type. The layout of a single directory entry is shown below, with the position of the data shown as an offset from the start of the entry. Where a number is given in brackets next to an item, further information is given in the notes which follow.

Byte offset
	Dec	Hex	Len	Data type or use
	0	0	8	Filename (1)
	8	8	3	File extension
	11	B	1	File attribute (2)
	12	C	10	Reserved
	22	16	2	Time created or last updated (3)
	24	18	2	Date created or last updated (4)
	26	1A	2	Starting cluster (5)
	28	1C	4	Filesize (6)

Filename and extension (1)

The first byte of a filename field may contain one of several values, with meanings as follows:

00h means this entry has never been used. It also indicates the last used position in the directory, since all entries are initialised to this value during formatting. DOS always re-uses all previously deleted file entries before using a new one.

E5h means the file has been erased.

05h means the first character of the filename is actually E5h. The substitution prevents such a name from being mistaken for an erased entry.

2Eh is the ASCII code for a full stop and indicates a directory name when in the first character of the filename. If the second character is a space, the entry is an alias for the current directory. This is seen as a literal full stop (.) in the directory display.

If the second character is also a full stop then the entry is an alias for the parent directory, in which case the cluster field contains the cluster number of that parent directory, or zero when the parent directory is the root. This is seen as two literal full stops in a directory display.

Any other value indicates a valid filename. Both the filename and extension fields are padded with space (20h) characters if less than the maximum number of characters (ie eight.three) is used.
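As an illustration, the first-byte rules above might be applied in Python as sketched below. The function names `classify_first_byte` and `display_name` are hypothetical, and the sketch assumes the usual DOS convention that E5h in the first byte marks an erased entry, with 05h standing in for a genuine leading E5h character.

```python
def classify_first_byte(entry: bytes) -> str:
    """Classify a 32-byte directory entry from the first bytes of its name."""
    first = entry[0]
    if first == 0x00:
        return "never used"          # also marks the end of the used entries
    if first == 0xE5:
        return "erased"              # assumption: E5h marks an erased file
    if first == 0x2E:                # full stop: an alias for a directory
        return "parent directory" if entry[1] == 0x2E else "current directory"
    return "valid name"


def display_name(entry: bytes) -> str:
    """Join the space-padded name and extension fields as NAME.EXT."""
    raw = bytearray(entry[0:11])
    if raw[0] == 0x05:               # assumption: 05h stands in for E5h
        raw[0] = 0xE5
    name = bytes(raw[0:8]).decode("latin-1").rstrip(" ")
    ext = bytes(raw[8:11]).decode("latin-1").rstrip(" ")
    return name + "." + ext if ext else name
```

A caller would normally check `classify_first_byte` first and only build a display name for entries reported as valid.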

File attribute (2)

The attribute byte of a directory entry can indicate the following meanings when the appropriate bit is set to one:

Bit		Indicates
0		Read only: the file may not be updated
1		Hidden file: excluded from normal searches and directory displays
2		System file: excluded from normal searches and directory displays
3		The volume label – it can only exist in the root directory
4		Subdirectory: excluded from file based operations (eg search for file)
5		Archive bit: set whenever a file is modified, cleared by archiving (backing up) with the 'BACKUP' utility.
6		Reserved
7		Reserved

File attributes can be set when a file is first created, or subsequently can be modified by command line commands (eg FSET) or by system calls from within programs. A normal type file which has not been updated since it was archived has all attribute bits set to zero.

Bits three and four of the attribute byte cannot be modified.
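The attribute bits listed above can be tested with simple masks. The following is a minimal sketch only. The names `ATTRIBUTE_BITS` and `decode_attributes` are invented for the example and are not part of DOS Plus.

```python
# Bit number -> meaning, as listed in the attribute table above.
ATTRIBUTE_BITS = {
    0: "read-only",
    1: "hidden",
    2: "system",
    3: "volume label",
    4: "subdirectory",
    5: "archive",
}


def decode_attributes(attr: int) -> list:
    """Return the names of all attribute bits that are set in the byte."""
    return [name for bit, name in ATTRIBUTE_BITS.items() if attr & (1 << bit)]
```

For example, an attribute byte of 21h has bits zero and five set, so the function reports a read-only file that has been modified since it was last archived.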

The Time Field (3)

The time field is encoded as 16 bits made up of three binary numbers. Bits are assigned as follows:

Bits		Contents
0 - 4		Binary number of two second increments (0 to 29)
5 - Ah		Binary number of minutes (0 to 59)
Bh - Fh		Binary number of hours (0 to 23)

The Date Field (4)

The date field is encoded as 16 bits made up of three binary numbers. Bits are assigned as follows.

Bits		Contents
0 - 4		Day of month (1 to 31)
5 - 8		Month of year (1 to 12)
9h - Fh		Year relative to base 1980
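The bit assignments for the time and date fields can be unpacked with shifts and masks. The sketch below assumes each field has already been read from the directory entry as a 16-bit integer. The function names are hypothetical.

```python
def decode_time(word: int) -> tuple:
    """Split a 16-bit time field into (hours, minutes, seconds)."""
    seconds = (word & 0x1F) * 2        # bits 0-4 hold two-second increments
    minutes = (word >> 5) & 0x3F       # bits 5-Ah
    hours = (word >> 11) & 0x1F        # bits Bh-Fh
    return hours, minutes, seconds


def decode_date(word: int) -> tuple:
    """Split a 16-bit date field into (year, month, day)."""
    day = word & 0x1F                  # bits 0-4
    month = (word >> 5) & 0x0F         # bits 5-8
    year = 1980 + ((word >> 9) & 0x7F) # bits 9h-Fh, relative to 1980
    return year, month, day
```

Because seconds are stored in two-second steps, odd seconds cannot be represented and the stored time has a resolution of two seconds.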

The Cluster Field (5)

The cluster field contents are:

a)		The cluster number for the start of the file for a file entry.
b)		The cluster number of the directory for a subdirectory or a parent directory.
c)		Zero when the parent directory is the root directory.

The value held in this field is not only an absolute cluster number, but also a pointer into the cluster entry in the FAT for the first or only cluster entry. (See clusters and the file allocation table later in this chapter)

The File Size Field (6)

The file size field is only of relevance to file entries. It contains a four byte integer, with the two low-order bytes stored first. It records the actual number of bytes used within the file allocation, not the number of disc bytes allocated to the file (see clusters later in this chapter).
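Taken together, the fields described above make up one 32-byte entry. A possible way to split an entry into its raw fields is sketched below using Python's `struct` module. The names `DIR_ENTRY` and `unpack_dir_entry` are hypothetical, and little-endian storage is assumed for the two-byte fields as well as for the file size.

```python
import struct

# Layout from the directory entry table: name, extension, attribute,
# reserved, time, date, starting cluster, file size.
# "<" selects little-endian with no padding (an assumption, consistent
# with the low-order bytes of the file size being stored first).
DIR_ENTRY = struct.Struct("<8s3sB10sHHHI")


def unpack_dir_entry(raw: bytes) -> dict:
    """Split one 32-byte directory entry into its raw fields."""
    name, ext, attr, _reserved, time, date, cluster, size = DIR_ENTRY.unpack(raw[:32])
    return {
        "name": name,              # 8 bytes, space padded
        "extension": ext,          # 3 bytes, space padded
        "attribute": attr,
        "time": time,              # still encoded, see the time field
        "date": date,              # still encoded, see the date field
        "start_cluster": cluster,
        "size": size,              # bytes actually used by the file
    }
```

The structure size works out at exactly 32 bytes, which is a useful check that the format string matches the table.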

Disc Organisation

Unlike the host's DFS or ADFS filing systems, DOS does not require that each file occupies a contiguous area of disc. This gives major advantages in convenience, since it provides a self-maintaining system which never requires such operations as compacting, rendering errors like Can't extend impossible. The only error in this category which is ever seen in DOS is Disc full. A subsequent DIR will show that either this is literally true, or that the file to be written to the disc will not fit into the remaining unallocated spaces.

The converse of this convenience is that, over time, as files are extended or contracted and new files are added or old ones deleted, individual files become very fragmented. This has two possible implications for the user.

The first, which is unavoidable, is that pieces of file may become literally scattered around the disc at virtually random locations. This eventually can make itself felt when file access times become extended. It is of little consequence for small files, but can have a very noticeable performance implication for regularly used large files.

The solution is to format a new disc and copy the data to it file by file. The copying must be carried out at the file level because copying the whole disc track by track (ie using the DISK program) will also take the existing fragmented structure with it, defeating the objective.

Copying the files individually reorganises them by joining the fragments together on the new disc. The result is that the newly copied files are again stored in contiguous areas and disc performance is restored. Assuming two floppy drives, this can be simply achieved by the command:

copy A:*.* B:

Using this technique will require that copying is carried out on a directory by directory basis, and that the appropriate directories have already been created on the target drive. Although the effects of file fragmenting may be less noticeable for longer periods with a hard disc, this problem and its solution (unfortunately very much more tedious in this case) apply equally.

The second implication of fragmented files is that, in the event of a physical disc failure or the accidental deletion of a file, retrieving and assembling the fragments of a file in the correct order by recovering hardware sectors from disc can be an extremely lengthy and laborious process.

As usual, and even more importantly for DOS disc formats than for BBC native mode discs, the very best advice is to avoid this possibility altogether by ensuring adequate backups are taken on a separate disc at frequent intervals during lengthy file updating sessions.

Clusters

So as to keep track of where on a disc the files and directories are physically stored, DOS uses a disc space allocation unit, known alternatively as simply an allocation unit, or more usually as a cluster. A cluster is the smallest amount of disc space that DOS will allocate to a file, no matter how small the amount of data to be stored. The minimum allocation of one cluster per file accounts for the seeming contradiction between the amount of space remaining free on a disc and the fact that an attempt to copy another file which ought to fit can sometimes produce the Disc full message.
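The effect of whole-cluster allocation on free space can be shown with a little arithmetic. The sketch below follows the statement above that every file receives at least one cluster. The function names are hypothetical.

```python
def clusters_needed(file_size: int, cluster_bytes: int) -> int:
    """Clusters allocated to a file, with a minimum of one (as stated above)."""
    whole = -(-file_size // cluster_bytes)   # ceiling division
    return max(1, whole)


def allocated_bytes(file_size: int, cluster_bytes: int) -> int:
    """Disc space consumed by the file, which is never less than its size."""
    return clusters_needed(file_size, cluster_bytes) * cluster_bytes
```

With a 2048-byte cluster, a 100-byte file still consumes 2048 bytes of disc, which is why many small files can fill a disc sooner than the sum of their sizes suggests.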

A cluster is made up of a number of physical disc sectors, and that number is always a power of two: 1, 2, 4 or 8. The number of sectors per cluster varies with both the disc type and the sector size. Life would be easier if the total size of a cluster in bytes were fixed, but unfortunately there is no hard and fast rule.

For example, in the 512's 800k format there is one 1024 byte sector per cluster (1k), in IBM 360k discs there are two 512 byte sectors per cluster (also 1k), while Acorn's 640k disc has eight 256 byte sectors per cluster (2k) and ICL 720k CP/M discs have four 512 byte sectors per cluster (also 2k).

In practice, the size of a cluster is a balancing act between a manageable number of clusters and wasted disc space caused by small files occupying a whole cluster. Different manufacturers appear to have differing opinions about the optimum mix. To easily establish the number of sectors per cluster, the SHOW command can be used with the [DRIVE] option. This is slightly awkward, as the SHOW command is particularly fussy about how its parameters are entered. The three outputs shown below from this command were captured by redirecting the output from the command to disc. The disc in drive A: contained the SHOW program, while drive B: contained the disc to be investigated. The format of the command given (from drive A:) was therefore:

A>SHOW B:[DRIVE] > SHOW.OPn

where n was a number from one to three, one for each disc examined. The point to watch carefully is that there must be no space between the target drive identifier and the [DRIVE] option. If the command were issued as:

A>SHOW B: [DRIVE] > SHOW.opn

with a space included, only the capacity of drive B: would be given and the drive characteristics would be those from drive A:. Clearly this is a bug in the command line interpretation, but it's far too easy to miss, resulting in the wrong information being produced, so take care.

Issued correctly (with no space) the characteristics of three disc formats were obtained for illustration. They are, in order, IBM 360k (formatted on an Olivetti M24), Acorn 640k and DOS Plus 800k. The disc capacity is identified on the third line of each display.

	B:		Drive Characteristics
	2,880:		128 Byte Record Capacity
	360:		Kilobyte Drive Capacity
	112:		32 Byte Directory Entries
	112:		Checked Directory Entries
	128:		Records / Directory Entry
	8:		Records / Block
	9:		Sectors / Track
	0:		Reserved Tracks
	512:		Bytes / Physical Record

	B:		Drive Characteristics
	5,120:		128 Byte Record Capacity
	640:		Kilobyte Drive Capacity
	112:		32 Byte Directory Entries
	112:		Checked Directory Entries
	128:		Records / Directory Entry
	16:		Records / Block
	16:		Sectors / Track
	0:		Reserved Tracks
	256:		Bytes / Physical Record

	B:		Drive Characteristics
	6,400:		128 Byte Record Capacity
	800:		Kilobyte Drive Capacity
	192:		32 Byte Directory Entries
	192:		Checked Directory Entries
	128:		Records / Directory Entry
	8:		Records / Block
	5:		Sectors / Track
	0:		Reserved Tracks
	1,024:		Bytes / Physical Record

As can be seen in each display, the last line shows the number of bytes per physical record. A physical record is CP/M terminology for a sector, so this line means the number of bytes per sector. This is 512 for the 360k IBM disc, 256 for the 640k Acorn disc and 1024 for the 800k DOS disc.

The seventh line shows a number of records per block. This is 'CP/Mese' again. Internally, CP/M operates on the basis that all disc formats are logically regarded as being made up of 128-byte records (this is nothing to do with records in files). In fact it's merely a historical hangover, but has been maintained for compatibility with earlier CP/M software. The block referred to in this case is the allocation unit, or the same unit as a DOS cluster. In the 800k format disc, for example, there are eight logical 128-byte records (=1024 bytes) per block (ie per cluster).

From these values we can easily see that there are 1024 bytes per cluster and, at 1024 bytes per sector, an 800k 512 DOS Plus disc has one sector per cluster.

As a second example, consider the 640k ADFS disc format. The last line shows '256 bytes / physical record' (= bytes per sector) and line seven shows '16 records / block' (= 16 * 128 byte logical records per cluster). In this case a cluster is 2048 bytes, or 2k, and at 256 bytes per sector there are therefore eight sectors per cluster.

Repeating the exercise for the 360k IBM disc gives the result of two 512-byte sectors per 1k cluster. By using SHOW [DRIVE] and this simple calculation the physical organisation of any DOS disc can be easily established.
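The calculation used in these three examples can be written down directly. This is only a sketch of the arithmetic. The function name and the constant `RECORD_BYTES` are chosen for the example, and the two arguments are the 'Records / Block' and 'Bytes / Physical Record' values from the SHOW display.

```python
RECORD_BYTES = 128  # size of one CP/M logical record


def sectors_per_cluster(records_per_block: int, bytes_per_physical_record: int) -> int:
    """Derive sectors per cluster from two lines of the SHOW [DRIVE] display."""
    cluster_bytes = records_per_block * RECORD_BYTES
    return cluster_bytes // bytes_per_physical_record


# Values taken from the three displays above.
print(sectors_per_cluster(8, 512))    # IBM 360k
print(sectors_per_cluster(16, 256))   # Acorn 640k
print(sectors_per_cluster(8, 1024))   # DOS Plus 800k
```

The three calls reproduce the figures worked out in the text: two, eight and one sector per cluster respectively.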

Fortunately the information which identifies the disc organisation is equally easy to derive in programs. It is stored on the disc at the beginning of the first sector of the first track.

In PC disc formats, this identification information forms part of the first sector of the boot record, which itself always starts at the first sector of track zero. PCs use this fixed location and the data contained to decide if they can recognise the disc, and during their start-up procedures when booting. Although DOS 800k discs are not bootable and sector zero contains the first FAT sector (see below) this information is still present in the same location in the first sector.

The layout of the relevant identification bytes in the first sector of a DOS disc is shown below in Table 9.1.

Byte Offset
	Dec	Hex	Len	Contents
	0	0	3	E9 XX XX or EB XX XX (disctype)
	3	3	8	OEM name and version
	11	B	2	Bytes per sector
	13	D	1	Sectors per allocation unit (cluster)
	14	E	2	Reserved sectors starting at zero
	16	10	1	Number of FATs (usually two)
	17	11	2	Number of root directory entries
	19	13	2	Total number of sectors in volume
	21	15	1	Media descriptor byte (see below)
	22	16	2	Number of sectors per FAT
	24	18	2	Sectors per track
	26	1A	2	Number of heads
	28	1C	2	Number of hidden sectors

Table 9.1 Allocation of Bytes in a DOS Disc, Sector 1.
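A program that has read the first sector into memory could pick out the fields of Table 9.1 along the following lines. This is a sketch under stated assumptions: the names `ID_BYTES`, `FIELDS` and `parse_id_bytes` are invented here, multi-byte values are assumed to be stored low byte first, and reading the sector itself (through whatever disc access call is available) is left out.

```python
import struct

# Offsets 0 to 1Dh of the first sector, in the order given in Table 9.1.
# "<" means little-endian with no padding (an assumption for this sketch).
ID_BYTES = struct.Struct("<3s8sHBHBHHBHHHH")

FIELDS = (
    "jump", "oem_name", "bytes_per_sector", "sectors_per_cluster",
    "reserved_sectors", "fat_count", "root_entries", "total_sectors",
    "media_descriptor", "sectors_per_fat", "sectors_per_track",
    "heads", "hidden_sectors",
)


def parse_id_bytes(sector: bytes) -> dict:
    """Return the Table 9.1 fields from the start of a first-sector image."""
    values = ID_BYTES.unpack(sector[:ID_BYTES.size])
    return dict(zip(FIELDS, values))
```

The cluster size in bytes then follows as `bytes_per_sector * sectors_per_cluster`, the same quantity derived from the SHOW display earlier.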

In a bootable IBM disc this would be followed by the bootstrap routine from byte 1E. The disc type byte at offset zero is, in IBM discs, a direct jump to the bootstrap loader. To start the boot process, therefore, a PC merely identifies the disc type and jumps to the address indicated.

The media byte is an IBM disc type identifier. Those IBM formats which are relevant to DOS Plus 2.1 are identified as:

	0FCh		5.25" single sided, nine sectors per track
	0FDh		5.25" double sided, nine sectors per track
	0FEh		5.25" single sided, eight sectors per track
	0FFh		5.25" double sided, eight sectors per track

There are other identifiers, but they apply only to DOS version 3 or to eight inch floppy discs.

512 800k discs have no boot sector. They are identified as the second type shown, with 0FDh stored in the first byte of the first sector, which is the first FAT sector. This is explained below in the FAT schematic.

The File Allocation Table

Obviously, as files become fragmented, so do both the used and unused clusters, therefore a complete record of the used and unused areas on a disc must be maintained. This is the purpose of the file allocation table, usually referred to as the FAT for short.

When a new file is created the minimum allocation unit, one cluster, is initially made available for the file, and the address of that cluster is stored in the cluster field in the directory entry at offset 1Ah, as shown earlier.

If the size of a single cluster is insufficient for the quantity of data in the file, a second cluster is allocated and the address of that second cluster is stored in the FAT, within the entry for the first cluster. This process of extending the file cluster by cluster is repeated until the required file size is achieved. The last FAT entry for a file, which of course does not point to another FAT entry, contains an end of file marker to indicate that no more clusters follow.
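The chaining described above can be followed with a short loop. The sketch below assumes the FAT has already been decoded into a Python list of 12-bit values indexed by cluster number, and it uses the value ranges tabulated later in this section (002h to FEFh for a link, FF8h to FFFh for end of file). The function name `cluster_chain` is hypothetical.

```python
def cluster_chain(fat: list, start: int) -> list:
    """Return the clusters of a file in order, starting from its directory entry."""
    chain = []
    seen = set()
    cluster = start
    while 0x002 <= cluster <= 0xFEF:       # a valid cluster number
        if cluster in seen:                # guard against a corrupt, looping FAT
            raise ValueError("FAT chain loops back on itself")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]             # the entry holds the next cluster
    if cluster < 0xFF8:                    # anything but an end of file marker
        raise ValueError("chain ends in unexpected value %03Xh" % cluster)
    return chain
```

The loop check is not something the text requires. It is included because a damaged FAT could otherwise make the loop run forever.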

Since DOS has a record of the start cluster for each file or directory on a disc in the directory entry, and within a file clusters are 'chained' together internally, the only additional other information required to manage the whole disc is the address of any free clusters.

The method used is the simplest. The FAT is large enough to hold an entry for every cluster on the disc. Each logical cluster entry, therefore, always resides in a known physical offset in the FAT, regardless of the disc capacity. Further, the directory entry cluster field not only provides a simply calculated index into the FAT, each FAT entry points to the next. In addition the address of each FAT entry can be used to directly calculate the physical location on the disc of the first sector of the corresponding cluster.

Since each FAT entry is, in effect, self indexing, the actual contents of the entry are not needed to point to physical disc locations (ie FAT entry one always refers to cluster one). Because of this the contents of a FAT entry can be used to indicate the status of the cluster instead of its location.

Each FAT entry in 512 readable discs occupies three nibbles (12 bits). The possible meanings of the cluster status as recorded in the FAT are:

Value		Meaning
000h		Cluster is available (free for use)
001h		Cluster is in use (this is not a chain number since the minimum chain number is two, ie the second cluster)
002h‑FEFh		Cluster is part of a file (the contents point to the next cluster in the chain)
FF0h‑FF6h		Reserved cluster (cluster is not free for use and is not used)
FF7h		Cluster is bad (contains bad sectors)
FF8h‑FFFh		This cluster is the last in a file chain (ie an end of file cluster)
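The ranges in this table translate directly into a series of comparisons. The following sketch uses a hypothetical function name and descriptive strings chosen for the example.

```python
def fat_entry_status(value: int) -> str:
    """Describe a decoded 12-bit FAT entry using the ranges in the table above."""
    if not 0x000 <= value <= 0xFFF:
        raise ValueError("not a 12-bit FAT entry")
    if value == 0x000:
        return "free"
    if value == 0x001:
        return "in use"
    if value <= 0xFEF:
        return "next cluster in chain"
    if value <= 0xFF6:
        return "reserved"
    if value == 0xFF7:
        return "bad"
    return "end of file"                   # FF8h to FFFh
```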

In this way, accessing the contents of any cluster entry in the FAT gives all the information required for logical to physical disc mapping as well as the status of that area. In interpreting entries in the FAT the relevant facts are:

1.		FAT entry number = cluster number (entry zero is reserved, see below)
2.		Cluster number multiplied by sectors per cluster gives the first physical disc sector of the cluster.
3.		FAT entries are stored in pairs, in three bytes.
4.		To index into the FAT:
	a)	Multiply the cluster number by three.
	b)	Integer divide the product by two. This gives the offset into the FAT.
	c)	Move the word at that FAT offset into a register. Note: Remember that FAT entries can span physical sectors as they are a multiple of three bytes.
	d)	IF (a) MOD 2=0 THEN result = (c) AND 0FFFh ELSE result = (c) SHR 4 bits.
	e)	The resulting value 'result' corresponds to one of the FAT entry values given in the preceding table. If it does not, either you have made a mistake, or the disc's FAT is corrupt. In the latter case you can attempt to retry using the second FAT.
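Steps a) to d) of the indexing procedure can be sketched in Python as follows. The sketch assumes the whole FAT has been read into one `bytes` object, so that an entry spanning two physical sectors needs no special handling, and that the word is stored low byte first. The function name `fat12_entry` is hypothetical.

```python
def fat12_entry(fat: bytes, cluster: int) -> int:
    """Return the 12-bit FAT entry for the given cluster number."""
    product = cluster * 3                          # step a
    offset = product // 2                          # step b
    word = fat[offset] | (fat[offset + 1] << 8)    # step c: 16-bit word, low byte first
    if product % 2 == 0:                           # step d
        return word & 0x0FFF
    return word >> 4


# Nine FAT bytes holding entries 0 to 5 (an invented example).
example_fat = bytes([0xFD, 0xFF, 0xFF, 0x03, 0x40, 0x00, 0xFF, 0x0F, 0x00])
print([hex(fat12_entry(example_fat, n)) for n in range(6)])
```

In the invented example, entry two holds 003h and entry three holds 004h, so a file starting at cluster two continues through clusters three and four, where entry four holds the end of file value FFFh.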

If you get an error at e) which indicates the two FATs do not agree, your program should report the fact and include an option to stop the current operation immediately. For security you should either include a suitable routine in your program, or use the DISK utility, to produce an exact physical duplicate of the disc before proceeding.

Schematically the FAT appears as follows:

	Offset		Content
	0		disc type
	1 - 2		FFFFh (always)
	3 - 5		Group 1 = FAT entries 2+3
	6 - 8		Group 2 = FAT entries 4+5
	9 - Bh		Group 3 = FAT entries 6+7

and so on, up to the number of FAT entries given by the number of sectors per FAT (offset 16h in sector zero) multiplied by the bytes per sector (offset Bh) divided by three. Within each three-byte FAT group the significant nibbles for each FAT entry are organised as shown in the following diagram, Figure 9.1.

Three byte group:

	Byte no.			1	1	2	2	3	3
	Nibble no.			1	2	3	4	5	6
	Hex digits for low entry	2	3	-	1	-	-
	Hex digits for high entry	-	-	3	-	1	2

Figure 9.1 Organisation of a FAT Entry
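The nibble arrangement in Figure 9.1 can be checked by packing two 12-bit entries into three bytes and unpacking them again. The sketch below is an illustration only. The names `pack_pair` and `unpack_pair` are hypothetical.

```python
def pack_pair(low: int, high: int) -> bytes:
    """Pack two 12-bit FAT entries into one three-byte group."""
    byte1 = low & 0xFF                                # low entry, digits 2 and 3
    byte2 = ((low >> 8) & 0x0F) | ((high & 0x0F) << 4)  # low digit 1, high digit 3
    byte3 = (high >> 4) & 0xFF                        # high entry, digits 1 and 2
    return bytes([byte1, byte2, byte3])


def unpack_pair(group: bytes) -> tuple:
    """Recover the (low, high) entries from a three-byte group."""
    low = group[0] | ((group[1] & 0x0F) << 8)
    high = (group[1] >> 4) | (group[2] << 4)
    return low, high


print(pack_pair(0x123, 0x456).hex())
```

Packing the entries 123h and 456h gives the bytes 23h, 61h and 45h, so the six nibbles read 2 3 6 1 4 5, which matches the digit positions shown in the figure.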

Each hex digit shown above is numbered, one to three, where one is the most significant value, three the least. In other words the nibble values that result in the chosen 16-bit register after decoding should be in the order 0123.

As described, each 32-byte directory entry for a file or a directory contains the number of the first cluster entry in the FAT. If the contents of the cluster entry in the FAT are zero that cluster is free for use. Any value between 002h and FEFh indicates that the contents are the next cluster number in the current chain, while a value of 0FF8h or greater is used for the last cluster in any chain and therefore represents an end of file marker.

There is one other cluster status that is important. That is the value of 0FF7h, which indicates a bad cluster that must not be used. This occurs when DOS attempts to write a cluster but fails after a specified number of attempts (usually defaulted to three tries). On failure, the cluster is marked as bad and will not be used again for any purpose. This is acceptable to a limited degree on a Winchester, but if it occurs on a floppy, the data should be retrieved and the disc scrapped as soon as possible.

The ability to mark clusters as 'bad', yet to continue to be able to use the disc, is one of the mechanisms used by DOS to increase the reliability of the disc filing system when a disc eventually begins to show signs of failure.

To further improve matters, DOS keeps two copies of the FAT on each disc in most manufacturers' formats (although this is not true for all formats, particularly the smaller capacity 'older' formats). This technique not only increases security further, it also provides a means for dealing with certain exceptional events which might occur during normal operations.

For example, if a file were being copied and part-way through the operation DOS found there was insufficient disc space, the copy would have to be abandoned. If any portion of the file had already been written, some of the unused clusters on the target disc would already have been allocated to the file (which will not now be copied) and the FAT would be incorrect.

To avoid this type of problem, DOS dynamically updates only the first FAT during the disc operation, updating the second FAT only on successful completion. If then the operation must be abandoned part-way through for controllable reasons (eg disc full) copying the contents of the second FAT back to the first restores the status quo immediately.

Unfortunately, operations sometimes fail for uncontrolled reasons, such as power failure, a machine hang-up or simply a faulty disc area within the FAT itself. In such cases, the first copy of the FAT may unavoidably be left in an inconsistent or even unreadable state and a part-copied or part-updated file might or might not be accessible subsequently.

In the event of a failure which leaves a FAT corrupted, the CHKDSK utility can be used in 'repair' mode (/F, /L or /R options), in which an attempt is made to rejoin and reallocate appropriate clusters with the intention of making the disc usable again. CHKDSK reads and compares the two FATs, attempting to make sense of the two sets of information in the FATs and the contents of the disc. (If one FAT is entirely unreadable the other one alone is used.) This is by no means a guaranteed solution, but when it does work it can save a great deal of effort and time.

If such a failure occurs a warning is given and your first action must be to salvage any available data to a newly formatted disc.

12 and 16 bit FATs

As can be seen above, each FAT entry is three nibbles, or one and a half bytes. This means that the maximum possible number of clusters is 16³, or 4096 clusters. In IBM hardware this number is inadequate for anything other than floppy discs or small Winchesters (eg 10 Mb) so a second type of FAT exists, consisting of 16 bits per entry. This is used for larger disc formats as it permits addressing of up to 65535 clusters.

The 16 bit FAT was only introduced with DOS version 3.0+ and as it applies only to larger Winchesters it is of no relevance to the 512. The size of the FAT entry is decided automatically by DOS during formatting and it should be impossible to have a 16 bit FAT on any floppy disc, therefore all FAT entries in the 512 are of the 12 bit type, regardless of the disc format.
