
Master 512 Forum by Robin Burton
Beebug Vol. 12 No. 5 October 1993

This month we'll start to investigate a simple-to-use back-up system based on readily available software.

For my examples I'll use PKZIP, but first, thanks to Howard Smith for providing information on an early version of Flexibak and to David Harper for supplying a copy of the latest version, now called Flexibak Plus. More on these next month.

GETTING ORGANISED

Whatever back-up system you use, your first thoughts should be about how you'll want to secure your files and how they might need to be recovered. You won't always have had a total disaster, so you won't always want a complete recovery, and it could even be just one file.

To plan the job, mentally divide your data and programs into logical chunks, each of which can be processed separately as and when required, not just as part of the whole. I'll explain this concept in more detail later, using my own data to illustrate. Your approach will depend to an extent on the facilities in the software used, but some generalisations apply, whether you're floppy or hard disc based. That said, the two disc types also impose constraints of their own which must be considered.

Hard disc users really must plan backups more carefully than floppy users, because no matter what tools are used there's no way the contents of even a small winchester will fit onto a single floppy.

By contrast floppy users store files on separate discs already, so back-up planning might seem unnecessary. Not so! If the crude approach of whole disc copies is your method there are gains to be made, potentially spectacular ones, when data compression is employed. Files from a particular application might fill several floppies, but the number of back-up discs can be reduced dramatically if the data is compressed. Equally important, securing and recovering can be very much quicker than direct copies, as we'll see later.

The best approach also depends on the type of file and whether new files are continually created, or more typically only existing ones are updated. If your back-up system is currently not very well organised (perhaps even if you think it is) here are some ideas to consider.

Let's illustrate extremes by assuming the system is used for only one application. If the task is producing letters and reports, it's likely new files will continually be created. If a permanent disc record is needed, each new file must be included in your back-ups. Alternatively, if the sole use of the system is for a database, new files won't be created very often, perhaps never, once the system is set up. Here the need is to secure the latest versions of existing files as they're updated.

Like most users I'm somewhere between the two extremes, usually amending files, but occasionally creating new ones too. Some compromise is inevitable, but you should still tune your data organisation and back-up strategy to suit your needs. The ideas here are food for thought and, while some points may be obvious, it's unlikely that there's no room for improvement, unless you're very well organised. To a degree hard disc users have to be more organised, while floppy users can be a bit more flexible in their arrangements, but both might benefit from a bit more thought.

REDUCING THE WORKLOAD

The first point is that efficient security means saving time, which means selective back-ups, and avoiding duplicated effort is the biggest step you can take towards speeding up the job. This will be harder if your programs and data aren't organised with this in mind in the first place.

Applications usually have a number of files, even a word processor probably has overlays, one or more dictionary files, various printer related files, plus help and configuration data too. As an example, the volume of data in my word processor is 460K contained in ten files. Although word processing is a 'simple' application (some are much worse) it's a big help not to secure such files unnecessarily.

Remember too that EXE files (256K of my total) and dictionaries (107K) are less likely to compress well, so here's almost half a megabyte of back-up which I don't repeat unless there's a good reason.

Obviously, if different file types are mixed together in the same directory, selective back-ups (and recoveries) will be more difficult. If you can't separate the different types of file, you'll be continually re-securing application files that haven't changed, a pointless waste of time and disc space.

Mind you, even if you have the original issue discs, don't think you needn't secure your applications. Remember, configuring some applications to your particular requirements can be quite a lengthy job, and one that's best avoided unless there's a good reason for it. If problems force you to re-copy an application you'll have to repeat the entire installation process, unless you have a secure copy of the fully configured working system.

Applications should certainly be secured, but only once. It shouldn't be done just because you're securing data after a work session. Keep application files and data files either on separate discs, or at least in separate directories and then, with even the simplest procedures, you can secure the application once, and thereafter save time by securing only the data.

This also simplifies recovery, since it allows easy recovery of data, applications, or both when necessary. Mix them together and recovering just the data, or just the application becomes much harder, if not impossible.
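
To make the point concrete, here's a minimal sketch using purely hypothetical names (the PKZIP command format is explained later on). Suppose a word processor is installed in \WP and its documents are kept in \WP\DOCS. The application is secured just once, after installation or whenever it's reconfigured:

CD \WP
PKZIP A:WP_APP

while the routine end-of-session back-up touches only the documents:

CD \WP\DOCS
PKZIP A:WP_DOCS

Each command archives only the current directory to its own ZIP file on drive A:, so the application archive need never be repeated unless the installation itself changes.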

MAKE LIFE EASY

The biggest step towards back-up reliability is automating the process as much as possible. Why? Two main reasons.

The first is subjective – if the system's easy to use, it will be easier to discipline yourself to use it. If it's fiddly or involves too much effort, human nature says it won't be done as often as it should be. The second reason is that during backups mistakes will be far less likely. A fixed routine which is tried and tested doesn't offer much chance to introduce new problems.

With an essentially manual system like PKZIP, two standard facilities (i.e. free and already available) will help. The first is batch files, the second is recursion, which is frequently overlooked. Recursion can provide extra control or a range of options simply and with total reliability.
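
As a minimal sketch of the batch file side (the names here are hypothetical, not my own), a two-line SECURE.BAT along these lines turns a back-up into one short command:

REM SECURE.BAT - archive directory %1 to A:%2.ZIP
PKZIP A:%2 \%1\*.*

Typing SECURE DOCS OCTDOCS then archives everything in \DOCS to A:OCTDOCS.ZIP, and running the same command a second time with the duplicate floppy in the drive gives the second copy.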

Now let's look at real examples. These are based on my routines for securing Essential Software's files. The data structure is simple: one directory called ES, in the root directory of my hard disc, contains everything belonging to ES. Source files, issue disc contents, and sundry data files are all stored in one level of sub-directories within ES. I won't list the whole structure, but part of it looks like this:

\ES
\ES\DEV
\ES\NEWPROGS
\ES\SOURCES
\ES\RAMDISC
\ES\PRTSCRN
\ES\SUPRSTAR

....and so on, totalling 22 subdirectories containing about 3.25Mb of data.

DEV and NEWPROGS are development and testing directories, so their contents tend not to remain fixed for long. SOURCES changes less often, only when a modification to an existing program is completed or if a new one is added.

Finally, from RAMDISC onwards, the ES issue directories are virtually static, the only changes being bug-fixes (none for years, of course) or if a spelling error or 'typo' is corrected in the documentation.

You can see, given that the initial backup included everything, I need to secure DEV and NEWPROGS fairly frequently, SOURCES less often and issue directories rarely if ever. Apart from the differing frequency required by these back-ups, the total volume of ES, even compressed, is obviously far too much to fit onto a single floppy. So how do I handle it?

REDUCING RISKS

Like other systems, PKZIP can create single archives spanning multiple floppies, but it's an option I prefer not to use. Instead the ES back-up is split into three jobs: one for DEV and NEWPROGS, one for SOURCES, and one for the issue directories. You might favour multi-volume archives but I don't; here's why.

First, if one disc in a multi-volume set becomes faulty, the whole set is at risk. Perhaps you can re-create a faulty disc from a duplicate copy of the set, but remember that this requires an exact duplicate; nothing else will do. If each back-up disc is independent and one develops an error, the worst that can ever happen is that you lose the contents of that one disc; it can't affect the rest.

Second, when you update a multi-volume set (in all the systems I know) you have to update several if not all of the back-up discs each time. At best this is likely to be the first disc (holding the index), the disc or discs containing the files to be updated, and the last in the set (or an extra disc), which is where copies of amended or new files are usually added. As mentioned, if you're wise you'll have two exact duplicate sets, so you actually have to do the whole job twice, or copy each of the amended discs every time (but beware, if you use COPY and miss a disc, or there's an undetected error in the first set....!).

The 'independent disc' approach avoids all this. The only discs you handle are those containing amended files, and so long as your back-up software is fast enough (mine is), you simply make two originals of every back-up disc.

Spanned-volume archives are inflexible and seem to offer no benefits to balance the inconvenience and potential risks. You can disagree, but I have had a case (only one admittedly, but it's enough) where both copies of a back-up disc were faulty.

PERFORMANCE

Last month I said that PKZIP commands could become complex. However, they don't have to be, so let's take a look. The general format of the command is:

PKZIP [option] <archivename> [file-list]

where [option] can be one or several action modifiers, <archivename> (which can include a drive/path specification) is the output archive's filename and [file-list] is a list of files to be processed.

If [option] is omitted it defaults to 'add', which writes a new version of every specified (source) file to the archive, whether or not the file is already in the archive and whether or not the archive itself exists; if it doesn't, it's created. If no [file-list] is supplied it defaults to everything in the current directory. The archive name must be specified of course, but the extension ZIP is added automatically.

Suppose then I want to secure the entire contents of ES\SOURCES to drive A:. If my current directory is ES\SOURCES and the archive is called ES_SRCES, the command can be as simple as:

PKZIP A:ES_SRCES

Everything but the archive path and name is defaulted, so this couldn't get much easier.
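
The same pattern extends to the three-way split described earlier. Purely as a sketch (the archive names are hypothetical, and my actual parameter-driven batch files are next month's subject), the first two jobs might look like this when run from \ES:

PKZIP -P A:ES_DEV DEV\*.* NEWPROGS\*.*
PKZIP A:ES_SRCES SOURCES\*.*

with the issue directories handled the same way, one archive per directory or small group of directories so that each floppy stays independent. The -P option is intended to store the pathnames as given, so the DEV and NEWPROGS files stay distinct inside the first archive; running PKZIP with no parameters lists the options your version actually supports. For jobs repeated frequently it's also worth investigating the -u (update) option, which adds only files that are new or changed since the archive was last written.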

I also said compressed archives are much more efficient than direct copies, so by how much? ES\SOURCES contains 51 files totalling 1.628Mb. This, of course, would need at least two 800K floppies using COPY, perhaps three, but I tried it to quantify the task.

After 2 mins 55 secs the expected 'disc full' message appeared, with 29 files copied. I lost interest at that point because copying the remaining files would obviously take a great deal of manual entry, hence the next disc (or two) would take very much longer. Let's be very generous and suppose the whole job could be done in 10 minutes.

Next I timed the job using PKZIP version 2.04g. The results speak for themselves. ES_SRCES.ZIP was completed in under three minutes; it contained all the files and was less than 325K. That's a compression ratio of better than 5:1, or, put another way, I could get over twice as much data on the disc in less time than one direct copy took. Of course, this data was text so compression should be good, but the comparison is fair because identical real data was used in both cases.

What about a 'poor' example, so as not to be accused of bias? OK. I also archived my word processor's directory to drive A:, producing an archive file of 250K in 1 min 40 secs. The compression is very impressive and worthwhile, especially given the file types, but here the elapsed time was slightly more than for a direct copy (1 min 18 secs).

This further reinforces the earlier point, that securing applications should be avoided unless it's justified by changes in the installed software. Despite the result though, archiving of applications is still worthwhile as we'll see when we look at recovery.

DEVELOPING A SYSTEM

Securing a single directory is so simple it doesn't justify a batch file, but when things get more complicated this is worthwhile. Next month I'll explain how ES back-ups work, using parameter-driven batch files. If you didn't realise it, this means I'm leaving you with a little puzzle.

No problem? OK, try this. Create a batch file MAIN.BAT containing the following lines.

%1
%2
%3

Now create three more, ONE.BAT, TWO.BAT and THREE.BAT, each of which simply announces itself on screen. ONE.BAT should contain ECHO This is ONE.BAT and so on. Now enter:

MAIN ONE TWO THREE

You might expect that, with these parameters, MAIN will call ONE.BAT, followed by TWO.BAT then THREE.BAT. It doesn't! The puzzle is to see if you can make MAIN work before next month's Forum. Oh yes, I almost forgot! MAIN must be able to run any combination of ONE.BAT, TWO.BAT and THREE.BAT, from one to all three, in any order.

Add or change anything (except the names) but use only standard batch commands. Extra files (batch or otherwise), programs and manual interference are not permitted. See you next month!

