Master 512 Forum by Robin Burton
Beebug Vol. 12 No. 7 December 1993

This month we conclude our look at PKZIP back-ups and see how PKUNZIP reverses the archiving process when disaster strikes.

Unfortunately space won't allow us to investigate all the PKZIP options, but we should be able to cover enough of them to allow you to build a reasonable backup system to cater for most needs.

RECURSING DIRECTORIES

So far we've talked only about securing single directories, even though the batch job that does it may be one of several related jobs, each performing a different back-up. One reason for that approach is that it makes PKZIP basics easier to explain, but equally, when you update files they're very often all in one directory, so that's frequently how you'd use it anyway.

However, there's a trade-off between keeping each back-up job separate for simplicity and ending up with lots of jobs that are each too small to be practical. Obviously on a winchester the directory structure can be more complex than on floppies, though for efficient performance it's a good idea not to get too carried away with this idea. But in general, even on a hard drive, the more directories you have the smaller they tend to be. Conversely, on floppies despite the smaller capacity there's often a need for a directory structure, whether it's demanded by an application or your own convenience.

There comes a point therefore, where single directory back-ups, whether from a floppy or from a hard disc, would require several batch jobs but the amount of data just doesn't justify the complexity. The answer in these cases is directory recursion.

As the name implies, this option tells PKZIP to base the start of its activities on the specified or current directory as before, but also to include sub-directories in the operation. Like many PKZIP options the letter used for the switch is logical, 'r' (recurse), but this option must be used carefully or the results may not be quite what you expect.

POINTS TO WATCH

Judging by my mail, difficulty arises in recursion for back-ups more than for most other PKZIP operations. To be blunt this is a case of not reading the documentation thoroughly, but even so it's quite common. Although a command in the general form:

PKZIP -r <archive-name>

will certainly recurse all subdirectories in the current path and compress all the files found too, as it stands the original directory structure is lost by this command. This means that if you subsequently recover files from an archive created by the above command, although the back-up will have been completed OK, all the files will be extracted into a single directory.
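To see why the structure disappears, here is a minimal sketch in Python using its standard zipfile module. It is only a model of the behaviour described above, not PKZIP itself, and the directory and file names (ONE, TWO, A.TXT, B.TXT) are invented for the illustration. Each file is stored under its bare name, so nothing in the archive records which subdirectory it came from.

```python
import os
import tempfile
import zipfile


def zip_flat(top, archive_path):
    """Recurse below `top`, storing every file under its bare name only."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder, _subdirs, files in os.walk(top):
            for name in sorted(files):
                # arcname=name drops the folder part, modelling '-r' on its own
                zf.write(os.path.join(folder, name), arcname=name)


def demo_flat():
    """Build a small tree, archive it flat, and return the stored names."""
    with tempfile.TemporaryDirectory() as work:
        top = os.path.join(work, "top")
        # Hypothetical layout: two subdirectories with one file in each
        for sub, name in (("ONE", "A.TXT"), ("TWO", "B.TXT")):
            os.makedirs(os.path.join(top, sub))
            with open(os.path.join(top, sub, name), "w") as fh:
                fh.write("data")
        archive = os.path.join(work, "flat.zip")  # kept outside `top`
        zip_flat(top, archive)
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())
```

Calling demo_flat() returns just the two file names with no directory part, which is exactly the information a later recovery would be missing.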

That's not to say that recursion used this way isn't useful, it just isn't suitable for back-ups. An illustration will make this clearer, so let's take a look.

Suppose you have a directory which contains two or three subdirectories, whose contents are broadly similar but which each hold several different file-types. Suppose further that some of the files are temporary working files which needn't be secured (say with the extension .WRK), while the permanent files have varying extensions. Picking out just the permanent files from several subdirectories by normal DOS methods, assuming you want to keep the temporary files rather than delete them first, would be awkward to say the least.

However, recursion (with exclusion) offers a very easy way to archive all these permanent files in one simple operation. Refer to last month's article for the exclusion option if you need a refresher.

Assuming the PKZIP command were issued from the top-level directory of this structure, to compress all the permanent files into a single archive file the command would be:

PKZIP -r <archive-name> -x *.wrk

This would recurse all subdirectories in the current directory, compressing all files into the specified archive file except those excluded by the '-x', in other words those with a .WRK extension. As in my root directory back-up, the exclusion directive can also be in the form '-x@' followed by a filename containing a list of excluded files. In this case any number of files could be excluded by the list, using wildcarded names, by explicitly naming each file in turn or by any mixture of the two.
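The selection logic behind exclusion can be sketched in a few lines of Python. This is an approximation for illustration only: it assumes an exclusion list file holds one name or wildcard pattern per line, it uses Python's fnmatch rules, which are close to but not identical to DOS wildcard rules, and the function and file names are all hypothetical.

```python
import fnmatch


def load_patterns(text):
    """Split the text of an exclusion list into one pattern per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_excluded(filename, patterns):
    """True if filename matches any pattern, ignoring case as DOS does."""
    upper = filename.upper()
    return any(fnmatch.fnmatchcase(upper, p.upper()) for p in patterns)


def select(filenames, patterns):
    """Return the names that would still be archived after exclusion."""
    return [f for f in filenames if not is_excluded(f, patterns)]
```

With a list containing '*.wrk' and an explicit name such as NOTES.TXT, select() drops both TEMP.WRK and NOTES.TXT from a set of candidates and keeps the rest, mixing a wildcard and a named file in the way described above.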

Likewise, to back-up all the files of one particular type from a directory structure is just as easy if not easier. Suppose that we wanted just files with a .TXT extension, then the command:

PKZIP -r <archive-name> *.txt

is all that's needed. Nothing else but files with a .TXT extension will be archived by this command. Obviously you can also specify other options to refine this operation, including those described last month, or a number of others which could specify files older (or newer) than a given date, those not already archived and so on.

Used like this, PKZIP recursion offers powerful facilities which can make some jobs much easier, but of course our main interest here is security back-ups, so let's get back to that.

RETAINING STRUCTURE

What's usually needed for back-ups is an archive file which safeguards not only the files it contains, but also retains full information on how the files were originally stored. If this is done, when the day of disaster arrives, such as a FAT corruption or the kids using your disc as a frisbee, you can easily create a new master disc which will be an exact duplicate of the original.

The 'p' option (path) is used to preserve the full path structure of subdirectories, but note that this is another PKZIP directive which is case sensitive; the lower case version is what we need here (upper case 'P' recurses only specified directories). When the intention is not only to recurse subdirectories but also to preserve the entire directory structure, the command therefore is:

PKZIP -rp <archive-name>

This tells PKZIP to recurse subdirectories (from the current or specified directory) but also to record the original path of each of the files being archived.
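As a rough illustration of what recording the path means, the following Python sketch stores each file under its path relative to the starting directory. It uses Python's zipfile module as a stand-in for PKZIP, and the directory and file names are made up for the example.

```python
import os
import tempfile
import zipfile


def zip_with_paths(top, archive_path):
    """Recurse below `top`, storing each file with its relative path."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder, _subdirs, files in os.walk(top):
            for name in sorted(files):
                full = os.path.join(folder, name)
                # The path relative to `top` is kept inside the archive
                zf.write(full, arcname=os.path.relpath(full, top))


def demo_paths():
    """Build a small tree, archive it with paths, return the stored names."""
    with tempfile.TemporaryDirectory() as work:
        top = os.path.join(work, "top")
        # Hypothetical layout: two subdirectories with one file in each
        for sub, name in (("ONE", "A.TXT"), ("TWO", "B.TXT")):
            os.makedirs(os.path.join(top, sub))
            with open(os.path.join(top, sub, name), "w") as fh:
                fh.write("data")
        archive = os.path.join(work, "paths.zip")  # kept outside `top`
        zip_with_paths(top, archive)
        with zipfile.ZipFile(archive) as zf:
            return sorted(zf.namelist())
```

Here the stored names carry their subdirectory (ONE/A.TXT and TWO/B.TXT), so a recovery program has everything it needs to put each file back where it came from.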

Archives created like this can then be recovered in their entirety, which we'll examine next, to totally re-create a lost disc or path, but despite that they lose no flexibility over simpler archives. They can still be searched to recover a single file, just like a simple archive, and they can also be updated selectively by any of the methods we've previously looked at.

When you have several small directories sharing the same path and want to secure not only the data but the structure too, this is how it's done. I tend not to use recursion very much because on a hard disc most paths contain too much data for a single backup floppy even when the data is compressed, but for floppy users, or if directories are small, recursion can simplify (and shorten) the job considerably.

RECOVERING

So far we've looked at several of the most useful basic PKZIP options for backing up files and directories, but having given you the essentials I'll leave you to explore the less commonly used archiving options yourself.

It's now time to examine the reverse process. Files in a .ZIP archive can't be accessed directly, though there are tools such as SHEZ which can help you to view the contents of a .ZIP file 'in situ'. However, I tend not to use such programs; again I choose what I think is the simplest route, PKUNZIP.

We saw the simplest PKUNZIP command last month:

PKUNZIP -t <archive-name>

This tells PKUNZIP to test the integrity of every file in the specified archive. Each filename is displayed in turn, including its original pathname if appropriate, and PKUNZIP then checks the file against its 32-bit CRC. If all is well it says 'OK' for each file, while any discrepancy causes a report that the file is damaged. If you have the complete ZIP package you'll have a program called PKZIPFIX, which can in some circumstances recover individual files within an archive, but which will in any case fix the .ZIP file so that its other undamaged contents can still be recovered. Of course, most of the time there's no problem, so mainly I treat this option as an easy way of checking what a particular archive contains.
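The idea behind the test can be shown with a short Python sketch. It is not PKUNZIP's own code, merely a model of the same check using Python's zipfile and zlib modules: each file is read back, its CRC-32 is recalculated and the result is compared with the value stored in the archive. The file names and the deliberate one-byte corruption are invented for the demonstration.

```python
import io
import zipfile
import zlib


def check_archive(zip_bytes):
    """Return {name: 'OK' or 'damaged'} by re-checking each stored CRC-32."""
    report = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            try:
                data = zf.read(info)  # zipfile itself raises on a bad CRC
                ok = (zlib.crc32(data) & 0xFFFFFFFF) == info.CRC
            except (zipfile.BadZipFile, zlib.error):
                ok = False
            report[info.filename] = "OK" if ok else "damaged"
    return report


def demo_check():
    """Build a two-file archive in memory, corrupt one file, and test it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("GOOD.TXT", b"first file contents")
        zf.writestr("BAD.TXT", b"second file contents")
    # Change one byte of the stored data, as a bad sector might
    damaged = buf.getvalue().replace(b"second file", b"sec0nd file", 1)
    return check_archive(damaged)
```

The report marks GOOD.TXT as 'OK' and BAD.TXT as 'damaged', because the altered byte no longer agrees with the CRC recorded when the archive was made.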

In my own experience the only reason for problems is the back-up floppy itself, so the best insurance against that sort of trouble is duplicated back-ups, preferably not kept in the same place. I can tell you that only once have I ever had two faulty back-ups of the same data, but in that case both discs had developed a bad sector, so PKZIP wasn't responsible.

A second potential cause of trouble is your version of the PKZIP software. To extract files from a ZIP file obviously requires PKUNZIP, which is of course supplied as part of the PKZIP package. However, PKUNZIP is often supplied on its own on shareware discs to allow you to unpack the shareware disc's contents. The point to watch is that the version of PKUNZIP used must be at least as recent as the version of PKZIP that produced the archive file.

The copy of PKUNZIP on a shareware disc will certainly handle everything on that particular disc, but do make sure your master copy of ZIP and UNZIP is the latest in your possession and that they're both the same version.

If you try to use a version of UNZIP to extract from a ZIP file produced by a later version of PKZIP you'll probably get the message "Sorry, I don't know how to handle this file". This isn't inevitable: all later PKZIP versions support all previous compression standards, and an old form of compression could have been specified during compression (this is another option). If you do see this message, though, there's no alternative; you need a later version of PKUNZIP.

PKUNZIP is at least as easy to use as PKZIP and the general format of commands is similar too, as you'd expect. For example, the command to extract all the files from an archive called 'SECURITY' on drive A: to the current directory on the current drive would be:

PKUNZIP a:security

As before, the option is defaulted, so it becomes '-e' for 'extract'. The ZIP extension on the archive file is defaulted too, and the destination for the output data is the current directory and drive since no output path is supplied. Of course an output path can be specified too, so for example:

PKUNZIP a:security c:\recoveries\*.*

will unzip all the files in the archive to a directory called \RECOVERIES on drive C:, but note that in this case the target directory must pre-exist. Naturally files are most often restored to their original location, so in practice this option isn't needed very often.

Remember that if the archive contains only files and no directory structure data, PKUNZIP won't know anything about where they came from, so recovery is always to the current directory unless you specify otherwise. Of course, if the archive included recursed directories with paths preserved (PKZIP options -rp), PKUNZIP will still allow you to recover any file to any directory, but life will be much easier for full recoveries.

For example, if a master disc is a total write-off or when you've had to repartition a hard disc, the original directories won't exist, so you have two options.

One is to create the required directories yourself before the recovery, perhaps in your recovery batch files, but the other and easier option is to let PKUNZIP do it for you completely automatically. If the data was secured with both the 'r' and 'p' directives the original path is preserved along with the files, so:

PKUNZIP -d <archive-name>

can be used to tell PKUNZIP to recover all the archived files to their original directories. If the target directories don't exist at the time, PKUNZIP will create them for you as it goes along; if they do already exist they're used.
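The same idea of rebuilding directories during recovery can be sketched in Python. This is an analogy rather than PKUNZIP's actual mechanism: Python's zipfile.extractall() likewise creates any missing directories named in the archive and reuses those that already exist. The archive contents and file names below are hypothetical.

```python
import os
import tempfile
import zipfile


def restore_all(archive_path, target_root):
    """Extract every member, creating missing directories along the way."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_root)


def demo_restore():
    """Restore a path-preserving archive into a directory that doesn't exist."""
    with tempfile.TemporaryDirectory() as work:
        archive = os.path.join(work, "backup.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            # Hypothetical members stored with their original paths
            zf.writestr("ES/SOURCES/MAIN.ASM", "source text")
            zf.writestr("ES/DOCS/README.TXT", "notes")
        target = os.path.join(work, "restored")  # not created beforehand
        restore_all(archive, target)
        found = []
        for folder, _subdirs, files in os.walk(target):
            for name in files:
                rel = os.path.relpath(os.path.join(folder, name), target)
                found.append(rel.replace(os.sep, "/"))
        return sorted(found)
```

After demo_restore() runs, both files are found under freshly created ES/SOURCES and ES/DOCS directories, without any of them having been made by hand first.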

At this point I have a small confession to make. I deliberately omitted mention of the (specified) path and recursion options from my back-up examples for clarity, but in my live system I always use them, even if I know there's only one directory to secure. This is so that, if I have to perform full recovery, PKUNZIP will rebuild the complete subdirectory structure of my hard disc for me, while if I need to recover only one or two files I can still do that into any directory I want, including the original location if necessary.

To make sure this point is clear, consider the ES archive routines again and mentally add this extra information. All the \ES subdirectory back-ups take place from within \ES itself, so for \ES\SOURCES for example, the command to back-up to drive A: is:

PKZIP -rP a:es_srces \es\sources\*.*

This retains the full path of \ES\SOURCES in the archive and secures all its files too. When I recover, if either \ES or \ES\SOURCES doesn't exist they will be created for me; if they do exist it doesn't affect the recovery.

Note that I use the upper case path option, otherwise all the other \ES subdirectories would be included too and the job would fail because the back-up disc would become full. Also note a couple of other important items. Because I've used the specified path option I have to add the full path (from the root) of the directory to be archived, and I have to specify at the end of the path that all files (*.*) are to be included. Without the wildcarded filename only the directory structure of \ES\SOURCES would be saved and if SOURCES did contain any subdirectories these too would be ignored.

Although PKZIP operations can be extremely simple, when you use some of the more sophisticated options be aware that things do get complicated and your file and directory specifications must be absolutely precise. If you do need such options I'd advise that you test commands very thoroughly manually and ensure that the secured data can be accessed in exactly the way you expect too. Only when you're absolutely sure everything works should you build these commands into batch routines on which the safety of your data will depend. Remember, if eventually you need to recover data these routines MUST work; it will be too late by then if you find they don't.

AND FINALLY

That rounds off our look at PKZIP, but watch this space for other (I hope) interesting disc offers.

At the moment next month's topic is as much of a mystery to me as it is to you but in the meantime if anyone has specific queries on the points covered in the last three articles drop me a line and I'll try to help. If you do write, on this or any other topic, please note that I have recently changed jobs and in consequence replies to your letters may take a little longer in future.