File-Format Obsolescence

In the November 2017 issue of Scientific American, David Pogue introduced the idea of file-format rot. His point is that not only do computer media become obsolete, but file formats change, programs disappear, and older files can no longer be read. He points out that such problems can occur even between versions of software from the same vendor.

I encountered the most insidious kind of file-format obsolescence years ago while working for the Boeing Commercial Airplane Company. The company requirement for retention of commercial airplane contracts is fifty years after the last plane is delivered. Since an airplane contract can specify deliveries over many years, there are plenty of old contracts saved in the document repository. Some were created in WordPerfect, and no one expects to be able to read a wpd file nowadays. Many, however, were created using Microsoft Word, and Word still reads doc files, or so one would think.

In fact, Microsoft dropped support for older (prior to Word 95) doc files and other Office files years ago. Using an intermediate version of Word to upgrade the file format was not an acceptable option because the file, and therefore the contract, would be changed in an untraceable way. Fortunately, Apache OpenOffice and the Document Foundation's LibreOffice can still read old MS Word documents, at least for now.

Defending against Obsolescence

Everyone knows about hardware obsolescence. We have all copied files from floppy disks to diskettes to Zip drives to CD-ROMs in an effort to keep our old files. Now people are keeping files on USB sticks or just letting them spin around on ever larger hard drives.

File-format obsolescence is a bit more difficult. I still have some wpd files lurking in dark corners of my file system that I will never read again. Software vendors going out of business is one thing, but what do you do if you can't trust a vendor to maintain compatibility with its own older file formats?

Standards

One defense is to rely on international standards. The ISO committee demands that full documentation and a reference program be submitted before a file format can be declared a standard. The OpenDocument formats (used by Apache OpenOffice, the Document Foundation's LibreOffice, Google Docs, and others), Portable Document Format (PDF), and even Microsoft's Office Open XML (docx, xlsx, etc.) are ISO standards.

ISO standards help, but there is still no guarantee. For example, Microsoft does not seem too keen on adhering to its own standard.

Software

The more vendors of a given software product or file format there are, the greater the chance that at least one of them will continue to support the format. Apache, The Document Foundation, Google, and others support the OpenDocument formats (odt, odc, etc.). All of them support the same file format precisely because it is an international standard. Should one of them go out of business, the others can be expected to continue with the same file format. A single vendor puts you at the whim of the vendor's marketing department as to what will be supported in the future.

Open source software enhances this concept. Apache, The Document Foundation, Google, IBM, and others have made their software openly available. If you are dissatisfied with the products they provide, you can download the source code from Sourceforge and build your own copy of the application. That is true even if the companies go out of business, which implies that the file formats will continue to be supported for a long time.

So...

The upshot of this blog is:   Use a file-format that is an ISO standard and pick a format that is supported by multiple open source vendors.   As an extra advantage, open source software is free of charge and, at least for office software, the quality exceeds that of proprietary software.
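Before converting anything, it helps to know what is actually sitting in your file system. What follows is only a rough Python sketch of such an inventory, not a finished tool. It looks at file content rather than extensions, using signatures I am confident of: PDF files begin with "%PDF-", the legacy binary Office container (doc, xls, ppt) begins with the bytes D0 CF 11 E0 A1 B1 1A E1, WordPerfect documents begin with FF 57 50 43, and both OpenDocument and Office Open XML files are ZIP packages. The function names, the labels, and the choice of which labels count as "at risk" are my own assumptions for illustration.

```python
import zipfile
from pathlib import Path

# Leading bytes of the legacy binary Office container (doc, xls, ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Leading bytes of a WordPerfect document.
WPD_MAGIC = b"\xffWPC"

# Assumption for this sketch: these labels mark formats worth a closer look.
AT_RISK = {"ole2-binary-office", "wordperfect"}


def sniff_format(path):
    """Return a rough format label based on file content, not the extension."""
    path = Path(path)
    with path.open("rb") as fh:
        head = fh.read(8)
    if head.startswith(b"%PDF-"):
        return "pdf"
    if head.startswith(OLE2_MAGIC):
        return "ole2-binary-office"
    if head.startswith(WPD_MAGIC):
        return "wordperfect"
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            names = zf.namelist()
            # Office Open XML packages carry a content-types part.
            if "[Content_Types].xml" in names:
                return "office-open-xml"
            # OpenDocument packages carry a "mimetype" entry.
            if "mimetype" in names:
                mime = zf.read("mimetype").decode("ascii", "replace").strip()
                if mime.startswith("application/vnd.oasis.opendocument"):
                    return "opendocument"
        return "zip-other"
    return "unknown"


def inventory(root):
    """Map each format label to the sorted list of files under root."""
    found = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            found.setdefault(sniff_format(p), []).append(p)
    return found


if __name__ == "__main__":
    for label, files in sorted(inventory(".").items()):
        flag = "  <-- check these" if label in AT_RISK else ""
        print(f"{label}: {len(files)} file(s){flag}")
```

Run from the top of a document folder, this prints a count per format and flags the older ones. Note that the binary Office signature only identifies the container; it does not say which version of Word wrote the file, so a flagged doc file still has to be opened to find out whether anything can read it.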

This blog has been preoccupied with office formats. File formats for photographs have been much more stable. I will look into image file formats in another blog.