80+ Year Documents
Documents in aerospace need a lifespan of 80+ years. To imagine what that means in terms of the computer industry, think back to 1933. The first programmable computer had yet to be invented, so there was no digital document format. But if there had been one, whatever file format was useful at that time would still need to be readable today. It can be copied to more current media, but the content needs to remain available.
Is 80+ years really necessary? A Boeing 747 was conceived (and first documented) in the mid-1960s, has been produced since 1968, and is still being produced today. That's almost 50 years. A Boeing 747 can be in service for up to 30 years. The documentation from the start must remain readable until the last service day of the last aircraft. Even if production of Boeing 747s stopped today, those first documents would have to be kept for up to 80 years after they were written.
To provide some context, Microsoft and Apple did not exist when the first documents for the Boeing 747 were written, and both could conceivably no longer exist by the time these documents are last needed. So Microsoft Word or Apple Pages may not be the best document format for aircraft designs. Aircraft designs are usually confidential, which means that any tool that uses the Internet for storage cannot be used in the design phase. Given the current evolution of the computer industry to cloud storage, it is entirely conceivable that very few of tomorrow's commercial document generation tools can be used in aerospace.
So one must think carefully about the format in which to store any aerospace information, especially when you consider the amount of documentation that is created in building an aircraft. A semi-serious industry joke is that you need the weight of the aircraft in paper.
The PDF file format is commonly used in aerospace as a long-term storage format for documents. PDF has stood the test of time so far. PDF dates back to 1993, which makes it two decades old. Given the ubiquity of PDF, it is generally considered safe to assume that the format will remain available for the lifespan of currently conceived aircraft.
So the PDF file format serves its role well from an archival perspective.
However, a problem arises when a PDF is changed. It is common practice to use change marks in PDF documents to signal to the reader what has changed. As a reader/reviewer, you either have to assume that the change marks are correct, or you have to re-read the entire (usually lengthy and technical) document. There are no good tools that can reliably compare two PDFs, and there certainly are no qualified tools out there that have been proven to always work correctly to the quality demands of aerospace. Developing a qualified PDF comparison tool would be challenging because two PDFs can contain exactly the same visible information and yet be completely different at the binary level. Even extracting just the text does not work: there are many ways to represent the same text in binary form inside a PDF, and the flow of text makes it hard to compare two documents.
So while the PDF file format is great for archiving, it is terrible for reviewing and proving compliance.
When it comes to auto-generated PDFs, things are even worse, especially when generating TeX or LaTeX as an in-between step. The tools required to generate the PDF are themselves not qualified, so there is no way to know whether the tooling introduced errors. It is certainly possible that the generator did not understand a code, or stumbled over a typo, and left out a section or put in some unexpected content. Combined with the complexity of reviewing, generated PDFs are not safe to use for aerospace unless they are reviewed from start to end every time. This makes it a needlessly wasteful way of creating compliant and reviewed documents.
There is a format that is guaranteed never to change its content in unexpected ways, has the longest lifespan of all known formats, is easily compared for differences, and is easy for automation: plain text.
From a certification point of view, plain text is perfect. It only contains what is needed, hides nothing, and creating qualified tools to compare plain text files is easy (we have them, as I am sure many others have). When we have two text files, we can easily prove the differences between them, without relying on change marks or having to reread the entire document.
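To illustrate how little is needed to compare two plain text revisions, here is a minimal sketch using Python's standard `difflib` module. The function name `changed_lines` and the sample sentences are made up for this illustration; this is not a qualified tool, only an indication of how small such a comparison can be.

```python
import difflib

def changed_lines(old_text, new_text):
    """Return (marker, line_number, line) for every removed or added line.

    Hypothetical helper for illustration only. "-" entries carry the line
    number in the old revision, "+" entries the line number in the new one.
    """
    old = old_text.splitlines()
    new = new_text.splitlines()
    # autojunk=False disables difflib's heuristic that ignores very frequent lines.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes += [("-", n + 1, old[n]) for n in range(i1, i2)]
        changes += [("+", n + 1, new[n]) for n in range(j1, j2)]
    return changes

old_revision = "The bolt torque is 40 Nm.\nInspect every 500 hours.\n"
new_revision = "The bolt torque is 45 Nm.\nInspect every 500 hours.\n"
for marker, number, line in changed_lines(old_revision, new_revision):
    print(marker, number, line)
```

Every line that is not reported is identical in both revisions, so the reviewer only has to read what the comparison lists.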
But plain text has long been abandoned in favor of PDF because plain text by itself offers no uniform way to mark differences, highlight sections, or provide other markup.
However, in recent years, a consensus has been growing on how to format plain text. It started with Markdown, introduced by John Gruber, who writes:
[...] “Markdown” is two things: (1) a plain text formatting syntax; [...]
The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. While Markdown’s syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown’s syntax is the format of plain text email.
Herein lies the genius of Markdown:
- It promotes the plain text itself as the canonical format;
- It still allows a nicely formatted document to be output should this be desired.
From an aerospace certification point of view, text files are perfect. Using the Markdown conventions, we can agree on and communicate the rules we use for formatting plain text files, which makes them pretty and, more importantly, easily understood by all parties.
Others in the industry have picked up the idea of Markdown and extended it in much-needed ways. Two of the most important extensions to Markdown in the context of aerospace are:
- Tables: needed across the board when developing documents;
- CriticMarkup: change marking in plain text format.
These extensions are only conventions on how to format tables and change marks in plain text, and hence do not take away any of the advantages of plain text. MultiMarkdown is a popular format that covers these and other extensions to Markdown. Its other additions, such as footnotes and generalizations across the board, are also very welcome improvements on the original Markdown format. By MultiMarkdown we mean the format, not the tool.
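Because change marks written this way are just text, listing or applying them takes very little code. The sketch below assumes the three basic CriticMarkup inline conventions, `{++added++}`, `{--deleted--}` and `{~~old~>new~~}`, and ignores highlights and comments. The names `CHANGE`, `list_changes` and `accept_all` are hypothetical and chosen only for this example.

```python
import re

# The three CriticMarkup change conventions handled by this sketch:
#   {++added++}   {--deleted--}   {~~old~>new~~}
CHANGE = re.compile(
    r"\{\+\+(?P<add>.+?)\+\+\}"
    r"|\{--(?P<del>.+?)--\}"
    r"|\{~~(?P<old>.+?)~>(?P<new>.+?)~~\}",
    re.DOTALL,
)

def list_changes(text):
    """Return every change mark in document order (hypothetical helper)."""
    changes = []
    for match in CHANGE.finditer(text):
        if match.group("add") is not None:
            changes.append(("add", match.group("add")))
        elif match.group("del") is not None:
            changes.append(("delete", match.group("del")))
        else:
            changes.append(("substitute", match.group("old"), match.group("new")))
    return changes

def accept_all(text):
    """Return the text as it reads once every marked change is accepted."""
    def replacement(match):
        if match.group("add") is not None:
            return match.group("add")
        if match.group("del") is not None:
            return ""
        return match.group("new")
    return CHANGE.sub(replacement, text)

sample = "Torque the bolt to {~~40~>45~~} Nm{++ and record the value++}.{-- Optional.--}"
print(list_changes(sample))
print(accept_all(sample))
```

The marked-up line stays readable as plain text, and the list of changes can be checked against the output of a plain text comparison of the two revisions.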
As a side note, the MultiMarkdown format is not perfect because it sometimes violates the Markdown readability principle. For instance, its inline-math-formula notation seems designed for machine reading instead of human reading. The proposed format is not the way someone would write the formula in plain text email. An uninformed reader 80 years from now would not intuitively understand the formula. It is also unfortunate that the math notation assigns meaning to symbols that may not be intended by the author of the plain text (such as $$).
HTML predates PDF. As long as no browser-specific code is included and only standard HTML is used, any such document written in the past is still perfectly readable today. Given the nature of the web, it is almost guaranteed that HTML will remain in existence for longer than PDF.
One argument people have made in the past is that, unlike PDF, HTML does not retain proper formatting. It is true that older Internet Explorer versions in particular have not always rendered HTML correctly. Assuming one talks about static documents with no dynamic behavior, standardization today has reached the point where one can write pixel-perfect documents that are 100% standard HTML+CSS and render the same in any browser on any platform.
Comparing HTML documents for changes can be done in an aerospace-qualified way. However, in general, Markdown plain text files are nicer to compare and edit than HTML. On the other hand, for human reading HTML is nicer than plain text, especially for tables and for hyperlinks that reference other sections within the same document. There are enormous numbers of tables in aerospace documents, sometimes with lengthy content that can wrap in HTML but not in plain text, so for this reason alone HTML is an interesting format.
It would be possible to qualify a MultiMarkdown to HTML converter (the required qualification level would be TQL-5). This would allow the human to review the HTML document, and consider the plain text as reviewed since it is guaranteed to have the same content. Editing and comparing for changes happens in the plain text file. But reading and reviewing could optionally be done in HTML if the reviewer prefers this. This approach would be compliant with the most rigorous certification demands.
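To give an idea of what "the same content" could mean in practice, the sketch below checks that an HTML rendering contains exactly the same words, in the same order, as the plain text it was generated from. It is only an illustration under simplifying assumptions: the characters in `markup_chars` are assumed to occur only as formatting and never as content, and the names `html_words` and `same_words` are invented for this example. It is not a qualified converter and does not replace one.

```python
from html.parser import HTMLParser

class _WordCollector(HTMLParser):
    """Collects the words of all text nodes in an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.words = []

    def handle_data(self, data):
        self.words.extend(data.split())

def html_words(html_text):
    collector = _WordCollector()
    collector.feed(html_text)
    collector.close()
    return collector.words

def same_words(plain_text, html_text, markup_chars="*_#|`"):
    """True if both documents contain the same words in the same order.

    Assumption for this sketch: every character in markup_chars is
    formatting only, so it is replaced by a space before comparing.
    """
    stripped = plain_text.translate({ord(c): " " for c in markup_chars})
    return stripped.split() == html_words(html_text)

plain = "# Torque limits\n\nTorque the bolt to **45 Nm**.\n"
html_ok = "<h1>Torque limits</h1>\n<p>Torque the bolt to <strong>45 Nm</strong>.</p>"
html_bad = "<h1>Torque limits</h1>\n<p>Torque the bolt to <strong>40 Nm</strong>.</p>"
print(same_words(plain, html_ok), same_words(plain, html_bad))
```

A check like this could run independently next to the converter, so that a conversion error which drops or alters text is detected before the HTML reaches the reviewer.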