Monday, March 22, 2010

PDF vs. DOC vs. DOCX vs. ODF vs. RTF

The following charts compare several document formats:

General information

 
Format | Native program[1] | Language[2]
PDF | Adobe Acrobat | PDF (derived from PostScript)[3]
DOCX | MS Word 2007 and later | XML[4]
DOC | MS Word 97–2003 | Binary (0s and 1s)
ODT | OpenOffice.org Writer | XML
RTF | MS WordPad | RTF (similar to TeX)

  1. Native program. The program most commonly used to edit the file type.
  2. Language. The language the document's contents are written in. In other words, if you decompressed the file and opened it in Notepad, this is the language you would see.
  3. PDF stands for Portable Document Format.
  4. XML (Extensible Markup Language) is a tag-based language that is similar to HTML.
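The "Language" column also shows up in the first few bytes of each file, which is how many programs tell these formats apart. Here is a minimal Python sketch, assuming the usual signatures: %PDF- for PDF, {\rtf for RTF, the OLE2 compound-file header for DOC (old Excel and PowerPoint files share it), and a ZIP header for DOCX and ODT. The function name sniff_format is my own, and treating every non-ODF ZIP as a DOCX is a simplification.

```python
# Leading "magic" bytes that identify each format (common signatures).
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # DOC, also old XLS/PPT
ZIP_MAGIC = b"PK\x03\x04"                          # DOCX and ODT packages


def sniff_format(header: bytes) -> str:
    """Guess a document format from the first ~128 bytes of a file."""
    if header.startswith(b"%PDF-"):
        return "PDF"
    if header.startswith(b"{\\rtf"):
        return "RTF"
    if header.startswith(OLE2_MAGIC):
        return "DOC"
    if header.startswith(ZIP_MAGIC):
        # ODF packages normally store an uncompressed "mimetype" entry first,
        # so its value appears within the first hundred bytes.
        if b"application/vnd.oasis.opendocument.text" in header[:100]:
            return "ODT"
        # Simplification: any other ZIP is assumed to be a DOCX.
        return "DOCX"
    return "unknown"
```

To try it on a real file, pass in the start of the file, for example sniff_format(open("hello.odt", "rb").read(128)), where hello.odt is a hypothetical file name.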


Size of a hello-world document

Format | Fonts not embedded | Fonts embedded
PDF | 6.0 KB | 23.9 KB
DOCX | 9.7 KB | 940.0 KB
DOC | 19.5 KB | 192.0 KB
ODT | 7.8 KB | Not applicable
RTF | 3.8 KB (1.5 KB when zipped) | Not applicable


Features

Format | Macros | Font embedding | Digital signatures
PDF | Yes | Yes | Yes
DOCX | No (macro-enabled files use .docm) | Yes | Yes
DOC | Yes | Yes | Yes
ODT | Yes | No | Yes
RTF | No | No | No

Features (continued)

Format | Font subsetting[1] | Embedded video | Bookmarks
PDF | Yes | Yes | Yes
DOCX | No | No | Yes
DOC | No | Yes | Yes
ODT | No | Yes | Yes
RTF | No | Yes | Yes

  1. Font subsetting. Unlike font embedding, which inserts every character of a font into the document, font subsetting embeds only the characters the document actually uses (e.g., A, E, and F instead of A through Z). This reduces the file's size.
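To make the idea concrete, here is a Python sketch of the first step of subsetting: working out which characters a document actually needs. It is an illustration only, and glyphs_to_embed is a made-up name; a real subsetter (for example, the one in the fontTools library) also has to map characters to glyph IDs and keep related data such as composite-glyph components and kerning.

```python
def glyphs_to_embed(text: str) -> list[str]:
    """Distinct printable characters in `text`, i.e. what a subset must keep."""
    # Newlines and other control characters have no glyph, so skip them.
    return sorted({ch for ch in text if ch.isprintable()})


needed = glyphs_to_embed("Hello, world!")
print(len(needed), "glyphs instead of the whole font")  # 10 glyphs ...
print(needed)
```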


Conclusions

File size

So, which format is best? It depends on your needs. If file size matters most, choose RTF. DOC, DOCX, ODF, and PDF files are all larger, even though all but DOC are compressed by default and RTF is not. RTF is plain text: you can open it in Notepad and read every control word. For example, here's what I see when I open my RTF hello-world document in Notepad (not WordPad):

And here is what I see when I open my DOCX hello-world document:

Is the second document XML? Not exactly. It's a ZIP archive full of XML files, so Notepad shows you compressed binary data. If you want to see the real XML, change the file's extension from .docx to .zip and extract it. The main text lives in word/document.xml; open that in Notepad.
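If you'd rather not rename files, Python's standard zipfile module can open a .docx directly, since it doesn't care about the extension. This sketch lists the parts inside the package and pulls out word/document.xml; the function names are my own.

```python
import zipfile


def docx_parts(source) -> list[str]:
    """List every file packed inside a .docx (a path or a binary file object)."""
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


def main_document_xml(source) -> str:
    """Return the XML of the document body, stored at word/document.xml."""
    with zipfile.ZipFile(source) as zf:
        return zf.read("word/document.xml").decode("utf-8")
```

For example, print(main_document_xml("hello.docx")[:500]) would show the start of the body XML, assuming a file called hello.docx (a hypothetical name) in the current folder.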

So, to repeat, RTF is smaller than the other formats, even though they are compressed and RTF is not. When I zip up my RTF file, it shrinks to 1.54 KB. Not bad!
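Part of the reason RTF stays small is that a hello-world file needs very little markup. Below is a hand-written minimal RTF document stored as a Python string, plus a helper that measures how big it gets inside a ZIP. WordPad writes extra header information beyond this, so treat HELLO_RTF and the zipped_size helper as illustrations, not a copy of my file.

```python
import io
import zipfile

# A hand-written minimal RTF document: plain ASCII text with control words.
HELLO_RTF = (
    "{\\rtf1\\ansi\\deff0"                 # RTF version 1, ANSI charset, default font 0
    "{\\fonttbl{\\f0 Times New Roman;}}"   # font table with a single font
    "\\f0\\fs24 Hello, world!\\par"        # font 0 at 12 pt (fs counts half-points)
    "}"
)


def zipped_size(name: str, data: bytes) -> int:
    """Size in bytes of a ZIP archive holding `data`, deflate-compressed."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, data)
    return len(buf.getvalue())


raw = HELLO_RTF.encode("ascii")
print(HELLO_RTF)
print(len(raw), "bytes as plain text;", zipped_size("hello.rtf", raw), "bytes zipped")
```

For a file this tiny, the ZIP headers (roughly a hundred bytes) can outweigh the savings, so zipped_size may report more than the raw text; compression pays off on bigger files like the 3.8 KB one in the table above. Saving HELLO_RTF as hello.rtf and opening it in WordPad should show "Hello, world!".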

Features

When it comes to features, PDF wins hands down. It's not just font subsetting: PDF also gives you explicit control over a document's layers, view, and color.

Layers. Other formats such as DOC and ODT also have layers, but they're hidden: the only time you deal with them is when you send an object behind another object. Adobe Acrobat has a panel devoted to layers, and when you open a PDF in Adobe Illustrator, you can edit each layer individually.

View. You can control how a PDF is displayed in minute detail, including the page and zoom level it opens at. You can also allow or deny editing, printing, and copying of text individually.
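Under the hood, these settings are entries in the PDF file itself. The sketch below, assuming the PDF 1.7 conventions, builds an /OpenAction entry that opens a page at a given zoom and computes the /P permissions number used by the standard security handler (revision 3 and later). The permission names are my own labels for the spec's bit positions, and /P only takes effect inside an encrypted PDF, where the viewer is trusted to honor it.

```python
# Permission bits (1-based positions) from the standard security handler.
# The names on the left are my own labels, not PDF keywords.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,                 # copy or extract text and graphics
    "annotate": 6,             # add or modify annotations, fill forms
    "fill_forms": 9,
    "accessibility": 10,       # extract text for accessibility tools
    "assemble": 11,            # insert, rotate, or delete pages
    "print_high_quality": 12,
}


def permissions_value(allowed: set[str]) -> int:
    """Return the signed 32-bit /P value for the given allowed actions."""
    p = 0xFFFFFFFC  # all bits set except bits 1-2, which must be 0
    for name, bit in PERMISSION_BITS.items():
        if name not in allowed:
            p &= ~(1 << (bit - 1))  # clear the bit to deny that action
    # /P is written as a signed 32-bit integer.
    return p - (1 << 32) if p & 0x80000000 else p


def open_action(page_ref: str, zoom: float) -> str:
    """Catalog entry that opens the document at page_ref with a fixed zoom."""
    # /XYZ left top zoom; null keeps the viewer's current left/top position.
    return f"/OpenAction [{page_ref} /XYZ null null {zoom}]"


print(permissions_value({"print"}))  # -3900
print(open_action("3 0 R", 1.5))     # open at 150% on the page object 3 0 R
```

Denying everything gives -3904 and allowing everything gives -4, which is why those two numbers turn up so often in real PDFs.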

Color. PDF also gives you extensive control over the color of your documents. Computer monitors display everything as mixtures of red, green, and blue (RGB), while printers work in CMYK (cyan, magenta, yellow, and black). A PDF can use either color model, and it can also use the Lab color space (L for lightness, plus two color axes, a and b).
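Converting between RGB and CMYK is where things get tricky. The simplest textbook formula is sketched below in Python; it ignores color profiles entirely, so real print workflows (which use ICC profiles) will give different, better results. Treat it as an illustration of how the two models relate, not a print-ready conversion.

```python
def rgb_to_cmyk(r: int, g: int, b: int) -> tuple[float, float, float, float]:
    """Naive RGB (0-255) to CMYK (0-1) conversion with no color profile."""
    rf, gf, bf = r / 255, g / 255, b / 255
    k = 1 - max(rf, gf, bf)          # black covers what the brightest channel lacks
    if k == 1:                       # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - rf - k) / (1 - k)
    m = (1 - gf - k) / (1 - k)
    y = (1 - bf - k) / (1 - k)
    return (c, m, y, k)


print(rgb_to_cmyk(255, 0, 0))  # (0.0, 1.0, 1.0, 0.0): red is magenta plus yellow
```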

Macros

To be fair, some features are not universally liked. Macros, for example, can be used to spread viruses, and malicious macros and scripts are especially common in Word documents and PDFs. Here's a summary of the macro languages each format supports:

PDF: JavaScript
DOCX: None (macro-enabled Word files use the .docm extension)
DOC: Visual Basic for Applications (VBA)
ODT: Basic, Python, and JavaScript
RTF: Not applicable

So, if you like writing macros, OpenOffice.org may suit you best. If you prefer security, use RTF, which doesn't support macros.
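If you receive documents from strangers, you can do a rough macro check yourself. This Python sketch looks inside ZIP-based packages for a VBA project part (word/vbaProject.bin in a .docm) and, as an assumption about the ODF layout, for Basic/ or Scripts/ folders; for PDFs it simply searches the raw bytes for /JavaScript or /JS. Both checks are heuristics: compressed object streams or escaped names can hide scripts from a plain byte search, and old binary .doc files aren't ZIPs, so they aren't covered.

```python
import zipfile


def package_has_macros(source) -> bool:
    """Heuristic: does a ZIP-based document (path or file object) carry macros?"""
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name.lower().endswith("vbaproject.bin"):  # Word .docm (VBA)
                return True
            if name.startswith(("Basic/", "Scripts/")):  # ODF (assumed layout)
                return True
    return False


def pdf_mentions_javascript(data: bytes) -> bool:
    """Crude check: does the raw PDF contain a /JavaScript or /JS key?"""
    return b"/JavaScript" in data or b"/JS" in data
```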
