A Brief Introduction to OpenOffice.org Writer Files

Get started on the path to taking programmatic control of your files and adding value to OpenOffice.org's no-cost price tag by using XML.

Originally published at Internet.com


OpenOffice.org Writer, a no-cost, open-source answer to otherwise pricey, commercial word processing applications, stores its files with an "odt" extension. Even users with casual familiarity with Writer may be surprised to know this file is nothing more than a standard "zip" file full of XML files. The implication is this: armed with a little knowledge of these internal files, you can programmatically create and edit them.

In this article, I will discuss some of the basic concepts relating to the ODT file. I will not discuss how to actually use OpenOffice.org Writer; the lessons needed to gain efficient competency with any word processing application would provide ample material to fill a book.

Opening the ODT File

To get started, you of course need an ODT file. Once you have one, unzipping the file gives you (among other things) four XML files:

* content.xml: The actual content of a document.
* meta.xml: Meta-data such as creation date, editor, and statistics (word count, and so forth).
* settings.xml: OpenOffice.org program settings and preferences local to the document itself.
* styles.xml: Formatting styles (for paragraphs, characters, and so on) defined by OpenOffice.org and by the author.
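A minimal sketch of that first look inside the archive, using Python's standard zipfile module. The helper names are hypothetical, and the stand-in archive built here only mimics the layout of an ODT file so the example runs without a real document; pass a path to one of your own .odt files to try it for real.

```python
import io
import zipfile

# The four XML files named in the list above.
CORE_XML_FILES = ("content.xml", "meta.xml", "settings.xml", "styles.xml")


def list_core_files(odt_source):
    """Return the core XML member names found in an ODT archive.

    odt_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(odt_source) as odt:
        names = set(odt.namelist())
    return [name for name in CORE_XML_FILES if name in names]


def make_stand_in_odt():
    """Build a tiny in-memory archive that mimics an ODT layout (not a valid document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as odt:
        odt.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        for name in CORE_XML_FILES:
            odt.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


print(list_core_files(make_stand_in_odt()))
# prints ['content.xml', 'meta.xml', 'settings.xml', 'styles.xml']
```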

Additional files and directories will turn up (some of them depending on just what is in the document, such as a possible "Pictures" directory). However, for the purposes of this article, I will mainly discuss the content.xml and styles.xml files. The main reason for dismissing the presence of the additional files is this: If you're reading or editing material in an existing ODT file, their presence does not generally matter anyway, and if you are creating a new document programmatically, the simplest way is to "edit" an empty template file and save it to a new name. This is perhaps the safest strategy to use because it lets you focus entirely on the content of the document you want to process.
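The "edit an empty template and save it to a new name" strategy might be sketched as follows. This is an illustration under assumptions, not a hardened tool: replace_content is a hypothetical helper, and it relies only on Python's standard zipfile module. Every member of the template is copied unchanged except content.xml, and reusing each member's original ZipInfo keeps the member order and compression settings of the template.

```python
import io
import zipfile


def replace_content(template, output, new_content_xml):
    """Copy a template ODT into a new archive, swapping in a new content.xml.

    template and output may be file paths or binary file-like objects.
    """
    with zipfile.ZipFile(template) as src, zipfile.ZipFile(output, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                data = new_content_xml.encode("utf-8")
            # Reusing the original ZipInfo preserves member order and each
            # member's compression setting (so "mimetype" stays first).
            dst.writestr(info, data)


# Demonstration with an in-memory stand-in for a blank template.
template = io.BytesIO()
with zipfile.ZipFile(template, "w") as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    archive.writestr("content.xml", "<old/>")
    archive.writestr("styles.xml", "<styles/>")
template.seek(0)

result = io.BytesIO()
replace_content(template, result, "<new/>")
result.seek(0)
with zipfile.ZipFile(result) as archive:
    print(archive.read("content.xml"))  # prints b'<new/>'
```

With real files, the same call would take two paths, for example a blank document saved from Writer and the name of the new document to produce.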

Content

The content.xml file, as mentioned above, is the real meat of an ODT file: Your actual document material is stored in this file. Feel free to open the content.xml file in any text or, better yet, XML editor you have available. You should see something similar to this:
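As a hedged stand-in for that view, the sketch below holds a heavily abridged content.xml skeleton in a Python string and locates the element that wraps the document material. The skeleton is illustrative rather than copied from a real file: a file saved by Writer declares many more namespaces and carries additional child elements. The namespace URIs are the ones defined by the OpenDocument specification; find_document_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument specification.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}

# Abridged, illustrative skeleton of a content.xml file.
SAMPLE_CONTENT = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:p text:style-name="Standard">Some paragraph of stuff here.</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""


def find_document_text(content_xml):
    """Return the office:text element that holds the document material."""
    root = ET.fromstring(content_xml.encode("utf-8"))
    return root.find("office:body/office:text", NS)


body = find_document_text(SAMPLE_CONTENT)
print(body[0].text)  # prints Some paragraph of stuff here.
```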

Note that I am taking some liberties to omit the elements I won't discuss, and I am sure you will forgive me for not listing all twenty-two namespace declarations in the root element. Once you have navigated to this point, note that the XML structure of your actual document material is stored in just a handful of "block-level" elements:

<text:h ...>Your Heading Here</text:h>
<text:p ...>Some paragraph of stuff here.</text:p>
<text:list ...><text:list-item><text:p ...>First list item text here.</text:p></text:list-item></text:list>

The "inline-level" (that is, the character-level) formatting entities occur in span elements:

<text:p ...>Some <text:span ...>fancy text</text:span> here.</text:p>
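To make the block-level and inline-level structure concrete, here is a small sketch that walks the top-level children of a document body and reports each element's name, style name, and flattened text. The sample XML is illustrative, the style names "T1", "L1", and "P1" are placeholders, and outline and local_name are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Illustrative document body; the style names are placeholders.
BODY = f"""<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">
  <text:h text:style-name="Heading_20_1">Your Heading Here</text:h>
  <text:p text:style-name="Standard">Some <text:span text:style-name="T1">fancy text</text:span> here.</text:p>
  <text:list text:style-name="L1">
    <text:list-item>
      <text:p text:style-name="P1">First list item text here.</text:p>
    </text:list-item>
  </text:list>
</office:text>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.rsplit("}", 1)[-1]


def outline(body_xml):
    """Return (element name, style name, text) for each top-level block."""
    root = ET.fromstring(body_xml)
    rows = []
    for block in root:
        style = block.get(f"{{{TEXT_NS}}}style-name")
        # itertext() gathers text from nested elements such as text:span;
        # split/join collapses the indentation whitespace inside lists.
        text = " ".join("".join(block.itertext()).split())
        rows.append((local_name(block), style, text))
    return rows


for row in outline(BODY):
    print(row)
```

Note how the span disappears in the flattened text: the inline element only carries formatting, so reading the content means concatenating the text around and inside it.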

Aside from the container elements for tables and images, you know enough of the basic document structure to read or edit an OpenOffice.org file. Granted, there are still a few things worth knowing. Notably, in the above snippet examples, the ellipses are omitting the rather important "text:style-name" attribute that defines the formatting information for the material contained in that element.

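A line break within a paragraph appears as an empty line-break element:

<text:p ...>Blah blah blah<text:line-break/>yadda yadda.</text:p>

You can get a line break in Writer by pressing SHIFT+ENTER instead of just ENTER.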
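When reading paragraphs programmatically, the line-break element has to be handled explicitly, because it carries no text of its own. A minimal sketch, assuming only the standard ElementTree module; paragraph_text is a hypothetical helper, and it deliberately ignores other special inline elements (such as tabs or repeated spaces) that a fuller reader would need to cover.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
LINE_BREAK = f"{{{TEXT_NS}}}line-break"


def paragraph_text(element):
    """Flatten a paragraph to plain text, turning text:line-break into a newline."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == LINE_BREAK:
            parts.append("\n")
        else:
            # Recurse into nested inline elements such as text:span.
            parts.append(paragraph_text(child))
        # Text that follows the child inside the same parent.
        parts.append(child.tail or "")
    return "".join(parts)


sample = ET.fromstring(
    f'<text:p xmlns:text="{TEXT_NS}">Blah blah blah<text:line-break/>yadda yadda.</text:p>'
)
print(repr(paragraph_text(sample)))  # prints 'Blah blah blah\nyadda yadda.'
```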

Also, the "list item" element above shows a single paragraph element. However, multiple paragraph elements are possible here. (Visually, you get a separated paragraph without an additional bullet or number.) You can get these in Writer in the following way: When you type text in a given list item, pressing ENTER takes you to the next list item; however, immediately pressing BACKSPACE once returns you to the previous list item but leaves you in a new paragraph. (And immediately pressing BACKSPACE a second time takes you out of the list mode entirely, returning you to a new standard paragraph.)

Finally, as far as the content.xml file is concerned, bulleted and numbered lists are both just lists. The formatting style of the given list and list items determines whether a bullet or a number is used to represent it in the interface to the author.
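Because a list item can hold more than one paragraph, code that reads lists is safer returning a list of paragraphs per item rather than a single string. A sketch under that assumption follows; the sample XML is illustrative and list_items is a hypothetical helper. Nothing in it distinguishes bulleted from numbered lists, which matches the point above that the difference lives in the style, not in the content structure.

```python
import xml.etree.ElementTree as ET

NS = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}

# Illustrative list: the first item holds two paragraphs.
LIST_XML = """<text:list xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                         text:style-name="L1">
  <text:list-item>
    <text:p>First item.</text:p>
    <text:p>Second paragraph of the first item.</text:p>
  </text:list-item>
  <text:list-item>
    <text:p>Second item.</text:p>
  </text:list-item>
</text:list>"""


def list_items(list_element):
    """Return one list of paragraph strings per text:list-item."""
    items = []
    for item in list_element.findall("text:list-item", NS):
        items.append(["".join(p.itertext()) for p in item.findall("text:p", NS)])
    return items


print(list_items(ET.fromstring(LIST_XML)))
```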

Formatting Styles

Defined styles

One thing that is not obvious to casual users of Writer is this: The most "correct" way of formatting your content in OpenOffice.org involves the use of pre-defined or user-defined styles in the "Styles and Formatting" tool. From here on, I will simply say "defined style" to mean both pre-defined and user-defined styles. There are style families for page-level entities (such as page and margin sizes), paragraph-level entities (which include headers and titles), character-level entities (such as emphasis on just one word in a paragraph), and list entities. Note there are toolbar buttons in the Styles and Formatting tool window of OpenOffice.org that correspond to these families. The actual definitions of these defined styles (which font to use, what color the background is, and the like) are stored in the styles.xml file.

Moreover, defined styles follow an inheritance model within their family: each defined style is related to a parent style. In nearly object-oriented fashion, a change to a formatting property (such as text color) on a top-level defined style propagates downward through its descendant styles until one of them overrides that property. Direct and controlled use of defined styles can give some nice control and semantic meaning to content that is otherwise totally lost in the hapless application of the usual formatting buttons. Sometimes, it is important to know "why" something is in a red font face, and a defined style preserves and conveys this meaning, not to mention giving you one central location to change all the instances in the document to a blue font face instead.
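The inheritance behavior can be sketched as a walk up the parent chain until some style sets the property in question. The style names "Warning", "Warning Note", and "Calm Note" are invented for illustration, resolve_color is a hypothetical helper, and the lookup ignores the style family for brevity; the element and attribute names (style:style, style:parent-style-name, style:text-properties, fo:color) are the ones the OpenDocument format uses.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Illustrative styles: one parent and two children, one of which overrides the color.
STYLES = f"""<office:styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="{STYLE_NS}" xmlns:fo="{FO_NS}">
  <style:style style:name="Warning" style:family="paragraph">
    <style:text-properties fo:color="#ff0000"/>
  </style:style>
  <style:style style:name="Warning_20_Note" style:family="paragraph"
               style:parent-style-name="Warning"/>
  <style:style style:name="Calm_20_Note" style:family="paragraph"
               style:parent-style-name="Warning">
    <style:text-properties fo:color="#0000ff"/>
  </style:style>
</office:styles>"""


def resolve_color(styles_root, style_name):
    """Walk up the parent chain until a style sets fo:color; None if none does."""
    by_name = {
        style.get(f"{{{STYLE_NS}}}name"): style
        for style in styles_root.iter(f"{{{STYLE_NS}}}style")
    }
    seen = set()  # guards against a malformed, circular parent chain
    while style_name in by_name and style_name not in seen:
        seen.add(style_name)
        style = by_name[style_name]
        props = style.find(f"{{{STYLE_NS}}}text-properties")
        if props is not None and props.get(f"{{{FO_NS}}}color"):
            return props.get(f"{{{FO_NS}}}color")
        style_name = style.get(f"{{{STYLE_NS}}}parent-style-name")
    return None


root = ET.fromstring(STYLES)
print(resolve_color(root, "Warning_20_Note"))  # prints #ff0000 (inherited)
print(resolve_color(root, "Calm_20_Note"))     # prints #0000ff (overridden)
```

Changing the color on "Warning" alone would change the resolved value for "Warning_20_Note" but not for "Calm_20_Note", which is the propagation described above.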

The name of the defined style in the styles.xml file generally matches the name the author sees in OpenOffice.org's interface. The biggest caveat to that statement is that special characters, including the space character, are converted to their hex-code value representation, which is then surrounded by underscores. (There is also the notable exception that the style seen as "Default" in the user interface is labeled as "Standard".) In other words, if an author applied the "Heading 1" defined style to a paragraph of text, the content.xml file would include:

<text:h text:style-name="Heading_20_1" ...>Your Heading Here</text:h>
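The name conversion can be approximated in a few lines. This is a sketch under a stated assumption: it treats every character other than an ASCII letter or digit as "special", which may be stricter than what Writer actually does. Both function names are hypothetical.

```python
import re


def encode_style_name(display_name):
    """Replace each character other than ASCII letters and digits with _<hex>_.

    Assumption: which characters Writer leaves untouched is a guess here.
    """
    return "".join(
        ch if ch.isascii() and ch.isalnum() else f"_{ord(ch):x}_"
        for ch in display_name
    )


def decode_style_name(internal_name):
    """Turn _<hex>_ runs back into the characters they stand for."""
    return re.sub(
        r"_([0-9a-fA-F]+)_",
        lambda match: chr(int(match.group(1), 16)),
        internal_name,
    )


print(encode_style_name("Heading 1"))     # prints Heading_20_1
print(decode_style_name("Heading_20_1"))  # prints Heading 1
```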

The "Heading_20_1" style, as mentioned above, is detailed in the styles.xml file. These details include:

* That it belongs to the "paragraph" family.
* That its parent style is a style simply

Digging Beyond the Surface

As may be rather evident, the above discussion barely scratches the surface of the ODT file format. However, with the knowledge I have hopefully imparted thus far, you should have little trouble "reverse engineering" the parts you need for yourself: Simply start with a blank document, add (only) the element you would like to understand, such as a table, and then save the document, unzip the ODT file, and open the content.xml file in an editor. Search through the file for a piece of text you inserted and then pick things apart. I assembled a thin folder of printouts relating to how an ODT file implements various aspects of a document; I have found the ODT file to be remarkably accessible.
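The "search for a piece of text and pick things apart" step can be partly automated. The sketch below reports the chain of elements leading to any element whose own text contains a search string. The table sample is abridged and illustrative (a table saved by Writer carries more attributes and column definitions), "FINDME" is just a marker string, and paths_to_text is a hypothetical helper that only inspects text directly inside an element, not text following a nested child.

```python
import xml.etree.ElementTree as ET

# Abridged, illustrative body containing a one-cell table.
SAMPLE = """<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <table:table table:name="Table1">
    <table:table-row>
      <table:table-cell><text:p>FINDME</text:p></table:table-cell>
    </table:table-row>
  </table:table>
</office:text>"""


def paths_to_text(content_xml, needle):
    """Return the element-name chain leading to each element whose text contains needle."""
    root = ET.fromstring(content_xml)
    # ElementTree has no parent pointers, so build a child-to-parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for element in root.iter():
        if needle in (element.text or ""):
            chain = []
            node = element
            while node is not None:
                chain.append(node.tag.rsplit("}", 1)[-1])
                node = parent_of.get(node)
            hits.append("/".join(reversed(chain)))
    return hits


print(paths_to_text(SAMPLE, "FINDME"))
# prints ['text/table/table-row/table-cell/p']
```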

Real-World Applications

At my workplace, the content material we manage, which turns into deliverable PDF files for customers, is stored in DITA XML topic files. (DITA stands for Darwin Information Typing Architecture.) This topic-based storage and management has served our in-house tech editors rather well, affording such things as content re-use, single-sourcing, and conditional content filtering. However, the content owners, those technical people closer to the product itself who own and author the raw content material, are not proficient with the DITA schemas—nor should they be.

To address the gap, we implemented an XSLT-driven process that converts the DITA material that was sent into the PDF files into a fresh "content.xml" file. This lone content.xml file is then zipped up into an otherwise blank ODT file and is made available for the content owners as part of the nightly build process. The content owners then are able to easily make edits to their content and, thanks to Writer's change-tracking feature, the tech editors can locate these changes, finesse the grammar and wording, and incorporate the changes back into the DITA XML files. I will admit right now that this particular XSLT was very complex to write—it involved step-debugging a highly recursive transform—and does not fully support the look and feel of the material as seen in the PDF file. But, for in-house editing purposes, this isn't strictly relevant: the content owners can review and approve the finalized appearance from the same PDF files that are delivered to customers.

Alternatively, you may have a process that you need to take in the other direction. Another use case involves creating a true "template" in Writer (an "ott" file) that can be given to a working group as a rather fancy fill-in form. The material in the resulting content.xml file can be scanned for (either by direct location, or by some applied style name, or via other mechanisms) and converted out to some other format. Consider the possibility of turning a "requirements" Writer document into a working skeleton for some test cases.

Closing Remarks

Hopefully, I have shown you just enough of OpenOffice.org's Writer file format to open up some possibilities for you to use it in new ways. By taking a blank document or document template, you can edit or replace the body section of the embedded content.xml file. By taking an existing document, you can find the content and transform it for other purposes. As the material inside the ODT file is readable text and reasonably well-structured XML, it is wide open to full, external, programmatic assault. Being able to take control of your own documents in this fashion is nothing short of powerful.

About the Author

Rob Lybarger works in a small IT shop in the greater Houston, TX area. Among other duties related to Ant and Java, he wrote the XSLT mentioned in the Real-World Applications section of this article and also performed various in-house customizations of the stock DITA processing and formatting stylesheets. At home, Rob enjoys spending time with his four-month-old daughter and being a highly satisfied owner of a Mac computer.

Author: Rob Lybarger
