A Brief Introduction to OpenOffice.org Writer Files

Get started on the path to taking programmatic control of your files and adding value to OpenOffice.org's no-cost price tag by using XML.

Originally published at Internet.com


OpenOffice.org Writer, a no-cost, open-source answer to otherwise pricey commercial word processing applications, stores its files with an "odt" extension. Even users with casual familiarity with Writer may be surprised to learn that this file is nothing more than a standard "zip" archive full of XML files. The implication is this: armed with a little knowledge of these internal files, you can create and edit them programmatically.

In this article, I will discuss some of the basic concepts relating to the ODT file itself. I will not discuss how to use OpenOffice.org Writer; the lessons needed to become proficient with any word processing application would fill a book.

Opening the ODT File

To get started, you of course need an ODT file. Once you have one, unzipping the file gives you (among other things) four XML files:

* content.xml: The actual content of a document.
* meta.xml: Meta-data such as creation date, editor, and statistics (word count, and so forth).
* settings.xml: OpenOffice.org program settings and preferences local to the document itself.
* styles.xml: Formatting styles (for paragraphs, characters, and so on) defined by OpenOffice.org and by the author.
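Because the ODT file is an ordinary zip archive, the unzipping step can also be done from a script. The following is a minimal sketch using Python's standard zipfile module; the function names are my own, chosen for illustration.

```python
import zipfile

def list_odt_members(odt_path):
    """Return the names of all entries stored in the ODT (zip) archive."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.namelist()

def read_odt_member(odt_path, member="content.xml"):
    """Return one entry of the archive as bytes (content.xml by default)."""
    with zipfile.ZipFile(odt_path) as odt:
        return odt.read(member)
```

Calling list_odt_members("report.odt") on a document of your own (the file name here is hypothetical) should show the four XML files listed above along with whatever else the document carries.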

Additional files and directories will turn up (some of them depending on just what is in the document, such as a possible "Pictures" directory). However, for the purposes of this article, I will mainly discuss the content.xml and styles.xml files. The main reason for dismissing the presence of the additional files is this: If you're reading or editing material in an existing ODT file, their presence does not generally matter anyway, and if you are creating a new document programmatically, the simplest way is to "edit" an empty template file and save it to a new name. This is perhaps the safest strategy to use because it lets you focus entirely on the content of the document you want to process.
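The "edit an empty template and save it to a new name" strategy might look roughly like the sketch below. It copies every entry of a template archive into a new archive and swaps in a replacement content.xml. The function name and its parameters are hypothetical. Entries are copied in their original order with their original zip metadata because ODF packages conventionally keep the "mimetype" entry first and uncompressed.

```python
import zipfile

def write_odt_from_template(template_path, output_path, new_content_xml):
    """Copy the template archive to output_path, substituting
    new_content_xml (bytes) for the template's content.xml."""
    with zipfile.ZipFile(template_path) as src, \
         zipfile.ZipFile(output_path, "w") as dst:
        for info in src.infolist():
            if info.filename == "content.xml":
                data = new_content_xml
            else:
                data = src.read(info.filename)
            # Reusing the template's ZipInfo keeps the entry name, order,
            # and compression method the same as in the template.
            dst.writestr(info, data)
```

The template itself is left untouched, so the same blank document can serve as the starting point for any number of generated files.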

Content

The content.xml file, as mentioned above, is the real meat of an ODT file: Your actual document material is stored in this file. Feel free to open the content.xml file in any text or, better yet, XML editor you have available. You should see something similar to this:

    <office:document-content ...>
      ...
      <office:body>
        <office:text>
          ...
        </office:text>
      </office:body>
    </office:document-content>

Note that I am taking some liberties to omit the elements I won't discuss, and I am sure you will forgive me for not listing all twenty-two namespace declarations in the root element. Once you have navigated to this point, note that the XML structure of your actual document material is stored in just a handful of "block-level" elements:

    <text:h ...>Your Heading Here</text:h>
    <text:p ...>Some paragraph of stuff here.</text:p>
    <text:list ...>
      <text:list-item>
        <text:p ...>First list item text here.</text:p>
      </text:list-item>
    </text:list>

The "inline-level" (that is, the character-level) formatting entities occur in span elements:

    <text:span ...>Some fancy text here.</text:span>

Aside from the container elements for tables and images, you know enough of the basic document structure to read or edit an OpenOffice.org file. Granted, there are still a few things worth knowing. Notably, in the above snippet examples, the ellipses are omitting the rather important "text:style-name" attribute that defines the formatting information for the material contained in that element.
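To see the block-level elements and their "text:style-name" attributes without reading the raw XML by eye, something like the following sketch may help. It uses Python's standard xml.etree.ElementTree module and the OpenDocument "office" and "text" namespace URIs; the function name is my own. A real content.xml usually holds a few housekeeping elements under office:text as well, and this sketch reports those too.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}
STYLE_NAME = "{%s}style-name" % NS["text"]

def block_elements(content_xml):
    """Yield (local tag, style name, text) for each direct child of
    the office:text element in content.xml."""
    root = ET.fromstring(content_xml)
    body = root.find("office:body/office:text", NS)
    for child in body:
        tag = child.tag.split("}", 1)[-1]   # drop the "{namespace}" prefix
        text = "".join(child.itertext())    # includes text inside spans
        yield tag, child.get(STYLE_NAME), text
```

Headings show up with the tag "h", paragraphs with "p", and lists with "list"; the second value is None for elements that carry no style name.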

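A paragraph may also contain a line break, which is stored as an empty line-break element inside the paragraph:

    <text:p ...>Blah blah blah<text:line-break/>yadda yadda.</text:p>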

You can get a line break in Writer by pressing SHIFT+ENTER instead of just ENTER.
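When pulling text out of a paragraph, a line break is easy to lose because the text:line-break element has no text of its own. Here is a small sketch, assuming a paragraph element already parsed with xml.etree.ElementTree, that renders each line break as a newline character; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
LINE_BREAK = "{%s}line-break" % TEXT_NS

def paragraph_text(elem):
    """Return the text of a paragraph element, with every
    text:line-break rendered as a newline character."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == LINE_BREAK:
            parts.append("\n")
        else:
            parts.append(paragraph_text(child))  # e.g. a nested text:span
        parts.append(child.tail or "")           # text following the child
    return "".join(parts)
```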

Also, the "list item" element above shows a single paragraph element. However, multiple paragraph elements are possible here. (Visually, you get a separated paragraph without an additional bullet or number.) You can get these in Writer in the following way: When you type text in a given list item, pressing ENTER takes you to the next list item; however, immediately pressing BACKSPACE once returns you to the previous list item but leaves you in a new paragraph. (And immediately pressing BACKSPACE a second time takes you out of the list mode entirely, returning you to a new standard paragraph.)

Finally, as far as the content.xml file is concerned, bulleted and numbered lists are both just lists. The formatting style of the given list and list items determines whether a bullet or a number is used to represent it in the interface to the author.
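Since a list item can hold more than one paragraph, code that reads lists should expect a list of paragraphs per item rather than a single string. A rough sketch, assuming a text:list element already parsed with xml.etree.ElementTree (the function name is my own, and nested lists are ignored):

```python
import xml.etree.ElementTree as ET

NS = {"text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0"}

def list_items(list_elem):
    """Return one entry per text:list-item; each entry is a list with
    the text of every paragraph directly inside that item."""
    items = []
    for item in list_elem.findall("text:list-item", NS):
        paragraphs = item.findall("text:p", NS)
        items.append(["".join(p.itertext()) for p in paragraphs])
    return items
```

The same function works for bulleted and numbered lists, because the difference between the two lives in the list's style and not in its element structure.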

Formatting Styles

Defined styles

One thing that is not obvious to casual users of Writer is this: The most "correct" way of formatting your content in OpenOffice.org involves the use of pre-defined or user-defined styles in the "Styles and Formatting" tool. From here on, I will simply say "defined style" to mean both pre-defined and user-defined styles. There are style families for page-level entities (such as page and margin sizes), paragraph-level entities (which include headers and titles), character-level entities (such as emphasis on just one word in a paragraph), and list entities. Note that there are toolbar buttons in the Styles and Formatting tool window of OpenOffice.org that correspond to these families. The actual definitions of these defined styles (which font to use, what color the background is, and the like) are stored in the styles.xml file.

Moreover, defined styles behave in an inheritance fashion within their family: A given defined style is related to a parent style. In nearly object-oriented fashion, a change to a top-level defined style formatting entity (such as text color) propagates downward through related descendant styles until those descendant styles override that formatting entity. Direct and controlled use of defined styles can give some nice control and semantic meaning to content that is otherwise totally lost in the hapless application of the usual formatting buttons. Sometimes, it is important to know "why" something is in a red font face, and a defined style preserves and conveys this meaning ... not to mention gives you one central location to change all the instances in the document to a blue font face instead.
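The inheritance behavior can be sketched in code as a walk up the chain of parent styles until one of them sets the property in question. The sketch below assumes the usual OpenDocument layout of styles.xml: style:style elements carrying style:name, style:family, and style:parent-style-name attributes, with character formatting held in a style:text-properties child. The function name is hypothetical, and the sketch ignores the family-wide default style that applies when no style in the chain sets the property.

```python
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

def _q(ns, name):
    """Build the "{namespace}name" form that ElementTree uses."""
    return "{%s}%s" % (ns, name)

COLOR = _q(FO_NS, "color")  # the text color attribute, fo:color

def resolve_text_property(styles_xml, style_name, family, prop):
    """Follow style:parent-style-name links until a style sets `prop`
    in its style:text-properties child; return None if none does."""
    root = ET.fromstring(styles_xml)
    styles = {
        (s.get(_q(STYLE_NS, "family")), s.get(_q(STYLE_NS, "name"))): s
        for s in root.iter(_q(STYLE_NS, "style"))
    }
    seen = set()
    while style_name is not None and style_name not in seen:
        seen.add(style_name)  # guard against a circular parent chain
        style = styles.get((family, style_name))
        if style is None:
            return None
        props = style.find(_q(STYLE_NS, "text-properties"))
        if props is not None and props.get(prop) is not None:
            return props.get(prop)  # nearest style in the chain wins
        style_name = style.get(_q(STYLE_NS, "parent-style-name"))
    return None
```

A call such as resolve_text_property(styles_xml, "Heading_20_1", "paragraph", COLOR) returns the color set by the nearest style in the chain, which is the override rule described above.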

The name of the defined style in the styles.xml file generally matches the name the author sees in OpenOffice.org's interface. The biggest caveat to that statement is that special characters, including the space character, are converted to their hex-code value representation, which is then surrounded by underscores. (There is also the notable exception that the style seen as "Default" in the user interface is labeled as "Standard".) In other words, if an author applied the "Heading 1" defined style to a paragraph of text, the content.xml file would include:

    <text:h text:style-name="Heading_20_1" ...>Your Heading Here</text:h>
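The name conversion is simple enough to sketch in a few lines. Decoding follows directly from the rule above: a run of hex digits between underscores becomes the character with that code. Encoding is shakier, because exactly which characters OpenOffice.org counts as "special" is not spelled out here; the sketch assumes everything other than an ASCII letter or digit is encoded, which may be broader than what the application really does. Both function names are my own.

```python
import re

def decode_style_name(encoded):
    """Turn an internal name such as "Heading_20_1" into "Heading 1"."""
    return re.sub(r"_([0-9a-fA-F]+)_",
                  lambda m: chr(int(m.group(1), 16)),
                  encoded)

def encode_style_name(display):
    """Assumption: any character other than an ASCII letter or digit
    counts as "special" and is written as _<hex code>_."""
    return "".join(
        c if (c.isascii() and c.isalnum()) else "_%x_" % ord(c)
        for c in display
    )
```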

The "Heading_20_1" style itself, as mentioned above, is detailed in the styles.xml file. These details include:

* That it belongs to the "paragraph" family.
* That its parent style is a style simply

Digging Beyond the Surface

As may be rather evident, the above discussion barely scratches the surface of the ODT file format. However, with the knowledge I have hopefully imparted thus far, you should have little trouble "reverse engineering" the parts you need for yourself: Simply start with a blank document, add (only) the element you would like to understand, such as a table, and then save the document, unzip the ODT file, and open the content.xml file in an editor. Search through the file for a piece of text you inserted and then pick things apart. I assembled a thin folder of printouts relating to how an ODT file implements various aspects of a document; I have found the ODT file to be remarkably accessible.
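The "search for a piece of text you inserted" step can be scripted as well. The sketch below, with a function name of my own choosing, reports the chain of element names leading to each element whose direct text contains the search string, which gives a quick picture of how a feature is nested. Namespace prefixes are dropped for readability.

```python
import xml.etree.ElementTree as ET

def find_text_paths(content_xml, needle):
    """Return a "/"-joined path of local element names for every
    element whose direct text contains `needle`."""
    def local(tag):
        return tag.split("}", 1)[-1]  # drop the "{namespace}" prefix

    hits = []

    def walk(elem, path):
        path = path + [local(elem.tag)]
        if needle in (elem.text or ""):
            hits.append("/".join(path))
        for child in elem:
            walk(child, path)
            # Text that follows a child element belongs to the parent.
            if needle in (child.tail or ""):
                hits.append("/".join(path))

    walk(ET.fromstring(content_xml), [])
    return hits
```

Typing a distinctive marker word into, say, a table cell and then searching for it shows every container element that Writer wrapped around that cell.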

Real-World Applications

At my workplace, the content material we manage, which turns into deliverable PDF files for customers, is stored in DITA XML topic files. (DITA stands for Darwin Information Typing Architecture.) This topic-based storage and management has served our in-house tech editors rather well, affording such things as content re-use, single-sourcing, and conditional content filtering. However, the content owners, those technical people closer to the product itself who own and author the raw content material, are not proficient with the DITA schemas—nor should they be.

To address the gap, we implemented an XSLT-driven process that converts the same DITA material that feeds the PDF files into a fresh "content.xml" file. This lone content.xml file is then zipped up into an otherwise blank ODT file and is made available to the content owners as part of the nightly build process. The content owners are then able to easily make edits to their content and, thanks to Writer's change-tracking feature, the tech editors can locate these changes, finesse the grammar and wording, and incorporate the changes back into the DITA XML files. I will admit right now that this particular XSLT was very complex to write (it involved step-debugging a highly recursive transform) and does not fully reproduce the look and feel of the material as seen in the PDF file. But, for in-house editing purposes, this isn't strictly relevant: the content owners can review and approve the finalized appearance from the same PDF files that are delivered to customers.

Alternatively, you may have a process that you need to take in the other direction. Another use case involves creating a true "template" in Writer (an "ott" file) that can be given to a working group as a rather fancy fill-in form. The material in the resulting content.xml file can be scanned for (either by direct location, or by some applied style name, or via other mechanisms) and converted out to some other format. Consider the possibility of turning a "requirements" Writer document into a working skeleton for some test cases.

Closing Remarks

Hopefully, I have shown you just enough of OpenOffice.org's Writer file format to open up some possibilities for you to use it in new ways. By taking a blank document or document template, you can edit or replace the body section of the embedded content.xml file. By taking an existing document, you can find the content and transform it for other purposes. As the material inside the ODT file is readable text and reasonably well-structured XML, it is wide open to full, external, programmatic assault. Being able to take control of your own documents in this fashion is nothing short of powerful.

About the Author

Rob Lybarger works in a small IT shop in the greater Houston, TX area. Among other duties related to Ant and Java, he wrote the XSLT mentioned in the Real-World Applications section of this article and has also performed various in-house customizations of the stock DITA processing and formatting stylesheets. At home, Rob enjoys spending time with his four-month-old daughter and being a highly satisfied owner of a Mac computer.

Author: Rob Lybarger
