Open-source News

How ODT files are structured

opensource.com - Mon, 08/15/2022 - 15:00
Jim Hall - Mon, 08/15/2022 - 03:00

Word processing files used to be closed, proprietary formats. In some older word processors, the document file was essentially a memory dump from the word processor. While this made for faster loading of the document into the word processor, it also made the document file format an opaque mess.

Around 2005, the Organization for the Advancement of Structured Information Standards (OASIS) defined an open format for office documents of all types, the Open Document Format for Office Applications (ODF). You may also see ODF referred to as simply "OpenDocument Format." It is an open standard based on the OpenOffice.org XML file specification. ODF includes several file types, including ODT for OpenDocument Text documents. There's a lot to explore in an ODT file, and it starts with a zip file.

Zip structure

Like all ODF files, ODT is actually an XML document and other files wrapped in a zip file container. Using zip means files take less room on disk, but it also means you can use standard zip tools to examine an ODF file.

I have an article about IT leadership called "Nibbled to death by ducks" that I saved as an ODT file. Since this is an ODF file, which is a zip file container, you can use unzip from the command line to examine it:

$ unzip -l 'Nibbled to death by ducks.odt'
Archive:  Nibbled to death by ducks.odt
   Length  Date        Time   Name
       39  07-15-2022  22:18  mimetype
    12713  07-15-2022  22:18  Thumbnails/thumbnail.png
   915001  07-15-2022  22:18  Pictures/10000201000004500000026DBF6636B0B9352031.png
    10879  07-15-2022  22:18  content.xml
    20048  07-15-2022  22:18  styles.xml
     9576  07-15-2022  22:18  settings.xml
      757  07-15-2022  22:18  meta.xml
      260  07-15-2022  22:18  manifest.rdf
        0  07-15-2022  22:18  Configurations2/accelerator/
        0  07-15-2022  22:18  Configurations2/toolpanel/
        0  07-15-2022  22:18  Configurations2/statusbar/
        0  07-15-2022  22:18  Configurations2/progressbar/
        0  07-15-2022  22:18  Configurations2/toolbar/
        0  07-15-2022  22:18  Configurations2/popupmenu/
        0  07-15-2022  22:18  Configurations2/floater/
        0  07-15-2022  22:18  Configurations2/menubar/
     1192  07-15-2022  22:18  META-INF/manifest.xml
   970465                     17 files
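
The same kind of listing can be produced programmatically. The following is a minimal sketch using Python's standard zipfile module. The helper name list_odf_entries and the tiny in-memory stand-in archive are assumptions made for illustration, not part of the original article; with a real document you would pass the path to the .odt file instead.

```python
import io
import zipfile


def list_odf_entries(source):
    """Return (size, name) pairs for every entry in an ODF zip container.

    `source` may be a path to an .odt file or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return [(info.file_size, info.filename) for info in archive.infolist()]


# Tiny stand-in archive (hypothetical) so the sketch runs without a real document.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    archive.writestr("content.xml", "<office:document-content/>")

entries = list_odf_entries(sample)
for size, name in entries:
    print(f"{size:>9}  {name}")
```

Note that the mimetype entry in the stand-in archive comes out at 39 bytes, the same size shown in the unzip listing above.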

I want to highlight a few elements of the zip file structure:

  1. The mimetype file contains a single line that defines the ODF document. Programs that process ODT files, such as a word processor, can use this file to verify the MIME type of the document. For an ODT file, this should always be:
     application/vnd.oasis.opendocument.text
  2. The META-INF directory has a single manifest.xml file in it. This file contains all the information about where to find other components of the ODT file. Any program that reads ODT files starts with this file to locate everything else. For example, the manifest.xml file for my ODT document contains this line that defines where to find the main content:
     <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
  3. The content.xml file contains the actual content of the document.
  4. My document includes a single screenshot, which is contained in the Pictures directory.
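
To make the roles of the mimetype and manifest.xml files concrete, here is a rough sketch of how a program might read them with Python's standard library. The function name read_manifest and the small sample archive are hypothetical and only stand in for a real document; the namespace URI is the one the manifest: prefix is bound to in ODF manifest files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "manifest:" prefix in META-INF/manifest.xml
MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def read_manifest(source):
    """Return (mimetype, {full_path: media_type}) for an ODF container."""
    with zipfile.ZipFile(source) as archive:
        mimetype = archive.read("mimetype").decode("ascii").strip()
        root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    entries = {}
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        path = entry.get(f"{{{MANIFEST_NS}}}full-path")
        entries[path] = entry.get(f"{{{MANIFEST_NS}}}media-type")
    return mimetype, entries


# Hypothetical stand-in manifest so the sketch runs without a real document.
SAMPLE_MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
 <manifest:file-entry manifest:full-path="/" manifest:media-type="{ODT_MIMETYPE}"/>
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>"""

sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("mimetype", ODT_MIMETYPE)
    archive.writestr("META-INF/manifest.xml", SAMPLE_MANIFEST)
    archive.writestr("content.xml", "<office:document-content/>")

mimetype, entries = read_manifest(sample)
print(mimetype == ODT_MIMETYPE)   # is this an OpenDocument Text file?
print(entries["content.xml"])     # media type recorded for the main content
```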

Extracting files from an ODT file

Because the ODT document is just a zip file with a specific structure to it, you can extract files from it. You can start by unzipping the entire ODT file, such as with this unzip command:

$ unzip -q 'Nibbled to death by ducks.odt' -d Nibbled

A colleague recently asked for a copy of the image that I included in my article. I was able to find the exact location of every embedded image by looking in the META-INF/manifest.xml file. The grep command can display any lines that describe an image:

$ cd Nibbled
$ grep image META-INF/manifest.xml
<manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
<manifest:file-entry manifest:full-path="Pictures/10000201000004500000026DBF6636B0B9352031.png" manifest:media-type="image/png"/>

The image I'm looking for is saved in the Pictures folder. You can verify that by listing the contents of the directory:

$ ls -F
Configurations2/ manifest.rdf meta.xml Pictures/ styles.xml
content.xml META-INF/ mimetype settings.xml Thumbnails/

And here it is:

(Image by Jim Hall, CC BY-SA 4.0)
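
The manifest lookup and extraction steps described in this section can also be combined in a short script. This is only a sketch under stated assumptions: the helper names find_images and read_images, the file name Pictures/example.png, and the fake image bytes are all made up for illustration. It selects every manifest entry whose media type starts with "image/" and reads the matching bytes straight from the zip container.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def find_images(archive):
    """Return the paths of all manifest entries with an image/* media type."""
    root = ET.fromstring(archive.read("META-INF/manifest.xml"))
    paths = []
    for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry"):
        media_type = entry.get(f"{{{MANIFEST_NS}}}media-type", "")
        if media_type.startswith("image/"):
            paths.append(entry.get(f"{{{MANIFEST_NS}}}full-path"))
    return paths


def read_images(source):
    """Return {path: bytes} for every image listed in the manifest."""
    with zipfile.ZipFile(source) as archive:
        return {path: archive.read(path) for path in find_images(archive)}


# Hypothetical stand-in archive so the sketch runs without a real document.
SAMPLE_MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="{MANIFEST_NS}">
 <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
 <manifest:file-entry manifest:full-path="Thumbnails/thumbnail.png" manifest:media-type="image/png"/>
 <manifest:file-entry manifest:full-path="Pictures/example.png" manifest:media-type="image/png"/>
</manifest:manifest>"""

sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("META-INF/manifest.xml", SAMPLE_MANIFEST)
    archive.writestr("content.xml", "<office:document-content/>")
    archive.writestr("Thumbnails/thumbnail.png", b"fake thumbnail bytes")
    archive.writestr("Pictures/example.png", b"fake image bytes")

images = read_images(sample)
for path, data in images.items():
    print(path, len(data), "bytes")
```

To save an image to disk, you could write the returned bytes to a file of your choosing, for example with pathlib.Path.write_bytes.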

OpenDocument Format

OpenDocument Format (ODF) files are an open file format that can describe word processing files (ODT), spreadsheet files (ODS), presentations (ODP), and other file types. Because ODF files are based on open standards, you can use other tools to examine them and even extract data from them. You just need to know where to start. All ODF files start with the META-INF/manifest.xml file, which is the "root" or "bootstrap" file for the rest of the ODF file format. Once you know where to look, you can find the rest of the content.

Image by: Jonas Leupe on Unsplash

This work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.

Linux 6.0-rc1 Released With Exciting Performance Optimizations, New Hardware Support

Phoronix - Mon, 08/15/2022 - 07:30
After the two week long merge window, Linus Torvalds this afternoon released the first release candidate of Linux 6.0. Over the next roughly two months the Linux 6.0 kernel will stabilize but already from my early testing on various systems it is in nice shape and the features and performance are looking great...

Intel Xeon Platinum 8380 Performance Is Looking Great For Linux 6.0

Phoronix - Sun, 08/14/2022 - 18:16
Not only is the AMD EPYC performance looking real good for Linux 6.0, but many of the scheduler changes and common kernel improvements also carry over well for Intel's Xeon Platinum 8380 "Ice Lake" server processors too. For your viewing pleasure this weekend are some initial benchmarks looking at Linux 5.19 stable compared to Linux 6.0 Git as we approach the end of the merge window...

Ubuntu Linux Preparing systemd-hwe To Ease OEM Hardware Enablement

Phoronix - Sun, 08/14/2022 - 17:35
The systemd-hwe package is being prepared for Ubuntu 22.10, and presumably will be back-ported to future Ubuntu 22.04 LTS point releases, to more easily deal with updated hardware rules as part of new device enablement...

Wine-Staging 7.15 Released - Currently At 536 Patches Atop Upstream Wine

Phoronix - Sun, 08/14/2022 - 17:10
Following yesterday's belated release of Wine 7.15, Wine-Staging 7.15 is now available and continues to carry hundreds of extra testing/experimental patches atop upstream Wine for bug fixes and other features to empower Windows games and applications on Linux...
