This is the third and final post on the topic of the artificial complexity of the OOXML format. This complexity is the result of careful design aimed at preventing interoperability. Developers have to deal with a veritable “maze” of tags, even for the simplest content. This binds users to the Microsoft ecosystem, providing the first example of standard-based lock-in.
The PPTX case
To demonstrate the difference in complexity between Impress and PowerPoint XML schemas in ODF and OOXML formats, I created a simple eight-slide presentation summarising the most common types: title and subtitle, centred text, bulleted list, table, vector image, photo, colour graphics and video. I created the same file using both software programmes, starting from a basic template without a background to prevent interference with the slide format and, consequently, the XML schema.
This is the PDF file of the presentation (the first seven slides are identical, while the video on the last slide is replaced by a static image):
slideloTo perform the analysis, I duplicated and renamed the two files, replacing the original extension with ‘ZIP’, and then unzipped them to create two folders containing all the files from their respective XML schemas.
The LibreOffice folder is very similar to the one created by Calc and Writer, containing five subfolders – three of which are identical to those of the ODS and ODT files – and five files. manifest.rdf is missing, but all the others are present and have the same characteristics. The Media and Pictures folders were added to contain multimedia content and images. Once again, all the content is located in the content.xml file, while the other files contain instructions for displaying the slides correctly and for displaying elements other than text.
Therefore, despite the diversity of the three applications (Calc, Writer and Impress), we are faced with an extremely consistent internal file structure, as one would expect from a standard that aims to simplify life for users and developers. This consistency is a key benefit and simplifies the reproducibility of the standard format.
The Microsoft 365 folder contains three subfolders and the [Content_Types].xml file, as with the XLSX and DOCX files examined in the last two weeks. One of the folders has a different name, but this relates to the application and does not increase complexity. Opening the ‘[Content_Types].xml’ file provides information about the other files, including those in the subfolders.
In this case, the content is located in eight slide*.xml files (where the asterisk is a sequential number) inside the slide folder within the ppt folder. These folders and files are completely different to those in the XLSX and DOCX files, but information relevant to displaying slides on screen is scattered throughout them. Again, there are no technical reasons for the differences in the XML schemas of the three files other than to make their internal structures more complex. This unnecessary complexity is also reflected in the XML files describing the contents of the presentation.
The ODP file
As with …