This week we announced a groundbreaking update to Mako Core™, our popular software development kit (SDK) for developers specializing in print document processing and manipulation.
Mako now includes a game-changing intelligent Document Object Model (iDOM) that provides unparalleled access to every aspect of a document’s composition, including fonts, images, vector art, layers and metadata. Unlike other SDKs that offer generic support for various Page Description Languages such as PDF, PS, OXPS, and PCL across print and digital document domains, Mako is sharply focused on print.
Recently I was talking to one of our Mako™ partners, Actino Software, who are based in Germany. Actino have successfully built many document solutions for their customers with Mako, and a recent example was for a large German crane manufacturer with subsidiaries worldwide. The design, manufacture and maintenance of cranes requires a lot of documentation, and the company estimates they manage around 100,000 documents, mainly in PDF.
There are numerous challenges to managing such a large corpus of documents, and one of those is the quality of those documents. I’m not talking here about the content, but rather the way in which the PDF was produced. Many different authoring and PDF creation tools are in use, some going back decades, resulting in a wide variation in the way PDFs are constructed. This is where Mako comes in.
For this project, Actino Software built centralized services that analyze and optimize the PDFs and prepare them for secure delivery. Mako does this quickly, even for 400 MB files with many thousands of pages. The analysis function identifies the number of pages, bookmarks, fonts, embedded attachments and other features, including document metadata (title, subject, etc.). The optimize function removes invalid hyperlinks, changes named destinations into working hyperlinks, and resaves the file to reduce its size. In this particular case, Mako reduced the file size by more than 50% thanks to image resampling. Mako’s built-in font optimization (eliminating duplicate fonts and merging font subsets) is another way it can reduce file size.
This work supports two DRM (Digital Rights Management) workflows:
• Distribution via a portable storage device such as a thumb drive, supported by a plug-in to Adobe Acrobat Reader that talks back to the company website for authentication.
• Download from a secure website, for which small file size is a must.
Actino were able to meet the customer requirements with a rapid development schedule, based on their understanding of document workflows and their familiarity with the Mako Core SDK. Their customer chose the Actino solution over an Adobe DRM solution, which was considered too expensive and rigid.
Last week, my colleague David Stevenson and I ignored all common sense and ran a live coding demo using Mako™. What could have gone wrong!?
Mako is a software development kit that can be used to add a variety of functions into software products, which is why it’s often referred to as the software engineer’s Swiss Army knife.
During the demo we showed how to use Mako to modernize your print infrastructure in three simple ways:
Firstly, we looked at modernization through library consolidation and showed how you can operate on multiple PDLs including PDF, PCL, PostScript® and XPS, all using a single Mako SDK library. We then looked at adopting automated workflows with Mako and demonstrated how to analyze and redact text automatically in a PDF, using Mako’s layout analysis and text search capabilities. Finally, we showed how you can make the most of print infrastructure-as-a-service by integrating Mako with Microsoft’s Universal Print, including modifying and redirecting print jobs.
Thankfully, nothing did go wrong and if you missed it, don’t worry. We recorded everything and you can watch the recording above on demand.
In this week’s post, Global Graphics Software’s principal engineer, Andrew Cardy, explores the structure tagging API in the Mako™ Core SDK. This feature is particularly valuable as it allows developers to create PDFs that can be read by screen readers, such as Jaws®. This helps blind or partially sighted users unlock the content of a PDF. Here, Andy explains how to use the structure tagging API in Mako to tag both text and images:
What can we Structure Tag?
Before I begin, let’s talk about PDF: PDF is a fixed-format document. This means you can create it once, and it should (aside from font embedding or rendering issues) look identical across machines. This is obviously a great thing for ensuring your document looks great on your user’s devices, but the downside is that some PDF generators can create fixed content that is ordered in a way that is hard for screen readers to understand.
Luckily Mako also has an API for page layout analysis. This API will analyze the structure of the PDF, and using various heuristics and techniques, will group the text on the page together in horizontal runs and vertical columns. It’ll then assign a page reading order.
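To make the idea of runs, columns and reading order concrete, here is a minimal sketch of one naive heuristic. It is not Mako's algorithm: the TextBox type, the overlap rule and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical text run with a bounding box (not a Mako type)."""
    text: str
    x0: float   # left edge
    x1: float   # right edge
    top: float  # distance from the top of the page


def reading_order(boxes):
    """Group boxes into columns by horizontal overlap, then read each column top to bottom."""
    columns = []  # each column is a list of TextBox
    for box in sorted(boxes, key=lambda b: b.x0):
        for col in columns:
            col_x0 = min(b.x0 for b in col)
            col_x1 = max(b.x1 for b in col)
            # A box joins a column when its horizontal extent overlaps the column's.
            if box.x0 < col_x1 and box.x1 > col_x0:
                col.append(box)
                break
        else:
            # No existing column overlaps, so this box starts a new one.
            columns.append([box])
    # Columns run left to right; boxes run top to bottom within a column.
    columns.sort(key=lambda col: min(b.x0 for b in col))
    return [b.text for col in columns for b in sorted(col, key=lambda b: b.top)]


boxes = [
    TextBox("right column, line 1", 310, 560, 100),
    TextBox("left column, line 2", 50, 290, 120),
    TextBox("left column, line 1", 50, 290, 100),
    TextBox("right column, line 2", 310, 560, 120),
]
order = reading_order(boxes)
```

Real pages need far more than this (headings that span columns, rotated text, tables), which is why a dedicated layout analysis API is worth having.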
The structure tagging API makes it easy to take the layout analysis of the page and use it to tag and structure the text. So, while we’re tagging the images, we’ll tag the text too!
Mako’s Structure Tagging API
Mako’s structure tagging API is simple to use. Our architect has done a great job of taking the complicated PDF specification and distilling it down to a number of useful APIs.
Let’s take a look at how we use them to structure a document from start to finish:
Setting the Structure Root
Setting the structure root is straightforward. Firstly, we create an instance of IStructure and set it in the document.
Next we create an instance of a Document level IStructureElement and add that to the structure element we’ve just created.
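The two steps can be sketched as follows. This is an illustration of the shape of the tree only: StructElem and Document are hypothetical stand-ins, not Mako's IStructure or IStructureElement, and the tag names are the standard PDF ones (StructTreeRoot, Document).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical structure element (a stand-in, not Mako's IStructureElement)."""
    tag: str
    alt: Optional[str] = None
    children: List["StructElem"] = field(default_factory=list)

    def append(self, child):
        """Add a child element and return it for further use."""
        self.children.append(child)
        return child


@dataclass
class Document:
    """Hypothetical document that holds the root of the structure tree."""
    structure_root: Optional[StructElem] = None


doc = Document()
# Step 1: create the structure root and set it in the document.
doc.structure_root = StructElem("StructTreeRoot")
# Step 2: add a document-level element beneath the root.
document_elem = doc.structure_root.append(StructElem("Document"))
```

Everything tagged later (paragraphs, spans, figures) hangs off the document-level element.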
One thing I learnt the hard way is that Acrobat will not allow child structures to be read by a screen reader if their parent has alternative (alt) text set.
“Add alternate text only to tags that don’t have child tags. Adding alternate text to a parent tag prevents a screen reader from reading any of that tag’s child tags.” (Adobe Acrobat help)
Originally, when I started this research project, I had alt text set at the document level, which caused all sorts of confusion when my text and image alt text wasn’t read!
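A quick check for this mistake is easy to automate. The sketch below models the structure tree as plain nested dictionaries, which is an assumption for illustration and not how Mako represents it, and the sample alt text is made up. It reports every element that has both alt text and children.

```python
def parents_with_alt(elem, path=""):
    """Return the paths of elements that carry alt text and also have child tags."""
    here = f"{path}/{elem['tag']}"
    found = [here] if elem.get("alt") and elem.get("children") else []
    for child in elem.get("children", []):
        found.extend(parents_with_alt(child, here))
    return found


tree = {
    "tag": "Document",
    "alt": "Crane manual",  # the mistake described above: alt text on a parent
    "children": [
        {"tag": "P", "children": [{"tag": "Span"}]},
        {"tag": "Figure", "alt": "A crane lifting a container"},  # fine: a leaf
    ],
}
problems = parents_with_alt(tree)
```

Here only the document-level element is flagged. The figure keeps its alt text because it has no children.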
Using the Layout Analysis API
Now that we’ve structured the document, it’s time to structure the text. Firstly, we want to understand the layout of the page. To do this, we use IPageLayout. We give it a reference to the page we want to analyze, then perform the analysis on it.
Now that the page has been analyzed, it’s easy to iterate through the columns and nodes in the page layout data.
Tagging the text
Once we’ve found our text runs, we can tag our text with a span IStructureElement. We append this new structure element to the parent paragraph created while we were iterating over the columns.
We also tag the original source Mako DOM node against the new span element.
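As a rough sketch of that loop, assume the layout result is a list of columns, each holding the ids of its text runs in reading order, and that one paragraph is created per column. Both are assumptions, and StructElem is a hypothetical stand-in rather than Mako's IStructureElement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical structure element (not Mako's IStructureElement)."""
    tag: str
    children: List["StructElem"] = field(default_factory=list)
    source_node: Optional[str] = None  # id of the DOM node this element tags


def tag_text(parent, columns):
    """Create one P element per column and one Span per text run."""
    for column in columns:
        paragraph = StructElem("P")
        parent.children.append(paragraph)
        for node_id in column:
            # Each run becomes a Span tied back to its original DOM node.
            paragraph.children.append(StructElem("Span", source_node=node_id))
    return parent


# Layout result modelled as columns of text-run node ids, in reading order.
columns = [["run-1", "run-2"], ["run-3"]]
document_elem = tag_text(StructElem("Document"), columns)
```

Keeping the link from each span to its source node is what lets a reader map the logical structure back onto the content drawn on the page.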
Tagging the images
Once the text is structured, we can structure the images too.
Earlier, I used Microsoft’s Vision API to take the images in the document and give us a textual description of them. We can now take this textual description and add it to a figure IStructureElement.
Again, we make sure we tag the new figure structure element against the original source Mako DOM image.
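In sketch form, a figure is a leaf element that carries the description as alt text and a reference to the image it tags. The dictionary layout, the node ids and the placeholder description below are assumptions for illustration; in the real tool the descriptions come from the captioning service and the elements are Mako objects.

```python
def make_figure(image_node_id, description):
    """Build a leaf Figure element with alt text and a link to its image node."""
    return {
        "tag": "Figure",
        "alt": description,
        "source_node": image_node_id,
        "children": [],  # kept empty so the alt text is read by screen readers
    }


# Descriptions would come from an image-captioning service; hard-coded here.
descriptions = {"image-1": "A placeholder description"}

document_elem = {"tag": "Document", "children": []}
for node_id, text in descriptions.items():
    document_elem["children"].append(make_figure(node_id, text))
```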
Notifying Readers of the Structure Tags
The last thing we need to do is set some metadata in the document’s assembly, which is straightforward. Setting this metadata helps viewers identify that the document is structure tagged.
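In the PDF format itself, a tagged document is signalled by a MarkInfo dictionary in the document catalog whose Marked entry is true. The sketch below models the catalog as a plain dictionary to show the idea. Whether this is exactly the metadata the Mako call writes is an assumption, and mark_as_tagged is a hypothetical helper, not a Mako function.

```python
def mark_as_tagged(catalog):
    """Return a copy of a catalog dictionary with the tagged-PDF flag set."""
    updated = dict(catalog)
    # Preserve any existing MarkInfo entries and switch Marked on.
    mark_info = dict(updated.get("MarkInfo", {}))
    mark_info["Marked"] = True
    updated["MarkInfo"] = mark_info
    return updated


catalog = {"Type": "Catalog"}
tagged = mark_as_tagged(catalog)
```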
Putting it all Together
So, after we’ve automated all of that, we now get a nice structure, which, on the whole, flows well and reads well.
We can see this structure in Acrobat DC:
And if we take a look at one of the images, we can see our figure structure now has some alternative text, generated by Microsoft’s Vision API. The alt text will be read by screen readers.
It’s not perfect, but then taking a look at how Adobe handles text selection quite nicely illustrates just how hard it is to get it right. In the image below, I’ve attempted to select the whole of the title text in Acrobat.
In comparison, our page layout analysis seems to have gotten these particular text runs spot on. But how does it fare with the Jaws screen reader? Let’s see it in action!
So, it does a pretty good job. The images have captions automatically generated, there is a sense of flow and most of the content reads in the correct order. Not bad.
Printing accessible PDFs
You may be aware that the Mako SDK comes with a sample virtual printer driver that can print to PDF. I want to take this one step further and add our accessibility structure tagging tool to the printer driver. This way, we could print from any application, and the output will be an accessible PDF!
In the video below I’ve found an interesting blog post that I want to save and read offline. If I were partially sighted, it may be somewhat problematic as the PDF printer in Windows 10 doesn’t provide structure tagging, meaning that the PDF I create may not work so well with my combination of PDF reader and screen reader. However, if I throw in my Mako-based structure and image tagger, we’ll see if it can help!
Of course, your mileage will vary and the quality of the tagging will depend on the quality and complexity of the source document. The thing is, structural analysis is a hard problem, made harder sometimes by poorly behaving generators, but that’s another topic in itself. Until all PDF files are created perfectly, we’ll do the best we can!
Want to give it a go?
Please do get in touch if you’re interested in having a play with the technology, or just want to chat about it.
Andy Cardy is a Principal Engineer for Global Graphics Software and a Developer Advocate for the Mako SDK.
Find out more about Mako’s features in Andy’s coding demo:
In this session Andy uses coding in C++ and C# to show you three complex tasks that you can easily achieve with Mako:
• PDF rendering – visualizing PDF for screen and print (15 mins)
• Using Mako in Cloud-ready frameworks (15 mins)
• Analyzing and editing with the Mako Document Object Model (15 mins)
If you’re into code, then you’ll enjoy watching the recording of our recent webinar, Sharpen the saw: a live coding demo using Mako™.
Mako is a versatile SDK for building fast, scalable solutions for your print workflow. Its unique document object model uses Mako’s C++ and C# APIs to control color, fonts, text, images, vector content, metadata and more, combining precision with performance.
In the session, principal engineer Andy Cardy uses coding in C++ and C# to show you three complex tasks that you can easily achieve with Mako:
• PDF rendering – visualizing PDF for screen and print
• Using Mako in Cloud-ready frameworks
• Analyzing and editing with the Mako Document Object Model
Look out for more sessions like this over the coming months.