Using the Mako Core SDK to modify documents in Microsoft’s Universal Print

Over the past year, Microsoft has been working hard to bring its new Cloud printing service, Universal Print, to general availability.

As a part of Universal Print, developers get access to a set of Graph APIs that allows analysis and modification of print job payload data. This feature enables a few different scenarios, including adding security (e.g. redactions or watermarks) to a Universal Print-based workflow.

As a curious engineer, I wanted to see how different it would be for an independent software vendor (ISV) to use our Mako™ Core SDK to modify a print job flowing through Universal Print, instead of using a more traditional route of using a virtual printer driver.

Thinking about the workflow a little more, I came up with the following design:

Using the Mako SDK to modify documents in Universal Print.
Using the Mako SDK to modify documents in Universal Print.

In the design above, we can see the end-user’s Word document gets printed to a virtual printer. This allows the ISV to be notified of the job, and modify it accordingly using Mako. Once modified, the ISV then redirects the job on to the physical printer for printing.

There’s a couple of nice things about this design:

Firstly, it uses the Graph API to access Universal Print, which is an easy-to-use and well documented REST API. Secondly, since the functionality is accessed via a REST API, it allows our ISV service to be written in whichever Mako supported language we like.

I chose C# to make best use of the C# Graph API SDK.

Developing the service

There are five main steps to developing the service:

  1. Handle print job notifications
  2. Download the print job payload
  3. Modify the payload
  4. Upload the payload
  5. Redirect to the target printer

Handle print job notifications

To be notified of print jobs in Universal print, you can use the Graph’s change notifications. These will allow you to sign up to a notification, which will call a provided webhook.

Download the print job payload

Once we have notification that a print job has been sent to our virtual printer, we can start downloading its payload.

Here we use the appropriate Graph APIs, along with standard Graph authentication to access the print job’s document. We then simply save it to disk.

Modify the payload

Once we have the document on disk (although Mako can also modify streams too!), we can open the document and modify it using Mako’s document object model (DOM).

Alternatively, Mako can also convert from one page description language (PDL) to another. This is useful in situations where your destination printer doesn’t support the input PDL.

Upload the payload

Uploading the modified document is straightforward. This time we use the Graph API to create an upload session, and use the WebClient class to put the document back into the original print job.

Redirect to the target printer

And finally, after the print job has been updated, we can redirect it onto another printer. This redirection also automatically completes the print job and task.

Alternatively, if we want to be a little more green, we could always send the document to OneDrive, Sharepoint, or another document management system. After doing so, you then complete the print job and its associated task.

See it in action

We actually coded this demo live in our last Mako webinar, showing an implementation where an ISV wants to automatically redact content.

Access the code directly at our GitHub repository or watch the webinar recording below:

Try it out

We’re keen to talk to you about your Universal Print project and see how we can help. Contact us here.

For more information about Mako, visit globalgraphics.com/mako.

About the author

Andy Cardy, Principal Engineer at Global Graphics Software
Andy Cardy, Principal Engineer at Global Graphics Software

Further reading:

Improving PDF accessibility with Structure Tagging

Be the first to receive our news updates and product news. Why not subscribe to our monthly newsletter? Subscribe here

Follow us on LinkedIn and Twitter

Improving PDF accessibility with Structure Tagging

In this week’s post, Global Graphics Software’s principal engineer, Andrew Cardy, explores the structure tagging API in the Mako™ Core SDK. This feature is particularly valuable as it allows developers to create PDFs that can be read by screen readers, such as Jaws®. This helps blind or partially sighted users unlock the content of a PDF.  Here, Andy explains how to use the structure tagging API in Mako to tag both text and images:

What can we Structure Tag?

Before I begin, let’s talk about PDF: PDF is a fixed-format document. This means you can create it once, and it should (aside from font embedding or rendering issues) look identical across machines. This is obviously a great thing for ensuring your document looks great on your user’s devices, but the downside is that some PDF generators can create fixed content that is ordered in a way that is hard for screen readers to understand.

Luckily Mako also has an API for page layout analysis. This API will analyze the structure of the PDF, and using various heuristics and techniques, will group the text on the page together in horizontal runs and vertical columns. It’ll then assign a page reading order.

The structure tagging API makes it easy to take the layout analysis of the page and use it to tag and structure the text. So, while we’re tagging the images, we’ll tag the text too!

Mako’s Structure Tagging API

Mako’s structure tagging API is simple to use. Our architect has done a great job of taking the complicated PDF specification and distilling it down to a number of useful APIs.

Let’s take a look at how we use them to structure a document from start to finish:

Setting the Structure Root

Setting the root structure is straight forward. Firstly, we create an instance of IStructure and set it in the document.

Next we create an instance of a Document level IStructureElement and add that to the structure element we’ve just created.

One thing that I learnt the hard way, is that Acrobat will not allow child structures to be read by a screen reader if their parent has alternative (alt) text set.

Add alternate text only to tags that don’t have child tags. Adding alternate text to a parent tag prevents a screen reader from reading any of that tag’s child tags. (Adobe Acrobat help)

Originally, when I started this research project, I had alt text set at the document level, which caused all sorts of confusion when my text and image alt text wasn’t read!

Using the Layout Analysis API

Now that we’ve structured the document, it’s time to structure the text. Firstly, we want to understand the layout of the page. To do this, we use IPageLayout. We give it a reference to the page we want to analyze, then perform the analysis on it.

Now the page has been analyzed, it’s easy to iterate through the columns and nodes in the page layout data.

Tagging the text

Once we’ve found our text runs, we can tag our text with a span IStructureElement. We append this new structure element to the parent paragraph created while we were iterating over the columns.

We also tag the original source Mako DOM node against the new span element.

Tagging the images

Once the text is structured, we can structure the images too.

Earlier, I used Microsoft’s Vision API to take the images in the document and give us a textual description of them. We can now take this textual description and add it to a figure IStructureElement.

Again, we make sure we tag the new figure structure element against the original source Mako DOM image.

Notifying Readers of the Structure Tags

The last thing we need to do is set some metadata in the document’s assembly, this is straight forward enough. Setting this metadata helps viewers to identify that this document is structure tagged.

Putting it all Together

So, after we’ve automated all of that, we now get a nice structure, which, on the whole, flows well and reads well.

We can see this structure in Acrobat DC:

And if we take a look at one of the images, we can see our figure structure now has some alternative text, generated by Microsoft’s Vision API. The alt text will be read by screen readers.

Figure properties dialogue
Figure properties dialogue

It’s not perfect, but then taking a look at how Adobe handles text selection quite nicely illustrates just how hard it is to get it right. In the image below, I’ve attempted to select the whole of the title text in Acrobat.

Layout analysis is hard to get right!

In comparison, our page layout analysis seems to have gotten these particular text runs spot on. But how does it fair with the Jaws screen reader? Let’s see it in action!

Struture tagging with Mako SDK

So, it does a pretty good job. The images have captions automatically generated, there is a sense of flow and most of the content reads in the correct order. Not bad.

Printing accessible PDFs

You may be aware that the Mako SDK comes with a sample virtual printer driver that can print to PDF. I want to take this one step further and add our accessibility structure tagging tool to the printer driver. This way, we could print from any application, and the output will be accessible PDF!

In the video below I’ve found an interesting blog post that I want to save and read offline. If I were partially sighted, it may be somewhat problematic as the PDF printer in Windows 10 doesn’t provide structure tagging, meaning that the PDF I create may not work so well with my combination of PDF reader and screen reader. However, if I throw in my Mako-based structure and image tagger, we’ll see if it can help!

Structure tagging video

Of course, your mileage will vary and the quality of the tagging will depend on the quality and complexity of the source document. The thing is, structural analysis is a hard problem, made harder sometimes by poorly behaving generators, but that’s another topic in itself. Until all PDF files are created perfectly, we’ll do the best we can!

Want to give it a go?

Please do get in touch if you’re interested in having a play with the technology, or just want to chat about it.

Andy Cardy, Principal Engineer at Global Graphics Software
Andy Cardy, Principal Engineer at Global Graphics Software

Andy Cardy is a Principal Engineer for Global Graphics Software and a Developer Advocate for the Mako SDK.

Find out more about Mako’s features in Andy’s coding demo:

SHARPEN THE SAW – A LIVE CODING DEMO USING MAKO™

Sharpen the Saw: Mako SDK demo

In this session Andy uses coding in C++ and C# to show you three complex tasks that you can easily achieve with Mako:
• PDF rendering – visualizing PDF for screen and print (15 mins)
• Using Mako in Cloud-ready frameworks (15 mins)
• Analyzing and editing with the Mako Document Object Model (15 mins)

To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here

Follow us on LinkedIn and Twitter

What’s the difference between PDF/X-1a and PDF/X-4?

PDFX-1 PDFX-4

Which PDF/X should I use?

Somebody asked me recently what the difference is between PDF/X-1a (first published in 2001) and PDF/X-4 (published in 2010). I thought this might also be interesting to a wider audience.

Both are ISO standards that deliberately restrict some aspects of what you can put into a PDF file in order to make them more reliable for delivery of jobs for professional print. But the two standards address different needs/desires:

PDF/X-1a content must all have been transformed into CMYK (optionally plus spots) already, so it puts all of the responsibility for correct separation and transparency handling onto the creation side. When it hits Harlequin, all the RIP can do is to lock in the correct overprint settings and (optionally) pre-flight the intended print output condition, as encapsulated in the output intent.

On the other hand, PDF/X-4 supports quite a few things that PDF/X-1a does not, including:

  • Device-independent color spaces
  • Live PDF transparency
  • Optional content (layers)

That moves a lot more of the responsibility downstream into the RIP, because it can carry unseparated colors and transparency.

Back when the earlier PDF/X standards were designed transparency handling was a bit inconsistent between RIPs, and color management was an inaccessible black art to many print service providers, which is why PDF/X-1a was popular with many printers. That’s not been the case for a decade now, so PDF/X-4 will work just fine.

In other words, the choice is more down to where the participants in the exchange want the responsibility to sit than to anything technical any more.

In addition, PDF/X-4 is much more easily transitioned between different presses, and even between completely different print technologies, such as moving a job from offset or flexo to a digital press. And it can also be used much more easily for digital delivery alongside using it for print. For many people that’s enough to push the balance firmly in favour of PDF/X-4.

For further reading about PDF documents and standards:

  1. Full Speed Ahead: How to make variable data PDF files that won’t slow your digital press
  2. PDF Processing Steps – the next evolution in handling technical marks

About the author

Martin Bailey, CTO, Global Graphics Software
Martin Bailey, CTO, Global Graphics Software

Martin Bailey is Global Graphics’ Chief Technology Officer. He’s currently the primary UK expert to the ISO committees maintaining and developing PDF and PDF/VT and is the author of Full Speed Ahead: how to make variable data PDF files that won’t slow your digital press, a new guide offering advice to anyone with  a stake in variable data printing including graphic designers, print buyers, composition developers and users.

Be the first to receive our news updates and product news. Why not subscribe to our monthly newsletter? Subscribe here

Follow us on LinkedIn and Twitter

Why does optimization of VDP jobs matter?

Would you fill your brand-new Ferrari with cheap and inferior fuel? It’s a question posed by Martin Bailey in his new guide: ‘Full Speed Ahead – how to make variable data PDF files that won’t slow your digital press’. It’s an analogy he uses to explain the importance of putting well-constructed PDF files through your DFE so that they don’t disrupt the printing process and the DFE runs as efficiently as possible. 

Here are Martin’s recommendations to help you avoid making jobs that delay the printing process, so you can be assured that you’ll meet your print deadline reliably and achieve your printing goals effectively:

If you’re printing work that doesn’t make use of variable data on a digital press, you’re probably producing short runs. If you weren’t, you’d be more likely to choose an offset or flexo press instead. But “short runs” very rarely means a single copy.

Let’s assume that you’re printing, for example, 50 copies of a series of booklets, or of an imposed form of labels. In this case the DFE on your digital press only needs to RIP each PDF page once.

To continue the example, let’s assume that you’re printing on a press that can produce 100 pages per minute (or the equivalent area for labels etc.). If all your jobs are 50 copies long, you therefore need to RIP jobs at only two pages per minute (100ppm/50 copies). Once a job is fully RIPped and the copies are running on press you have plenty of time to get the next job prepared before the current one clears the press.

But VDP jobs place additional demands on the processing power available in a DFE because most pages are different to every other page and must therefore each be RIPped separately. If you’re printing at 100 pages per minute the DFE must RIP at 100 pages per minute; fifty times faster than it needed to process for fifty copies of a static job.

Each minor inefficiency in a VDP job will often only add between a few milliseconds and a second or two to the processing of each page, but those times need to be multiplied up by the number of pages in the job. An individual delay of half a second on every page of a 10,000-page job adds up to around an hour and a half for the whole job. For a really big job of a million pages it only takes an extra tenth of a second per page to add 24 hours to the total processing time.

If you’re printing at 120ppm the DFE must process each page in an average of half a second or less to keep up with the press. The fastest continuous feed inkjet presses at the time of writing are capable of printing an area equivalent to over 13,000 pages per minute, which means each page must be processed in just over 4ms. It doesn’t take much of a slow-down to start impacting throughput.

If you’re involved in this kind of calculation you may find the digital press data rate calculator useful: Download the data rate calculator

Global Graphics Software’s digital press data rate calculator.
Global Graphics Software’s digital press data rate calculator.

This extra load has led DFE builders to develop a variety of optimizations. Most of these work by reducing the amount of data that must be RIPped. But even with those optimizations a complex VDP job typically requires significantly more processing power than a ‘static’ job where every copy is the same.

The amount of processing required to prepare a PDF file for print in a DFE can vary hugely without affecting the visual appearance of the printed result, depending on how it is constructed.

Poorly constructed PDF files can therefore impact a print service provider in one or both of two ways:

  • Output is not achieved at engine speed, reducing return on investment (ROI) because fewer jobs can be produced per shift. In extreme cases when printing on a continuous feed (web-fed) press a failure to deliver rasters for printing fast enough can also lead to media wastage and may confuse in-line or near-line finishing.
  • In order to compensate for jobs that take longer to process in the DFE, press vendors often provide more hardware to expand the processing capability, increasing the bill of materials, and therefore the capital cost of the DFE.

Once the press is installed and running the production manager will usually calculate and tune their understanding of how many jobs of what type can be printed in a shift. Customer services representatives work to ensure that customer expectations are set appropriately, and the company falls into a regular pattern. Most jobs are quoted on an acceptable turn-round time and delivered on schedule.

Depending on how many presses the print site has, and how they are connected to one or more DFEs this may lead to a press sitting idle, waiting for pages to print. It may also delay other jobs in the queue or mean that they must be moved to a different press. Moving jobs at the last minute may not be easy if the presses available are not identical. Different presses may require different print streams or imposition and there may be limitations on stock availability, etc.

Many jobs have tight deadlines on delivery schedules; they may need to be ready for a specific time, with penalties for late delivery, or the potential for reduced return for the marketing department behind a direct mail campaign. Brand owners may be ordering labels or cartons on a just in time (JIT) plan, and there may be consequences for late delivery ranging from an annoyed customer to penalty clauses being invoked.

Those problems for the print service provider percolate upstream to brand owners and other groups commissioning digital print. Producing an inefficiently constructed PDF file will increase the risk that your job will not be delivered by the expected time.

You shouldn’t take these recommendations as suggesting that the DFE on any press is inadequate. Think of it as the equivalent of a suggestion that you should not fill your brand-new Ferrari with cheap and inferior fuel!

 

Full Speed Ahead: how to make variable data PDF files that won't slow your digital press edited by Global Graphics Software

The above is an excerpt from Full Speed Ahead: how to make variable data PDF files that won’t slow your digital press. The guide is designed to help you avoid making jobs that disrupt and delay the printing process, increasing the probability of everyone involved in delivering the printed piece; hitting their deadlines reliably and achieving their goals effectively.

DOWNLOAD THE FREE FULL GUIDE HERE: https://bit.ly/fsa-pdf

To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here

About the author:

Martin Bailey, CTO, Global Graphics Software
Martin Bailey, CTO, Global Graphics Software

Martin Bailey first joined what has now become Global Graphics Software in the early nineties, and has worked in customer support, development and product management for the Harlequin RIP as well as becoming the company’s Chief Technology Officer. During that time he’s also been actively involved in a number of print-related standards activities, including chairing CIP4, CGATS and the ISO PDF/X committee. He’s currently the primary UK expert to the ISO committees maintaining and developing PDF and PDF/VT.

 

To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here

Follow us on LinkedIn and Twitter

Full Speed Ahead: How to make variable data PDF files that won’t slow your digital press

The use of variable data has increased exponentially over the past five years and is emerging in new applications such as industrial inkjet. Yet poorly designed variable data PDF files disrupt production and reduce ROI.

Watch the recent webinar with Global Graphics Software’s CTO Martin Bailey, the author of Full Speed Ahead, a new guide to offer advice to anyone with a stake in variable data printing, including graphic designers, print buyers, production managers, press operators, composition tool developers and users.

In the webinar Martin presents an overview of the guide and highlights some of the key tips and tricks for graphic designers, prepress and print service providers, showing how, when they all work together, VDP jobs can fly through digital presses.

Sponsored by Delphax Solutions, Digimarc, HP Indigo, HP PageWide Industrial, HYBRID Software, Kodak, Racami and WhatTheyThink, the guide is a practical format for easy reference and includes:

• Tips and tricks for making fast, efficient PDF files for variable data printing
• Helpful illustrations, photos and explanatory diagrams
• Real examples from industry

You can download your copy here: https://www.globalgraphics.com/full-speed-ahead

For further reading about PDF documents and standards:

  1. PDF Processing Steps – the next evolution in handling technical marks

To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here

Follow us on LinkedIn and Twitter

Mako™ 5.0 offers a wealth of new features

We’ve recently released Mako™ 5.0, the latest edition of Global Graphics Software’s digital document SDK. Mako 5.0 earns its major version increment with an upgrade to its internal RIP, new features and a reworked API to simplify implementation. Much requested by Mako customers, Mako 5.0 is the first version to preview C# as a coding alternative to C++ and opens the possibility to support other programming languages in future versions.

Mako 5.0 enables PostScript® (including EPS) files to be read directly, extending the PDL (page description language) support in Mako that already includes PDF, XPS, PCL5 and PCL/XL. Mako can read and write all these PDLs, enabling bi-directional conversion between any of these formats.

With the update of Mako’s internal RIP has come new EDS (error diffusion screens) using algorithms such as Floyd-Steinberg and Stucki. All the screening parameters are exposed via this API, and to help define them, a Windows-based desktop tool can be downloaded from the Mako documentation site. Start with settings that match the popular algorithms and preview the monochrome or color result of your settings tweaks. Then use the settings you have chosen via a button that generates the C++ you need to paste into your code.

Mako 5.0 offers several new APIs that extend its reach into the internals of PDF. For example, it’s now possible to edit property values attached to form and image XObjects. Why is this useful? In PDF, developers can put extra key-value pairs into PDF XObject dictionaries. This is often used to store in application-specific data, as well as for things like variable data tags. This development has led to a more generalized approach to examining and modifying hard-to-reach PDF objects. As ever, well-commented sample code is provided to show exactly how the new APIs work and could be applied in your application.

Finally, we took the opportunity with Mako 5.0 to make changes aimed at making the APIs more consistent in their naming, behavior or return types. Developers new to Mako will be unaware of these changes, but existing code written for Mako 4.x may require minor refactoring to work with Mako 5.0. Our support engineers are ready to assist Mako customers with any questions they have.

For more information contact David Stevenson: david.stevenson@globalgraphics.com

To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here

Mako™ – the print developer’s Swiss Army knife

Mako - the Swiss Army knife of SDKs!
Mako – the print developer’s Swiss Army knife.

Working with a Mako customer recently, I showed him how to code a utility to extract data from a stack of PDF invoices to populate a spreadsheet. I suppose you could describe it as reverse database publishing. This customer had originally licensed Mako to convert XPS to PDF, and later used it to generate CMYK bitmaps of the pages, i.e. using it as a RIP (raster image processor).

With this additional application of Mako, the customer observed that Mako was “like a Swiss Army knife” as it offered so many tools in one – converting, rendering, extracting, combining and processing, of pages and the components that made them up. And doing it not just for PDF but for XPS, PCL and PostScript® too. His description struck a chord with me as it seemed very appropriate. Mako does indeed offer a wide range of capabilities for processing print job formats. It’s not the fastest or feature-richest of the RIPs from Global Graphics Software – that would be Harlequin®. Or the most sophisticated and performant of screening tools – that would be ScreenPro™. But Mako can do both of those things very competently, and much more besides.

For example, we have used Mako to create a Windows desktop app to edit a PDF in ways relevant to production print workflows, such as changing spot colors or converting them to process colors. All the viewing and editing operations are implemented with Mako API calls. That fact alone emphasizes the wide range of applications to which Mako can be put, and I think, fully justifying that “Swiss Army knife” moniker.

For more information visit: www.globalgraphics.com/mako

Unlocking document potential

Using Mako to pre-process PDFs for print workflows follows quite naturally. With its built-in RIP, Mako has exceptional capability to deal with fonts, color, transparency and graphic complexity to suit the most demanding of production requirements.

What is less obvious is Mako’s value to enterprise print management (EPM). Complementing Mako’s support for PDF and XPS is the ability to convert from (and to) PCL5 and PCL/XL. Besides conversion, Mako can also render such documents, for example to create a thumbnail of a PCL job so that a user can more easily identify the correct document to print or move it to the next stage in a managed process. Mako’s document object model (DOM) architecture allows content to be extracted for record-keeping purposes or be added to – a watermark or barcode, for example.

Document Object Model to access the raw building blocks of documents.

The ability to look inside a document, irrespective of the format of the original, has brought Mako to the attention of electronic document and records management system (EDRMS) vendors, seeking to add value to their data extraction, search and categorization processes. Being able to treat different formats of document in the same way simplifies development and improves process efficiency.

Mako’s ability to analyse page layout and extract text in the correct reading order, or to interpret and update document metadata, is a valuable tool to developers of EDRMS solutions. In the face of GDPR (General Data Protection Regulation) and sector-specific regulations, the need for such solutions is clear. And as many of those documents are destined to be printed at some point in their lifecycle, they exist as discrete, paginated digital documents for which Mako is the key to unlocking their business value.

If you would like to discuss this or any aspect of Mako. Please email justin.bailey@globalgraphics.com

Convert from PDF & XPS formats with Mako™

Product manager David Stevenson provides an update on the latest release of Mako:

We’ve just released Mako version 4.6 and I’m pleased to let you know that new in this release is support for PCL5 input, adding to the PCL/XL support already available. Aimed primarily at the enterprise print market, this capability makes it possible to convert to and from PDF & XPS formats and to render thumbnails for preview purposes.

This latest release will also be of interest to our prepress customers: we’ve  improved overall performance and added new, fast render-to-buffer capability, in monochrome and color.

Finally, there is also new and improved support for PDF-named destinations, document metadata and more.

Contact me to find out more: david.stevenson@globalgraphics.com

David Stevenson
David Stevenson,
Product Manager

 

See you at Labelexpo Europe 2017!

Visit us at Labelexpo Europe 2017 - Stand 9B17

With Labelexpo Europe 2017 less than a week away, it won’t be long before we start to pack the car and head off on the Eurostar to Brussels.

It’s going to be an exciting show for us this year, with some key announcements from our major OEM customers – we’re looking forward to supporting them and demonstrating our technologies.

If you have to create a digital front end but don’t know where to start, come and talk to us. With Fundamentals, we’ll show you how you can bring your digital inkjet label press to market quickly and easily. Our BreakThrough Technical Services team will also be on hand to answer your questions from color management to screening.

We also have some new features for labels and packaging in the Harlequin RIP® to show you, including controls for deciding when to blend emulated spot colors with process colors for exceptionally accurate brand color matching, and extended controls over PDF layers so that optional content used in process control can be individually switched on or off.

We’ll be on Stand 9B17, so please stop by and say hello. If you’d like to book an appointment, simply contact us: sales@globalgraphics.com.  In the meantime, we’ve made this short video to whet your appetite. Enjoy!