MPI Tech will enable Global Graphics Software’s Harlequin Direct™ and Fundamentals™ products to support AFP and IPDS jobs. AFP (Advanced Function Presentation) is the most widely used format for high-speed transactional printing in many industries including finance, insurance, manufacturing, health care and education. IPDS (Intelligent Printer Data Stream) is the print description language (PDL) to print AFP documents online.
MPI Tech offers a range of solutions to process AFPDS and native AFP/IPDS at speeds over 6,000 ipm or convert them into the most popular PDL (PCL, PDF, PDF/A, PS etc) on almost every platform (Windows, AIX, Linux, Solaris, UNIX).
Justin Bailey, managing director of Global Graphics Software commented: “We’re pleased to welcome MPI Tech to our partner network. With a proven technology and know-how for processing AFP or IPDS print jobs, Global Graphics Software turns to MPI Tech as its ‘go-to partner’ when our customers require solutions for these transactional print data-streams.”
MPI Tech has been a licensee of Global Graphics technology for many years, using it for converting to, and processing, PostScript, PDF, and other PDLs.
In this week’s post, Global Graphics Software’s principal engineer, Andrew Cardy, explores the structure tagging API in the Mako™ Core SDK. This feature is particularly valuable as it allows developers to create PDFs that can be read by screen readers, such as Jaws®. This helps blind or partially sighted users unlock the content of a PDF. Here, Andy explains how to use the structure tagging API in Mako to tag both text and images:
What can we Structure Tag?
Before I begin, let’s talk about PDF: PDF is a fixed-format document. This means you can create it once, and it should (aside from font embedding or rendering issues) look identical across machines. This is obviously a great thing for ensuring your document looks great on your user’s devices, but the downside is that some PDF generators can create fixed content that is ordered in a way that is hard for screen readers to understand.
Luckily Mako also has an API for page layout analysis. This API will analyze the structure of the PDF, and using various heuristics and techniques, will group the text on the page together in horizontal runs and vertical columns. It’ll then assign a page reading order.
The structure tagging API makes it easy to take the layout analysis of the page and use it to tag and structure the text. So, while we’re tagging the images, we’ll tag the text too!
Mako’s Structure Tagging API
Mako’s structure tagging API is simple to use. Our architect has done a great job of taking the complicated PDF specification and distilling it down to a number of useful APIs.
Let’s take a look at how we use them to structure a document from start to finish:
Setting the Structure Root
Setting the root structure is straight forward. Firstly, we create an instance of IStructure and set it in the document.
Next we create an instance of a Document level IStructureElement and add that to the structure element we’ve just created.
One thing that I learnt the hard way, is that Acrobat will not allow child structures to be read by a screen reader if their parent has alternative (alt) text set.
Add alternate text only to tags that don’t have child tags. Adding alternate text to a parent tag prevents a screen reader from reading any of that tag’s child tags. (Adobe Acrobat help)
Originally, when I started this research project, I had alt text set at the document level, which caused all sorts of confusion when my text and image alt text wasn’t read!
Using the Layout Analysis API
Now that we’ve structured the document, it’s time to structure the text. Firstly, we want to understand the layout of the page. To do this, we use IPageLayout. We give it a reference to the page we want to analyze, then perform the analysis on it.
Now the page has been analyzed, it’s easy to iterate through the columns and nodes in the page layout data.
Tagging the text
Once we’ve found our text runs, we can tag our text with a span IStructureElement. We append this new structure element to the parent paragraph created while we were iterating over the columns.
We also tag the original source Mako DOM node against the new span element.
Tagging the images
Once the text is structured, we can structure the images too.
Earlier, I used Microsoft’s Vision API to take the images in the document and give us a textual description of them. We can now take this textual description and add it to a figure IStructureElement.
Again, we make sure we tag the new figure structure element against the original source Mako DOM image.
Notifying Readers of the Structure Tags
The last thing we need to do is set some metadata in the document’s assembly, this is straight forward enough. Setting this metadata helps viewers to identify that this document is structure tagged.
Putting it all Together
So, after we’ve automated all of that, we now get a nice structure, which, on the whole, flows well and reads well.
We can see this structure in Acrobat DC:
And if we take a look at one of the images, we can see our figure structure now has some alternative text, generated by Microsoft’s Vision API. The alt text will be read by screen readers.
It’s not perfect, but then taking a look at how Adobe handles text selection quite nicely illustrates just how hard it is to get it right. In the image below, I’ve attempted to select the whole of the title text in Acrobat.
In comparison, our page layout analysis seems to have gotten these particular text runs spot on. But how does it fair with the Jaws screen reader? Let’s see it in action!
So, it does a pretty good job. The images have captions automatically generated, there is a sense of flow and most of the content reads in the correct order. Not bad.
Printing accessible PDFs
You may be aware that the Mako SDK comes with a sample virtual printer driver that can print to PDF. I want to take this one step further and add our accessibility structure tagging tool to the printer driver. This way, we could print from any application, and the output will be accessible PDF!
In the video below I’ve found an interesting blog post that I want to save and read offline. If I were partially sighted, it may be somewhat problematic as the PDF printer in Windows 10 doesn’t provide structure tagging, meaning that the PDF I create may not work so well with my combination of PDF reader and screen reader. However, if I throw in my Mako-based structure and image tagger, we’ll see if it can help!
Of course, your mileage will vary and the quality of the tagging will depend on the quality and complexity of the source document. The thing is, structural analysis is a hard problem, made harder sometimes by poorly behaving generators, but that’s another topic in itself. Until all PDF files are created perfectly, we’ll do the best we can!
Want to give it a go?
Please do get in touch if you’re interested in having a play with the technology, or just want to chat about it.
Andy Cardy is a Principal Engineer for Global Graphics Software and a Developer Advocate for the Mako SDK.
Find out more about Mako’s features in Andy’s coding demo:
In this session Andy uses coding in C++ and C# to show you three complex tasks that you can easily achieve with Mako:
• PDF rendering – visualizing PDF for screen and print (15 mins)
• Using Mako in Cloud-ready frameworks (15 mins)
• Analyzing and editing with the Mako Document Object Model (15 mins)
To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here
There are two completely different forms of variable data handling in the Harlequin RIP®, and I’m sometimes asked why we’ve duplicated functionality like that. The simple answer is that it’s not duplication; they each address very different use cases.
But those use cases are not, as many people then expect, “white paper workflows” vs imprinting, i.e. whether the whole design including both re-used and single-use elements is printed together vs adding variable data on top of a pre-printed substrate. Both Harlequin VariData™ and the “Dynamic overlays” that we added in Harlequin version 12 can address both of those requirements.
Incidentally, I put “white paper workflows” in quotes because that’s what it’s called in the transactional and direct mail spaces … but very similar approaches are used for variable data printing in other sectors, which may not be printing on anything even vaguely resembling paper!
The two use cases revolve around who has the data, when they have it, whether a job should start printing before all the data is available, and whether there are any requirements to restrict access to the data.
When most people in the transactional, direct mail or graphic arts print sectors think about variable data it tends to be in the form of a fully resolved document representing all of the many variations of one of a collection of pages, combining one or more static ‘backgrounds’ with single-use variable data elements, and maybe some re-used elements from which one is selected for each recipient. In other words, each page in the PDF file is meant to be printed as-is, and will be suitable for a single copy. That whole, fully resolved file is then sent to the press. It may be sent from one division of the printing company to the press room, or even from some other company entirely. The same approach is used for some VDP jobs in labels, folding carton, corrugated, signage and some industrial sectors.
This is the model for which optimized PostScript, and then optimized PDF, PDF/VT (and AFP) were designed. It’s a robust workflow that allows for significant amounts of proofing and process control at multiple stages. And it also allows very rich graphical variability. It’s the workflow for which Harlequin VariData was designed, to maximize the throughput of variable data files through the Digital Front End (DFE) and onto the press.
But in some cases the variable data is not available when the job starts printing. Indeed, the print ‘job’ may run for months in situations such as packaging lines or ID card printing. That can be managed by simply sending a whole series of optimized PDF files, each one representing a few thousand or a couple of million instances of the job to be printed. But in some cases that’s simply not convenient or efficient enough.
In other workflows the data to be printed must be selected based on the item to be printed on, and that’s only known at the very last minute … or second … before the item is printed. A rather extreme example of this is in printing ID cards. In some workflows a chip or magnetic strip is programmed first. When the card is to be printed it’s obviously important that the printed information matches the data on the chip or magnetic strip, so the printing unit reads the data from one of those, uses that to select the data to be printed, and prints it … sometimes all in less than a second. In this case you could use a fully resolved optimized PDF file and select the appropriate page from it based on identifying the next product to be printed on; I know there are companies doing exactly that. But it gets cumbersome when the selection time is very short and the number of items to be printed is very large. And you also need to have all of the data available up-front, so a more dynamic solution is better.
In other cases there is a need to ensure that the data to be printed is held completely securely, which usually leads to a demand that there is never a complete set of that data in a standard file format outside of the DFE for the printer itself. ID cards are an example of this use case as well.
Moving away from very quick or secure responses, we’ve been observing an interesting trend in the labels and packaging market as digital presses are used more widely. Printing the graphics of the design itself and adding the kind of data that’s historically been applied using coding and marking are converging. Information like serial numbers, batch numbers, competition QR Codes, even sell & use by dates are being printed at the same time as the main graphics. Add in the growing demands for traceability, for less of a need for warehousing and for more print on demand of a larger number of different versions, and there can be some real benefits in moving all of the print process quite close to the bottling/filling/labelling lines. But it doesn’t make sense to make a million page PDF file just so you can change the batch number every 42 cartons because that’s what fits on a pallet.
These use cases are why we added Dynamic overlays to Harlequin. Locations on the output where marks should be added are specified, along with the type of mark (text, barcodes and images are the most commonly used). For most marks a data source must be specified; by default we support reading from CSV files or automated counters, but an interface to a database can easily be added for specific integrations. And, of course, formatting information such as font, color, barcode symbology etc must be provided.
The ‘overlay’ in “Dynamic overlays” gives away one of the limitations of this approach, in that the variable data added using it must be on top of all the static data. But we normally recommend that you do that for fully resolved VDP submissions using something like optimized PDF anyway because it makes processing much more efficient; there aren’t that many situations where the desired visual appearance requires variable graphics behind static ones. It’s also much less of a constraint that you’d have with imprinting, where you can only knock objects like white text out of a colored fill in the static background if you are using a white ink!
For what it’s worth, Dynamic overlays also work well for imprinting or for cases where you need to print graphics of middling complexity at high quality but where there are no static graphics at all (existing coding & marking systems can handle simple graphics at low to medium quality very well). In other words, there’s no need to have a background to print the variable data as a foreground over.
So now you know why we’ve doubled up on variable data functionality!
Ever wondered what a raster image processor or RIP does? And what does RIPping a file mean? Read on to learn more about the phases of a RIP, the engine at the heart of your Digital Front End (DFE).
The RIP converts text and image data from many file formats including PDF, TIFF™ or JPEG into a format that a printing device such as an inkjet printhead, toner marking engine or laser platesetter can understand. The process of RIPping a job requires several steps to be performed in order, regardless of the page description language (such as PDF) that it’s submitted in. Even image file formats such as TIFF, JPEG or PNG usually need to be RIPped, to convert them into the correct color space, at the right resolution and with the right halftone screening for the press.
Interpreting: The file to be RIPped is read and decoded into an internal database of graphical elements that must be placed on the output. Each may be an image, a character of text (including font, size, color etc), a fill or stroke etc. This database is referred to as a display list.
Compositing: The display list is pre-processed to apply any live transparency that may be in the job. This phase is only required for any graphics in formats that support live transparency, such as PDF; it’s not required for PostScript language jobs or for TIFF and JPEG images because those cannot include live transparency.
Rendering: The display list is processed to convert every graphical element into the appropriate pattern of pixels to form the output raster. The term ‘rendering’ is sometimes used specifically for this part of the overall processing, and sometimes to describe the whole of the RIPing process.
Output: The raster produced by the rendering process is sent to the marking engine in the output device, whether it’s exposing a plate, a drum for marking with toner, an inkjet head or any other technology.
Sometimes this step is completely decoupled from the RIP, perhaps because plate images are stored as TIFF files and then sent to a CTP platesetter later, or because a near-line or off-line RIP is used for a digital press. In other environments the output stage is tightly coupled with rendering, and the output raster is kept in memory instead of writing it to disk to increase speed.
RIPping often includes a number of additional processes; in the Harlequin RIP® for example:
In-RIP imposition is performed during interpretation
Color management (Harlequin ColorPro®) and calibration are applied during interpretation or compositing, depending on configuration and job content
Screening can be applied during rendering. Alternatively it can be done after the Harlequin RIP has delivered unscreened raster data; this is valuable if screening is being applied using Global Graphics’ ScreenPro™ and PrintFlat™ technologies, for example.
A DFE for a high-speed press will typically be using multiple RIPs running in parallel to ensure that they can deliver data fast enough. File formats that can hold multiple pages in a single file, such as PDF, are split so that some pages go to each RIP, load-balancing to ensure that all RIPs are kept busy. For very large presses huge single pages or images may also be split into multiple tiles and those tiles sent to different RIPs to maximize throughput.
To find out more about the Harlequin RIP, download the latest brochure here.
When developing your first or next digital press, the software you use to drive it will be a key factor in its success, both for the data rates and output quality you can achieve. The time it takes to get your press to market based on the engineering effort involved to deliver and integrate that software is also a consideration.
A simple user interface to get you started
The Press Operator Controller (POC) is an example front end or user interface available with Harlequin Direct™ , the software solution that drives printhead electronics at ultra-high data rates while retaining high output quality. The POC provides you with an initial working system, so you’re up and running without any significant in-house software development. We provide you with the source code so that you have the option to update and integrate it as part of your production system.
I have created a short video to show you its main functions:
Ian has over 15 years’ experience in industry as a software engineer focusing on high performance. With a passion for problem-solving, Ian’s role as product manager for the Direct range gives him the opportunity to work with printer OEMs and break down any new technology barriers that may be preventing them from reaching their digital printer’s full potential.
Be the first to receive our news updates and product news. Why not subscribe to our monthly newsletter? Subscribe here
Would you fill your brand-new Ferrari with cheap and inferior fuel? It’s a question posed by Martin Bailey in his new guide: ‘Full Speed Ahead – how to make variable data PDF files that won’t slow your digital press’. It’s an analogy he uses to explain the importance of putting well-constructed PDF files through your DFE so that they don’t disrupt the printing process and the DFE runs as efficiently as possible.
Here are Martin’s recommendations to help you avoid making jobs that delay the printing process, so you can be assured that you’ll meet your print deadline reliably and achieve your printing goals effectively:
If you’re printing work that doesn’t make use of variable data on a digital press, you’re probably producing short runs. If you weren’t, you’d be more likely to choose an offset or flexo press instead. But “short runs” very rarely means a single copy.
Let’s assume that you’re printing, for example, 50 copies of a series of booklets, or of an imposed form of labels. In this case the DFE on your digital press only needs to RIP each PDF page once.
To continue the example, let’s assume that you’re printing on a press that can produce 100 pages per minute (or the equivalent area for labels etc.). If all your jobs are 50 copies long, you therefore need to RIP jobs at only two pages per minute (100ppm/50 copies). Once a job is fully RIPped and the copies are running on press you have plenty of time to get the next job prepared before the current one clears the press.
But VDP jobs place additional demands on the processing power available in a DFE because most pages are different to every other page and must therefore each be RIPped separately. If you’re printing at 100 pages per minute the DFE must RIP at 100 pages per minute; fifty times faster than it needed to process for fifty copies of a static job.
Each minor inefficiency in a VDP job will often only add between a few milliseconds and a second or two to the processing of each page, but those times need to be multiplied up by the number of pages in the job. An individual delay of half a second on every page of a 10,000-page job adds up to around an hour and a half for the whole job. For a really big job of a million pages it only takes an extra tenth of a second per page to add 24 hours to the total processing time.
If you’re printing at 120ppm the DFE must process each page in an average of half a second or less to keep up with the press. The fastest continuous feed inkjet presses at the time of writing are capable of printing an area equivalent to over 13,000 pages per minute, which means each page must be processed in just over 4ms. It doesn’t take much of a slow-down to start impacting throughput.
If you’re involved in this kind of calculation you may find the digital press data rate calculator at https://blog.globalgraphics.com/tag/data-rate/ useful:
This extra load has led DFE builders to develop a variety of optimizations. Most of these work by reducing the amount of data that must be RIPped. But even with those optimizations a complex VDP job typically requires significantly more processing power than a ‘static’ job where every copy is the same.
The amount of processing required to prepare a PDF file for print in a DFE can vary hugely without affecting the visual appearance of the printed result, depending on how it is constructed.
Poorly constructed PDF files can therefore impact a print service provider in one or both of two ways:
Output is not achieved at engine speed, reducing return on investment (ROI) because fewer jobs can be produced per shift. In extreme cases when printing on a continuous feed (web-fed) press a failure to deliver rasters for printing fast enough can also lead to media wastage and may confuse in-line or near-line finishing.
In order to compensate for jobs that take longer to process in the DFE, press vendors often provide more hardware to expand the processing capability, increasing the bill of materials, and therefore the capital cost of the DFE.
Once the press is installed and running the production manager will usually calculate and tune their understanding of how many jobs of what type can be printed in a shift. Customer services representatives work to ensure that customer expectations are set appropriately, and the company falls into a regular pattern. Most jobs are quoted on an acceptable turn-round time and delivered on schedule.
Depending on how many presses the print site has, and how they are connected to one or more DFEs this may lead to a press sitting idle, waiting for pages to print. It may also delay other jobs in the queue or mean that they must be moved to a different press. Moving jobs at the last minute may not be easy if the presses available are not identical. Different presses may require different print streams or imposition and there may be limitations on stock availability, etc.
Many jobs have tight deadlines on delivery schedules; they may need to be ready for a specific time, with penalties for late delivery, or the potential for reduced return for the marketing department behind a direct mail campaign. Brand owners may be ordering labels or cartons on a just in time (JIT) plan, and there may be consequences for late delivery ranging from an annoyed customer to penalty clauses being invoked.
Those problems for the print service provider percolate upstream to brand owners and other groups commissioning digital print. Producing an inefficiently constructed PDF file will increase the risk that your job will not be delivered by the expected time.
You shouldn’t take these recommendations as suggesting that the DFE on any press is inadequate. Think of it as the equivalent of a suggestion that you should not fill your brand-new Ferrari with cheap and inferior fuel!
The above is an excerpt from Full Speed Ahead: how to make variable data PDF files that won’t slow your digital press. The guide is designed to help you avoid making jobs that disrupt and delay the printing process, increasing the probability of everyone involved in delivering the printed piece; hitting their deadlines reliably and achieving their goals effectively.
To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here
About the author:
Martin Bailey first joined what has now become Global Graphics Software in the early nineties, and has worked in customer support, development and product management for the Harlequin RIP as well as becoming the company’s Chief Technology Officer. During that time he’s also been actively involved in a number of print-related standards activities, including chairing CIP4, CGATS and the ISO PDF/X committee. He’s currently the primary UK expert to the ISO committees maintaining and developing PDF and PDF/VT.
To be the first to receive our blog posts, news updates and product news why not subscribe to our monthly newsletter? Subscribe here
When a major print OEM switched from a market-leading RIP technology to the Harlequin RIP®, they achieved a faster development time and performance and quality benchmarks with a reduced bill of materials cost.
The Challenge When a leading print OEM was looking to move to a PDF RIP technology that was easy to integrate and help to achieve quality and performance benchmarks, it contacted Global Graphics Software Partner Network member, Vir Softech. As a RIP replacement service provider, the team at Vir Softech includes experienced engineers, with experts who have worked on all the major RIP technologies and understand the interfaces and functions they offer.
The Solution Vir Softech recommended switching from the existing RIP technology to the Harlequin RIP from Global Graphics Software. Vir Softech had experience of using the Harlequin RIP in a similar project and knew it would meet the print OEM’s requirements. After a period of evaluation, including quality and performance benchmarking, the print OEM chose to use the Harlequin RIP.
Deepak Garg, managing director at Vir Softech explains the process: “The first step towards making the change was to assess and understand the various features and functions offered by the OEM’s print devices.”
After investigating, the team prepared a design document highlighting:
The OEM’s product features that interact with the RIP technology
How these product features are implemented
The various RIP interfaces which are used to implement these features and functions
Deepak continues: “Once the print OEM decided to go ahead, we prepared another document highlighting how to achieve these functions using the Harlequin interfaces. Some functions or features could not be implemented using Harlequin directly, such as special color handling, spot color replacement, extraction of cut data etc., so we contacted Global Graphics Software who was able to provide a design showing how these functions could be implemented using Harlequin. We then prepared a proof-of-concept, or working implementation, which demonstrated how the Harlequin RIP would work with the print OEM’s print devices. With Harlequin, such a prototype can usually be achieved within three to six months.”
The Result Development time was much shorter than usual for such an ambitious undertaking, greatly reducing costs and enabling the print OEM to drive their revenue earlier than originally expected. The print OEM began using the Harlequin RIP, instantly meeting its quality and performance targets.
The print OEM says: “The Harlequin RIP helped us to move to native PDF printing and achieve the performance targets for our printers. Harlequin also helped us to reduce the lead time for getting our products to market while keeping development and maintenance costs low.”
About Vir Softech Vir Softech is a technology start-up with expertise in imaging and computer vision technologies. With a strong focus in the Print & Publishing domain, its team of experienced engineers includes experts in all aspects of imaging and RIP technologies, such as job management, job settings, color management, screening, bands generation and management, VDP and imposition etc.
The team at Vir Softech are experts in configuring RIP technologies for better performance targeted for a specific market segment such as production, commercial, large format and enterprise printing. Some of the areas where Vir Softech can help include low resource environment, implementing OEM-specific unique functions using Harlequin RIP interfaces, making use of OEM ASIC for better performance, making use of OEM hardware accelerators for some of the computer-intensive RIP operations such as color conversion, image transformations, image decoding, rendering etc and achieving PPM target of MFP for ISO test suites.
Using Mako to pre-process PDFs for print workflows follows quite naturally. With its built-in RIP, Mako has exceptional capability to deal with fonts, color, transparency and graphic complexity to suit the most demanding of production requirements.
What is less obvious is Mako’s value to enterprise print management (EPM). Complementing Mako’s support for PDF and XPS is the ability to convert from (and to) PCL5 and PCL/XL. Besides conversion, Mako can also render such documents, for example to create a thumbnail of a PCL job so that a user can more easily identify the correct document to print or move it to the next stage in a managed process. Mako’s document object model (DOM) architecture allows content to be extracted for record-keeping purposes or be added to – a watermark or barcode, for example.
The ability to look inside a document, irrespective of the format of the original, has brought Mako to the attention of electronic document and records management system (EDRMS) vendors, seeking to add value to their data extraction, search and categorization processes. Being able to treat different formats of document in the same way simplifies development and improves process efficiency.
Mako’s ability to analyse page layout and extract text in the correct reading order, or to interpret and update document metadata, is a valuable tool to developers of EDRMS solutions. In the face of GDPR (General Data Protection Regulation) and sector-specific regulations, the need for such solutions is clear. And as many of those documents are destined to be printed at some point in their lifecycle, they exist as discrete, paginated digital documents for which Mako is the key to unlocking their business value.
Last week was the first PDF 2.0 interop event, hosted by Global Graphics in Cambridge, UK on behalf of the PDF Association. The interop was an opportunity for developers from various companies working on their support for PDF 2.0 to get together and share sample files, and to process them in their own solutions. If a sample file from one vendor isn’t read correctly by a product from another vendor the developers can then figure out why, and fix either the creation tool or the consumer, or even both, depending on the exact reason for that failure.
When we make our own PDF sample files to test the Harlequin RIP there’s always a risk that the developer making the file and the developer writing the code to consume it will make the same assumptions or misread the specification in the same way. That makes testing files created by another vendor invaluable, because it validates all of those assumptions and possible misinterpretations as well.
It’s pretty early in the PDF 2.0 process (the standard itself will probably be published later this month), which means that some vendors are not yet far enough through their own development cycles to get involved yet. But that actually makes this kind of event even more valuable for those who participate because there are no currently shipping products out there that we could just buy and make sample files with. And the last thing that any of us want to do as vendors is to find out about incompatibilities after our products are shipped and in our customers’ hands.
I can tell you that our testing and discussions at the interop in Cambridge were extremely useful in finding a few issues that our internal testing had not identified. We’re busy correcting those, and will be taking updated software to the next interop, in Boston, MA on June 12th and 13th.
If you’re a Harlequin OEM or member of the Harlequin Partner Network you can also get access to our PDF 2.0 preview code to test against your own or other partners’ products; just drop me a line. If you’re using Harlequin in production I’m afraid you’ll have to wait until we release our next major version!
If you’re a software vendor with products that consume or create PDF and you’re already working on your PDF 2.0 support I’d heartily recommend registering for the June interop. I don’t know of any more efficient way to identify defects in your implementation so you can fix them before your customers even see them. Visit https://www.pdfa.org/event/pdf-interoperability-workshop-north-america/ to get started.
And if you’re a PDF software vendor and you’re not working on PDF 2.0 yet … time to start your planning!
In the middle of 2017 ISO 32000-2 will be published, defining PDF 2.0. It’s eight years since there’s been a revision to the standard. We’ve already covered the main changes affecting print in previous blog posts and here Martin Bailey, the primary UK expert to the ISO committee developing PDF 2.0, gives a roundup of a few other changes to expect.
The encryption algorithms included in previous versions of PDF have fallen behind current best practices in security, so PDF adds AES-256-bit and states that all passwords used for AES-256 encryption must be encoded in Unicode.
A PDF 1.7 reader will almost certainly error and refuse to process any PDF files using the new AES-256 encryption.
Note that Adobe’s ExtensionLevel 3 to ISO 32000-1 defines a different AES-256 encryption algorithm, as used in Acrobat 9 (R=5). That implementation is now regarded as dangerously insecure and Adobe has deprecated it completely, to the extent that use of it is forbidden in PDF 2.0. Deprecation and what this means in PDF!
PDF 2.0 has deprecated a number of implementation details and features that were defined in previous versions. In this context ‘deprecation’ means that tools writing PDF 2.0 are recommended not to include those features in a file; and that tools reading PDF 2.0 files are recommended to ignore those features if they find them.
Global Graphics has taken the deliberate decision not to ignore relevant deprecated items in PDF files that are submitted and happen to be identified as PDF 2.0. This is because it is quite likely that some files will be created using an older version of PDF and using those features. If those files are then pre-processed in some way before submitting to Harlequin (e.g. to impose or trap the files) the pre-processor may well tag them as now being PDF 2.0. It would not be appropriate in such cases to ignore anything in the PDF file simply because it is now tagged as PDF 2.0.
We expect most other PDF readers to take the same course, at least for the next few years. And the rest… PDF 2.0 header: It’s only a small thing, but a PDF reader must be prepared to encounter a value of 2.0 in the file header and as the value of the Version key in the Catalog.
PDF 1.7 readers will probably vary significantly in their handling of files marked as PDF 2.0. Some may error, others may warn that a future version of that product is required, while others may simply ignore the version completely.
Harlequin 11 reports “PDF Warning: Unexpected PDF version – 2.0” and then continues to process the job. Obviously that warning will disappear when we ship a new version that fully supports PDF 2.0. UFT-8 text strings: Previous versions of PDF allowed certain strings in the file to be encoded in PDFDocEncoding or in 16-bit Unicode. PDF 2.0 adds support for UTF-8. Many PDF 1.7 readers may not recognise the UTF-8 string as UTF-8 and will therefore treat it as using PDFDocEncoding, resulting in those strings being treated as what looks like a random sequence of mainly accented characters. Print scaling: PDF 1.6 added a viewer preferences key that allowed a PDF file to specify the preferred scaling for use when printing it. This was primarily in support of engineering drawings. PDF 2.0 adds the ability to say that the nominated scaling should be enforced. Document parts: The PDF/VT standard defines a structure of Document parts (common called DPart) that can be used to associate hierarchical metadata with ranges of pages within the document. In PDF/VT the purpose is to enable embedding of data to guide the application of different processing to each page range.
PDF 2.0 has added the Document parts structure into baseline PDF, although no associated semantics or required processing for that data have been defined.
It is anticipated that the new ISO standard on workflow control (ISO 21812, expected to be published around the end of 2017) will make use of the DPart structure, as will the next version of PDF/VT. The specification in PDF 2.0 is largely meaningless until such time as products are written to work with those new standards.
The last few years have been pretty stable for PDF; PDF 1.7 was published in 2006, and the first ISO PDF standard (ISO 32000-1), published in 2008, was very similar to PDF 1.7. In the same way, PDF/X‑4 and PDF/X‑5, the most recent PDF/X standards, were both published in 2010, six years ago.
In the middle of 2017 ISO 32000-2 will be published, defining PDF 2.0. Much of the new work in this version is related to tagging for content re-use and accessibility, but there are also several areas that affect print production. Among them are some changes to the rendering of PDF transparency, ways to include additional data about spot colors and about how color management should be applied.