Ruby Notes - PDF Form Fillup

Last week I ventured into the world of PDF forms to implement a stop-gap solution for something that would have been better done with a full-fledged API. As part of a business workflow, stakeholders wanted the application to fill in an interactive PDF form, which would then be passed along to the next hop. In the Rails world, PDF is synonymous with the famous prawn gem, so I tried that first.

Prawn Curry !

I quickly realized that prawn helps most when you need to render a PDF from scratch. But this great gem has a solid extension model, and a bunch of extensions can be found on GitHub. I picked up the prawn-fillform gem (from GitHub, not from RubyGems - https://github.com/moessimple/prawn-fillform), and the example looked good -

Prawn::Document.generate "output.pdf", :template => "template.pdf"  do |pdf|  
  pdf.fill_form_with(data)
end  

Here the :template option tells prawn to use an existing PDF as the template for generating output.pdf. fill_form_with is an extension method added to Prawn::Document by the prawn-fillform gem. data is a hash that specifies which fields should be filled with which values. But wait! How can I figure out the field names? I don't see them in Acrobat Reader. Enter Acrobat Pro - http://www.adobe.com/au/products/acrobatpro.html. Once the template form is opened in Acrobat X/XI, I can see the field names highlighted -

[Screenshot: form field names highlighted in Acrobat Pro]
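Acrobat Pro is not the only way to see the names. The pdftk tool, which shows up later in this post, can print them with `pdftk template.pdf dump_data_fields`. Below is a minimal sketch of pulling names and types out of that text. The helper name parse_field_dump and the sample dump are my own illustration (the field AcceptTerms and both field types are made up), so treat it as a starting point rather than a complete parser.

```ruby
# Parse the text printed by `pdftk template.pdf dump_data_fields`.
# Each field is a block of "Key: value" lines; blocks start with a "---" line.
def parse_field_dump(dump)
  dump.split(/^---$/).filter_map do |block|
    name = block[/^FieldName: (.*)$/, 1]
    type = block[/^FieldType: (.*)$/, 1]
    { name: name, type: type } if name
  end
end

# Hypothetical sample output for a form with two fields.
SAMPLE_DUMP = <<~DUMP
  ---
  FieldType: Text
  FieldName: LI1BizName
  FieldFlags: 0
  FieldJustification: Left
  ---
  FieldType: Button
  FieldName: AcceptTerms
  FieldFlags: 0
  FieldStateOption: Off
  FieldStateOption: Yes
DUMP

fields = parse_field_dump(SAMPLE_DUMP)
fields.each { |f| puts "#{f[:name]} (#{f[:type]})" }
```

The names printed this way are the keys the data hash has to use.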

But my new-found joy soon disappeared when I realized that the generated PDF was "flattened", meaning it no longer had any interactive fields (text field, select, radio button etc.). Sure enough, deep inside the fill_form_with code, I found this -

text_box value, :at => [field.x + x_offset, field.y + y_offset],  
                :align => options[:align] || field.align,
                :width => options[:width] || field.width,
                :height => options[:height] || field.height,
                :valign => options[:valign] || :center,

                # Default to the document font size if the field size is 0
                :size => options[:font_size] || ((size = field.font_size) > 0.0 ? size : font_size),
                :style => options[:font_style] || field.font_style

The text_box function basically draws static text in place of the text field and thus flattens the output. I thought, what if there were an extension that could add an interactive text_field (acrofields) instead of drawing a text_box? Indeed there is one - prawn-forms (https://github.com/yob/prawn-forms). The brief documentation looked like exactly what I wanted -

Prawn-Forms depends on the core prawn library, and simply adds a few additional methods to the standard Prawn::Document object that allow you to add form elements to your output. Start by requiring the prawn library, then the prawn-forms library. Build your PDF as usual, and use methods like text_field() to add a free text input box.

But wait, how do I pass a value? The GitHub example just renders a text field with a name and coordinates/dimensions. After digging through the code and reading the PDF spec (:V specifies the field value for a text field, which is of type :Tx), it seems text_field can render a field with a value pre-populated.

field_dict[:V] = Prawn::Core::LiteralString.new(opts[:default])  

I tried to merge text_field into prawn-fillform by replacing text_box with an equivalent call to text_field -

text_field(field.name.to_s, field.x + x_offset, field.y + y_offset,  
                            :width => options[:width] || field.width,
                            :height => options[:height] || field.height,
                            :default => value)

PDF Annotations

Ideally this should have been a happy ending. But real life is rarely ideal! It seems there is a version-related issue with how prawn handles PDF annotations. What is an annotation?

Interactive forms are primarily defined by section 8.6 of the PDF spec. The visual appearance of fields is controlled using widget annotations, defined in section 8.4.5. A PDF can contain one form.

The form is defined by a dict linked via the AcroForm entry in the root catalog. Amongst other entries, the form dict has a Fields entry that holds an array of all the form fields. The array should hold indirect references to dicts, generally one per field.

Each field dict can link to one or more widget annotations that define the visual appearance in various states. It seems most fields have a single widget annotation, in which case it is permitted to merge the field and annotation dicts into a single dict. Annotation and field dicts have no common keys, so there is no conflict.

For the fields to appear on the page, the widget annotation (or merged field/widget dict) must appear in the page Annots entry.
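To keep that structure straight, here is a sketch of it in plain Ruby hashes. It is only a mental model and not what prawn builds internally: in a real PDF these would be indirect object references, and here Ruby object identity stands in for them. The field value, the Rect numbers and the helper orphaned_fields are made up for illustration.

```ruby
# A merged field/widget dictionary: field keys (:FT, :T, :V) and
# annotation keys (:Type, :Subtype, :Rect) live together in one dict.
name_field = {
  Type: :Annot,
  Subtype: :Widget,
  FT: :Tx,                   # field type: text
  T: "LI1BizName",           # field name
  V: "Acme Pty Ltd",         # field value (hypothetical)
  Rect: [72, 700, 300, 720]  # position on the page (hypothetical)
}

# The widget must be listed in the page's Annots to show up on the page.
page = { Type: :Page, Annots: [name_field] }

# The same dict is also listed in the form's Fields array.
catalog = {
  Type: :Catalog,
  AcroForm: { Fields: [name_field] },
  Pages: { Type: :Pages, Kids: [page], Count: 1 }
}

# Returns the form fields that are not listed in any page's Annots,
# i.e. fields that exist in the form but would not appear on a page.
def orphaned_fields(catalog)
  on_pages = catalog[:Pages][:Kids].flat_map { |pg| pg.fetch(:Annots, []) }
  catalog[:AcroForm][:Fields].reject { |f| on_pages.any? { |a| a.equal?(f) } }
end

puts orphaned_fields(catalog).length
```

Removing the dict from the page's Annots while leaving it in Fields gives exactly the "field exists but is not visible" situation described above.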

After some time, I found another gem that looks a bit more comprehensive as far as rendering PDF form fields is concerned - prawn-blank https://github.com/hannesg/prawn-blank, which similarly provides text_field, select, checkbox, radio - almost all of the form elements.

It worked for generating a new PDF with interactive elements from scratch, but failed miserably when populating the fields of an existing PDF form.

PDFTK to the Rescue

I even ventured towards iText, which only has JRuby bindings. Mike Perham has a blog post on this - http://www.mikeperham.com/2011/02/15/filling-out-pdf-forms-with-jruby/. But moving to JRuby would have been too much for our Rails application.

The only other option was to try pdftk http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/. Does it work with templates? It does, beautifully. As it is a tool-based solution rather than pure Ruby, I looked for a Ruby wrapper. Sure enough, there was one - pdf-forms https://github.com/jkraemer/pdf-forms

require 'pdf_forms'  
pdftk = PdfForms.new('/usr/local/bin/pdftk')  
pdftk.fill_form '/path/to/form.pdf', 'myform.pdf', :foo => 'bar'  
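Underneath, a wrapper like this has to hand pdftk the field values in a data file and run something like `pdftk form.pdf fill_form data.fdf output myform.pdf`. Below is a minimal sketch of writing such an FDF file by hand. It is my own simplification, not the code pdf-forms uses, and it assumes plain ASCII names and values; build_fdf and pdf_string are hypothetical helper names.

```ruby
# Wrap a value as a PDF literal string, escaping the characters
# that are special inside one: backslash and both parentheses.
def pdf_string(value)
  "(" + value.to_s.gsub(/([\\()])/) { "\\#{$1}" } + ")"
end

# Build a minimal FDF document holding one /T (name) /V (value) pair per field.
def build_fdf(data)
  fields = data.map do |name, value|
    "<< /T #{pdf_string(name)} /V #{pdf_string(value)} >>"
  end
  <<~FDF
    %FDF-1.2
    1 0 obj
    << /FDF << /Fields [
    #{fields.join("\n")}
    ] >> >>
    endobj
    trailer
    << /Root 1 0 R >>
    %%EOF
  FDF
end

fdf = build_fdf(foo: "bar")
puts fdf

# To use it (assuming pdftk is installed and on the PATH):
#   File.write("data.fdf", fdf)
#   system("pdftk", "/path/to/form.pdf", "fill_form", "data.fdf", "output", "myform.pdf")
```

Unlike the prawn-fillform route, the fields stay interactive in the output. Appending `flatten` to the pdftk command produces the non-interactive version if that is ever wanted.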

But somehow I wanted a more "sugary" interface, just like prawn's. So I went ahead and wrote the call the way I wanted it -

data = {}  
data[:LI1BizName] = company  
…

Aurora::Fulfilments::FulfilmentFormPdf.generate "Master_Form_Campaign_#{adwords_campaign_id}_#{id}.pdf" do |pdf|  
  pdf.fill_form_with(data)
end  

It was similar to what prawn-fillform provided. I decided to hide the true "master template" PDF and the configuration for pdftk, tucked away in an initializer perhaps -

Aurora::Fulfilments::FulfilmentFormPdf.configure do |config|

  # Path to the pdftk binary. See http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
  config.pdfkt_path = '/opt/pdflabs/pdftk/bin/pdftk'

  # path relative to root where the PDFs will be stored
  config.fulfilment_pdf_path = "data"

  # path relative to root where the master template is stored
  config.fulfilment_pdf_template = "Master_Form_Copy.pdf"

end  

With these requirements set, implementing FulfilmentFormPdf was simple -

require 'pdf_forms'

module Aurora  
  module Fulfilments
    class FulfilmentFormPdf

      # Class-level settings, assigned inside the configure block.
      # pdfkt_path only gets a reader here because its writer is defined below.
      class << self
        attr_reader   :pdfkt_path, :pdf_form
        attr_accessor :fulfilment_pdf_path, :fulfilment_pdf_template
        attr_accessor :options

        def pdfkt_path=(val)
          @pdfkt_path = val
          @pdf_form = PdfForms.new(@pdfkt_path)
        end
      end

      def self.configure
        yield self
      end

      # Fill and save a PDF based on the filename, using options[:template] as a template form
      #
      def self.generate(filename, options = {}, &block)
        @options = {
            :filename => Rails.root.join(fulfilment_pdf_path, filename),
            :template => fulfilment_pdf_template
            }.update(options)

        yield self if block_given?
      end

      def self.fill_form_with(data)
        return unless @pdf_form.present?
        @pdf_form.fill_form @options[:template], @options[:filename], data
      end

    end
  end
end
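
One thing to watch in the class above: generate stores @options on the class itself, so in a threaded server two requests calling generate at the same time could overwrite each other's filename. A minimal sketch of an instance-based variant is below. The names FormFiller and RecordingBackend are hypothetical, the sample value is made up, and the backend is a stand-in for PdfForms so that the sketch runs without pdftk installed.

```ruby
# Hypothetical variant: per-call state lives in an instance, not on the class.
class FormFiller
  def self.generate(filename, template:, backend:)
    filler = new(filename, template, backend)
    yield filler if block_given?
    filler
  end

  def initialize(filename, template, backend)
    @filename = filename
    @template = template
    @backend  = backend
  end

  # Delegates to the backend, which must respond to
  # fill_form(template, destination, data) the way PdfForms does.
  def fill_form_with(data)
    @backend.fill_form(@template, @filename, data)
  end
end

# Stand-in backend that records calls instead of shelling out to pdftk.
class RecordingBackend
  attr_reader :calls

  def initialize
    @calls = []
  end

  def fill_form(template, destination, data)
    @calls << [template, destination, data]
  end
end

backend = RecordingBackend.new
FormFiller.generate("out.pdf", template: "Master_Form_Copy.pdf", backend: backend) do |pdf|
  pdf.fill_form_with(LI1BizName: "Acme Pty Ltd")
end

puts backend.calls.length
```

The calling code keeps the same block shape as before, and passing the backend in makes it easy to swap PdfForms for a fake in tests.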