Extracting Data from Signed PDF using LiveCycle Server

A very common request: how do I extract data from a signed PDF using LiveCycle ES?
To do this you will need LiveCycle server software installed. This example uses the processFormSubmission service operation of the Forms component.
The attached PDF explains the process; it also contains the process LCA and the test file needed to run the process.
Click here
This process can be used when the signed PDF arrives by email or through a watched folder. It can also be used when the signed PDF is submitted from Workspace.

16 responses to “Extracting Data from Signed PDF using LiveCycle Server”

  1. Carlos Nascimbene

    It works very well! Can you provide a deeper explanation of how the namespaces are used to get the data into the variables with the setValue operation?

    Thanks a lot,

    Carlos

    • Girish Bedekar

      Hi Carlos
      If you look at the data extracted from the PDF by processFormSubmission, it declares two namespaces, xdp and xfa. To access the data in the XML, you also have to define the namespaces in the process. A namespace consists of a prefix and a URI. For example, I had the following namespace defined in the process:
      d http://ns.adobe.com/xdp/. Here d is the namespace prefix and “http://ns.adobe.com/xdp/” is the namespace URI. If you look at the XML data, it has a namespace prefix xdp that points to “http://ns.adobe.com/xdp/”.
      Then in my setValue I used the “d” prefix to access the XML data. Basically, wherever the xdp prefix was used, I replaced it with my own prefix, d in this case.
      Let me know if you have any more questions.
      thanks
      girish
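
The prefix mapping described in this reply can be tried outside LiveCycle. The following is a minimal Python sketch, not the process from the attached LCA: the sample XML, the form1/lastName element names, and the value are hypothetical, and the process_data/XML_Data wrapper only stands in for a process XML variable. It assumes the xfa prefix is bound to the XFA data namespace http://www.xfa.org/schema/xfa-data/1.0/. The point is that the lookup matches on the namespace URI, so a locally chosen prefix such as d works even though the document itself says xdp.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like submitted XDP data.
# form1/lastName and the value "Smith" are made up for illustration.
SAMPLE = """<process_data><XML_Data>
<xdp:xdp xmlns:xdp="http://ns.adobe.com/xdp/">
  <xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
    <xfa:data>
      <form1><lastName>Smith</lastName></form1>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>
</XML_Data></process_data>"""

# Locally chosen prefixes mapped to the URIs used in the document.
# "d" and "f" are arbitrary names; only the URIs have to match.
NAMESPACES = {
    "d": "http://ns.adobe.com/xdp/",
    "f": "http://www.xfa.org/schema/xfa-data/1.0/",
}

root = ET.fromstring(SAMPLE)  # root is the <process_data> stand-in

# Every namespaced step carries a prefix; the form's own elements have none.
node = root.find("XML_Data/d:xdp/f:datasets/f:data/form1/lastName", NAMESPACES)
last_name = node.text if node is not None else None

# The same path without prefixes does not match the namespaced elements.
unprefixed = root.find("XML_Data/xdp/datasets/data/form1/lastName")

print(last_name)
```

In this sketch the unprefixed lookup finds nothing, which mirrors why the namespaces have to be declared before the XPath expression can reach the data.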

  2. Carlos Nascimbene

    Your example works really well, but in my process I’m having problems retrieving and setting the variable values from the resulting XML data.

    When I print the variables to the log after using the setValue operation, I get null values (but the XML variable holds the whole XDP with the data and the PDF chunk):

    2009-04-28 11:07:18,359 INFO [STDOUT] [PID:5,812] /process_data/apellido_afiliado: null

    The XPath expressions I’m using are:

    LOCATION
    /process_data/@apellido_afiliado
    EXPRESSION
    /process_data/XML_Data/d:xdp/f:datasets/dd:data/DatosAfiliado/apellido

    What am I doing wrong? Could the problem be that the root node in my schema does not have the same name as the root form element in my object hierarchy?

    If you think that seeing all my XML data would help, please tell me.

    Thanks again for all your help,

    Carlos
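
One way to narrow down a null result like this is to list the fully qualified element names in the extracted XML and compare them, step by step, with the expression. The following Python sketch is only a diagnostic aid under stated assumptions, not a confirmed explanation of this case: the sample document and the value "Perez" are hypothetical, the DatosAfiliado/apellido names are taken from the expression above, and the XML_Data element stands in for the process variable. It assumes both f and dd are meant to refer to the XFA data namespace, and shows that a prefix bound to a different URI, or a form root with a different name, makes the lookup return nothing.

```python
import xml.etree.ElementTree as ET

XDP_NS = "http://ns.adobe.com/xdp/"
XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/"

# Hypothetical stand-in for the XML variable; the value "Perez" is made up.
SAMPLE = f"""<XML_Data>
<xdp:xdp xmlns:xdp="{XDP_NS}">
  <xfa:datasets xmlns:xfa="{XFA_DATA_NS}">
    <xfa:data>
      <DatosAfiliado><apellido>Perez</apellido></DatosAfiliado>
    </xfa:data>
  </xfa:datasets>
</xdp:xdp>
</XML_Data>"""


def qualified_paths(element, parent_path=""):
    """Yield the path of every element, with tags shown as {uri}localname."""
    path = f"{parent_path}/{element.tag}"
    yield path
    for child in element:
        yield from qualified_paths(child, path)


root = ET.fromstring(SAMPLE)
paths = list(qualified_paths(root))
for p in paths:
    print(p)  # shows which URI each step of the expression has to match

expr = "d:xdp/f:datasets/dd:data/DatosAfiliado/apellido"

# All three prefixes bound to the URIs that appear in the document.
good_map = {"d": XDP_NS, "f": XFA_DATA_NS, "dd": XFA_DATA_NS}
# Same prefixes, but dd bound to some other (hypothetical) URI.
bad_map = {"d": XDP_NS, "f": XFA_DATA_NS, "dd": "http://example.com/other"}

found = root.find(expr, good_map)
value = found.text if found is not None else None
missed = root.find(expr, bad_map)

# A form root whose name differs from the one in the expression also misses.
wrong_root = root.find("d:xdp/f:datasets/dd:data/Afiliado/apellido", good_map)

print(value, missed, wrong_root)
```

If the printed paths show a different root form element or a different namespace URI than the expression expects, that step of the expression is the one to correct.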

  3. Hi Girish,
    Your example is really very nice. In my case we don’t want the signed PDF, so I ignored that part.

    We want to export the data into an Excel sheet, so could you please help me with this?

    Thanks in Advance.
    Sameer

  4. Yes Girish,

    Livecycle ES 8.2

  5. Hi Girish,

    Thanks for your quick response..!!

    Actually, I am looking for two processes –

    1) one that extracts the data from a PDF dropped into the watched folder and saves it to the MySQL DB.

    2) another process, initiated by the user, that exports the MySQL data to an Excel sheet.

    So please, whenever you get some free time, help me with this.

    Regards,
    Sharique

  6. Hi Girish,

    Thanks for your support.
    I have done process 1, but I am not able to do the second one (“Process 2 may not be that easy; does it have to be an Excel sheet?”).
    Yes, this data from the MySQL DB has to be exported through an LC process.

    Regards,
    Sharique

  7. Hi ,
    I have been trying to get the data in XML form from the processFormSubmission component for the past 2 days. Please send me the sample @ renjithvijayan2005@gmail.com
    thanks
    Renjith

  8. Hi Girish ,
    I tried your sample program and it works fine. It is exactly what I am looking for. I just want to know how you are defining the namespace in the process, i.e. how and where are you assigning the namespace to “d”?

    • Hi
      If you right-click the process and open its properties, you should see the namespace defined there.
      thanks
      girish

  9. How can you submit a pdf or other document to a process in workspace?

  10. Tiruppathi Rajan Gunaseelan

    Hi Girish,

    Thanks for the nice post on extracting data from the PDF. I am having difficulties implementing your example. I created a process named “processFormSubmission” in my LiveCycle Workbench with 2 activities defined in it: 1. the default start point, and 2. processFormSubmission from the Forms service. I defined the input to this process as a document (the PDF file), set the content type to PDF and PDF to XDP to true, and mapped the output of the process to a document (the XML data extracted from the PDF). I am not sure where to reference the imported ExtractDataFromSignedPDF.lca file in the process, or which namespaces to create. Without the LCA and the namespaces, I got the response “The invocation of this long-lived process returned job-id ” when I invoked the process from Workbench.

    Please let me know what I am doing wrong.

    Also, please share the complete steps to extract data from any input PDF passed into the process, and let me know whether it is possible to extract XML from a flattened PDF document.

    Appreciate your quick support in this. Thanks in advance.
