
Transformer - Google Doc API

Description

The Google Doc API transformer uses Google's Document AI service to extract structured information from unstructured or semi-structured documents. Document AI combines natural language processing, computer vision, and AutoML to parse a document and return its contents as structured data that is ready for further processing or analysis. This is particularly useful for applications that need automated data extraction from documents such as invoices and contracts.


Config

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| serviceAccountKey | String | N/A | The service account key used to access Google Cloud services. |
| location | String | us | The Google Cloud location (region) of the Document AI processor. |
| projectId | String | N/A | The ID of the Google Cloud project. |
| processorId | String | N/A | The ID of the Document AI processor used to process documents. |
| mimeType | String | application/json | The MIME type of the document to be processed. |
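
As a rough illustration of how these parameters relate to one another: Google's Document AI identifies a processor by a resource name of the form `projects/{project}/locations/{location}/processors/{processor}`, and regional processors are served from a location-prefixed endpoint. The sketch below builds both strings from the parameter values. The function names are hypothetical, and how the transformer assembles its requests internally is an assumption, not something this page documents.

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Build a Document AI processor resource name from the config values.

    Hypothetical helper for illustration; not part of the transformer.
    """
    fields = (
        ("projectId", project_id),
        ("location", location),
        ("processorId", processor_id),
    )
    for label, value in fields:
        if not value or not value.strip():
            raise ValueError(f"{label} must not be empty")
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def regional_endpoint(location: str) -> str:
    """Return the location-prefixed Document AI API host for a location."""
    return f"{location}-documentai.googleapis.com"


# Values taken from the example configuration on this page.
name = processor_resource_name("apiro-data-platform", "us", "5f628045934c4151")
host = regional_endpoint("us")
print(name)
print(host)
```

If the processor was created in a different location than the one configured, the resource name will not point at an existing processor, which is one reason to check `location` alongside the two IDs.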

Example

<apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
    <loadOrder>20</loadOrder>
    <dataFeeds>
        <dataFeed definition="EXPR_JSON" name="INVOICE_DOCAI">
            <execPriority>10</execPriority>
            <enabled>true</enabled>
            <push>false</push>
            <pull>true</pull>
            <schema>INVOICE</schema>
            <config><![CDATA[
{
  "dataSource": {
    "entity": "GIT",
    "config": {
      "password": "<your-git-access-token>",
      "gitURL": "https://github.com/redapiro/apiro_engine_test_feeds.git",
      "branch": "rudtest",
      "pathPrefix": "/rudtest/invoice.pdf",
      "username": "apirobot",
      "transformers": [
        {
          "name": "DOCAI",
          "entity": "DOCUMENT_AI",
          "config": {
            "serviceAccountKey" : "${SYS:GCP_SVC_ACCOUNT}",
            "location" : "us",
            "projectId" : "apiro-data-platform",
            "processorId" : "5f628045934c4151",
            "mimeType" : "application/json"
          }
        }
      ]
    }
  },
  "explicitMappings": [
    {
      "dictionary": "invoice_number",
      "value": "#{PAYLOAD.resolve('$.invoice_number.val')}"
    },
    {
      "dictionary": "extractor",
      "value": "#{ 'GOOGLE DOCUMENT_AI invoice '}"
    },
    {
      "dictionary": "receiver",
      "value": "#{PAYLOAD.resolve('$.receiver.val')}"
    },
    {
      "dictionary": "total_amount",
      "value": "#{PAYLOAD.resolve('$.total_amount.val')}"
    },
    {
      "dictionary": "full_json",
      "value": "#{PAYLOAD.resolve('$')}"
    }
  ]
}
]]>
            </config>
        </dataFeed>
    </dataFeeds>
</apiroConf>
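
The `explicitMappings` in this example read values out of the transformer's output with paths such as `$.invoice_number.val`. The sketch below mimics that lookup on a plain Python dictionary to show the payload shape those paths imply. It is a minimal resolver for simple dotted paths only, not Apiro's `PAYLOAD.resolve` implementation; the sample payload values are invented, and returning `None` for a missing key is a choice made for this sketch.

```python
from typing import Any


def resolve(payload: Any, path: str) -> Any:
    """Resolve '$' or a simple dotted path like '$.a.b' against nested dicts.

    Hypothetical stand-in for illustration; only plain key access is supported.
    """
    if path == "$":
        return payload
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path!r}")
    current = payload
    for key in path[2:].split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # missing keys resolve to None in this sketch
        current = current[key]
    return current


# Invented payload with the shape implied by the mappings above.
payload = {
    "invoice_number": {"val": "INV-0001"},
    "receiver": {"val": "Example Pty Ltd"},
    "total_amount": {"val": "1250.00"},
}

# Dictionary name -> path, mirroring the explicitMappings entries.
mappings = {
    "invoice_number": "$.invoice_number.val",
    "receiver": "$.receiver.val",
    "total_amount": "$.total_amount.val",
    "full_json": "$",
}

record = {name: resolve(payload, path) for name, path in mappings.items()}
print(record["invoice_number"], record["receiver"], record["total_amount"])
```

The `full_json` mapping uses the root path `$`, so it captures the entire transformer output alongside the individual fields.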

The following excerpt from the example above shows only the transformer definition:

{
  "transformers": [
    {
      "name": "DOCAI",
      "entity": "DOCUMENT_AI",
      "config": {
        "serviceAccountKey" : "${SYS:GCP_SVC_ACCOUNT}",
        "location" : "us",
        "projectId" : "apiro-data-platform",
        "processorId" : "5f628045934c4151",
        "mimeType" : "application/json"
      }
    }
  ]
}

Common Mistakes

  • Incorrect Service Account Key: Ensure that serviceAccountKey resolves to a valid Google Cloud service account key that is permitted to use Document AI.
  • Incorrect Project ID or Processor ID: Verify that projectId and processorId match your Google Cloud project and Document AI processor, and that location matches the location in which the processor was created.
  • Mismatched MIME Type: Ensure that mimeType matches the format of the documents you are processing.
  • Security Concerns: Avoid embedding sensitive values, such as service account keys or repository passwords, directly in configuration files. Reference them indirectly instead, as the example does with ${SYS:GCP_SVC_ACCOUNT}.
  • Incorrect Data Source Configuration: Check that the dataSource configuration, including the entity type and its specific settings, is set up correctly for your data source.
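
Several of these mistakes can be caught before a feed is deployed. The following is a minimal pre-flight check, assuming the transformer definition has already been loaded into a Python dictionary. The function name and the rules are hypothetical: the check treats the parameters whose default is N/A as required, and it assumes that secrets should be supplied through a `${...}` placeholder like the one used in the example.

```python
import re
from typing import List

# Parameters listed with a default of N/A; assumed here to be required.
REQUIRED_KEYS = ("serviceAccountKey", "projectId", "processorId")

# Matches placeholders of the form ${SYS:GCP_SVC_ACCOUNT}.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z]+:[A-Za-z0-9_]+\}$")


def check_transformer(transformer: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if transformer.get("entity") != "DOCUMENT_AI":
        problems.append("entity is not DOCUMENT_AI")
    config = transformer.get("config") or {}
    for key in REQUIRED_KEYS:
        if not str(config.get(key, "")).strip():
            problems.append("missing required parameter: " + key)
    key_value = str(config.get("serviceAccountKey", "")).strip()
    if key_value and not PLACEHOLDER.match(key_value):
        problems.append("serviceAccountKey looks like an inline secret")
    return problems


# The transformer definition from the example on this page.
good = {
    "name": "DOCAI",
    "entity": "DOCUMENT_AI",
    "config": {
        "serviceAccountKey": "${SYS:GCP_SVC_ACCOUNT}",
        "location": "us",
        "projectId": "apiro-data-platform",
        "processorId": "5f628045934c4151",
        "mimeType": "application/json",
    },
}

# A deliberately broken variant: inline key and no processorId.
bad = {
    "name": "DOCAI",
    "entity": "DOCUMENT_AI",
    "config": {"serviceAccountKey": "inline-key-material", "projectId": "p"},
}

print(check_transformer(good))
print(check_transformer(bad))
```

A check like this only inspects the shape of the configuration. It cannot confirm that the key is valid or that the processor exists, so a test run against a sample document is still worthwhile.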