Description
The Google Doc API transformer uses Google's Document AI service to extract structured information from unstructured or semi-structured documents. Document AI combines natural language processing, computer vision, and AutoML to extract and structure document data so that it can be processed or analyzed further. The transformer is particularly useful for applications that need automated data extraction from invoices, contracts, and other structured documents.
Config
Parameters
Parameter | Type | Default | Description |
serviceAccountKey | String | N/A | The service account key for accessing Google Cloud services. |
location | String | us | The location of the Google Cloud project. |
projectId | String | N/A | The ID of the Google Cloud project. |
processorId | String | N/A | The ID of the Document AI processor to use for processing documents. |
mimeType | String | application/json | The MIME type of the document to be processed. |
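The table can be read as a small validation rule: three parameters have no default and must be supplied, while location and mimeType fall back to the values listed. A minimal Python sketch of that rule follows. The helper name normalize_docai_config is hypothetical and is not part of Apiro, which presumably performs its own validation internally.

```python
# Hypothetical helper illustrating the parameter table; not part of Apiro.
REQUIRED = ("serviceAccountKey", "projectId", "processorId")
DEFAULTS = {"location": "us", "mimeType": "application/json"}


def normalize_docai_config(config: dict) -> dict:
    """Return a copy of config with the table defaults applied.

    Raises ValueError if a parameter without a default is missing or blank.
    """
    missing = [name for name in REQUIRED if not str(config.get(name) or "").strip()]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    # Defaults come first so that caller-supplied values override them.
    return {**DEFAULTS, **config}
```

Calling the helper with only the three required keys returns a dictionary that also carries location and mimeType with their default values.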
Example
<apiroConf version="1" xmlns="http://apiro.com/apiro/v1/root">
  <loadOrder>20</loadOrder>
  <dataFeeds>
    <dataFeed definition="EXPR_JSON" name="INVOICE_DOCAI">
      <execPriority>10</execPriority>
      <enabled>true</enabled>
      <push>false</push>
      <pull>true</pull>
      <schema>INVOICE</schema>
      <config><![CDATA[
{
  "dataSource": {
    "entity": "GIT",
    "config": {
      "password": "<your-git-access-token>",
      "gitURL": "https://github.com/redapiro/apiro_engine_test_feeds.git",
      "branch": "rudtest",
      "pathPrefix": "/rudtest/invoice.pdf",
      "username": "apirobot",
      "transformers": [
        {
          "name": "DOCAI",
          "entity": "DOCUMENT_AI",
          "config": {
            "serviceAccountKey": "${SYS:GCP_SVC_ACCOUNT}",
            "location": "us",
            "projectId": "apiro-data-platform",
            "processorId": "5f628045934c4151",
            "mimeType": "application/json"
          }
        }
      ]
    }
  },
  "explicitMappings": [
    {
      "dictionary": "invoice_number",
      "value": "#{PAYLOAD.resolve('$.invoice_number.val')}"
    },
    {
      "dictionary": "extractor",
      "value": "#{ 'GOOGLE DOCUMENT_AI invoice '}"
    },
    {
      "dictionary": "receiver",
      "value": "#{PAYLOAD.resolve('$.receiver.val')}"
    },
    {
      "dictionary": "total_amount",
      "value": "#{PAYLOAD.resolve('$.total_amount.val')}"
    },
    {
      "dictionary": "full_json",
      "value": "#{PAYLOAD.resolve('$')}"
    }
  ]
}
]]>
      </config>
    </dataFeed>
  </dataFeeds>
</apiroConf>
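The explicitMappings entries in the example read fields such as $.invoice_number.val from the transformer's output. The Python sketch below imitates that lookup to show how the mappings turn a payload into a record. It rests on two assumptions: the payload is a JSON object in which each extracted field is an object with a val key (inferred from the mapping expressions, not confirmed by this page), and the resolve function is a simplified stand-in for PAYLOAD.resolve that handles only dotted paths from the root $. The sample values are invented for illustration.

```python
import json


def resolve(payload, path: str):
    """Resolve a simple '$.a.b' path against nested dicts.

    Simplified stand-in for PAYLOAD.resolve: supports only '$' and dotted keys,
    and returns None when a key is absent.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    node = payload
    # Skip the empty segment produced by the leading '$' or '$.'.
    for key in filter(None, path[1:].split(".")):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Assumed payload shape, inferred from the mapping expressions in the example.
sample_payload = json.loads("""
{
  "invoice_number": {"val": "INV-001"},
  "receiver": {"val": "Example Pty Ltd"},
  "total_amount": {"val": "120.50"}
}
""")

# Dictionary name -> path, mirroring the explicitMappings in the example.
mappings = {
    "invoice_number": "$.invoice_number.val",
    "receiver": "$.receiver.val",
    "total_amount": "$.total_amount.val",
    "full_json": "$",
}

record = {name: resolve(sample_payload, path) for name, path in mappings.items()}
```

Under these assumptions, the path $ returns the whole payload, which is how the full_json mapping captures the complete transformer output alongside the individual fields.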
The following excerpt from the example above shows only the transformer configuration:
{
  "transformers": [
    {
      "name": "DOCAI",
      "entity": "DOCUMENT_AI",
      "config": {
        "serviceAccountKey": "${SYS:GCP_SVC_ACCOUNT}",
        "location": "us",
        "projectId": "apiro-data-platform",
        "processorId": "5f628045934c4151",
        "mimeType": "application/json"
      }
    }
  ]
}
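For orientation, projectId, location, and processorId together identify a single Document AI processor. Google's API addresses a processor by the resource name projects/{project}/locations/{location}/processors/{processor} and serves it from a regional host of the form {location}-documentai.googleapis.com. The Python sketch below builds both strings from a transformer config. The function name processor_target is hypothetical, and how Apiro assembles its request internally is not described on this page.

```python
from typing import Tuple


def processor_target(config: dict) -> Tuple[str, str]:
    """Return (resource_name, api_host) for a Document AI processor.

    Hypothetical helper: it only formats strings and makes no network call.
    """
    # Fall back to the documented default location when none is given.
    location = config.get("location", "us")
    resource_name = (
        f"projects/{config['projectId']}"
        f"/locations/{location}"
        f"/processors/{config['processorId']}"
    )
    api_host = f"{location}-documentai.googleapis.com"
    return resource_name, api_host


# Values taken from the transformer excerpt above.
name, host = processor_target({
    "location": "us",
    "projectId": "apiro-data-platform",
    "processorId": "5f628045934c4151",
})
```

Comparing the resulting resource name with the processor shown in the Google Cloud console is a quick way to confirm that the three values belong together.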
Common Mistakes
- Incorrect Service Account Key: Ensure that serviceAccountKey is set to a valid Google Cloud service account key.
- Incorrect Project ID or Processor ID: Verify that projectId and processorId match the IDs of your Google Cloud project and Document AI processor.
- Mismatched MIME Type: Ensure that mimeType matches the format of the documents you are processing.
- Security Concerns: Avoid placing sensitive values, such as service account keys, directly in configuration files. The example references the key as ${SYS:GCP_SVC_ACCOUNT} rather than embedding it.
- Incorrect Data Source Configuration: Check that the dataSource configuration, including the entity type and its specific settings, is set up correctly to access your data source.
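Several of these mistakes can be caught before a feed is deployed. The Python sketch below shows three such pre-flight checks. All three function names are hypothetical and none is part of Apiro. The sketch also assumes that the resolved serviceAccountKey is the JSON text of a standard Google service account key file, that mimeType is meant to describe the source file, and that the file extension reflects the file's real format.

```python
import json
import mimetypes

# Fields present in a standard Google service account key file.
SERVICE_ACCOUNT_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_mime_type(path: str, mime_type: str) -> bool:
    """True when mime_type equals the type guessed from the file extension."""
    guessed, _ = mimetypes.guess_type(path)
    return guessed == mime_type


def check_service_account_key(raw_key: str) -> list:
    """Return a list of problems found in a service account key JSON string."""
    try:
        key = json.loads(raw_key)
    except json.JSONDecodeError:
        return ["key is not valid JSON"]
    if not isinstance(key, dict):
        return ["key is not a JSON object"]
    problems = [f"missing field: {f}" for f in SERVICE_ACCOUNT_FIELDS if f not in key]
    if key.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems


def is_externalized(value: str) -> bool:
    """True when a config value is a ${SYS:...} reference rather than a literal."""
    return value.startswith("${SYS:") and value.endswith("}")
```

An empty list from check_service_account_key means the key has the expected shape. It does not prove that the key is still active or that the account has permission to use the processor.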