Google has deprecated legacy versions of AutoML services, which directly impacts IDP's core functionality. Additionally, the IDP application was deprecated with Appian 23.2. Customers who wish to use the application will need to refactor plug-ins using AutoML.
Each organization's documents are unique. Appian's Intelligent Document Processing (IDP) is flexible enough to allow you to tailor the application to your organization's needs.
IDP comes out of the box with Invoice, Purchase Order, Claim, and Receipt document types. If your organization uses other kinds of documents, you can also add new document types.
Each of these document types is set up to extract certain fields. This page provides instructions for adding and removing fields so that you can modify the information that is extracted from each document type.
Note: These instructions are specific to MySQL databases. If you use a different database, you may need to modify the steps.
We suggest always adding and removing fields instead of updating existing fields. This keeps the table columns and custom data type (CDT) fields matched.
See Mapping Custom Data Types (CDTs) to Pre-defined Database Tables for more information about making changes to existing CDTs.
If you're an Appian developer who needs to capture more data than the default document types in IDP provide, you can add fields to the document type's CDT.
In order to extract more fields for a document type, you will need to add the new field to the CDT and verify the data store. There are additional steps you should take if you're adding fields from tables.
In order to extract more information for a document type, you first need to add the new field to the CDT. See Appian Document Extraction to learn more about best practices for creating a new CDT and setting up the CDT fields for document extraction.
There are some extra steps we recommend to make sure the CDT field's column in the database table can accept large character values. Appian Text fields automatically use VARCHAR(255) for the column definition in the associated database table. This means that you can only write data to the column if it is 255 characters or fewer. Therefore, we recommend updating the column definition to text, which can handle more data. To do this, you will need to update the CDT using an XSD file.
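To see which columns are still limited to 255 characters, you can inspect the column definitions directly in the database. The following is a minimal sketch in MySQL syntax. It assumes the default IDP tables (duinvoice, dupurchaseorder, duclaim, and dureceipt) live in the currently selected schema.

```sql
-- List columns in the default IDP tables that are still VARCHAR(255).
-- Assumes the IDP tables are in the schema selected with USE.
SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('duinvoice', 'dupurchaseorder', 'duclaim', 'dureceipt')
  AND DATA_TYPE = 'varchar'
  AND CHARACTER_MAXIMUM_LENGTH = 255
ORDER BY TABLE_NAME, COLUMN_NAME;
```

Any column returned by this query can only store values of up to 255 characters.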
To add the new field to the CDT:
DU_Invoice
DU_PurchaseOrder
DU_Claim
DU_Receipt
Text.
Boolean as the type for each one.
text for the column definition.
Change <xsd:appinfo source="appian.jpa">@Column(length=255)</xsd:appinfo> to <xsd:appinfo source="appian.jpa">@Column(columnDefinition="text")</xsd:appinfo>
Verifying the data store adds the new field to the database table and makes sure that the CDT field names and data types match the database column names and data types.
See Managing Data Stores for more information about editing data stores.
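Once the data store has been verified and published, you may want to confirm in the database that the new column exists with the expected definition. The following is a minimal sketch in MySQL syntax. It assumes a hypothetical new field named notes on DU_Invoice whose column is also named notes; substitute your own table and column names.

```sql
-- `notes` is a hypothetical column name used only for illustration.
SHOW COLUMNS FROM `duinvoice` LIKE 'notes';
```

If the column definition was changed in the XSD, the Type value in the result should read text rather than varchar(255).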
To verify the data store:
In the Intelligent Document Processing (IDP) application, open the DU Data Store object.
If you added a field and want to remove it, or if you want to remove fields from the default document types in IDP, you can do so as long as the document type has not been used in production yet. If you want to remove fields from a default document type, it is best to do this during the initial application setup.
Note: The only time you should delete fields is before you have started to use the document type in production. Otherwise, any reconciliation tasks that are already in process in production will error when they try to write the data to the database.
In order to stop extracting fields for a document type, you will need to remove the associated column from the database table, remove the field from the CDT, and verify the data store.
Before making any changes in Appian Designer, you must first update the database table.
To update the database table:
duinvoice
dupurchaseorder
duclaim
dureceipt
To delete the supplier field from the duinvoice table, you could execute a database command like the following. Note that this example uses MySQL syntax.
ALTER TABLE `duinvoice` DROP COLUMN `supplier`;
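Because dropping a column also deletes its data, it can help to check the column before running a statement like the one above, and to confirm the result afterward. The following is a minimal sketch in MySQL syntax that uses the same duinvoice table and supplier column.

```sql
-- Before dropping: count rows that still hold a value in the column.
SELECT COUNT(*) AS rows_with_supplier
FROM `duinvoice`
WHERE `supplier` IS NOT NULL;

-- After dropping: this returns an empty set if the column is gone.
SHOW COLUMNS FROM `duinvoice` LIKE 'supplier';
```

A nonzero count before the drop suggests the document type may already be in use, in which case the field should not be removed.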
After you have updated the database table, you can remove the field from the CDT.
See Mapping Custom Data Types (CDTs) to Pre-defined Database Tables for more information about making changes to CDTs.
DU_Invoice
DU_PurchaseOrder
DU_Claim
DU_Receipt
Verifying the data store makes sure that the CDT is mapped and ready to be used in IDP.
See Managing Data Stores for more information about editing data stores.
To verify the data store:
In the Intelligent Document Processing (IDP) application, open the DU Data Store object.
Not all documents include simple data. Quite often, documents contain more complex forms of information, notably when it is formatted in a table. Invoices, for example, might list multiple line items in a table specifying the quantity, description, and price for goods or services. During extraction, it's still valuable to associate these items with a single invoice number, so we should account for this data relationship during document processing. This data can be stored in separate database tables to keep it organized, but by associating it with a single primary key, it's still easy to display or access for other purposes.
You can create new CDTs in IDP to recognize, extract, and save data in tables. In the instructions below, CDTs are referred to as "parents" and "children" to describe their relationship. See Custom Data Type Relationships for more information.
When data is returned from Google, it is presented to a user for a reconciliation task. The Reconcile Doc Extraction Smart Service uses nested CDTs to identify which fields to display in a table. To organize this data in the reconciliation task interface, we set up a parent CDT for the document data and child CDT for the table data. Step 2 describes how to nest those CDTs to establish their relationship.
In some cases, Appian doesn't recommend nesting CDTs for writing data with a one-to-many relationship due to potential performance issues. However, nesting your CDTs is recommended in the case of tables in document extraction.
We recommend developers consider how the data will be queried and displayed when creating CDTs. In IDP, data from a table in an extracted document is usually displayed or queried in the context of the original document. Queries operate on the parent-child CDT relationship, so performance is not negatively impacted. However, if you plan to query table data across multiple documents or document types, performance may suffer.
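At the database level, the two access patterns described above roughly correspond to the queries below. This is only a sketch in MySQL syntax. The child table name dupurchaseordertable1 and the description, quantity, and price columns are assumptions for illustration; only the parentid column name comes from these instructions.

```sql
-- Table data for one document: filter on the parent key.
SELECT c.`description`, c.`quantity`, c.`price`
FROM `dupurchaseordertable1` AS c
WHERE c.`parentid` = 42;  -- 42 is a placeholder document ID

-- Table data across all documents: every child row is read.
SELECT c.`description`, SUM(c.`quantity`) AS total_quantity
FROM `dupurchaseordertable1` AS c
GROUP BY c.`description`;
```

The first query touches only the rows that belong to one document, while the second reads the whole child table, which is why cross-document reporting is the case to watch for performance.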
First, you'll need to set up the custom data type to organize and write the data extracted from tables.
urn:com:appian:types:DU.
DU_<DocumentType>_<TableType>. Replace <DocumentType> with the name of the document type, and <TableType> with the name of the table type. For example, DU_Purchase_Order_Table1.
Next, you'll add the table CDT as a field in the document type CDT to create a parent-child CDT relationship. Table data is organized in its own database table, but each entry is associated with a distinct document extraction, listed in a separate database table.
parentid in all lowercase. Use this value exactly.
Finish the process of adding new fields in IDP by verifying that the data store is properly set up.
The steps above describe publishing the data store after verification. When you publish a data store for the first time, Appian creates tables in the database. This is the database table creation pattern we recommend. If you create database tables manually, you'll have to modify the CDT's XSD definition to map the CDTs to the corresponding tables and also manually add a parentid column to the child database table.
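If you do create the child table manually, its shape might look like the following. This is a minimal sketch in MySQL syntax, not the definition Appian generates. The table name, the id primary key, and the description, quantity, and price columns are all assumptions for illustration; only the parentid column name is taken from the instructions above.

```sql
-- Hypothetical child table for a CDT such as DU_Purchase_Order_Table1.
CREATE TABLE `dupurchaseordertable1` (
  `id`          INT NOT NULL AUTO_INCREMENT,  -- hypothetical primary key
  `description` TEXT,
  `quantity`    INT,
  `price`       DECIMAL(12, 2),
  `parentid`    INT,                          -- should match the type of the parent table's primary key
  PRIMARY KEY (`id`),
  KEY `idx_parentid` (`parentid`)             -- optional index for lookups by parent
);
```

The index on parentid is optional, but it supports the common case of loading the table rows for a single document.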
Reconciliation tasks help improve document processing and extraction results. These tasks ask users to correct any incorrectly mapped data or data that wasn't properly recognized. Appian learns from this manual correction to help improve extraction results next time.
Modifying Fields for Document Types