Modifying Fields for Document Types

Google has deprecated legacy versions of AutoML services, which directly impacts IDP's core functionality.

Additionally, the IDP application was deprecated with Appian 23.2. Customers who want to continue using the application will need to refactor any plug-ins that use AutoML.

Introduction

Each organization's documents are unique. Appian's Intelligent Document Processing (IDP) is flexible enough to allow you to tailor the application to your organization's needs.

IDP comes out of the box with Invoice, Purchase Order, Claim, and Receipt document types. If your organization uses other kinds of documents, you can also add new document types.

Each of these document types is set up to extract certain fields. This page explains how to add and remove fields so that you can change the information that is extracted from each document type.

Note: These instructions are specific to MySQL databases. If you use a different database, you may need to modify the steps.

Best practices for modifying document types

We suggest always adding or removing fields instead of updating existing fields. This makes sure that the database table columns and the custom data type (CDT) fields match.

See Mapping Custom Data Types (CDTs) to Pre-defined Database Tables for more information about making changes to existing CDTs.

Adding fields to a document type

If you're an Appian developer who needs to capture more data than the default IDP document types provide, you can add fields to the document type's CDT.

To extract additional fields for a document type, you need to add each new field to the CDT and then verify the data store. If the new fields hold data from tables in your documents, there are additional steps; see Extracting and storing data from tables.

Step 1: Add the new field to the CDT

In order to extract more information for a document type, you first need to add the new field to the CDT. See Appian Document Extraction to learn more about best practices for creating a new CDT and setting up the CDT fields for document extraction.

We recommend a few extra steps to make sure the CDT field's database column can store long values. By default, Appian maps a Text field to a VARCHAR(255) column in the associated database table, so the column can only store values that are 255 characters or fewer. We therefore recommend changing the column definition to text, which can store much longer values (in MySQL, a TEXT column holds up to 65,535 bytes). To do this, you update the CDT using an XSD file.

To add the new field to the CDT:

1. Open the CDT for the document type that you want to modify. The CDTs provided by default are listed below. If you want to modify a document type that you added, use the CDT you created for it.
   - Invoice: DU_Invoice
   - Purchase Order: DU_PurchaseOrder
   - Claims: DU_Claim
   - Receipt: DU_Receipt
2. Create fields for the information you want to extract, using the following guidelines:
   - For the Type, select Text.
     - Note: If your document contains a checkbox, create a field for every checkbox option, even mutually exclusive options. Select Boolean as the type for each one.
   - Use camel case to format the field names.
     - Note: This makes sure the field names are displayed with proper casing and spaces in the interface used to reconcile extracted data. For example, the field name licenseNumber becomes License Number in the interface.
3. In the CDT's XSD file, change the column definition of each new Text field to text.
   - Select the settings menu > Download XSD.
   - Open the XSD file.
   - Find the field you just added and change <xsd:appinfo source="appian.jpa">@Column(length=255)</xsd:appinfo> to <xsd:appinfo source="appian.jpa">@Column(columnDefinition="text")</xsd:appinfo>
   - Save the XSD file.
   - Select the settings menu > Create New Version from XSD.
   - Upload the XSD file you just saved.
   - A warning displays that says the data store will need to be updated. You will do this when you verify the data store.
   - Click CREATE NEW VERSION.
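If you add several Text fields at once, editing the XSD by hand can be tedious. The Python sketch below shows one way to automate the annotation change described above. It assumes the downloaded XSD contains the annotation text exactly as shown and that each field is declared as an xsd:element with a name attribute; the function widen_text_column is a hypothetical helper, not part of Appian. Review the resulting file before uploading it with Create New Version from XSD.

```python
import re

# Annotation Appian writes for a default Text field, and the replacement
# recommended above for long values.
OLD_ANNOTATION = "@Column(length=255)"
NEW_ANNOTATION = '@Column(columnDefinition="text")'


def widen_text_column(xsd: str, field: str) -> str:
    """Return a copy of xsd in which the named field's column is text.

    Only the @Column(length=255) inside that field's own <xsd:element>
    is changed; other fields are left untouched.
    Raises ValueError if the field has no such annotation.
    """
    pattern = re.compile(
        # Opening tag of the <xsd:element> whose name attribute is `field`.
        r'(<xsd:element\b[^>]*\bname="' + re.escape(field) + r'"[^>]*>'
        # Everything up to the annotation, without leaving this element
        # or running into the next one.
        r'(?:(?!</xsd:element>|<xsd:element\b).)*?)'
        + re.escape(OLD_ANNOTATION),
        re.DOTALL,
    )
    updated, count = pattern.subn(
        lambda m: m.group(1) + NEW_ANNOTATION, xsd, count=1
    )
    if count == 0:
        raise ValueError(f"no {OLD_ANNOTATION} annotation found for {field!r}")
    return updated
```

For example, widen_text_column(open("DU_Invoice.xsd").read(), "licenseNumber") returns the updated XSD text, which you can save to a new file and upload. A targeted text substitution is used instead of re-serializing the XML so that namespace declarations and formatting in the rest of the file stay exactly as Appian exported them.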

Step 2: Verify the data store

Verifying the data store adds the new field to the database table and makes sure that the CDT field names and data types match the database column names and data types.

See Managing Data Stores for more information about editing data stores.

To verify the data store:

1. In the Intelligent Document Processing (IDP) application, open the DU Data Store object.
2. Click VERIFY. A "No matching tables found!" message appears.
3. Click Download DDL Script and save the SQL file so you can use it to deploy the changes to another environment.
4. Make sure Create tables automatically is selected, then click SAVE & PUBLISH.

Removing fields before they have been used in production

If you added a field and want to remove it, or if you want to remove fields from the default document types in IDP, you can do so as long as the document type has not yet been used in production. If you want to remove fields from a default document type, it is best to do this during the initial application setup.

Note: The only time you should delete fields is before you have started to use the document type in production. Otherwise, any reconciliation tasks that are already in process in production will error when they try to write the data to the database.

In order to stop extracting fields for a document type, you will need to remove the associated column from the database table, remove the field from the CDT, and verify the data store.

Step 1: Remove the associated column from the database table

Before making any changes in Appian Designer, you must first update the database table.

To update the database table:

1. Locate the database table for the document type that you want to modify. The following are the names of the tables provided by default. If you want to modify a document type that you added, use the table that is mapped to its CDT.
   - Invoice: duinvoice
   - Purchase Order: dupurchaseorder
   - Claims: duclaim
   - Receipt: dureceipt
2. Remove the column that maps to the field you no longer want to extract.
   - Note: Save the SQL command you use to remove the column so that you can use it to deploy the changes to another environment.

Example

To delete the supplier field from the duinvoice table, you could execute a database command like the following. Note that this example uses MySQL syntax.

ALTER TABLE `duinvoice` DROP `supplier`;
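If you need to repeat the same change in several environments, it can help to generate these statements instead of typing them. The Python sketch below builds the same MySQL statement as the example and collects statements into a script you can save for deployment to another environment. DEFAULT_TABLES, drop_column_sql, and migration_script are hypothetical helpers, and the identifier check assumes your table and column names contain only letters, digits, and underscores.

```python
import re

# Default IDP document types and their database tables (from the list above).
DEFAULT_TABLES = {
    "Invoice": "duinvoice",
    "Purchase Order": "dupurchaseorder",
    "Claims": "duclaim",
    "Receipt": "dureceipt",
}

# Assumption: identifiers are plain letters, digits, and underscores, so
# wrapping them in backticks is enough quoting for MySQL.
_IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")


def drop_column_sql(table: str, column: str) -> str:
    """Build a MySQL ALTER TABLE ... DROP statement for one column."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    return f"ALTER TABLE `{table}` DROP `{column}`;"


def migration_script(statements: list[str]) -> str:
    """Join statements into one script to save and run in other environments."""
    return "\n".join(statements) + "\n"


# Usage: the statement from the example above.
print(migration_script([drop_column_sql(DEFAULT_TABLES["Invoice"], "supplier")]))
```

Rejecting unexpected characters, rather than trying to escape them, keeps the generated SQL predictable when the script is run in a different environment.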

Step 2: Remove the field from the CDT

After you have updated the database table, you can remove the field from the CDT.

See Mapping Custom Data Types (CDTs) to Pre-defined Database Tables for more information about making changes to CDTs.

1. Open the CDT for the document type that you want to modify. The CDTs provided by default are listed below. If you want to modify a document type that you added, use the CDT you created for it.
   - Invoice: DU_Invoice
   - Purchase Order: DU_PurchaseOrder
   - Claims: DU_Claim
   - Receipt: DU_Receipt
2. Remove the fields that you no longer want to extract by clicking the red x.
3. Click SAVE CHANGES.

Step 3: Verify the data store

Verifying the data store makes sure that the CDT is mapped and ready to be used in IDP.

See Managing Data Stores for more information about editing data stores.

To verify the data store:

1. In the Intelligent Document Processing (IDP) application, open the DU Data Store object.
2. Click VERIFY.
3. Make sure an "Entity mappings verified" message displays.
   - If this message does not display, do not save the data store. The database table or CDT updates may not match the data store. Check that the changes you made in Step 1 and Step 2 match each other, then try verifying again.
4. Click SAVE & PUBLISH.
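When verification fails, it can help to compare the CDT and the table directly. The Python sketch below lists CDT fields that have no matching column and columns that have no matching field. It assumes you have downloaded the CDT's XSD (settings menu > Download XSD) and copied the table's column names, for example from the output of SHOW COLUMNS FROM duinvoice. It also assumes each primitive field maps to a column of the same name, compared case-insensitively, and it skips nested CDT fields, whose data lives in a child table. The function names are hypothetical, and this is only a pre-check; the VERIFY result in the data store remains authoritative.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"


def cdt_field_names(xsd_text: str) -> set[str]:
    """Return names of primitive fields (type xsd:...) declared in a CDT's XSD."""
    root = ET.fromstring(xsd_text)
    names = set()
    for element in root.iter(XSD_NS + "element"):
        name = element.get("name")
        # Assumption: primitive fields use the xsd: prefix for their type;
        # nested CDT fields (for example, table CDTs) are skipped.
        if name and element.get("type", "").startswith("xsd:"):
            names.add(name)
    return names


def compare_fields_to_columns(xsd_text: str, columns: list[str]) -> tuple[list[str], list[str]]:
    """Return (fields without a column, columns without a field)."""
    fields = {name.lower(): name for name in cdt_field_names(xsd_text)}
    cols = {name.lower(): name for name in columns}
    fields_without_column = sorted(fields[k] for k in fields.keys() - cols.keys())
    columns_without_field = sorted(cols[k] for k in cols.keys() - fields.keys())
    return fields_without_column, columns_without_field


SAMPLE_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:tns="urn:com:appian:types:DU"
            targetNamespace="urn:com:appian:types:DU">
  <xsd:complexType name="DU_Invoice">
    <xsd:sequence>
      <xsd:element name="id" nillable="true" type="xsd:int"/>
      <xsd:element name="invoiceNumber" nillable="true" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""

# The supplier field was removed from the CDT, but its column is still there.
print(compare_fields_to_columns(SAMPLE_XSD, ["id", "invoicenumber", "supplier"]))
# prints ([], ['supplier'])
```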

Extracting and storing data from tables

Not all document data is simple. Documents often contain more complex information, notably when it is formatted in a table. An invoice, for example, might list multiple line items in a table, each specifying the quantity, description, and price of goods or services. During extraction, it is still valuable to associate these line items with a single invoice number, so this data relationship should be preserved during document processing. Storing the table data in a separate database table keeps it organized, and linking each row to the parent record's primary key keeps it easy to display or access for other purposes.

You can create new CDTs in IDP to recognize, extract, and save data in tables. In the instructions below, CDTs are referred to as "parents" and "children" to describe their relationship. See Custom Data Type Relationships for more information.

When data is returned from Google, it is presented to a user in a reconciliation task. The Reconcile Doc Extraction Smart Service uses nested CDTs to identify which fields to display in a table. To organize this data in the reconciliation task interface, we set up a parent CDT for the document data and a child CDT for the table data. Step 2 describes how to nest those CDTs to establish their relationship.

A note about nesting

In some cases, Appian doesn't recommend nesting CDTs to write data with a one-to-many relationship because of potential performance issues. For tables in document extraction, however, nesting your CDTs is recommended.

We recommend that developers consider how the data will be queried and displayed when creating CDTs. In IDP, data from a table in an extracted document is usually displayed or queried in the context of the original document. Because such queries follow the parent-child CDT relationship for a single document, performance is not negatively affected. However, if you plan to query table data across multiple documents or document types, performance may suffer.

Step 1: Create a CDT for the table

First, you'll need to set up the custom data type to organize and write the data extracted from tables.

1. In the Intelligent Document Processing application, create a new CDT.
   - For the Namespace, enter urn:com:appian:types:DU.
   - For the Name, use the naming convention DU_<DocumentType>_<TableType>. Replace <DocumentType> with the name of the document type, and <TableType> with the name of the table type. For example, DU_Purchase_Order_Table1.
   - (Optional) Add a Description of the table.
2. Add each column header as a field with type Text.
3. If the table doesn't already have a primary key field, create one to organize the entries.
4. Click SAVE CHANGES.
5. Repeat this process for each table the document contains.
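If a document has several tables, you can derive the CDT name and field names mechanically. The Python sketch below applies the DU_<DocumentType>_<TableType> convention and turns column headers into camel-case field names, following the guideline in Adding fields to a document type. The helper names are hypothetical, and the rule of joining multi-word document types with underscores is inferred from the DU_Purchase_Order_Table1 example.

```python
import re


def table_cdt_name(document_type: str, table_type: str) -> str:
    """Apply the DU_<DocumentType>_<TableType> naming convention."""
    # Inferred from the example: "Purchase Order" becomes "Purchase_Order".
    return f"DU_{'_'.join(document_type.split())}_{table_type}"


def to_camel_case(header: str) -> str:
    """Turn a column header such as "Unit Price" into a field name like "unitPrice"."""
    words = re.findall(r"[A-Za-z0-9]+", header)
    if not words:
        raise ValueError(f"header {header!r} has no letters or digits")
    first, *rest = words
    name = first.lower() + "".join(word[:1].upper() + word[1:].lower() for word in rest)
    if name[0].isdigit():
        raise ValueError(f"field name {name!r} would start with a digit")
    return name


headers = ["Description", "Qty", "Unit Price"]
print(table_cdt_name("Purchase Order", "Table1"))  # DU_Purchase_Order_Table1
print([to_camel_case(h) for h in headers])         # ['description', 'qty', 'unitPrice']
```

Punctuation such as # in a header is dropped, so review the generated names before creating the fields.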

Step 2: Edit the document type's CDT to add the table CDT

Next, you'll add the table CDT as a field in the document type CDT to create a parent-child CDT relationship. Table data is organized in its own database table, but each entry is associated with a distinct document extraction, listed in a separate database table.

1. Add the new field to the document type CDT using the instructions above.
2. In the Type field, enter the name of the CDT object you created in Step 1.
3. The Key field automatically shows that this CDT now has a foreign key with a one-to-many relationship. Click the key icon to configure the field relationship.
4. Under Child Field Type, check the boxes for both Updates to a parent value should also update associated child value(s) and I know the name of the column(s) this field should use in the database.
5. In the Column Name field, type parentid in all lowercase. Use this value exactly.
6. Click SAVE CHANGES.

Step 3: Verify the data store

Finish adding the table fields by verifying that the data store is properly set up, following the steps in Step 2: Verify the data store under Adding fields to a document type.

Those steps publish the data store after verification. When you publish a data store for the first time, Appian creates the tables in the database. This is the table creation pattern we recommend. If you create database tables manually, you'll have to modify the CDT's XSD definition to map the CDTs to the corresponding tables, and also manually add a parentid column to the child database table.
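To picture the table layout this produces, the sketch below creates a parent table and a child table whose parentid column points at the parent row, then reads back the line items for one invoice, which is the document-scoped query pattern described earlier. It uses Python's built-in sqlite3 module in place of MySQL purely so it runs anywhere; the child table name du_invoice_table1, its columns, and the FOREIGN KEY constraint are illustrative assumptions, not the exact DDL Appian generates. Values are stored as text because the table fields use type Text.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    """Create an illustrative parent (document) and child (table rows) pair."""
    conn.executescript(
        """
        CREATE TABLE duinvoice (
            id INTEGER PRIMARY KEY,
            invoicenumber TEXT
        );
        -- One row per extracted table row; parentid links it to its invoice.
        CREATE TABLE du_invoice_table1 (
            id INTEGER PRIMARY KEY,
            description TEXT,
            quantity TEXT,
            price TEXT,
            parentid INTEGER NOT NULL REFERENCES duinvoice (id)
        );
        """
    )


def line_items_for(conn: sqlite3.Connection, invoice_id: int) -> list[tuple]:
    """Return the table rows extracted from a single invoice."""
    return conn.execute(
        "SELECT description, quantity, price FROM du_invoice_table1 "
        "WHERE parentid = ? ORDER BY id",
        (invoice_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
create_tables(conn)
invoice_id = conn.execute(
    "INSERT INTO duinvoice (invoicenumber) VALUES (?)", ("INV-001",)
).lastrowid
conn.executemany(
    "INSERT INTO du_invoice_table1 (description, quantity, price, parentid) "
    "VALUES (?, ?, ?, ?)",
    [("Widget", "2", "10.00", invoice_id), ("Gadget", "1", "25.00", invoice_id)],
)
print(line_items_for(conn, invoice_id))
```

If you create tables manually in MySQL, use the DDL script downloaded from the data store as your reference for column names and types rather than this sketch.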

Reconciliation tasks for documents with tables

Reconciliation tasks help improve document processing and extraction results. These tasks ask users to correct any incorrectly mapped data or data that wasn't properly recognized. Appian learns from this manual correction to help improve extraction results next time.
