Adding a Document Type

Introduction

Each organization's documents are unique. Appian's Intelligent Document Processing (IDP) is flexible enough to allow you to tailor the application to your organization's needs.

IDP comes out of the box with Invoice, Purchase Order, Claim, and Receipt document types. Your organization may need to process documents of a different type. This page provides instructions for Appian developers to add more document types. See Appian Document Extraction for more information on what types of documents work best for data extraction.

To enable managers to easily derive insights from the reporting in the Metrics tab, we recommend limiting the number of document types to four, excluding the invalid document type. If you need more than four document types, we recommend adding another document channel.

These instructions are specific to MySQL databases. If you use a different database, you may need to modify the steps.

If you want to modify the fields of existing document types, refer to Modifying Fields for Document Types.

Step 1: Add the new document type to the reference table

The dudoctype reference table stores all of the document types used in IDP. In order to add a document type, you must first add a row to this database table to define the new document type.

To add the new document type to the reference table:

  1. Insert a new row into the dudoctype table.
  2. Update each column with the following information:
    • doctypeid: Set this to NULL. When set to NULL, an auto-incrementing ID will automatically be assigned.
    • doctypename: The document type name. This name must be unique among all document type names.
    • choiceindex: The index that is referenced in expression rules to select the corresponding document type and data store entity. The choiceindex for a newly added document type must be one more than the highest choiceindex for that channelid.
    • doctypestatus: Initially, set this to Inactive. When you configure the application, you will select which document types you want to process. This column updates to Active for the document types that you select.
    • isinvalidtype: Used to identify the Invalid document type. Unless you are adding an Invalid document type for a new document channel, set this to 0. There can be only one invalid document type for each document channel.
    • channelid: The value from the channelid column of the dudocchannel reference table. Use the value for the document channel that you are adding the document type to. For the Standard channel that comes out of the box, this value is 1.

EXAMPLE

Let's say your organization needs to process medical device licenses and you want to add this document type to the first document channel. The highest choiceindex for channel ID 1 is 4, which means the choiceindex for the new document type would be 5. If you were adding the new document type to channel ID 2 instead, the choiceindex would be 4, because the highest choiceindex for channel ID 2 is 3.

doctypeid  doctypename              choiceindex  doctypestatus  isinvalidtype  channelid
0          NULL                     0            Active         1              1
1          Invoice                  1            Active         0              1
2          Purchase Order           2            Active         0              1
3          Claim                    3            Active         0              1
4          Receipt                  4            Active         0              1
5          NULL                     0            Active         1              2
6          Partnership Application  1            Active         0              2
7          Statement of Work        2            Active         0              2
8          Retirement Application   3            Active         0              2
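
The choiceindex rule can be checked with a few lines of code. The following is a minimal Python sketch, not part of IDP: the rows are copied from the example table above, and next_choice_index is a hypothetical helper name used only for illustration.

```python
# Rows copied from the example dudoctype table:
# (doctypeid, doctypename, choiceindex, doctypestatus, isinvalidtype, channelid)
DUDOCTYPE_ROWS = [
    (0, None, 0, "Active", 1, 1),
    (1, "Invoice", 1, "Active", 0, 1),
    (2, "Purchase Order", 2, "Active", 0, 1),
    (3, "Claim", 3, "Active", 0, 1),
    (4, "Receipt", 4, "Active", 0, 1),
    (5, None, 0, "Active", 1, 2),
    (6, "Partnership Application", 1, "Active", 0, 2),
    (7, "Statement of Work", 2, "Active", 0, 2),
    (8, "Retirement Application", 3, "Active", 0, 2),
]


def next_choice_index(rows, channel_id):
    """Return one more than the highest choiceindex for the given channel."""
    indexes = [row[2] for row in rows if row[5] == channel_id]
    if not indexes:
        raise ValueError(f"no document types found for channel {channel_id}")
    return max(indexes) + 1


print(next_choice_index(DUDOCTYPE_ROWS, 1))
print(next_choice_index(DUDOCTYPE_ROWS, 2))
```

For the example data, the helper returns 5 for channel ID 1 and 4 for channel ID 2, matching the values described above.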

To add a new document type called Medical Device License, update the dudoctype table by executing a database command like the following. Note that this example uses MySQL syntax.

INSERT INTO `dudoctype` (`doctypeid`, `doctypename`, `choiceindex`, `doctypestatus`, `isinvalidtype`, `channelid`) VALUES (NULL, 'Medical Device License', '5', 'Inactive', '0', '1');
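
If you prefer to script this step, the same insert can be expressed as a parameterized query that computes the choiceindex for you. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for the MySQL dudoctype table, the column types are guessed from the example data, and add_document_type is a hypothetical helper, not an IDP function.

```python
import sqlite3

COLUMNS = "doctypeid, doctypename, choiceindex, doctypestatus, isinvalidtype, channelid"

# Example rows from the dudoctype table shown earlier on this page.
EXAMPLE_ROWS = [
    (0, None, 0, "Active", 1, 1),
    (1, "Invoice", 1, "Active", 0, 1),
    (2, "Purchase Order", 2, "Active", 0, 1),
    (3, "Claim", 3, "Active", 0, 1),
    (4, "Receipt", 4, "Active", 0, 1),
    (5, None, 0, "Active", 1, 2),
    (6, "Partnership Application", 1, "Active", 0, 2),
    (7, "Statement of Work", 2, "Active", 0, 2),
    (8, "Retirement Application", 3, "Active", 0, 2),
]


def build_example_table():
    """Create an in-memory stand-in for dudoctype (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE dudoctype (
            doctypeid INTEGER PRIMARY KEY,  -- NULL on insert assigns the next ID
            doctypename TEXT UNIQUE,
            choiceindex INTEGER NOT NULL,
            doctypestatus TEXT NOT NULL,
            isinvalidtype INTEGER NOT NULL,
            channelid INTEGER NOT NULL
        )
        """
    )
    conn.executemany(
        f"INSERT INTO dudoctype ({COLUMNS}) VALUES (?, ?, ?, ?, ?, ?)", EXAMPLE_ROWS
    )
    return conn


def add_document_type(conn, name, channel_id):
    """Insert an Inactive, valid document type with the next choiceindex."""
    (highest,) = conn.execute(
        "SELECT MAX(choiceindex) FROM dudoctype WHERE channelid = ?", (channel_id,)
    ).fetchone()
    if highest is None:
        raise ValueError(f"channel {channel_id} has no document types")
    cursor = conn.execute(
        f"INSERT INTO dudoctype ({COLUMNS}) VALUES (NULL, ?, ?, 'Inactive', 0, ?)",
        (name, highest + 1, channel_id),
    )
    conn.commit()
    return cursor.lastrowid


conn = build_example_table()
new_id = add_document_type(conn, "Medical Device License", channel_id=1)
new_row = conn.execute(
    "SELECT * FROM dudoctype WHERE doctypeid = ?", (new_id,)
).fetchone()
print(new_row)
```

Passing the values as parameters avoids quoting mistakes in the document type name, and the UNIQUE constraint in the stand-in table mirrors the requirement that document type names be unique.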

Step 2: Create a CDT and specify fields for the document type

In order to create a new document type, you will need to create a custom data type (CDT) with fields for the information you want to extract from the new document type.

See Appian Document Extraction to learn more about best practices for creating a new CDT and setting up the CDT fields for document extraction.

To create a CDT for the new document type:

  1. Go to Appian Designer.
  2. In the Intelligent Document Processing (IDP) application, create a CDT for the new document type by duplicating one of the original document type CDTs.

    Why should you duplicate an existing data type? Fields of type Text automatically use VARCHAR(255) for the column definition in the associated database table. We recommend updating the column definition in the XSD to use text. This has a much larger character limit to prevent problems with writing more than 255 characters to the table. Instead of editing the XSD, it is easier to just duplicate an original CDT.

    • For the Namespace, enter urn:com:appian:types:DU.
    • For the Name, use the naming convention DU_<DocumentType>. Replace <DocumentType> with the name of the new document type. For example, DU_MedicalDeviceLicense.
    • (Optional) Add a Description of the document type.
  3. Update the field names to reflect the information you will want to extract from this document type, using the following guidelines:
    • Do not change the Type of any of the fields. All these fields must be of type Text.
    • Do not modify the id field.
    • Use camel case to format the field names.
      • Note: This will allow the field names to be formatted with proper casing and spaces when they are displayed in the interface that is used to reconcile extracted data. For example, the field name licenseNumber would become License Number in the interface.
    • All of the original document type CDTs have four fields to extract data. If you need more than four fields, follow the instructions in Modifying Fields for Document Types to add fields and update the column definition to use text.
  4. Click SAVE CHANGES.
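
To preview how a camel-case field name might appear in the reconciliation interface, you can approximate the conversion with a short script. This is a minimal Python sketch that assumes the label is produced by inserting a space before each interior capital letter and capitalizing the first letter; the actual formatting logic in IDP is not shown on this page, and field_label is a hypothetical name.

```python
import re


def field_label(field_name):
    """Approximate the display label for a camel-case CDT field name."""
    # Insert a space wherever a lowercase letter or digit is followed by a capital.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", field_name)
    # Capitalize only the first character so the remaining casing is preserved.
    return spaced[:1].upper() + spaced[1:]


for name in ["licenseNumber", "licenseType", "issueDate"]:
    print(name, "->", field_label(name))
```

Under this assumption, licenseNumber maps to License Number, which matches the behavior described in the guidelines above.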

EXAMPLE

To create a new Medical Device License document type, you would create a new CDT with the name DU_MedicalDeviceLicense. In this example, we duplicated the DU_Invoice data type.

In that CDT, you would update the fields with the information you want to extract from a medical device license, such as licenseNumber, licenseType, and issueDate.

Step 3: Create a data store entity and database table for the document type

Now that you have created the CDT, you must create a data store entity in the DU Data Store and verify the data store to create a new database table.

See Data Stores for more information about editing data stores.

To create a data store entity and verify the data store:

  1. In the Intelligent Document Processing (IDP) application, open the DU Data Store object.
  2. Click Add Entity.
  3. For Name, enter the name of the CDT you just created for the new document type.
  4. For Type, select the CDT you just created.
  5. Click VERIFY. A "No matching tables found!" message appears.
  6. Click Download DDL Script and save the SQL file so you can use it to deploy the changes to another environment.
  7. Make sure Create tables automatically is selected, then click SAVE & PUBLISH.

EXAMPLE

To create the data store entity and table for the Medical Device License document type, add it to the DU Data Store object using the DU_MedicalDeviceLicense CDT and verify the data store.

Step 4: Create a new data store entity constant

In order to refer to the data store entity in other Appian objects, you will need to create a new constant that points to the data store entity that you just created for the new document type.

To create a constant for the new data store entity:

  • In the Intelligent Document Processing (IDP) application, create a new constant.
    • Name it after the new data store entity, using DSE (for data store entity) at the end of the name. For example, DU_NEW_DOC_TYPE_DSE.
    • For the Type, select Data Store Entity.
    • For the Data Store, select DU Data Store.
    • For the Entity, select the entity you just created.
    • Save it in the DU Rules and Constants folder and click CREATE.

EXAMPLE

For the Medical Device License document type, you would create a constant and name it DU_MEDICAL_DEVICE_LICENSE_DSE, select DU Data Store for the Data Store, select DU_MedicalDeviceLicense for the data store entity, and click CREATE.
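
The naming conventions used in Steps 2 and 4 are mechanical, so they can be derived from the document type name. The following Python sketch is only an illustration of those conventions; the function names are hypothetical, and the sketch assumes document type names consist of words separated by spaces or punctuation.

```python
import re


def _words(document_type):
    """Split a document type name into alphanumeric words."""
    return re.findall(r"[A-Za-z0-9]+", document_type)


def cdt_name(document_type):
    """Build the CDT name using the DU_<DocumentType> convention."""
    return "DU_" + "".join(w[:1].upper() + w[1:] for w in _words(document_type))


def dse_constant_name(document_type):
    """Build the data store entity constant name, ending in DSE."""
    return "DU_" + "_".join(w.upper() for w in _words(document_type)) + "_DSE"


print(cdt_name("Medical Device License"))
print(dse_constant_name("Medical Device License"))
```

For Medical Device License, this yields DU_MedicalDeviceLicense for the CDT and DU_MEDICAL_DEVICE_LICENSE_DSE for the constant, the same names used in the examples on this page.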

Step 5: Update existing expression rules with the new objects

After you create the new objects for the new document type (the CDT and the data store entity constant), you need to update existing expression rules to use the new objects. This allows the new document type to be used in the application.

Update the expression rule for the document type CDT

The DU_returnDataTypeForChoiceIndex expression rule returns the CDT for a document type. Basically, given an index, it returns the CDT that matches the index in an array of CDTs. It is used to dynamically invoke the correct CDT for the document type when performing the reconciliation task.

In order for it to return the custom data type for the new type of document, you will need to add the CDT you created to the expression rule.

To add the new CDT to the expression rule:

  1. Open the DU_returnDataTypeForChoiceIndex expression rule.
  2. In the choose() function, add the new CDT as the last item in the array, using the 'type!{urn:com:appian:types:DU}DU_NewDocumentType'() convention.
  3. Click SAVE CHANGES.

EXAMPLE

To update the DU_returnDataTypeForChoiceIndex expression rule for the medical device license document type, you would add the type!DU_MedicalDeviceLicense() CDT as the last item in the choose() function.

Entering type!DU_ and selecting DU_MedicalDeviceLicense from the auto-suggest list will automatically convert type!DU_MedicalDeviceLicense() to 'type!{urn:com:appian:types:DU}DU_MedicalDeviceLicense'().

if(
  rule!DU_isInvalidDocTypeId(docTypeId: ri!choiceIndex),
  {},
  choose(
    ri!channelId,
    /*Channel ID 1*/
    choose(
      ri!choiceIndex,
      'type!{urn:com:appian:types:DU}DU_Invoice'(),
      'type!{urn:com:appian:types:DU}DU_PurchaseOrder'(),
      'type!{urn:com:appian:types:DU}DU_Claim'(),
      'type!{urn:com:appian:types:DU}DU_Receipt'(),
      'type!{urn:com:appian:types:DU}DU_MedicalDeviceLicense'()
    )
  )
)
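
To see why the new CDT must be the last item, it helps to model how the nested lookup resolves. The Python sketch below is a simplified model, not Appian code: choose() is reimplemented here as a 1-based selector, CDTs are represented by their names as strings, and the invalid check is assumed to mean a choiceindex of 0 (as in the example dudoctype table), since the logic of DU_isInvalidDocTypeId is not shown on this page.

```python
def choose(key, *choices):
    """Return the choice at a 1-based position, like a 1-based selector."""
    if not 1 <= key <= len(choices):
        raise IndexError(f"key {key} is outside 1..{len(choices)}")
    return choices[key - 1]


# Channel ID 1 document types, with the new CDT appended as the last item.
CHANNEL_1_CDTS = (
    "DU_Invoice",
    "DU_PurchaseOrder",
    "DU_Claim",
    "DU_Receipt",
    "DU_MedicalDeviceLicense",
)


def return_data_type_for_choice_index(channel_id, choice_index):
    """Model of the rule: pick the channel first, then the document type."""
    if choice_index == 0:  # assumption: choiceindex 0 marks the Invalid type
        return None
    channel_cdts = choose(channel_id, CHANNEL_1_CDTS)
    return choose(choice_index, *channel_cdts)


print(return_data_type_for_choice_index(1, 5))
```

Because the lookup is positional, the CDT in position 5 is returned for choiceindex 5. Inserting the new CDT anywhere other than the end would shift the existing document types to the wrong index.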

Update the expression rule for the document type data store entity

The DU_returnDataStoreEntityForChoiceIndex expression rule returns the data store entity for a document type. Basically, given an index, it returns the data store entity that matches the index in an array of data store entities. It is used to dynamically invoke the correct data store entity when writing or querying the document data.

In order for it to return the data store entity for the new type of document, you will need to add the new data store entity constant to the expression rule.

To add the new data store entity to the expression rule:

  1. Open the DU_returnDataStoreEntityForChoiceIndex expression rule.
  2. In the choose() function, add the new constant for the data store entity as the last item in the array.
  3. Click SAVE CHANGES.

EXAMPLE

To update the DU_returnDataStoreEntityForChoiceIndex expression rule for the medical device license document type, you would add the cons!DU_MEDICAL_DEVICE_LICENSE_DSE constant as the last item in the choose() function.

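choose(
  ri!choiceIndex,
  cons!DU_INVOICE_DSE,
  cons!DU_PURCHASE_ORDER_DSE,
  cons!DU_CLAIM_DSE,
  /* Receipt constant name assumed from the naming pattern of the other constants */
  cons!DU_RECEIPT_DSE,
  cons!DU_MEDICAL_DEVICE_LICENSE_DSE
)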
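
Because choose() selects by position, each constant's position must equal the choiceindex of its document type in the dudoctype table. The following Python sketch shows one way to check that alignment before saving the rule. It is an illustration only: the constant names are derived from the naming convention in Step 4 (the Receipt constant name in particular is an assumption), and find_misaligned is a hypothetical helper.

```python
import re


def expected_constant(doctypename):
    """Derive the DSE constant name from a document type name (assumed convention)."""
    words = re.findall(r"[A-Za-z0-9]+", doctypename)
    return "DU_" + "_".join(w.upper() for w in words) + "_DSE"


def find_misaligned(doc_types, dse_constants):
    """Return (name, choiceindex, expected, actual) for every mismatch.

    doc_types: (doctypename, choiceindex) pairs for one channel, Invalid type excluded.
    dse_constants: constant names in the order they appear in choose().
    """
    problems = []
    for name, choice_index in doc_types:
        expected = expected_constant(name)
        in_range = 1 <= choice_index <= len(dse_constants)
        actual = dse_constants[choice_index - 1] if in_range else None
        if actual != expected:
            problems.append((name, choice_index, expected, actual))
    return problems


CHANNEL_1_DOC_TYPES = [
    ("Invoice", 1),
    ("Purchase Order", 2),
    ("Claim", 3),
    ("Receipt", 4),
    ("Medical Device License", 5),
]

ALIGNED = [
    "DU_INVOICE_DSE",
    "DU_PURCHASE_ORDER_DSE",
    "DU_CLAIM_DSE",
    "DU_RECEIPT_DSE",
    "DU_MEDICAL_DEVICE_LICENSE_DSE",
]

# Omitting one constant shifts every later entry to the wrong position.
MISSING_RECEIPT = [c for c in ALIGNED if c != "DU_RECEIPT_DSE"]

print(find_misaligned(CHANNEL_1_DOC_TYPES, ALIGNED))
print(len(find_misaligned(CHANNEL_1_DOC_TYPES, MISSING_RECEIPT)))
```

An empty result means every document type resolves to its own data store entity. A missing or misplaced constant is reported for each document type it affects.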

Step 6: Update IDP configuration

Before you upload any documents to IDP, you will need to configure the application to use it.

IDP provides a method for updating the configuration of the application in the Configure tab. See the Configuring IDP page for instructions on how to edit the configuration of the application. You will need to activate the new document type and upload example documents of the new document type to train the classification machine learning model.

After the training is complete, you can start processing documents of the new type.

Do not upload documents for your new document type for processing in IDP until after training is complete. The AI classification model will perform poorly on these document types and users will not be able to correct the classification until the training is finished.

Built: Thu, Oct 14, 2021 (02:43:30 PM)