Configuring IDP


Introduction

Before you start using the Intelligent Document Processing (IDP) application, you will need to set up the initial configuration. Additionally, after you have started using IDP, you may want to update this initial configuration.

IDP allows members of the manager security groups to easily configure and update the configuration settings using the Configure tab.

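This configuration includes:

  • Selecting automation services
  • Selecting document types
  • Uploading example documents (classification services) or configuring document types (extraction services)
  • Setting preferences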

After all of these steps are complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to use the IDP site.

Before you begin

Evaluate whether documents are suitable

Because IDP classifies documents, it is important to use document types that a user can easily categorize. If a user can't classify a document by reading it, the AI classification model likely won't be able to either. For example, a human reader who is familiar with invoices and purchase orders would be able to easily classify them based on the information they contain.

Also, make sure you understand the documents that work best for document extraction by referring to Appian Document Extraction.

Reset the configuration

If you have previously configured your document channel with dummy documents or document types, we recommend having the application administrator reset the configuration.

To reset the configuration:

  1. In the dudocchannel database table, for the document channel you are configuring, update the following values:
    • modelid: NULL
    • modeltrainedon: NULL
    • numdocsfortraining: 0
    • invalidtypeincludedinmodel: 0
  2. In the dudocunderstanding database table, delete all of the rows where the channelid matches the channelid of your document channel.
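The two reset steps above can be sketched as a pair of SQL statements run in one transaction. This is a minimal illustration only, using Python's built-in sqlite3 module so it can be tried locally. The table and column names come from the steps above, but it assumes that dudocchannel identifies a channel by a channelid column and that your database accepts this SQL; check both against your actual schema before running anything similar.

```python
import sqlite3


def reset_channel_configuration(conn: sqlite3.Connection, channel_id: int) -> None:
    """Apply both reset steps for one document channel in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        # Step 1: clear the model details stored for the channel.
        # ASSUMPTION: dudocchannel identifies the channel with a channelid column.
        conn.execute(
            "UPDATE dudocchannel "
            "SET modelid = NULL, modeltrainedon = NULL, "
            "numdocsfortraining = 0, invalidtypeincludedinmodel = 0 "
            "WHERE channelid = ?",
            (channel_id,),
        )
        # Step 2: delete the rows that belong to the channel.
        conn.execute(
            "DELETE FROM dudocunderstanding WHERE channelid = ?",
            (channel_id,),
        )
```

Running both statements in one transaction means a failure in the second step doesn't leave the channel half reset.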

Ensure the latest version of IDP is installed

Before you get started, we recommend making sure the latest version of the IDP application is installed.

The latest version of IDP is version 1.6.

Compare the latest version with the application version displayed in the ABOUT tab. If the installed version is behind the latest version, contact your application administrator about installing the latest version.
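If you track installed versions across several environments, the comparison can be scripted. The sketch below is a hypothetical helper, not part of IDP. It assumes version numbers are plain dotted integers such as 1.5 or 1.6, and compares them numerically so that 1.10 is treated as newer than 1.6.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '1.6' into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(installed: str, latest: str = "1.6") -> bool:
    """Return True when the installed version is behind the latest version."""
    return parse_version(installed) < parse_version(latest)
```

Comparing tuples of integers avoids the mistake of comparing the strings directly, which would rank "1.10" below "1.6".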

IDP application version number

Set up document types

By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type. If you need to add a new document channel, see Adding a Document Channel.

Step 1: Start the Configuration workflow

Whether you are configuring the application for the first time, or updating the configuration settings, you will access the configuration settings on the CONFIGURE tab.

Keep in mind that only system administrators and members of certain groups have access to the CONFIGURE tab. If you don't see the tab, contact the application administrators. See the Groups Reference Page for more information about what actions the different user groups can take in IDP.


Whether you are running the configuration for the first time or editing an existing configuration, you will see the current configuration settings in a grid. To edit the configuration, click the edit icon in the right-most column of the grid.


Step 2: Select automation services

The next step in configuring IDP is to select whether you need to process multiple document types and need IDP to classify them, or if you only need to extract information from one document type.

If you only need to process one document type, you can skip training the classification model by selecting the extraction-only workflow. You will be able to start processing documents as soon as the configuration is complete, rather than waiting for the classification model to be trained.

To select the automation services:

  1. If you want IDP to automatically classify documents uploaded in this channel, select Classification and Extraction. Note that this automation service is available only if you have Google's AutoML service enabled for IDP.
  2. If you want to pre-classify or manually classify documents uploaded in this channel, select Extraction.

Classification and Extraction is disabled if you don't use Google's AutoML service. If you don't use Google's AutoML service, all IDP document channels will use Extraction as the automation service.
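The availability rule described above can be summarized in a few lines. This is a hypothetical Python illustration of the rule, not an IDP API; the function name and the automl_enabled flag are assumptions made for the example.

```python
from typing import List

CLASSIFICATION_AND_EXTRACTION = "Classification and Extraction"
EXTRACTION = "Extraction"


def available_automation_services(automl_enabled: bool) -> List[str]:
    """List the automation services a channel can choose from."""
    if automl_enabled:
        return [CLASSIFICATION_AND_EXTRACTION, EXTRACTION]
    # Without Google's AutoML service, every channel uses Extraction.
    return [EXTRACTION]
```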

Select automation services

Step 3: Select document types

Now you will select the document types that you want to process.

By default, the application includes Invoice, Purchase Order, Claim, and Receipt document types. If you need to extract more or fewer fields than what is available out of the box for these document types, see Modifying Fields for Document Types. If you need to extract data from different document types than what is provided out of the box, see Adding a Document Type.

To select the document types:

  1. Click the document type or types that you want to process. Hover over the information icon to view the fields that each document type extracts.
  2. Click NEXT.


Step 4a: (Classification services only) Upload example documents

If you chose Classification and Extraction in Step 2, you will need to upload example documents so that the AI classification model can learn the characteristics that are common to each document type. Keep in mind that this process only trains the classification model. It does not affect document extraction.

The documents you upload must be ZIP files containing PDFs.

When you are collecting documents to upload for training, choose a representative set of the actual documents that you will want to process. The better this sample set matches your actual documents, the better the classification model will perform.

If you retrain the classification model, you will need a new set of documents for the training. If documents are the same or too similar to documents that were previously uploaded, the process model might encounter an error during the training and time out.

To upload your example documents:

  1. For each document type, prepare a ZIP file that contains example PDFs of the document type. The model predictions will be more accurate if you follow these guidelines:
    • Upload documents that are representative of the actual documents you will be using.
    • Upload as many documents as possible. It is important to upload at least 10 documents to avoid an exception during the classification training. The more documents you include, the more accurate the model will be.
    • Make sure the documents are sufficiently different from each other. Otherwise, they may be treated as duplicates, which could cause an exception in the model training.
    • Make sure the documents have not been uploaded for training that document type in the past.
  2. Click UPLOAD, then choose the ZIP file for that document type.
  3. Check Has Structured Data? if the example documents contain information in the same fields in the same place. Structured data helps train the feature and improve recognition.
  4. Choose which Extraction Vendor to use for each document type: Appian or Google.
    • Configure the optional Processor Id field when you select Google as the Extraction Vendor. This field lets you use version v1beta3 of Google's Document AI API when extracting data from documents. After you set up a Form Parser processor in your Google Cloud Platform, paste the processor ID into this field. If you leave this field blank, documents are processed using the existing v1beta2 functionality. This field isn't available if you choose Appian as the Extraction Vendor.

    If you upgrade to IDP 1.5 from an older version, document types without an extraction vendor set will be presumed to use Google, and the Processor Id field will be available.

  5. Click NEXT.
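Before uploading, it can help to check a folder of example PDFs against the guidelines in step 1. The following Python sketch is a hypothetical pre-upload helper, not part of IDP. It enforces the minimum of 10 documents and skips byte-identical files. It can't detect documents that are merely similar, and it can't tell whether a document was used in an earlier training run, so those checks remain manual.

```python
import hashlib
import zipfile
from pathlib import Path

MIN_TRAINING_DOCUMENTS = 10  # minimum stated in the guidelines above


def build_training_zip(pdf_paths, zip_path):
    """Package example PDFs for one document type, skipping exact duplicates.

    Returns the number of PDFs written to the ZIP file.
    """
    seen_hashes = set()
    unique_pdfs = []
    for path in map(Path, pdf_paths):
        if path.suffix.lower() != ".pdf":
            raise ValueError(f"Not a PDF file: {path}")
        # Only byte-identical files are detected here (an assumption of this sketch).
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique_pdfs.append(path)

    if len(unique_pdfs) < MIN_TRAINING_DOCUMENTS:
        raise ValueError(
            f"Found {len(unique_pdfs)} unique PDFs; "
            f"at least {MIN_TRAINING_DOCUMENTS} are needed."
        )

    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for index, path in enumerate(unique_pdfs):
            # Prefix each entry so files with the same name in different folders don't collide.
            archive.write(path, arcname=f"{index:03d}_{path.name}")
    return len(unique_pdfs)
```

For example, build_training_zip(Path("invoices").glob("*.pdf"), "invoices.zip") would produce one ZIP file for the Invoice document type; the folder and file names here are placeholders.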


Step 4b: (Extraction services only) Configure new document types

This step applies only if you're not using Google's AutoML service with IDP. If you chose Extraction in Step 2, you won't need to upload example documents to train the AI classification model. However, you'll need to configure a few settings:

  1. Check Has Structured Data? if the example documents contain information in the same fields in the same place. Structured data helps train the feature and improve recognition.
  2. Choose which Extraction Vendor to use for each document type: Appian or Google.
  3. Click NEXT.


Step 5: Set preferences

The final step is to set the preferences for your configuration. Note that if you chose Extraction in Step 2, some fields in the Set Preferences page will not display since they don't affect extraction-only workflows.

After you set these preferences, you can start the training for the AI classification model.

To determine an appropriate confidence threshold, it may be helpful to understand a couple of concepts.

A confidence interval is a range of plausible values for an unknown parameter.

The confidence interval has a confidence level that the true parameter is in the proposed range.

If an interval has a 95% confidence level, intervals constructed this way are expected to contain the true parameter 95% of the time. In IDP, if the confidence level of a prediction meets the confidence threshold, then the document will be auto-classified.
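The effect of the threshold can be illustrated with a short Python sketch. It is an illustration only and not IDP's implementation. It assumes that confidence and threshold are both percentages from 0 to 100 and that "meets" means greater than or equal to.

```python
def classification_outcome(confidence: float, threshold: float) -> str:
    """Decide how a document is routed, given a prediction confidence.

    ASSUMPTION: both values are percentages (0-100) and a confidence
    equal to the threshold counts as meeting it.
    """
    for name, value in (("confidence", confidence), ("threshold", threshold)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    if confidence >= threshold:
        return "auto-classified"
    return "manual classification task"
```

With a threshold of 95, a prediction at 97% confidence is auto-classified, while one at 80% is routed to a user. Raising the threshold moves more documents into the second group.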

You can configure preferences based on the services you chose for the channel:

Classification and Extraction services preferences

  1. For Classification Confidence Threshold, enter the percentage of confidence that the AI classification model must meet when it makes a prediction. If it doesn't meet this threshold, a task will be created for a user to confirm the classification.
    • Tip: A higher threshold percentage, such as 95%, means fewer documents will be auto-classified incorrectly, but it will also increase the number of documents that must be manually classified.
  2. For Documents for Retraining, enter the number of documents that will need to be classified before triggering the retraining of the AI classification machine learning model. This enables the AI classification model to improve over time. A lower number allows the model to learn more quickly, but also increases the Google Cloud Platform costs. This could cause you to hit your Appian AI or Google account limits more quickly.
  3. For Extraction Confidence Threshold, enter the percentage of confidence that the data must meet when it is extracted from a document. If it doesn't meet this threshold, the value will not be automatically populated. During the reconciliation task, a user will need to provide the value.
    • Tip: As with the classification threshold, a higher confidence percentage will increase auto-extraction accuracy, but it will also increase the number of fields that need to be populated by the user.
  4. Select users to add to the Managers, Reconciliation Desk Members, Editors, and Viewers groups. See the Groups Reference Page for more information on the access level that each group provides.
  5. Click TRAIN MODEL.


Classification training begins and can take several hours. You will receive an email when the training is complete. Once the training is complete, users can start to classify, extract, and reconcile documents. See the User Guide for instructions on how to get started using the application. If there is an issue with the classification model training, contact your application administrator who can follow the troubleshooting steps.

Extraction services preferences

  1. For Extraction Confidence Threshold, enter the percentage of confidence that the data must meet when it is extracted from a document. If it doesn't meet this threshold, the value won't be automatically populated. A user will need to provide the value during the reconciliation task.
  2. Select users to add to the Managers, Reconciliation Desk Members, Editors, and Viewers groups. See the Groups Reference Page for more information on the access level that each group provides.
  3. Click Submit.
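To see how the Extraction Confidence Threshold affects a reconciliation task, consider the following Python sketch. It is a simplified model of the behavior described above, not IDP's implementation. The input shape (a dictionary mapping each field name to a value and a confidence percentage) is an assumption made for the example.

```python
from typing import Any, Dict, List, Tuple


def apply_extraction_threshold(
    extracted: Dict[str, Tuple[Any, float]], threshold: float
) -> Tuple[Dict[str, Any], List[str]]:
    """Keep values that meet the threshold and list fields a user must fill in.

    ASSUMPTION: confidences and the threshold are percentages (0-100).
    """
    populated: Dict[str, Any] = {}
    needs_user_input: List[str] = []
    for field, (value, confidence) in extracted.items():
        if confidence >= threshold:
            populated[field] = value
        else:
            # Below the threshold: leave the field blank for reconciliation.
            populated[field] = None
            needs_user_input.append(field)
    return populated, needs_user_input
```

For instance, with a threshold of 90, a field extracted at 98% confidence is populated automatically, while a field extracted at 70% is left blank for the user.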


Users can start uploading documents right away.

Step 6: Configure automated validations (optional)

If you plan to process a high volume of documents and want to save time, you can configure IDP to automatically validate extractions and bypass the reconciliation step for certain fields. Users can still review document extractions if the information wasn't successfully extracted.

You'll need to complete the configuration workflow for a channel before configuring automated validations for certain document fields.

To configure automated validations:

  1. Go to the CONFIGURE tab.
  2. In the Document Channels list, find the channel for which you want to use automated validations.
  3. In that channel's row, select the pencil icon.
  4. Click Manage field validations.
  5. Click NEXT.

On the Manage Field Validations page, you can enable IDP to automatically validate certain fields in your documents. This speeds up document processing time by eliminating the need for reconciliation. If you're confident in IDP's ability to detect and extract information in certain fields, you can automate this validation.


To add automated validations:

  1. Click the Document Type for which you want to set up automated validations.
  2. On the right side of the screen, select the Validation Type in the drop-down menu. There are three options:
    1. Manual Review: Create a reconciliation task for a user to confirm the data extracted from the document is correct.
    2. Required Fields: Choose which fields must have a value for automatic validation to proceed. When you select Required Fields for the Validation Type, the document type's primitive fields appear. Fields from nested child CDTs won't appear. Check the box for each field that is required for automated validation. IDP must detect and successfully extract data from every required field to automatically validate the document. A reconciliation task is triggered if any required field isn't present or isn't successfully extracted.
    3. Custom: Choose from custom validation rules available in your IDP application. You can configure custom validation rules in Appian Designer.
  3. Click UPDATE to save your changes.
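The Required Fields behavior can be pictured with a small Python sketch. This is an illustration of the rule as described above, not IDP's implementation. It assumes a document is a flat dictionary of primitive field values and treats None and blank strings as "not extracted".

```python
from typing import Any, Dict, Iterable, List, Tuple


def is_empty(value: Any) -> bool:
    """Treat None and blank strings as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def required_fields_outcome(
    document: Dict[str, Any], required_fields: Iterable[str]
) -> Tuple[str, List[str]]:
    """Return the routing decision and the required fields that are missing."""
    missing = [name for name in required_fields if is_empty(document.get(name))]
    if missing:
        return "reconciliation task", missing
    return "automatically validated", []
```

A document with every required field populated skips reconciliation, while a single empty required field is enough to create a reconciliation task.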

To edit automated validations:

  1. Click the Document Type for which you want to edit automated validations.
  2. On the right side of the screen, select the Validation Type in the drop-down menu.
  3. Update the existing configurations.
  4. Click UPDATE to save your changes.

If you want to remove validations for all active document types in the channel, click Set all to Manual Review.

The Documents tab shows documents that skip the reconciliation task when automated validations are properly set up and configured. Straight Through Processed appears in the Reconciled By column for these documents.

Custom validations

You can configure custom validations in IDP using expression rules. Custom validations can be useful if you want documents with certain data to be automatically validated. For example, you can use custom validation types to specify that invoices with balances over a certain amount shouldn't be validated automatically. If invoices are below this amount, IDP can still automatically validate the extracted information.

You'll create custom validation types using expression rules in Appian Designer. You'll then need to update a constant to include these expression rules so that the options appear as custom validation types on the Manage Field Validations page.

To create a custom validation type:

  1. Open the Intelligent Document Processing (IDP) application in Appian Designer.
  2. Click NEW.
  3. Select Expression Rule.
  4. Type a Name for the expression rule object.
  5. In the expression designer, add a rule input with the Name data and Type Any Type.
  6. Write the expression you want to use to validate document fields. Your expression should evaluate fields in the document type CDT passed in through ri!data. The evaluated expression must return a single true or false result. Documents that evaluate to true will be processed automatically when using this custom validation type. Documents that evaluate to false will require manual review.
  7. Test your expression and click SAVE CHANGES when you're finished.

Here are some example expression validations:

Example 1

/*Example custom validation on invoice document type*/
/*This expression returns true as long as both fields have values,*/
/*invoice id is shorter than 8 characters,*/
/*and the invoice date is later than January 10th, 2021*/

and(
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"invoiceId",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"invoiceDate",null))),
  len(index(ri!data,"invoiceId",""))<8,
  index(ri!data,"invoiceDate",null)> date(2021,01,10)
)

Example 2

/*Example custom validation that checks that 3 required fields are all populated*/
and(
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field1",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field2",null))),
  not(rule!DU_checkIsNullorEmpty(index(ri!data,"field3",null)))
)

For the validation type to appear in the configuration menu:

  1. Open the Intelligent Document Processing (IDP) application in Appian Designer.
  2. Open the DU_CUSTOM_VALIDATION_TYPES constant.
  3. In the Values field, add user-friendly text to describe the custom validation you created using the expression rule above.
  4. Back in Appian Designer, open the DU_returnCustomValidationResultForCustomValidationType expression rule.
  5. In the expression designer, update the choose() function to include the expression rule you created above. Note the order of the expressions included in the choose() function since it needs to match the order of the labels in the DU_CUSTOM_VALIDATION_TYPES constant.


  6. Click SAVE CHANGES.
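The reason the order matters in step 5 is that choose() selects an expression by position, so the label at position n in DU_CUSTOM_VALIDATION_TYPES must line up with the nth expression. The Python sketch below mimics that positional lookup. It is an analogy only: the labels and rule functions are hypothetical, and the real objects are an Appian constant and expression rules.

```python
from typing import Any, Callable, Dict, List

# Hypothetical labels, standing in for the values of DU_CUSTOM_VALIDATION_TYPES.
CUSTOM_VALIDATION_TYPES: List[str] = [
    "Invoice ID present",
    "Three required fields",
]


def has_invoice_id(data: Dict[str, Any]) -> bool:
    return bool(data.get("invoiceId"))


def has_three_fields(data: Dict[str, Any]) -> bool:
    return all(data.get(name) for name in ("field1", "field2", "field3"))


# Must be listed in the same order as the labels above.
VALIDATION_RULES: List[Callable[[Dict[str, Any]], bool]] = [
    has_invoice_id,
    has_three_fields,
]


def run_custom_validation(selected_label: str, data: Dict[str, Any]) -> bool:
    """Pick the rule by the label's position, the way a positional choose() would."""
    if len(CUSTOM_VALIDATION_TYPES) != len(VALIDATION_RULES):
        raise ValueError("Each label needs exactly one rule, in the same order.")
    position = CUSTOM_VALIDATION_TYPES.index(selected_label) + 1  # 1-based position
    return VALIDATION_RULES[position - 1](data)
```

If the two lists were out of order, selecting "Three required fields" would silently run the invoice ID check instead, which is the mistake the note in step 5 warns against.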

To test that your custom validation type appears properly:

  1. Open the Intelligent Document Processing application.
  2. Go to the Configure page.
  3. Click the channel's pencil icon to manage configurations.
  4. Click Manage Field Validations.
  5. Select a document type and choose Custom for its Validation Type.
  6. Confirm that your custom validation type appears in the menu, using the label you want.

If you have the Manage Field Validations page open in another tab or browser while you edit the expression or constant, you'll need to refresh the page for your changes to appear.

Deactivate a document channel

You can deactivate document channels you no longer need. Deactivating a channel can help you control costs related to models using Google's AutoML service, which remain active even if you're not currently using them. When you deactivate a channel, you won't be able to change its configuration settings until you reactivate it.

To deactivate a document channel:

  1. Go to the CONFIGURE tab.
  2. Locate the channel you want to deactivate.
  3. In that channel's row, click the Deactivate icon in the right-most column.

When a document channel is deactivated, that channel's document types no longer appear in the metrics report.

Reactivate a document channel

You can reactivate a document channel if you need to use it at a later time. When you reactivate a previously deactivated channel, you'll have the option to configure it.

To reactivate a document channel:

  1. Go to the CONFIGURE tab.
  2. Locate the channel you want to reactivate.
  3. In that channel's row, click the Reactivate icon in the right-most column.

You'll be asked to choose a channel workflow type if you choose to configure the reactivated channel. If you choose the same channel workflow that the channel used prior to deactivation, the Google AutoML model is not impacted. However, if you make a different selection, the Google AutoML model is impacted in these ways:

Previous selection | New selection | Model impact
Classification and Extraction | Extraction only | Existing AutoML model deleted, with a warning to let you know.
Extraction only | Classification and Extraction | A new Google AutoML model is created for the channel.
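The same rules can be written as a lookup, which may be useful if you document channel changes in a runbook. This Python sketch simply restates the table above. The function and constant names are hypothetical.

```python
CLASSIFICATION_AND_EXTRACTION = "Classification and Extraction"
EXTRACTION_ONLY = "Extraction only"

# (previous selection, new selection) -> impact on the Google AutoML model
MODEL_IMPACT = {
    (CLASSIFICATION_AND_EXTRACTION, EXTRACTION_ONLY):
        "Existing AutoML model deleted, with a warning to let you know.",
    (EXTRACTION_ONLY, CLASSIFICATION_AND_EXTRACTION):
        "A new Google AutoML model is created for the channel.",
}


def model_impact(previous: str, new: str) -> str:
    """Describe what happens to the AutoML model when a channel is reconfigured."""
    if previous == new:
        return "The Google AutoML model is not impacted."
    return MODEL_IMPACT[(previous, new)]
```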

Troubleshooting the classification model training

If you receive an email notification about an error with the classification model training, refer to the issues and resolutions listed below to troubleshoot the cause of the error.

After you have determined the cause of the error:

  1. Cancel the DU Configure Document Understanding process instance.
    • Note: It is not possible to manage the configuration on the configure tab until the process instance is completed or canceled.
  2. Have the user who initiated the configuration repeat the document channel configuration.

Issue: There was an error when uploading the dataset CSV file.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure the DU_AUTOML_CLOUD_BUCKET constant reflects the bucket you're using.

Issue: There was an error when creating the training dataset.
Issue: There was an error when importing the training dataset.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure there are no duplicates in your training dataset. This includes the ZIP file uploaded for training and documents uploaded for processing.
  • Make sure the bucket you're using matches the DU_STORAGE_CLOUD_BUCKET constant and is a persistent storage bucket.

Issue: There was an error when deploying the model.
Resolution:
  • Make sure the service account has the correct permissions.
  • Make sure the region in the DU_CLOUD_REGION constant is correct. For storage buckets in the US region, it should be us-central1. For storage buckets in the EU region, it should be either europe-north1, europe-west1, europe-west2, europe-west3, or europe-west4 depending on your storage bucket set up.
  • Make sure the region is supported by Google Cloud.
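When checking the DU_CLOUD_REGION value, a quick lookup of the regions listed above can rule out a mismatch. The Python sketch below is a hypothetical helper that only encodes the values named in this section. It doesn't query Google Cloud, so it can't confirm that a region is currently supported.

```python
# Regions named in this guide for each storage bucket location.
ALLOWED_REGIONS = {
    "US": {"us-central1"},
    "EU": {
        "europe-north1",
        "europe-west1",
        "europe-west2",
        "europe-west3",
        "europe-west4",
    },
}


def region_matches_bucket(bucket_location: str, cloud_region: str) -> bool:
    """Check a DU_CLOUD_REGION value against the bucket's location (US or EU)."""
    allowed = ALLOWED_REGIONS.get(bucket_location.strip().upper())
    if allowed is None:
        raise ValueError(f"Unknown bucket location: {bucket_location!r}")
    return cloud_region.strip() in allowed
```

For an EU bucket, the check only confirms that the value is one of the five listed regions. Which of them is correct still depends on how your storage bucket is set up.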
Built: Mon, Sep 27, 2021 (12:29:11 PM)