
Build a Doc Extraction Process with AI Skill

Estimated time to complete this tutorial: 1 hour

User experience level: Beginner

Overview

In this tutorial, you'll build an Appian process that:

takes a document as input,
uses an AI skill to extract data from that document,
sends a task to someone to reconcile the extracted data, and
saves the reconciled data as a new record.

This process model relies on a series of nodes that leverage artificial intelligence (AI) to map fields from a document to fields in a record type. Once the data is extracted, users can confirm or correct the results using a simple task interface. As you test your process model and reconcile results, the extraction will become smarter and more accurate over time.

This page will walk you through how to create your own document extraction process in three parts:

Create a document extraction AI skill and define your document structure.
Create record types and other design objects to map and save your extraction results.
Configure a basic document extraction process model.

Tip: Check out the Document Classification Tutorial to build a process that classifies documents in addition to extraction.

Scenario

Acme Logistics is a shipping and receiving company that manages inventory for its customers. In addition to physical items, Acme has to manage and act on documents such as invoices. Acme wants to create an Appian process to extract data from invoices, which customers and vendors submit through Acme's website. Acme also wants to save this data as records.

Before you build the process, you'll build all of the supporting design objects, starting with the AI skill.

Setup

This tutorial assumes you have already created an Appian application. We'll walk you through creating each of the design objects you need to automate document extraction.

Tip: Objects in this tutorial use the AL prefix. If you're creating objects in an application that uses a different prefix, use your application's prefix in new object names.

Gather example documents

Before the AI skill can serve its purpose, it needs to learn a lot about the documents your business encounters. One of your first steps should be to build a complete and representative dataset to train the model. The model can only learn from the documents you provide to it, so it's important to have a large number and variety of realistic examples.

We've provided sample invoices for you to use in this tutorial. Download these files to your computer, since you'll use them to set up and train the AI skill. Unzip the compressed folders, because you'll need to upload the documents individually and not as a ZIP file.

System requirements

This tutorial is designed to be used with Appian 23.2 and later.

Part 1: Create AI skill

The Document Extraction AI Skill takes a document as input and uses machine learning to extract data from that document.

To create the AI Skill:

In your application, go to the Build view.
In the application toolbar, click NEW > AI Skill.
On the Create AI Skill page, choose Document Extraction.

Configure the following properties:

Property	Value
Name	`AL_ExtractInvoice`
Description (Optional)	`AI skill to extract data from invoices Acme receives`

Click CREATE. The Review AI Skill Security window displays.
Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
Click SAVE.

Create a model and add documents

To get started, you'll create a model and add examples of a typical invoice.

In your new AI skill, click CREATE FIRST MODEL.
In Step 1: Provide Documents, choose Semi-structured or Unstructured for the Document Layout.
Click UPLOAD to add training documents.
Browse for and select invoices.
After the documents finish uploading, click NEXT.

Define the invoice structure

In Step 2: Add Fields to Extract, enter the names of the fields that appear on most invoices. Start with name.
Click Add Field six more times, so you have a total of seven fields. Use the default data type of Text for all seven fields.
Enter the following field names:
- email
- phone
- address
- invoiceNumber
- date
- total
Add another field. Name this field items and select type Table.
In the nested table fields that appear below items, add four fields with the default data type of Text.
Enter the following four field names:
- quantity
- description
- unitPrice
- amount
Click NEXT.

Label field data

Once you define your document structure, you'll indicate where those fields appear in the sample documents. This process is called labeling, and it helps the model learn where these fields appear in your documents.

In the document preview, click and drag your mouse around the company name to create a selection box.
In the dropdown that appears, click the name field.
Repeat steps 1 and 2 until you've labeled values for all fields in your invoice structure. You won't label the fields in the items table, but that information will still be extracted.

Tip: Regularly click SAVE CHANGES to save your progress.
Above the document preview, click NEXT to view the other sample documents and label additional fields.
Click REVIEW to see a summary of how many labels appear for each field.

Review labeled fields and train the model

The final step of creating a document extraction AI skill asks you to review the fields you've labeled in the sample documents. The more fields you label, the more the model can learn about your fields. This is what makes the model smarter and better at extracting data of interest.

Each field will need labels in at least half of the documents you uploaded. If you haven't labeled enough fields in the set of documents you uploaded, you'll see a message encouraging you to add more files and fields.
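The "at least half" rule above can be checked by hand before you train. The following is a minimal sketch, not part of Appian: the function name `fields_needing_labels` and the sample counts are hypothetical, and it assumes you have tallied how many documents contain a label for each field.

```python
from math import ceil


def fields_needing_labels(label_counts, total_documents):
    """Return the fields labeled in fewer than half of the uploaded documents."""
    # "At least half" of the documents, rounded up for odd totals.
    minimum = ceil(total_documents / 2)
    return sorted(
        name for name, count in label_counts.items() if count < minimum
    )


# Hypothetical tally: number of documents in which each field was labeled.
label_counts = {"name": 10, "email": 4, "invoiceNumber": 5, "total": 9}
print(fields_needing_labels(label_counts, total_documents=10))
```

Any field the function returns is one you would go back and label in more documents.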

After your review, click TRAIN MODEL. Training may take a few minutes. While you wait, proceed to create the additional design objects in your process.

Part 2: Create additional design objects

When extracting data, Appian will identify key-value pairs from the document and map them to the fields of your desired data object (a CDT or record type). Your data object should be constructed to reflect the data available in your document. Therefore, it's important that your fields match the data that will be extracted from your document.

If your document contains field names and a table, like an invoice document that contains a table of items, you'll ultimately create two data objects: one that represents the document, and one that represents the table.

In this step, you're creating two record types to store the extracted data: one record type for the invoice data, and one record type for the data contained in the items tables commonly found in invoices Acme receives.
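To make the two-object shape concrete, here is a rough sketch in Python rather than Appian. The classes mirror the field names defined in the AI skill (all treated as text here), and `map_extracted` is a hypothetical helper that shows name-based mapping only; it is not how Appian performs the mapping internally.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class InvoiceItem:
    """One row of the items table."""
    quantity: str = ""
    description: str = ""
    unitPrice: str = ""
    amount: str = ""


@dataclass
class Invoice:
    """Top-level invoice fields plus the nested items table."""
    name: str = ""
    email: str = ""
    phone: str = ""
    address: str = ""
    invoiceNumber: str = ""
    date: str = ""
    total: str = ""
    items: List[InvoiceItem] = field(default_factory=list)


def _only_known(cls, data):
    """Keep only the keys that match a field name on the dataclass."""
    known = {f.name for f in fields(cls)}
    return {key: value for key, value in data.items() if key in known}


def map_extracted(extracted):
    """Map extracted key-value pairs onto Invoice and InvoiceItem by name."""
    top_level = _only_known(Invoice, extracted)
    top_level.pop("items", None)  # the table is handled separately below
    invoice = Invoice(**top_level)
    invoice.items = [
        InvoiceItem(**_only_known(InvoiceItem, row))
        for row in extracted.get("items", [])
    ]
    return invoice


# Hypothetical extraction result for illustration.
sample = {
    "name": "Acme Supplier",
    "total": "100.00",
    "items": [{"quantity": "2", "description": "Box", "amount": "100.00"}],
}
print(map_extracted(sample))
```

Keys that have no matching field are dropped, which is why the field names in your data objects should match the fields you defined in the AI skill.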

Create a record type for a document

To create a record type for your document, you'll want to create all the form fields as fields in your record type.

In your application, go to the Build view.
Click NEW > Record Type.

In Create Record Type, configure the following properties:

Property	Value
Name	`AL Invoice`
Display Name (Plural)	`AL Invoices`
Description	`A record type to store data on invoices sent to Acme Logistics.`

Click CREATE. The Review Record Type Security window displays.
Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
Click SAVE. The record type opens in a new tab.
Click TELL US ABOUT YOUR DATA.
On the Configure Data Source page, click I want to start from scratch.
Click NEXT.
Choose your Data Source where you'd like to create the data.
Click NEXT.

On the Create Data Model page, keep the default settings for the following fields:

Field	Type
`id`	`Number (Integer)`
`createdBy`	`User`
`createdOn`	`Date and Time`
`modifiedBy`	`User`
`modifiedOn`	`Date and Time`

Click NEXT twice to skip adding record type relationships for now. You'll do this in a later step. You can keep any suggested relationships, such as relationships with User records.

Click NEW FIELD to create seven new fields in the data model:

Field	Type
`invoiceNumber`	`Number (Integer)`
`name`	`Text`
`email`	`Text`
`phone`	`Text`
`address`	`Text`
`date`	`Text`
`total`	`Text`

Tip: Notice that these fields match the top-level fields in the AI skill you created earlier.

Keep the Create Table checkbox checked and click SAVE CHANGES.
Click FINISH.

Create a record type for a table

To extract and save table data from your document, you need to create a separate record type to represent the table. Afterward, you'll create a relationship to reference this new record type from the record type representing the document.

To create a record type for the table:

In your application, go to the Build view.
Click NEW > Record Type.

In Create Record Type, configure the following properties:

Property	Value
Name	`AL Invoice Item`
Display Name (Plural)	`Invoice Items`
Description	`A record type to store table data on invoices sent to Acme Logistics.`

Click CREATE. The Review Record Type Security window displays.
Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
Click SAVE. The record type opens in a new tab.
Click TELL US ABOUT YOUR DATA.
On the Configure Data Source page, click I want to start from scratch.
Click NEXT.
Choose your Data Source where you'd like to create the data.
Click NEXT.

On the Create Data Model page, keep the default settings for the following fields:

Field	Type
`id`	`Number (Integer)`
`createdBy`	`User`
`createdOn`	`Date and Time`
`modifiedBy`	`User`
`modifiedOn`	`Date and Time`

Click NEXT twice to skip adding record type relationships for now.

Click NEW FIELD to create five new fields in the data model:

Field	Type
`invoiceId`	`Number (Integer)`
`quantity`	`Text`
`description`	`Text`
`unitPrice`	`Text`
`amount`	`Text`

Tip: Notice that these fields match the table fields in the AI skill you created earlier. invoiceId is a separate field from the id field generated automatically by the record type wizard.

Keep the Create Table checkbox checked and click SAVE CHANGES.
Click FINISH.

[Image: Record type to store tabulated items in an invoice]

Add record type relationships

Now that you've set up a record type for both the invoice and the table, you'll need to add record type relationships to associate them. For document extraction data to write to both records, you'll need to set up relationships in both record types.

In your application, go to the Build view.
In the list of design objects, open the AL Invoice record type.
Click ADD RELATIONSHIP.
In the Related Record Type field, select AL Invoice Item.
Click NEXT.
For the Relationship Name, enter items. Note that this matches the table field you added in the AI skill document structure.
For the Relationship Type, choose One to Many.

For Common Fields, select the following:

Record Type	Field
AL Invoice	`id - Number (Integer)`
AL Invoice Item	`invoiceId - Number (Integer)`

[Image: Set up a record type relationship between invoices and invoice table items]

Click ADD.
Click SAVE CHANGES.

Repeat the process in the AL Invoice Item record type:

In your application, go to the Build view.
In the list of design objects, open the AL Invoice Item record type.
Click ADD RELATIONSHIP.
In the Related Record Type field, select AL Invoice.
Click NEXT.
For the Relationship Type, choose Many to One.

For Common Fields, select the following:

Record Type	Field
AL Invoice Item	`invoiceId - Number (Integer)`
AL Invoice	`id - Number (Integer)`

Click ADD.
Click SAVE CHANGES.
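Conceptually, the pair of relationships above links each invoice to its item rows through the common fields `id` and `invoiceId`. The sketch below illustrates that one-to-many link with plain Python dictionaries. The function name and sample rows are hypothetical, and this is not how Appian stores or queries records.

```python
from collections import defaultdict


def group_items_by_invoice(invoices, items):
    """Attach each item row to its parent invoice where id == invoiceId."""
    by_invoice = defaultdict(list)
    for item in items:
        by_invoice[item["invoiceId"]].append(item)
    # One entry per invoice; invoices without items get an empty list.
    return {inv["id"]: by_invoice.get(inv["id"], []) for inv in invoices}


# Hypothetical rows for illustration.
invoices = [{"id": 1, "name": "Acme Supplier"}, {"id": 2, "name": "Other Vendor"}]
items = [
    {"invoiceId": 1, "description": "Box"},
    {"invoiceId": 1, "description": "Tape"},
]
print(group_items_by_invoice(invoices, items))
```

Each invoice can own many item rows, while each item row points back to exactly one invoice, which is the One to Many and Many to One pairing configured in the two record types.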

Create a document folder

To keep things organized as users upload invoices, create a folder in your application to store the document files.

In your application, go to the Build view.

Click NEW > Folder.

Property	Value
Type	Document Folder
Name	`AL Uploaded Documents`
Description	`Folder containing documents submitted via Acme's website.`
Parent Folder	AL Knowledge Center

Click CREATE.

Create a constant for the document folder

To reference the folder in your interface, you'll need to create a constant.

In your application, go to the Build view.

Click NEW > Constant.

Property	Value
Name	`AL_UPLOADED_DOCUMENTS`
Description	`Constant referencing the AL Uploaded Documents folder.`
Type	`Folder`
Value	`AL Uploaded Documents`

Click CREATE.

Add a start form

Acme's customers submit a form with invoices attached. You can create an interface to collect and save all of the necessary information, including documents.

In your application, go to the Build view.
Click NEW > Interface.

Configure the following fields:

Property	Value
Name	`AL_IntakeForm`
Description (Optional)	`Interface to allow vendors to upload documents.`
Save In	Select the Rules & Interfaces folder in your application.

Click CREATE.

In the Rule Inputs pane, click New Rule Input and configure the following parameters:

Property	Value
Name	`document`
Type	Document (Appian data type)

Click CREATE.
In the templates panel, find the FORMS section.
Click One Column Form.
Double-click the title.
Enter Submit to Acme Logistics.
In the top Section Layout component in the editor, drag and drop a TEXT field from the component palette.
Click the Text component.

In the COMPONENT CONFIGURATION, configure the following:

Parameter	Value
Label Position	Hidden
Display Value	Click Edit as Expression and enter: `"Thank you for contacting Acme! Upload your document and we'll be in touch."`
Read-only	Selected

Click the bottom Section layout.
In the COMPONENT CONFIGURATION, delete the default text in the Label field.
In the bottom Section Layout component in the editor, drag and drop a FILE UPLOAD field from the component palette.
Click the File Upload.

In the COMPONENT CONFIGURATION, configure the following:

Parameter	Value
Target Folder	`AL_UPLOADED_DOCUMENTS`
Selected Files	`ri!document`
Save Files To	`ri!document`

Click OK.
Click SAVE CHANGES.

Create a constant for the analyst

Later, you'll add a node in the process model for the analyst to reconcile the extracted data. To assign that task, you'll first need to create a constant referencing the analyst.

In your application, go to the Build view.
Click NEW > Constant.

In Create Constant, configure the following properties:

Property	Value
Create from scratch	Leave selected
Name	`AL_ANALYST`
Description	`Constant pointing to the analyst at Acme Logistics.`
Type	`User`
Value	Select your username.

Click CREATE.

Part 3: Build the document extraction process

With your record types and AI skill in place, you can now start building your end-to-end process.

The following instructions walk you through how to configure your process model and the three key nodes of a document extraction process.

As you build your process, you have the flexibility to incorporate other design objects and decisions that fit your specific business needs. See Additional process configuration options for ideas you can add to your own process model.

Create a process model

To easily pass data throughout your process, you'll want to create process variables that represent your document, extraction ID, and extracted data:

In the application toolbar, click NEW > Process Model.

Configure the following properties:

Property	Value
Name	`AL Invoice Extraction`
Description	`Process to extract invoice data from Acme vendors.`

Click CREATE.
Configure security and click SAVE.
From the File menu, click Properties.
In the Process Model Properties dialog, go to the Variables tab.

Create the following process variables:

Name	Type	Value	Parameter?	Required?	Multiple?
`cancel`	Boolean	Blank	Yes	No	No
`document`	Document	Blank	Yes	No	Yes
`docExtractionId`	Text	Blank	No	No	No
`record`	`AL Invoice (Record Type)`	Blank	No	No	No

Click OK.

Configure the Start Form

The process kicks off when a user submits the start form. Configure the Start node to use the form you created:

In the Appian Process Modeler page, click File > Properties in the menu bar. The process model properties window displays. By default, the General tab is active.
Go to the Process Start Form tab.
In the Interface text box, enter AL.
Select AL_IntakeForm when it displays in the dropdown list.
Click Yes to create process variables based on rule inputs from the submission form, even though you created all of the variables in the previous step.
Click OK to return to the process model.
In the menu bar, click File > Save & Publish.

Configure the Extract from Document Smart Service

After defining your process variables, the first node to add to the process is the Extract from Document smart service. This smart service takes a document as input, extracts data using a machine learning model, and returns the extracted data as output.

To configure the smart service:

From the Palette, drag in an Extract from Document smart service.
Open the Extract from Document smart service.
Select the Setup tab.
Under Select AI Skill, select the AL_ExtractInvoice skill you created earlier.
Select the Data tab.

On the Inputs tab, configure the inputs with the following values:

Input	Value
Document	`pv!document`
Confidence Threshold	`80`

On the Outputs tab, configure the outputs with the following values:

Output	Value
Doc Extraction Id	Choose the `docExtractionId` process variable.
Extracted Data	Choose the `record` process variable.
Confidence Scores	Leave blank.

Click OK.

[Image: Extract from Document smart service configuration]

Configure the Reconcile Doc Extraction Smart Service

The next node you will configure is the Reconcile Doc Extraction Smart Service. This smart service will assign a reconciliation task to a user to confirm or correct the extracted results.

To configure the smart service:

Drag in a Reconcile Doc Extraction Smart Service node.
Open the Reconcile Doc Extraction Smart Service.
Select the Data tab.
On the Inputs tab, configure the default inputs with the following values:

Input	Value
Doc Extraction Id	Choose the `docExtractionId` process variable.

On the Outputs tab, configure the outputs with the following values:

Output	Value
Reconciled Data	Choose the `record` process variable.

Select the Assignment tab.
Open the expression editor next to the Assign to the following: field.
In the Expression window, enter `cons!AL_ANALYST`.
Click SAVE AND CLOSE.
Click OK.

[Image: Reconcile Doc Extraction smart service configuration]

Update the invoice record with reconciled data

Finally, you'll add another node to write records for the reconciled data.

In the Smart Service Search bar, search for Write Records. You can also find this in Automation Smart Services > Data Services.
Click and drag from the search results to the flow connector between the Reconcile Doc Extraction and End nodes.
Double-click the Write Records node.
Go to the Setup tab.
In the Record Input field, select the record process variable.
In the Record Type field, AL Invoice will be automatically selected.
Go to the Data tab.
Click the Outputs tab.
Click the Records Updated output.
In the Result Properties pane, select record for the Target.
Click OK.

Add a cancel flow

It's a best practice to include a pathway in your process model in case the user clicks Cancel on the start form.

To add a cancel flow:

In the Smart Service palette, locate the XOR gateway.
Click and drag the XOR gateway to the flow connector between the Start Node and Extract from Document.
Add a connector from the XOR gateway to the End Node.
Double-click the XOR gateway to configure it.
On the General tab, enter Cancel? in the Name field.
Go to the Decision tab.
Click NEW CONDITION.
Click Open the Expression Editor next to the Condition field.
Expand the list of Process Variables.
Click cancel.
Click SAVE AND CLOSE.
In the Result field, select End Node.
Click OK.

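That's it! Your process is set up to extract data. In order, the flow runs from the Start Node through the Cancel? gateway, Extract from Document, Reconcile Doc Extraction, and Write Records to the End Node. Your model may contain additional nodes based on how you customized it.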
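As a rough outline, the node order built in this tutorial can be sketched in Python. This is only an illustration of the control flow: `run_invoice_process` and the three stand-in functions are hypothetical and do not correspond to Appian APIs.

```python
def run_invoice_process(cancel, document, extract, reconcile, write_record):
    """Mirror the node order: Cancel? gateway, extract, reconcile, write."""
    # XOR gateway "Cancel?": go straight to the End Node when cancel is true.
    if cancel:
        return None
    # Extract from Document: returns the extraction ID and the extracted data.
    doc_extraction_id, record = extract(document)
    # Reconcile Doc Extraction: the reviewed data replaces the extracted data,
    # just as both nodes save into the same record process variable.
    record = reconcile(doc_extraction_id)
    # Write Records: persist the reconciled data and return the saved record.
    return write_record(record)


# Hypothetical stand-ins so the sketch runs end to end.
def fake_extract(document):
    return "run-1", {"name": "Acme Supplier", "total": "100.00"}


def fake_reconcile(doc_extraction_id):
    return {"name": "Acme Supplier Inc.", "total": "100.00"}


def fake_write_record(record):
    return {**record, "id": 1}


result = run_invoice_process(
    False, "invoice.pdf", fake_extract, fake_reconcile, fake_write_record
)
print(result)
```

Passing `True` for `cancel` returns immediately without calling any of the later steps, which is the behavior the cancel flow adds.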

Test your process

After creating your process model, run it with a few samples to test the extraction and to see how your auto-extracted results change.

To test the document extraction process created above:

Go to your process model.
From the File menu, click Start Process for Debugging.
Use the form to upload an invoice document.
After the Extract from Document node completes, you should receive a task to reconcile the extracted data.
Return to the process model monitoring view to observe the results.

As you test, Appian will use the field names from the data type to find a match. Over time, Appian learns how to map your data to your data type fields from the user interactions with the reconcile interface.

Appian will delete document extraction runs after 30 days, or when the total amount of data surpasses 10 GB. If you attempt to access a run that has been deleted, you will see an error. Appian will not delete the documents you uploaded. Learn more about your document's security.

Additional process configuration options

The process model detailed above provides the basic nodes needed to create an extraction process, but you aren't bound to this model. In fact, the major benefit of creating your own document extraction process is the flexibility to add additional rules or decisions that are specific to your business needs.

There are a few ways you can enhance or modify this process, for example:

Dynamically skip reconciliation: After the Extract from Document smart service, you can use a Script Task to evaluate the extracted data. For example, you may want to validate that all fields were extracted, or that the extracted data meets your business validations. If your validations are met, you can use an XOR node to skip the Reconcile Doc Extraction smart service and write the extracted data directly to your record type, without any human review.
Use the two optional outputs in the Reconcile Doc Extraction Smart Service to route the process model after reconciliation:
- isSubmit is true when the user selects the Submit button on the reconciliation task. Add logic after this smart service to use isSubmit=true() to trigger an email notification or a confirmation dialog.
- isException is true when the user selects the Mark as Invalid button on the reconciliation task. Add logic after this smart service to use isException=true() to route to a chained user input task, where the user provides more information.
Add conditionality to the Reconcile Doc Extraction Smart Service to determine who should be assigned the reconciliation task based on certain business criteria or rules.
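The routing ideas above can be sketched as two small decision functions. This is a minimal illustration in Python, not Appian expression code: the function names, the required-field list, and the route labels are all hypothetical, and in a real process model the same checks would live in a Script Task and XOR gateways.

```python
# Top-level fields from this tutorial's invoice structure.
REQUIRED_FIELDS = (
    "name", "email", "phone", "address", "invoiceNumber", "date", "total",
)


def needs_reconciliation(extracted):
    """Return True when any required field is missing or blank."""
    return any(
        not str(extracted.get(name) or "").strip() for name in REQUIRED_FIELDS
    )


def route_after_reconciliation(is_submit, is_exception):
    """Pick the next step from the two optional reconciliation outputs."""
    if is_exception:
        # The user selected Mark as Invalid on the reconciliation task.
        return "request_more_information"
    if is_submit:
        # The user selected Submit on the reconciliation task.
        return "send_notification"
    return "end"


# Hypothetical extraction result with a blank phone number.
extracted = {
    "name": "Acme Supplier", "email": "ap@example.com", "phone": "",
    "address": "1 Main St", "invoiceNumber": "1001",
    "date": "2023-05-01", "total": "100.00",
}
print(needs_reconciliation(extracted))
print(route_after_reconciliation(is_submit=True, is_exception=False))
```

You could extend `needs_reconciliation` with your own business validations, such as checking that the total is numeric, before deciding to skip human review.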
