Estimated time to complete this tutorial: 1 hour
User experience level: Beginner
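In this tutorial, you'll build an Appian process that extracts data from invoices, lets a user confirm or correct the results, and saves the data as records.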
This process model relies on a series of nodes that leverage artificial intelligence (AI) to map fields from a document to fields in a record type. Once the data is extracted, users can confirm or correct the results using a simple task interface. As you test your process model and reconcile results, the extraction will become smarter and more accurate over time.
This page walks you through creating your own document extraction process in three parts: building the supporting design objects, building the process model, and testing the process.
Tip: Check out the Document Classification Tutorial to build a process that classifies documents in addition to extraction.
Acme Logistics is a shipping and receiving company that manages inventory for its customers. In addition to physical items, Acme has to manage and act on documents such as invoices. Acme wants to create an Appian process to extract data from invoices, which customers and vendors submit through Acme's website. Acme also wants to save this data as records.
Before you build the process, you'll build all of the supporting design objects, starting with the AI skill.
This tutorial assumes you have already created an Appian application. We'll walk you through creating each of the design objects you need to automate document extraction.
Tip: Objects in this tutorial use the AL prefix. If you're creating objects in an application that uses a different prefix, use your application's prefix in new object names.
Before the AI skill can serve its purpose, it needs to learn a lot about the documents your business encounters. One of your first steps should be to build a complete and representative dataset to train the model. The model can only learn from the documents you provide to it, so it's important to have a large number and variety of realistic examples.
We've provided sample invoices for you to use in this tutorial. Download these files to your computer, since you'll use them to set up and train the AI skill. Unzip the compressed folders, because you'll need to upload the documents individually and not as a ZIP file.
This tutorial is designed to be used with Appian 23.2 and later.
The Document Extraction AI Skill takes a document as input and uses machine learning to extract data from that document.
To create the AI Skill:
Configure the following properties:
Property | Value |
---|---|
Name | AL_ExtractInvoice |
Description (Optional) | AI skill to extract data from invoices Acme receives |
To get started, you'll create a model and add examples of a typical invoice.
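In the document structure, create the following seven fields and keep the default data type of Text for all seven fields: name, email, phone, address, invoiceNumber, date, and total.
Add a field named items and select type Table.
In items, add four fields with the default data type of Text: quantity, description, unitPrice, and amount.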
Once you define your document structure, you'll indicate where those fields appear in the sample documents. This process is called labeling and it helps the model learn more about where these fields appear in your documents.
Start with the name field. Repeat the labeling steps until you've labeled values for all fields in your invoice structure. You won't label the fields in the items table, but that information will still be extracted.
Tip: Regularly click SAVE CHANGES to save your progress.
The final step of creating a document extraction AI skill asks you to review the fields you've labeled in the sample documents. The more fields you label, the more the model can learn about your fields. This is what makes the model smarter and better at extracting data of interest.
Each field will need labels in at least half of the documents you uploaded. If you haven't labeled enough fields in the set of documents you uploaded, you'll see a message encouraging you to add more files and fields.
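Before training, you can check the half-of-documents rule yourself. The sketch below is a hypothetical Python helper, not an Appian feature. It assumes you keep a record of which fields you labeled in each uploaded file, and it treats "at least half" as labeled_count * 2 >= total_documents.

```python
def underlabeled_fields(labels_by_doc: dict[str, set[str]], fields: list[str]) -> list[str]:
    """Return the fields labeled in fewer than half of the uploaded documents.

    labels_by_doc maps each document's file name to the set of fields labeled in it
    (a hypothetical bookkeeping structure, not something Appian exposes).
    """
    total_docs = len(labels_by_doc)
    shortfall = []
    for field in fields:
        labeled_in = sum(1 for labels in labels_by_doc.values() if field in labels)
        # "At least half" means labeled_in / total_docs >= 0.5; compare with integers.
        if labeled_in * 2 < total_docs:
            shortfall.append(field)
    return shortfall


labels = {
    "invoice1.pdf": {"name", "email", "total"},
    "invoice2.pdf": {"name", "total"},
    "invoice3.pdf": {"name"},
}
print(underlabeled_fields(labels, ["name", "email", "total"]))  # ['email']
```

Here email is labeled in only one of three documents, so it needs more labels before the model can train well on it.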
After your review, click TRAIN MODEL. Training may take a few minutes. While you wait, proceed to create the additional design objects in your process.
When extracting data, Appian identifies key-value pairs in the document and maps them to the fields of your data object (a CDT or record type). Construct your data object to reflect the data available in your document, so that its fields match the data that will be extracted.
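To see why field names matter, here is a deliberately simplified Python illustration of name-based matching. It is a sketch built on assumptions. Appian's actual extraction uses machine learning and improves through reconciliation. The normalize and map_to_fields helpers below are hypothetical. They show only how a document label such as "Invoice Number" can line up with a field named invoiceNumber.

```python
import re


def normalize(label: str) -> str:
    """Lowercase a label and keep only letters and digits, so "Invoice No." -> "invoiceno"."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def map_to_fields(pairs: dict[str, str], field_names: list[str]) -> dict[str, str]:
    """Map extracted key-value pairs onto data object fields with matching names.

    Keys that don't match any field are ignored.
    """
    lookup = {normalize(name): name for name in field_names}
    mapped = {}
    for key, value in pairs.items():
        field = lookup.get(normalize(key))
        if field is not None:
            mapped[field] = value
    return mapped


pairs = {"Invoice Number": "10482", "E-mail": "billing@example.com", "Ship Via": "Ground"}
print(map_to_fields(pairs, ["name", "email", "invoiceNumber", "total"]))
# {'invoiceNumber': '10482', 'email': 'billing@example.com'}
```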
If your document contains field names and a table, like an invoice document that contains a table of items, you'll ultimately create two data objects: one that represents the document, and one that represents the table.
In this step, you're creating two record types to store the extracted data: one record type for the invoice data, and one record type for the data contained in the items tables commonly found in invoices Acme receives.
To create a record type for your document, you'll want to create all the form fields as fields in your record type.
In Create Record Type, configure the following properties:
Property | Value |
---|---|
Name | AL Invoice |
Display Name (Plural) | AL Invoices |
Description | A record type to store data on invoices sent to Acme Logistics. |
On the Create Data Model page, keep the default settings for the following fields:
Field | Type |
---|---|
id | Number (Integer) |
createdBy | User |
createdOn | Date and Time |
modifiedBy | User |
modifiedOn | Date and Time |
Click NEW FIELD to create seven new fields in the data model:
Field | Type |
---|---|
invoiceNumber | Number (Integer) |
name | Text |
email | Text |
phone | Text |
address | Text |
date | Text |
total | Text |
Tip: Notice that these fields match the top-level fields in the AI skill you created earlier.
To extract and save table data from your document, you need to create a separate record type to represent the table. Afterward, you'll create a relationship that references this new record type from the record type representing the document.
To create a record type for the table:
In Create Record Type, configure the following properties:
Property | Value |
---|---|
Name | AL Invoice Item |
Display Name (Plural) | Invoice Items |
Description | A record type to store table data on invoices sent to Acme Logistics. |
On the Create Data Model page, keep the default settings for the following fields:
Field | Type |
---|---|
id | Number (Integer) |
createdBy | User |
createdOn | Date and Time |
modifiedBy | User |
modifiedOn | Date and Time |
Click NEW FIELD to create five new fields in the data model:
Field | Type |
---|---|
invoiceId | Number (Integer) |
quantity | Text |
description | Text |
unitPrice | Text |
amount | Text |
Tip: Notice that these fields match the table fields in the AI skill you created earlier. invoiceId is a separate field from the id field generated automatically by the record type wizard.
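As a rough analogy, the two data models can be written as Python dataclasses. Python types stand in for Appian types here (str for Text and User, int for Number (Integer), datetime for Date and Time). The class names ALInvoice and ALInvoiceItem are hypothetical. The main point is that invoiceId and id are separate fields.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ALInvoice:
    # Fields you added; they mirror the AI skill's top-level fields.
    invoiceNumber: Optional[int]
    name: str
    email: str
    phone: str
    address: str
    date: str
    total: str
    # Fields generated by the record type wizard (left as None in this sketch).
    id: Optional[int] = None
    createdBy: Optional[str] = None  # User modeled as a username string (assumption)
    createdOn: Optional[datetime] = None
    modifiedBy: Optional[str] = None
    modifiedOn: Optional[datetime] = None


@dataclass
class ALInvoiceItem:
    # Fields you added; they mirror the columns of the items table in the AI skill.
    quantity: str
    description: str
    unitPrice: str
    amount: str
    # Points to the parent invoice's id; distinct from this item's own id.
    invoiceId: Optional[int] = None
    # Fields generated by the record type wizard (left as None in this sketch).
    id: Optional[int] = None
    createdBy: Optional[str] = None
    createdOn: Optional[datetime] = None
    modifiedBy: Optional[str] = None
    modifiedOn: Optional[datetime] = None


item = ALInvoiceItem("2", "Pallet wrap", "15.00", "30.00", invoiceId=1)
print(item.invoiceId, item.id)  # 1 None
```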
Now that you've set up a record type for both the invoice and the table, you'll need to add record type relationships to associate them. For document extraction data to write to both records, you'll need to set up relationships in both record types.
Open the AL Invoice record type and add a relationship to the AL Invoice Item record type. Name the relationship items. Note that this matches the table field you added in the AI skill document structure.
For Common Fields, select the following:
Record Type | Field |
---|---|
AL Invoice | id - Number (Integer) |
AL Invoice Item | invoiceId - Number (Integer) |
Repeat the process in the AL Invoice Item record type:
Open the AL Invoice Item record type and add a relationship to the AL Invoice record type.
For Common Fields, select the following:
Record Type | Field |
---|---|
AL Invoice Item | invoiceId - Number (Integer) |
AL Invoice | id - Number (Integer) |
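The common fields define a one-to-many link. One AL Invoice can have many AL Invoice Item records whose invoiceId equals the invoice's id, and each item points back to a single invoice. The sketch below uses plain Python dictionaries in place of records. It illustrates the join in both directions and does not show how Appian stores or queries related records. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def items_by_invoice(invoices: list[dict], items: list[dict]) -> dict[int, list[dict]]:
    """One-to-many: for each AL Invoice id, collect items whose invoiceId matches."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["invoiceId"]].append(item)
    # .get() avoids inserting keys; invoices with no items map to an empty list.
    return {inv["id"]: grouped.get(inv["id"], []) for inv in invoices}


def invoice_for_item(item: dict, invoices: list[dict]) -> Optional[dict]:
    """Many-to-one: find the AL Invoice whose id equals the item's invoiceId."""
    return next((inv for inv in invoices if inv["id"] == item["invoiceId"]), None)


invoices = [{"id": 1, "invoiceNumber": 10482}, {"id": 2, "invoiceNumber": 10483}]
items = [
    {"id": 7, "invoiceId": 1, "description": "Pallet wrap"},
    {"id": 8, "invoiceId": 1, "description": "Shipping labels"},
]
print({k: [i["id"] for i in v] for k, v in items_by_invoice(invoices, items).items()})  # {1: [7, 8], 2: []}
print(invoice_for_item(items[0], invoices)["invoiceNumber"])  # 10482
```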
To keep things organized as users upload invoices, create a folder in your application to store the document files.
Click NEW > Folder.
Property | Value |
---|---|
Type | Document Folder |
Name | AL Uploaded Documents |
Description | Folder containing documents submitted via Acme's website. |
Parent Folder | AL Knowledge Center |
To reference the folder in your interface, you'll need to create a constant.
Click NEW > Constant.
Property | Value |
---|---|
Name | AL_UPLOADED_DOCUMENTS |
Description | Constant referencing the AL Uploaded Documents folder. |
Type | Folder |
Value | AL Uploaded Documents |
Acme's customers submit a form with invoices attached. You can create an interface to collect and save all of the necessary information, including documents.
Configure the following fields:
Property | Value |
---|---|
Name | AL_IntakeForm |
Description (Optional) | Interface to allow vendors to upload documents. |
Save In | Select the Rules & Interfaces folder in your application. |
In the Rule Inputs pane, click New Rule Input and configure the following parameters:
Property | Value |
---|---|
Name | document |
Type | Document (Appian data type) |
In the COMPONENT CONFIGURATION, configure the following:
Parameter | Value |
---|---|
Label Position | Hidden |
Display Value | Click Edit as Expression and enter: "Thank you for contacting Acme! Upload your document and we'll be in touch." |
Read-only | Selected |
In the COMPONENT CONFIGURATION, configure the following:
Parameter | Value |
---|---|
Target Folder | AL_UPLOADED_DOCUMENTS |
Selected Files | ri!document |
Save Files To | ri!document |
Later, you'll add a node in the process model for the analyst to reconcile the extracted data. To assign that task, you'll first need to create a constant referencing the analyst.
In Create Constant, configure the following properties:
Property | Value |
---|---|
Create from scratch | Leave selected |
Name | AL_ANALYST |
Description | Constant pointing to the analyst at Acme Logistics. |
Type | User |
Value | Select your username. |
With your record types and AI skill in place, you can now start building your end-to-end process.
The following instructions walk you through how to configure your process model and the three key nodes of a document extraction process.
As you build your process, you have the flexibility to incorporate other design objects and decisions that fit your specific business needs. See some additional process configuration options you can add to your own process model.
To easily pass data throughout your process, you'll want to create process variables that represent your document, extraction ID, and extracted data:
Configure the following properties:
Property | Value |
---|---|
Name | AL Invoice Extraction |
Description | Process to extract invoice data from Acme vendors. |
Create the following process variables:
Name | Type | Value | Parameter? | Required? | Multiple? |
---|---|---|---|---|---|
cancel | Boolean | Blank | Yes | No | No |
document | Document | Blank | Yes | No | Yes |
docExtractionId | Text | Blank | No | No | No |
record | AL Invoice (Record Type) | Blank | No | No | No |
The process kicks off when a user submits the start form. Configure the Start node to use the form you created: type AL, then select AL_IntakeForm when it displays in the dropdown list.
After defining your process variables, the first node to add to the process is the Extract from Document smart service. This smart service takes a document as input, extracts data using a machine learning model, and returns the extracted data as output.
To configure the smart service:
Select the AL_ExtractInvoice skill you created earlier.
On the Inputs tab, configure the inputs with the following values:
Input | Value |
---|---|
Document | pv!document |
Confidence Threshold | 80 |
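The Confidence Threshold of 80 controls which extracted values are trusted. The Python sketch below makes two assumptions. First, each value comes with a confidence score from 0 to 100. Second, a value scoring below the threshold is left blank for the reviewer. apply_threshold is a hypothetical helper. Check the Extract from Document documentation for the exact behavior, including whether a score equal to the threshold passes.

```python
from typing import Optional


def apply_threshold(
    extracted: dict[str, tuple[str, float]], threshold: float = 80
) -> dict[str, Optional[str]]:
    """Keep values whose confidence score meets the threshold; blank out the rest.

    extracted maps a field name to (value, confidence score on a 0-100 scale).
    Treating score == threshold as passing is an assumption.
    """
    return {
        field: value if score >= threshold else None
        for field, (value, score) in extracted.items()
    }


extracted = {"invoiceNumber": ("10482", 97.5), "total": ("1,240.00", 64.0)}
print(apply_threshold(extracted))  # {'invoiceNumber': '10482', 'total': None}
```

A higher threshold leaves more fields for the reviewer to fill in. A lower threshold accepts more values automatically, at the risk of passing along mistakes.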
On the Outputs tab, configure the outputs with the following values:
Output | Value |
---|---|
Doc Extraction Id | Choose the docExtractionId process variable. |
Extracted Data | Choose the record process variable. |
Confidence Scores | Leave blank. |
The next node you'll configure is the Reconcile Doc Extraction smart service. This smart service assigns a reconciliation task to a user, who confirms or corrects the extracted results.
To configure the smart service:
On the Inputs tab, configure the default inputs with the following values:
Input | Value |
---|---|
Doc Extraction Id | Choose the docExtractionId process variable. |
On the Outputs tab, configure the outputs with the following values:
Output | Value |
---|---|
Reconciled Data | Choose the record process variable. |
Assign the task to cons!AL_ANALYST.
Finally, you'll add another node to write records for the reconciled data.
Add a Write Records node and choose the record process variable as the records to write. AL Invoice will be automatically selected.
For the output, choose record for the Target.
It's a best practice to include a pathway in your process model in case the user clicks Cancel on the start form.
To add a cancel flow:
Add a gateway and enter Cancel? in the Name field.
That's it! Your process is set up to extract data. It may contain additional nodes based on how you customized it.
After creating your process model, run it with a few samples to test the extraction and to see how your auto-extracted results change.
To test the document extraction process created above:
As you test, Appian will use the field names from the data type to find a match. Over time, Appian learns how to map your data to your data type fields from the user interactions with the reconcile interface.
Appian will delete document extraction runs after 30 days, or when the total amount of data surpasses 10 GB. If you attempt to access a run that has been deleted, you will see an error. Appian will not delete the documents you uploaded. Learn more about your document's security.
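The retention rule can be modeled as a small Python function. This is a hypothetical illustration, not Appian code, and the ExtractionRun type is invented. Two details are assumptions: that the oldest runs are removed first once the 10 GB limit is reached, and that 1 GB means 1024^3 bytes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)
MAX_TOTAL_BYTES = 10 * 1024**3  # 10 GB; treating GB as 1024**3 bytes is an assumption


@dataclass
class ExtractionRun:
    run_id: str
    created: datetime
    size_bytes: int


def runs_to_delete(runs: list[ExtractionRun], now: datetime) -> list[str]:
    """Return IDs of extraction runs removed under the 30-day / 10 GB rule.

    Only extraction runs are modeled; the uploaded documents are never deleted.
    """
    # Rule 1: runs older than 30 days are removed.
    expired = [r for r in runs if now - r.created > MAX_AGE]
    kept = sorted((r for r in runs if now - r.created <= MAX_AGE), key=lambda r: r.created)
    # Rule 2 (assumed order): remove the oldest remaining runs until the total fits.
    total = sum(r.size_bytes for r in kept)
    evicted = []
    while kept and total > MAX_TOTAL_BYTES:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
        evicted.append(oldest)
    return [r.run_id for r in expired + evicted]


now = datetime(2024, 6, 30)
runs = [
    ExtractionRun("a", datetime(2024, 5, 1), 1_000),          # 60 days old
    ExtractionRun("b", datetime(2024, 6, 1), 6 * 1024**3),    # 29 days old, 6 GB
    ExtractionRun("c", datetime(2024, 6, 20), 6 * 1024**3),   # 10 days old, 6 GB
]
print(runs_to_delete(runs, now))  # ['a', 'b']
```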
The process model detailed above provides the basic nodes needed to create an extraction process, but you aren't bound to this model. In fact, the major benefit of creating your own document extraction process is the flexibility to add additional rules or decisions that are specific to your business needs.
There are a few ways you can enhance or modify this process, for example:
isSubmit is true when the user selects the Submit button on the reconciliation task. Add logic after the Reconcile Doc Extraction smart service that uses isSubmit=true() to trigger an email notification or a confirmation dialog.
isException is true when the user selects the Mark as Invalid button on the reconciliation task. Add logic after the Reconcile Doc Extraction smart service that uses isException=true() to route to a chained user input task, where the user provides more information.
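The branching for the cancel pathway and the reconciliation outcomes reduces to two simple decisions. The Python below only mirrors that logic for clarity. In Appian, you would express it as gateway conditions on pv!cancel, isSubmit, and isException. The function names and route labels are hypothetical.

```python
def cancel_gateway(cancel: bool) -> str:
    """Cancel? gateway after the start form: skip extraction if the user cancelled."""
    return "end" if cancel else "extract"


def after_reconcile(is_submit: bool, is_exception: bool) -> str:
    """Route on the reconciliation outcome."""
    if is_submit:
        return "notify"     # e.g. send an email notification or show a confirmation
    if is_exception:
        return "follow_up"  # e.g. a chained user input task for more information
    return "end"


print(cancel_gateway(False), after_reconcile(False, True))  # extract follow_up
```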