Build a Doc Extraction Process with AI Skill

Estimated time to complete this tutorial: 1 hour

User experience level: Beginner

Overview

In this tutorial, you'll build an Appian process that:

  1. takes a document as input,
  2. uses an AI skill to extract data from that document,
  3. sends a task to someone to reconcile the extracted data, and
  4. saves the reconciled data as a new record.

This process model relies on a series of nodes that leverage artificial intelligence (AI) to map fields from a document to fields in a record type. Once the data is extracted, users can confirm or correct the results using a simple task interface. As you test your process model and reconcile results, the extraction will become smarter and more accurate over time.

This page will walk you through how to create your own document extraction process in three parts:

  1. Create a document extraction AI skill and define your document structure.
  2. Create record types and other design objects to map and save your extraction results.
  3. Configure a basic document extraction process model.

Tip:  Check out the Document Classification Tutorial to build a process that classifies documents in addition to extraction.

Scenario

Acme Logistics is a shipping and receiving company that manages inventory for its customers. In addition to physical items, Acme has to manage and act on documents such as invoices. Acme wants to create an Appian process to extract data from invoices, which customers and vendors submit through Acme's website. Acme also wants to save this data as records.

invoice_example_extraction.png

Before you build the process, you'll build all of the supporting design objects, starting with the AI skill.

Setup

This tutorial assumes you have already created an Appian application. We'll walk you through creating each of the design objects you need to automate document extraction.

Tip:  Objects in this tutorial use the AL prefix. If you're creating objects in an application that uses a different prefix, use your application's prefix in new object names.

Gather example documents

Before the AI skill can serve its purpose, it needs to learn a lot about the documents your business encounters. One of your first steps should be to build a complete and representative dataset to train the model. The model can only learn from the documents you provide to it, so it's important to have a large number and variety of realistic examples.

We've provided sample invoices for you to use in this tutorial. Download these files to your computer, since you'll use them to set up and train the AI skill. Unzip the compressed folders, because you'll need to upload the documents individually and not as a ZIP file.

System requirements

This tutorial is designed to be used with Appian 23.2 and later.

Part 1: Create AI skill

The Document Extraction AI Skill takes a document as input and uses machine learning to extract data from that document.

To create the AI Skill:

  1. In your application, go to the Build view.
  2. In the application toolbar, click NEW > AI Skill.
  3. On the Create AI Skill page, in the Extraction section, choose Document.
  4. Choose Semi or Highly Structured.
  5. Configure the following properties:

    Property               | Value
    Name                   | AL_ExtractInvoice
    Description (Optional) | AI skill to extract data from invoices Acme receives
  6. Click CREATE. The Review AI Skill Security window displays.
  7. Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
  8. Click SAVE.

Create a model and add documents

To get started, you'll create a model and add examples of a typical invoice.

  1. In your new AI skill, click CREATE FIRST MODEL.
  2. In Step 1: Provide Documents, choose Semi-structured for the Document Layout.
  3. Click UPLOAD to add training documents.
  4. Browse for and select invoices.
  5. After the documents finish uploading, click NEXT.

Define the invoice structure

  1. In Step 2: Add Fields to Extract, enter the names of the fields that appear on most invoices. Start with name.
  2. Click Add Field six more times, so you have a total of seven fields. Use the default data type of Text for all seven fields.
  3. Enter the following field names:
    • email
    • phone
    • address
    • invoiceNumber
    • date
    • total
  4. Add another field. Name this field items and select type Table.
  5. In the nested table fields that appear below items, add four fields with the default data type of Text.
  6. Enter the following four field names:
    • quantity
    • description
    • unitPrice
    • amount
  7. Click NEXT.

Label field data

Once you define your document structure, you'll indicate where those fields appear in the sample documents. This process is called labeling and it helps the model learn more about where these fields appear in your documents.

  1. In the document preview, click and drag your mouse around the company name to create a selection box.
  2. In the dropdown that appears, click the name field.
  3. Repeat steps 1 and 2 until you've labeled values for all fields in your invoice structure. You won't label the fields in the items table, but that information will still be extracted.

    Tip:  Regularly click SAVE CHANGES to save your progress.

  4. Above the document preview, click NEXT to view the other sample documents and label additional fields.
  5. Click REVIEW to see a summary of how many labels appear for each field.

Review labeled fields and train the model

The final step of creating a document extraction AI skill asks you to review the fields you've labeled in the sample documents. The more fields you label, the more the model can learn about your fields. This is what makes the model smarter and better at extracting data of interest.

Each field will need labels in at least half of the documents you uploaded. If you haven't labeled enough fields in the set of documents you uploaded, you'll see a message encouraging you to add more files and fields.
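
The half-of-documents rule can be checked mechanically. The following is a minimal Python sketch, not Appian code: the function name `fields_needing_labels` and the sample data are hypothetical, and it assumes "at least half" means the label count times two is at least the number of uploaded documents.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def fields_needing_labels(
    fields: Iterable[str], labeled_docs: List[Set[str]]
) -> Dict[str, int]:
    """Return the fields labeled in fewer than half of the documents.

    labeled_docs holds, for each uploaded document, the set of field
    names that were labeled in it. The result maps each under-labeled
    field to its current label count.
    """
    counts = Counter(name for doc in labeled_docs for name in doc)
    # Assumption: "at least half" means count * 2 >= number of documents.
    return {
        name: counts[name]
        for name in fields
        if counts[name] * 2 < len(labeled_docs)
    }


# Hypothetical labeling progress across four sample invoices.
docs = [
    {"name", "email", "total"},
    {"name", "total"},
    {"name", "email"},
    {"name"},
]
print(fields_needing_labels(["name", "email", "phone", "total"], docs))
```

In this example only `phone` falls short, so that is the field to label in more documents before training.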

After your review, click TRAIN MODEL. Training may take a few minutes. While you wait, proceed to create the additional design objects in your process.

Part 2: Create additional design objects

When extracting data, Appian will identify key-value pairs from the document and map them to the fields of your desired data object (a CDT or record type). Your data object should be constructed to reflect the data available in your document. Therefore, it's important that your fields match the data that will be extracted from your document.

If your document contains field names and a table, like an invoice document that contains a table of items, you'll ultimately create two data objects: one that represents the document, and one that represents the table.

In this step, you're creating two record types to store the extracted data: one record type for the invoice data, and one record type for the data contained in the items tables commonly found in invoices Acme receives.
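
To make the two-object shape concrete, here is a rough Python model of it. This is an illustration only, not how Appian stores or maps data: the `Invoice` and `InvoiceItem` classes and the `map_extraction` helper are hypothetical, and the sketch assumes extracted values arrive as a dictionary keyed by the field names defined in the AI skill.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class InvoiceItem:
    """One row of the items table."""
    quantity: str = ""
    description: str = ""
    unitPrice: str = ""
    amount: str = ""


@dataclass
class Invoice:
    """Top-level invoice fields plus the nested table rows."""
    name: str = ""
    email: str = ""
    phone: str = ""
    address: str = ""
    invoiceNumber: str = ""
    date: str = ""
    total: str = ""
    items: List[InvoiceItem] = field(default_factory=list)


def _known(cls: type, data: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keys that match a field name on the target class."""
    names = {f.name for f in fields(cls)}
    return {key: value for key, value in data.items() if key in names}


def map_extraction(extracted: Dict[str, Any]) -> Invoice:
    """Copy extracted values into matching fields and drop unknown keys."""
    top_level = _known(Invoice, extracted)
    rows = top_level.pop("items", [])
    return Invoice(
        **top_level,
        items=[InvoiceItem(**_known(InvoiceItem, row)) for row in rows],
    )
```

A key with no matching field (for example a hypothetical `sku` column) is simply dropped in this sketch, which is why the field names in your data objects need to match the fields you expect to extract.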

Create a record type for a document

To create a record type for your document, you'll want to create all the form fields as fields in your record type.

  1. In your application, go to the Build view.
  2. Click NEW > Record Type.
  3. In Create Record Type, configure the following properties:

    Property              | Value
    Name                  | AL Invoice
    Display Name (Plural) | AL Invoices
    Description           | A record type to store data on invoices sent to Acme Logistics.
  4. Click CREATE. The Review Record Type Security window displays.
  5. Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
  6. Click SAVE. The record type opens in a new tab.
  7. Click TELL US ABOUT YOUR DATA.
  8. On the Configure Data Source page, click I want to start from scratch.
  9. Click NEXT.
  10. Choose your Data Source where you'd like to create the data.
  11. Click NEXT.
  12. On the Create Data Model page, keep the default settings for the following fields:

    Field Type
    id Number (Integer)
    createdBy User
    createdOn Date and Time
    modifiedBy User
    modifiedOn Date and Time
  13. Click NEXT twice to skip adding record type relationships for now. You'll do this in a later step. You can keep any suggested relationships, such as relationships with User records.
  14. Click NEW FIELD to create seven new fields in the data model:

    Field Type
    invoiceNumber Number (Integer)
    name Text
    email Text
    phone Text
    address Text
    date Text
    total Text

    Tip:  Notice that these fields match the top-level fields in the AI skill you created earlier.

  15. Keep the Create Table checkbox checked and click SAVE CHANGES.
  16. Click FINISH.

Create a record type for a table

To extract and save table data from your document, you need to create a separate record type to represent the table. Afterward, you'll create a relationship that references this new record type from the record type representing the document.

To create a record type for the table:

  1. In your application, go to the Build view.
  2. Click NEW > Record Type.
  3. In Create Record Type, configure the following properties:

    Property              | Value
    Name                  | AL Invoice Item
    Display Name (Plural) | Invoice Items
    Description           | A record type to store table data on invoices sent to Acme Logistics.
  4. Click CREATE. The Review Record Type Security window displays.
  5. Select Viewer permissions for the AL Users group and Administrator permissions for the AL Administrators group.
  6. Click SAVE. The record type opens in a new tab.
  7. Click TELL US ABOUT YOUR DATA.
  8. On the Configure Data Source page, click I want to start from scratch.
  9. Click NEXT.
  10. Choose your Data Source where you'd like to create the data.
  11. Click NEXT.
  12. On the Create Data Model page, keep the default settings for the following fields:

    Field Type
    id Number (Integer)
    createdBy User
    createdOn Date and Time
    modifiedBy User
    modifiedOn Date and Time
  13. Click NEXT twice to skip adding record type relationships for now.
  14. Click NEW FIELD to create five new fields in the data model:

    Field Type
    invoiceId Number (Integer)
    quantity Text
    description Text
    unitPrice Text
    amount Text

    Tip:  Notice that these fields match the table fields in the AI skill you created earlier. invoiceId is a separate field from the id field generated automatically by the record type wizard.

  15. Keep the Create Table checkbox checked and click SAVE CHANGES.
  16. Click FINISH.

Add record type relationships

Now that you've set up a record type for both the invoice and the table, you'll need to add record type relationships to associate them. For document extraction data to write to both records, you'll need to set up relationships in both record types.

  1. In your application, go to the Build view.
  2. In the list of design objects, open the AL Invoice record type.
  3. Click ADD RELATIONSHIP.
  4. In the Related Record Type field, select AL Invoice Item.
  5. Click NEXT.
  6. For the Relationship Name, enter items. Note that this matches the table field you added in the AI skill document structure.
  7. For the Relationship Type, choose One to Many.
  8. For Common Fields, select the following:

    Record Type     | Field
    AL Invoice      | id - Number (Integer)
    AL Invoice Item | invoiceId - Number (Integer)

  9. Click ADD.
  10. Click SAVE CHANGES.

Repeat the process in the AL Invoice Item record type:

  1. In your application, go to the Build view.
  2. In the list of design objects, open the AL Invoice Item record type.
  3. Click ADD RELATIONSHIP.
  4. In the Related Record Type field, select AL Invoice.
  5. Click NEXT.
  6. For the Relationship Type, choose Many to One.
  7. For Common Fields, select the following:

    Record Type     | Field
    AL Invoice Item | invoiceId - Number (Integer)
    AL Invoice      | id - Number (Integer)

  8. Click ADD.
  9. Click SAVE CHANGES.
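
The two relationships describe the same link seen from each side: one invoice has many items, and each item points back to one invoice through `invoiceId`. A small Python sketch of that idea follows. It is not Appian code, and the sample rows and helper names are hypothetical.

```python
from typing import List

# Hypothetical rows standing in for AL Invoice and AL Invoice Item records.
invoices = [
    {"id": 1, "invoiceNumber": 1001},
    {"id": 2, "invoiceNumber": 1002},
]
invoice_items = [
    {"id": 10, "invoiceId": 1, "description": "Pallet wrap"},
    {"id": 11, "invoiceId": 1, "description": "Shipping labels"},
    {"id": 12, "invoiceId": 2, "description": "Storage fee"},
]


def items_for_invoice(invoice_id: int, items: List[dict]) -> List[dict]:
    """One to Many: every item whose invoiceId equals the invoice's id."""
    return [item for item in items if item["invoiceId"] == invoice_id]


def invoice_for_item(item: dict, all_invoices: List[dict]) -> dict:
    """Many to One: the single invoice whose id equals the item's invoiceId."""
    by_id = {invoice["id"]: invoice for invoice in all_invoices}
    return by_id[item["invoiceId"]]


print(len(items_for_invoice(1, invoice_items)))
print(invoice_for_item(invoice_items[2], invoices)["invoiceNumber"])
```

Both lookups rely on the same pair of common fields, `id` on the invoice and `invoiceId` on the item, which is why you select that pair in both relationship dialogs.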

Create a document folder

To keep things organized as users upload invoices, create a folder in your application to store the document files.

  1. In your application, go to the Build view.
  2. Click NEW > Folder and configure the following properties:

    Property      | Value
    Type          | Document Folder
    Name          | AL Uploaded Documents
    Description   | Folder containing documents submitted via Acme's website.
    Parent Folder | AL Knowledge Center
  3. Click CREATE.

Create a constant for the document folder

To reference the folder in your interface, you'll need to create a constant.

  1. In your application, go to the Build view.
  2. Click NEW > Constant and configure the following properties:

    Property    | Value
    Name        | AL_UPLOADED_DOCUMENTS
    Description | Constant referencing the AL Uploaded Documents folder.
    Type        | Folder
    Value       | AL Uploaded Documents
  3. Click CREATE.

Add a start form

Acme's customers submit a form with invoices attached. You can create an interface to collect and save all of the necessary information, including documents.

  1. In your application, go to the Build view.
  2. Click NEW > Interface.
  3. Configure the following fields:

    Property               | Value
    Name                   | AL_IntakeForm
    Description (Optional) | Interface to allow vendors to upload documents.
    Save In                | Select the Rules & Interfaces folder in your application.
  4. Click CREATE.
  5. In the Rule Inputs pane, click New Rule Input and configure the following parameters:

    Property Value
    Name document
    Type Document (Appian data type)
  6. Click CREATE.
  7. In the templates panel, find the FORMS section.
  8. Click One Column Form.
  9. Double-click the title.
  10. Enter Submit to Acme Logistics.
  11. In the top Section Layout component in the editor, drag and drop a TEXT field from the component palette.
  12. Click the Text component.
  13. In the COMPONENT CONFIGURATION, configure the following:

    Parameter      | Value
    Label Position | Hidden
    Display Value  | Click Edit as Expression and enter: "Thank you for contacting Acme! Upload your document and we'll be in touch."
    Read-only      | Selected
  14. Click the bottom Section Layout.
  15. In the COMPONENT CONFIGURATION, delete the default text in the Label field.
  16. In the bottom Section Layout component in the editor, drag and drop a FILE UPLOAD field from the component palette.
  17. Click the File Upload.
  18. In the COMPONENT CONFIGURATION, configure the following:

    Parameter      | Value
    Target Folder  | AL_UPLOADED_DOCUMENTS
    Selected Files | ri!document
    Save Files To  | ri!document
  19. Click OK.
  20. Click SAVE CHANGES.

Create a constant for the analyst

Later, you'll add a node in the process model for the analyst to reconcile the extracted data. To assign that task, you'll first need to create a constant referencing the analyst.

  1. In your application, go to the Build view.
  2. Click NEW > Constant.
  3. In Create Constant, configure the following properties:

    Property            | Value
    Create from scratch | Leave selected
    Name                | AL_ANALYST
    Description         | Constant pointing to the analyst at Acme Logistics.
    Type                | User
    Value               | Select your username.
  4. Click CREATE.

Part 3: Build the document extraction process

With your record types and AI skill in place, you can now start building your end-to-end process.

The following instructions walk you through how to configure your process model and the three key nodes of a document extraction process.

As you build your process, you have the flexibility to incorporate other design objects and decisions that fit your specific business needs. See some additional process configuration options you can add to your own process model.

Create a process model

To easily pass data throughout your process, you'll want to create process variables that represent your document, extraction ID, and extracted data:

  1. In the application toolbar, click NEW > Process Model.
  2. Configure the following properties:

    Property Value
    Name AL Invoice Extraction
    Description Process to extract invoice data from Acme vendors.
  3. Click CREATE.
  4. Configure security and click SAVE.
  5. From the File menu, click Properties.
  6. In the Process Model Properties dialog, go to the Variables tab.
  7. Create the following process variables:

    Name            | Type                     | Value | Parameter? | Required? | Multiple?
    cancel          | Boolean                  | Blank | Yes        | No        | No
    document        | Document                 | Blank | Yes        | No        | Yes
    docExtractionId | Text                     | Blank | No         | No        | No
    record          | AL Invoice (Record Type) | Blank | No         | No        | No

  8. Click OK.

Configure the Start Form

The process kicks off when a user submits the start form. Configure the Start node to use the form you created:

  1. In the Appian Process Modeler page, click File > Properties in the menu bar. The process model properties window displays. By default, the General tab is active.
  2. Go to the Process Start Form tab.
  3. In the Interface text box, enter AL.
  4. Select AL_IntakeForm when it displays in the dropdown list.
  5. Click Yes to create process variables based on rule inputs from the submission form, even though you created all of the variables in the previous step.
  6. Click OK to return to the process model.
  7. In the menu bar, click File > Save & Publish.

Configure the Extract from Document Smart Service

After defining your process variables, the first node to add to the process is the Extract from Document smart service. This smart service takes a document as input, extracts data using a machine learning model, and returns the extracted data as output.

To configure the smart service:

  1. From the Palette, drag in an Extract from Document smart service.
  2. Open the Extract from Document smart service.
  3. Select the Setup tab.
  4. Under Select AI Skill, select the AL_ExtractInvoice skill you created earlier.
  5. Select the Data tab.
  6. On the Inputs tab, configure the inputs with the following values:

    Input                | Value
    Document             | pv!document
    Confidence Threshold | 80
  7. On the Outputs tab, configure the outputs with the following values:

    Output            | Value
    Doc Extraction Id | Choose the docExtractionId process variable.
    Extracted Data    | Choose the record process variable.
    Confidence Scores | Leave blank.
  8. Click OK.

Configure the Reconcile Doc Extraction Smart Service

The next node you will configure is the Reconcile Doc Extraction Smart Service. This smart service will assign a reconciliation task to a user to confirm or correct the extracted results.

To configure the smart service:

  1. Drag in a Reconcile Doc Extraction Smart Service node.
  2. Open the Reconcile Doc Extraction Smart Service.
  3. Select the Data tab.
  4. On the Inputs tab, configure the default inputs with the following values:

    Input             | Value
    Doc Extraction Id | Choose the docExtractionId process variable.
  5. On the Outputs tab, configure the outputs with the following values:

    Output          | Value
    Reconciled Data | Choose the record process variable.
  6. Select the Assignment tab.
  7. Open the expression editor next to the Assign to the following: field.
  8. In the Expression window, enter cons!AL_ANALYST.
  9. Click SAVE AND CLOSE.
  10. Click OK.

Update the invoice record with reconciled data

Finally, you'll add another node to write records for the reconciled data.

  1. In the Smart Service Search bar, search for Write Records. You can also find this in Automation Smart Services > Data Services.
  2. Click and drag from the search results to the flow connector between the Reconcile Doc Extraction and End nodes.
  3. Double-click the Write Records node.
  4. Go to the Setup tab.
  5. In the Record Input field, select the record process variable.
  6. In the Record Type field, AL Invoice will be automatically selected.
  7. Go to the Data tab.
  8. Click the Outputs tab.
  9. Click the Records Updated output.
  10. In the Result Properties pane, select record for the Target.
  11. Click OK.
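
It can help to see how the process variables carry data from one node to the next. The sketch below is a plain Python model of that flow, not Appian code: the three functions are hypothetical stand-ins for the Extract from Document, Reconcile Doc Extraction, and Write Records nodes, and the extracted values and run ID are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProcessVariables:
    """Mirrors the process variables created for AL Invoice Extraction."""
    cancel: Optional[bool] = None
    document: List[str] = field(default_factory=list)  # file names stand in for documents
    docExtractionId: Optional[str] = None
    record: Dict[str, str] = field(default_factory=dict)


def extract_from_document(pv: ProcessVariables) -> None:
    """Stand-in for extraction: fills docExtractionId and record."""
    pv.docExtractionId = "run-001"  # made-up ID
    pv.record = {"name": "Acme Logistcs", "total": "120.00"}  # typo on purpose


def reconcile_doc_extraction(pv: ProcessVariables, corrections: Dict[str, str]) -> None:
    """Stand-in for the reconciliation task: the analyst's fixes overwrite record."""
    pv.record = {**pv.record, **corrections}


def write_records(pv: ProcessVariables, table: List[Dict[str, str]]) -> None:
    """Stand-in for Write Records: saves the reconciled record."""
    table.append(dict(pv.record))


pv = ProcessVariables(document=["invoice_01.pdf"])
saved_invoices: List[Dict[str, str]] = []
extract_from_document(pv)
reconcile_doc_extraction(pv, {"name": "Acme Logistics"})
write_records(pv, saved_invoices)
print(saved_invoices)
```

The same `record` variable is reused at every step, so whatever the analyst corrects during reconciliation is what ends up saved.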

Add a cancel flow

It's a best practice to include a pathway in your process model in case the user clicks Cancel on the start form.

To add a cancel flow:

  1. In the Smart Service palette, locate the XOR gateway.
  2. Click and drag the XOR gateway to the flow connector between the Start Node and Extract from Document.
  3. Add a connector from the XOR gateway to the End Node.
  4. Double-click the XOR gateway to configure it.
  5. On the General tab, enter Cancel? in the Name field.
  6. Go to the Decision tab.
  7. Click NEW CONDITION.
  8. Click Open the Expression Editor next to the Condition field.
  9. Expand the list of Process Variables.
  10. Click cancel.
  11. Click SAVE AND CLOSE.
  12. In the Result field, select End Node.
  13. Click OK.
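
The gateway's decision logic can be pictured as a short function. This Python sketch is only a model of the routing, not Appian code: `xor_gateway` and `cancel_conditions` are hypothetical names, and the sketch assumes the gateway follows the first condition that evaluates to true and otherwise takes its default path.

```python
from typing import Callable, Dict, List, Tuple

# Each condition pairs a test on the process variables with a target node.
Condition = Tuple[Callable[[Dict[str, object]], bool], str]


def xor_gateway(pv: Dict[str, object], conditions: List[Condition], otherwise: str) -> str:
    """Return the first target whose condition is true, else the default path."""
    for is_true, target in conditions:
        if is_true(pv):
            return target
    return otherwise


# The single condition configured above: if cancel is true, go to the End Node.
cancel_conditions: List[Condition] = [
    (lambda pv: pv.get("cancel") is True, "End Node"),
]

print(xor_gateway({"cancel": True}, cancel_conditions, "Extract from Document"))
print(xor_gateway({"cancel": None}, cancel_conditions, "Extract from Document"))
```

When the form is submitted normally, `cancel` is blank or false, so the process continues to Extract from Document.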

That's it! Your process is set up to extract data. It should look like this, but it may contain additional nodes based on how you customized it:

doc_extraction_tutorial_process.png

Test your process

After creating your process model, run it with a few samples to test the extraction and to see how your auto-extracted results change.

To test the document extraction process created above:

  1. Go to your process model.
  2. From the File menu, click Start Process for Debugging.
  3. Use the form to upload an invoice document.
  4. After the Extract from Document node completes, you should receive a task to reconcile the extracted data.
  5. Return to the process model monitoring view to observe the results.

doc-extraction-success.png

As you test, Appian will use the field names from the data type to find a match. Over time, Appian learns how to map your data to your data type fields from the user interactions with the reconcile interface.

Appian will delete document extraction runs after 30 days, or when the total amount of data surpasses 10 GB. If you attempt to access a run that has been deleted, you will see an error. Appian will not delete the documents you uploaded. Learn more about your document's security.

Additional process configuration options

The process model detailed above provides the basic nodes needed to create an extraction process, but you aren't bound to this model. In fact, the major benefit of creating your own document extraction process is the flexibility to add additional rules or decisions that are specific to your business needs.

There are a few ways you can enhance or modify this process, for example:

  • Dynamically skip reconciliation: After the Extract from Document smart service, you can use a Script Task to evaluate the extracted data. For example, you may want to validate that all fields were extracted, or that the extracted data meets your business validations. If your validations are met, you can use an XOR node to skip the Reconcile Doc Extraction smart service and write the extracted data directly to your record type, without any human review.
  • Use the two optional outputs in the Reconcile Doc Extraction Smart Service to route the process model after reconciliation:
    • isSubmit is true when the user selects the Submit button on the reconciliation task. Add logic after this smart service to use isSubmit=true() to trigger an email notification or a confirmation dialog.
    • isException is true when the user selects the Mark as Invalid button on the reconciliation task. Add logic after this smart service to use isException=true() to route to a chained user input task, where the user provides more information.
  • Add conditionality to the Reconcile Doc Extraction Smart Service to determine who should be assigned the reconciliation task based on certain business criteria or rules.
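
The routing ideas in this list can be sketched as two small decisions. The Python below is an illustration, not Appian code. It assumes the only validation is that every top-level field has a value, and the node names "Request More Information" and "Send Confirmation" are hypothetical placeholders for whatever you add to your own process.

```python
from typing import Dict

# Top-level fields defined in the AI skill for this tutorial.
REQUIRED_FIELDS = ("name", "email", "phone", "address", "invoiceNumber", "date", "total")


def all_fields_extracted(record: Dict[str, object]) -> bool:
    """True when every required field has a non-blank value."""
    return all(str(record.get(name) or "").strip() for name in REQUIRED_FIELDS)


def route_after_extraction(record: Dict[str, object]) -> str:
    """Skip the reconciliation task only when the validation passes."""
    if all_fields_extracted(record):
        return "Write Records"
    return "Reconcile Doc Extraction"


def route_after_reconciliation(is_submit: bool, is_exception: bool) -> str:
    """Branch on the two optional outputs of the reconciliation node."""
    if is_exception:
        return "Request More Information"  # hypothetical chained user input task
    if is_submit:
        return "Send Confirmation"  # hypothetical notification node
    return "End Node"


complete = {name: "value" for name in REQUIRED_FIELDS}
print(route_after_extraction(complete))
print(route_after_extraction({**complete, "email": ""}))
```

In a real process you would likely apply stricter business validations than a blank check before skipping human review.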
