A guided, low-code experience for document extraction is here! AI skills are a faster and simpler way to classify documents and extract data from them.
B
Built-in services
Appian's own document extraction capabilities. You can extract data from documents and keep all of this information within your Appian Cloud instance by using built-in services.
C
Channel
See Document channel.
Checkbox
A user interface component that allows a user to make a binary choice, often "yes" or "no."
In the context of document extraction, this information is extracted as a label and the selection is saved as a Boolean (true for checked and false for unchecked).
Classification
In the Intelligent Document Processing application, classification is the optional process of identifying a document's type based on certain traits, and assigning it to a group accordingly. IDP document channels are able to intelligently identify the document's type if the classification model is configured and trained to do so. Users may also be asked to complete a manual classification task if the model isn't confident in its ability to classify it automatically.
Confidence threshold
A preset value that determines when IDP cannot classify a document automatically. If the auto-classification confidence is below the threshold (for example, 90%), IDP generates a task for a human to classify the document manually.
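The threshold check described above can be sketched in a few lines. This is an illustration only, not Appian's implementation: the `ClassificationResult` type, the `route_document` function, and the returned route names are hypothetical, and confidence is assumed to be expressed as a number between 0.0 and 1.0.

```python
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    """Hypothetical result of an automatic classification attempt."""
    document_type: str
    confidence: float  # assumed scale: 0.0 to 1.0


def route_document(result: ClassificationResult, threshold: float = 0.90) -> str:
    """Route to a manual task only when confidence is below the threshold."""
    if result.confidence < threshold:
        return "manual_classification_task"
    return "auto_classified"
```

With a threshold of 0.90, a result with a confidence of 0.72 would go to a manual task, while a result at or above 0.90 would be classified automatically.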
D
Document channel
A document channel is a grouping of document types that have their own configurations and security settings for the purposes of processing. This allows various teams using IDP to control documents and data that may contain sensitive information.
Document structure
In the context of document extraction, structure describes how the content in a document is organized. Appian's document extraction features are more effective on certain structures than others. Document structures include: structured, semi-structured, and unstructured.
Document type
A category of document you use in your business operations. For example, a purchase order or invoice.
Not to be confused with the file type or extension, such as .pdf or .xlsx.
Document extraction
The process of identifying data relationships in a PDF document and digitally representing that information. Appian may use built-in capabilities to extract this information or, if the user prefers, Google's AI services are available as well. The process of extraction can also include a user reconciliation task. After reconciliation, Appian will store and recall the mapping of the extracted key to an Appian field.
E
Extraction
See Document extraction.
F
Field
A single piece of data to be extracted from a document and mapped to a CDT.
Fillable PDF
Similar to a searchable PDF, this file allows the user to input and save additional information into form fields.
Flattened PDF
A PDF with no text data associated with the file. It doesn't contain digital text or form fields. Often, these types of PDFs are created from paper documents that have been scanned.
K
Key
See Label.
Key-value pair
A match between two data elements (a label and value) that are extracted from a document.
L
Label
The extracted constant that defines a part of a data set. This information isn't changed based on the user's selection or input. It is matched with the value to create a key-value pair in the extracted data. For example, "Name" is a label, and "John Smith" is a value.
M
Mapping
The act of matching data extracted from a field in a document to a field in a CDT.
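As a rough sketch of what mapping involves, the example below applies a label-to-field lookup to extracted key-value pairs. It is illustrative only: the `map_extracted_data` function, the plain dictionary standing in for a CDT, and the sample label and field names are all assumptions, not part of Appian's API.

```python
from typing import Dict, List, Tuple


def map_extracted_data(
    pairs: List[Tuple[str, str]],
    label_to_field: Dict[str, str],
) -> Tuple[Dict[str, str], List[str]]:
    """Match extracted (label, value) pairs to data type fields.

    Returns the mapped field values and the labels that had no known field.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for label, value in pairs:
        field = label_to_field.get(label)
        if field is None:
            unmapped.append(label)  # no mapping known for this label
        else:
            mapped[field] = value
    return mapped, unmapped


# Hypothetical extracted pairs and mapping for an invoice.
pairs = [("Name", "John Smith"), ("Invoice No.", "1042"), ("Notes", "Rush")]
label_to_field = {"Name": "customerName", "Invoice No.": "invoiceNumber"}
mapped, unmapped = map_extracted_data(pairs, label_to_field)
```

In this sketch, labels without a known field are returned separately so that a person could review them.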
O
Optical character recognition (OCR)
OCR software recognizes text within a digital image. This technology is well-suited for unstructured documents, but it is less accurate and requires more maintenance than purpose-built document extraction models.
P
Positional extraction
The ability to extract data based on where it appears in a document. Appian can use positional extraction if it has previously processed the documents and learned from the results.
R
Reconciliation
The manual task of confirming or updating data Appian extracted from a document. Functionally, users compare the data that was extracted to an image of the uploaded document. When reconciliation occurs, Appian learns how to map the data in documents to the proper fields in the corresponding data type. Over time, this will make auto-extraction more accurate and reconciliation easier and less frequent.
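The idea that reconciliation becomes less frequent over time can be illustrated with a toy model in which confirmed label-to-field mappings are remembered and reused. This is a simplified sketch under stated assumptions, not a description of how Appian learns: `record_reconciliation`, `labels_needing_review`, and the sample labels and field names are hypothetical.

```python
from typing import Dict, List


def record_reconciliation(
    learned: Dict[str, str], confirmed: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of the learned mappings updated with user-confirmed ones."""
    updated = dict(learned)
    updated.update(confirmed)  # the most recent confirmation wins
    return updated


def labels_needing_review(labels: List[str], learned: Dict[str, str]) -> List[str]:
    """List the labels that have no learned mapping yet."""
    return [label for label in labels if label not in learned]


labels = ["Invoice No.", "Total"]
learned: Dict[str, str] = {}
before = labels_needing_review(labels, learned)

# A user confirms one mapping during a reconciliation task.
learned = record_reconciliation(learned, {"Invoice No.": "invoiceNumber"})
after = labels_needing_review(labels, learned)
```

Here the first document requires review of both labels, while a later document with the same labels requires review of only one.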
S
Searchable PDF
A PDF that contains digital text data that can be highlighted, copied, searched, and accessed programmatically. This type of PDF has undergone previous processing or was saved from a word processor.
Semi-structured document
Documents that include similar pieces of information, but in varying layouts. Invoices, receipts, and utility bills are good examples of documents with semi-structured data. Appian's document extraction features are well equipped to identify and extract semi-structured data. Through AI and machine learning, the services improve as you process additional documents.
Straight-through processing
Processing in which the extracted data is 100% accurate, which eliminates the need for a reconciliation task.
Structured document
A document containing information that is arranged in a fixed layout. Tax forms, passport applications, and hospital forms are good examples of documents with structured data. Appian can extract data from these types of documents easily due to the predictable and consistent positions of labels and values.
T
Table
Information displayed in a grid-like format, often using columns and rows to show similar information in a predictable way.
In document extraction, a table is a subset of the overall document data and requires additional configuration to extract and store the data properly.
Train
The process of improving the ability of IDP to extract correct information. This is achieved by providing example documents for IDP to process and manually confirming or fixing the extracted data in a reconciliation task. When a user provides this feedback through correction or confirmation, the model that extracted the data learns and improves in the future.
Type
See Document type.
U
Unstructured document
Documents that include free-flowing paragraphs of text. Legal contracts and emails often include unstructured data. This type of information is more difficult to extract because the machine learning algorithms that identify the information are looking for key-value pairs. Larger blocks of text, or parts of that text, are more difficult to extract.
V
Value
The extracted variable that defines a part of a data set. This information is changed based on the user's selection or input. It is matched with the label to create a key-value pair in the extracted data. For example, "Name" is a label, and "John Smith" is a value.
Vendor
The company that provides document extraction services. Customers can choose to use either Appian or Google for document extraction, based on their preferences and use cases.