Prepare Data [Process Mining v5.3]


This content applies solely to Process Mining, which must be purchased separately from the Appian base platform.

Before you begin

As you prepare to load your data into Mining Prep, you may need to configure some network settings to allow communication from Appian. This step is required before you add a data source and add data sets directly to Mining Prep.

Learn how to allow traffic from Appian Cloud based on IPs in Appian Community.
For more on using IP addresses to control access, see the Appian Cloud documentation.

You may also want to review your data for personally identifiable information (PII) or other legally protected information before loading it into Mining Prep. For more information, see Security Considerations.

If you aren't sure how to access Mining Prep for the first time, see Access and sign in instructions.

Data management

On the Data Management tab, you can add the data you want to transform into an event log for process mining. An event log is a list of events that process mining uses to analyze processes. Events represent activities in process mining. Activities are tasks in a business process that may be automated or performed by a human. They are often associated with start and end time stamps, who performed the activity, or how much the activity cost.

You can either upload the data from a CSV file or connect to another data source like a database or Enterprise Resource Planning (ERP) application.

Note: For Appian Cloud Mining Prep installations, you can upload files up to 1 GB in size, depending on your local network speed.

There are a couple of options to import your data into Process Mining:

Send transformed event logs from Mining Prep.
Upload event logs directly to Process Mining.

Data sets

Data Management Tab

The Data Sets section displays all files that you've uploaded to Mining Prep. You can use these files to start a new transformation or add them to an existing transformation. To preview the contents of a data set, click the row it belongs to in the table.

What makes a good data set?

At a basic level, processes with large amounts of relevant data are good candidates for process mining. For example, an automated purchase-to-pay system probably logs an event whenever an invoice is created and another when an invoice is paid. These events are often associated with time stamps and attributes like who initiated the event or the purchase price. A lot of value is hidden in these large data sets, and process mining can help you make sense of it.

The previous example highlights an automated process. Automated processes that record lots of data and are well understood by your organization are great candidates for optimization in Process Mining. That is, you can use Process Mining to compare how your process actually functions against how you expect it to operate.

Unstructured processes that have a lot of available data but aren't well understood by your organization are perfect candidates for discovery. That is, you can use Process Mining to visualize your process for the first time and make sense of how it operates.

In either instance, you'll want to consider some of the characteristics that make these data sets good candidates for process mining. Data sets should include:

Case IDs: The unique identifier of a business case, which usually consists of a combination of letters and numbers. A business case represents a real business transaction in your IT system.
Events: The digital representation of when an activity occurred. Each event is a single instance of an activity, and a case often consists of multiple events.
Time stamps: At least one column in your data set should have a time stamp for when activities took place.

In addition to these requirements, it is helpful to also include case and event attributes in your data sets.

Case Attributes: Characteristics of a case that do not vary from event to event like customer or vehicle types.
Event Attributes: Characteristics of a single activity like who performed the activity or total payment amount.

It is a good idea to include these attributes in your data sets because they are available as filters in Process Mining. Additionally, this extra information can help you make observations such as how costly a deviation or variant is in your process.
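
As a rough illustration of the characteristics above, the following Python sketch checks that a CSV file has case ID, activity, and time stamp columns, then groups the events by case. The column names (`case_id`, `activity`, `timestamp`, `performed_by`, `customer_type`) and the sample rows are assumptions made for this example. Mining Prep does not require these names.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the column names are illustrative assumptions.
SAMPLE_CSV = """case_id,activity,timestamp,performed_by,customer_type
INV-001,Create invoice,2024-01-02T09:15:00,jdoe,Retail
INV-001,Pay invoice,2024-01-05T14:30:00,asmith,Retail
INV-002,Create invoice,2024-01-03T11:00:00,jdoe,Wholesale
"""

# Assumed names for the three required kinds of columns.
REQUIRED_COLUMNS = {"case_id", "activity", "timestamp"}


def group_events_by_case(csv_text):
    """Return {case_id: [event rows]} after checking the required columns exist."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    cases = defaultdict(list)
    for row in reader:
        cases[row["case_id"]].append(row)
    return dict(cases)


cases = group_events_by_case(SAMPLE_CSV)
for case_id, events in cases.items():
    print(case_id, [event["activity"] for event in events])
```

In this sample, `customer_type` behaves like a case attribute because it stays the same for every event in a case, while `performed_by` behaves like an event attribute because it can differ from one event to the next.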

The last area to consider is what period of time to include in your data set. This depends heavily on your specific process and process mining goals. For example, some processes may complete within a single week, while others may take several months.

It is generally a good practice to select a timeframe that allows for at least a few cases to fully start and complete. When in doubt, it is always better to include more data, as this offers more analysis opportunities. Even if you import a larger timeframe than is necessary, there are options to refine and filter the data to suit your specific goals.

The following image shows a potential data set that includes all of the requirements for process mining analysis.

Example data set

Upload a data set

Before you upload a data set, make sure the file conforms with these standards:

Columns with a header must contain data.
Columns with data must have a header.
Column headers cannot include @, `, or \ special characters.
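
If you want to check a file against these standards before uploading it, a small script can help. The following Python sketch is one possible pre-upload check, not part of Mining Prep. The function name `find_csv_problems` is hypothetical, and the sketch assumes a comma-delimited file with the headers in the first row.

```python
import csv
import io

# Characters that column headers cannot include: @, backtick, and backslash.
FORBIDDEN_HEADER_CHARS = set("@`\\")


def find_csv_problems(csv_text):
    """Return a list of problems found; an empty list means none were found."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["File is empty"]
    headers, data_rows = rows[0], rows[1:]
    width = max(len(row) for row in rows)
    problems = []
    for index in range(width):
        header = headers[index].strip() if index < len(headers) else ""
        has_data = any(index < len(row) and row[index].strip() for row in data_rows)
        label = header or f"column {index + 1}"
        if header and not has_data:
            problems.append(f"{label}: header without data")
        if has_data and not header:
            problems.append(f"{label}: data without header")
        if FORBIDDEN_HEADER_CHARS & set(header):
            problems.append(f"{label}: header contains a forbidden character")
    return problems


print(find_csv_problems("case_id,activity\n1,Create invoice\n"))
print(find_csv_problems("case_id,user@mail,\n1,,x\n"))
```

The first call reports no problems. The second reports a header without data, a forbidden character in that header, and a third column that has data but no header.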

To upload a data set from a local CSV file:

Click the Add button on the Data Sets section.
Click the Select file button and select your CSV file.
The fields on the window automatically populate. Edit the auto-populated values as necessary.
Click the Upload button.

Tip: If your data is in an Excel spreadsheet, you can save your sheets as CSV files from the Save As menu in Excel.

Update a data set


If you need to update data sets that you've already uploaded, you can either append or replace the file:

Append: Use if the new CSV file only contains new entries that don't exist in the current data set.
Replace: Use if the new CSV file still contains entries that have already been uploaded.

Mining Prep expects the same column headers as the original file when you append or replace. When you append or replace a data set, all of your transformations that rely on that data set automatically use the new data the next time Mining Prep executes the transformation.
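
The choice between the two options can be sketched as a simple comparison of the current and new files. The following Python example is only an illustration of that decision. The function name `choose_update_mode` is hypothetical, and the sketch assumes that an "entry" is a complete row, which may not match how your data identifies entries.

```python
import csv
import io


def choose_update_mode(current_csv, new_csv):
    """Suggest 'append' or 'replace' for updating an uploaded data set."""
    current = list(csv.reader(io.StringIO(current_csv)))
    new = list(csv.reader(io.StringIO(new_csv)))
    # Both files must share the same column headers.
    if not current or not new or current[0] != new[0]:
        raise ValueError("The new file must have the same column headers as the original")
    # Assumption: two entries are the same when every field in the row matches.
    existing_rows = {tuple(row) for row in current[1:]}
    overlaps = any(tuple(row) in existing_rows for row in new[1:])
    return "replace" if overlaps else "append"


current_file = "case_id,activity\n1,Create invoice\n"
only_new_entries = "case_id,activity\n2,Create invoice\n"
full_export = "case_id,activity\n1,Create invoice\n2,Create invoice\n"

print(choose_update_mode(current_file, only_new_entries))
print(choose_update_mode(current_file, full_export))
```

A file that contains only new entries leads to append, while a full export that repeats existing entries leads to replace, so that the same events are not loaded twice.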

If you don't want to append or replace, you can also delete and re-upload the file as a new data set.

Data sources

In addition to uploading CSV files, Mining Prep can import data sets from a database or ERP application. Standard connectors come out of the box for the following systems:

Add a data source

To connect Mining Prep to a data source, you need to allow network traffic from Appian Cloud. Self-managed installations don't need to complete this step.

To connect to a supported data source:

Click the Add button on the Data sources section.
Select the type of database you want to connect to.
Complete the required fields.
Click Save.

If you do not know the necessary credentials, contact your database administrator or IT department.

Field Name	Description	Example
Name	The name that displays in Mining Prep.	`erp_production_db`
Fetch Size	Optional. The size of chunks that Mining Prep uses to import the data. Leave blank if you are unsure.	`65536`
Server	The domain name or IP address of the database server.	`db.example.com` or `192.0.2.6`
Port	The port used to reach the database. The default port for Oracle is 1521.	`1521`
Schema	Optional. Defaults to the username. Specifies the database schema of all of the displayed tables.	`public`
Database	The database name on the server.	`INVOICES`
User	The login's username.	`janedoe`
Password	The password for the username.	`********`

Import data from a data source

Once you've added a data source, you can import the data from this database connection into Mining Prep as a data set. There are multiple ways to import data sets from a data source:

Import Tables: Provides an interface to choose which tables to import.
Import by Query: Allows you to specify a SQL query to retrieve tables.

Import tables

Mining Prep can provide you with an overview of all tables and views that are available for the specific user and schema.

To import tables:

Click the Add button next to the data source.
Select the Import Tables option.
Optionally, change the table names as you want them to display in Mining Prep.
Click the Import button next to each table you want to import into Mining Prep.

Mining Prep displays the original table names alongside an editable field. You may want to edit the table names so that they are easier to understand.

Import by query

Mining Prep supports custom SQL queries to select data from your data source.

If you are performing more advanced queries on the data, such as queries beyond simple SELECT statements, we strongly suggest you create a view for that SQL statement inside your database. Then, you can import this view into Mining Prep as a table. Examples of advanced queries might include data aggregations, joins, or manipulations.
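
To illustrate the idea, the following Python sketch uses an in-memory SQLite database as a stand-in for your data source. It moves a join into a view so that the query used for the import stays a simple SELECT. The table, column, and view names (`invoices`, `invoice_events`, `invoice_event_log`) are hypothetical, and the exact CREATE VIEW syntax may differ in your database system.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real data source.
connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE invoices (invoice_id TEXT PRIMARY KEY, customer_type TEXT);
    CREATE TABLE invoice_events (invoice_id TEXT, activity TEXT, event_time TEXT);

    INSERT INTO invoices VALUES ('INV-001', 'Retail'), ('INV-002', 'Wholesale');
    INSERT INTO invoice_events VALUES
        ('INV-001', 'Create invoice', '2024-01-02 09:15:00'),
        ('INV-001', 'Pay invoice', '2024-01-05 14:30:00'),
        ('INV-002', 'Create invoice', '2024-01-03 11:00:00');

    -- The join lives in the database as a view, not in the import query.
    CREATE VIEW invoice_event_log AS
    SELECT e.invoice_id AS case_id, e.activity, e.event_time, i.customer_type
    FROM invoice_events AS e
    JOIN invoices AS i ON i.invoice_id = e.invoice_id;
    """
)

# The query that reads the data stays a simple SELECT against the view.
rows = connection.execute(
    "SELECT case_id, activity, event_time, customer_type "
    "FROM invoice_event_log ORDER BY case_id, event_time"
).fetchall()
for row in rows:
    print(row)
```

Keeping the join in a view means the logic is maintained in one place in the database, and the view can be imported like any other table.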

Note: We suggest creating a database user with only limited rights to access the necessary data. Mining Prep inherits the same privileges as the database user.

To query for tables:

Click the Add button next to the data source.
Select the Import by Query option.
Configure the name of the query in the Name field.
Configure the SQL query in the text field.
Click Save.

The data source overview displays all of the imported tables.

Additional data considerations

If your data set has imprecise time stamps (for example, time stamps that record only the date and not the time of day), be aware that the order of events in your models may not appear as you would expect. This is common for SAP data. Events with identical time stamps display in the order they appear in the data set.

This can lead to additional process variants, which may affect the accuracy of your analysis and conformance checks.
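
The effect can be reproduced with a stable sort, which keeps events with equal time stamps in their original order. The following Python sketch only mimics the behavior described above. It is not how Mining Prep is implemented, and the sample events are hypothetical.

```python
from datetime import date

# Hypothetical events with date-only time stamps, listed in data set order.
events = [
    {"case_id": "PO-7", "activity": "Approve order", "day": date(2024, 3, 4)},
    {"case_id": "PO-7", "activity": "Create order", "day": date(2024, 3, 4)},
    {"case_id": "PO-7", "activity": "Ship order", "day": date(2024, 3, 6)},
]

# sorted() is stable: events with equal keys keep their original order.
ordered = sorted(events, key=lambda event: event["day"])
trace = [event["activity"] for event in ordered]
print(trace)
```

Because both events on March 4 share the same time stamp, the approval appears before the creation simply because it comes first in the data set. Including the time of day in the source data, where it is available, avoids this ambiguity.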

What's next?

If you want to start building your event log from your data set(s), click CREATE TRANSFORMATION PROJECT. This automatically creates a new transformation project on the Transformation Projects tab.

Go to Transform Data to learn more.

Built: Fri, Apr 19, 2024 (06:08:09 PM)
