Transform and Load Data
This content applies solely to Process Mining, which must be purchased separately from the Appian base platform.

Transformation Projects

After you've created your data sets, the next step is to create a transformation project.

On the Transformation Projects tab, you can create transformation projects to transform data sets into event logs. Mining Prep must transform data sets into event logs for process mining analysis. An event log is a list of events that process mining uses to analyze processes. Events represent activities in process mining. Activities are tasks in a business process that may be automated or performed by a human. They are often associated with start and end timestamps, who performed the activity, or how much the activity cost.

Transformation Projects

To create a transformation project, click the Add button on the Transformation Projects tab. You can also create a transformation project directly from a data set by clicking the Create Transformation Project button.

Transformation projects consist of:

Once you've built the transformation project, you can click the Transform & Load button to execute the transformation actions and load the event log into Process Mining.

Transformation Project

Transformation Project Statuses

Transformation project statuses indicate the execution status of the project whenever you manually or automatically execute a transformation project.

Statuses display in the Status column alongside a timestamp that indicates when the status occurred:

  • Running: The transformation project is currently extracting, transforming, and loading into Process Mining.
  • Completed: The transformation project successfully executed, and the event log is available in Process Mining.
  • Failed: The transformation project did not execute successfully due to an error.

Building blocks

Use building blocks to organize your transformation projects. A building block is automatically created when you add a transformation project.

Building blocks can contain one or more data sets.

In each building block, you identify the case ID, events, and attributes for the data set. You must define these properties before you can run the final transformation. You also add transformation actions to building blocks. The final transformation is comprised of a series of transformation actions. A transformation action defines a single data manipulation instance.

Transformation actions run inside the building blocks in which they are defined. For smaller projects, it may make sense to keep all data sets in a single building block, so you only have to add the transformation actions once. As projects grow, you may wish to add more building blocks to more clearly organize data sets. However, when you have multiple building blocks, you will have to add transformation actions to each block.

Blank building block

Building blocks can contain one or many data sets. If you add multiple data sets to a single building block, they must share a common column of values to serve as a key.

During the final transformation, Mining Prep automatically combines all valid building blocks into one event log for process mining analysis.

Add data sets to building blocks

After you add a new building block, you can choose to add an existing or new data set.

To add an existing data set to a building block:

  1. Select the data set from the drop-down field in the USE DATA SOURCE panel.
  2. Click Use Data Source.

mp-select-data-source

To add a new data set to a building block:

  1. Click Upload Data Source in the USE DATA SOURCE panel.
  2. Click the Select file button and select the CSV file.
  3. The fields on the window automatically populate. Edit the auto-populated values as necessary.
  4. Click the Upload button.

mp-upload-data-source

Transformation actions

Transformation actions define changes to the data's format, structure, and name in the event log. You add transformation actions from the building-block level, and these actions only apply to data inside the building block.

To create a complete event log that process mining can analyze, it must contain the following items:

  • Case ID
  • Events
  • Case Attributes (optional)

Although case attributes are optional, they can be helpful during the process mining phase. Case attributes that you define during Mining Prep are available as filters in process mining and can be used as dimensions when creating charts in a dashboard.

It is common to need to transform your data set so that you can define these properties. For example, events must contain columns in a datetime format to serve as timestamps. You can add a date conversion transformation action to a column so that it contains the correct data type. See Identify data properties for more information.

Transformation building blocks store transformation actions. You access these actions from a building block.

To add a transformation action:

  1. Add a data set to the building block if you haven't already.
  2. Click the Add button in the Transformation Actions panel.
  3. Select the transformation action you want to add.
  4. Complete the required fields in the right-hand panel.
  5. Click Preview to verify the transformation result.
  6. Click Save to add the transformation action to the building block.

Mining prep supports the following transformation actions:

Add transformation drop down

Match columns

The match columns transformation action adds new columns and populates them according to match conditions.

To create a new column, enter the new column name in the New Column Name field.

Screenshot of New Column Name field with string type

After creating the column, you need to specify its data type in the Column Type field. In the preceding example, the new column is a string type. The following table describes the column types you can select:

Type Description Example
String Text or combinations of alphabetical and numerical characters. jdoe44
Integer Whole, non-decimal numbers. 8675
Double Double-precision floating-point numbers. 1403.22

After selecting the data type, you can define the conditions that determine how the match column populates. Each condition starts with a comparison between two columns. If the comparison is true, the data from the Then field is returned; if the comparison is false, the data from the IF NOT THEN field is returned.

The comparison operators you can choose from are:

Comparison Operators Description Example
= is equal to 1=2 returns false
!= is not equal to 1!=2 returns true
Contains contains "booker12" contains b returns true
Is null Empty cell "Jane Doe" is null returns false
Is not null Not empty cell "Jane Doe" is not null returns true

mp-add-condition

If you have more than one condition you want to define, click Add Condition to add additional conditions.

Replace values

The replace values transformation action replaces values in existing columns based on conditions you define. For example, you may need to replace a special character or abbreviation in an attribute column.

To start, select an existing column from the Select Column field.

Screenshot of Select Column field

After selecting the column, you can define the conditions that determine how the column values are replaced. Each condition starts with a comparison between two columns. If the comparison is true, the data from the Then field is returned; if the comparison is false, the data from the IF NOT THEN field is returned.

The comparison operators you can choose from are:

Comparison Operators Description Example
= is equal to 1=2 returns false
!= is not equal to 1!=2 returns true
Contains contains "booker12" contains b returns true
Is null Empty cell "Jane Doe" is null returns false
Is not null Not empty cell "Jane Doe" is not null returns true

If you have more than one condition you want to define, click Add Condition to add additional conditions.

Filter

The filter transformation action reduces the number of rows in your data set to only those that match filter criteria.

Filters on a column of type string are case sensitive and support the following comparison operators:

Comparison Operators Description Example
= is equal to 1=2 returns false
!= is not equal to 1!=2 returns true
Contains contains "booker12" contains b returns true
Is null Empty cell "Jane Doe" is null returns false
Is not null Not empty cell "Jane Doe" is not null returns true

Filters on a column of type datetime, integer, or double additionally support the following comparison operators:

Comparison Operators Description Example
< Less than 1<2 returns true
> Greater than 1>2 returns false
<= Less than or equal 1<=2 returns true
>= Greater than or equal 1>=2 returns false

These operators allow you to filter ranges of values in your data set.

mp-filter-action

Add data

Additional information from other data sets can enrich your process mining analysis. This might be data from a different system like your CRM system or internal tables from your business unit. With the add data transformation action, you can add supplemental data sets to your transformation project.

To add the add data transformation action to a building block:

  1. Click Add and select the Add Data transformation action. A preview of your current transformation displays.
  2. Select an existing data set from the Select Table drop-down.
  3. Select a key column from the current data set that shares values with a column in the new table.
  4. Click Preview to verify the transformation result.
  5. Click Save to add the new data to the data set in the building block.

The add data transformation action only adds rows where values match between the new and source data key columns.

After adding data, you can use any new columns as events or attributes, and you can continue to transform the new combined data set.

mp-add-data

Date conversion

Use the date conversion transformation action to standardize dates within your transformations and convert strings into UTC timestamps. In addition to case IDs and events, timestamps are a fundamental component of the data used for process mining.

The date conversion transformation action converts a string into the UTC time zone based on the original time zone of the data set. The result of this conversion is UTC, not your local time zone.

When you add a date conversion transformation action, you must specify the following:

  • The columns that contain the timestamp you want to use in your event log.
  • The current datetime format as a format string. Appian automatically suggests the correct format for the most common datetime formats. For example, the MM/dd/yy hh:mm string matches data like this: 11/23/1955 12:00.
  • The local time zone that applies to the data set.

datetime format strings use the following case-sensitive characters:

Character Description Example
y year 1996
M month Jul or 07
d day 10
a half day PM
h hour (0-23) 0
m min 30

Click Preview to verify the transformation result, before clicking Save.

mp-date-conversion

Timestamps and time zones

Mining Prep saves, operates, and previews all date and time information using the Coordinated Universal Time (UTC) time zone. When you perform date conversions, you must specify the format of the timestamp and the local time zone of the data set.

When you preview and save the date conversion, Mining Prep converts the value from your selected time zone to UTC.

For example, let's say your data set contains a column with timestamps that look like this: 2016/01/04 12:09:44.

You would specify the format string as yyyy/MM/dd hh:mm:ss for this data. Assuming the event occurred in McLean, VA, and your IT system uses this local time zone when creating the log, you would select US/Eastern as the respective time zone.

After you save the date conversion, timestamp will now display as 04.01.2016 05:09:44 UTC. This represents 2016/01/04 12:09:44 EST.

Identify data properties

To create a complete event log that process mining can analyze, it must contain the following items:

  • Case ID
  • Events
  • Case Attributes (optional)

Although case attributes are optional, they can be helpful during the process mining phase. Case attributes that you define during Mining Prep are available as filters in process mining and can be used as dimensions when creating charts in a dashboard.

It is common to need to transform your data set before you can define these properties.

mp-data-properties

Case ID

The case ID is the unique identifier of a business case, which usually consists of a combination of letters and numbers. A business case represents a real business transaction in an IT system. A case often consists of multiple events. Each event can have unique timestamps and case attributes.

In process mining, a business case runs through a particular process variant; therefore, it is the image of exactly one process cycle. The case ID identifies each case individually.

To identify the case ID property from a building block:

  1. Click the Case ID button in the Identify section.
  2. Select the column that contains the case ID in the Case ID drop-down field.
  3. Click Save.

Events

An event represents an activity in a system. All events that are assigned to a business transaction form a unique case. Similarly, a business process consists of several activities. An event has an associated case ID, timestamp, and other contextual data. The timestamps correspond to the start or end time of the event.

Not all events have start and end timestamps, but you need at least a start timestamp to successfully define an event. Additionally, process mining can only calculate the duration of events if a start and end timestamp exist.

You can only identify events if Mining Prep recognizes a column of type datetime. If you don't have datetime columns in your data set, Mining Prep will automatically prompt you to select a column from your data set to perform a date conversion.

Mining Prep supports two event identification types:

  • Single Events: One column contains all timestamps for one specific activity. In this instance, the column header usually indicates the specific logged activity name.
  • Multiple Events: Multiple columns contain event names and timestamps for several activities. One column contains the logged activity names, and the other column(s) contain timestamps.

mp-add-events.png

To identify single events:

  1. Click the Events button in the Identify section on your building block.
  2. Click the Add Event button.
  3. Select the Single Event option.
  4. Configure the event options:
    • Event Name: Define the name of the event as it should appear in process mining. The name can differ from the original column name.
    • Start or Start & End: Choose whether the timestamps have a start column or both start and end columns.
    • Start: Choose the column that contains the timestamp for the start of the event.
    • End: Optionally, choose the column that contains the timestamp for the end of the event. Only displays if Start & End is selected.
    • Event Attributes: Optionally, choose characteristics that are specific to a certain activity. For example, "amount" or "responsible department".
  5. Click Save.

To identify multiple events:

  1. Click the Events button in the Identify section on your building block.
  2. Click the Add Event button.
  3. Select the Multiple Events option.
  4. Configure the event options:
    • Name of Event Group: Define the name of the group of events. This is only used in Mining Prep and doesn't display in Process Mining.
    • Event Name Column: Choose the column that contains multiple event names.
    • Start or Start & End: Choose whether the timestamps have a start column or both start and end columns.
    • Start: Choose the column that contains the timestamp for the start of the event.
    • End: Optionally, choose the column that contains the timestamp for the end of the event. Only displays if Start & End is selected.
    • Event Attributes: Optionally, choose characteristics that are specific to a certain activity. For example, "amount" or "responsible department".
  5. Click Save.

Case Attributes

Case attributes are an optional property of an event log and describe characteristics of a case. Case attributes help you make observations such as how costly a deviation or variant is in your process. Each case can have one or more attributes that describe it in more detail. For example, customer or vehicle types can be case attributes. Attribute characteristics can vary from case to case.

You can select the following attribute types:

Attribute Description Valid Types
Numerical Numeric characteristics like “total payment amount.” integer, double
Impact Numeric characteristics that are useful when summed together. In the Process Mining module, the Variants and Deviations pages display the summed values of impact attributes beneath the Impact column. integer, double
Categorical Descriptive characteristics like “responsible department.” integer, double, string

Numerical and categorical attributes that you define in Mining Prep are available as filters in the Process Mining module.

Make sure you use valid data types with your attributes to prevent transformation errors later on. Appian displays a warning if you select a column with an invalid data type. Note that if your data uses a comma as a decimal separator, it is assumed to be a string. This may not be accurate depending on your specific data set.

Data types

Each column in your data set is associated with a data type. The data type indicates what kind of data the column contains. This is important to keep in mind, because you can only merge columns of the same data type. To help you keep track each column's data type, Mining Prep displays a data type icon in the column headers of your data previews.

The following table describes the possible data types and associated data type icon:

Data Type Icon Example
string Context menu icon angela.lewis
integer Context menu icon 20
decimal Context menu icon 20.50
datetime Context menu icon 20.02.2022 12:10:13 UTC

Execute the transformation project

When your transformation project is complete, you are ready to run the transformation and load the event log into Process Mining.

To transform and load your project into Process Mining:

  1. Go to the Transformation Projects tab in Mining Prep.
  2. Select the checkbox next to your transformation project.
  3. Click Transform & Load.

    mp-transform-load.png

  4. Select the environment in which to load the event log. The API key is automatically populated for you.
  5. Click Ok.

Once you've manually transformed and loaded a transformation project, you can schedule these actions to happen automatically.

Schedule updates

You can define a schedule for your transformation project to load updated data to Process Mining for continuous analytics, monitoring, and notifications.

Mining Prep automatically extracts, transforms, and loads the updated data from your source system to Process Mining on a reoccurring basis.

See Schedule for more information.

Open in Github Built: Fri, Sep 23, 2022 (01:18:25 PM)

On This Page

FEEDBACK