Available since: v.18.1.0 Mar 1, 2022

Status: Active

Table of Contents

1 Overview
- 1.1 Description
  - 1.1.1 Data Points Preview
  - 1.1.2 Extensions v37
  - 1.1.3 Schema Validation & Automation
    - 1.1.3.1 Data Type
    - 1.1.3.2 Data Transformation
    - 1.1.3.3 Validation Rules
    - 1.1.3.4 Confidence Validation
    - 1.1.3.5 Export Automation
  - 1.1.4 Data Point Export
    - 1.1.4.1 Bulk Export
- 1.2 Prerequisites
- 1.3 Settings and Options
2 How to use
- 2.1 Getting Started
  - 2.1.1 Manual Annotation
    - 2.1.1.1 Shortcuts
  - 2.1.2 Auto-Label & Auto-Schema
- 2.2 Usage example
3 Sub-Pages

Overview

Description

The data points step is used to annotate text elements with labels. The annotated text is tagged with the respective label in the export.

A Data Point is the name given to a text element (sometimes also referred to as the semantic of a text, or “Label”).

A Group can be used to logically group different data points together. E.g. a group “Header Data Points” might contain all data points that appear at the beginning of a document.

A List can be used to handle data points that appear a (potentially unknown) number of times across the document. Lists can contain multiple data points or even other lists.

A Schema refers to all the data points, groups and lists in a data points step. It defines how your data points are grouped and how they appear in the Data Points | Data Point Export.

An example of a data points step consisting of 5 data points: 3 are contained in a group “Header Data Points” and 2 data points repeat inside a list “Classification”.
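
The example schema can be pictured as a small tree. The following Python sketch is purely illustrative: the class names `DataPoint`, `Group` and `RepeatingList` are hypothetical and not part of any Acodis API. Which data points belong to the group and the list is also an assumption, and the second list data point (“Statement”) is invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataPoint:
    name: str


@dataclass
class Group:
    name: str
    members: List["Node"] = field(default_factory=list)


@dataclass
class RepeatingList:
    name: str
    members: List["Node"] = field(default_factory=list)


# A schema node is a data point, a group, or a (possibly nested) list.
Node = Union[DataPoint, Group, RepeatingList]


def count_data_points(nodes: List[Node]) -> int:
    """Count data points recursively, descending into groups and lists."""
    total = 0
    for node in nodes:
        if isinstance(node, DataPoint):
            total += 1
        else:
            total += count_data_points(node.members)
    return total


# Assumed layout: 3 data points in the group, 2 repeating inside the list.
schema = [
    Group("Header Data Points", [
        DataPoint("Version"),
        DataPoint("Revision Date"),
        DataPoint("SDS Number"),
    ]),
    RepeatingList("Classification", [
        DataPoint("Code"),
        DataPoint("Statement"),  # hypothetical name
    ]),
]

print(count_data_points(schema))  # 5
```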

Data Point annotations can be created through:

- Data Points | Manual Annotation
- Data Points | Auto Label & Auto Schema
- global annotation predictions (refer to Annotation Steps)

Data Points Preview

The data points step also offers a step-specific Analysis Document Viewer | Previews called “Data Points”.

The Data Points preview lets the user:

- get an overview of the data points and their annotated text
- manage Data Points | Schema Validation & Automation
- download any of the Data Points | Data Point Export

The data points preview is organized in tabs:

- Data Points
- Schema

The Data Points tab gives an overview of each annotated text (or text value) and its corresponding data point. Text values can also be edited manually in this tab.

The Schema tab’s purpose is to manage the data points, lists and groups in a schema. Moreover, Data Points | Schema Validation & Automation can be configured.

The Data Points tab in the Data Points preview shows the data point text values (e.g. “8.0”) with the corresponding data point (“Version”), repeating text values (“H317” for “Code”) in lists (“Classification”), and groups (“Header Data Points”).

The Schema tab in the Data Points preview lets the user organize the schema and set validation and automation.

Extensions v37

For special customer use cases that cannot be addressed using the standard data point tooling, Acodis offers customized Extensions.

If Extensions are available for a document type, the Data Points Preview includes an additional tab labeled Configuration.

The Configuration tab is only available when extensions are available.

From the Configuration tab, Extensions can be added and arranged. They are applied in the configured order.

Configuration of Extensions

For more information about the capabilities of Extensions, please contact Customer Success.

Schema Validation & Automation

For each data point, validation and automation rules can be configured in the Schema tab. These rules ensure that the text value of a data point has the right format and let the user configure rules to transform the text.

The schema tab lets the user configure specific validation and automation rules for each data point

Validation and automation are executed in the order outlined in the figure below.

The specified validation and automation rules are executed as outlined in the figure.

If the data point text value violates the selected Data Points | Data Type validation or one of the Data Points | Validation Rules:

- the data points step produces a review message of type error (see Review Messages | Error)
- the Data Points tab in the Data Points Preview (https://acodis.atlassian.net/wiki/spaces/DOC/pages/edit-v2/878772300#Data-Points-Preview) highlights the data point in red

The text value “30.02.2021” is not a valid value for the data point “Revision Date” of type date.
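
To see why “30.02.2021” fails a date validation, consider the following minimal Python sketch. It is not the platform's implementation: the function name `parse_date` and the input format `DD.MM.YYYY` are assumptions made for the illustration. A value that cannot be converted to a calendar date is rejected; a valid one is returned in the YYYY-MM-DD export format.

```python
from datetime import datetime
from typing import Optional


def parse_date(text: str, fmt: str = "%d.%m.%Y") -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the text is not a valid date.

    The input format (day.month.year) is an assumption for this sketch.
    """
    try:
        return datetime.strptime(text, fmt).date().isoformat()
    except ValueError:
        # Raised for impossible dates (30 February) or non-matching text.
        return None


print(parse_date("30.02.2021"))  # None
print(parse_date("28.02.2021"))  # 2021-02-28
```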

The options available to specify validation and automation rules are described in the following sub-chapters.

Data Type

Specifying a data type for your data point lets you define the format you expect from the text value of this data point. E.g. if you have a data point “Revision Date”, you want the text value to be convertible to a date.

The data type can be selected from the dropdown in the schema tab for each data point.

The available data types for data points are listed in the table below.

Data Type	Description	Validation	Export Format
Single Line Text	A text to be exported as a single line	no validation applies	text value is exported without line-breaks
Multi Line Text	A text to be exported as multiple lines	no validation applies	text value is exported including line-breaks
Float	A numeric value	text value can be converted to a numeric value	text value is exported as numeric value
Integer	An integer value	text value can be converted to an integer	text value is exported as integer
IBAN	An alphanumeric value that follows the ISO 13616 standard for an International Bank Account Number (IBAN)	text value follows the ISO 13616 standard for IBAN	alphanumeric value is exported as text
Date	A date value	text value can be converted to a date	text value is exported in the format YYYY-MM-DD
Chapter	The content of a (sub-)chapter. This data type can be used if the content of a whole chapter needs to be tagged. The content of a chapter spans from a title until the occurrence of another title of the same or a higher hierarchical level. If you choose this data type, a https://acodis.atlassian.net/wiki/spaces/DOC/pages/870645762 step is required in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814710786 before this step. When a text element is annotated, the annotation auto-corrects to the next title above it, as defined by the structure step.	no validation applies	the text content of the chapter defined by the annotated title is exported. Example: to tag the whole content of the chapter “2.3 Other hazards”, a data point of type “Chapter” can be used to export the chapter content. The content spans until the next higher title, which in this case is the title of “SECTION 3:…”.
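
As an illustration of what an IBAN check involves, the sketch below implements the generic mod-97 checksum test used for IBANs. It is an assumption that the platform performs a comparable check; the function name `is_valid_iban` is hypothetical, and country-specific lengths and formats are deliberately not verified here.

```python
def is_valid_iban(text: str) -> bool:
    """Check the basic structure and the mod-97 checksum of an IBAN."""
    iban = text.replace(" ", "").upper()
    # Overall length bounds only; country-specific lengths are not checked.
    if not (15 <= len(iban) <= 34):
        return False
    if not (iban.isascii() and iban.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (iban[:2].isalpha() and iban[2:4].isdigit()):
        return False
    # Move the first four characters to the end, map letters to numbers
    # (A=10 ... Z=35), and test that the result is 1 modulo 97.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))  # True
print(is_valid_iban("GB82 WEST 1234 5698 7654 33"))  # False
```

The two sample values are a commonly used example IBAN and the same value with its last digit changed.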

Data Transformation

Certain data types allow data transformations that are applied to the corresponding text values of the data point. The following data transformations are available:

Text Replace: Replaces any occurrence of the search text in the text value with the replace text.

In the text of the data point “SDS Number”, all occurrences of “S” are replaced with “X”

Regex Replace: Modifies any text in the text value that matches the regular expression, according to the expression in the “Data transformation” field.

In the text of the data point “SDS Number”, all non-digit characters (those that match the regular expression \D) are replaced with an “X”.

Numeric Correction: Removes invalid characters or allows custom decimal separators for the data types Float and Integer.

Non-numbers can be automatically removed or a custom decimal separator can be specified
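
The three transformations above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the platform's implementation: the function names are hypothetical, the sample value “SDS-00123” is invented, and the numeric correction simply keeps digits, the minus sign and the configured decimal separator.

```python
import re


def text_replace(value: str, search: str, replacement: str) -> str:
    """Replace every occurrence of the search text."""
    return value.replace(search, replacement)


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every match of the regular expression."""
    return re.sub(pattern, replacement, value)


def numeric_correction(value: str, decimal_separator: str = ".") -> float:
    """Drop non-numeric characters and normalise the decimal separator."""
    allowed = set("0123456789-" + decimal_separator)
    cleaned = "".join(ch for ch in value if ch in allowed)
    return float(cleaned.replace(decimal_separator, "."))


sds_number = "SDS-00123"  # invented sample value
print(text_replace(sds_number, "S", "X"))       # XDX-00123
print(regex_replace(sds_number, r"\D", "X"))    # XXXX00123
print(numeric_correction("1.234,50 EUR", ","))  # 1234.5
```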

Validation Rules

Validation rules can be defined to make sure the data point text values have a specific expected format. This ensures that the extracted or entered data matches the expectations of the downstream system, and raises an error otherwise (see Review Messages | Error).

E.g. a validation rule can ensure that exported US ZIP codes always have 5 digits.

The text value of the data point “Code” should be between 4 and 6 characters long

The following validation rules are available for the respective data point Data Points | Data Type

Validation Rule	Description	Available for the Data Points \| Data Type
Date Range	Specify a range of dates that are allowed for this data point	Date
Length Range	Specify the minimum and/or maximum length of the text value that is allowed for this data point	Single Line Text, Multi Line Text
Match Regular Expression	Specify a regular expression that the text value of this data point needs to match	Single Line Text, Multi Line Text
Numeric Range	Specify the minimum and/or maximum numeric value that is allowed for the text value of this data point	Integer, Float
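
The rules in the table can be thought of as simple predicates on the text value. The following Python sketch illustrates them; the function names are hypothetical, and it is an assumption that “match” means the whole text value must match the regular expression.

```python
import re
from datetime import date


def in_range(value, minimum=None, maximum=None) -> bool:
    """Generic range check; an omitted bound is treated as unbounded.

    Works for numbers (Numeric Range) and dates (Date Range) alike.
    """
    return (minimum is None or value >= minimum) and (
        maximum is None or value <= maximum
    )


def length_range(text: str, minimum=None, maximum=None) -> bool:
    """Length Range: check the number of characters in the text value."""
    return in_range(len(text), minimum, maximum)


def matches_regex(text: str, pattern: str) -> bool:
    """Match Regular Expression: the whole value must match (assumption)."""
    return re.fullmatch(pattern, text) is not None


print(length_range("H317", 4, 6))        # True
print(matches_regex("94105", r"\d{5}"))  # True
print(matches_regex("9410", r"\d{5}"))   # False
print(in_range(date(2021, 2, 28), date(2021, 1, 1), date(2021, 12, 31)))  # True
```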

Confidence Validation

v38

The data points step supports confidence validation in production. Thresholds can be configured per https://acodis.atlassian.net/wiki/spaces/AP/pages/418775067.

The classifier generates a confidence for every annotation. If this confidence is below the configured threshold value, a warning is generated in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/860946433 Node. These messages appear during review and can be handled by verifying the page or by deleting annotations with a low confidence score.
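
Conceptually, confidence validation is a threshold comparison per annotation. The sketch below illustrates the idea only; the annotation structure, the confidence values and the threshold of 0.8 are invented for the example and do not reflect the platform's data model.

```python
from typing import Dict, List


def low_confidence_warnings(annotations: List[Dict], threshold: float) -> List[str]:
    """Return one warning per annotation whose confidence is below the threshold."""
    return [
        f"Low confidence ({a['confidence']:.2f}) for data point '{a['data_point']}'"
        for a in annotations
        if a["confidence"] < threshold
    ]


# Invented sample annotations.
annotations = [
    {"data_point": "Version", "text": "8.0", "confidence": 0.97},
    {"data_point": "Code", "text": "H317", "confidence": 0.42},
]

for warning in low_confidence_warnings(annotations, threshold=0.8):
    print(warning)
```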

Export Automation

v26

Export automation can be used for:

- validating that the data point text value is not empty
- specifying the default (dynamic v30) value if the data point text value is empty

If the data point “Global SDS Identifier” has no text value, the text value is set to the text value of the data point “SDS Number”, concatenated with “_v” and the text value of the data point “Version”.
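
The dynamic default in this example amounts to a simple fallback rule. A minimal Python sketch, assuming the data point values are available as a dictionary keyed by data point name (the function name and the sample values are hypothetical):

```python
from typing import Dict


def apply_default(values: Dict[str, str]) -> Dict[str, str]:
    """Fill an empty 'Global SDS Identifier' from 'SDS Number' and 'Version'."""
    result = dict(values)  # do not modify the caller's dictionary
    if not result.get("Global SDS Identifier"):
        result["Global SDS Identifier"] = (
            f"{result['SDS Number']}_v{result['Version']}"
        )
    return result


# Invented sample values.
extracted = {"SDS Number": "00123", "Version": "8.0", "Global SDS Identifier": ""}
print(apply_default(extracted)["Global SDS Identifier"])  # 00123_v8.0
```

An existing, non-empty value is left untouched; only the empty case triggers the default.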

Data Point Export

The data points step produces export assets that can be downloaded directly from the Acodis platform or exported via one of the export channel nodes. The available formats for the data point export assets are:

- XML
- JSON
- Excel
- CSV

For more details refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/906362881

Bulk Export

The Excel export (v37) offers a special bulk export from the Transaction Listing of the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321 when multiple transactions are selected (see https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321/Workflow#Multi-Selection-Actions). When Excel is selected as the export format, the user can choose to export as ZIP or as bulk Excel:

Configuration of Excel Bulk Export

Aggregation:

- As zip: produces a ZIP file containing exactly one Excel file per transaction, with all data points in tabular form
- As bulk Excel: produces a single Excel file containing all entries in tabular form. Additionally, a Document Name and a Transaction Id are added to every entry so that entries can be traced back to their transactions.
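
The bulk Excel aggregation can be pictured as flattening the entries of all selected transactions into one table, with two extra columns for traceability. The sketch below shows this on plain Python data; the input structure and the sample values are invented, and only the column names “Document Name” and “Transaction Id” come from the description above.

```python
from typing import Dict, List


def aggregate_bulk(transactions: List[Dict]) -> List[Dict[str, str]]:
    """Flatten the entries of all transactions into one list of rows."""
    rows = []
    for transaction in transactions:
        for entry in transaction["entries"]:
            rows.append({
                "Document Name": transaction["document_name"],
                "Transaction Id": transaction["transaction_id"],
                **entry,  # the data point columns of this entry
            })
    return rows


# Invented sample transactions.
transactions = [
    {"document_name": "sds_a.pdf", "transaction_id": "t-1",
     "entries": [{"Code": "H317"}, {"Code": "H410"}]},
    {"document_name": "sds_b.pdf", "transaction_id": "t-2",
     "entries": [{"Code": "H302"}]},
]

for row in aggregate_bulk(transactions):
    print(row)
```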

Prerequisites

This feature is available to users in the role of Administrators, Workflow managers and Case handlers.

To use the step, you need either:

These steps ensure that text is extracted and available for this step.

If you are using a data point of type “Chapter”, titles need to be present in the document. Titles are either present on import (for example, in MS Word files) or are available if any of the following steps precedes the data points step:

Settings and Options

The settings dialog of the data points step offers configurations on:

parameters for the model that is trained (refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776331)

The settings dialog of the data points step

For details on the available parameters refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/911114304

How to use

Getting Started

To use the data points step, follow these steps:

1. Open a document in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/867500034
2. Click on Add to include the step in your analysis.
3. Locate the step under the NEW column, or under the EXISTING column if you have already created this step.
4. Select the newly added step by clicking on it.
5. If necessary, adjust the configuration settings via the contextual menu (right-click > Settings).
6. Select the Data Points Preview (https://acodis.atlassian.net/wiki/spaces/DOC/pages/edit-v2/878772300#Data-Points-Preview) to define your data points and the schema, or to manage Data Points | Schema Validation & Automation.
7. Create a schema by either
  1. Manually adding your data points, groups and lists
    1. If you use lists and the data points inside the list appear in a document table, set “Optimize for Table Structure and Duplicate Merged Cells” = Yes. This optimizes the grouping of list elements according to the table structure. Access this setting via the three-dot menu on the list in the data points tab.
      Edit the settings of a list in a data points step
      The Optimize for Table Structure and Duplicate Merged Cells setting for lists
  2. Using Auto Schema to automatically create a schema from a selection area on the document (see Data Points | Auto Label & Auto Schema)
8. Start creating annotations by either
  1. Using Run Analysis to create predictions from the defined model https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Run-Analysis
  2. Using Auto-label to create predictions using a general model (more details below in Data Points | Auto Label & Auto Schema)
  3. Manually annotating the data points (more details below in Data Points | Manual Annotation)
9. Select the
10. In the data points preview, download the data points as Excel, CSV, JSON or XML using the download button.
11. Mark fully annotated pages as verified using the toolbar at the bottom of the page.

Manual Annotation

To manually annotate text using the data points step, drag along the text using the mouse and select the desired label for it

Drag along the text and select the label

Shortcuts

Manage data point annotations with ease using keyboard shortcuts:

- Select annotations while keeping the SHIFT key pressed
- Unselect annotations while keeping the SHIFT + CTRL (CMD on Mac) keys pressed
- Annotate multiple text segments while keeping the CTRL (CMD on Mac) key pressed, then release it to assign a label to all selected text segments
- Annotate all text segments in a selection box while keeping the ALT key pressed

Use the SHIFT key to select multiple annotations

Auto-Label & Auto-Schema

For the data points step, there are two Auto-Labeling mechanisms (https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Auto-Labeling) to accelerate the initial labeling. These do not require any manual annotations to start with.

Auto-Label can be used to suggest annotations in a selected area when a schema has already been created (i.e. data points with data point names have already been added). Auto-Label suggests annotations based on the data point names and the text in the selected area.

To use auto-label, select the “Auto Label” button in the top toolbar

Create annotation suggestions for the already defined data points “version”, “Revision Date”, “SDS Number” and “H410”.

Auto Schema can be used to suggest a schema and annotations in a selected area. Auto-Schema will create data points and annotations based on the text in the selected area.

To use auto-schema, select the “Auto Schema” button in the top toolbar

Create a data point schema and annotation suggestions based on the selected area

Usage example

Sub-Pages

For any questions you can contact our support team.