Data Points

Available since: v.18.1.0 Mar 1, 2022

Status: Active

Overview

 

Description

image-20250318-214003.png

The data points step is used to annotate text elements with labels. The annotated text is tagged with the respective label in the export.

A Data Point describes the name of a text element (sometimes also referred to as the semantic of a text or “Label”).

A Group can be used to logically group different data points together. E.g. a group “Header Data Points” might contain all data points that appear at the beginning of a document.

A List can be used to handle data points that appear a (potentially unknown) number of times across the document. Lists can contain multiple data points or even other lists.

A Schema refers to all the data points, groups and lists in a data points step. It defines how your data points are grouped and how they appear in the Data Points | Data Point Export.

image-20250319-213606.png
An example of a data points step consisting of 5 data points. 3 are contained in a group “Header Data Points” and 2 data points are repeating inside a list “Classification”
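
To make the relationship between data points, groups, lists and the schema more concrete, the following is a minimal Python sketch of such a schema as a nested data structure. It is an illustration only and not the Acodis data model: the class names and the data point name “Statement” are assumptions, and the assignment of “Version”, “Revision Date” and “SDS Number” to the group is assumed for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataPoint:
    name: str  # the label, e.g. "Version"


@dataclass
class Group:
    name: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class DataPointList:
    name: str
    # A list may contain data points or further (nested) lists.
    children: List["Node"] = field(default_factory=list)


Node = Union[DataPoint, Group, DataPointList]


def count_data_points(nodes: List[Node]) -> int:
    """Count all data points in a schema, descending into groups and lists."""
    total = 0
    for node in nodes:
        if isinstance(node, DataPoint):
            total += 1
        else:
            total += count_data_points(node.children)
    return total


# Hypothetical schema: 3 data points in a group, 2 repeating inside a list.
schema: List[Node] = [
    Group("Header Data Points", [
        DataPoint("Version"),
        DataPoint("Revision Date"),
        DataPoint("SDS Number"),
    ]),
    DataPointList("Classification", [
        DataPoint("Code"),
        DataPoint("Statement"),  # assumed name, for illustration only
    ]),
]
```

Calling count_data_points(schema) on this sketch yields 5, matching the example above.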

Data Point annotations can be created through

  1. Data Points | Manual Annotation

  2. Data Points | Auto Label & Auto Schema

  3. global annotation predictions (refer to Annotation Steps)

Data Points Preview

The data points step also offers a step-specific preview called “Data Points” (see Analysis Document Viewer | Previews).

The data points preview serves to

  1. provide an overview of the data points and their annotated text

  2. manage Data Points | Schema Validation & Automation

  3. download any of the Data Points | Data Point Export

 

The data points preview is organized in tabs:

  1. Data Points

  2. Schema

The Data Points tab gives an overview of each annotated text (or text value) and its corresponding data point. Moreover text values can be manually edited.

The Schema tab’s purpose is to manage the data points, lists and groups in a schema. Moreover, Data Points | Schema Validation & Automation can be configured.

image-20250320-073009.png
The Data Points tab in the Data Points preview shows the data point text values (e.g. “8.0”) with the corresponding data point (“Version”), as well as repeating text values (“H317” for “Code”) in lists (“Classification”) and groups (“Header Data Points”)
image-20250320-075224.png
The Schema tab in the Data Points preview lets the user organize the schema and set validation and automation

 

 

 

 

Extensions v37

For special customer use cases that cannot be addressed using the standard data point tooling, Acodis offers customized Extensions.

If Extensions are available for a document type, the Data Points Preview includes an additional tab labeled Configuration.

image-20250523-092325.png
The Configuration tab is only available when extensions are available.

From the Configuration tab, Extensions can be added and arranged. They are applied in the configured order.

image-20250523-092557.png
Configuration of Extensions

For more information about the capabilities of Extensions, please contact Customer Success.

Schema Validation & Automation

For each data point, validation and automation rules can be configured in the Schema tab. These rules ensure that the text value of a data point has the right format and let the user configure transformations of the text.

image-20250324-212316.png
The schema tab lets the user configure specific validation and automation rules for each data point

The validation and automation rules are executed in the following order, as outlined in the figure below:

  1. Data Points | Export Automation v26

  2. Data Points | Data Transformation

  3. Data Points | Validation Rules

  4. Data Points | Data Type

image-20250326-181354.png
The specified validation and automation rules are executed in the order outlined in the figure
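
The execution order can be sketched in Python as a small pipeline. This is a simplified illustration under assumptions, not the platform's implementation: the function process_value, its parameters and the helper functions are hypothetical names, and the error messages are invented for the example.

```python
import re
from typing import Callable, List, Optional, Tuple


def process_value(
    value: Optional[str],
    default: Optional[str],
    transforms: List[Callable[[str], str]],
    rules: List[Tuple[str, Callable[[str], bool]]],
    type_check: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Apply the four stages in the documented order and collect errors."""
    errors: List[str] = []
    # 1. Export automation: fall back to the default if the value is empty.
    text = value or ""
    if not text.strip() and default is not None:
        text = default
    # 2. Data transformation: apply each transformation in turn.
    for transform in transforms:
        text = transform(text)
    # 3. Validation rules: every violated rule yields an error.
    for name, rule in rules:
        if not rule(text):
            errors.append(f"validation rule '{name}' failed")
    # 4. Data type: the final text must be convertible to the data type.
    if not type_check(text):
        errors.append("data type check failed")
    return text, errors


def strip_non_digits(text: str) -> str:
    return re.sub(r"\D", "", text)


def length_4_to_6(text: str) -> bool:
    return 4 <= len(text) <= 6
```

For example, process_value("12a45", None, [strip_non_digits], [("length 4-6", length_4_to_6)], str.isdigit) first removes the letter and then validates the remaining text “1245” without errors.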

If the data point text value violates the selected Data Points | Data Type validation or one of the Data Points | Validation Rules:

  1. the data points step will produce a review message of type error (Review Messages | Error)

  2. the Data Points tab in the Data Points | Data Points Preview will highlight the data point in red

    image-20250320-180332.png
    The text value “30.02.2021” is not a valid value for the data point “Revision Date” of type date.
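
The date example can be reproduced with a short Python sketch. It assumes that the text value is written as day.month.year; how the platform actually parses dates is not described here, so the function to_export_date and its input_format parameter are hypothetical.

```python
from datetime import datetime
from typing import Optional


def to_export_date(text: str, input_format: str = "%d.%m.%Y") -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if the text is not a valid date."""
    try:
        parsed = datetime.strptime(text.strip(), input_format)
    except ValueError:
        # e.g. "30.02.2021": February has no 30th day
        return None
    return parsed.strftime("%Y-%m-%d")
```

In this sketch, “30.02.2021” is rejected because the day does not exist, while “28.02.2021” would be exported as “2021-02-28”.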

The options available to specify validation and automation rules are described in the following sub-chapters.

Data Type

Specifying a data type for your data point lets you define the format you expect from the text value of this data point. E.g. if you have a data point “Revision Date”, you want the text value to be convertible to a date.

image-20250325-061335.png
The data type can be selected from the dropdown in the schema tab for each data point

The available data types for data points are listed below, each with its description, validation and export format:

image-20250326-172206.png Single Line Text
  Description: A text to be exported as a single line
  Validation: no validation applies
  Export Format: text value is exported without line-breaks

image-20250326-172244.png Multi Line Text
  Description: A text to be exported as multiple lines
  Validation: no validation applies
  Export Format: text value is exported including line-breaks

image-20250326-172411.png Float
  Description: A numeric value
  Validation: text value can be converted to a numeric value
  Export Format: text value is exported as numeric value

image-20250326-172303.png Integer
  Description: An integer value
  Validation: text value can be converted to an integer
  Export Format: text value is exported as integer

image-20250326-172323.png IBAN
  Description: A text-numeric value that follows the ISO 13616 standard for an International Bank Account Number (IBAN)
  Validation: text value follows the ISO 13616 standard for IBAN
  Export Format: text-numeric value is exported as text

image-20250326-172337.png Date
  Description: A date value
  Validation: text value can be converted to a date
  Export Format: text value is exported in the format YYYY-MM-DD

image-20250326-172352.png Chapter
  Description: The content of a (sub-)chapter. This data type can be used if the content of a whole chapter needs to be tagged. The content of a chapter spans from a title until the occurrence of another title of the same or a higher hierarchical level. If this data type is chosen, a https://acodis.atlassian.net/wiki/spaces/DOC/pages/870645762 step is required in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814710786, appearing before this step. When annotating a text element, the annotation will auto-correct to the next title above, defined by the structure step.
  Validation: no validation applies
  Export Format: the text content of the chapter defined by the annotated title is exported

DataPoint_Chapter_HowToUse.gif
To tag the whole content of the chapter “2.3 Other hazards” a data point of type “Chapter” can be used to export the chapter content. The content spans until the next higher title. This is the title of “SECTION 3:…” in this case.
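
The chapter span rule can be illustrated with a minimal Python sketch. It assumes that the document is available as a flat sequence of elements where titles carry a hierarchy level (1 = highest); the Element class, both functions and the body texts are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    text: str
    title_level: Optional[int] = None  # None for body text, 1 = highest level


def nearest_title_above(elements: List[Element], index: int) -> Optional[int]:
    """Find the closest title at or above the annotated element."""
    for i in range(index, -1, -1):
        if elements[i].title_level is not None:
            return i
    return None


def chapter_content(elements: List[Element], title_index: int) -> List[str]:
    """Collect text after a title until a title of the same or a higher level."""
    start = elements[title_index]
    if start.title_level is None:
        raise ValueError("title_index must point at a title")
    content = []
    for element in elements[title_index + 1:]:
        # A smaller or equal level number means the same or a higher level.
        if element.title_level is not None and element.title_level <= start.title_level:
            break
        content.append(element.text)
    return content


elements = [
    Element("SECTION 2", 1),
    Element("2.3 Other hazards", 2),
    Element("first paragraph"),   # hypothetical body text
    Element("second paragraph"),  # hypothetical body text
    Element("SECTION 3", 1),
    Element("third paragraph"),   # hypothetical body text
]
```

Here, annotating the element at index 3 resolves to the title “2.3 Other hazards” (index 1), and chapter_content(elements, 1) returns the two paragraphs before “SECTION 3”.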

Data Transformation

Certain data types allow data transformations that are applied to the corresponding text values of the data point. The following data transformations are available:

Text Replace: Replaces any occurrence of the search text in the text value with the replace text.

image-20250326-170543.png
In the text of the data point “SDS Number”, all occurrences of “S” are replaced with “X”

Regex Replace: Replaces any text in the text value that matches the regular expression, as configured in the “Data transformation” field.

image-20250326-171327.png
In the text of the data point “SDS Number”, all characters that match the regular expression \D (i.e. all non-digit characters) are replaced with “X”

Numeric Correction: Removes invalid characters or allows custom decimal separators for data points of type image-20250326-172411.png Float and image-20250326-172303.png Integer.

image-20250326-171817.png
Non-numbers can be automatically removed or a custom decimal separator can be specified
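
The three transformations can be approximated in Python as follows. This is a sketch under assumptions: the function names are hypothetical, the exact regular expression dialect of the platform is not specified here (Python's re module is used), and the numeric correction shown keeps only ASCII digits, a minus sign and the configured decimal separator.

```python
import re


def text_replace(value: str, search: str, replacement: str) -> str:
    """Replace every literal occurrence of the search text."""
    return value.replace(search, replacement)


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every match of the regular expression."""
    return re.sub(pattern, replacement, value)


def numeric_correction(value: str, decimal_separator: str = ".") -> str:
    """Drop non-numeric characters and normalise the decimal separator to '.'."""
    allowed = "0123456789-" + decimal_separator
    kept = "".join(ch for ch in value if ch in allowed)
    return kept.replace(decimal_separator, ".")
```

With a hypothetical text value “SDS-123”, text_replace(value, "S", "X") gives “XDX-123” and regex_replace(value, r"\D", "X") gives “XXXX123”. With a custom decimal separator “,”, numeric_correction("1.234,50 kg", ",") gives “1234.50”.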

Validation Rules

Validation rules can be defined to make sure the data point text values have a specific expected format. This ensures that the extracted or entered data matches the expectation of the downstream system; otherwise an error is raised (Review Messages | Error).

E.g. a validation rule can ensure that exported US ZIP codes always have 5 digits.

image-20250326-174228.png
The text value of the data point “Code” should be between 4 and 6 characters long

The following validation rules are available for the respective data point Data Points | Data Type:

Date Range
  Description: Specify a range of dates that are allowed for this data point
  Available for: image-20250326-172337.png Date

Length Range
  Description: Specify the minimum and/or maximum length of the text value that is allowed for this data point
  Available for: image-20250326-172206.png Single Line Text, image-20250326-172244.png Multi Line Text

Match Regular Expression
  Description: Specify a regular expression that the text value of this data point needs to match
  Available for: image-20250326-172206.png Single Line Text, image-20250326-172244.png Multi Line Text

Numeric Range
  Description: Specify the minimum and/or maximum numeric value that is allowed for the text value of this data point
  Available for: image-20250326-172303.png Integer, image-20250326-172411.png Float
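
As an illustration, the four validation rules can be expressed as small Python predicates. The function names and parameters are hypothetical. Whether the platform requires the regular expression to match the whole text value or only a part of it is not stated here; this sketch assumes a full match.

```python
import re
from datetime import date
from typing import Optional


def length_range(value: str, minimum: Optional[int] = None,
                 maximum: Optional[int] = None) -> bool:
    """Check the text length against an optional minimum and/or maximum."""
    n = len(value)
    return (minimum is None or n >= minimum) and (maximum is None or n <= maximum)


def matches_regex(value: str, pattern: str) -> bool:
    """Assumption: the whole text value must match the pattern."""
    return re.fullmatch(pattern, value) is not None


def numeric_range(value: float, minimum: Optional[float] = None,
                  maximum: Optional[float] = None) -> bool:
    """Check a numeric value against an optional minimum and/or maximum."""
    return (minimum is None or value >= minimum) and (maximum is None or value <= maximum)


def date_range(value: date, earliest: Optional[date] = None,
               latest: Optional[date] = None) -> bool:
    """Check a date against an optional earliest and/or latest date."""
    return (earliest is None or value >= earliest) and (latest is None or value <= latest)
```

For example, length_range("H317", 4, 6) accepts the text value of the data point “Code”, and matches_regex(zip_code, r"\d{5}") would implement the US ZIP code rule mentioned above.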

Confidence Validation

v38

The data points step supports confidence validation in production. Thresholds can be configured per https://acodis.atlassian.net/wiki/spaces/AP/pages/418775067. The resulting messages appear during review.

The classifier generates a confidence for every annotation. If this confidence is below the configured threshold value, a warning is generated in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/860946433 Node. These messages can be resolved by verifying the page or by deleting the annotations with a low confidence score.
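
The threshold check can be sketched in Python as follows. The Annotation class, the confidence scale from 0 to 1, the example values and the wording of the warning are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    data_point: str
    text: str
    confidence: float  # assumed to be in the range 0..1


def low_confidence_warnings(annotations: List[Annotation], threshold: float) -> List[str]:
    """Return one warning per annotation whose confidence is below the threshold."""
    return [
        f"Low confidence ({a.confidence:.2f}) for '{a.data_point}': {a.text!r}"
        for a in annotations
        if a.confidence < threshold
    ]


annotations = [
    Annotation("Version", "8.0", 0.97),  # hypothetical confidence values
    Annotation("Code", "H317", 0.42),
]
```

With a threshold of 0.8, only the “Code” annotation in this sketch produces a warning.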

Export Automation

v26

Export automation can be used for

  1. validating that the data point text value is not empty

    image-20250326-175322.png
  2. specifying the default (dynamic v30) value if the data point text value is empty

    image-20250326-180630.png
    If the data point “Global SDS Identifier” has no text value, the text value is set to the text value of the data point “SDS Number”, concatenated with “_v” and the text value of the data point “Version”
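
Both uses of export automation can be sketched in Python. The functions require_non_empty and fill_default as well as the example values are hypothetical; the default mirrors the “Global SDS Identifier” example above.

```python
from typing import Callable, Dict, Optional


def is_empty(values: Dict[str, str], name: str) -> bool:
    return not (values.get(name) or "").strip()


def require_non_empty(values: Dict[str, str], name: str) -> Optional[str]:
    """Return an error message if the data point has no text value."""
    if is_empty(values, name):
        return f"Data point '{name}' must not be empty"
    return None


def fill_default(values: Dict[str, str], name: str,
                 build_default: Callable[[Dict[str, str]], str]) -> Dict[str, str]:
    """Return a copy where an empty data point is set to its (dynamic) default."""
    result = dict(values)
    if is_empty(result, name):
        result[name] = build_default(result)
    return result


values = {"SDS Number": "12345", "Version": "8.0", "Global SDS Identifier": ""}
filled = fill_default(
    values,
    "Global SDS Identifier",
    lambda v: f"{v['SDS Number']}_v{v['Version']}",
)
```

In this sketch, filled["Global SDS Identifier"] becomes “12345_v8.0”, while a data point that already has a text value is left unchanged.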

 

Data Point Export

The data points step produces export assets that can be downloaded directly from the Acodis platform or exported via one of the export channel nodes. The available formats for the data point export assets are:

  • XML

  • JSON

  • Excel

  • CSV

For more details refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/906362881

Bulk Export

The Excel export (v37) offers a special bulk export from the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321's Transaction Listing when multiple transactions are selected (see https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321/Workflow#Multi-Selection-Actions). When Excel is selected as the export format, the user can choose to export as ZIP or as bulk Excel:

image-20250515-165302.png
Configuration of Excel Bulk Export

Aggregation:

  • As zip - Produces a ZIP file containing exactly one Excel file per transaction with all data points in tabular form

  • As bulk Excel - Produces a single Excel file containing all entries in tabular form. Additionally, a Document Name and a Transaction Id are added to every entry so that entries can be traced back to their transactions.
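
The idea behind the bulk aggregation can be sketched in Python. To stay within the standard library, the sketch writes CSV instead of Excel; the input structure, the function names and the column order are assumptions and do not describe the actual export layout.

```python
import csv
import io
from typing import Dict, List


def bulk_rows(transactions: List[dict]) -> List[Dict[str, str]]:
    """Flatten all transactions into one table, tagging every entry with its origin."""
    rows = []
    for transaction in transactions:
        for entry in transaction["entries"]:
            row = {
                "Document Name": transaction["document_name"],
                "Transaction Id": transaction["transaction_id"],
            }
            row.update(entry)
            rows.append(row)
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise the rows, using the union of all keys as columns."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical transactions with one entry each
transactions = [
    {"document_name": "a.pdf", "transaction_id": "t-1", "entries": [{"Version": "8.0"}]},
    {"document_name": "b.pdf", "transaction_id": "t-2", "entries": [{"Version": "2.1"}]},
]
```

Each row in the resulting table carries the document name and transaction id, so an entry can be traced back to the transaction it came from.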

Prerequisites

This feature is available to users in the role of Administrators, Workflow managers and Case handlers.

To use the step, you need either:

These steps ensure that text is extracted and available for this step.

If you are using a data point of type “Chapter”, titles need to be present in the document. Titles can either be present on import (for example for MS Word files) or be provided by any of the following steps preceding the data points step:

Settings and Options

The settings dialog of the data points step offers configurations on:

  1. parameters for the model that is trained (refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776331)

 

image-20250326-195119.png
The settings dialog of the data points step

For details on the available parameters refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/911114304

How to use

Getting Started

To use the data points step, follow these steps:

  1. Open a document in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/867500034

  2. Click on Add to include the step in your analysis.

  3. Locate the step under the NEW column or the EXISTING column if you have this step created already.

  4. Select the just added step by clicking on it.

  5. If necessary, adjust the configuration settings via the contextual menu (right-click > Settings).

  6. Select the Data Points | Data Points Preview to define your data points and the schema, or to manage Data Points | Schema Validation & Automation

    image-20250319-213606.png
  7. Create a schema by either

    1. Manually adding your data points, groups and lists

      1. If you use lists and the data points inside the list appear in a document table, set “Optimize for Table Structure and Duplicate Merged Cells” to Yes. This optimizes the grouping of list elements according to the table structure. Access this setting via the three-dot menu on the list in the Data Points tab.

        image-20250407-153935.png
        Edit the settings of a list in a data points step
        image-20240820-082700.png
        The Optimize for Table Structure and Duplicate Merged Cells setting for lists
    2. Using Auto Schema to automatically create a schema from a selection area on the document (see Data Points | Auto Label & Auto Schema)

  8. Start creating annotations by either

    1. Run Analysis to create predictions from the defined model https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Run-Analysis

    2. Auto-label to create predictions using a general model (more details below Data Points | Auto Label & Auto Schema)

    3. Manually annotate the data points (more details below Data Points | Manual Annotation)

  9. Select the

    image-20250318-082014.png
  10. In the data points preview, download the data points as Excel, CSV, JSON or XML using the download button

    image-20250326-204038.png
  11. Mark fully annotated pages as verified using the toolbar at the bottom of the page

    image-20250318-082222.png

Manual Annotation

To manually annotate text using the data points step, drag the mouse across the text and select the desired label.

CreateAnnotationDrag.gif
Drag along the text and select the label

Shortcuts

Manage data point annotations with ease using keyboard shortcuts:

  • Select annotations while holding the SHIFT key

  • Un-select annotations while holding the SHIFT + CTRL (CMD on Mac) keys

  • Annotate multiple text segments while holding the CTRL (CMD on Mac) key, then release it to assign a label to all selected text segments

  • Annotate all text segments in a selection box while holding the ALT key

DataPoint_BulkSelectAnnotation.gif
Use the SHIFT key to select multiple annotations
DataPoint_BulkAnnotate.gif

 

DataPoint_BulkAnnotateSelect.gif

Auto-Label & Auto-Schema

For the data points step, there are two https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Auto-Labeling mechanisms to accelerate the initial labeling. These do not require any manual annotations to start with.

Auto-Label can be used to suggest annotations in a selected area when a schema has already been created (i.e. data points with data point names have already been added). Auto-Label will suggest annotations based on the data point names and the text in the selected area.

To use auto-label, select the “Auto Label” button in the top toolbar image-20250326-202800.png

AutoLabel.gif
Create annotation suggestions for the already defined data points “version”, “Revision Date”, “SDS Number” and “H410”.

Auto-Schema can be used to suggest a schema and annotations in a selected area. Auto-Schema will create data points and annotations based on the text in the selected area.

To use auto-schema, select the “Auto Schema” button in the top toolbar image-20250326-202800.png

DataPoint_AutoSchema.gif
Create a data point schema and annotation suggestions based on the selected area

 

Usage example

DataPoint_HowToUse.gif

 


For any questions you can contact our support team.