Data Points
Available since: v.18.1.0 Mar 1, 2022 | Status: Active |
Table of Contents
- 1 Overview
- 1.1 Description
- 1.1.1 Data Points Preview
- 1.1.2 Extensions v37
- 1.1.3 Schema Validation & Automation
- 1.1.3.1 Data Type
- 1.1.3.2 Data Transformation
- 1.1.3.3 Validation Rules
- 1.1.3.4 Confidence Validation
- 1.1.3.5 Export Automation
- 1.1.4 Data Point Export
- 1.1.4.1 Bulk Export
- 1.2 Prerequisites
- 1.3 Settings and Options
- 2 How to use
- 2.1 Getting Started
- 2.1.1 Manual Annotation
- 2.1.1.1 Shortcuts
- 2.1.2 Auto-Label & Auto-Schema
- 2.2 Usage example
- 3 Sub-Pages
Overview
Description
The data points step is used to annotate text elements with labels. The annotated text is tagged with the respective label in the export.
A Data Point describes the name of a text element (sometimes also referred to as the semantic of a text or its “Label”).
A Group can be used to logically group different data points together. E.g. a group “Header Data Points” might contain all data points that appear at the beginning of a document.
A List can be used to handle data points that appear a (potentially unknown) number of times across the document. Lists can contain multiple data points or even other lists.
A Schema refers to all the data points, groups and lists in a data points step. It defines how your data points are grouped and how they appear in the Data Point Export.
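The relationship between these four terms can be sketched as a small data model. This is only an illustration and not the platform's internal representation: the class names, the helper `data_point_names`, and the example list “Ingredients” with its data points are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataPoint:
    """A named label for a text element, e.g. "Revision Date"."""
    name: str


@dataclass
class DataPointList:
    """Elements that may repeat an unknown number of times; lists may nest."""
    name: str
    children: List[Union[DataPoint, "DataPointList"]] = field(default_factory=list)


@dataclass
class Group:
    """Logically groups data points, e.g. everything in the document header."""
    name: str
    children: List[DataPoint] = field(default_factory=list)


@dataclass
class Schema:
    """All data points, groups and lists of one data points step."""
    children: List[Union[DataPoint, Group, DataPointList]] = field(default_factory=list)


def data_point_names(node) -> List[str]:
    """Collect the names of all data points below a node, depth first."""
    if isinstance(node, DataPoint):
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(data_point_names(child))
    return names


# Hypothetical schema: one group for header fields, one list for repeated rows.
schema = Schema([
    Group("Header Data Points", [DataPoint("SDS Number"), DataPoint("Revision Date")]),
    DataPointList("Ingredients", [DataPoint("Name"), DataPoint("Concentration")]),
])
```

Calling `data_point_names(schema)` walks the groups and lists and returns the data point names in schema order.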
Data Point annotations can be created through
global annotation predictions (refer to Annotation Steps)
Data Points Preview
The data points step also offers a step-specific preview in the Analysis Document Viewer (see Previews) called “Data Points”.
The data points preview lets the user
get an overview of the data points and their annotated text
download any of the data point exports (see Data Point Export)
The data points preview is organized in tabs:
Data Points
Schema
The Data Points tab gives an overview of each annotated text (or text value) and its corresponding data point. Text values can also be edited manually here.
The Schema tab’s purpose is to manage the data points, lists and groups in a schema. Schema Validation & Automation can also be configured here.
Extensions v37
For special customer use cases that cannot be addressed using the standard data point tooling, Acodis offers customized Extensions.
If Extensions are available for a document type, the Data Points Preview includes an additional tab labeled Configuration.
From the Configuration tab, Extensions can be added and arranged. They are applied in the configured order.
For more information about the capabilities of Extensions, please contact Customer Success.
Schema Validation & Automation
For each data point, validation and automation rules can be configured in the Schema tab. These rules ensure that the text value of a data point has the right format and let the user configure how the text is transformed.
Validation and automation are executed in the order outlined in the figure below.
If the data point text value violates the selected Data Type validation or one of the Validation Rules:
the data points step will produce a review message of type error (see Review Messages | Error)
the data points tab in the Data Points Preview will highlight the data point in red
Example: The text value “30.02.2021” is not a valid value for the data point “Revision Date” of type date.
The options available to specify validation and automation rules are described in the following sub-chapters.
Data Type
Specifying a data type for your data point lets you define which format you expect from the text value of this data point. E.g. if you have a data point “Revision Date”, you want the text value to be convertible to a date.
The available data types for data points are listed in the table below.
Data Type | Description | Validation | Export Format |
---|---|---|---|
| A text to be exported as a single line | no validation applies | text value is exported without line breaks |
| A text to be exported as multiple lines | no validation applies | text value is exported including line breaks |
Float | A numeric value | text value can be converted to a numeric value | text value is exported as numeric value |
Integer | An integer value | text value can be converted to an integer | text value is exported as integer |
IBAN | A text-numeric value that follows the ISO 13616 standard for an International Bank Account Number (IBAN) | text value follows the ISO 13616 standard for IBAN | text-numeric value is exported as text |
Date | A date value | text value can be converted to a date | text value is exported in the format YYYY-MM-DD |
Chapter | The content of a (sub-)chapter. This data type can be used if the content of a whole chapter needs to be tagged. The content of a chapter spans from a title until the next title of the same or a higher hierarchical level. If you choose this data type, a https://acodis.atlassian.net/wiki/spaces/DOC/pages/870645762 step is required in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814710786 before this step. When annotating a text element, the annotation auto-corrects to the next title above, as defined by the structure step. | no validation applies | The text content of the chapter defined by the annotated title is exported. Example: to export the whole content of the chapter “2.3 Other hazards”, annotate its title with a data point of type “Chapter”. The content spans until the next title of the same or a higher level, which is the title of “SECTION 3:…” in this case. |
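The validation column of the table can be illustrated with a short Python sketch. It is not the platform's implementation: the function names are hypothetical, the input date format `DD.MM.YYYY` is assumed from the “30.02.2021” example, and the IBAN check covers only the mod-97 checksum, not the country-specific length rules.

```python
from datetime import datetime


def validate_date(text: str, input_format: str = "%d.%m.%Y") -> str:
    """Return the date as YYYY-MM-DD; raise ValueError if it is not a real date."""
    # Assumption: dates arrive as DD.MM.YYYY, as in the "30.02.2021" example.
    return datetime.strptime(text.strip(), input_format).strftime("%Y-%m-%d")


def validate_integer(text: str) -> int:
    """Return the value as int; raise ValueError if it cannot be converted."""
    return int(text.strip())


def validate_iban(text: str) -> str:
    """Check the mod-97 checksum of an IBAN and return it without spaces."""
    iban = text.replace(" ", "").upper()
    if len(iban) < 5 or not iban.isalnum():
        raise ValueError(f"not an IBAN: {text!r}")
    # Move country code and check digits to the end, then map A..Z to 10..35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    if int(digits) % 97 != 1:
        raise ValueError(f"IBAN checksum failed: {text!r}")
    return iban
```

With these helpers, `validate_date("30.02.2021")` raises a `ValueError` because February has no 30th day, which corresponds to the error review message described above.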
Data Transformation
Certain data types allow data transformations that are applied to the corresponding text values of the data point. The following data transformations are available:
Text Replace: Replaces any occurrence of the search text in the text value with the replace text.
Regex Replace: Replaces any text in the text value that matches the regular expression, according to the expression entered in the “Data transformation” field.
Numeric Correction: Removes invalid characters or allows custom decimal separators for data points of type Float and Integer.
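The three transformations can be sketched in Python as follows. The function names and parameters are hypothetical, and which characters the platform treats as invalid for Numeric Correction is not documented here; the sketch assumes everything except digits, minus signs and the configured decimal separator is removed.

```python
import re


def text_replace(value: str, search: str, replace: str) -> str:
    """Replace every occurrence of the search text."""
    return value.replace(search, replace)


def regex_replace(value: str, pattern: str, replacement: str) -> str:
    """Replace every match of the regular expression."""
    return re.sub(pattern, replacement, value)


def numeric_correction(value: str, decimal_separator: str = ".") -> str:
    """Keep digits, minus signs and the decimal separator; normalise it to '.'."""
    # Assumption: any other character (currency, units, thousands marks) is invalid.
    allowed = set("0123456789-") | {decimal_separator}
    cleaned = "".join(ch for ch in value if ch in allowed)
    return cleaned.replace(decimal_separator, ".")
```

For example, `numeric_correction("1.234,50 EUR", decimal_separator=",")` drops the thousands mark and the currency and returns `"1234.50"`, which can then be converted to a Float.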
Validation Rules
Validation rules can be defined to make sure the data point text values have a specific expected format. This ensures that the extracted or entered data matches the expectation of the downstream system; otherwise an error is raised (see Review Messages | Error).
E.g. a validation rule can ensure that exported US ZIP codes always have 5 digits.
The following validation rules are available, depending on the Data Type of the data point:
Validation Rule | Description | Available for Data Type |
---|---|---|
Date Range | Specify a range of dates that are allowed for this data point | |
Length Range | Specify the minimum and/or maximum length of the text value that is allowed for this data point | |
Match Regular Expression | Specify a regular expression that the text value of this data point needs to match | |
Numeric Range | Specify the minimum and/or maximum numeric value that is allowed for the text value of this data point | |
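The four rules can be sketched in Python as below. The function names are hypothetical. Two points are assumptions rather than documented behavior: range bounds are treated as inclusive, and the regular expression has to match the whole text value.

```python
import re
from datetime import date


def in_range(value, minimum=None, maximum=None) -> bool:
    """True if value lies within the optional bounds (assumed inclusive)."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


def check_date_range(value: date, earliest=None, latest=None) -> bool:
    return in_range(value, earliest, latest)


def check_length_range(text: str, min_length=None, max_length=None) -> bool:
    return in_range(len(text), min_length, max_length)


def check_numeric_range(text: str, minimum=None, maximum=None) -> bool:
    return in_range(float(text), minimum, maximum)


def matches_regex(text: str, pattern: str) -> bool:
    # Assumption: the whole text value must match, not just a part of it.
    return re.fullmatch(pattern, text) is not None
```

The ZIP code example above then becomes `matches_regex(value, r"\d{5}")`, which accepts `"94105"` and rejects `"9410"`.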
Confidence Validation
v38
The data point step supports confidence validation in production. Thresholds can be configured per https://acodis.atlassian.net/wiki/spaces/AP/pages/418775067. These messages will appear during review.
The classifier generates a confidence for every annotation. If this confidence is below the configured threshold value, a warning is generated in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/860946433 Node. The messages can be handled by verifying the page or by deleting the annotations with a low confidence score.
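The threshold check can be pictured with the following sketch. The `Annotation` type, the message text and the confidence values are made up for illustration; only the rule “confidence below threshold produces a warning” is taken from the description above.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    data_point: str
    text: str
    confidence: float  # hypothetical scale from 0.0 to 1.0


def low_confidence_warnings(annotations: List[Annotation], threshold: float) -> List[str]:
    """Return one warning message per annotation below the threshold."""
    return [
        f'Low confidence ({a.confidence:.2f}) for "{a.data_point}": {a.text}'
        for a in annotations
        if a.confidence < threshold
    ]


annotations = [
    Annotation("Revision Date", "28.02.2021", 0.97),
    Annotation("Version", "2", 0.41),
]
warnings = low_confidence_warnings(annotations, threshold=0.8)
```

Here only the “Version” annotation falls below the threshold, so `warnings` contains a single message.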
Export Automation
v26
Export automation can be used for
validating that the data point text value is not empty
specifying the default (dynamic v30) value if the data point text value is empty
Example: If the data point “Global SDS Identifier” has no text value, the text value is set to the text value of the data point “SDS Number”, concatenated with “_v” and the text value of the data point “Version”.
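The dynamic default from this example can be sketched as a small function. The helper `apply_default` and the sample values are hypothetical; the way such a rule is actually entered in the Schema tab is not shown here.

```python
from typing import Callable, Dict


def apply_default(values: Dict[str, str], name: str,
                  default: Callable[[Dict[str, str]], str]) -> Dict[str, str]:
    """Fill the data point `name` with a computed default if it is empty."""
    result = dict(values)  # leave the input untouched
    if not (result.get(name) or "").strip():
        result[name] = default(result)
    return result


values = {"SDS Number": "12345", "Version": "2", "Global SDS Identifier": ""}
filled = apply_default(
    values,
    "Global SDS Identifier",
    lambda v: f'{v["SDS Number"]}_v{v["Version"]}',
)
```

After the call, `filled["Global SDS Identifier"]` is `"12345_v2"`, while a data point that already has a text value is left unchanged.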
Data Point Export
The data points step produces export assets that can be downloaded directly from the Acodis platform or exported via one of the export channel nodes. The available formats for the data point export assets are
XML
JSON
Excel
CSV
For more details refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/906362881
Bulk Export
The Excel export (v37) offers a special bulk export from the https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321's Transaction Listing when multiple transactions are selected (see https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776321/Workflow#Multi-Selection-Actions). When Excel is selected as the export format, the user can choose to export as ZIP or as bulk Excel:
Aggregation:
As ZIP - Produces a ZIP file containing exactly one Excel file per transaction with all data points in tabular form
As bulk Excel - Produces a single Excel file containing all entries in tabular form. Additionally, a Document Name and a Transaction Id are added to every entry so that entries can be traced back to their transactions.
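The bulk aggregation can be sketched as follows. The input structure (`document_name`, `transaction_id`, `rows`) is an assumption, and CSV stands in for Excel because the Python standard library has no Excel writer; the point is only how the two tracking columns are added to every entry.

```python
import csv
import io
from typing import Dict, List


def bulk_rows(transactions: List[dict]) -> List[Dict[str, str]]:
    """Flatten all transactions into one table with two tracking columns."""
    rows = []
    for transaction in transactions:
        for row in transaction["rows"]:
            rows.append({
                "Document Name": transaction["document_name"],
                "Transaction Id": transaction["transaction_id"],
                **row,
            })
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Write the rows as CSV; columns appear in first-seen order."""
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical transactions with one entry each.
transactions = [
    {"document_name": "a.pdf", "transaction_id": "t-1",
     "rows": [{"SDS Number": "12345", "Version": "2"}]},
    {"document_name": "b.pdf", "transaction_id": "t-2",
     "rows": [{"SDS Number": "67890", "Version": "1"}]},
]
table = bulk_rows(transactions)
```

Each row in `table` carries the document name and transaction id of its source, which is what makes the single bulk file traceable.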
Prerequisites
This feature is available to users in the role of Administrators, Workflow managers, and Case handlers.
To use the step, you need either:
These steps ensure that text is extracted and available for this step.
If you are using a data point of type “Chapter”, titles need to be present in the document. Titles can either be present on import (for example, for MS Word files) or be created by any of the following steps preceding the data points step:
https://acodis.atlassian.net/wiki/spaces/DOC/pages/870645762
https://acodis.atlassian.net/wiki/spaces/DOC/pages/879657002
Settings and Options
The settings dialog of the data points step offers configurations on:
parameters for the model that is trained (refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/814776331)
For details on the available parameters refer to https://acodis.atlassian.net/wiki/spaces/DOC/pages/911114304
How to use
Getting Started
To use the data points step, follow these steps:
Open a document in the https://acodis.atlassian.net/wiki/spaces/DOC/pages/867500034
Click on Add to include the step in your analysis.
Locate the step under the NEW column or the EXISTING column if you have this step created already.
Select the just added step by clicking on it.
If necessary, adjust the configuration settings via the contextual menu (right-click > Settings).
Select the Data Points Preview to define your data points and the schema, or to manage Schema Validation & Automation
Create a schema by either
Manually adding your data points, groups and lists
If you use lists and the data points inside the list appear in a document table, set “Optimize for Table Structure and Duplicate Merged Cells” to Yes. This optimizes the grouping of list elements according to the table structure. Access this setting via the three-dot menu on the list in the data points tab.
Edit the settings of a list in a data points step
The Optimize for Table Structure and Duplicate Merged Cells setting for lists
Auto Schema to automatically create a schema from a selection area on the document (see Auto-Label & Auto-Schema)
Start creating annotations by either
Run Analysis to create predictions of data points from the defined model https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Run-Analysis
Auto-label to create predictions using a general model (for more details, see Auto-Label & Auto-Schema below)
Manually annotate the data points (for more details, see Manual Annotation below)
Select the
In the data points preview, download the data points as Excel, CSV, JSON or XML using the download button
Mark fully annotated pages as verified using the toolbar at the bottom of the page
Manual Annotation
To manually annotate text using the data points step, drag the mouse along the text and select the desired label for it.
Shortcuts
Manage data point annotations with ease using keyboard shortcuts:
Select annotations while holding the SHIFT key
Un-select annotations while holding the SHIFT + CTRL (CMD on Mac) keys
Annotate multiple text segments while holding the CTRL (CMD on Mac) key, then release it to assign a label to all selected text segments
Annotate all text segments in a selection box while holding the ALT key
Auto-Label & Auto-Schema
For the data points step, there are two https://acodis.atlassian.net/wiki/spaces/DOC/pages/867762197/Annotation+Steps#Auto-Labeling mechanisms to accelerate the initial labeling. These do not require any manual annotations to start with.
Auto-Label can be used to suggest annotations in a selected area when a schema has already been created (i.e. data points with data point names have already been added). Auto-Label will suggest annotations based on the data point names and the text in the selected area.
To use auto-label, select the “Auto Label” button in the top toolbar
Auto Schema can be used to suggest a schema and annotations in a selected area. Auto-Schema will create data points and annotations based on the text in the selected area.
To use auto-schema, select the “Auto Schema” button in the top toolbar
Usage example
Sub-Pages
For any questions you can contact our support team.