Fuzzy Grouping


The Fuzzy Grouping transform groups records based on the similarity between the values in one or more columns.

Two records that differ only by a possible misspelling can be grouped together for further analysis, or duplicates can be removed by selecting the Output Top Level Records Only option. The sensitivity of the match is adjusted by setting the Probability Threshold.
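To make the Probability Threshold idea concrete, here is a minimal sketch that scores the similarity of two values and treats them as a match only when the score meets the threshold. It uses Python's difflib.SequenceMatcher ratio as a stand-in similarity measure; that choice is an assumption for illustration, not the scoring method the Fuzzy Grouping transform actually uses, and the helper names similarity and is_fuzzy_match are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str, ignore_case: bool = False) -> float:
    """Return a score in [0.0, 1.0]; 1.0 means the strings are identical.

    SequenceMatcher.ratio() is a stand-in measure chosen for illustration only.
    """
    if ignore_case:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio()


def is_fuzzy_match(a: str, b: str, threshold: float, ignore_case: bool = False) -> bool:
    """Hypothetical helper: True when two values are similar enough to be grouped."""
    if not 0.0001 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0001 and 1.0")
    return similarity(a, b, ignore_case) >= threshold


# A likely misspelling scores high, but below 1.0.
print(is_fuzzy_match("Mountain Bike", "Mountian Bike", threshold=0.8))  # True
print(is_fuzzy_match("Mountain Bike", "Mountian Bike", threshold=1.0))  # False: 1.0 demands an exact match
```

Lowering the threshold makes grouping more permissive; raising it toward 1.0 makes it stricter.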

Transform - Fuzzy Grouping

1. Input

The Fuzzy Grouping transform requires 1 input transform that has at least 1 column.

The input can be a SQL Select transform or the output of another transform. For example, consider the following input data:

Input Data

2. Add the Transform

Steps to add the transform:

  1. Select the connector link.

    Adding the Fuzzy Grouping transform - Step 1

  2. Select the transform from the menu.

    Adding the Fuzzy Grouping transform - Step 2

  3. To edit or configure the transform, select the newly added transform and click the Configure menu.

    Adding the Fuzzy Grouping transform - Step 3

3. Configure

Steps to configure the Fuzzy Grouping transform:

For example, you want to group records whose values in a column are similar but not identical, such as names containing possible misspellings.

Fuzzy Grouping transform configuration

  1. Select the columns to be included in the output.
  2. Drag and Drop the column(s) you want to be part of the grouping.
  3. Enter the Probability Threshold. Valid values range from 0.0001 to 1.0. A value of 1.0 requires the input values to match exactly before they are grouped together.
  4. Select Ignore String Case if you want a case-insensitive match.
  5. Select Output Top Level Records Only if you want to omit duplicate records and output only the top-level record of each group.
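The sketch below maps configuration steps 1 to 4 onto a small grouping routine, as a way to reason about how the settings interact. Everything in it is an assumption for illustration: FuzzyGroupingSettings and fuzzy_group are hypothetical names, difflib stands in for the real similarity measure, multi-column scores are averaged, and records are assigned greedily to the first group whose top-level record is close enough. It is not how Dundas implements the transform.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class FuzzyGroupingSettings:
    """Hypothetical mirror of configuration steps 1 to 4; not a Dundas API."""
    output_columns: list[str]   # step 1: columns included in the output
    group_columns: list[str]    # step 2: columns used for grouping
    threshold: float = 0.8      # step 3: Probability Threshold, 0.0001 to 1.0
    ignore_case: bool = False   # step 4: Ignore String Case


def column_similarity(a, b, ignore_case: bool) -> float:
    """Stand-in similarity score in [0.0, 1.0] computed with difflib."""
    a, b = str(a), str(b)
    if ignore_case:
        a, b = a.lower(), b.lower()
    return SequenceMatcher(None, a, b).ratio()


def fuzzy_group(records: list[dict], settings: FuzzyGroupingSettings) -> list[list[dict]]:
    """Greedily place each record in the first group whose top-level record is similar enough.

    Returns a list of groups; the first record of each group is its top-level record.
    """
    if not 0.0001 <= settings.threshold <= 1.0:
        raise ValueError("threshold must be between 0.0001 and 1.0")
    if not settings.group_columns:
        raise ValueError("at least one grouping column is required")
    groups: list[list[dict]] = []
    for record in records:
        for group in groups:
            top = group[0]
            # Assumption: average the per-column scores across the grouping columns.
            scores = [column_similarity(record[c], top[c], settings.ignore_case)
                      for c in settings.group_columns]
            if sum(scores) / len(scores) >= settings.threshold:
                group.append(record)
                break
        else:
            # No existing group is close enough, so this record starts a new group.
            groups.append([record])
    # Keep only the columns selected for output.
    return [[{c: r[c] for c in settings.output_columns} for r in group] for group in groups]


rows = [
    {"ID": 1, "Name": "Mountain Bike"},
    {"ID": 2, "Name": "Mountian Bike"},
    {"ID": 3, "Name": "mountain bike"},
    {"ID": 4, "Name": "Road Helmet"},
]
settings = FuzzyGroupingSettings(output_columns=["ID", "Name"], group_columns=["Name"],
                                 threshold=0.9, ignore_case=True)
for group in fuzzy_group(rows, settings):
    print([r["ID"] for r in group])
# [1, 2, 3]
# [4]
```

With ignore_case left at False, the lowercase "mountain bike" scores below 0.9 against "Mountain Bike" in this sketch and ends up in its own group, which shows why the Ignore String Case option matters.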

4. Output

The figures below illustrate the output from the Fuzzy Grouping transform.

  • With the Output Top Level Records Only option not selected:

  • With the Output Top Level Records Only option selected:
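The difference between the two output modes can be sketched as a function that flattens already-formed groups into rows. This is a hypothetical illustration: to_output_rows and the GroupId column are invented here so that grouped records can be told apart, and treating the first record of each group as its top-level record is an assumption, not documented Dundas behavior.

```python
def to_output_rows(groups: list[list[dict]], top_level_only: bool) -> list[dict]:
    """Flatten fuzzy groups into output rows.

    GroupId is a hypothetical column added here so grouped records can be told apart.
    """
    rows = []
    for group_id, group in enumerate(groups, start=1):
        # With top_level_only, only the first (assumed top-level) record of each group is kept.
        members = group[:1] if top_level_only else group
        for record in members:
            rows.append({"GroupId": group_id, **record})
    return rows


groups = [
    [{"Name": "Mountain Bike"}, {"Name": "Mountian Bike"}],
    [{"Name": "Road Helmet"}],
]
print(to_output_rows(groups, top_level_only=False))  # three rows; both bike spellings share GroupId 1
print(to_output_rows(groups, top_level_only=True))   # two rows, one per group
```

Keeping every record is useful when reviewing which values were matched; keeping only the top-level records acts as de-duplication.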

5. See also

Dundas Data Visualization, Inc.
500-250 Ferrand Drive
Toronto, ON, Canada
M3C 3G8

North America: 1.800.463.1492
International: 1.416.467.5100

Dundas Support Hours: 7am-6pm, ET, Mon-Fri