Using a data cube to join tables

Contents[Hide]

1. Overview

Use data cubes to integrate data from different sources, transform the data, and produce a reusable data model with optional data storage capabilities.

This walkthrough shows you how to:

  • Create a new data cube.
  • Set up an ETL process consisting of transforms.
  • Use transforms to join three database tables.
  • Link the output columns of a data cube to existing hierarchies.

You need to be a user with a Developer seat to create or edit a data cube. The figures use the Adventure Works for SQL Server 2012 database as an example.

Related video: Introduction to Data Cubes

Tip
When editing a metric set, you can also join tables and data sources automatically without editing a data cube.

2. Walkthrough

2.1. Create a new data cube

From the main menu, click New, click Data Cube, and select the Blank Data Cube option.

New data cube
New data cube

Note
You need to be a developer user to create and edit data cubes.

The data cube editor is displayed.

2.2. Add a database table

Next, locate your data connector in the Explore window.

Under the data connector, find the [Sales].[SalesOrderDetail] and [Sales].[SalesOrderHeader] tables.

Drag the [Sales].[SalesOrderDetail] table to the canvas.

When you add the first table, Dundas BI automatically creates a simple ETL process consisting of two transforms, with data flowing from left to right. Note that the name of your new data cube (e.g., Data Cube 1) is displayed in the status bar at the bottom.

ETL process
ETL process

The first node in the ETL process is a SQL Select transform, which represents a query to select data from the [Sales].[SalesOrderDetail] table.

The second node is a Process Result transform, which represents the output or result of the data cube (ETL process). This transform doesn't do any data processing but it allows you to configure the measures and hierarchies that will be made available to downstream items, such as metric sets.

An ETL process can have multiple inputs (for example, multiple Select transforms that query data from different databases) but always only a single output (the process result).

2.3. Add a second database table

Add the second table [Sales].[SalesOrderHeader] to the canvas. The second SQL Select transform appears highlighted, which means the transform is not yet connected to the ETL process.

Drag second table to canvas
Drag second table to canvas

2.4. Insert a join transform node

Click to select the connection link between the first SQL Select transform and the Process Result transform. 

From the toolbar, click Insert Common and select Join.

Select connection link and insert Join a transform
Select connection link and insert Join a transform

Note
Most transformations are incorporated into a query to be executed by your database or other data source in order to optimize performance. Certain transforms such as Calculated Element can introduce icons with an exclamation mark in the top-left corner of the transform, indicating that data will be taken in-memory for processing at this point.

A Join transform is inserted into the ETL process where the link was selected between the first SQL Select and the Process Result. If nothing was selected on the canvas, the new transform will be disconnected and you can click and drag between the transforms to connect them. If a transform was selected, the new transform would be placed after it and connected to it.

In our example, the join transform is highlighted in red because it requires two inputs but only one input is connected.

Join transform inserted
Join transform inserted

Tip
If you make a mistake connecting transforms, you can break a connection link by selecting the link and then clicking Disconnect from the toolbar.

2.5. Connect the 2nd select transform to the join transform

To connect the second SQL Select transform to the join transform, drag a connection line from the SQL Select to the join.

Connect the two transforms by dragging a line
Connect the two transforms by dragging a line

Alternatively, you can hold the Ctrl or Shift keys while clicking to select the two transforms, and then choose Connect in the toolbar.

With both SQL Select transforms connected to the join transform as inputs, the highlight color has disappeared.

Transforms are connected
Transforms are connected

Tip
Instead of inserting a join transform and connecting it yourself, you can simply drag a connecting line between the two transforms that you want to join and a join transform will be inserted and connected automatically.

Shortcut: drag a connecting line between two transforms to join them
Shortcut: drag a connecting line between two transforms to join them

2.6. Configure the join transform

Once the join transform has been connected, you typically need to configure it. You can click to select the transform and choose Configure in the toolbar or from the right-click context menu.

Configure Join transform
Configure Join transform

The Join configuration dialog is displayed.

Join configuration dialog
Join configuration dialog

There are three main steps to perform when configuring a join:

  1. Choose the Join Type (which can be Inner Join, Left Join, Right Join, or Full Join).

  2. Set up the unique key binding, indicated by key icons next to columns containing matching values from each table. To add one, drag a column from one list and drop it over the matching column from the other list. In this example, Dundas BI automatically identifies SalesOrderID as the unique key binding based on the native relationships defined in the database.

  3. Use the checkboxes to select or de-select columns for inclusion in the output. For example, select OrderQty, ProductID, and SalesOrderID from the order details, and OrderDate and SalesPersonID from the order headers.

See the Join transform article for more details on these options.

2.7. Join with a third database table

The ETL process so far joins two database tables. To add a third table, because a join transform has only two inputs, you will need to insert a second join transform into the ETL process.

In the Explore window, locate the [Person].[Person] table and drag it to the canvas. The table appears as a third SQL Select transform, which is highlighted because it is not connected to the ETL process yet.

Next, select the last connection link, click Insert Common in the toolbar, and choose Join. This inserts a second join transform into the ETL process. The join transform is highlighted in color because it has only one input connected.

Add a connection between the SQL Select transform for the Person table and the new join transform, for example by clicking and dragging between them.

With all the transforms connected, the final ETL process will appear similar to the figure below.

Final ETL process
Final ETL process

2.8. Configure the 2nd join transform

Once the second join transform is connected in this example, it must be configured. Click or right-click it, and select Configure from the toolbar or context menu.

In the Join configuration dialog, choose Inner as the Join Type.

Add the unique key binding by dragging the SalesPersonID column from the sales order data and dropping it onto the BusinessEntityID column from the Person table.

Finally, de-select all column checkboxes from the Person table except for FirstName to exclude the others from the output.

Configure second Join transform
Configure second Join transform

2.9. Configure the process result

The last step of the ETL process is the Process Result, where you can configure the measures and hierarchies that can be used in metric sets.

Click the Process Result transform to see the Data Cube Elements panel, which shows the list of measures and hierarchies output from this data cube. Measures (numbers) and hierarchies (usually non-numeric) are color-coded differently.

Configure Process Result transform
Configure Process Result transform

You can customize each element's name, description, predefined formatting, and more.

Simple columns of data identified as a hierarchy rather than a measure, such as ProductID, are referred to as implicit hierarchies. You can replace these with hierarchies that were created from the main menu to define multiple levels or other settings.

For example, click to edit ProductID, then click Select a hierarchy or level to use as a replacement. Select the hierarchy, or expand it and select a particular level that matches the column's values.

Selecting a hierarchy to use as a replacement
Selecting a hierarchy to use as a replacement

You can follow the same steps to replace the OrderDate column with a built-in time dimension hierarchy or one of your own.

For more details on these options, see the Process Result article.

2.10. Data preview

Open the Data Preview window to preview the data output by whichever transform is selected. With Process Result selected, this shows the data produced by the ETL process.

Click Data Preview to preview the resulting data
Click Data Preview to preview the resulting data

The data is presented in the form of a table with all the columns output by the selected transform.

The row above the data table provides access to warnings and statistics about the data. Click Statistics to view a summary of each column, such as the average, minimum & maximum, distinct count, and the number of null values (which can result in Unknown members when used in metric sets).

Click Statistics to view data summaries
Click Statistics to view data summaries

2.11. Check in

Check in your data cube from the toolbar so others can use it too. Everyone with access to the cube with a power user seat or higher can use its data to create metric sets.

2.12. Re-using in other data cubes

You can reuse the process flow of a data cube in other data cubes to avoid setting up the same transforms multiple times, simply by dragging another data cube onto your canvas.

For details, see the Data Cube transform.

3. See also

Dundas Data Visualization, Inc.
400-15 Gervais Drive
Toronto, ON, Canada
M3C 1Y8

North America: 1.800.463.1492
International: 1.416.467.5100

Dundas Support Hours:
Phone: 9am-6pm, ET, Mon-Fri
Email: 7am-6pm, ET, Mon-Fri