Using a data cube to join tables

Contents[Hide]

1. Overview

Learn how to create a new data cube in Dundas BI and add transforms to it in order to join three database tables.

This walkthrough shows you how to:

  • Create a new data cube in Dundas BI.
  • Set up an ETL process consisting of transforms.
  • Use transforms to join three database tables.
  • Link the output columns of a data cube to existing hierarchies.

The figures use the Adventure Works for SQL Server 2012 database as an example.

Related video: Introduction to Data Cubes

2. Walkthrough

2.1. Create a new data cube

From the main menu, click New, click Data Cube, and select the Blank Data Cube option.

New data cube
New data cube

The Data Cube Designer screen is displayed.

2.2. Add a database table

Next, locate your data connector in the EXPLORE window.

Under the data connector, find the [Sales].[SalesOrderDetail] and [Sales].[SalesOrderHeader] tables.

Drag the [Sales].[SalesOrderDetail] table to the Data Cube Designer canvas.

Drag table to canvas
Drag table to canvas

When you add the first table, Dundas BI automatically creates a simple ETL process consisting of two transforms, with data flowing from left to right. Note that the name of your new data cube (e.g. DataCube1) is displayed in the status bar at the bottom.

ETL process
ETL process

The first node in the ETL process is a SQL Select transform which represents a query to select data from the [Sales].[SalesOrderDetail] table.

The second node is a Process Result transform which represents the output or result of the data cube (ETL process). This transform doesn't do any data processing but it allows you to configure the measures and hierarchies that will be made available to downstream items such as metric sets.

An ETL process can have multiple inputs (e.g. multiple Select transforms which query for data from different databases) but always only a single output (i.e. the process result).

2.3. Add a second database table

Add the second table [Sales].[SalesOrderHeader] to the canvas. A second SQL Select transform appears with a highlighted color, which means the transform is not yet connected to the ETL process.

Drag second table to canvas
Drag second table to canvas

2.4. Insert a join transform node

Next, go to the toolbar and click Insert Common or Insert Other to see the available transforms which you can add to the canvas. What happens when you insert one of these transforms depends on what you've already selected on the canvas. If you don't have anything selected beforehand, this will insert the new transform and leave it unconnected. If you have selected an existing transform before inserting, the new transform will be placed just after the existing transform and connected to it. You can also select an existing connection link between two transforms and insert the new transform inbetween as described next.

On the Data Cube Designer canvas, select the connection link between the first SQL Select transform and the Process Result transform. 

Go to the toolbar and click Insert Common. From the dropdown menu, click Join.

Select connection link and insert Join transform
Select connection link and insert Join transform

A Join transform is inserted into the ETL process between the first SQL Select and the Process Result. The Join transform appears highlighted in color because it requires two inputs but only one input is connected.

Join transform inserted
Join transform inserted

Tip
If you make a mistake connecting transforms, you can break a connection link by selecting the link and then clicking Disconnect from the toolbar.

2.5. Connect the 2nd select transform to the join transform

To connect the second SQL Select transform to the Join transform, drag a connection line from the SQL Select node to the Join node and then release. The two transform nodes will be connected automatically.

Connect the two transforms by dragging a line
Connect the two transforms by dragging a line

Both SQL Select transforms are now connected to the Join transform (as inputs) and the highlighted color has disappeared.

Transforms are connected
Transforms are connected

Tip
Instead of inserting a join transform manually and connecting the 2nd select transform to it, there is a shortcut which does this all in one step. Simply drag a connecting line from the 2nd (unconnected) transform to the 1st transform and then release. A join transform will be automatically inserted and connected to both inputs.

Shortcut way to join two transforms: Drag connecting line from unconnected transform to first transform
Shortcut way to join two transforms: Drag connecting line from unconnected transform to first transform

2.6. Configure the join transform

Once the Join transform has been connected, you need to configure it by specifying the type of join, the columns to join on, and the columns to include or exclude with the output.

Right-click (or long-tap) over the Join transform and select Configure.

Configure Join transform
Configure Join transform

The Join configuration dialog is displayed.

Join configuration dialog
Join configuration dialog

In the Join configuration dialog, there are three steps to perform:

  1. Choose the Join Type (which can be Inner Join, Left Join, Right Join, or Full Join).
  2. Look at the two lists of database table columns. Each list corresponds to one of the inputs to the Join transform. To form a unique key binding for the join operation, drag a column from the first list and drop it over a column from the second list. For example, drag SalesOrderID from the left list and drop it onto SalesOrderID in the right list. You'll see an icon displayed beside each column that is part of the binding, and the columns will also be highlighted in color if you hover over them. Note that Dundas BI will set this binding for you automatically in some cases.
  3. Use the checkboxes to select or de-select columns for inclusion in the output. For example, select OrderQty, ProductID, and SalesOrderID from the left list. Select OrderDate and SalesPersonID from the right list.

2.7. Join with a third database table

The ETL process so far does a join on two database tables. What if you add a third table? Since a join transform has only two inputs, you need to insert a second join transform into the ETL process to accomplish this.

First, go to the EXPLORE window and locate the [Person].[Person] table. Drag the table to the Data Cube Designer canvas. The table appears as a third SQL Select transform which is highlighted because it is not connected to the ETL process yet.

Next, select the last connection link, click Insert Common from the toolbar, and click Join. This inserts a second Join transform into the ETL process. The Join transform is highlighted in color because it has only one input connected.

Click the SQL Select transform for the Person table, press and hold the CTRL key, and then click the Join transform that was just inserted. Click Connect from the toolbar. Everything is connected properly now and the final ETL process appears similar to the figure below.

Final ETL process
Final ETL process

2.8. Configure the 2nd join transform

Once the second Join transform is connected, it must be configured. Right-click (or long-tap) over the second Join transform and select Configure.

In the Join configuration dialog, choose Inner as the Join Type.

Form the unique key binding by dragging the SalesPersonID column from the left list and dropping it onto the BusinessEntityID column from the right list. Finally, de-select all columns in the right list except for FirstName. The de-selected columns won't appear in the output of the join transform.

Configure second Join transform
Configure second Join transform

2.9. Configure the process result

The ETL process is almost complete. The remaining step is to configure the measures and hierarchies that will be the output of this data cube.

Click the Process Result transform to see the Data Cube Elements panel, which shows the list of output measures and hierarchies for this data cube (ETL process). Measures and hierarchies are color-coded differently.

At this stage, you can optionally hide a measure or hierarchy from the output by clicking its grey minus icon on the very right. The measure or hierarchy will be listed under a HIDDEN section but you can easily unhide it by clicking its plus icon. A measure or hierarchy hidden this way is hidden from the data cube output but will still be available for linking in hierarchy keys (e.g., if you subsequently want to use this data cube to define a hierarchy).

Configure Process Result transform
Configure Process Result transform

Columns such as 'ProductID' are referred to as implicit hierarchies. You can add value to your data cube by replacing such columns with multi-level hierarchies which have been pre-defined outside of the data cube.

For example, to replace the ProductID column with an existing 'Product' hierarchy, click the Edit icon on the right of the ProductID column. This opens the Edit Data Cube Output Element dialog which shows the available hierarchies in the current project you can use to replace the ProductID column. Find your Product hierarchy, expand to see its levels, and then drag the hierarchy onto the drop region as shown in the figure below.

Drag Product hierarchy to drop region
Drag Product hierarchy to drop region

Follow the same steps as above to replace the OrderDate column with a time dimension hierarchy. The final output of the data cube is shown below.

Final configuration of Process Result transform
Final configuration of Process Result transform

Tip
Right-click or long-tap on a Process Result transform to access its context menu. From the menu, you can open the Data Cube Elements panel or the configuration dialog for the transform.

2.10. Data preview

Once the Process Result transform is configured, click the DATA PREVIEW tab to preview some of the resulting data produced by the overall ETL process.

Click DATA PREVIEW to preview the resulting data
Click DATA PREVIEW to preview the resulting data

2.11. Check in

Check in your data cube from the toolbar so others can use it too.

2.12. Re-using Data Cubes

In most cases, the same process flow has to be repeated over and over in different data cubes. Re-using a data cube will allow users to create a basic data cube and re-use it as a part of other data cubes.

For example, you have an existing data cube using SalesOrderHeader and SalesOrderDetail tables:

Source data cube
Source data cube

In order to re-use an existing data cube (i.e. source data cube), simply drag and drop the source data cube to another data cube canvas.

Drag and drop existing data cube to canvas
Drag and drop existing data cube to canvas

This will add a "Data Cube transform" to the canvas.

Data Cube transform
Data Cube transform

The "Data Cube transform" can be used just like any other select transform, e.g configure, join to other tables, filter added to it, etc.

SQL Select, Join and Filter transform added
SQL Select, Join and Filter transform added

2.12.1. Notes for re-using data cubes

  • A single source data cube cannot be used twice in the same data cube. This includes recursive use of the source data cube (using a data cube that uses a data cube).
  • If the source data cube is warehouse-cached, the target data cube will also be using data from the warehouse, unless a different storage type is selected for the target data cube. In contrast, if the source data cube is in-memory-cached, the target data cube will not be pulling data from memory.
  • If the source data cube has user-defined hierarchies, those hierarchies will be ignored and implicit ones used instead.

3. See also

 

Dundas Data Visualization, Inc.
500-250 Ferrand Drive
Toronto, ON, Canada
M3C 3G8

North America: 1.800.463.1492
International: 1.416.467.5100

Dundas Support Hours: 7am-6pm, ET, Mon-Fri