Using a data cube to join tables
- Overview
- Walkthrough
- Create a new data cube
- Add a database table
- Add a second database table
- Insert a join transform node
- Connect the 2nd select transform to the join transform
- Configure the join transform
- Join with a third database table
- Configure the 2nd join transform
- Configure the process result
- Data preview
- Check in
- Re-using in other data cubes
- See also
1. Overview
Use data cubes to integrate data from different sources, transform the data, and produce a reusable data model with optional data storage capabilities.
This walkthrough shows you how to:
- Create a new data cube.
- Set up an ETL process consisting of transforms.
- Use transforms to join three database tables.
- Link the output columns of a data cube to existing hierarchies.
You need to be a user with a Developer seat to create or edit a data cube. The figures use the Adventure Works for SQL Server 2012 database as an example.
Related video: Introduction to Data Cubes
2. Walkthrough
2.1. Create a new data cube
From the main menu, click New, click Data Cube, and select the Blank Data Cube option.
The data cube editor is displayed.
2.2. Add a database table
Next, locate your data connector in the Explore window.
Under the data connector, find the [Sales].[SalesOrderDetail] and [Sales].[SalesOrderHeader] tables.
Drag the [Sales].[SalesOrderDetail] table to the canvas.
When you add the first table, Dundas BI automatically creates a simple ETL process consisting of two transforms, with data flowing from left to right. Note that the name of your new data cube (e.g., Data Cube 1) is displayed in the status bar at the bottom.
The first node in the ETL process is a SQL Select transform, which represents a query to select data from the [Sales].[SalesOrderDetail] table.
The second node is a Process Result transform, which represents the output or result of the data cube (ETL process). This transform doesn't do any data processing but it allows you to configure the measures and hierarchies that will be made available to downstream items, such as metric sets.
An ETL process can have multiple inputs (for example, multiple Select transforms that query data from different databases) but always only a single output (the process result).
2.3. Add a second database table
Add the second table [Sales].[SalesOrderHeader] to the canvas. The second SQL Select transform appears highlighted, which means the transform is not yet connected to the ETL process.
2.4. Insert a join transform node
Click to select the connection link between the first SQL Select transform and the Process Result transform.
From the toolbar, click Insert Common and select Join.
A Join transform is inserted into the ETL process where the link was selected between the first SQL Select and the Process Result. If nothing was selected on the canvas, the new transform will be disconnected and you can click and drag between the transforms to connect them. If a transform was selected, the new transform would be placed after it and connected to it.
In our example, the join transform is highlighted in red because it requires two inputs but only one input is connected.
2.5. Connect the 2nd select transform to the join transform
To connect the second SQL Select transform to the join transform, drag a connection line from the SQL Select to the join.
Alternatively, you can hold the Ctrl or Shift keys while clicking to select the two transforms, and then choose Connect in the toolbar.
With both SQL Select transforms connected to the join transform as inputs, the highlight color has disappeared.
2.6. Configure the join transform
Once the join transform has been connected, you typically need to configure it. You can click to select the transform and choose Configure in the toolbar or from the right-click context menu.
The Join configuration dialog is displayed.
There are three main steps to perform when configuring a join:
- Choose the Join Type (which can be Inner Join, Left Join, Right Join, or Full Join).
- Set up the unique key binding, indicated by key icons next to columns containing matching values from each table. To add one, drag a column from one list and drop it over the matching column from the other list. In this example, Dundas BI automatically identifies SalesOrderID as the unique key binding based on the native relationships defined in the database.
- Use the checkboxes to select or de-select columns for inclusion in the output. For example, select OrderQty, ProductID, and SalesOrderID from the order details, and OrderDate and SalesPersonID from the order headers.
See the Join transform article for more details on these options.
2.7. Join with a third database table
The ETL process so far joins two database tables. To add a third table, because a join transform has only two inputs, you will need to insert a second join transform into the ETL process.
In the Explore window, locate the [Person].[Person] table and drag it to the canvas. The table appears as a third SQL Select transform, which is highlighted because it is not connected to the ETL process yet.
Next, select the last connection link, click Insert Common in the toolbar, and choose Join. This inserts a second join transform into the ETL process. The join transform is highlighted in color because it has only one input connected.
Add a connection between the SQL Select transform for the Person table and the new join transform, for example by clicking and dragging between them.
With all the transforms connected, the final ETL process will appear similar to the figure below.
2.8. Configure the 2nd join transform
Once the second join transform is connected in this example, it must be configured. Click or right-click it, and select Configure from the toolbar or context menu.
In the Join configuration dialog, choose Inner as the Join Type.
Add the unique key binding by dragging the SalesPersonID column from the sales order data and dropping it onto the BusinessEntityID column from the Person table.
Finally, de-select all column checkboxes from the Person table except for FirstName to exclude the others from the output.
2.9. Configure the process result
The last step of the ETL process is the Process Result, where you can configure the measures and hierarchies that can be used in metric sets.
Click the Process Result transform to see the Data Cube Elements panel, which shows the list of measures and hierarchies output from this data cube. Measures (numbers) and hierarchies (usually non-numeric) are color-coded differently.
You can customize each element's name, description, predefined formatting, and more.
Simple columns of data identified as a hierarchy rather than a measure, such as ProductID, are referred to as implicit hierarchies. You can replace these with hierarchies that were created from the main menu to define multiple levels or other settings.
For example, click to edit ProductID, then click Select a hierarchy or level to use as a replacement. Select the hierarchy, or expand it and select a particular level that matches the column's values.
You can follow the same steps to replace the OrderDate column with a built-in time dimension hierarchy or one of your own.
For more details on these options, see the Process Result article.
2.10. Data preview
Open the Data Preview window to preview the data output by whichever transform is selected. With Process Result selected, this shows the data produced by the ETL process.
The data is presented in the form of a table with all the columns output by the selected transform.
The row above the data table provides access to warnings and statistics about the data. Click Statistics to view a summary of each column, such as the average, minimum & maximum, distinct count, and the number of null values (which can result in Unknown members when used in metric sets).
2.11. Check in
Check in your data cube from the toolbar so others can use it too. Everyone with access to the cube with a power user seat or higher can use its data to create metric sets.
2.12. Re-using in other data cubes
You can reuse the process flow of a data cube in other data cubes to avoid setting up the same transforms multiple times, simply by dragging another data cube onto your canvas.
For details, see the Data Cube transform.