Using a data cube to join tables
- Create a new data cube
- Add a database table
- Add a second database table
- Insert a join transform node
- Connect the 2nd select transform to the join transform
- Configure the join transform
- Join with a third database table
- Configure the 2nd join transform
- Configure the process result
- Data preview
- Check in
- Re-using Data Cubes
- See also
Learn how to create a new data cube in Dundas BI and add transforms to it in order to join three database tables.
This walkthrough shows you how to:
- Create a new data cube.
- Set up an ETL process consisting of transforms.
- Use transforms to join three database tables.
- Link the output columns of a data cube to existing hierarchies.
The figures use the Adventure Works for SQL Server 2012 database as an example.
Related video: Introduction to Data Cubes
2.1. Create a new data cube
From the main menu, click New, click Data Cube, and select the Blank Data Cube option.
The data cube editor is displayed.
2.2. Add a database table
Next, locate your data connector in the Explore window.
Under the data connector, find the [Sales].[SalesOrderDetail] and [Sales].[SalesOrderHeader] tables.
Drag the [Sales].[SalesOrderDetail] table to the canvas.
When you add the first table, Dundas BI automatically creates a simple ETL process consisting of two transforms, with data flowing from left to right. Note that the name of your new data cube (e.g. DataCube1) is displayed in the status bar at the bottom.
The first node in the ETL process is a SQL Select transform, which represents a query to select data from the [Sales].[SalesOrderDetail] table.
The second node is a Process Result transform, which represents the output or result of the data cube (ETL process). This transform doesn't do any data processing but it allows you to configure the measures and hierarchies that will be made available to downstream items, such as metric sets.
An ETL process can have multiple inputs (for example, multiple Select transforms that query data from different databases) but always only a single output (the process result).
2.3. Add a second database table
Add the second table [Sales].[SalesOrderHeader] to the canvas. The second SQL Select transform appears highlighted, which means the transform is not yet connected to the ETL process.
2.4. Insert a join transform node
From the toolbar, click Insert Common or Insert Other to see the transforms that you can add to the canvas. What happens when you insert one of these transforms depends on what you've already selected on the canvas. If you don't have anything selected beforehand, this will insert the new transform and leave it unconnected. If you have selected an existing transform before inserting, the new transform will be placed just after the existing transform and connected to it. You can also select an existing connection link between two transforms and insert the new transform between as described next.
Click to select the connection link between the first SQL Select transform and the Process Result transform.
From the toolbar, click Insert Common and select Join.
A Join transform is inserted into the ETL process between the first SQL Select and the Process Result. The Join transform appears highlighted because it requires two inputs but only one input is connected.
2.5. Connect the 2nd select transform to the join transform
To connect the second SQL Select transform to the Join transform, drag a connection line from the SQL Select node to the Join node. The two transform nodes will be connected automatically.
Both SQL Select transforms are now connected to the Join transform (as inputs) and the highlighted color has disappeared.
2.6. Configure the join transform
Once the Join transform has been connected, you need to configure it by specifying the type of join, the columns to join on, and the columns to include or exclude with the output.
Right-click (or long-tap) over the Join transform and select Configure.
The Join configuration dialog is displayed.
In the Join configuration dialog, there are three steps to perform:
- Choose the Join Type (which can be Inner Join, Left Join, Right Join, or Full Join).
- Look at the two lists of database table columns. Each list corresponds to one of the inputs to the Join transform. To form a unique key binding for the join operation, drag a column from the first list and drop it over a column from the second list. In this example, Dundas BI automatically identifies SalesOrderID as the unique key, which you can see by the icon displayed beside each column that is part of the binding. Note that Dundas BI will set this binding for you automatically in some cases.
- Use the checkboxes to select or de-select columns for inclusion in the output. For example, select OrderQty, ProductID, and SalesOrderID from the left list. Select OrderDate and SalesPersonID from the right list.
2.7. Join with a third database table
The ETL process so far does a join on two database tables. What if you add a third table? Since a join transform has only two inputs, you need to insert a second join transform into the ETL process to accomplish this.
First, go to the Explore window and locate the [Person].[Person] table. Drag the table to the canvas. The table appears as a third SQL Select transform which is highlighted because it is not connected to the ETL process yet.
Next, select the last connection link, click Insert Common from the toolbar, and click Join. This inserts a second Join transform into the ETL process. The Join transform is highlighted in color because it has only one input connected.
Click the SQL Select transform for the Person table, press and hold the CTRL key, and then click the Join transform that was just inserted. Click Connect from the toolbar. Everything is connected properly now and the final ETL process appears similar to the figure below.
2.8. Configure the 2nd join transform
Once the second Join transform is connected, it must be configured. Right-click (or long-tap) over the second Join transform and select Configure.
In the Join configuration dialog, choose Inner as the Join Type.
Form the unique key binding by dragging the SalesPersonID column from the left list and dropping it onto the BusinessEntityID column from the right list. Finally, de-select all columns in the right list except for FirstName. The de-selected columns won't appear in the output of the join transform.
2.9. Configure the process result
The ETL process is almost complete. The remaining step is to configure the measures and hierarchies that will be the output of this data cube.
Click the Process Result transform to see the Data Cube Elements panel, which shows the list of output measures and hierarchies for this data cube (ETL process). Measures and hierarchies are color-coded differently.
At this stage, you can optionally hide a measure or hierarchy from the output by clicking its grey minus icon on the right. The measure or hierarchy will be listed under a HIDDEN section but you can easily unhide it by clicking its plus icon. A measure or hierarchy hidden this way is hidden from the data cube output but will still be available for linking in hierarchy keys (for example, if you subsequently want to use this data cube to define a hierarchy).
Columns such as 'ProductID' are referred to as implicit hierarchies. You can add value to your data cube by replacing such columns with multi-level hierarchies which have been pre-defined outside of the data cube.
For example, to replace the ProductID column with an existing 'Product' hierarchy, click the Edit icon on the right of the ProductID column.
This opens the Edit Data Cube Output Element dialog, which shows the hierarchies you can use to replace the ProductID column. Find your Product hierarchy and drag it onto the drop region.
Follow the same steps to replace the OrderDate column with a time dimension hierarchy.
2.10. Data preview
Click the Data Preview tab to preview the resulting data produced by the ETL process.
The data is presented in the form of a table with all the columns included in the selected transform (in this case, the Process Result).
The row above the data table provides warnings and statistics about the data. The number next to Statistics is a count of all the output data.
Click Statistics to view additional information.
The shown list includes statistics for each column, such as the sum, average value, minimum and maximum values, and distinct count.
2.11. Check in
Check in your data cube from the toolbar so others can use it too.
In most cases, the same process flow has to be repeated over and over in different data cubes. Re-using a data cube will allow users to create a basic data cube and re-use it as a part of other data cubes.
For example, you have an existing data cube using SalesOrderHeader and SalesOrderDetail tables:
In order to re-use an existing data cube (i.e. source data cube), simply drag and drop the source data cube to another data cube canvas. This will add a "Data Cube transform" to the canvas.
The "Data Cube transform" can be used just like any other select transform, e.g configure, join to other tables, filter added to it, etc.
2.12.1. Notes for re-using data cubes
- A single source data cube cannot be used twice in the same data cube. This includes recursive use of the source data cube (using a data cube that uses a data cube).
- If the source data cube is warehouse-cached, the target data cube will also be using data from the warehouse, unless a different storage type is selected for the target data cube. In contrast, if the source data cube is in-memory-cached, the target data cube will not be pulling data from memory.
- If the source data cube has user-defined hierarchies, those hierarchies will be ignored and implicit ones used instead.
3. See also
- Video: Introduction to Data Cubes
- Video: Data Cleansing with Data Cubes
- Data cube storage types
- Connect to data and view it on a dashboard
- Automatic joins and hierarchies
- Replace a data cube column with a hierarchy
- Understanding the Dundas BI Data Model