Using a data cube to join tables
- Create a new data cube
- Add a database table
- Add a second database table
- Insert a join transform node
- Connect the 2nd select transform to the join transform
- Configure the join transform
- Join with a third database table
- Configure the 2nd join transform
- Configure the process result
- Data preview
- Check in
- Re-using Data Cubes
- See also
Learn how to create a new data cube in Dundas BI and add transforms to it in order to join three database tables.
This walkthrough shows you how to:
- Create a new data cube in Dundas BI.
- Set up an ETL process consisting of transforms.
- Use transforms to join three database tables.
- Link the output columns of a data cube to existing hierarchies.
The figures use the Adventure Works for SQL Server 2012 database as an example.
Related video: Introduction to Data Cubes
2.1. Create a new data cube
From the main menu, click New, click Data Cube, and select the Blank Data Cube option.
The Data Cube Designer screen is displayed.
2.2. Add a database table
Next, locate your data connector in the EXPLORE window.
Under the data connector, find the [Sales].[SalesOrderDetail] and [Sales].[SalesOrderHeader] tables.
Drag the [Sales].[SalesOrderDetail] table to the Data Cube Designer canvas.
When you add the first table, Dundas BI automatically creates a simple ETL process consisting of two transforms, with data flowing from left to right. Note that the name of your new data cube (e.g. DataCube1) is displayed in the status bar at the bottom.
The first node in the ETL process is a SQL Select transform which represents a query to select data from the [Sales].[SalesOrderDetail] table.
The second node is a Process Result transform which represents the output or result of the data cube (ETL process). This transform doesn't do any data processing but it allows you to configure the measures and hierarchies that will be made available to downstream items such as metric sets.
An ETL process can have multiple inputs (e.g. multiple Select transforms which query for data from different databases) but always only a single output (i.e. the process result).
2.3. Add a second database table
Add the second table [Sales].[SalesOrderHeader] to the canvas. A second SQL Select transform appears with a highlighted color, which means the transform is not yet connected to the ETL process.
2.4. Insert a join transform node
Next, go to the toolbar and click Insert Common or Insert Other to see the available transforms which you can add to the canvas. What happens when you insert one of these transforms depends on what you've already selected on the canvas. If you don't have anything selected beforehand, this will insert the new transform and leave it unconnected. If you have selected an existing transform before inserting, the new transform will be placed just after the existing transform and connected to it. You can also select an existing connection link between two transforms and insert the new transform inbetween as described next.
On the Data Cube Designer canvas, select the connection link between the first SQL Select transform and the Process Result transform.
Go to the toolbar and click Insert Common. From the dropdown menu, click Join.
A Join transform is inserted into the ETL process between the first SQL Select and the Process Result. The Join transform appears highlighted in color because it requires two inputs but only one input is connected.
2.5. Connect the 2nd select transform to the join transform
To connect the second SQL Select transform to the Join transform, drag a connection line from the SQL Select node to the Join node and then release. The two transform nodes will be connected automatically.
Both SQL Select transforms are now connected to the Join transform (as inputs) and the highlighted color has disappeared.
2.6. Configure the join transform
Once the Join transform has been connected, you need to configure it by specifying the type of join, the columns to join on, and the columns to include or exclude with the output.
Right-click (or long-tap) over the Join transform and select Configure.
The Join configuration dialog is displayed.
In the Join configuration dialog, there are three steps to perform:
- Choose the Join Type (which can be Inner Join, Left Join, Right Join, or Full Join).
- Look at the two lists of database table columns. Each list corresponds to one of the inputs to the Join transform. To form a unique key binding for the join operation, drag a column from the first list and drop it over a column from the second list. For example, drag SalesOrderID from the left list and drop it onto SalesOrderID in the right list. You'll see an icon displayed beside each column that is part of the binding, and the columns will also be highlighted in color if you hover over them. Note that Dundas BI will set this binding for you automatically in some cases.
- Use the checkboxes to select or de-select columns for inclusion in the output. For example, select OrderQty, ProductID, and SalesOrderID from the left list. Select OrderDate and SalesPersonID from the right list.
2.7. Join with a third database table
The ETL process so far does a join on two database tables. What if you add a third table? Since a join transform has only two inputs, you need to insert a second join transform into the ETL process to accomplish this.
First, go to the EXPLORE window and locate the [Person].[Person] table. Drag the table to the Data Cube Designer canvas. The table appears as a third SQL Select transform which is highlighted because it is not connected to the ETL process yet.
Next, select the last connection link, click Insert Common from the toolbar, and click Join. This inserts a second Join transform into the ETL process. The Join transform is highlighted in color because it has only one input connected.
Click the SQL Select transform for the Person table, press and hold the CTRL key, and then click the Join transform that was just inserted. Click Connect from the toolbar. Everything is connected properly now and the final ETL process appears similar to the figure below.
2.8. Configure the 2nd join transform
Once the second Join transform is connected, it must be configured. Right-click (or long-tap) over the second Join transform and select Configure.
In the Join configuration dialog, choose Inner as the Join Type.
Form the unique key binding by dragging the SalesPersonID column from the left list and dropping it onto the BusinessEntityID column from the right list. Finally, de-select all columns in the right list except for FirstName. The de-selected columns won't appear in the output of the join transform.
2.9. Configure the process result
The ETL process is almost complete. The remaining step is to configure the measures and hierarchies that will be the output of this data cube.
Click the Process Result transform to see the Data Cube Elements panel, which shows the list of output measures and hierarchies for this data cube (ETL process). Measures and hierarchies are color-coded differently.
At this stage, you can optionally hide a measure or hierarchy from the output by clicking its grey minus icon on the very right. The measure or hierarchy will be listed under a HIDDEN section but you can easily unhide it by clicking its plus icon. A measure or hierarchy hidden this way is hidden from the data cube output but will still be available for linking in hierarchy keys (e.g., if you subsequently want to use this data cube to define a hierarchy).
Columns such as 'ProductID' are referred to as implicit hierarchies. You can add value to your data cube by replacing such columns with multi-level hierarchies which have been pre-defined outside of the data cube.
For example, to replace the ProductID column with an existing 'Product' hierarchy, click the Edit icon on the right of the ProductID column. This opens the Edit Data Cube Output Element dialog which shows the available hierarchies in the current project you can use to replace the ProductID column. Find your Product hierarchy, expand to see its levels, and then drag the hierarchy onto the drop region as shown in the figure below.
Follow the same steps as above to replace the OrderDate column with a time dimension hierarchy. The final output of the data cube is shown below.
2.10. Data preview
Once the Process Result transform is configured, click the DATA PREVIEW tab to preview some of the resulting data produced by the overall ETL process.
2.11. Check in
Check in your data cube from the toolbar so others can use it too.
In most cases, the same process flow has to be repeated over and over in different data cubes. Re-using a data cube will allow users to create a basic data cube and re-use it as a part of other data cubes.
For example, you have an existing data cube using SalesOrderHeader and SalesOrderDetail tables:
In order to re-use an existing data cube (i.e. source data cube), simply drag and drop the source data cube to another data cube canvas.
This will add a "Data Cube transform" to the canvas.
The "Data Cube transform" can be used just like any other select transform, e.g configure, join to other tables, filter added to it, etc.
2.12.1. Notes for re-using data cubes
- A single source data cube cannot be used twice in the same data cube. This includes recursive use of the source data cube (using a data cube that uses a data cube).
- If the source data cube is warehouse-cached, the target data cube will also be using data from the warehouse, unless a different storage type is selected for the target data cube. In contrast, if the source data cube is in-memory-cached, the target data cube will not be pulling data from memory.
- If the source data cube has user-defined hierarchies, those hierarchies will be ignored and implicit ones used instead.
3. See also
- Video: Introduction to Data Cubes
- Video: Data Cleansing with Data Cubes
- Data cube storage types
- Connect to data and view it on a dashboard
- Automatic joins and hierarchies
- Replace a data cube column with a hierarchy
- Understanding the Dundas BI Data Model