Data cube storage types
- Data cube preparation
- Using data cube storage
- Incremental data update
- See also
After you have designed your data cube, you can optionally choose a Storage Type and potentially increase data retrieval performance or reduce the load on your database server or other data source. There are three storage options available from the toolbar when editing a data cube:
- None – (Default) Connects live to the data source directly when the data cube is used in a visualization, with the choice of short term caching or real-time updates.
- Warehouse – Stores or caches data in the Dundas BI Warehouse database when the cube is built.
- In-memory – Stores data in server memory for even greater performance potential for data analysis queries.
When using warehouse or in-memory storage, you can choose to build the cube's storage on-demand or schedule it to run periodically. Scheduling is particularly useful if you have a line of business database that is not permitted to be queried during the day. In this case, you can schedule a storage operation to run at night, so that data retrieval queries will be made against the warehouse or in-memory storage during the day.
To further increase data retrieval performance, both storage types offer the option of incremental updates. Incrementally built data cubes will retrieve only data that was added to the database since the last build when updating, instead of retrieving all of the data every time the data cube is rebuilt.
Building in-memory storage can require a lot of server memory, so we strongly recommend that you remove or hide unnecessary elements (columns, hierarchies, or measures), which you do not need from your data cube's Process Result transform. This can also benefit warehouse storage by leaving out unnecessary data. In-memory storage is also affected by the number of aggregators enabled on each measure.
For example, edit a measure from your data cube and check the list of Supported Aggregators. By default, only Sum, Count, and Average are selected in order to conserve memory usage, but you can adjust this property as needed.
Double-click the Process Result transform to open its configuration dialog, which lets you uncheck output columns that you don't need. This will completely remove the column from the data cube output.
As an alternative, you can also hide a measure or hierarchy from the output by clicking the grey minus icon on the very right. The measure or hierarchy will be listed under a Hidden section but you can easily unhide it by clicking its plus icon. A column hidden this way will be excluded from in-memory storage processing (which helps with performance) but the column will still be available for subsequent linking in hierarchy keys (for example, if you need to use the data cube later to define a hierarchy by using the column as part of a composite key).
Elements that provide details about or describe items identified by another hierarchy can be selected as Attribute Hierarchies for that hierarchy. This conserves the amount of in-memory storage required, while providing benefits when selecting data such as disabling unwanted totals and expand/collapse functionality by default.
For example, with a ProductID that uniquely identifies each product, you could select various elements such as Color, Class, and Size as attribute hierarchies for ProductID that describe each product.
Attributes can also be selected when creating a hierarchy from the main menu. For details and examples of using attributes in a metric set, see Automatic joins and hierarchies.
3. Using data cube storage
When editing a data cube that is checked out to you, click Storage Type in the toolbar and select the type you want.
The Warehouse storage type lets you store or cache data in the Dundas BI Warehouse database. The In-Memory storage type stores data in server memory for even greater performance gain potential for data analysis queries, and also stores its data in the Dundas BI Warehouse database so that it can be reloaded back into memory in the event of a server restart.
The following sections describe how to set up warehouse and in-memory storage.
3.1. Check in your data cube
Once you've chosen a storage type, you must Check In your data cube before you can build its storage or set up scheduling.
After checking in, you will be asked if you want to build the data cube. Click Ok to begin building the warehouse or in-memory storage, or click Cancel if you want to build it later as shown in the next section.
Next, you will be asked if you want to set up a schedule. Click Ok to open the Schedule Rule dialog, otherwise click Cancel and you can still choose to set up a schedule later.
3.2. Build the data cube
After checking in the data cube, you can also build its warehouse or in-memory storage on-demand any of these three ways:
- When the data cube is open for editing but still checked in, click Warehouse in the toolbar and then Build Warehouse, or click In-Memory and then Build In-Memory.
- Locate the data cube in the main menu by clicking Open and then Data Cube, or find it in the Explore dockable window found on various screens in Dundas BI within the Data Cubes folder. Next, right-click the data cube and choose Build Warehouse or Build In-Memory.
- As an administrator, you can go to the Jobs screen to run, cancel, or see the status of a data cube build.
You'll see a message indicating the build has started. The time required to build the cube depends on the amount of data and other factors described elsewhere in this article.
To open the Schedule Rule dialog for scheduling, open the data cube for editing while it is still checked in.
Click Warehouse in the toolbar and then Schedule Warehouse, or click In-Memory and then Schedule In-Memory.
In the dialog, choose from a wide variety of scheduling options to have the data cube periodically re-build its storage automatically, then click Submit at the bottom.
Administrators can also go to the Jobs screen to edit existing schedules for data cube builds.
3.4. Disable memory management
When using in-memory storage, after the storage exceeds a certain number of records, it begins compressing the records to maintain a stable use of RAM. When the records are used, they are decompressed and then recompressed again after some time.
You can increase the performance of the in-memory data cube by preventing it from recompressing records.
Click the Configure option on the toolbar.
In the Configure Data Cube dialog, tick the checkbox under Disable Memory Management. Then click Submit.
The new setting will take effect after the next time the data cube is built.
The following limitations exist for in-memory warehouse storage:
- In-memory storage type does not support binary data types.
4. Incremental data update
Incrementally building data cubes will only retrieve data that was added to the database since the last build. This has the potential to substantially decrease the time needed to build the data cube, however it will not update or remove existing data.
It is important to be cautious when using incremental updates on a data cube where changes to existing data are possible. Deleted entries will still be present, entries that have been updated to a smaller value will retain their original value, and entries that have been updated to a larger value will appear as a new entry after the data cube has been updated.
You can enable incremental data updates for either the warehouse or in-memory storage type, and then build the data cubes manually or via a schedule, as described above.
Click the Configure Settings option on the toolbar.
Select Perform Incremental Build, and then select all the input columns you want to be rebuilt.
5.1. Build Warehouse does not clear data result cache
After you build the warehouse, when you go to view a dashboard, it is possible that you will be seeing old data that was there before the warehouse was last built. This is because of data result caching, which is not cleared when the warehouse is built.
You can bypass result caching for a metric set if it's meant to update more regularly, or it's possible to customize the application's result caching settings. For details, see Refresh data in Dundas BI.
5.2. What storage type is this data cube using?
If your data cube is checked in, you can see what type of storage it is using from the Explore window, the administration File Explorer, or the Properties References dialog for a referencing item. This saves you from having to edit the data cube just to see its storage type.
When a data cube uses a storage type, it appears with a different icon in the Explore window (or File Explorer). The data cube's tooltip also indicates the type of storage it uses.
5.3. Performance of building in-memory storage
The total time required to build in-memory storage for a particular data cube, and the RAM memory required, can vary greatly depending on a range of factors. Estimating build times is therefore not as simple as saying building X records takes Y minutes, etc.
For example, memory consumption generally depends on:
- How memory is allocated by the .NET framework
- Number of hierarchies and measures used in your data cube process (especially with distinct count)
- How sparse is your fact table (end result of your data cube process)
Ideally, if the in-memory storage result fits entirely in RAM, then the total build time will generally be divided as follows on a typical 8 core CPU box:
- 50% of time spent retrieving/loading data from your data sources
- 50% of time spent building the in-memory data structures
With a higher number of processing cores, the time to build the in-memory data structures can be reduced due to parallelization.
Here are some general guidelines for optimizing in-memory builds:
- De-select columns you don't need in your data cube as early in the process as possible.
- Minimize the number of hierarchies, measures, and aggregators in your data cube.
- Include columns as attributes of a hierarchy where appropriate instead of separate hierarchies.
- Minimize the complexity of your data cube process (e.g., type and number of transforms used).
- Use the Filter transform to restrict data to required ranges.
5.4. Metric set data retrieval is set to raw
When the Data Retrieval Format on a metric set is set to Raw and <Row Number> is placed on Rows, the raw data will be retrieved directly from the original database, bypassing the data cube's storage.
For more information on raw data mode, see the article Metric set analysis tools.