When it comes to business intelligence (BI) tools and the way they work with data, there are three main data flow techniques that are typically offered. These techniques all come with strengths and weaknesses to solve pretty much any data-related issue on a dashboard.
Let’s explore the options available and how they can be used to solve various data-related issues you might experience on your dashboard.
Direct Data Flow
Sometimes your data is perfect just the way it is, and it doesn’t make sense to have a BI tool force you to redevelop your data in their system. A BI tool that allows for direct data access allows you to use your data directly from your database in real-time with no data modeling or other systems required in between the dashboards and your data.
Sometimes your data needs work and requires cleanup and storage in a BI system. This is where warehousing comes into play. You can join data sources and copy views of your data into the warehouse of a BI system that offers this capability. Don’t confuse the idea of warehousing in a BI tool and building a data warehouse. In the BI tool, you are more than likely building a mini-data warehouse for the data that you need rather than building an entirely new database.
A minority of companies use a special form of data warehousing where data is first warehoused using the concept from earlier, but the data is stored in a way to allow for hyper-fast retrieval of said data by storing the results In-Memory rather than on disk. In-Memory data lives up to the claim of providing very quick data retrieval, but you are getting that performance at a cost. Since these systems require server memory to store your data, depending on the amount of data you choose to store, it can require a more expensive server to host your BI system.
Most business intelligence tools offer one of these data techniques and some of the more flexible ones offer two or more! It’s important to have multiple options for your data flow because you will encounter different scenarios, which will each require a different technique to solve. Even if you only have a single data source, there might be cases where you want to mix and match any of the techniques above. Let’s look at some real-life scenarios and how these data flow models can help solve them.
Data Flow Technique: Direct Data Flow
Scenario A: Data Changes Rapidly
If you’re looking to provide real-time or near real-time options on your dashboards or reports, the last thing you want to do is warehouse your data. Warehouses, be they In-Memory or Traditional, both need to be ‘built’ on a schedule, which makes them incompatible with rapidly changing data.
Scenario B: OLAP Data
If you’re using an OLAP database, make sure that you have a BI tool that supports native OLAP connectivity; many don’t! Tools that are unable to connect directly to OLAP will require you to spend a lot of time and effort flattening your data before you can use it on a dashboard. Why would you want to go through the trouble of re-warehousing your data when you already have a high performance, organized data source? Going directly to the data source is the only option that makes sense. Make sure your tool supports MDX and OLAP properly.
Data Flow Technique: In-Memory Data
Scenario C: Self-Service (Many Fast Requests)
If you are set up in a self-service type of use case, it’s very common for your users to constantly make repeated data requests with slight alterations and as they are looking for insights. This type of use puts a heavy load on your database and is contrary to traditional reports where a single table might be accessed once for a large report per user. As you scale multiple self-service users all hammering your database with requests, load and performance can quickly become a problem. In-Memory data is the perfect solution in this case, as you can offload the database performance to your BI tools and give great performance since the data is in RAM.
Scenario D: Slow Historical Data
Do you have historical data that isn’t changing very often that is simply slow to access out of your existing database? In-Memory data will copy the data that you need and store it in RAM for anyone who needs it in a dashboard or report.
Data Flow Technique: Warehousing
Scenario E: Long Query Time
Do you have a metric on your dashboard that is sourced from a table in your database that is notoriously slow? It’s very common that a single visualization on a dashboard might be causing poor performance on your whole dashboard because it is waiting for that metric to finish loading. Oftentimes, it’s the crunching of the data that is slow and the result of your query is very small and that ends up in a visualization. In this case, you might want to set up a quick warehoused view of the data so that the long query is run on a schedule (maybe every day at 4 am?) when no one cares about the wait and your dashboard can use the warehoused result for the day.
Scenario F: Unstable Data Source (Third-Party)
Sometimes you will have cases where a data source is simply unstable with a flaky connection or a provider that doesn’t give you high uptime. This is especially prominent with web-based data sources or internal databases that are under-spec’d or changing due to ongoing development. The ability to schedule your data and warehouse it will allow you to keep a copy of the data that you need to power your dashboards. When the scheduler attempts to connect to the data and get the latest values, your cached data is kept until the next successful run. If the next success isn’t for a few hours, no problem since your users can still use the dashboard with the old data.
Scenario G: Sensitive Production Database
Similar to the issues discussed in self-service, if your source data is a production database being used by other tools or processes, the last thing you want is for this source database to be negatively impacted by your users who are using your BI tools. Both warehousing and In-Memory will allow you to schedule the retrieval of data, which you can set to a low period (again, like 4 am). Once the data is copied, all users requesting data will be able to use the warehoused version keeping the strain off your sensitive production database.
As you can see, there is no ‘one size fits all technique’ when it comes to using data in different scenarios. When you’re choosing a BI tool, you need to make sure that the tool you choose can adapt to meet any changing scenario you might need to throw at it. Time for a shameless plug! I know a tool that does just that: Dundas BI supports everything I just discussed.
About the Author
Jeff Hainsworth is a Senior Solutions Architect at Dundas Data Visualization with over a decade and a half of experience in Business Intelligence. He has a passion for building, coding and everything visual – you know, shiny things! Check out "Off the Charts... with Jeff", his platform for great content on all things analytics, data visualizations, dashboards, and business intelligence. There’s something for everyone!