Why Are Databases Trash Cans? (And Why Do They Make BI So Difficult?)


As someone who has spent the last 14 years prototyping, advising, and building dashboards for clients in various industries, one thing I’ve noticed that all these companies have in common are databases that are figurative trashcans.  

Guess how often I've run into a database that is perfectly suited to build the dashboards and reports that someone wanted? What do you think? 80% of the time? 50% of the time? Well, I've seen it once! Finding a perfectly clean database is like finding a unicorn, and sometimes I question if they exist. Unicorns, I mean… I'm pretty sure clean databases are real…  

Let's take the example of a CRM system and the underlying database that powers it. In many cases, companies might need to apply ongoing changes to the database to fix flaws made in the initial design. Still, even if a database supporting the business is designed well from the beginning, companies change and grow, and each new leader might come with new processes. Over time, the original database design becomes less relevant to the current needs of the business. As a result, due to this loss in relevance, band-aids and short-term changes are inevitable to keep it working.

Part of the problem is that you will have 'too many chefs in the kitchen'. Departments like Marketing, Sales, Accounting and Support systems all might interact with different parts of the CRM at any given time. Every time a manager from one of these departments wants to make a change – like adding a new field – they are independently asking the database owner to do something to the back-end to support the change (likely without realizing it). It doesn't take long for things to start to morph from an initially well-implemented system to a growing heap of trash; just fast forward a few years and keep repeating this with the regular churn of business. It's common to have a never-ending flow of new processes, new employees, new ideas and a changing organization over time.

You're probably thinking to yourself, 'why do you not just maintain and rewire the database properly as changes happen?'. Part of this problem comes down to the hoarding mindset that most people have. There is often a sentiment that nothing should be changed or deleted, so to solve problems, less elegant solutions like hacks get applied to the database. The database owner will commonly get told to, ‘not remove anything’ and ‘keep the old stuff just in case' because, 'I don't know what other people might be doing with it'. Fast forward a bit more into the future, and you're getting into some severe disorder within your back-end system.

The problem of the moving target described earlier is just one issue that databases face, but it's a serious one. There are others you should be aware of, but these are more often technical problems versus actual abuse. When I say databases are trashcans, it's not far from the truth, as after applying some of these technical problems on something that is already a patchwork disaster; you've got a trashcan indeed.

Here are some compounding problems:

  1. Weak Normalization (performance)
  2. Redundancy (performance)
  3. Referential Integrity (data quality)
  4. Ignoring Database Features (performance, quality, automation)
  5. Poor Indexing (performance)
  6. Poor Naming Conventions (usability)

Now, let's fast forward to the present, to a scenario where senior leadership determines that they need to do better reporting and chooses to implement a Business Intelligence (BI) tool. Most BI tools that business-people will gravitate to are very easy-to-use with simplistic data. These tools expect perfect and straightforward data to allow for simple and easy-to-build dashboards. The problem, of course, is that the back-end system is anything but simple, and the business is stuck with a catastrophe having spent money on a BI tool that can't solve their problems. What can you do? Well, you have two main options.

First, it's probably not practical to rebuild your entire back-end database due to all the interconnected business processes and applications as the task would be monumental. The leftover solution is to build a data warehouse for reporting use. Build a nice and clean reporting database that pulls, cleans and copies select data from your trashcan of a database. Building a reporting data warehouse can be a time consuming and potentially expensive endeavor, but it will make reporting possible.

Your other option is to return the easy-to-use BI tool you're evaluating and get one with a robust data preparation layer. Find a BI tool with an ETL layer, warehousing capabilities and enough features to ensure that it can perform the multitude of tasks needed to make your database usable. Clean and accurate data will aid in reporting and dashboard creation, and you'll be thanking yourself for it later.


About the Author

Jeff Hainsworth

Jeff Hainsworth is a Senior Solutions Architect at Dundas Data Visualization with over a decade and a half of experience in Business Intelligence. He has a passion for building, coding and everything visual – you know, shiny things! Check out "Off the Charts... with Jeff", his platform for great content on all things analytics, data visualizations, dashboards, and business intelligence. There’s something for everyone!