Curating data is more than simply storing it in a shared database.
Data curation is becoming an increasingly common topic in conversations about self-service analytics, especially when the discussion turns to data governance or metadata management. If you are new to these conversations and curious about the concept and its applications, you have landed on the right page.
What is Data Curation?
Data Curation can be defined as the continuous process of organizing and managing a collection of datasets from various sources to fulfil the analytical requirements of a specific group of people.
To make the concept more concrete, let us take music streaming services as an example.
There are several types of music listeners using the same streaming platform. The goal of the platform is to make it as user-friendly and self-sufficient as possible. One aspect of it is to create and optimize collections of individual songs that cater to the needs of various listeners.
Say I am feeling good about my day today and want to listen to songs that elevate my mood. So, I open the app and look for a playlist called Feel Good Soundtracks. But what if a couple of the songs that feel good to me feel romantic to you? In that case, those songs must also appear in a playlist called Top Romantic Hits, made to match your taste in music. It also turns out that, by genre, they are all country songs. Do you know what that means? Yes, they should also appear in the Trending Country Songs playlist curated for our friend next door.
Now think about the above example from a music curator's point of view. The first step would be to label those songs correctly for all the relevant emotions and genres. The next would be to include them in every applicable category. After that, the curator would create playlists, serve them to listeners, observe their listening habits and refine the process to match their needs as closely as possible.
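To make the labelling step concrete, here is a minimal Python sketch, assuming a toy catalogue where each song carries a set of mood and genre labels and each playlist collects a single label. The song titles, label names and the `build_playlists` helper are hypothetical and exist only for illustration. The point it shows is that one song can land in several playlists at once, simply because it was labelled for everything that applies to it.

```python
# Hypothetical catalogue: each song is labelled with every mood and genre that applies.
songs = {
    "Song A": {"feel-good", "romantic", "country"},
    "Song B": {"feel-good", "country"},
    "Song C": {"romantic"},
}

# Hypothetical mapping from playlist name to the label it collects.
playlists = {
    "Feel Good Soundtracks": "feel-good",
    "Top Romantic Hits": "romantic",
    "Trending Country Songs": "country",
}


def build_playlists(songs, playlists):
    """Place each song in every playlist whose label it carries."""
    # Start every playlist empty so playlists with no matches still appear.
    result = {name: [] for name in playlists}
    for title, labels in songs.items():
        for name, label in playlists.items():
            if label in labels:
                result[name].append(title)
    return result


curated = build_playlists(songs, playlists)
for name, titles in curated.items():
    print(name, "->", titles)
```

Here "Song A" appears in all three playlists because it was labelled once, thoroughly, rather than copied by hand into each collection.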
Data curation works in a comparable manner. In a self-service analytics environment, different data analysts look at the same dataset from their own perspectives, which is where reusability comes into play. Data curation, then, is the practice of observing how data is used and understanding how context, narrative and meaning can be captured to make it more reusable. That is how data curation bridges the gap between data and its real-world applications.
Why is Data Curation important?
Businesses today are spending millions of dollars on Business Intelligence (BI) and Data Analytics software with the goal of empowering their people to make more data-driven decisions. But despite implementing the best BI and analytics software, organizations often struggle to cultivate a data culture among their employees. The challenge is not a lack of capability in the tools, but a lack of data literacy.
As the music curation example shows, the same data set can be processed in different ways. A series of transformations and calculations may be applied to a data set before it reaches its intended user. To make the right business decision, that user needs to understand the data's context, interpret it correctly and be able to trust it.
That is why data curation is important: it gives users the context they need to interpret and trust the data in front of them.
What is the role of a Data Curator in Data Management?
A decade ago, data analysis was done on data generated by enterprise systems like ERPs and CRMs. That situation has changed drastically. Today, internally generated data is only one piece of the puzzle, and the volume of data coming from uncontrolled sources is growing exponentially. This creates a need to add context to data from these new sources.
It is the responsibility of the Data Curator to ensure that data is described well enough to give its users the correct context on which to build their analysis. Everyone who works with data has the opportunity to curate by sharing context drawn from their own learnings and experiences. A typical organization can have many data curators, depending on the degree of responsibility, access to data and corresponding time commitment.
When it comes to data analytics and business intelligence, people often talk only about visual dashboards and reports, forgetting that the first and most important step in the process is data preparation. In our Guide to Data Preparation in Dundas BI video, we talk about some of the different types of ETL problems that can be solved with the Dundas BI Data Cube layer in the context of data preparation. If you're a Data Nerd like me, you will certainly appreciate it!
If you wish to be a part of the ongoing conversations around data and its real-world applications, I urge you to join us in The Dundas Community. It is the place where we engage in thought-provoking data-driven conversations while also discussing the endless possibilities of Dundas BI.
About the Author
Tejas Shah is a Computer Science graduate with a major in Data Science and a marketer by trade who is passionate about data-driven storytelling. He authors content that educates and offers a fresh perspective in the world of business intelligence and analytics.