Know Thy Data: Setting the Stage for Data Analytics
You can’t tell a data story until you know thy data. Socrates is often credited with the idea that true wisdom lies in knowing what you do not know. Data on its own means little until you relate and correlate it to build understanding. Every organization knows that information and data are abundant. But amid all that data, what do we do with it? The key lies not just in the data itself, but in what we do with it to inform and communicate the data story.
Starting with Data Input
Raw data can be like the pieces of a puzzle: meaningless until you start putting them together. You start with the corners, then begin to uncover patterns and trends until you’re on a roll. But understanding the context behind the data is the real key to success. That means digging into the details of how the data was collected, processed, and governed.
Streamlining data for consistency is another piece of the data puzzle. It makes findings uniform and reliable, and communicating that consistency in the data story aligns end users’ perspectives so they can interpret the analysis more clearly.
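As an illustration of what streamlining for consistency can look like in practice, here is a minimal sketch in Python. It assumes a hypothetical extract in which the same region is spelled several ways and order dates arrive in several layouts; the alias table, date formats, and field names (`region`, `order_date`) are invented for the example, not drawn from any particular system.

```python
from datetime import date, datetime

# Hypothetical lookup: each raw spelling (trimmed, lowercased) -> one canonical label.
REGION_ALIASES = {"oh": "Ohio", "ohio": "Ohio", "mi": "Michigan", "michigan": "Michigan"}

# Hypothetical date layouts assumed to appear in the raw input.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_region(raw: str) -> str:
    """Map a free-text region to its canonical label; fail loudly if unknown."""
    key = raw.strip().lower()
    if key not in REGION_ALIASES:
        raise ValueError(f"unrecognized region: {raw!r}")
    return REGION_ALIASES[key]


def normalize_date(raw: str) -> date:
    """Parse a date written in any of the known layouts."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_record(record: dict) -> dict:
    """Return a copy of the record with region and date in one consistent form."""
    return {
        **record,
        "region": normalize_region(record["region"]),
        "order_date": normalize_date(record["order_date"]).isoformat(),
    }


# Hypothetical raw rows: three spellings and three date layouts for similar facts.
raw_rows = [
    {"id": 1, "region": " OH", "order_date": "03/15/2024"},
    {"id": 2, "region": "ohio", "order_date": "2024-03-15"},
    {"id": 3, "region": "Michigan", "order_date": "15 Mar 2024"},
]
clean_rows = [normalize_record(r) for r in raw_rows]
for row in clean_rows:
    print(row)
```

Raising an error on unrecognized values, rather than silently passing them through, is a deliberate choice here: it surfaces new variants so they can be added to the lookup instead of quietly breaking consistency. Note that the `%b` month abbreviation is locale-dependent.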
Understanding Bias in Data
The truth is that every dataset contains bias. It shows up in the way values are entered or managed, and it can be influenced by business policies, compliance requirements, or human error. Recognizing the bias in your data helps ensure accuracy and reliability, and it is essential for any business that wants to scale its analytics program.
Knowing where data comes from is key to understanding its bias. Understanding a dataset’s origins and sources allows data scientists to handle it accordingly. Whether the work involves integrating multiple sources or navigating various systems, each dataset presents its own set of opportunities for building out the data puzzle.
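One simple way to keep a dataset’s origins visible is to tag every record with its source system before combining sources, then profile each source separately. The minimal Python sketch below assumes two hypothetical extracts (labeled `CRM` and `ERP`) and assumes that `None`, an empty string, and `"Unknown"` all stand in for a missing `industry` value. A placeholder such as `"Unknown"` often reflects how a system forces values to be entered, which is exactly the kind of input-driven bias described above.

```python
from collections import defaultdict

# Hypothetical extracts from two source systems describing the same kind of entity.
crm_rows = [
    {"customer_id": "C-001", "industry": "Retail"},
    {"customer_id": "C-002", "industry": None},
    {"customer_id": "C-003", "industry": "Healthcare"},
    {"customer_id": "C-004", "industry": "Retail"},
]
erp_rows = [
    {"customer_id": "E-101", "industry": "Unknown"},  # default filled in at entry time
    {"customer_id": "E-102", "industry": "Manufacturing"},
    {"customer_id": "E-103", "industry": ""},
    {"customer_id": "E-104", "industry": "Retail"},
]


def tag_source(rows, source):
    """Return copies of the rows, each labeled with the system it came from."""
    return [{**row, "source": source} for row in rows]


def missing_rate_by_source(rows, field, placeholders=(None, "", "Unknown")):
    """Share of rows per source whose field is empty or a placeholder value."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for row in rows:
        src = row["source"]
        totals[src] += 1
        if row.get(field) in placeholders:
            missing[src] += 1
    return {src: missing[src] / totals[src] for src in totals}


combined = tag_source(crm_rows, "CRM") + tag_source(erp_rows, "ERP")
rates = missing_rate_by_source(combined, "industry")
print(rates)
```

Because the source label travels with each record, a gap like this can be traced back to the system that produced it rather than being averaged away once the sources are merged.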
Forming Relationships with Data
To make sure a dataset is ready for analysis, start by asking which business questions it can answer. Is there a hypothesis to prove or disprove? Knowing what the data can provide helps set the right expectations.
Confusion can arise when the same field is defined differently from one dataset to another. For example, accounting and marketing data often share fields with the same name, such as start date or end date, but the values carry different meanings and measures. A marketing campaign can end while the program is still active in accounting, because it is still receiving revenue after the campaign end date.
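One way to avoid this confusion is to give each meaning of “end date” its own explicit name, so a merge can never silently overwrite one with the other. The Python sketch below is a minimal illustration under assumed names: the classes, the `program_id` key, and the fields `campaign_end_date` and `revenue_end_date` are hypothetical, and treating a `None` revenue end date as “still receiving revenue” is an assumption made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class MarketingCampaign:
    program_id: str
    campaign_end_date: date  # marketing meaning: last day the campaign runs


@dataclass(frozen=True)
class AccountingProgram:
    program_id: str
    revenue_end_date: Optional[date]  # accounting meaning; None = revenue still arriving


def campaign_active(c: MarketingCampaign, as_of: date) -> bool:
    return as_of <= c.campaign_end_date


def program_active(p: AccountingProgram, as_of: date) -> bool:
    return p.revenue_end_date is None or as_of <= p.revenue_end_date


def combined_status(c: MarketingCampaign, p: AccountingProgram, as_of: date) -> dict:
    """Report both definitions side by side instead of collapsing them into one 'end date'."""
    if c.program_id != p.program_id:
        raise ValueError("records describe different programs")
    return {
        "program_id": c.program_id,
        "campaign_active": campaign_active(c, as_of),
        "program_active": program_active(p, as_of),
    }


campaign = MarketingCampaign("P-100", campaign_end_date=date(2024, 6, 30))
program = AccountingProgram("P-100", revenue_end_date=None)
status = combined_status(campaign, program, as_of=date(2024, 9, 1))
print(status)
```

Had both sources kept a field simply called `end_date`, a naive merge such as `{**marketing_row, **accounting_row}` would keep only one of the two values, and the report would quietly answer the wrong question for one team or the other.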
Every business, big or small, creates and collects data at a rapid pace. Raw data is a narrative waiting to be told, and knowing your data, forming relationships, and understanding biases allow you to build out your data puzzle and tell your story with precision and accuracy. The more we know about data before analysis, the better we can avoid misrepresentation, reduce the risk of overlooking important insights, and minimize unreliable conclusions. In the next post in this series, we will explore more ways to tell a data story and how to shift the paradigm for how data can be used.
Virginia Ryan is an associate account director for Avaap's Data & Analytics practice. Virginia has more than 20 years of experience in strategic planning, program management, process management, and data visualization and analytics.