The explosion of data collected by companies over the past decade has brought about an unexpected side effect: while organizations have gotten extremely proficient at identifying, gathering, and accessing their data, the sheer enormity of this data has paradoxically made drawing meaningful, actionable insight more difficult. Many companies have been focused on data — specific and immutable facts about what happened — over information — the interpretation of these observations in the pursuit of decision making.
Confronting the data-information divide
This outcome shouldn’t be all that surprising: data becomes information only through the uniquely human competencies of synthesis, inference, and critical reasoning. We have so much data precisely because collection is a task for which machines are exceptionally well suited: capturing and recording observable facts as they happen, reliably and at scale. Computers, on the other hand, struggle to ask and answer the abstract, open-ended questions about these facts that humans have an innate knack for, such as formulating a theory for why something happened. Artificial intelligence promises to bridge this gap, and may well do so in the near future, but so far we haven’t seen this realized.
As a result, many organizations that solved the data collection problem over the past few years now face a new and arguably more perplexing challenge: harnessing all this data to make better, not just different, decisions. This requires efficiently traversing the rich dimensions being collected on key metrics to understand where changes have occurred, why they might have happened, and how the organization should react.
Enter the analyst
Up against this daunting challenge are data analysts and scientists, who are tasked with converting data (typically encoded into metrics) into information that can ultimately guide organizational decision making. While on the surface it may seem that every analysis and deep dive into metric behavior requires a unique approach, elite analysts have established several repeatable analytical techniques that dramatically improve both how quickly insights can be gleaned from data and how good those insights are.
Following are six of these approaches that we’ve seen emerge as common patterns across top analysts in a range of seemingly unrelated industries and applications. Over the next few weeks, we’ll publish subsequent deep dives into each of these techniques so that you can bring them to your organization to help alleviate the data-information bottleneck. We’ll jump right into a few concepts covered in prior posts, including dimensions and averages, so I’d recommend checking those out too.
The six techniques
Contribution
Contribution is the relative share of a metric that any one dimension value represents. It can be calculated along any available dimension, or any combination of dimensions, and helps you contextualize the importance of that cut relative to the collective whole. This keeps you from over- or under-indexing on the raw value of a cut and focuses your attention on the areas that matter most.
For example, consider a store that sells home goods. If they had 80 sales in a day, and 20 of those were bowls, the contribution of bowls would be 25%. If 10 of those bowls were sold on clearance, then the contribution of bowls on clearance to sales would be 12.5%.
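As a minimal sketch of the calculation, assume the day’s sales are available as hypothetical `(product, on_clearance)` records, one per unit sold, with the 60 non-bowl sales lumped together under a made-up "plate" label. Contribution is then each group’s count divided by the total, along one dimension or a combination of them:

```python
from collections import Counter
from collections.abc import Hashable, Mapping


def contribution(counts: Mapping[Hashable, float]) -> dict[Hashable, float]:
    """Return each dimension value's share of the metric total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("contribution is undefined when the total is zero")
    return {value: amount / total for value, amount in counts.items()}


# Hypothetical daily sales: one (product, on_clearance) record per unit sold.
# 80 sales in total: 20 bowls (10 on clearance) and 60 other items ("plate").
sales = (
    [("bowl", True)] * 10
    + [("bowl", False)] * 10
    + [("plate", False)] * 60
)

# Contribution along a single dimension: product.
by_product = contribution(Counter(product for product, _ in sales))

# Contribution along a combination of dimensions: (product, on_clearance).
by_product_and_clearance = contribution(Counter(sales))

print(by_product["bowl"])                        # 0.25
print(by_product_and_clearance[("bowl", True)])  # 0.125
```

Counting whole tuples is what makes the combined cut work: each distinct combination of dimension values becomes its own key.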
Here’s a more detailed look at how to incorporate contribution into your analysis workflow.
Change in contribution (mix shift)
While contribution is calculated for a specific point in time, we can also look at how the contribution of each segment changes over time to better understand the movement taking place below the surface of a metric. To do this, take the difference in contribution for each value of a given dimension between two points in time (if a value doesn’t exist in one of the time periods, treat its contribution as zero). This is incredibly useful for spotting subtle shifts in a metric that go unnoticed when looking only at changes in raw numbers, and it is a way of uncovering hidden opportunities and risks.
For example, let’s say that our store, which had 80 total sales today, 20 of which were bowls, had 40 daily sales with 15 bowls at the beginning of the month. Bowl sales have increased by 5 units, but bowls’ contribution to sales has fallen by 12.5 percentage points (from 37.5% to 25%). Looking only at the raw numbers would make it much harder to spot that bowl sales haven’t kept pace, especially with a large number of other products in view at the same time.
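A small sketch of the mix shift calculation, assuming per-period counts keyed by dimension value. The "other" bucket is a hypothetical stand-in for every non-bowl product; a value missing from one period is given a contribution of zero there, and the result is expressed in percentage points:

```python
def contribution(counts: dict[str, float]) -> dict[str, float]:
    """Return each dimension value's share of the metric total."""
    total = sum(counts.values())
    return {value: amount / total for value, amount in counts.items()}


def mix_shift(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Change in contribution per dimension value, in percentage points.

    A value missing from one period is treated as contributing zero there.
    """
    share_before = contribution(before)
    share_after = contribution(after)
    values = share_before.keys() | share_after.keys()
    return {
        v: 100 * (share_after.get(v, 0.0) - share_before.get(v, 0.0))
        for v in values
    }


# Store example; "other" is a hypothetical bucket for every non-bowl product.
start_of_month = {"bowl": 15, "other": 25}  # 40 sales
today = {"bowl": 20, "other": 60}           # 80 sales

shift = mix_shift(start_of_month, today)
print(shift["bowl"])  # -12.5: bowls grew in units but lost share
```

Because contributions in each period sum to 100%, the shifts across all values of a dimension always sum to zero: any share one value gains, another loses.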
The article linked in the prior section also explores mix shift analysis.
Contribution to change
Continuing to work with the concept of contribution, here’s another impactful way to make sense of the drivers of change in a metric: compute the contribution of each dimension value to the change itself. This lets us determine which dimension values drive the observed change over time and which detract from it. Speaking in terms of contribution to change is a powerful way to understand the most active segments of a metric. Often the segments with the highest contribution will also drive the most change, but when smaller segments have an outsized contribution to the change (or are counteracting it), we can now identify precisely which cuts are having this effect and take action.
One interesting note with contribution to change is that when counter-aligned values are present — for instance dimension values that decrease when the overall metric increases — we will get both negative contribution to change (in the case of the value moving in the opposite direction) and potentially an individual contribution to change in excess of 100%. This is perfectly acceptable and is actually a benefit of this type of approach. By allowing us to see both the negative values and those above 100%, we maintain clear focus on the highest impact segments. So long as all of the values sum to 100% for a given dimension, you are in the clear!
Staying with our store example, let’s consider the two dates we referred to previously. Sales increased by 40 units and bowls by 5 units, so bowls’ contribution to change is 5 / 40 = 12.5%. If instead bowls had decreased by 20 units while overall sales still increased by 40, bowls would have a contribution to change of -20 / 40 = -50% (but the sum of all products’ contributions to change would still be 100%).
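The same idea as a sketch: divide each dimension value’s change by the metric’s total change. The first call uses the two dates from the example (with "other" as a hypothetical bucket for non-bowl products); the second uses hypothetical numbers, starting from 25 bowls rather than 15 so that a 20 unit drop is possible, to show the negative and above-100% values described earlier:

```python
def contribution_to_change(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Share of the metric's total change attributable to each dimension value.

    Missing values are treated as zero. The shares always sum to 1 (100%),
    but individual shares can be negative or exceed 1.
    """
    total_change = sum(after.values()) - sum(before.values())
    if total_change == 0:
        raise ValueError("contribution to change is undefined when the metric is flat")
    values = before.keys() | after.keys()
    return {v: (after.get(v, 0) - before.get(v, 0)) / total_change for v in values}


# The two dates from the store example ("other" = all non-bowl products).
actual = contribution_to_change({"bowl": 15, "other": 25}, {"bowl": 20, "other": 60})
print(actual["bowl"])  # 0.125, i.e. 12.5%

# Hypothetical counter-aligned case: bowls fall by 20 while total sales rise by 40.
counter = contribution_to_change({"bowl": 25, "other": 15}, {"bowl": 5, "other": 75})
print(counter["bowl"], counter["other"])  # -0.5 1.5
```

In the counter-aligned case, bowls come out at -50% and the remaining products at 150%, yet the two still sum to 100%.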
Ratio metrics (rate and mix)
Ratio metrics are a more complex metric representation, typically used to track some form of efficiency, such as conversion rate (number of successes divided by number of attempts) or average transaction value (total value of transactions divided by total count of transactions). Because ratios have two components, we now need two ways to describe their movement across segments; contribution alone isn’t enough.
The two key elements to consider in this case are: rate, which is the ratio metric’s value for a specific dimension cut, and mix, which is the share of total elements that the segment represents (specifically it would be the contribution of the ratio’s denominator value). These can move more or less independently of each other, so to get maximum insight, we have to quickly switch between looking at rate changes, mix changes, and the net impact (effectively the mix-weighted rate) to understand which cuts are responsible for a change and where there might be otherwise undetected undercurrents.
In this case, we unfortunately have to leave our store sales example behind. Instead, let’s say we’ve now launched an ecommerce site to sell our goods and are analyzing the cart conversion ratio: the share of products put in the cart that are ultimately purchased. Let’s say that our conversion ratio increased by 10% overall. We’ll use ratio analysis to understand the impact of bowls on this in a couple of ways:
- I will first look at the change in the conversion ratio for bowls on their own: the rate change. This tells me how frequently bowls in the cart get purchased, regardless of their share of cart additions, and lets me understand the relative efficiency of bowls against other products, such as plates. If we see meaningful increases in rate, we are making the purchase process more efficient for those values.
- I will next look at the mix change of bowls, which again is essentially the contribution of the denominator value: in this case, the number of products added to the cart. Changes in mix can also drastically impact the final output. For instance, if the rate for bowls is unchanged but higher than the store’s overall conversion rate, simply increasing bowls’ share of cart additions will drive the overall conversion rate higher.
- I can finally combine both of these measures to understand the net change for the dimension value. Note that this should never be the only thing you look at, since the calls to action for mix-related problems (typically changing the shape of total traffic through marketing spend or SKU management) are drastically different from those for rate-related problems (investing in product, UX, and discounts to increase the number of completed purchases).
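One way to make rate, mix, and net impact concrete is sketched below. It assumes hypothetical per-product counts of purchases and cart additions, and it uses one particular decomposition, which is an assumption rather than the only convention: the overall rate is the mix-weighted sum of segment rates, the rate effect holds mix at its earlier value, and the mix effect uses the later rate, so the two add up exactly to each segment’s net effect. The numbers reproduce the scenario above in which bowls’ rate is unchanged but above the overall rate:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    purchases: int  # numerator: cart items that were ultimately purchased
    carts: int      # denominator: items added to the cart


def overall_rate(segments: dict[str, Segment]) -> float:
    """Overall conversion rate across all segments."""
    purchases = sum(s.purchases for s in segments.values())
    return purchases / sum(s.carts for s in segments.values())


def rate_mix_decomposition(
    before: dict[str, Segment], after: dict[str, Segment]
) -> dict[str, dict[str, float]]:
    """Split each segment's effect on the overall rate into rate and mix effects.

    Overall rate = sum(mix_i * rate_i), with mix_i = carts_i / total carts.
    rate effect  = mix_before * (rate_after - rate_before)
    mix effect   = (mix_after - mix_before) * rate_after
    The two always sum to net = mix_after * rate_after - mix_before * rate_before.
    """
    total_before = sum(s.carts for s in before.values())
    total_after = sum(s.carts for s in after.values())
    result = {}
    for name in before.keys() | after.keys():
        b = before.get(name, Segment(0, 0))  # absent segment: zero mix, zero rate
        a = after.get(name, Segment(0, 0))
        rate_b = b.purchases / b.carts if b.carts else 0.0
        rate_a = a.purchases / a.carts if a.carts else 0.0
        mix_b = b.carts / total_before
        mix_a = a.carts / total_after
        rate_effect = mix_b * (rate_a - rate_b)
        mix_effect = (mix_a - mix_b) * rate_a
        result[name] = {"rate": rate_effect, "mix": mix_effect, "net": rate_effect + mix_effect}
    return result


# Hypothetical numbers: bowls convert at 30% in both periods (above the overall
# rate), plates at 20%, but bowls' share of cart additions rises from 50% to 75%.
before = {"bowl": Segment(purchases=30, carts=100), "plate": Segment(purchases=20, carts=100)}
after = {"bowl": Segment(purchases=45, carts=150), "plate": Segment(purchases=10, carts=50)}

print(overall_rate(before), overall_rate(after))
for name, effects in sorted(rate_mix_decomposition(before, after).items()):
    print(name, {k: round(v, 4) for k, v in effects.items()})
```

Here bowls show a rate effect of zero and a positive mix effect, while plates’ shrinking share pulls the overall rate down slightly; the net effects sum to the change in the overall rate (0.25 to 0.275). Other decompositions split out a separate interaction term, which changes the individual rate and mix numbers but not the net.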
As you can see, ratio metrics are hugely powerful for understanding very nuanced drivers in your KPIs, but also are quite complicated. Look for a follow-up post where I’ll pick this topic up in much more depth!
Distribution-based metrics
Distribution-based metrics are those which express an action (such as a purchase) taken by some entity (such as a customer) over a period of time (such as a week). Given the limitations of many reporting and visualization platforms, teams with these types of KPIs are often forced to use just one aggregation, most commonly the average.
If you can instead efficiently represent and explore the full distribution of these measures, you are often able to find much more meaningful insights, which can be completely hidden when you use an average alone. Much as contribution reveals the relative impact of any one dimension value, an array of percentiles reveals hidden movement within the metric itself. Histograms are a great first step toward representing this richer view of your metrics.
In even more advanced analysis, you can combine this concept with the above to look at how the dimensional composition of a specific percentile bin has shifted over time. Now we’re really getting a huge amount of information out of something which might previously just have been seen as a simple average!
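Combining percentiles with contribution can be sketched as follows, assuming hypothetical `(category, value)` transaction records: find the cutoff for a percentile with Python’s `statistics.quantiles`, keep the transactions at or above it, and compute each category’s share of that bin. Comparing the result across two weeks is then a mix shift on the bin:

```python
import statistics
from collections import Counter


def top_bin_composition(
    transactions: list[tuple[str, float]], percentile: int = 90
) -> dict[str, float]:
    """Each category's share of the transactions at or above a percentile cutoff."""
    values = [value for _, value in transactions]
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    cutoff = statistics.quantiles(values, n=100, method="inclusive")[percentile - 1]
    in_bin = [category for category, value in transactions if value >= cutoff]
    counts = Counter(in_bin)
    return {category: count / len(in_bin) for category, count in counts.items()}


# Hypothetical (category, transaction value) records for two weeks.
last_week = [("bowls", 10)] * 5 + [("plates", 20)] * 4 + [("furniture", 200)]
this_week = (
    [("bowls", 10)] * 3 + [("plates", 20)] * 2
    + [("bowls", 60)] * 3 + [("furniture", 200)] * 2
)

print(top_bin_composition(last_week, percentile=75))
print(top_bin_composition(this_week, percentile=75))
```

In this made-up data the top quartile shifts from mostly plates to mostly bowls, something the overall average would not reveal. The `method="inclusive"` interpolation is one choice among several percentile definitions; others can move the cutoff slightly on small samples.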
For example, our newly online storefront might want to track the value of customer purchases each week. The easiest way to do this would be to look at how the average changes, since it is technically affected by every individual transaction. With a distribution-based approach, however, we would also look at what has changed in the 25th, 50th (median), 75th, and 90th percentiles.
Let’s say my average transaction value increased week over week. By looking at the distributions, I can see if it was because I had a small increase in very high dollar transactions or a much broader increase of smaller magnitude across all of my transactions. The actions I’d take as a result of these two different findings would certainly be much different.
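To illustrate the two scenarios, here is a sketch that compares the mean and the 25th, 50th, 75th, and 90th percentiles across two weeks of hypothetical transaction values. Both hypothetical versions of this week raise the mean by the same amount, but the percentile changes tell them apart:

```python
import statistics

PERCENTILES = (25, 50, 75, 90)


def summarize(values: list[float]) -> dict[str, float]:
    """Mean plus selected percentiles of a list of transaction values."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    summary = {"mean": statistics.fmean(values)}
    summary.update({f"p{p}": cuts[p - 1] for p in PERCENTILES})
    return summary


def week_over_week(last_week: list[float], this_week: list[float]) -> dict[str, float]:
    """Change in the mean and in each percentile between two weeks."""
    before, after = summarize(last_week), summarize(this_week)
    return {stat: after[stat] - before[stat] for stat in before}


# Hypothetical transaction values.
last_week = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
tail_driven = last_week[:-1] + [200]       # one very large purchase
broad_based = [v + 10 for v in last_week]  # every purchase slightly larger

print(week_over_week(last_week, tail_driven))
print(week_over_week(last_week, broad_based))
```

In the tail-driven week only the 90th percentile moves along with the mean; in the broad-based week every percentile moves by the same amount as the mean.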
You can learn more about distribution metrics through one of my previous deep-dive write-ups on averages.
Related metric exploration
A final bonus category of top analytics techniques is related metrics. While I won’t get into the nitty-gritty of causal inference here (hopefully I’ll have time to write about that in the future!), for now we can talk about the importance of using intuition and business context to identify which metrics may drive other metrics.
One framing I like is that most organizations have output metrics that represent their ultimate goals (such as profit, transactions, or customer sign-ups) and input metrics that they can exert some control over in hopes of altering the outputs. In addition to applying all of the above techniques to observed changes in output metrics, strong analytical teams also apply the same methods to input metrics that may have had an effect, paying particular attention to the dimensions the input and output metrics share.
In this case, we can use contribution, change in contribution, contribution to change, and the rest of these techniques to understand what has changed in a metric, and then use our mapping of related metrics to help explain why it occurred. This forms the foundation of how I think about data-driven decision making.
Returning to our online store one last time, if I see that sales are down, after identifying which dimensions are driving the decrease (perhaps customer city is one), I might then explore the number of sessions cut by city to understand whether changes in site traffic might be at least partially responsible.
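A sketch of that last step, assuming hypothetical sales and session counts keyed by customer city: compute each city’s contribution to the change in sales, then place the change in sessions for the same city beside it. This only lines the two metrics up along a shared dimension; it does not establish that traffic caused the drop:

```python
def change_by_segment(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Absolute change per dimension value; missing values count as zero."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in before.keys() | after.keys()}


def related_metric_view(
    output_before: dict[str, int],
    output_after: dict[str, int],
    input_before: dict[str, int],
    input_after: dict[str, int],
) -> list[tuple[str, float, int]]:
    """Pair each segment's contribution to the output metric's change with the
    input metric's change on the same dimension, biggest drivers first."""
    output_deltas = change_by_segment(output_before, output_after)
    total_change = sum(output_deltas.values())
    if total_change == 0:
        raise ValueError("output metric did not change")
    input_deltas = change_by_segment(input_before, input_after)
    rows = [
        (segment, delta / total_change, input_deltas.get(segment, 0))
        for segment, delta in output_deltas.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical daily sales (output) and sessions (input), both cut by customer city.
sales_before = {"Austin": 100, "Denver": 80, "Boston": 60}
sales_after = {"Austin": 60, "Denver": 75, "Boston": 65}
sessions_before = {"Austin": 2000, "Denver": 1600, "Boston": 1200}
sessions_after = {"Austin": 1200, "Denver": 1580, "Boston": 1250}

for city, share_of_change, session_change in related_metric_view(
    sales_before, sales_after, sessions_before, sessions_after
):
    print(f"{city}: {share_of_change:.0%} of the sales change, sessions {session_change:+d}")
```

In this made-up data, the city driving the sales decline is also the one where sessions fell most, which would point the investigation toward traffic in that city.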
Wrapping it all up
We’ve covered a lot of ground in this post, which I hope has shed some light on the approaches that any team can implement to draw deeper insights from, and ultimately make better decisions with, their data. One of our inspirations in building Falkon has been to create a platform that augments data and business teams’ ability to rapidly and accurately perform these and other methodologies to translate their data into information. While expanding your KPI vocabulary to include concepts like contribution, rate, and mix might be challenging at first, building fluency with these concepts will dramatically improve an organization’s ability to be truly data-driven.