When data is the lifeblood of your business, outliers are the pulse. Unexpected deviations in your data can indicate business successes and, with them, new opportunities: if a promotional discount caused a spike in first-time customers, can you build on that success and bring them back? Outliers can also signal serious problems: a sudden surge in online sales volume without a simultaneous jump in online revenue often indicates a price glitch.
Not surprisingly, outlier detection is emerging as the next big thing in business intelligence (BI), especially in organizations with millions of metrics monitored in real time. Old-school analytics dashboards and static thresholds mire your team in alert storms and need constant readjusting.
The three flavors of outliers
Automated outlier detection systems avoid these problems by using machine learning to detect three different types of outliers accurately:
Global outliers are rare data values that differ drastically from the rest of the data set they appear in. Winning the lottery would be a global outlier, since the odds against any particular person winning the jackpot are often more than a million to one.
Contextual outliers are data points that count as outliers only in the context in which they appear; the same values wouldn’t be regarded as outliers in other contexts within the same data set. Buying a lottery ticket at 11:00 at night would be a contextual outlier if you routinely purchase a ticket every day on your way home from work, much earlier in the evening.
Finally, there are collective outliers: groups of data points that, taken together as a subset, deviate significantly from the rest of the dataset. It’s probably easiest to think of them as a cluster of related points within a larger dataset. If the last three jackpot winners were neighbors of yours, that would be a collective outlier, since we’d statistically expect a wider geographic distribution of the winners.
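To make the difference between the first two flavors concrete, here is a minimal Python sketch. It is only an illustration, not how any particular product works: the sales figures are made up, the function names are hypothetical, and it assumes a modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff) as the outlier test. Collective outliers need a test over groups of points and are not covered here.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread to measure against, so nothing can be scored as extreme.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
    return [0.6745 * (v - med) / mad for v in values]

def global_outliers(values, threshold=3.5):
    """Indices of values that are extreme relative to the whole dataset."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]

def contextual_outliers(values, contexts, threshold=3.5):
    """Indices of values that are extreme relative to others in the same context."""
    groups = {}
    for i, context in enumerate(contexts):
        groups.setdefault(context, []).append(i)
    flagged = []
    for indices in groups.values():
        scores = robust_z_scores([values[i] for i in indices])
        flagged.extend(i for i, z in zip(indices, scores) if abs(z) > threshold)
    return sorted(flagged)

# Hypothetical daily sales: weekdays hover near 100, weekends near 120.
weekday_sales = [100, 104, 96, 102, 98, 101, 99, 103, 97, 121]
weekend_sales = [118, 122, 120, 116, 124, 400]
values = weekday_sales + weekend_sales
contexts = ["weekday"] * len(weekday_sales) + ["weekend"] * len(weekend_sales)

print(global_outliers(values))                # [15]: only the 400 stands out overall
print(contextual_outliers(values, contexts))  # [9, 15]: 121 is unusual for a weekday
```

The weekday value of 121 would be perfectly normal on a weekend, which is why the global test ignores it and only the per-context test catches it.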
Humans are naturally good at spotting outliers, provided we’re given enough data to form a clear mental model of what’s normal for a dataset. Graphs and other ways of visualizing data make this even easier for us. If we dig deep into a plot of time series data, we could spot all three types, even if it took us a while. However, it’s impractical for businesses to use manual outlier detection for hundreds, thousands, or millions of metrics.
Outlier detection and matters of the heart
It turns out that neural networks are also good at detecting outliers and, in some areas, are already better than humans in terms of recall (how many of the real outliers are identified) and precision (how many of the data points identified as outliers really are outliers).
One example is the 34-layer convolutional neural network recently developed by Stanford researchers, which outperforms board-certified cardiologists at heart arrhythmia detection. This neural net was trained on a massive dataset of ECG recordings taken from people who wore a specific wearable heart monitor. This is one context where abstract statistical terms like precision and recall can mean the difference between life and death, especially for those who may not have access to a cardiologist.
Part of the team’s success was having a dataset of ECG recordings far larger than any used in earlier attempts to detect heart arrhythmia accurately by computer. They had this huge dataset because the research team partnered with the wearable heart monitor vendor, which was able to collect real data from its customers. Machine learning needs lots of data to learn from, just like we all needed to spend our first few years of life listening to and babbling at adults before we could speak in complete sentences.
Just like the heart arrhythmia example, real-time outlier detection vendor Anodot uses lots of data and neural nets to achieve high precision, recall, and conciseness in real time at the scale of millions of metrics. It does so by tailoring the specific statistical outlier tests to the distribution seen in the data and learning the “normal” behavior of a metric over time. Any data point that the statistical test identifies as outside of the normal range is labeled an outlier. This robust approach can identify all three types of outliers in any time series data.
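The idea of learning a metric's "normal" behavior and flagging whatever falls outside it can be sketched in a few lines of Python. This is a deliberately simple stand-in, not Anodot's actual method: it assumes a trailing window of recent values as the baseline and a mean plus-or-minus k standard deviations band as the test, and the function name, parameters, and sample series are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev

def rolling_outliers(series, window=10, k=3.0):
    """Flag points outside mean +/- k*stdev of the preceding `window` normal points."""
    history = deque(maxlen=window)  # the most recent values considered normal
    flagged = []
    for i, x in enumerate(series):
        is_outlier = False
        if len(history) == window:  # only test once a full baseline exists
            mu, sigma = mean(history), stdev(history)
            is_outlier = sigma > 0 and abs(x - mu) > k * sigma
        if is_outlier:
            flagged.append(i)
        else:
            # Only normal points update the baseline, so one spike
            # does not widen the band for the points that follow it.
            history.append(x)
    return flagged

# Hypothetical metric: steady around 50, with one spike to 95.
series = [50, 52, 49, 51, 50, 48, 52, 51, 49, 50, 51, 95, 50, 49]
print(rolling_outliers(series))  # [11]
```

Keeping flagged points out of the baseline is a design choice with a cost: a genuine, lasting shift in the metric would keep being flagged instead of becoming the new normal. A production system would also need to handle seasonality and choose a test that fits each metric's distribution, which this sketch does not attempt.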
For business metrics, precision means catching only legitimate and significant outliers and not flooding analysts with false positives. Recall means not letting legitimate outliers slip through, like a price glitch that causes you to lose hundreds of dollars of revenue on every sale.
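Both measures are easy to compute once you know which alerts were real. The short Python sketch below uses made-up alert and incident IDs and a hypothetical helper name, purely to show the arithmetic.

```python
def precision_recall(flagged, actual):
    """Return (precision, recall) for flagged outliers versus known real ones."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: what share of the alerts were real outliers?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: what share of the real outliers triggered an alert?
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall

# Hypothetical example: the system raised four alerts; three real outliers occurred.
flagged = {3, 17, 42, 58}
actual = {17, 42, 90}
precision, recall = precision_recall(flagged, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.50, recall=0.67
```

Here half of the alerts were false positives, and one real outlier (ID 90) slipped through unnoticed.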