Heartfelt advice on how machine learning can identify the different types of outliers in your business metrics

When data is the lifeblood of your business, outliers are the pulse. Unexpected deviations in your data can indicate business successes and, with them, new opportunities: if a promotional discount caused a spike in first-time customers, can you build on that success and bring them back? Outliers can also signal serious problems: a sudden surge in online sales volume without a simultaneous jump in online revenue often indicates a price glitch.

Not surprisingly, outlier detection is emerging as the next big thing in business intelligence (BI), especially in organizations that monitor millions of metrics in real time. Old-school analytics dashboards and static thresholds mire your team in alert storms and need constant readjusting.


The three flavors of outliers

Automated outlier detection systems avoid these problems by using machine learning to detect three different types of outliers accurately:

Global outliers are rare data values that differ drastically from the rest of the data set they appear in. You winning the lottery would be a global outlier, since the odds against a particular person winning the jackpot are often more than a million to one.

Contextual outliers are data points that count as outliers only in the context in which they appear; the same values wouldn’t be regarded as outliers in other contexts within the same data set. Buying a lottery ticket at 11:00 at night would be a contextual outlier if you routinely purchase a ticket every day on your way home from work, much earlier in the evening.


Finally, there are collective outliers: a group of data points that, taken together as a subset, deviates significantly from the rest of the dataset. It’s probably easiest to think of collective outliers as statistically unlikely “clumps” of data points within a larger dataset. If the last three jackpot winners were neighbors of yours, that would be a collective outlier, since we’d statistically expect a wider geographic distribution of the winners.
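To make the three flavors concrete, here is a minimal Python sketch. It is not how any particular vendor does it: it assumes a simple robust test (the modified z-score built on the median absolute deviation, with the commonly quoted 3.5 cutoff), and the function names, the sample numbers, and the idea of using a day/night label as "context" are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean, median


def global_outliers(values, threshold=3.5):
    """Indices of values far from the median (modified z-score, MAD-based)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; this sketch flags nothing
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


def contextual_outliers(points, threshold=3.5):
    """points is a list of (context, value); test each value within its context."""
    by_context = defaultdict(list)
    for i, (context, value) in enumerate(points):
        by_context[context].append((i, value))
    flagged = []
    for members in by_context.values():
        group_values = [value for _, value in members]
        for j in global_outliers(group_values, threshold):
            flagged.append(members[j][0])  # map back to the original index
    return sorted(flagged)


def collective_outliers(values, window=3, threshold=3.5):
    """Start indices of windows whose average is unusual among all windows."""
    window_means = [mean(values[i:i + window])
                    for i in range(len(values) - window + 1)]
    return global_outliers(window_means, threshold)


# Hypothetical hourly order counts: one obvious spike.
spike = [10, 12, 11, 13, 12, 11, 10, 95, 12, 11]

# Hypothetical orders labeled by time of day: 400 is normal by day, odd at night.
labeled = ([("day", v) for v in [400, 420, 410, 390, 405, 415]]
           + [("night", v) for v in [40, 42, 38, 41, 39, 400]])

# Hypothetical series with a mild "clump" of three 14s in the middle.
clump = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
         14, 14, 14, 10, 9, 11, 10, 12, 10, 9]

print(global_outliers(spike))
print(contextual_outliers(labeled))
print(global_outliers(clump), collective_outliers(clump))
```

In the last example no single 14 is extreme enough to be flagged on its own, but the three-point window that contains all of them is. That is the "clump" idea in miniature.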

Humans are naturally good at spotting outliers, provided we’re given enough data to form a clear mental model of what’s normal for a dataset. Graphs and other ways of visualizing data make this even easier for us. If we dig deep into a plot of time series data, we could spot all three types, even if it took us a while. However, it’s impractical for businesses to use manual outlier detection for hundreds, thousands, or millions of metrics.

Outlier detection and matters of the heart

It turns out that neural networks are also good at detecting outliers and, in some areas, are already better than humans in terms of recall (the share of real outliers that get identified) and precision (the share of data points identified as outliers that really are outliers).
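Both measures are simple ratios. The short Python sketch below computes them from two lists of data point IDs; the function name and the sample IDs are made up for illustration, and returning 0.0 when a list is empty is just one possible convention.

```python
def precision_recall(flagged, actual):
    """Return (precision, recall) for flagged outliers versus the real ones."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: of everything we flagged, how much was really an outlier?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: of all the real outliers, how many did we flag?
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical run: four points flagged, three real outliers, two in common.
precision, recall = precision_recall(flagged=[3, 7, 12, 20], actual=[7, 12, 15])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here half of the alerts were real (precision 0.50), and two of the three real outliers were caught (recall about 0.67).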

One example is a 34-layer convolutional neural network recently developed by Stanford researchers, which outperforms board-certified cardiologists at heart arrhythmia detection. This neural net was trained on a massive dataset of ECG recordings taken from people who wore a specific wearable heart monitor. This is one context where abstract statistical terms like precision and recall can mean the difference between life and death, especially for those who may not have access to a cardiologist.

Part of the team’s success came from having a dataset of ECG recordings far larger than any used in earlier attempts to detect heart arrhythmia accurately by computer. They had this huge dataset because the research team partnered with the wearable heart monitor vendor, which was able to collect real data from its customers. Machine learning needs lots of data to learn from, just like we all needed to spend our first few years of life listening to and babbling at adults before we could speak in complete sentences.

As in the heart arrhythmia example, outlier detection vendor Anodot uses lots of data and neural nets to achieve high precision, recall, and conciseness in real time, at the scale of millions of metrics. It does so by tailoring the statistical outlier test to the distribution seen in each metric’s data and by learning the “normal” behavior of that metric over time. Any data point that the statistical test identifies as outside of the normal range is labeled an outlier. This robust approach can identify all three types of outliers in any time series data.
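The general idea of learning a metric’s normal range from its own recent history can be sketched in a few lines of Python. To be clear, this is a toy stand-in and not Anodot’s method: it assumes a fixed trailing window and a plain “mean plus or minus k standard deviations” band, and the function name, window size, and sample data are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def rolling_outliers(series, window=7, k=3.0):
    """Flag points that fall outside mean +/- k * stdev of the trailing window."""
    history = deque(maxlen=window)  # the most recent `window` observations
    flagged = []
    for i, x in enumerate(series):
        if len(history) == window:  # wait until a full baseline is available
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(x - mu) > k * sigma:
                flagged.append(i)
        # Every point joins the baseline, so "normal" keeps adapting over time.
        # (A real system would likely limit how much an outlier shifts it.)
        history.append(x)
    return flagged


# Hypothetical daily metric: stable around 100 with one sudden jump.
daily_metric = [100, 102, 98, 101, 99, 103, 100, 160, 101, 99]
print(rolling_outliers(daily_metric))
```

A static threshold would need retuning whenever the metric’s level drifts; a learned baseline like this one moves with the data, which is the property the paragraph above describes.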

For business metrics, precision means catching only legitimate and significant outliers and not flooding analysts with false positives. Recall means not letting legitimate outliers slip through, like a price glitch that causes you to lose hundreds of dollars of revenue on every sale.

And hemorrhaging revenue is the last thing your company needs. Machine learning can pump new insights into your business, boosting your bottom line.

About author

I work for WideInfo and I love writing on my blog every day with huge new information to help my readers. Fashion is my hobby and eating food is my life. Social Media is my blood to connect my family and friends.