Heartfelt advice on how machine learning can identify the different types of outliers in your business metrics

When data is the lifeblood of your business, outliers are the pulse. Unexpected deviations in your data can indicate business successes and, with them, new opportunities: if a promotional discount caused a spike in first-time customers, can you build on that success and bring them back? Outliers can also signal serious problems: a sudden surge in online sales volume without a simultaneous jump in online revenue very often indicates a price glitch.

Not surprisingly, outlier detection is emerging as the next big thing in business intelligence (BI), especially in organizations with millions of metrics monitored in real time. Old-school analytics dashboards and static thresholds mire your team in alert storms and need constant readjusting.

The 3 flavors of outliers

Automated outlier detection systems avoid these problems by using machine learning to detect three different types of outliers accurately:

Global outliers are rare data values that are drastically different from the rest of the data they’re found in. You winning the lottery would be a global outlier, since the odds against any particular person winning the jackpot are often over a million to one.
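
To make the idea concrete, here is a minimal sketch of one common way to flag global outliers: the modified z-score, which measures how far each value sits from the median in units of the median absolute deviation (MAD). The function name, the sample numbers, and the 3.5 cutoff are illustrative assumptions, not something taken from a particular product.

```python
from statistics import median


def global_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # z-score when the data is roughly normally distributed.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical daily order counts; one day is wildly out of line.
daily_orders = [102, 98, 105, 99, 101, 97, 103, 100, 950, 104]
print(global_outliers(daily_orders))  # [8]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them, so the outlier cannot hide by inflating its own yardstick.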

Contextual outliers are data points that are outliers only in the context in which they appear; the same values wouldn’t be considered outliers in other contexts within the same dataset. Buying a lottery ticket at 11:00 at night would be a contextual outlier if you routinely purchase a ticket every day on your way home from work, much earlier in the evening.
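
A rough sketch of how context can be taken into account: keep a separate history per context (here a time-of-day label) and judge each new value only against the history of its own context. The labels, numbers, and the 3-sigma cutoff below are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def contextual_outliers(history, new_points, threshold=3.0):
    """Flag (context, value) pairs that deviate from the history of the same context."""
    by_context = defaultdict(list)
    for context, value in history:
        by_context[context].append(value)

    flagged = []
    for context, value in new_points:
        past = by_context.get(context, [])
        if len(past) < 2:
            continue  # not enough history to say what is normal here
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged


# Hypothetical site visits per hour: quiet at night, busy at noon.
history = [("night", v) for v in [20, 25, 22, 18, 24]]
history += [("noon", v) for v in [480, 510, 495, 520, 505]]

new_points = [("noon", 500), ("night", 500)]
print(contextual_outliers(history, new_points))  # [('night', 500)]
```

The value 500 appears twice, but only the night-time reading is flagged: the number is ordinary at noon and extraordinary at night.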


Finally, there are collective outliers: groups of data points that, taken together as a subset, deviate significantly from the rest of the dataset. It’s probably easiest to think of collective outliers as statistically unlikely “clumps” of data points within a larger dataset. If the last three winners of the jackpot were neighbors of yours, that would be a collective outlier since we’d statistically expect a wider geographic distribution of the winners.
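
One simple way to look for such clumps in a time series is to test windows of consecutive points rather than single points. The sketch below compares each window's mean with a baseline, using the standard error of the mean as the yardstick. It assumes the points are roughly independent, and the data, window size, and cutoff are all made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def collective_outliers(baseline, recent, window=5, threshold=3.0):
    """Return start indices of windows in `recent` whose mean is unlikely under the baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    std_error = sigma / sqrt(window)  # expected spread of a window mean
    starts = []
    for start in range(len(recent) - window + 1):
        window_mean = mean(recent[start:start + window])
        if abs(window_mean - mu) / std_error > threshold:
            starts.append(start)
    return starts


# Hypothetical metric: the baseline hovers around 100.
baseline = [100, 95, 105, 98, 102, 94, 106, 100, 97, 103]
# No single recent value is more than 2 standard deviations from 100,
# but several slightly high values arrive in a row.
recent = [101, 99, 107, 108, 107, 108, 107, 100]
print(collective_outliers(baseline, recent))  # [1, 2, 3]
```

A point-by-point check with a 3-sigma cutoff would pass every value in `recent`; it is only the run of elevated values that stands out.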

Human beings are naturally good at spotting outliers, provided we’re given enough data to form a clear mental model of what’s normal for a dataset. Graphs and other ways of visualizing data make this even easier for us. If we dug deep into a plot of time series data, we could probably spot all three types, even if it took us a while. However, it’s impractical for businesses to use manual outlier detection for hundreds, thousands, or millions of metrics.

Outlier detection and matters of the heart

It turns out that neural networks are also good at detecting outliers and, in some areas, are already better than humans in terms of recall (how well all the real outliers are identified) and precision (how many of the data points identified as outliers actually were outliers).
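
For readers who prefer arithmetic to definitions, here is a small sketch of how the two scores are computed once you know which points were flagged and which were truly outliers. The index sets are hypothetical.

```python
def precision_recall(flagged, actual):
    """Return (precision, recall) for flagged points versus the true outliers."""
    flagged, actual = set(flagged), set(actual)
    true_positives = len(flagged & actual)
    # Precision: what share of the flagged points were real outliers?
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: what share of the real outliers were flagged?
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


flagged = {3, 8, 15, 21}  # points a detector raised an alert for
actual = {8, 15, 30}      # points that really were outliers
precision, recall = precision_recall(flagged, actual)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# precision=0.50, recall=0.67
```

Here half of the alerts were false alarms (precision 0.50), and one of the three real outliers was missed (recall 0.67).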

One example is the 34-layer convolutional neural network recently developed by Stanford researchers, which outperforms board-certified cardiologists at heart arrhythmia detection. This neural net was trained on a massive dataset of ECG recordings taken from people who wore a specific wearable heart monitor. This is one context where abstract statistical terms like precision and recall can mean the difference between life and death, especially for those who may not have access to a cardiologist.

Part of the team’s success was that they had a dataset of ECG recordings far larger than any used in earlier attempts to detect heart arrhythmia accurately by computer. They had this huge dataset because the research team partnered with the wearable heart monitor vendor, which was able to collect real data from its customers. Machine learning needs lots of data to learn from, just like we all needed to spend our first few years of life listening to and babbling at adults before we could speak in complete sentences.

Just as in the heart arrhythmia example, real-time outlier detection vendor Anodot uses lots of data and neural nets to achieve high precision, recall, and conciseness in real time at the scale of millions of metrics. It does so by tailoring the specific statistical outlier tests to the distribution actually seen in the data and learning the “normal” behavior of a metric over time. Any data point that the statistical test identifies as outside the normal range is labeled an outlier. This robust approach can identify all three types of outliers in any time series data.
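
The general idea of learning a metric's normal behavior over time and flagging points outside the normal range can be sketched with a very simple stand-in: an exponentially weighted running mean and variance. To be clear, this is not Anodot's algorithm, which is not described here; the function, its parameters (`alpha`, `k`, `warmup`), and the sample data are assumptions for illustration only.

```python
from math import sqrt


def streaming_outliers(values, alpha=0.2, k=3.0, warmup=5):
    """Return indices of points outside a learned band of mean +/- k standard deviations."""
    mean = values[0]
    var = 0.0
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        std = sqrt(var)
        # Only judge points once a few observations have shaped the baseline.
        if i >= warmup and std > 0 and abs(x - mean) > k * std:
            flagged.append(i)
            continue  # keep the anomaly from distorting the learned baseline
        # Exponentially weighted updates: recent points count more than old ones.
        diff = x - mean
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return flagged


# Hypothetical metric stream with one sudden spike.
metric = [100, 102, 98, 101, 99, 102, 100, 160, 101, 99]
print(streaming_outliers(metric))  # [7]
```

Because the baseline keeps updating, the band follows gradual drift in the metric instead of relying on a static threshold that someone has to readjust by hand.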

When it comes to business metrics, precision means catching only the legitimate and significant outliers and not flooding analysts with false positives. Recall means not letting any legitimate outliers slip through, such as a price glitch that causes you to lose hundreds of dollars of revenue on every sale.

And hemorrhaging revenue is the last thing your company needs. Machine learning can pump new insights into your business, bulking up your bottom line.

About author

I work for WideInfo and I love writing on my blog every day with huge new information to help my readers. Fashion is my hobby and eating food is my life. Social Media is my blood to connect my family and friends.