StatsD And Anomalies

Anomaly Detection

I had been looking for a tool to detect anomalies in data. I stumbled across two libraries from Twitter:

 

 

Both are R libraries for data analysis. I wrote a quick script that takes data exported from StatsD and plots a graph with the interesting parts highlighted.

 

I started out writing R code in a text editor, but then someone suggested RStudio, which I would highly recommend.

 

Below is the graph I was able to generate with 28 days of data.

 

Spot the issue.

 

The circled points are also available in code:

> print(res$anoms)
            timestamp    anoms
1 2016-08-24 22:25:00 419.4919
2 2016-08-24 22:55:00 546.4654
3 2016-08-24 23:00:00 276.6360
4 2016-08-26 16:15:00 106.3696

 

The code to make this is in StatsDAnomalyDetection. convert_json.py will convert your raw data and output it in a format the Twitter library can read.
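To give an idea of what that conversion step involves, here is a minimal sketch. It is not the actual convert_json.py. It assumes the raw export is Graphite-style render JSON (a list of series, each with a "datapoints" list of [value, unix_seconds] pairs) and that the R side wants a two-column CSV of timestamps and values. The function names and the "timestamp" and "count" column names are hypothetical.

```python
import csv
import io
import json
from datetime import datetime, timezone


def graphite_json_to_rows(raw_json):
    """Turn Graphite-style render JSON into sorted (timestamp, value) rows.

    Assumed input shape (an assumption, not taken from the original script):
    [{"target": "...", "datapoints": [[value, unix_seconds], ...]}]
    """
    series = json.loads(raw_json)
    rows = []
    for entry in series:
        for value, unix_seconds in entry["datapoints"]:
            if value is None:
                # Empty buckets come through as null; skip them.
                continue
            stamp = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
            rows.append((stamp.strftime("%Y-%m-%d %H:%M:%S"), float(value)))
    rows.sort()
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "count"])  # hypothetical column names
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        '[{"target": "stats.example", "datapoints": '
        '[[419.5, 1472077500], [null, 1472077800], [546.25, 1472079300]]}]'
    )
    print(rows_to_csv(graphite_json_to_rows(sample)), end="")
```

Timestamps are written in UTC here. If your StatsD data is meant to be read in local time, adjust the timezone before formatting.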

 

Conclusion

This is a basic set of scripts for doing some analysis of StatsD data. I would recommend learning some R if you want to do some serious data analysis.