Classifying Multi-Variate Time Series at Scale: Characterizing and understanding the runtime behavior of large scale Big Data production systems is extremely important. Typical systems consist of hundreds to thousands of machines in a cluster with hundreds of terabytes of storage costing millions of dollars, solving problems that are business critical. By instrumenting each running process, and measuring their resource utilization including CPU, Memory, I/O, network etc., as time series it is possible to understand and characterize the workload on these massive clusters. Each time series is a series consisting of tens to tens of thousands of data points that must be ingested and then classified. At Pepperdata, our instrumentation of the clusters collects over three hundred metrics from each task every five seconds resulting in millions of data points per hour. At this scale the data are equivalent to the biggest IOT data sets in the world. Our objective is to classify the collection of time series into a set of classes that represent different work load types. Or phrased differently, our problem is essentially the problem of classifying multivariate time series. In this talk, we propose a unique, off-the-shelf approach to classifying time series that achieves near best-in-class accuracy for univariate series and generalizes to multivariate time series. Our technique maps each time series to a Grammian Angular Difference Field (GADF), interprets that as an image, uses Google’s pre-trained CNN (trained on Inception v3) to map the GADF images into a 2048-dimensional vector space and then uses a small MLP with two hidden layers, with fifty nodes in each layer, and a softmax output to achieve the final classification. Our work is not domain specific – a fact proven by our achieving competitive accuracies with published results on the univariate UCR data set as well as the multivariate UCI data set.
Session Summary
Classifying Multi-Variate Time Series at Scale
MLconf 2017 San Francisco
Ashfaq Munshi
Pepper Data
CEO
Learn more »