
In some windows the variation in values was 0, which resulted in division by zero. To correct for this, any subsequence that had zero variation was substituted by values of 0.

The second factor that required a difference in procedure for extracting the data was a matter of computational limitation. Our stochastic transducer, at the time of this study, could not run on strings longer than 1020 characters. Consequently, the desired string lengths were achieved by using dimensionality reduction in the SAX algorithm. We reduced each string by counting any run of equal values as one value (e.g., abb abb abb aaa abb would be reduced to abb aaa abb, as abb is only counted once when repeated).
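
The zero-variation substitution can be illustrated with a minimal sketch. It assumes per-window z-normalization with the population standard deviation; the function name `znorm_window` and the tolerance `eps` are hypothetical and are not taken from the study.

```python
from statistics import fmean, pstdev


def znorm_window(window, eps=1e-12):
    """Z-normalize one window (mean 0, standard deviation 1).

    Assumption: a window whose standard deviation is (numerically) zero
    is replaced by zeros instead of being divided by zero.
    """
    mu = fmean(window)
    sigma = pstdev(window)  # population standard deviation (an assumption)
    if sigma < eps:
        # Constant window: no variation, so substitute values of 0.
        return [0.0] * len(window)
    return [(x - mu) / sigma for x in window]
```

For example, `znorm_window([72, 72, 72])` returns a list of zeros rather than raising a division error, while a window with variation is scaled as usual.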

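The run-collapsing step, in which any run of equal values is counted as one value, could be sketched as follows. The sketch assumes the symbolic string is a sequence of space-separated SAX words; the function name `collapse_runs` is hypothetical.

```python
from itertools import groupby


def collapse_runs(words):
    """Keep one copy of each run of consecutive, identical words."""
    # groupby groups only adjacent equal items, so a word that reappears
    # later (after a different word) is kept again.
    return [word for word, _run in groupby(words)]


def collapse_string(text):
    """Apply collapse_runs to a space-separated string of words."""
    return " ".join(collapse_runs(text.split()))
```

Only adjacent repeats are removed: a word that returns after a different word starts a new run and is kept.
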
The representations were tested in three distinct data domains: field-motion capture data, robot sensor data, and ICU data. Two data sets were examined in each domain for a total of six different data sets. Similarity was measured by converting the data sets into the indicated representations and then classifying the data using the Nearest Neighbor algorithm. The results demonstrated that the multivariate representations outperformed univariate ones for the purpose of predicting the targeted outcome. This paper represents work in the univariate time series domain, which we are examining for application to the multivariate domain. We are using Stochastic Edit Distance on a concatenated symbolic representation of the time series to classify a physiological data set for an acute episode of hypotension. In this case, the edit cost probabilities are learned by a stochastic Finite-State Transducer (FST). We compare the best results in the multivariate domain to the results we achieved in the univariate domain with this new approach to test its potential effectiveness.

To compare results, we used one of the data sets from previous work, which came from the Physionet Challenge. It consists of 1–6 days of high-frequency physiological data from patients in an Intensive Care Unit (ICU). The prediction task was to classify which patients were going to enter an episode of acute hypotension in the forecast window of one hour after the last entry in the data set. We focused only on heart rate for this trial and, instead of using leave-one-out validation, we used the test set that was provided in the Challenge and the original data from previous work as the training set. The training set consisted of 58 of 60 patients from the 2009 Physionet Challenge, of which 28 experienced an episode of hypotension in the hour following the period of the data sample. The data was taken from segments A and B, which contained all the hours of data prior to the period in which the patient may or may not have experienced an episode of hypotension. Two patients were dropped because of large gaps in the training data, resulting in the final size of 58. The test set consisted of segment B of Test Set A, which contained 10 patients, of which half had experienced a period of acute hypotension following the sample data, and was composed of only 10 h of data. The data for this paper was converted to a symbolic representation named Symbolic Aggregate approXimation (SAX), which is explained in more detail in the Methods section. This representation normalizes the data in overlapping local windows prior to conversion such that the mean is 0 and standard deviation is 1.
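
To make the symbolic conversion concrete, the following is a minimal sketch of a SAX-style transformation over overlapping windows. It follows the commonly described SAX recipe (z-normalization, piecewise aggregate approximation, and equal-probability breakpoints under a standard normal distribution). The window size, step, number of segments, and alphabet are hypothetical values chosen for illustration and are not the settings used in this study.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_word(window, n_segments=3, alphabet="abc"):
    """Convert one window of numeric values into a SAX word."""
    n = len(window)
    if n < n_segments:
        raise ValueError("window must have at least n_segments values")
    # 1. Z-normalize; a window with no variation becomes all zeros.
    mu, sigma = fmean(window), pstdev(window)
    z = [0.0] * n if sigma == 0 else [(x - mu) / sigma for x in window]
    # 2. Piecewise aggregate approximation: mean of each segment.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [fmean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # 3. Breakpoints that split the standard normal into equal-probability bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def sax_string(series, window=6, step=3, **kwargs):
    """Concatenate SAX words from overlapping windows (step < window)."""
    starts = range(0, len(series) - window + 1, step)
    return " ".join(sax_word(series[i:i + window], **kwargs) for i in starts)
```

With a step smaller than the window size, consecutive windows overlap, which matches the overlapping local windows described above. Each window is normalized on its own, so the words capture local shape rather than absolute level.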

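As a rough illustration of classifying symbolic strings by edit distance, the sketch below pairs a plain Levenshtein distance with a nearest-neighbor rule. This is a simplification under stated assumptions: it uses uniform edit costs rather than the edit cost probabilities learned by the stochastic FST, and the nearest-neighbor pairing is borrowed from the related work's description rather than confirmed as the procedure of this study. The function names and labels are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (not the learned stochastic costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor_label(query, training):
    """Return the label of the training string closest to the query.

    `training` is a list of (symbolic_string, label) pairs.
    """
    _string, label = min(training, key=lambda pair: edit_distance(query, pair[0]))
    return label
```

A stochastic edit distance would replace the unit costs with learned probabilities for each insertion, deletion, and substitution, so that some symbol changes count as less surprising than others.
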
Current methods for measuring the well-being of a patient in the intensive care unit (ICU) acquire a patient’s vital signs data at rates that are difficult for a human to analyze (60–500 Hz). These measurements are displayed on a monitor for a few seconds as a collection of univariate time series and then lost to further analysis. Instead, a lower-frequency version of this data is stored in an electronic health record after validation by a medical provider at the rate of once every 15 min to once every several hours. Physicians make life-saving decisions based on this lower-frequency data. Recently, however, there has been interest in storing and analyzing the high-frequency data using automated and semi-automated methods. Many are recognizing the importance of analyzing this data as a multivariate temporal representation by creating multivariate probabilistic models or temporal abstractions from electronic health records, or by creating multivariate structures that are similar to those in other domains such as convolutional neural networks or imaging. In related work, four multivariate time series representations were examined to serve as compressed representations of high-frequency physiological data from Intensive Care Units: Stacked Bags-of-Patterns, Multivariate Bags-of-Patterns, Multivariate Piecewise Dynamic Time Warping, and Ensemble Voting with Bag-of-Patterns.
