Transformations

Data Pre-Processing

Every experienced ML practitioner knows that most of the work happens before a model is ever trained. Data needs to be:

  1. Retrieved (and subsequently organized)

  2. Cleaned (remove duplicates, implausible values)

  3. Interpolated (if missing data occurs)

  4. Preprocessed (some kind of transformation, which this section is all about)

While 1. through 3. can be standardized, different data scientists might prefer different pre-processing steps to model certain known dependencies. One well-known transformation is the log return of the closing price. It stationarizes the otherwise wildly varying closing price, so that the ML algorithm has an easier time recognizing similar patterns across the complete time series.

Because the same log return patterns recur at many different price levels, ML algorithms encounter more examples of each pattern and can recognize them more easily.

This method greatly improves the generalization of our model, i.e. its capability to predict on out-of-sample data.
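As a minimal sketch (with hypothetical closing prices), the log return transformation described above can be computed like this:

```python
import numpy as np

# Hypothetical closing prices of an asset over six days.
close = np.array([100.0, 102.0, 101.0, 105.0, 104.0, 108.0])

# Log return: ln(p_t / p_{t-1}). Unlike the raw price, this series
# fluctuates around zero regardless of the absolute price level,
# which makes similar patterns comparable across the whole series.
log_returns = np.diff(np.log(close))

print(log_returns.round(4))
```

Note that the transformed series is one element shorter than the input, since the first price has no predecessor to compare against.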

The Switchboard

A great idea from modeling is the computation graph. A computation graph is often employed when a chain of calculations needs to be analyzed, for example to find the contributing factors of an error (see backpropagation). We have something different in mind, though:

This view allows users to:

  • Pick the transforms to apply.

  • Chain transforms and combine different data sources to create expressive metrics.

  • Save chains of transformations to their own toolkit of transforms, reducing the complexity of the transformation graph and allowing common transformations to be reused (like the first chain on data source 'A', which effectively models the log return mentioned above).

  • Pick the final data series to input into the model.
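The chaining idea above can be sketched as function composition, assuming each transform maps one data series to the next. The transform names and the `chain` helper here are illustrative, not part of any real toolkit:

```python
import math
from functools import reduce

# Elementary transforms: plain functions from one series to the next.
def log(series):
    return [math.log(x) for x in series]

def diff(series):
    return [b - a for a, b in zip(series, series[1:])]

def chain(*transforms):
    """Compose transforms left to right into one reusable transform."""
    return lambda series: reduce(lambda s, t: t(s), transforms, series)

# Saving a chain to a toolkit: log followed by diff is the log return.
log_return = chain(log, diff)

prices = [100.0, 102.0, 101.0, 105.0]
print(log_return(prices))
```

Once saved, `log_return` behaves like any elementary transform and can itself appear as a single node in a larger transformation graph.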
