Dynamic time warping (DTW) is a dynamic programming algorithm that measures the dissimilarity between two time series. It was originally applied to speech recognition.
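As a quick refresher, classic DTW fills a cost table where each cell extends the cheapest of its three neighbors (match, insertion, deletion). A minimal NumPy sketch of that recursion, using squared distance as the pointwise cost (other costs are possible):

```python
import numpy as np

def dtw(x, y):
    """Classic DTW between two 1-D sequences, using squared distance.

    R[i, j] holds the cost of the best warping path aligning
    x[:i] with y[:j]; R[n, m] is the DTW dissimilarity.
    """
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # Extend the cheapest of the three neighboring cells:
            # match (diagonal), insertion, or deletion.
            R[i, j] = cost + min(R[i - 1, j - 1], R[i - 1, j], R[i, j - 1])
    return R[n, m]
```

Because each anti-diagonal of the table depends only on the previous two, the cells along a diagonal can be computed in parallel, which is what makes a GPU implementation attractive.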
At ICML 2017, Marco Cuturi and Mathieu Blondel proposed a differentiable formulation of this algorithm that is very useful in optimization problems involving temporal sequences. They call this differentiable formulation soft DTW, and I've been using it quite extensively for a project I'm working on.
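The key idea in soft DTW is to replace the hard `min` in the recursion with a smooth soft-minimum, `softmin_γ(a, b, c) = -γ log(e^{-a/γ} + e^{-b/γ} + e^{-c/γ})`, which makes the result differentiable in the inputs. A sketch of that change (my own simplification of the paper's recursion, not their reference code):

```python
import numpy as np

def softmin(a, b, c, gamma):
    # Smooth, differentiable stand-in for min; as gamma -> 0 it
    # approaches the hard minimum. Shift by the max for a
    # numerically stable log-sum-exp.
    z = -np.array([a, b, c]) / gamma
    z_max = z.max()
    return -gamma * (z_max + np.log(np.exp(z - z_max).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft DTW: the same recursion as classic DTW, with min
    replaced by the soft minimum."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + softmin(
                R[i - 1, j - 1], R[i - 1, j], R[i, j - 1], gamma
            )
    return R[n, m]
```

Note that the soft minimum is a lower bound on the hard minimum, so the soft DTW value can be smaller than the classic DTW value, and even negative for identical sequences.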
My primary deep learning framework is PyTorch, and although multiple implementations already exist (e.g. this or this), they were a bit slow for my use case, and I couldn't run as many experiments as I wanted due to speed constraints. Considering that soft DTW is very similar to the original DTW, for which many efficient implementations already exist, I set out to write my own implementation that was faster than the existing ones. Naturally, a CUDA implementation was the first thing that came to mind.