My PhD was carried out with an industrial sponsor, ZOO Digital, and resulted in an audio-visual speech detection and alignment system for entertainment multimedia.
One of the key contributions of this work is a method for speech detection in feature-film and entertainment audio. It uses a novel form of feature selection to produce features that are highly sensitive to the presence or absence of speech. The resulting feature vector is then used to train a random forest classifier. Testing has demonstrated encouraging results, with the method outperforming state-of-the-art and contemporary approaches. The approach is detailed in the paper "Cross-Covariance-Based Features for Speech Classification in Film Audio", published in the Journal of Visual Languages and Computing.
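For readers unfamiliar with the statistic named in the paper title, the sketch below computes a plain cross-covariance between two equal-length sequences over a range of lags. It illustrates the general statistic only and is not the feature selection method from the paper. The function name `cross_covariance`, the `max_lag` parameter, and the choice of the biased (divide by N) estimator are all assumptions made for this example.

```python
from statistics import fmean


def cross_covariance(x, y, max_lag):
    """Biased cross-covariance of x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")

    mean_x, mean_y = fmean(x), fmean(y)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        # Only use time indices where both x[t] and y[t + lag] exist.
        start = max(0, -lag)
        stop = min(n, n - lag)
        total = sum((x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(start, stop))
        # Assumption: biased estimator, dividing by n at every lag.
        result[lag] = total / n
    return result


# Toy data: y is x delayed by one sample, so the peak should sit at lag 1.
x = [0, 1, 0, 0, 0]
y = [0, 0, 1, 0, 0]
cc = cross_covariance(x, y, max_lag=1)
peak_lag = max(cc, key=cc.get)
```

On this toy input the largest value falls at the lag matching the delay between the two sequences. How such values would be turned into a feature vector for a classifier is specific to the paper and is not reproduced here.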