Spike sorting for large dense arrays
Shabnam N. Kadir (NYU Langone Medical Center, New York/Imperial College London), Dan F.M. Goodman (Ecole Normale Superieure, Paris), John Schulman (Berkeley, California), Gyorgy Buzsaki (NYU Langone Medical Center, New York), Kenneth D. Harris (Imperial College London)
The greatest challenge in sorting data from large, dense electrode arrays is that of temporally overlapping spikes: current spike detection and sorting methods fail when two spikes occur simultaneously. For small arrays such as 4-channel `tetrodes', temporal overlaps were rare and thus only a minor source of error. For large, dense electrode arrays, however, temporal overlap is the rule, not the exception. At present, the most common approach to sorting large array data in cortex is to arbitrarily divide the recording sites into `virtual tetrodes', which are then sorted using standard methods. This ad hoc approach is not only inconvenient, labour-intensive and subjective, but also introduces serious errors, as spike waveforms inevitably cross the boundaries of the virtual tetrodes.
We introduce a new system for sorting high-channel-count data. First, a new spike detection system is implemented in the program `SpiKeDeTeKt', which uses knowledge of the probe geometry to perform a space-fill algorithm that groups spatially and temporally contiguous suprathreshold samples. For each detected spike, it produces a list of adjacent `unmasked' channels on which there are suprathreshold spike waveforms, and a list of `masked' channels on which there is only noise. Temporally overlapping but spatially separated spikes are represented through different lists of masked channels.
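The detection step described above can be illustrated with a minimal sketch. It is not the SpiKeDeTeKt implementation: the function name `detect_spikes`, the single scalar threshold, the dictionary representation of probe adjacency, and the toy 6-channel linear probe are all assumptions made for illustration. A breadth-first fill joins suprathreshold samples that are adjacent in time on the same channel, or adjacent on the probe at the same time.

```python
from collections import deque

import numpy as np


def detect_spikes(data, threshold, adjacency):
    """Group contiguous suprathreshold samples into spikes (illustrative sketch).

    data      : array (n_samples, n_channels) of filtered voltages
    threshold : scalar amplitude threshold (a single threshold is an assumption)
    adjacency : dict mapping each channel to the set of its neighbours on the probe
    """
    n_samples, n_channels = data.shape
    above = np.abs(data) > threshold
    visited = np.zeros(above.shape, dtype=bool)
    spikes = []
    for t0, c0 in zip(*np.nonzero(above)):
        t0, c0 = int(t0), int(c0)
        if visited[t0, c0]:
            continue
        # Breadth-first fill starting from this unvisited suprathreshold sample.
        queue = deque([(t0, c0)])
        visited[t0, c0] = True
        component = []
        while queue:
            t, c = queue.popleft()
            component.append((t, c))
            # Neighbours: same channel at adjacent times, adjacent channels at same time.
            neighbours = [(t - 1, c), (t + 1, c)] + [(t, n) for n in adjacency[c]]
            for tn, cn in neighbours:
                if 0 <= tn < n_samples and above[tn, cn] and not visited[tn, cn]:
                    visited[tn, cn] = True
                    queue.append((tn, cn))
        times = [t for t, _ in component]
        unmasked = sorted({c for _, c in component})
        masked = [c for c in range(n_channels) if c not in unmasked]
        spikes.append({"t_start": min(times), "t_end": max(times),
                       "unmasked": unmasked, "masked": masked})
    return spikes


# Toy linear probe: each channel neighbours the channels directly above and below.
n_channels = 6
adjacency = {c: {n for n in (c - 1, c + 1) if 0 <= n < n_channels}
             for c in range(n_channels)}

# Two temporally overlapping but spatially separated spikes.
data = np.zeros((10, n_channels))
data[3:5, 0:2] = -8.0
data[3:6, 4:6] = -8.0
spikes = detect_spikes(data, threshold=4.0, adjacency=adjacency)
```

In this toy example the two events overlap in time but are returned as separate spikes, each with its own list of masked channels, because no suprathreshold sample connects them across the probe.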
In the second step, we introduce a new version of KlustaKwik, which implements a novel `distributional EM algorithm' to deal with masked data. This is a modified version of a standard hard-EM algorithm for a mixture of Gaussians, in which the features on masked channels are replaced with a fixed probabilistic model. This ensures that temporally overlapping spikes do not corrupt the sorting process, and that noise from the large number of subthreshold channels does not swamp the signal from the few suprathreshold channels.
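One way to read the idea of replacing masked features with a fixed probabilistic model is sketched below. This is not the KlustaKwik code: the function `masked_hard_em`, the binary masks, the diagonal covariances, the per-feature Gaussian noise model (`noise_mean`, `noise_var`) and the small variance floor are all assumptions. Each masked feature is treated as a draw from the noise distribution, so the squared-distance term in the log-likelihood is replaced by its expectation, (noise_mean - mu)^2 + noise_var.

```python
import numpy as np


def masked_hard_em(x, unmasked, noise_mean, noise_var, n_clusters,
                   init_labels=None, n_iter=20, seed=0):
    """Hard EM for a diagonal Gaussian mixture with masked features (sketch)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Per-feature expectation and variance: the observed value where unmasked,
    # the assumed fixed noise model where masked (raw masked values are ignored).
    exp_x = np.where(unmasked, x, noise_mean)
    var_x = np.where(unmasked, 0.0, noise_var)
    if init_labels is None:
        labels = rng.integers(n_clusters, size=n)
    else:
        labels = np.asarray(init_labels)
    for _ in range(n_iter):
        # M-step: cluster parameters from expected sufficient statistics.
        means = np.empty((n_clusters, d))
        variances = np.ones((n_clusters, d))
        weights = np.full(n_clusters, 1.0 / n)
        for k in range(n_clusters):
            members = labels == k
            if not members.any():
                # Empty cluster: reseed its mean on a randomly chosen spike.
                means[k] = exp_x[rng.integers(n)]
                continue
            means[k] = exp_x[members].mean(axis=0)
            variances[k] = ((exp_x[members] - means[k]) ** 2
                            + var_x[members]).mean(axis=0) + 1e-6
            weights[k] = members.mean()
        # Hard E-step: assign each spike to the cluster with the highest
        # expected log-likelihood under its feature distribution.
        sq = (exp_x[:, None, :] - means[None]) ** 2 + var_x[:, None, :]
        loglik = np.log(weights)[None] - 0.5 * (
            np.log(2 * np.pi * variances)[None] + sq / variances[None]
        ).sum(axis=-1)
        new_labels = loglik.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, means, variances


# Toy data: four spikes on features 0-1, four on features 2-3.
# Masked entries hold arbitrary values (e.g. an overlapping spike elsewhere).
x = np.array([
    [5.0, 5.2, 99.0, -99.0],
    [5.1, 4.9, 42.0, 17.0],
    [4.9, 5.0, -60.0, 3.0],
    [5.0, 5.1, 8.0, 80.0],
    [77.0, -5.0, 5.0, 4.9],
    [-30.0, 12.0, 5.2, 5.0],
    [9.0, 64.0, 4.9, 5.1],
    [-81.0, 2.0, 5.1, 5.0],
])
unmasked = np.array([[True, True, False, False]] * 4
                    + [[False, False, True, True]] * 4)
init = np.array([0, 0, 0, 1, 0, 1, 1, 1])  # deliberately imperfect start
labels, means, variances = masked_hard_em(
    x, unmasked, noise_mean=np.zeros(4), noise_var=np.ones(4),
    n_clusters=2, init_labels=init)
```

The large values placed on masked features in the toy data never enter the fit, which is the property the text attributes to the masked approach: activity on channels outside a spike's unmasked set cannot pull it into the wrong cluster.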
To test the efficacy of our algorithm, we create a `hybrid data set' in which ground truth is available, by adding a set of spikes from one recording to a second recording made with the same probe. Performance of the new algorithm is comparable to that achieved by supervised learning based on ground truth, suggesting that performance is close to optimal.
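The hybrid construction can be sketched as follows, under stated assumptions: the donor spikes are taken to be already extracted as fixed-length multichannel waveforms, and the names `make_hybrid` and `hit_rate`, the tolerance-based matching, and the toy data are hypothetical rather than taken from the authors' pipeline.

```python
import numpy as np


def make_hybrid(acceptor, donor_waveforms, insert_times):
    """Add donor spike waveforms to a copy of the acceptor recording.

    acceptor        : array (n_samples, n_channels)
    donor_waveforms : array (n_spikes, n_wave_samples, n_channels)
    insert_times    : start sample of each inserted spike; each waveform is
                      assumed to fit entirely inside the recording
    """
    hybrid = acceptor.copy()
    n_wave = donor_waveforms.shape[1]
    for waveform, t in zip(donor_waveforms, insert_times):
        hybrid[t:t + n_wave] += waveform
    return hybrid


def hit_rate(true_times, detected_times, tolerance):
    """Fraction of ground-truth spikes with a detection within `tolerance` samples."""
    detected = np.asarray(detected_times)
    hits = [np.any(np.abs(detected - t) <= tolerance) for t in true_times]
    return float(np.mean(hits))


# Toy example: background noise as the acceptor, two identical donor spikes.
rng = np.random.default_rng(0)
acceptor = rng.normal(size=(100, 4))
donor_waveforms = -10.0 * np.ones((2, 5, 4))
insert_times = [10, 50]
hybrid = make_hybrid(acceptor, donor_waveforms, insert_times)

# Hypothetical detector output: one inserted spike is found, one is missed.
score = hit_rate(insert_times, detected_times=[11, 80], tolerance=2)
```

Because the insertion times are known exactly, any detection or clustering output can be scored against them, which is what makes the ground-truth comparison possible.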