If I have multiple audio files and want to know which one has the best quality, there are a few ways to test them.

MOS Score:

Note: these MOS scores come from non-intrusive prediction models, which need only the audio file itself and no reference file.

| Model | Accuracy | Speed | Best For |
|---|---|---|---|
| **DeepMOS (Wav2Vec + Transformer)** | 🟢🟢🟢🟢🟢 | 🟢🟢 | General Speech Quality |
| **NISQA (CNN + Transformer)** | 🟢🟢🟢🟢 | 🟢🟢🟢 | VoIP & Speech Enhancement |
| **MOSNet (CNN + BiLSTM)** | 🟢🟢🟢 | 🟢🟢🟢🟢 | TTS & Synthesized Speech |
| **DNSMOS (Deep Learning)** | 🟢🟢 | 🟢🟢🟢🟢🟢 | Noisy Speech & Real-Time Processing |
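
Since these models score each file on its own, comparing files comes down to scoring every file with the same model and sorting. A minimal sketch of that step, assuming a hypothetical `predict_mos` callable that wraps whichever model is chosen (none of the real model APIs are shown here; the function names and the fake scores are made up for illustration):

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# Hypothetical signature: takes the path of one audio file and returns its
# predicted MOS (conventionally on a 1.0 to 5.0 scale). In practice this
# would wrap DNSMOS, NISQA, MOSNet, or similar.
MosPredictor = Callable[[Path], float]


def rank_by_mos(
    files: Iterable[str], predict_mos: MosPredictor
) -> List[Tuple[Path, float]]:
    """Score every file with the same predictor and return (path, score)
    pairs sorted from highest to lowest predicted MOS."""
    scored = [(Path(f), predict_mos(Path(f))) for f in files]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Stand-in predictor with made-up scores, so the sketch runs without
    # any model or audio files installed.
    fake_scores = {"take1.wav": 3.1, "take2.wav": 4.2, "take3.wav": 2.5}
    ranking = rank_by_mos(fake_scores, lambda p: fake_scores[p.name])
    for path, score in ranking:
        print(f"{path.name}: {score:.2f}")
```

Scores from different models are not necessarily on comparable scales, so it is safer to rank with one model at a time than to mix scores across models.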

Speaker Diarization (SpeechBrain):

| Model / Setup | Model Size | DER (↓ better) | Speed | Notes |
|---|---|---|---|---|
| ECAPA-TDNN + Spectral Clustering | 14M params | ~3.0–4.0% | ⚡⚡⚡ | High accuracy, slower clustering |
| ECAPA-TDNN + AHC | 14M params | ~4.5–5.5% | ⚡⚡⚡⚡ | Fast clustering, good accuracy |
| SpeechBrain Full Pipeline (CRDNN+ECAPA) | 20M params | ~3.5–4.5% | ⚡⚡⚡ | All-in-one pipeline, includes VAD |
| X-vector + PLDA | 7M params | ~6.5–7.5% | ⚡⚡⚡⚡⚡ | Very fast, lightweight, lower accuracy |
| X-vector + AHC / KMeans | 7M params | ~6.0–7.0% | ⚡⚡⚡⚡⚡ | Fastest with simple clustering |
| Resemblyzer + KMeans (external) | 5M params | ~7.0–8.0% | ⚡⚡⚡⚡⚡ | Python-only, very low overhead |
| Tiny ECAPA (custom) | 6–8M params | ~5.0–6.0% | ⚡⚡⚡⚡ | Requires training, good trade-off |
| WavLM Base + Clustering | 94M params | ~4.0–5.0% | ⚡⚡⚡ | Accurate but large and slower |
| pyannote VAD + clustering | 1M (VAD) | ~6.0–7.0% | ⚡⚡⚡⚡ | Modular, pair with any embedder |
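
For reading the DER column: diarization error rate is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. A small worked sketch of that formula follows; the function name and the durations are illustrative assumptions, not measurements from any of the setups above.

```python
def diarization_error_rate(
    missed: float, false_alarm: float, confusion: float, total_speech: float
) -> float:
    """DER = (missed + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in seconds. The result is a fraction,
    so multiply by 100 for a percentage.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    # Made-up example: 600 s of reference speech with 9 s missed,
    # 6 s of false alarm, and 6 s attributed to the wrong speaker.
    der = diarization_error_rate(9.0, 6.0, 6.0, 600.0)
    print(f"DER = {der:.1%}")  # prints "DER = 3.5%"
```

Because false-alarm time is counted against the reference speech duration, DER can exceed 100% on very poor output.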