| Model Name | Size (Params) | Embedding Dim | MTEB Score | Pros | Cons |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | ~22M | 384 | ~60-62 | Lightweight, fast, general-purpose | Less nuanced semantics |
| all-mpnet-base-v2 | ~110M | 768 | ~65 | Strong performance, balanced | Slower than smaller models |
| text-embedding-3-small | N/A (API-based) | 1536 | ~65-67 | Excellent semantics, easy to use | API cost, not local |
| text-embedding-3-large | N/A (API-based) | 3072 | ~67-70 | Top-tier accuracy | Higher cost/latency |
| intfloat/e5-small-v2 | ~33M | 384 | ~62 | Fast, retrieval-optimized | Slightly less general |
| intfloat/e5-large-v2 | ~335M | 1024 | ~66 | Strong semantics, retrieval-optimized | Larger, slower |
| BAAI/bge-small-en-v1.5 | ~33M | 384 | ~63 | Competitive, fast | English-focused |
| BAAI/bge-large-en-v1.5 | ~335M | 1024 | ~66-67 | Near SOTA, great for RAG | Resource-intensive |
| thenlper/gte-small | ~33M | 384 | ~62 | Versatile, solid performance | Less specialized |
| thenlper/gte-large | ~335M | 1024 | ~65-66 | High accuracy, good for RAG | Slower inference |
| facebook/dpr-ctx_encoder | ~110M | 768 | ~60-62 | Tailored for retrieval | Older; outperformed by newer models |
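
All of the local models in the table can be tried with the same few lines of code, which makes side-by-side comparison straightforward. Below is a minimal sketch, assuming the `sentence-transformers` package is installed; the model name, query, and documents are placeholders you can swap for any Hugging Face model listed above (the e5 family generally expects `"query: "` / `"passage: "` prefixes on inputs for best results).

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder corpus and query for illustration only.
docs = [
    "The new embedding model improves retrieval accuracy.",
    "Bananas are rich in potassium.",
]
query = "Which model is better for retrieval?"

# Swap in any local model from the table, e.g. "BAAI/bge-small-en-v1.5".
model = SentenceTransformer("all-MiniLM-L6-v2")

# Normalizing embeddings lets cosine similarity reduce to a dot product.
doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# Higher score = more semantically similar to the query.
scores = util.cos_sim(query_emb, doc_emb)
print(scores)
```

The API-based `text-embedding-3-*` models follow the same pattern conceptually, except the embedding call goes through the OpenAI API rather than a locally loaded model.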