What is a reranker and why is it used in a 'retrieve-then-rerank' pipeline?

A reranker, typically implemented as a cross-encoder, is a model that scores the relevance of a single query-document pair by processing both texts together, so that their tokens can attend to each other at every layer. Because this joint pass has to be run once per pair, it is too expensive to apply to a whole corpus, so it is used as the second stage of a two-step 'retrieve-then-rerank' system. First, a fast but less accurate embedding model retrieves a broad set of candidate documents (e.g., the top 100). Then the more accurate reranker re-orders only those candidates to produce the final ranking, balancing speed with accuracy.
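The two stages can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used by any particular library: `word_overlap` and `phrase_match` are hypothetical toy scorers standing in for the embedding model and the cross-encoder, and the exhaustive first-stage scan would normally be replaced by a vector index.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    cheap_score: Callable[[str, str], float],
    pair_score: Callable[[str, str], float],
    k_retrieve: int = 100,
    k_final: int = 10,
) -> List[Tuple[str, float]]:
    """Return the top k_final (document, score) pairs for the query."""
    # Stage 1: rank the whole corpus with the cheap scorer, keep the best candidates.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:k_retrieve]
    # Stage 2: run the expensive pair scorer on the candidates only.
    rescored = [(doc, pair_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:k_final]


def word_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer (hypothetical): number of shared words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    """Toy second-stage scorer (hypothetical): overlap plus an exact-phrase bonus."""
    bonus = 5.0 if query.lower() in doc.lower() else 0.0
    return word_overlap(query, doc) + bonus


corpus = [
    "reranker train timetable for commuters",
    "how to train reranker models with distillation",
    "cooking pasta at home",
    "train schedules",
]
results = retrieve_then_rerank(
    "train reranker", corpus, word_overlap, phrase_match, k_retrieve=2, k_final=1
)
```

In this toy run the first stage cannot tell the two top candidates apart, since both share two words with the query. Only the second stage, which looks at the pair more closely, moves the relevant document to the top.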

Introducing the Ettin Reranker Family

Tom Aarsen Releases Ettin Reranker Family for High-Accuracy Search

AI researcher Tom Aarsen has released a new family of six open-source reranker models, named the Ettin Reranker family, which is described as state-of-the-art at each of its model sizes. These models are designed to improve the accuracy of information retrieval systems by re-ordering the initial results from a faster, less precise search model. The release ranges from a lightweight 17 million parameter model up to a 1 billion parameter version, giving developers a spectrum of options for balancing accuracy against computational cost.

Technical Architecture and Training

The new models are Sentence Transformers `CrossEncoder` models built upon the Ettin ModernBERT encoders developed at Johns Hopkins University. This foundation provides modern architectural features and support for long context windows up to 8,192 tokens. The models were trained using a distillation recipe, learning to replicate the scores of the larger, high-performance mixedbread-ai/mxbai-rerank-large-v2 model. The full dataset and training script have also been made available, promoting transparency and enabling developers to build their own custom rerankers.
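As a rough sketch of what using such a model looks like, the helpers below wrap the Sentence Transformers `CrossEncoder` class and its `rank` method. The helper names `load_reranker` and `top_passages` are illustrative, not part of the release, and the Hub ID is left as a parameter because the source gives only the short model names.

```python
from typing import Any, List, Tuple


def load_reranker(model_id: str) -> Any:
    """Load a Sentence Transformers CrossEncoder from a Hub ID or local path."""
    # Imported lazily so top_passages can be exercised without the library installed.
    from sentence_transformers import CrossEncoder

    return CrossEncoder(model_id)


def top_passages(
    model: Any, query: str, passages: List[str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rerank passages for a query and return (text, score) pairs, best first."""
    # CrossEncoder.rank returns dicts with "corpus_id", "score" and, because
    # return_documents=True, the passage itself under "text".
    hits = model.rank(query, passages, top_k=top_k, return_documents=True)
    return [(hit["text"], float(hit["score"])) for hit in hits]
```

Passing the full Hub ID of one of the six checkpoints to `load_reranker`, and the result to `top_passages` along with the first-stage candidates, would yield the reranked list. The exact score range depends on the model and is not assumed here.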

Model Suite: Six models are available, from `ettin-reranker-17m-v1` to `ettin-reranker-1b-v1`.
Backbone: Built on Ettin ModernBERT encoders with RoPE positional encodings and GeGLU activations.
Optimization: Flash Attention 2 is recommended and can provide a 1.7x to 8.3x speedup over standard attention implementations.
Training Method: Trained via pointwise MSE distillation on a mix of pre-training and fine-tuning data.
License: All models are released under the Apache 2.0 license, matching their Ettin backbones.
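The arithmetic behind pointwise MSE distillation is simple enough to write out. The sketch below shows only the loss, on made-up scores; it is not taken from the released training script, which presumably computes the same quantity on batched tensors. "Pointwise" means each query-document pair contributes its own squared error, independent of the other documents retrieved for that query.

```python
from typing import Sequence


def pointwise_mse(
    student_scores: Sequence[float], teacher_scores: Sequence[float]
) -> float:
    """Mean squared error between student and teacher scores, one term per pair."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same pairs")
    if not student_scores:
        raise ValueError("at least one scored pair is required")
    squared_errors = [
        (s - t) ** 2 for s, t in zip(student_scores, teacher_scores)
    ]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only: teacher scores for three query-document pairs
# and the student's current predictions for the same pairs.
teacher = [2.0, -1.0, 0.5]
student = [1.0, -1.0, 1.5]
loss = pointwise_mse(student, teacher)  # (1 + 0 + 1) / 3
```

Minimizing this loss pushes the student to reproduce the teacher's score for every pair, rather than only the relative order of documents.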

Benchmark Performance and Industry Impact

On the MTEB(eng, v2) Retrieval benchmark, the Ettin Reranker family is highly competitive. The 1B parameter model (`ettin-reranker-1b-v1`) achieves a mean NDCG@10 of 0.6114, nearly matching its 1.54B parameter teacher model and outperforming several larger models from competitors such as Jina AI and BAAI. The release gives the AI community a set of efficient, openly licensed tools for building search and retrieval pipelines, and it challenges existing commercial and open-source solutions by offering strong performance across a wide range of model sizes.
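For readers unfamiliar with the metric, NDCG@10 rewards placing relevant documents near the top of the first ten results, and the "mean" is an average over the benchmark's datasets. The sketch below uses the linear-gain form of the formula and assumes, as a simplification, that the ranked list contains every judged document for the query. It illustrates the metric and is not the benchmark's own evaluation code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: relevance discounted by log2 of the position."""
    # Position 1 is divided by log2(2) = 1, position 2 by log2(3), and so on.
    return sum(
        rel / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents, so the score is defined as 0 here
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Relevance labels in the order the reranker returned the documents.
perfect = ndcg_at_k([1, 0, 0])   # relevant document ranked first
demoted = ndcg_at_k([0, 1])      # relevant document ranked second
```

A score of 1.0 means the ranking matches the ideal order, and pushing the only relevant document from first to second place already lowers the score to 1/log2(3), roughly 0.63.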

The release of the Ettin family underscores a key trend in enterprise AI: the unbundling of search stacks into specialized, highly-optimized components. By providing a spectrum of Apache 2.0-licensed rerankers that outperform or match competitors at various sizes, this work gives developers granular control over the accuracy-latency trade-off without vendor lock-in, challenging the dominance of larger, monolithic models.

Source: Hugging Face