Patent Retrieval & Re-ranking (Dense + Cross-Encoders)

Research-oriented patent retrieval pipeline combining dense retrieval with transformer cross-encoder re-ranking, evaluated using standard IR metrics.

This project builds an end-to-end patent retrieval and re-ranking system intended for research evaluation. The pipeline first retrieves candidates with dense vector search, then re-orders them with a transformer-based cross-encoder, which reads each query–document pair jointly and can therefore score relevance more precisely than embedding similarity alone.

The system supports experimentation with embedding models, re-rankers, and fusion strategies, and reports standard IR metrics such as Recall@k, Mean Average Precision (MAP), and Mean Reciprocal Rank (MRR). The repository is organized into notebooks and scripts for training, inference, and analysis.
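The metrics named above have short standard definitions, and a minimal sketch may make them concrete. The function names and the run/qrels dictionary layout below are illustrative assumptions, not the repository's actual evaluation code.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at each rank where a relevant doc occurs."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(run: Dict[str, List[str]],
             qrels: Dict[str, Set[str]],
             k: int = 10) -> Dict[str, float]:
    """Average each metric over all judged queries.

    Assumed layout (hypothetical): run maps query id -> ranked doc ids,
    qrels maps query id -> set of relevant doc ids.
    """
    if not qrels:
        return {f"recall@{k}": 0.0, "map": 0.0, "mrr": 0.0}
    n = len(qrels)
    recall = sum(recall_at_k(run.get(q, []), rel, k) for q, rel in qrels.items())
    ap = sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items())
    rr = sum(reciprocal_rank(run.get(q, []), rel) for q, rel in qrels.items())
    return {f"recall@{k}": recall / n, "map": ap / n, "mrr": rr / n}
```

Queries that are judged but missing from the run count as empty rankings, so they lower the averages rather than being silently skipped.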


Pipeline Overview

Two-Stage Retrieval
  1. Stage 1 — Dense Retrieval: encode queries and documents into embeddings and retrieve top-k candidates via ANN search.
  2. Stage 2 — Cross-Encoder Re-ranking: score each (query, candidate) pair with a transformer model and re-order results.
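The two stages above can be sketched in a few lines. This is a simplified illustration under stated assumptions: exact cosine search stands in for ANN search, and `score_pair` is a hypothetical callable standing in for the cross-encoder. None of the names below come from the repository.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dense_retrieve(query_vec: Vector,
                   doc_vecs: Dict[str, Vector],
                   k: int) -> List[Tuple[str, float]]:
    """Stage 1: rank all documents by embedding similarity and keep the top k.

    Assumption: brute-force exact search replaces the ANN index here.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(query_text: str,
           candidates: List[str],
           doc_texts: Dict[str, str],
           score_pair: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Stage 2: score each (query, candidate) pair and re-order by that score.

    score_pair is a hypothetical stand-in for a cross-encoder forward pass.
    """
    scored = [(doc_id, score_pair(query_text, doc_texts[doc_id]))
              for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def two_stage_search(query_text: str,
                     query_vec: Vector,
                     doc_vecs: Dict[str, Vector],
                     doc_texts: Dict[str, str],
                     score_pair: Callable[[str, str], float],
                     k: int = 100) -> List[Tuple[str, float]]:
    """Retrieve k candidates with dense vectors, then re-rank only those k."""
    candidates = [doc_id for doc_id, _ in dense_retrieve(query_vec, doc_vecs, k)]
    return rerank(query_text, candidates, doc_texts, score_pair)
```

The split matters for cost: the expensive pairwise scorer runs on only k candidates per query, while the cheap vector comparison covers the whole collection. In practice `score_pair` might wrap a pretrained cross-encoder, for example the `CrossEncoder.predict` method in sentence-transformers, though which model the project uses is not stated here.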

Key Components
  • Dense encoder(s) for initial retrieval (bi-encoder style)
  • Cross-encoder re-ranker for fine-grained relevance scoring
  • Evaluation suite with IR metrics + plots
  • Configurable experiments for ablation/comparison
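
One comparison such experiments could include is rank fusion, which merges the rankings produced by several retrievers or re-rankers. The sketch below uses reciprocal rank fusion (RRF) as an example; the source does not say which fusion strategies the project implements, so the choice of RRF, the function name, and the constant k = 60 are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k = 60 is a commonly used default, assumed here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because RRF uses only rank positions, it can combine systems whose raw scores are on different scales, such as cosine similarities and cross-encoder logits.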

Evaluation & Metrics


Qualitative Analysis

[Figure: Patent Retrieval Pipeline Overview. Summarizes the two-stage retrieval and re-ranking workflow.]


Repository

The complete implementation, retrieval pipelines, re-ranking models, evaluation scripts, and detailed documentation are available at: github.com/md-naim-hassan-saykat/ir-patent-reranking