Patent Retrieval & Re-ranking (Dense + Cross-Encoders)
Research-oriented patent retrieval pipeline combining dense retrieval with transformer cross-encoder re-ranking, evaluated using standard IR metrics.
This project builds an end-to-end patent retrieval and re-ranking system designed for rigorous research evaluation. The pipeline first retrieves candidates using dense vector search, then re-ranks them with a transformer-based cross-encoder. Because the cross-encoder reads each query and candidate together, it scores relevance more precisely than a comparison of independently computed embeddings.
The system supports experimentation with embedding models, re-rankers, and fusion strategies, and reports standard IR metrics such as Recall@k, MAP (Mean Average Precision), and MRR (Mean Reciprocal Rank). The repository is organized into notebooks and scripts for training, inference, and analysis.
Pipeline Overview
Two-Stage Retrieval
- Stage 1 — Dense Retrieval: encode queries and documents into embeddings and retrieve top-k candidates via ANN search.
- Stage 2 — Cross-Encoder Re-ranking: score each (query, candidate) pair with a transformer model and re-order results.
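The two stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the repository's code: the toy corpus, the 2-d "embeddings", and the function names `dense_retrieve`, `rerank`, and `token_overlap` are all hypothetical. Exhaustive cosine search stands in for the ANN index, and a token-overlap score stands in for the transformer cross-encoder.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dense_retrieve(query_vec: Vector, doc_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: return the ids of the k documents most similar to the query.

    Exhaustive search is used here as a stand-in for an ANN index.
    """
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


def rerank(
    query: str,
    candidates: List[str],
    texts: Dict[str, str],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Stage 2: score each (query, candidate) pair and sort by descending score."""
    scored = [(d, score_pair(query, texts[d])) for d in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in for a cross-encoder: fraction of query tokens found in doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


# Hypothetical toy corpus with hand-made 2-d "embeddings".
texts = {
    "p1": "battery cooling system for electric vehicles",
    "p2": "electric vehicle battery pack with liquid cooling plate",
    "p3": "method for brewing coffee",
}
doc_vecs = {"p1": [0.9, 0.1], "p2": [0.8, 0.3], "p3": [0.0, 1.0]}

query = "liquid cooling battery pack"
query_vec = [1.0, 0.2]

candidates = dense_retrieve(query_vec, doc_vecs, k=2)   # ['p1', 'p2']
reranked = rerank(query, candidates, texts, token_overlap)
print([doc_id for doc_id, _ in reranked])               # ['p2', 'p1']
```

In this toy run, stage 1 places p1 ahead of p2, and stage 2 reverses the order because p2 matches every query token. In a real setup, `score_pair` would wrap a trained cross-encoder and `dense_retrieve` would query an ANN index.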
Key Components
- Dense encoder(s) for initial retrieval (bi-encoder style)
- Cross-encoder re-ranker for fine-grained relevance scoring
- Evaluation suite with IR metrics + plots
- Configurable experiments for ablation/comparison
Evaluation & Metrics
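- Recall@k: fraction of a query's relevant patents that appear in the top k results
- MAP (Mean Average Precision): average precision per query, averaged over all queries
- MRR (Mean Reciprocal Rank) / Mean Rank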
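The metrics listed above follow standard definitions, sketched here in plain Python. The function names and the toy `runs` and `qrels` data are hypothetical and do not come from the repository. The sketch assumes each ranked list contains no duplicate document ids.

```python
from typing import Callable, Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(
    metric: Callable[[List[str], Set[str]], float],
    runs: Dict[str, List[str]],
    qrels: Dict[str, Set[str]],
) -> float:
    """Average a per-query metric over every query that has relevance labels."""
    scores = [metric(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical system output (runs) and relevance judgments (qrels).
runs = {"q1": ["p2", "p1", "p3"], "q2": ["p5", "p4", "p6"]}
qrels = {"q1": {"p2", "p3"}, "q2": {"p4"}}

recall_2 = mean_over_queries(lambda r, rel: recall_at_k(r, rel, 2), runs, qrels)
map_score = mean_over_queries(average_precision, runs, qrels)
mrr = mean_over_queries(reciprocal_rank, runs, qrels)
print(f"Recall@2={recall_2:.3f}  MAP={map_score:.3f}  MRR={mrr:.3f}")
# Recall@2=0.750  MAP=0.667  MRR=0.750
```

Averaging over the queries in `qrels` rather than those in `runs` means a query with no retrieved results counts as a zero instead of being dropped.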
Qualitative Analysis
Figure: Patent Retrieval Pipeline Overview. This figure summarizes the two-stage retrieval and re-ranking workflow.
Repository
The complete implementation, retrieval pipelines, re-ranking models, evaluation scripts, and detailed documentation are available at: github.com/md-naim-hassan-saykat/ir-patent-reranking