Details
BCEmbedding is a modular ML toolkit that provides high-performance embedding and reranking models for Retrieval-Augmented Generation (RAG) and semantic search, with a focus on bilingual and crosslingual use.
Embedding Models
The core component. It generates dense vector representations of text, which retrieval tasks build on.
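
To illustrate how dense vectors support retrieval, here is a minimal sketch that ranks passages by cosine similarity to a query vector. The vectors are hand-written toy values, not model output, and the helper names `cosine_similarity` and `top_k` are hypothetical, not part of BCEmbedding.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], passage_vecs: List[List[float]], k: int = 2) -> List[int]:
    """Return the indices of the k passages most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy 3-dimensional "embeddings" (assumed values for illustration only).
query_vec = [1.0, 0.0, 1.0]
passage_vecs = [
    [0.9, 0.1, 1.1],  # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 0.2],  # partially aligned with the query
]
ranking = top_k(query_vec, passage_vecs, k=2)
print(ranking)
```

In a real pipeline the vectors would come from an embedding model and the search would usually run in a vector index rather than a Python loop.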
Reranker Models
Implements various models (bi-encoder, cross-encoder) to re-order retrieved documents based on relevance to a query.
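
The re-ordering step can be sketched as scoring every (query, passage) pair and sorting by score. In the sketch below, `rerank` and `overlap_score` are hypothetical names, and the word-overlap scorer is only a stand-in for a trained reranker model so that the example runs without any model weights.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Score each (query, passage) pair and return pairs sorted best-first."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


passages = [
    "Paris is the capital of France",
    "Bananas are yellow",
    "France borders Spain",
]
reranked = rerank("capital of France", passages, overlap_score)
for score, passage in reranked:
    print(f"{score:.2f}  {passage}")
```

Because the scorer is passed in as a function, swapping the toy scorer for a model-backed one would not change the sorting logic.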
Model Preprocessing
Handles the preparation of raw input data, including tokenization and input merging, for consumption by both embedding and reranker models.
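
As a rough illustration of input merging, the sketch below joins already-tokenized query and passage IDs into one sequence and truncates the passage to fit a length limit. The function name `merge_inputs`, the special-token IDs, and the `[CLS] query [SEP] passage [SEP]` layout are assumptions chosen for the example; the actual layout depends on the tokenizer the model uses.

```python
from typing import List


def merge_inputs(
    query_ids: List[int],
    passage_ids: List[int],
    max_length: int,
    cls_id: int = 0,  # assumed ID for the start-of-sequence token
    sep_id: int = 2,  # assumed ID for the separator token
) -> List[int]:
    """Build [CLS] query [SEP] passage [SEP], truncating the passage to fit."""
    # Three positions are reserved for the special tokens.
    passage_budget = max_length - len(query_ids) - 3
    if passage_budget <= 0:
        raise ValueError("max_length leaves no room for passage tokens")
    return [cls_id] + query_ids + [sep_id] + passage_ids[:passage_budget] + [sep_id]


# Toy token IDs (assumed values); the passage is one token too long for the limit.
merged = merge_inputs(query_ids=[11, 12], passage_ids=[21, 22, 23, 24], max_length=8)
print(merged)
```

Truncating only the passage keeps the full query in every merged input, which is one common design choice when passages are much longer than queries.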
Evaluation Framework
Orchestrates the evaluation of models, particularly rerankers, by loading datasets, computing performance metrics, and integrating with specialized evaluators.
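
The text does not name the metrics used. As one example of a ranking metric often applied to rerankers, the sketch below computes mean reciprocal rank (MRR) over a few hand-written result lists. The document IDs and relevance labels are invented for illustration, and the function names are hypothetical.

```python
from typing import List, Set, Tuple


def reciprocal_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Return 1 / rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, relevant) for ranked, relevant in runs) / len(runs)


# Each entry: (reranked document IDs, set of relevant IDs). Toy data only.
runs = [
    (["d2", "d1", "d3"], {"d1"}),  # first relevant hit at rank 2
    (["d5", "d4"], {"d5"}),        # first relevant hit at rank 1
    (["d7", "d8"], {"d9"}),        # no relevant document retrieved
]
mrr = mean_reciprocal_rank(runs)
print(f"MRR = {mrr:.3f}")
```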
RAG Integration
Provides interfaces and utilities for integrating BCEmbedding's models into RAG pipelines built with frameworks such as LangChain and LlamaIndex, including data extraction and retrieval logic.
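
The retrieval logic in such a pipeline is commonly a two-stage process: a cheap first pass selects candidates, then a more precise scorer re-orders them. The sketch below shows that composition in plain Python. It does not use the LangChain or LlamaIndex APIs; `retrieve_then_rerank` and both toy scorers are hypothetical stand-ins for an embedding retriever and a reranker.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def shared_words(query: str, doc: str) -> float:
    """Stage-1 stand-in: number of distinct words shared by query and doc."""
    return float(len(set(query.split()) & set(doc.split())))


def jaccard(query: str, doc: str) -> float:
    """Stage-2 stand-in: shared words divided by all distinct words."""
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q | d)


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    retrieve_score: Scorer,
    rerank_score: Scorer,
    k: int = 3,
    n: int = 2,
) -> List[str]:
    """Keep the top-k docs by retrieve_score, then return the top-n by rerank_score."""
    candidates = sorted(corpus, key=lambda doc: retrieve_score(query, doc), reverse=True)[:k]
    return sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)[:n]


corpus = [
    "bilingual embedding model for retrieval and search tasks",
    "embedding model",
    "bilingual dictionary",
    "weather report",
]
result = retrieve_then_rerank("bilingual embedding model", corpus, shared_words, jaccard)
print(result)
```

In this toy run the second stage promotes the shorter, more focused passage above the one that ranked first in stage one, which is the effect a reranker is meant to have on retrieved context.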