CodeBoarding

Details

BCEmbedding is a modular ML toolkit that provides embedding and reranking models for Retrieval Augmented Generation (RAG) and semantic search, with a focus on bilingual and crosslingual text.

Embedding Models

Generates dense vector representations of text. These vectors are the basis for retrieval: a query and a passage are compared by the similarity of their embeddings.
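To make the retrieval role concrete, here is a minimal sketch of similarity search over dense vectors. The toy 3-dimensional vectors and the helper names (`cosine_similarity`, `retrieve`) are illustrative assumptions, not part of BCEmbedding; a real embedding model would produce the vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, passage_vecs, top_k=2):
    """Return the indices of the top_k passages most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [i for _, i in scored[:top_k]]

# Toy vectors standing in for model output (assumption for illustration).
passage_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]
top = retrieve(query_vec, passage_vecs)
```

Here `top` holds the indices of the two passages whose vectors point most nearly in the same direction as the query.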

Reranker Models

Implements reranking models (bi-encoder and cross-encoder variants) that reorder retrieved documents by their relevance to a query.
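The reordering step can be sketched as follows. The scoring function `overlap_score` is a hypothetical stand-in for a learned reranker (it only counts shared tokens), and `rerank` is an illustrative helper, not the library's API.

```python
def overlap_score(query, passage):
    """Hypothetical scorer: fraction of query tokens that appear in the passage."""
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens) if q_tokens else 0.0

def rerank(query, passages, score_fn=overlap_score):
    """Score every (query, passage) pair and return pairs sorted best-first."""
    scored = [(score_fn(query, p), p) for p in passages]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored

query = "bilingual embedding model"
passages = [
    "a recipe for noodle soup",
    "an embedding model for bilingual retrieval",
    "a model of the solar system",
]
ranked = rerank(query, passages)
```

The point of the sketch is the shape of the operation: each passage is scored jointly with the query, and the candidate list is sorted by that score. A trained model would replace `overlap_score`.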

Model Preprocessing

Handles the preparation of raw input data, including tokenization and input merging, for consumption by both embedding and reranker models.
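As an illustration of input merging, the sketch below joins a tokenized query with chunks of a tokenized passage under a length limit. The special-token strings, the overlap between neighbouring chunks, and the function name `merge_inputs` are assumptions made for the example; the project's actual preprocessing may differ.

```python
def merge_inputs(query_tokens, passage_tokens, max_length=12, overlap=2):
    """Build [CLS] query [SEP] chunk [SEP] sequences of at most max_length tokens."""
    # Room left for passage tokens after the query and three special tokens.
    budget = max_length - len(query_tokens) - 3
    if budget <= overlap:
        raise ValueError("max_length is too small for this query")
    merged = []
    start = 0
    while True:
        chunk = passage_tokens[start:start + budget]
        merged.append(["[CLS]", *query_tokens, "[SEP]", *chunk, "[SEP]"])
        if start + budget >= len(passage_tokens):
            break
        # Step forward by less than a full chunk so neighbours share tokens.
        start += budget - overlap
    return merged

query_tokens = "what is rag".split()
passage_tokens = (
    "retrieval augmented generation combines a retriever with a text generator".split()
)
merged = merge_inputs(query_tokens, passage_tokens)
```

Splitting a long passage into overlapping chunks keeps every merged sequence within the model's input limit without cutting context sharply at chunk boundaries.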

Evaluation Framework

Orchestrates the evaluation of models, particularly rerankers, by loading datasets, computing performance metrics, and integrating with specialized evaluators.
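Two metrics commonly used to evaluate rerankers are hit rate at k and mean reciprocal rank (MRR). The sketch below computes both over toy rankings; the document IDs and function names are illustrative, and the text above does not say which metrics the framework reports.

```python
def hit_rate_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant document in the top k."""
    hits = sum(
        1 for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant & set(ranked[:k])
    )
    return hits / len(ranked_ids)

def mean_reciprocal_rank(ranked_ids, relevant_ids):
    """Average of 1/rank of the first relevant document (0 if none is found)."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids)

# One ranked list and one set of relevant IDs per query (toy data).
ranked_ids = [["d1", "d2", "d3"], ["d5", "d4", "d6"], ["d7", "d8", "d9"]]
relevant_ids = [{"d1"}, {"d4"}, {"d0"}]

mrr = mean_reciprocal_rank(ranked_ids, relevant_ids)
hit2 = hit_rate_at_k(ranked_ids, relevant_ids, k=2)
```

In the toy data the first query finds its relevant document at rank 1, the second at rank 2, and the third not at all.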

RAG Integration

Provides interfaces and utilities for integrating BCEmbedding's models into RAG pipelines built with frameworks such as LangChain and LlamaIndex, including data extraction and retrieval logic.
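The retrieval logic in such a pipeline typically has two stages: a fast first-stage retriever recalls candidates, then a reranker orders them. The framework-free sketch below shows that flow. `keyword_retrieve` and `shared_token_score` are hypothetical placeholders for an embedding retriever and a reranker, and nothing here reflects the actual LangChain or LlamaIndex interfaces.

```python
def keyword_retrieve(query, corpus, k):
    """Hypothetical first stage: keep up to k passages sharing a token with the query."""
    q_tokens = set(query.lower().split())
    hits = [p for p in corpus if q_tokens & set(p.lower().split())]
    return hits[:k]

def shared_token_score(query, passage):
    """Hypothetical second-stage scorer: number of tokens shared with the query."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve_then_rerank(query, corpus, retrieve_fn, score_fn, recall_k=3, top_n=1):
    """Recall recall_k candidates, rescore them, and return the best top_n."""
    candidates = retrieve_fn(query, corpus, recall_k)
    scored = sorted(
        ((score_fn(query, p), p) for p in candidates),
        key=lambda t: t[0],
        reverse=True,
    )
    return [p for _, p in scored[:top_n]]

corpus = [
    "reranker models reorder retrieved passages",
    "embedding models map text to vectors",
    "noodle soup needs a long simmer",
    "a reranker scores each query passage pair",
]
query = "how does a reranker score a passage"
context = retrieve_then_rerank(query, corpus, keyword_retrieve, shared_token_score)
```

Passing the two stages in as functions keeps the pipeline independent of any particular model, which is the same separation a framework integration relies on.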