Agentset


An open-source platform for retrieval-augmented generation (RAG) that simplifies multi-format ingestion, partitioning, and citation-aware retrieval.

Author: Agentset
Open sourced: 2025-03-10
Last commit: Unknown

Overview

Agentset is an open-source RAG platform that helps developers and researchers build citation-aware agents with deep research capabilities. It supports more than 22 file formats out of the box and provides built-in citations, partitions, and an MCP (Model Context Protocol) server. Together, these make it easier to bring external knowledge into an agent's context, which improves answer accuracy and lets each answer be traced back to its sources.

Key Features

  • Multi-format ingestion supporting 22+ file types with automatic partitioning to reduce preprocessing overhead
  • Built-in citation pipeline that links outputs to source document locations for verification and traceability
  • Compatible with multiple vector databases and retrieval components, plus an integrated MCP server
  • SDKs and examples for building multi-step, agentic workflows with deep research capabilities
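To make the link between partitioning and citations concrete, here is a minimal, dependency-free sketch of the general idea: each partition keeps the character span it came from, so any answer built on that partition can point back to an exact location in the source. This is not Agentset's API or its partitioning algorithm. The names `Chunk`, `partition`, and `cite`, the fixed-size overlapping character windows, and the `[doc:start-end]` citation format are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A partition of a source document plus the span needed to cite it (hypothetical type)."""
    doc_id: str
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset into the source text
    text: str


def partition(doc_id: str, text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into fixed-size, overlapping character windows.

    Each chunk records its (start, end) span so that a retrieved chunk can be
    traced back to the exact location it came from in the original document.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        end = min(start + size, len(text))
        chunks.append(Chunk(doc_id, start, end, text[start:end]))
        if end == len(text):  # the last window reached the end of the document
            break
    return chunks


def cite(chunk: Chunk) -> str:
    """Render a simple, hypothetical citation marker for a chunk."""
    return f"[{chunk.doc_id}:{chunk.start}-{chunk.end}]"
```

Fixed-size windows are the simplest partitioning strategy to show. A production partitioner would more likely split on structural boundaries such as pages, headings, or table cells, but the principle of carrying source offsets alongside each chunk is the same.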

Use Cases

  • Enterprise knowledge Q&A with citation-backed assistants
  • Rapid RAG prototyping and evaluation of retrieval strategies
  • Compliance and auditing workflows that require traceable answers
  • Multi-format document processing that normalizes diverse assets into a unified retrieval corpus

Technical Details

Agentset builds on embedding models and vector search, and uses partitioning and caching strategies to make efficient use of the LLM's context window. Its retrieval and re-ranking pipelines are configurable and work with mainstream LLMs and inference services. The project is MIT-licensed, which permits both extension and enterprise deployment.
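The retrieval flow described above can be sketched, under heavy simplifying assumptions, as a three-stage pipeline: a cheap vector search shortlists passages, a re-ranker reorders the shortlist, and the survivors are packed into a fixed context budget. This is a generic illustration of the retrieve, re-rank, and pack pattern, not Agentset's implementation. It assumes bag-of-words term-frequency vectors as a stand-in for a real embedding model, a term-overlap score as a stand-in for a learned re-ranker, whitespace-separated words as a stand-in for tokens, and `functools.lru_cache` as a stand-in for an embedding cache. The function names `embed`, `cosine`, `rerank_score`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter
from functools import lru_cache


@lru_cache(maxsize=4096)
def embed(text: str) -> tuple[tuple[str, int], ...]:
    """Stand-in 'embedding': a sorted bag-of-words term-frequency vector.

    Cached so repeated queries and passages are not re-embedded; a real
    system would cache calls to an embedding model instead.
    """
    return tuple(sorted(Counter(text.lower().split()).items()))


def cosine(a: tuple[tuple[str, int], ...], b: tuple[tuple[str, int], ...]) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    da, db = dict(a), dict(b)
    dot = sum(v * db.get(k, 0) for k, v in da.items())
    na = math.sqrt(sum(v * v for v in da.values()))
    nb = math.sqrt(sum(v * v for v in db.values()))
    return dot / (na * nb) if na and nb else 0.0


def rerank_score(query: str, passage: str) -> float:
    """Stand-in re-ranker: fraction of distinct query terms present in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def retrieve(query: str, passages: list[str], top_k: int = 3, budget_words: int = 50) -> list[str]:
    # Stage 1: cheap vector search over every passage to build a shortlist.
    qv = embed(query)
    shortlist = sorted(passages, key=lambda p: cosine(qv, embed(p)), reverse=True)[:top_k]
    # Stage 2: re-rank only the shortlist with the (notionally costlier) scorer.
    ranked = sorted(shortlist, key=lambda p: rerank_score(query, p), reverse=True)
    # Stage 3: greedily pack passages into the context until the budget is spent,
    # skipping any passage that would overflow it.
    context, used = [], 0
    for p in ranked:
        n = len(p.split())
        if used + n > budget_words:
            continue
        context.append(p)
        used += n
    return context
```

Splitting retrieval into a cheap first stage and a narrower second stage is a common way to keep re-ranking cost bounded by `top_k` rather than by corpus size, while the final budget check keeps the assembled context within the model's window.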