Overview
Agentset is an open-source RAG platform that helps developers and researchers build citation-aware agents with deep research capabilities. It supports 22+ file formats out of the box and provides built-in citations, partitions, and an MCP server, which together simplify bringing external knowledge into an agent's context and make answers more accurate and traceable.
Key Features
- Multi-format ingestion supporting 22+ file types with automatic partitioning to reduce preprocessing overhead
- Built-in citation pipeline that links outputs to source document locations for verification and traceability
- Compatible with multiple vector databases and retrieval components, plus an integrated MCP server
- SDKs and examples for building multi-step, agentic workflows with deep research capabilities
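To make the citation idea concrete, here is a minimal sketch of how answer sentences might be linked back to source locations. It is an illustration only and not Agentset's API: `SourceChunk`, `cite`, and the file names are hypothetical, and a real pipeline would likely track finer-grained locations than a page number.

```python
from dataclasses import dataclass


# Hypothetical record for a retrieved passage; not an Agentset type.
@dataclass(frozen=True)
class SourceChunk:
    document: str  # file name of the ingested document
    page: int      # page where the passage appears
    text: str      # passage text


def cite(sentences):
    """Attach [n] markers to sentences and build a matching reference list.

    `sentences` is a list of (sentence, SourceChunk) pairs. Each distinct
    (document, page) location is numbered once, in order of first use.
    """
    numbers = {}
    parts = []
    for sentence, chunk in sentences:
        key = (chunk.document, chunk.page)
        if key not in numbers:
            numbers[key] = len(numbers) + 1
        parts.append(f"{sentence} [{numbers[key]}]")
    # Dicts keep insertion order, so references come out in numeric order.
    references = [f"[{n}] {doc}, p. {page}" for (doc, page), n in numbers.items()]
    return " ".join(parts), references


policy = SourceChunk("policy.pdf", 3, "Refunds are processed within 5 days.")
faq = SourceChunk("faq.docx", 1, "Contact support before requesting a refund.")
answer, refs = cite([
    ("Refunds take 5 days.", policy),
    ("Contact support first.", faq),
    ("Fees are waived.", policy),
])
print(answer)
print("\n".join(refs))
```

Reusing one number per source location keeps the reference list short and lets a reader verify every claim against the same passage.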
Use Cases
- Enterprise knowledge QA with citation-backed assistants
- Rapid RAG prototyping and retrieval strategy evaluation
- Compliance and auditing workflows that require traceable answers
- Multi-format document processing that normalizes diverse assets into a unified retrieval corpus
Technical Details
Agentset is built on modern embeddings and vector search, with partitioning and caching strategies that optimize context window usage. Its retrieval and re-ranking pipelines are configurable and compatible with mainstream LLMs and inference services. The project is MIT-licensed, so it can be extended freely and deployed in enterprise environments.
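The retrieve-then-re-rank pattern described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not Agentset's implementation: the function names, the dictionary layout, the two-dimensional vectors, and the term-overlap re-ranker are all hypothetical stand-ins for a real embedding model, vector database, and re-ranking model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, partition, top_k):
    """First stage: vector search restricted to a single partition."""
    candidates = [item for item in index if item["partition"] == partition]
    candidates.sort(key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return candidates[:top_k]


def rerank(query, candidates, top_n):
    """Second stage: reorder candidates by shared query terms.

    Term overlap is a stand-in (assumption) for a learned re-ranker.
    """
    terms = set(query.lower().split())

    def overlap(item):
        return len(terms & set(item["text"].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


# Toy index; vectors and partition names are made up for illustration.
index = [
    {"id": "a", "partition": "legal", "vector": [1.0, 0.0], "text": "data retention policy"},
    {"id": "b", "partition": "legal", "vector": [0.9, 0.1], "text": "retention period for audit logs"},
    {"id": "c", "partition": "hr", "vector": [1.0, 0.0], "text": "vacation policy"},
    {"id": "d", "partition": "legal", "vector": [0.0, 1.0], "text": "office seating"},
]

hits = retrieve([1.0, 0.0], index, partition="legal", top_k=2)
best = rerank("audit logs retention", hits, top_n=1)
print([h["id"] for h in hits], best[0]["id"])
```

Filtering by partition before scoring keeps unrelated corpora out of the candidate set, and passing only the re-ranked top results to the model is one way to limit how much of the context window retrieval consumes.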