Knowledge & Context

Retrieval, memory, indexing, knowledge organization, and data connectivity.

88 projects tracked across 6 subcategories and 35 tags.

Memory layers, context compression, and state management for AI systems.
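
As a rough illustration of the context-compression idea in this subcategory (not the design of any project listed below), the sketch keeps recent turns verbatim within a budget and folds older turns into a running summary. The class name `RollingMemory`, the word count standing in for tokens, and the truncation standing in for summarization are all hypothetical simplifications; real systems typically use a tokenizer and an LLM summarizer.

```python
from dataclasses import dataclass, field


@dataclass
class RollingMemory:
    """Hypothetical rolling memory: recent turns verbatim, older turns compressed."""
    budget: int  # max words kept verbatim (word count stands in for tokens)
    turns: list[str] = field(default_factory=list)
    summary: list[str] = field(default_factory=list)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns until the verbatim window fits the budget,
        # but always keep at least the newest turn.
        while len(self.turns) > 1 and self._size() > self.budget:
            evicted = self.turns.pop(0)
            # Placeholder "compression": keep only the first three words.
            # A real system would call an LLM summarizer here.
            self.summary.append(" ".join(evicted.split()[:3]))

    def _size(self) -> int:
        return sum(len(t.split()) for t in self.turns)

    def context(self) -> str:
        """Assemble the prompt context: compressed history first, then recent turns."""
        parts = []
        if self.summary:
            parts.append("Summary: " + "; ".join(self.summary))
        parts.extend(self.turns)
        return "\n".join(parts)
```

With `budget=6`, adding a five-word turn and then a three-word turn pushes the window to eight words, so the first turn is evicted into the summary and only the newer turn stays verbatim.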

Acontext

A context data platform for self-learning agents to store, observe, and distill experiences.

Agentic Context Engine

Agentic Context Engine (ACE) is a framework and implementation for enabling agents to learn from experience through structured context engineering.

AgentMemory

Persistent memory layer for AI coding agents that retains context across sessions, with a design grounded in real-world benchmarks.

Basic Memory

A local-first knowledge-as-Markdown system that lets LLMs read and write your memory via the Model Context Protocol (MCP).

Claude Mem

A Claude Code plugin that captures coding-session context, compresses it with AI, and injects relevant memory into future sessions.

Colossal-AI

Colossal-AI is an open-source system for efficient large-scale model training and inference, featuring advanced parallelism strategies and memory management.

Detectron2

Facebook AI Research's next-generation object detection and segmentation library, offering state-of-the-art algorithms and a rich model zoo.

Flash Attention

Fast and memory-efficient exact attention implementation optimized for large Transformer training and inference.

FlashMLA

Efficient multi-head latent attention kernels designed to accelerate large-scale Transformer training and inference with reduced memory footprint.

Letta

Platform for building stateful agents with advanced memory and self-improvement capabilities, supporting both local and cloud deployments.

LMCache

A high-performance KV cache layer for LLM serving that reduces time-to-first-token and increases throughput, especially for long-context and multi-turn scenarios.

LocalRecall

LocalRecall provides a local memory layer and knowledge base management API for agents and RAG scenarios.

Mem0

Mem0 is a scalable memory layer for AI agents that provides long-term, personalized, and efficient memory storage and retrieval.

MemOS

MemOS is an open-source Memory OS that provides long-term memory capabilities for large language models (LLMs), improving context awareness and long-term consistency.

memU

memU is an open-source memory framework for AI companions, offering high accuracy, fast retrieval, and low cost for personalized AI experiences.

OpenHuman

OpenHuman is an open-source personal AI superintelligence assistant focused on privacy, simplicity, and power, featuring 118+ third-party integrations, local memory trees, an Obsidian wiki, and native voice interaction.

Supermemory

A high-performance, scalable memory engine and app that provides a Memory API for storing, retrieving, and interacting with content.

TencentDB Agent Memory

Tencent's local long-term memory system for AI agents, powered by a 4-tier progressive pipeline with zero external API dependencies.

Vector stores, ANN engines, and similarity search.
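
For readers new to vector search, the baseline that ANN engines approximate is exact k-nearest-neighbor search: score every stored vector against the query and keep the best k. The minimal sketch below uses cosine similarity over plain Python lists; the function names `cosine` and `top_k` are illustrative and are not the API of any database listed here. ANN indexes such as HNSW or IVF exist because this full scan costs O(n·d) per query.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exact (brute-force) nearest neighbors: scores every stored vector."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in vectors.items())
    # Keep the k highest-scoring (doc_id, score) pairs, best first.
    return heapq.nlargest(k, scored, key=lambda item: item[1])
```

For example, querying `[1.0, 0.1]` against `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}` with `k=2` ranks `"a"` first and `"c"` second.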

Infinity

An AI-native database that delivers hybrid search over dense vectors, sparse vectors, tensors, full-text and structured data.
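
Hybrid search needs a way to merge ranked lists from different retrievers, for example a dense-vector search and a full-text search. One common, engine-agnostic technique is reciprocal rank fusion (RRF), sketched below; this illustrates the general idea and is not a claim about how Infinity combines its results. The default `k = 60` is the constant commonly used with RRF, assumed here.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Fusing a dense ranking `["d1", "d2", "d3"]` with a sparse ranking `["d2", "d3", "d4"]` puts `d2` first, because it ranks well in both lists even though it tops only one.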

Langflow

A visual platform to build, test and deploy AI agents and workflows, with multi-model and vector DB integrations.

Milvus

Milvus is a high-performance vector database designed for large-scale unstructured data processing.

Qdrant

Qdrant is a high-performance vector search engine for similarity search, designed for scalable deployment and efficient data retrieval.

SeekDB

An AI-native search database that unifies vector, text, and structured data in a single engine to enable hybrid search and in-database AI workflows.

sqlite-vector

Integrates embedding storage and vector search into SQLite, providing a cross-platform lightweight vector database extension.

Chunking, retrieval, reranking, and indexing pipelines.
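
Most pipelines in this subcategory begin by splitting documents into overlapping chunks before embedding them. The sketch below shows the simplest variant, fixed-size word windows with overlap; `chunk_words` is a hypothetical helper, and real tools often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_words("a b c d e f g", size=3, overlap=1)` yields `["a b c", "c d e", "e f g"]`; the overlap keeps a sentence that straddles a boundary retrievable from either chunk.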

Agents Towards Production

Open-source playbook and toolkit for building production-ready AI agents, covering the full lifecycle from prototype to enterprise deployment.

Airweave

Airweave lets agents search any app by connecting to apps, productivity tools, databases and document stores and turning their contents into searchable knowledge bases.

BISHENG

An open-source LLM DevOps platform for enterprise scenarios, offering workflows, RAG, model management and observability.

Chroma

Chroma is an open-source embedding database for AI applications, providing efficient storage, search, and retrieval for RAG systems.

CocoIndex

A high-performance data processing and indexing framework for AI, supporting incremental processing and semantic indexing.

DB-GPT

DB-GPT is an open-source framework focused on data-native applications, integrating RAG, Text2SQL, and multi-backend adapters to simplify building intelligent database-driven apps.

DocsGPT

An open-source enterprise document agent platform combining RAG and multi-model support to provide citation-backed answers.

Embedding Atlas

A tool that provides interactive visualizations for large embeddings, allowing you to visualize, cross-filter, and search embeddings and metadata.

FastGPT

FastGPT is a platform for data processing and AI workflow orchestration that simplifies building question-answering systems.

FinGPT

Open-source financial large language models with data pipelines, instruction tuning datasets, benchmarks and RAG toolkits.

Firecrawl

The Web Data API for AI that turns entire websites into clean markdown or structured data for RAG and knowledge pipelines.

Generative AI on Google Cloud

Sample code and notebooks demonstrating how to build and deploy generative AI workflows on Vertex AI and Gemini.

Genkit

An open-source framework by Firebase for building production-grade, full-stack AI applications with multi-language SDKs and model provider integrations.

GraphRAG

GraphRAG is an open-source project from Microsoft Research that extracts structured knowledge graphs from unstructured text and uses them to enhance retrieval and reasoning over private data.

Haystack

Haystack is an open-source framework for building retrieval-augmented generation (RAG) and semantic search applications by combining document stores, vector search, and LLMs.

Kaito

Kaito is a Kubernetes AI Toolchain Operator that automates deployment and management of large-model inference and tuning workflows, with built-in RAG support and node auto-provisioning.

Keploy

A developer-centric API and integration testing tool that auto-generates tests and data mocks from real traffic, supporting record-and-replay of API calls, database operations, and streaming events.

Khoj

A self-hostable 'second brain' platform that turns web pages and documents into a searchable knowledge base and supports custom agents and automations.

LanceDB

A developer-friendly, embedded retrieval engine for multimodal AI: search more, manage less.

LangChain

A framework for building LLM-powered applications with composable components and rich integrations.

LangChain4j

An open-source Java library that provides a unified API for integrating large language models and vector databases into enterprise Java applications.

LightRAG

LightRAG is a lightweight Retrieval-Augmented Generation toolkit that supports document indexing, graph extraction, and deployable server/core modes.

LlamaFarm

LlamaFarm is an open-source platform for deploying AI models, agents, vector databases, and RAG pipelines locally or remotely in minutes.

LlamaIndex

LlamaIndex is a data framework for LLM applications that helps structure and connect private data sources to models for retrieval-augmented generation.

LocalGPT

A private, on-premise document intelligence platform that combines hybrid retrieval and multi-model inference while keeping all data local.

Marker

Converts PDF, image, PPTX, DOCX, XLSX, HTML, EPUB files to markdown, JSON, chunks, and HTML quickly and accurately.

MaxKB

MaxKB is an open-source enterprise agent platform with RAG pipelines, agent workflows, and multimodal support for knowledge bases and customer service.

Memori

An open-source SQL-native memory engine that provides persistent, queryable context for Large Language Models.

Memvid

Encodes millions of text chunks into portable MP4 files for millisecond semantic search and offline-first AI memory.

mgrep

A CLI-native semantic search tool for code, documents and media, with background indexing and agent integrations.

MineContext

MineContext is a proactive, context-aware AI partner that combines context engineering with ChatGPT Pulse-style proactive assistance to improve dialogue coherence and retrieval in RAG scenarios.

OpenViking

OpenViking is an open-source context database for AI Agents that unifies memories, resources, and skills with a filesystem paradigm for hierarchical retrieval and observability.

PageIndex

PageIndex (by Vectify AI) is an open-source reasoning-based document index designed for high-accuracy retrieval over long documents.

PandaWiki

PandaWiki is an open-source, LLM-driven knowledge base system for quickly building intelligent documentation sites, FAQs, and blogs.

Pathway LLM App

Production-ready templates for RAG and AI pipelines that support live data synchronization and large-scale document indexing.

Perplexica

Perplexica is an open-source, AI-powered search engine positioned as an alternative to Perplexity AI.

RAG-Anything

A multimodal document processing and Retrieval-Augmented Generation (RAG) system supporting unified parsing and intelligent retrieval of text, images, tables, formulas, and more.

RAGFlow

An open-source RAG engine based on deep document understanding, supporting complex document parsing and knowledge Q&A.

Scira

A minimalistic AI-powered search engine that finds information on the web and provides citations, serving as an open-source Perplexity alternative.

SearXNG

A free, privacy-preserving internet metasearch engine that aggregates results from multiple search services and databases without user tracking.

SemTools

A command-line toolkit for semantic search, embedding generation, and document parsing for local and CI workflows.

text-embeddings-inference

Hugging Face's text-embeddings-inference is a ready-to-deploy inference server for text embedding models, making it easy to build similarity search and semantic search applications.

Tongyi DeepResearch

An open research agent and toolset for long-horizon information-seeking and agentic tasks, developed by Tongyi Lab (Alibaba-NLP).

UltraRAG

A low-code RAG framework based on MCP, emphasizing visual orchestration and reproducible evaluation workflows.

Unstructured

An open-source ETL solution to convert complex documents into clean, structured formats for language-model workflows.

Vanna

Vanna is an open-source RAG framework that converts natural language questions into executable SQL and runs them against local databases.

Vespa

Vespa is a distributed engine designed for online AI and big-data workloads. It excels at low-latency retrieval and inference, supporting vector search, custom scoring, and near-real-time indexing.

Weaviate

Weaviate is an open-source, cloud-native vector database for storing objects and vectors, enabling scalable semantic search and structured filtering for AI applications.

WeKnora

WeKnora is an open-source document understanding and retrieval framework from Tencent that combines LLMs and RAG for multimodal document search and knowledge graph construction.

Wren AI

Open-source GenBI agent for querying databases in natural language and producing SQL, charts and AI-generated insights.

OCR, parsing, extraction, and document understanding.

MinerU

MinerU is a high-precision PDF document parsing tool that converts complex PDFs into machine-readable Markdown and JSON formats, supporting formula, table, image extraction and multilingual OCR.

pdfplumber

An open-source Python library built on pdfminer.six that exposes detailed PDF objects, table extraction, and visual debugging features.

Stirling PDF

An open-source, self-hosted web PDF editor and processing platform that supports a wide range of PDF operations.

Tesseract OCR

Tesseract is a powerful open-source Optical Character Recognition (OCR) engine supporting over 100 languages, widely used for text extraction and document digitization.

Entity graphs, relationship modeling, and graph retrieval.
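
Graph retrieval often starts from an entity mentioned in the query and expands its neighborhood a fixed number of hops, passing the collected triples to the LLM as context. The breadth-first sketch below illustrates that step over (subject, relation, object) triples; it is a generic illustration rather than the method of CodeGraph or DeepTutor, and both the helper name `k_hop_neighborhood` and the choice to treat edges as undirected are assumptions made for simplicity.

```python
from collections import deque


def k_hop_neighborhood(
    edges: list[tuple[str, str, str]], seed: str, hops: int
) -> list[tuple[str, str, str]]:
    """Collect triples incident to nodes fewer than `hops` hops from `seed` (BFS order)."""
    adjacency: dict[str, list[tuple[str, str, str]]] = {}
    for triple in edges:
        subj, _, obj = triple
        # Index each triple under both endpoints: traversal ignores edge direction.
        adjacency.setdefault(subj, []).append(triple)
        adjacency.setdefault(obj, []).append(triple)

    seen_nodes = {seed}
    seen_triples: set[tuple[str, str, str]] = set()
    collected: list[tuple[str, str, str]] = []
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for triple in adjacency.get(node, []):
            if triple not in seen_triples:
                seen_triples.add(triple)
                collected.append(triple)
            subj, _, obj = triple
            neighbor = obj if subj == node else subj
            if neighbor not in seen_nodes:
                seen_nodes.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return collected
```

With the chain Milvus → vector database → ANN search → HNSW, one hop from `"Milvus"` returns only the first triple, while two hops also returns the "vector database supports ANN search" triple.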

CodeGraph

Pre-indexed code knowledge graph for AI coding agents, supporting Claude Code, Codex, Cursor, and OpenCode with 100% local execution.

DeepTutor

A multi-agent personalized learning system integrating RAG, knowledge graphs, and interactive visualizations.

Data ingestion, connectors, and synchronization pipelines.

DataTrove

DataTrove provides composable, platform-agnostic pipelines for large-scale text data processing, including extraction, filtering, deduplication and saving.

DSPy

DSPy is an open-source framework that favors programming over prompting to build composable, self-improving AI pipelines.

Gravitino

A high-performance, geo-distributed and federated metadata lake for unified metadata access and governance of data and AI assets.

ROLL

A reinforcement learning optimization platform for large-scale training pipelines.

spaCy

A high-performance, production-ready open-source natural language processing library providing pretrained pipelines, training tools, and extensible language components.

Unity Catalog

An open, multimodal catalog for data and AI that provides unified governance, metadata management, and access control.
