Infinity

Tracked

An AI-native database that delivers hybrid search over dense vectors, sparse vectors, tensors, full-text and structured data.

Author Infiniflow Open Sourced 2022-07-18 Last Commit Unknown

Infinity is an AI-native inference engine purpose-built for high-performance embeddings, reranking, and classification workloads. It delivers low-latency, high-throughput inference for the most commonly used AI model types in retrieval-augmented generation and search applications through a unified API.

Key Features

  • Optimized inference for popular embedding models and rerankers with millisecond-level query latency
  • Support for multiple data types including dense vectors, sparse vectors, tensors, full-text, and structured fields
  • Developer-friendly Python SDK and single-binary deployment for quick integration
  • Built-in observability and benchmarking tools designed for high-QPS production workloads
  • Hybrid index architecture that unifies vector, sparse, and full-text indexes

Use Cases

  • Powering vector search and retrieval-augmented generation (RAG) systems with low-latency inference
  • Building similarity recommendation engines for e-commerce, content, and media platforms
  • Deploying classification models at scale for enterprise applications
  • Private deployment for compliance-sensitive workloads that cannot use external inference services

Technical Highlights

  • Achieves high QPS throughput through a hybrid index architecture with smart resource management
  • Can run as a standalone binary or be embedded directly in Python processes for flexible deployment
  • Released under the Apache-2.0 license for both community and enterprise adoption