Airbyte

Open-source data movement platform for ELT pipelines and AI agents, moving data from APIs, databases, and files to warehouses, lakes, and AI applications.

Author: Airbyte | Open sourced: 2020-07-27 | Last commit: unknown

Overview

Airbyte is an open-source data integration platform that moves data from APIs, databases, and files to data warehouses, data lakes, and AI applications. It ships more than 350 connectors and is adding support for AI agents, so it can serve as the data layer for RAG pipelines and other AI-powered data applications.

Key Features

  • 350+ pre-built connectors for databases, APIs, SaaS platforms, and files
  • ELT architecture with support for incremental and full refresh syncs
  • AI-ready data pipelines for RAG and agent-based applications
  • Self-hosted or cloud deployment options
  • Change data capture (CDC) for replicating database inserts, updates, and deletes in near real time
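
The difference between full refresh and incremental syncs can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Airbyte's implementation: the `Destination` class, the `updated_at` cursor field, and both function names are assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


@dataclass
class Destination:
    """Hypothetical in-memory stand-in for a warehouse table keyed by id."""
    rows: Dict[int, Record] = field(default_factory=dict)


def full_refresh(source_rows: List[Record], dest: Destination) -> int:
    """Replace the destination with a complete copy of the source."""
    dest.rows = {row["id"]: dict(row) for row in source_rows}
    return len(source_rows)


def incremental(
    source_rows: List[Record], dest: Destination, cursor: Optional[int]
) -> Tuple[int, Optional[int]]:
    """Upsert only rows whose updated_at is past the saved cursor.

    Returns the number of rows written and the cursor to save for next time.
    """
    changed = [
        row for row in source_rows
        if cursor is None or row["updated_at"] > cursor
    ]
    for row in changed:
        dest.rows[row["id"]] = dict(row)
    new_cursor = max((row["updated_at"] for row in changed), default=cursor)
    return len(changed), new_cursor


# First sync: no cursor yet, so every row is copied.
source = [
    {"id": 1, "name": "ada", "updated_at": 10},
    {"id": 2, "name": "bob", "updated_at": 20},
]
dest = Destination()
first_count, cursor = incremental(source, dest, cursor=None)

# The source changes: row 2 is edited and row 3 is new.
source[1] = {"id": 2, "name": "bobby", "updated_at": 30}
source.append({"id": 3, "name": "cy", "updated_at": 40})

# Second sync: only the two rows newer than the saved cursor are written.
second_count, cursor = incremental(source, dest, cursor)
```

A full refresh is simpler and always correct, but it rereads everything on every run. The incremental version depends on the source exposing a reliable, monotonically increasing cursor column, and a cursor alone will not notice deleted rows.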

Use Cases

  • Building data pipelines to feed RAG knowledge bases
  • Syncing enterprise data to vector databases for AI search
  • Creating unified data layers for AI agent tool access
  • ELT workflows for machine learning feature engineering

Technical Details

  • Built with Java and Python, containerized with Docker
  • Supports dbt transformations within pipelines
  • Connector Development Kit (CDK) for custom connector creation
  • PyAirbyte, a Python library, for programmatic pipeline control
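
As a rough idea of what programmatic control looks like, here is a minimal sketch modeled on the PyAirbyte quickstart. It assumes the `airbyte` package is installed (`pip install airbyte`) and that the `source-faker` demo connector is available. The function name `read_stream_counts` is made up for this example, and the library's API may differ between versions, so check the current PyAirbyte documentation before relying on it.

```python
from typing import Dict


def read_stream_counts(record_count: int = 100) -> Dict[str, int]:
    """Read every stream from the demo connector and count its records."""
    # Imported lazily so this module still loads where PyAirbyte is absent.
    import airbyte as ab

    # "source-faker" generates synthetic data, so no credentials are needed.
    source = ab.get_source(
        "source-faker",
        config={"count": record_count},
        install_if_missing=True,
    )
    source.check()               # verify the connector config before reading
    source.select_all_streams()  # sync every stream the connector exposes
    result = source.read()       # records land in PyAirbyte's local cache

    return {
        name: len(list(records))
        for name, records in result.streams.items()
    }
```

Calling `read_stream_counts(500)` would install the connector on first use and return a mapping of stream name to record count. Pointing the same code at a real source should mostly come down to changing the connector name and its `config` dictionary.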