AgentScope
Start building LLM-empowered multi-agent applications in an easier way.
Foundation models, model toolkits, and multimodal capabilities.
Foundation model families and model releases.
A unified platform for intelligent agents that supports multimodal and multi-agent systems and integrates more than 23 model providers and more than 20 vector stores through a prioritized routing design.
An AI chat client that integrates multiple model providers, with a focus on privacy and security: all data is stored locally.
Data Prep Kit accelerates unstructured data preparation for LLM applications.
Eino is a Go framework for building LLM applications, with a focus on composability, stream processing, and production readiness.
A community-maintained list of LLM providers and gateways that offer free or trial API access.
GenAI Agents is a resource for developing generative AI agents, with 45+ example cases covering diverse applications and an active developer community.
GuideLLM is a tool for evaluating and optimizing LLM deployments by simulating real-world inference workloads and measuring performance metrics such as throughput and latency.
An AI tool for diagnosing and analyzing Kubernetes clusters that uses LLMs to locate and explain cluster issues.
Keras is a high-level deep learning API for fast experimentation with neural networks; Keras 3 runs on top of JAX, TensorFlow, or PyTorch and provides an intuitive interface for building and training models.
A Python library that uses LLMs to extract structured information from unstructured text and provides interactive visualization for review.
Langfuse is an open-source platform for LLM development that supports collaboration, monitoring, and debugging of AI applications.
A comprehensive framework for fine-tuning LLaMA models, with multiple training methods, efficient algorithms, and an easy-to-use interface for both research and production.
A fully local web research and report-writing assistant that uses local LLMs (e.g., via Ollama or LM Studio) to iteratively search, summarize, and refine findings.
A Minecraft agent and multi-agent framework that integrates LLMs with Mineflayer to build programmable, collaborative bots and task suites.
A Python toolkit for running and fine-tuning LLMs on Apple Silicon, with support for quantization, distributed inference, and Hugging Face integration.
A local LLM runner that lets users easily run and manage a variety of open-source models on their own machines.
An open-source deep learning platform developed by Baidu, providing a comprehensive ecosystem for machine learning and deep learning research and production.
Pydantic AI is an AI framework from the Pydantic team for building structured, production-grade AI systems with strong data validation and support for streaming outputs.
Cross-modal modeling for text, image, audio, and video.
Chandra is a high-accuracy OCR model that converts images and PDFs into structured output that preserves layout information.
ElevenLabs UI is a component library and custom registry built on top of shadcn/ui for building multimodal agent interfaces faster.
An open-source robotics library that provides datasets, pretrained policies, and simulation environments for reproducible robot learning and deployment.
LightX2V is a lightweight inference framework for video generation models (for example, text-to-video and image-to-video), designed for efficient deployment in resource-constrained environments.
A cross-platform UI automation framework driven by vision-language models that uses screenshots for visual localization and interaction.
NVIDIA NeMo is a framework for training and fine-tuning speech (including TTS), multimodal, and large language models.
A Next.js web application that integrates AI into draw.io for creating and refining diagrams through natural language.
A toolkit for linearizing PDFs and image-based documents into readable plain text and Markdown, aimed at LLM dataset creation and large-scale document processing.
An open-source, privacy-focused notebook and research platform that supports multi-model integration and multimodal content management.
PaddleOCR is a lightweight, high-performance open-source OCR toolkit that supports 100+ languages and converts images or PDFs into structured data.
A declarative data infrastructure for multimodal AI workloads that simplifies storage, indexing, and inference.
A framework for high-performance, cost-efficient inference and serving of omni-modality models across text, image, video, and audio.
Speech-to-text, text-to-speech, and audio understanding.
A tool that generates accurate, editable subtitles locally, either standalone or integrated with DaVinci Resolve.
A multilingual, high-quality streaming TTS (speech generation) toolkit that supports zero-shot voice cloning and low-latency generation.
GenMedia Creative Studio is a demo web application built on Vertex AI that showcases image, video, audio, and text-to-speech generation.
GPT-SoVITS is an open-source few-shot voice conversion and TTS WebUI with cross-lingual inference and production-friendly tooling.
A free, open-source, extensible offline speech-to-text desktop application that runs Whisper and Parakeet models locally.
A framework for building real-time, multimodal voice agents, with WebRTC integration and an extensible plugin ecosystem.
An open-source voice cloning and real-time speech generation toolkit that can clone a speaker from a short sample and synthesize arbitrary speech.
A local-first audio transcription and editing tool for qualitative research and journalism, built on Whisper and Pyannote.
An open-source framework for real-time voice and multimodal agents, with low-latency voice interaction and SDKs for multiple platforms.
pyvideotrans translates videos between languages and generates dubbed audio.
An open-source framework and ecosystem for real-time, multimodal conversational voice and agent applications.
A privacy-first, cross-platform audio/video transcription tool that supports fully offline operation and batch processing.
VibeVoice is a TTS framework for long-form, expressive speech synthesis, suited to research, media prototyping, and academic evaluation.
Vosk API provides offline speech recognition for Android, iOS, Raspberry Pi, and servers, with bindings for Python, Java, C#, and Node.js.
whisper.cpp is a high-performance local implementation of Whisper for speech recognition on edge devices and desktop platforms.
Image/video generation models and creative pipelines.
A node-based visual workflow builder for Stable Diffusion that lets users assemble and debug image-generation pipelines graphically.
Deep-Live-Cam is an open-source real-time face-swap and avatar tool that runs offline, aimed at creators and streamers.
Diffusers is a modular toolbox of state-of-the-art pretrained diffusion models for image, audio, and 3D generation, suitable for both inference and training.