Models & Modalities

Foundation models, model toolkits, and multimodal capabilities.

49 Projects, 4 Subcategories, 25 Tags

Tracked

Foundation model families and model releases.

AgentScope

A framework for building LLM-empowered multi-agent applications more easily.

Agno

A unified platform for intelligent agents that supports multimodal and multi-agent systems, integrating over 23 model providers and more than 20 vector stores with prioritized routing design.

Cherry Studio

AI conversation client with multi-provider integration. Focused on privacy and security with all data stored locally.

Data Prep Kit

Data Prep Kit accelerates unstructured data preparation for LLM applications.

Eino

Eino is a Go-centered framework for building LLM applications, focusing on composability, stream processing, and production readiness.

Free LLM API resources

A community-maintained list of LLM providers and gateways offering free or trial API access.

GenAI Agents

A resource for developing generative AI agents, with 45+ worked cases covering diverse applications.

GuideLLM

GuideLLM is a tool for evaluating and benchmarking LLM deployments, simulating realistic inference workloads to measure performance such as throughput and latency.

k8sgpt

An AI tool that provides diagnostic and analysis capabilities for Kubernetes, using LLMs to locate and explain cluster issues.

Keras

Keras is a high-level deep learning API that enables fast experimentation with neural networks. Keras 3 runs on top of JAX, TensorFlow, or PyTorch and provides an intuitive interface for building and training models.

LangExtract

A Python library that uses LLMs to extract structured information from unstructured text and provides interactive visualization for review.

Langfuse

Langfuse is an open-source platform for LLM development that supports collaboration, monitoring, and debugging of AI applications.

LLaMA Factory

A comprehensive framework for fine-tuning LLaMA and many other open LLMs, with multiple training methods, efficient algorithms, and an easy-to-use interface for both research and production environments.

Local Deep Researcher

A fully local web research and report-writing assistant that uses LLMs served locally (e.g. via Ollama or LM Studio) to iteratively search, summarize, and refine findings.

Mindcraft

A Minecraft agent framework, with multi-agent support, that integrates LLMs with Mineflayer to build programmable, collaborative bots and task suites.

MLX LM

A Python toolkit for running and fine-tuning LLMs on Apple Silicon, with support for quantization, distributed inference and Hugging Face integration.

Ollama

A local large language model runner that lets users easily run and manage a variety of open-source LLMs on their own machines.

PaddlePaddle

An open-source deep learning platform developed by Baidu, providing a comprehensive ecosystem for machine learning and deep learning research and production.

Pydantic AI

Pydantic AI is an agent framework built by the Pydantic team for building structured, production-grade AI systems with strong data validation and streamed outputs.

Cross-modal modeling for text, image, audio, and video.

Chandra

Chandra is a high-accuracy OCR model that converts images and PDFs into structured outputs with layout information.

ElevenLabs UI

ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help build multimodal agent interfaces faster.

LeRobot

An open-source robotics library providing datasets, pretrained policies and simulation environments for reproducible robot learning and deployment.

LightX2V

LightX2V is a lightweight inference framework for video generation, covering tasks such as text-to-video and image-to-video, with an emphasis on efficient inference in resource-constrained environments.

Midscene.js

A vision-language-model driven, cross-platform UI automation framework that uses screenshots for visual localization and interaction.

NeMo

NVIDIA's NeMo framework for training and fine-tuning LLMs, multimodal models, and speech models (ASR and TTS).

Next AI Draw.io

A Next.js web application that integrates AI into draw.io to support natural-language-driven diagram creation and enhancement.

olmOCR

A toolkit for linearizing PDFs and image-based documents into readable plain text and Markdown, aimed at LLM dataset creation and large-scale document processing.

Open Notebook

An open-source, privacy-focused notebook and research platform that supports multi-model integration and multimodal content management.

PaddleOCR

PaddleOCR is a lightweight, high-performance open-source OCR toolkit that supports 100+ languages and converts images or PDFs into structured data.

Pixeltable

A declarative data infrastructure for multimodal AI workloads that simplifies storage, indexing, and inference.

vLLM-Omni

A framework for high-performance, cost-efficient inference and serving of omni-modality models across text, image, video, and audio.

Speech-to-text, text-to-speech, and audio understanding.

AutoSubs

Generates accurate, editable subtitles locally, either standalone or integrated with DaVinci Resolve.

CosyVoice

Multilingual, high-quality streaming TTS / speech generation toolkit supporting zero-shot cloning and low-latency generation.

GenMedia Creative Studio

GenMedia Creative Studio is a demo web application built on Vertex AI showcasing image, video, audio, and text-to-speech generation capabilities.

GPT-SoVITS

GPT-SoVITS is an open-source few-shot voice conversion and TTS WebUI with cross-lingual inference and production-friendly tooling.

Handy

A free, open-source, extensible offline speech-to-text desktop application that runs Whisper and Parakeet models locally.

LiveKit Agents

A framework for building real-time, multimodal voice agents, integrating WebRTC and an extensible plugin ecosystem.

MockingBird

An open-source voice cloning and real-time speech generation toolkit that can clone a speaker from a short sample and synthesize arbitrary speech.

noScribe

A local-first audio transcription and editing tool for qualitative research and journalism, built on Whisper and Pyannote.

Pipecat

An open-source framework for real-time voice and multimodal agents, supporting low-latency voice interaction and multi-platform SDKs.

pyvideotrans

pyvideotrans translates videos between languages and generates dubbing audio.

TEN Framework

An open-source framework and ecosystem for real-time, multimodal conversational voice and agent applications.

Vibe

A privacy-first, cross-platform audio/video transcription tool that supports fully offline operation and batch processing.

VibeVoice

VibeVoice is a TTS framework for long-form, expressive audio synthesis, suited to research, media prototyping, and academic evaluation.

Vosk API

Vosk API provides offline speech recognition for Android, iOS, Raspberry Pi, and servers, with bindings for Python, Java, C#, and Node.js.

whisper.cpp

whisper.cpp is a high-performance local Whisper implementation for speech recognition across edge devices and desktop platforms.

Image/video generation models and creative pipelines.