Models & Modalities

Foundation model families and model releases.

Keras

Keras is a high-level deep learning API for fast experimentation with neural networks. Keras 3 runs on top of JAX, TensorFlow, or PyTorch and provides an intuitive interface for building and training models.

PaddlePaddle

An open-source deep learning platform developed by Baidu, providing a comprehensive ecosystem for machine learning and deep learning research and production.

Tooling for model conversion, quantization, and packaging.

Detectron2

Facebook AI Research's next-generation object detection and segmentation library, offering state-of-the-art algorithms and a rich model zoo.

Hugging Face Transformers

The model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal domains, for both inference and training.

Cross-modal modeling for text, image, audio, and video.

Chandra

Chandra is a high‑accuracy OCR model that converts images and PDFs into structured outputs with layout information.

ElevenLabs UI

ElevenLabs UI is a component library and custom registry built on top of shadcn/ui to help build multimodal agent interfaces faster.

LeRobot

An open-source robotics library providing datasets, pretrained policies and simulation environments for reproducible robot learning and deployment.

LightX2V

LightX2V is a lightweight inference framework for video generation, covering text-to-video and image-to-video models with optimizations aimed at fast, memory-efficient deployment.

olmOCR

A toolkit for linearizing PDFs and image-based documents into readable plain text and Markdown, aimed at LLM dataset creation and large-scale document processing.

PaddleOCR

PaddleOCR is a lightweight, high-performance open-source OCR toolkit that supports 100+ languages and converts images or PDFs into structured data.

vLLM-Omni

A framework for high-performance, cost-efficient inference and serving of omni-modality models across text, image, video, and audio.

Speech-to-text, text-to-speech, and audio understanding.

AutoSubs

Generate accurate, editable subtitles locally or integrated with DaVinci Resolve.

CosyVoice

Multilingual, high-quality streaming TTS and speech generation toolkit supporting zero-shot voice cloning and low-latency generation.

Fish Speech

State-of-the-art open source text-to-speech system with voice cloning capabilities, supporting multiple languages with natural-sounding output.

GenMedia Creative Studio

GenMedia Creative Studio is a demo web application built on Vertex AI showcasing image, video, audio, and text-to-speech generation capabilities.

GPT-SoVITS

GPT-SoVITS is an open-source few-shot voice conversion and TTS WebUI with cross-lingual inference and production-friendly tooling.

Handy

A free, open-source, extensible offline speech-to-text desktop application that runs Whisper and Parakeet models locally.

MockingBird

An open-source voice cloning and real-time speech generation toolkit that can clone a speaker from a short sample and synthesize arbitrary speech.

noScribe

A local-first audio transcription and editing tool for qualitative research and journalism, built on Whisper and Pyannote.

OpenAI Whisper

Robust speech recognition via large-scale weak supervision, supporting transcription in nearly 100 languages and translation into English with state-of-the-art accuracy.
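Besides the full transcript, the openai-whisper Python package's `transcribe()` result includes a `segments` list whose entries carry `start` and `end` times (in seconds) and the segment `text`. Timestamped segments like these are commonly rendered as subtitle files. The sketch below is a minimal, stdlib-only illustration that assumes SRT as the output format; the helper names `srt_timestamp` and `segments_to_srt` are hypothetical, and hand-written segments stand in for real Whisper output.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments (dicts with 'start', 'end', 'text') as SRT.

    Hypothetical helper for illustration; not part of any library's API.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        # Each SRT cue: a 1-based index, a time range, then the caption text.
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)


# Hand-written segments standing in for real transcription output.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 61.042, "text": " Today we look at speech models."},
]
print(segments_to_srt(segments))
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds keeps floating-point drift from producing values like `999` milliseconds where a clean boundary was intended.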