AX AXLearn
An extensible deep learning library built on JAX/XLA, designed for developing, training and deploying large-scale models.
Training frameworks, fine-tuning, evaluation, observability, and optimization workflows.
Training frameworks and distributed training systems.
CL ClearML is an open-source MLOps platform providing experiment tracking, data management, pipelines and model serving.
CO Colossal-AI is an open-source solution for efficient large-scale training and inference, featuring advanced parallelism and memory management.
DE An efficient expert-parallel communication library that provides low-overhead communication primitives for large-scale distributed training.
DE A high-performance library for training and inference that dramatically speeds up large-scale deep learning while reducing cost.
DL DLRover is an automatic distributed deep learning system that provides elastic scheduling, flash checkpointing and auto-scaling to simplify large-scale model training on Kubernetes and Ray.
EA EasyR1 is an efficient, scalable RL training framework for multimodal models, based on veRL and optimized for large-model training.
GY An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym).
JA High-performance Python library for accelerator-oriented array computation and composable program transformations.
LI A fast, distributed, high-performance gradient boosting framework for decision tree algorithms, widely used for ranking, classification, and large-scale ML tasks.
LI A high-performance, engineering-focused LLM toolkit that provides end-to-end recipes and practical tutorials for training and deploying large models.
MA A high-performance, highly scalable open-source LLM library and reference implementation built with Python and JAX, targeting Google Cloud TPUs and GPUs.
ME Reference implementation from NVIDIA for large-scale model training and inference with distributed optimizations.
MO An AI toolkit for healthcare imaging focused on deep learning workflows for medical image processing and analysis.
NA A minimal, fast repository for training and fine-tuning medium-sized GPT models, suitable for teaching and experiments.
NE NVIDIA's NeMo framework for speech, TTS, multimodal and LLM training & fine-tuning.
NE NeMo RL is a scalable post-training reinforcement learning library for large models, supporting high-performance distributed training and multiple backends.
OP An end-to-end framework for creating, deploying and using isolated execution environments, aimed at agentic RL training and environment development.
PY An open-source deep learning framework for fast, flexible research and production, featuring dynamic computation graphs and strong GPU acceleration.
PY PyTorch Lightning is an open-source framework that streamlines PyTorch training, enabling efficient model development, training, and deployment.
RL RLinf is a flexible and scalable open-source RL infrastructure designed for embodied and agentic AI, supporting PPO, GRPO, SAC and more, and scaling to large GPU clusters.
RO Reinforcement Learning Optimization platform for large-scale training and pipelines.
SK A modular full-stack reinforcement learning (RL) library for large language models (LLMs), designed for long-horizon, real-world tasks.
TE Google's open-source end-to-end machine learning platform for building and training deep learning models.
TO A PyTorch-native platform for generative model pretraining and distributed optimization.
VE A reinforcement learning training framework for large models, designed for scalable RLHF and agent training.
XL xLLM is an open-source framework for vision-language models, providing tools and documentation for training and inference.
SFT, RLHF, preference optimization, and alignment.
AR A fully asynchronous reinforcement learning system for large reasoning and agentic models that emphasizes scalability and reproducibility.
AX A free and open-source LLM post-training and fine-tuning framework that supports multiple models, training methods, and distributed optimizations.
HE Heretic is a fully automated tool that removes censorship (aka "safety alignment") from transformer-based language models without expensive post-training. It combines an advanced implementation of directional ablation, also known as "abliteration," with a TPE-based parameter optimizer powered by Optuna to automatically find high-quality ablation parameters by co-minimizing refusals and KL divergence from the original model.
LL A comprehensive framework for fine-tuning LLaMA models with multiple training methods, efficient algorithms, and an easy-to-use interface for both research and production environments.
LM An extensible, convenient, and efficient toolbox for fine-tuning and inference of large foundation models.
ML A local-first toolkit for inference and fine-tuning of vision-language and omni models using MLX, optimized for macOS and general hardware.
MS SWIFT from ModelScope: a scalable, lightweight infrastructure for fine-tuning, evaluating and deploying large and multimodal models, with training, quantization and inference acceleration support.
OP An easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM and DeepSpeed, supporting distributed and hybrid-engine training.
PE State-of-the-art parameter-efficient fine-tuning methods for large language models, enabling adapter-based training with minimal GPU resources.
TO A PyTorch-native post-training and fine-tuning toolkit providing reusable recipes, optimizations, and quantization support for LLM training and evaluation.
TR Transformer Lab is an open-source app for downloading and fine-tuning large models locally or in the cloud, with multi-engine support.
TR TRL is an open-source toolkit from Hugging Face for reinforcement learning training on transformer models.
TU Tunix is a JAX-native post-training library for LLMs providing efficient fine-tuning, RL training, and distillation tools.
UN High-performance toolkit for fine-tuning and reinforcement learning of large models, with memory-efficient kernels and wide model support.
Experiment tracking, model ops, and training pipelines.
GO Google Research aggregates open-source research code and datasets from Google, covering machine learning, vision, NLP and other research areas.
SK SkyPilot is an open-source tool to automate distributed training and inference across cloud and on-premises clusters, simplifying resource provisioning and environment setup.
SL Slurm is an open-source cluster resource management and job scheduling system that is simple, scalable, portable, fault-tolerant, and interconnect agnostic, widely used in high-performance computing and AI training clusters.
SW SwanLab is an open-source, modern training tracking and visualization tool that supports cloud and self-hosted deployment.
WE A machine learning development and observability platform for tracking experiments, managing models and artifacts, and visualizing results across the ML lifecycle.
ZE A unified MLOps framework to develop, evaluate and deploy everything from classical models to multi-agent AI systems.
Evaluation frameworks, benchmark suites, and datasets.
AG Agenta is an open-source LLMOps platform that combines prompt management, evaluation, and observability to help teams ship reliable LLM applications faster.
DE DeepEval is an open-source LLM evaluation framework that provides modular metrics and tooling for testing LLM systems and RAG pipelines.
DE An open-source framework for red-teaming large language models and LLM systems, focused on security and robustness evaluation.
DI A tool for automated data quality evaluation that combines rule-based and model-based assessments.
EA An easy-to-use knowledge editing framework providing multiple editing methods, evaluation metrics and datasets; supports LLMs and some multimodal editing scenarios.
EV An open-source framework for evaluating, testing, and monitoring ML and LLM systems from experiments to production.
FU A static analysis tool that assesses codebase 'legacy-mess' and generates readable Markdown reports.
GI An open-source evaluation and testing framework to detect performance, bias, and security issues in AI systems.
HE Holistic Evaluation of Language Models (HELM) from Stanford CRFM: an open framework for reproducible, transparent model evaluation and benchmark management.
IN Inspector is a visual testing tool for MCP (Model Context Protocol) servers that helps developers validate and visualize server behavior and responses.
KE A developer-centric API and integration testing tool that auto-generates tests and data mocks from real traffic, supporting record-and-replay of API calls, database operations, and streaming events.
LI A lightweight toolkit from Hugging Face for fast, flexible LLM evaluation across multiple backends.
LI LiveBench is a contamination-aware, objective LLM benchmark suite that provides reproducible question sets, automatic scoring, and an online leaderboard.
LM The Language Model Evaluation Harness is a framework for large-scale, reproducible evaluation of generative language models across many tasks and datasets.
OP A one-stop platform for evaluating large models, providing benchmarks, evaluation toolkits and leaderboards to reproduce and compare model capabilities.
OP OpenLIT is an open-source platform for AI engineering that provides LLM observability, prompt management, evaluations and guardrails.
OP Opik is an open-source LLM evaluation and observability platform that helps teams build, evaluate and optimize LLM applications.
PE Petri is an alignment auditing agent designed to quickly explore alignment hypotheses and help researchers automate evaluation workflows.
PR Promptfoo is a developer-first, local LLM testing and red-teaming tool for automated evaluations, vulnerability scanning, and CI integration.
RA Ragas is an open-source toolkit for evaluating and optimizing LLM applications, offering objective metrics, test data generation, and production feedback loops.
RE ReLE (chinese-llm-benchmark) is a continuously updated Chinese LLM evaluation and leaderboard project covering education, medical, finance, legal, reasoning and other capability dimensions.
SH Generates interactive visual reports to explain machine learning model predictions for stakeholders.
Tracing, logging, and runtime observability.
HE Helicone is an open-source LLM observability and analytics platform that captures requests, traces and sessions to help developers debug, evaluate and optimize model usage.
LA Langfuse is an open-source platform for LLM development that supports collaboration, monitoring, and debugging of AI applications.
OP An OpenTelemetry-inspired observability toolkit for LLM/AI, providing request tracing and metrics aggregation for diagnostics and monitoring.
PH Phoenix is an open-source AI observability platform from Arize for tracing, evaluating, and debugging LLM applications.
PO Polyaxon is an MLOps platform for managing, training and monitoring large-scale machine learning workloads.
Prompt management, versioning, and quality tooling.
CO A tool that converts codebases into a single LLM prompt for code analysis, generation, and automation workflows.
GU GuideLLM offers tooling for guiding, interpreting, and controlling large language models (LLMs), enabling better controllability in interactive applications.
PR An open-source AI-powered code review and PR assistant that runs locally, in CI, or self-hosted; supports multi-platform integrations and customizable prompts.
Safety filters, guardrails, and risk controls.
AN 754 structured cybersecurity skills for AI agents mapped to 5 frameworks including MITRE ATT&CK, NIST CSF 2.0, and MITRE ATLAS.
MC A tool to scan MCP servers and tools for potential security issues, using multi-engine analysis and customizable reporting.
SK Security scanner for AI agent skills by NVIDIA that detects vulnerabilities, malicious patterns, and security risks across 64 vulnerability patterns in 16 categories.
Compiler optimization, autotuning, and simulation.
NE A GPU-accelerated physics simulation engine built on NVIDIA Warp, targeting robotics and simulation research.