Functional Mapping of the Safety Tooling Landscape
A living map of the tools that keep platforms safer.
Existing frameworks for safety tooling lack the granularity required to map a rapidly diversifying ecosystem of internal and user-facing interventions. This workstream develops a comprehensive map, categorizing tools by functional utility, impact on user experience, and role in the safety lifecycle, from preventative design to reactive enforcement — and publishes it as a living, browsable resource.
A living map of the safety stack.
A growing catalogue of trust & safety tools — open-source and commercial, internal and user-facing. Six neighbourhoods on the map group the tools by what they do; switch to the card view to search or filter by topic.
The neighbourhood names are an evocative consolidation of the committee's preliminary topic buckets. A formal mapping by functional utility, user impact, and place in the safety lifecycle is being developed in parallel.
-
Alice
Alice (FKA ActiveFence)
No description provided yet.
- AI for Safety
-
Altitude
Jigsaw
Web UI and hash matching for violent extremism and terrorism content
- Hash matching
-
Checkstep
Checkstep
No description provided yet.
- Review
-
No description provided yet.
- Content moderation
-
Community Sift
Microsoft
Community Sift is an AI-powered content moderation platform that combines artificial intelligence with human expertise. It is trusted by companies and communities of all sizes to classify, filter, and escalate user-generated content in real time. By using Community Sift, businesses can enhance online safety, improve user experiences, and focus on growth and innovation.
- Detection
- Review
-
Content Safety API
Google
Uses machine learning to detect novel CSAM, nudity, and sexually explicit content in images and videos. Free service, but requires registration. Not open source itself, but can be used via Coop (https://roostorg.github.io/coop/SIGNALS.html#content-safety-api-by-google), which is open source.
- Classification
-
Coop
ROOST
Scaled review tool
- Review
-
CoPE
Zentropi
small language model trained for accurate, fast, steerable content classification based on developer-defined content policies
- Classification
-
Detoxify
Unitary AI
detects and mitigates generalized toxic language (including hate speech, harassment, bullying) in text
- Classification
-
gpt-oss-safeguard
OpenAI
open-weight reasoning model to classify text content based on provided safety policies
- Classification
-
Granite Guardian
IBM Research
an input-output guardrail for detecting harms in a variety of use cases (general harm, RAG settings, agentic workflows, etc.)
- AI for Safety
-
Guardrails AI
Guardrails AI
Python framework that helps build safe AI applications by checking input and output for predefined risks
- AI for Safety
-
Hasher-Matcher-Actioner (HMA)
Meta / ROOST
Hashing algorithm, matching function, and ability to hook into actions
- Hash matching
-
Hasher-Matcher-Actioner (CLIP demo)
Individual - Juan Mrad
HMA extension for CLIP as reference for adding other format extensions
- Hash matching
-
Hive Classifiers
Hive
No description provided yet.
- Classification
-
hma-matrix
Matrix.org Foundation
Matrix-specific extensions to HMA for (primarily) the Matrix ecosystem
- Hash matching
-
Implio by Besedo
Besedo
A moderation tool with three components. AI automation: machine learning models, trained on billions of content items, that account for nuance, context, and patterns and make real-time decisions at scale. Rule-based filters: simple, configurable filters that catch obvious violations quickly and reliably, suited to spam, banned keywords, or clear-cut policies. Human expertise: multilingual, compliance-trained moderators who step in when context, culture, or judgment is required, resolve edge cases, and continuously retrain the AI.
- Review
- Enforcement
-
Kanana Safeguard
Kakao
harmful content detection model based on Kanana 8B
- AI for Safety
-
Lasso Moderation
Lasso Moderation
A content moderation solution that's not just an API. Lasso brings the power of AI to protect your brand, tackling 99% of content moderation tasks. Our platform also offers an extensive moderation dashboard for that crucial 1%, where humans can efficiently and effectively moderate at scale.
- Review
- Enforcement
-
Lattice Extract
Adobe
Grid and lattice detection to guard against false positives (FP) in hash matching
- Hash matching
-
Llama Guard
Meta
AI-powered content moderation model to detect harm in text-based interactions
- AI for Safety
-
Llama Prompt Guard 2
Meta
Detects prompt injection and jailbreaking attacks in LLM inputs
- AI for Safety
-
MediaModeration (Wiki Extension)
Wikimedia
CSAM hash matching for Wikimedia
- Hash matching
-
No description provided yet.
- Classification
-
No description provided yet.
- Automated T&S
-
Nima by Tremau
Tremau
Nima is an AI-driven Trust & Safety platform that protects users through efficient automated and human moderation, with a single API, an AI marketplace, and a policy-centric approach. It centers compliance tracking and reporting as a core value proposition.
-
NSFW filtering
Individual - Navendu Pottekkat
browser extension to block explicit images from online platforms; user-facing
- Classification
-
NSFW Keras Model
Individual - Gant Laborde
convolutional neural network (CNN) based ML model for detecting explicit images
- Classification
-
OSmod
Jigsaw
toolkit of machine learning (ML) tools, models, and APIs that platforms can use to moderate content
- Classification
-
Osprey
ROOST
Rules engine and investigation UI
- Investigation
-
PDQ
Meta
Perceptual hash algorithm for images
- Hash matching
-
Perception
Thorn
Provides a common wrapper around existing, popular perceptual hashes (such as those implemented by ImageHash)
- Hash matching
-
Perspective API
Jigsaw
machine learning-powered tool that helps platforms detect and assess the toxicity of online conversations
- Classification
-
Private Detector
Bumble
pretrained model for detecting lewd images
- Classification
-
Purple Llama
Meta
set of tools to assess and improve LLM security. Includes Llama Guard, CyberSec Eval, and Code Shield
- AI for Safety
-
No description provided yet.
- Investigation
-
Retool
Retool
No description provided yet.
- Review
-
Risk Atlas Nexus
IBM Research
knowledge-graph toolkit that maps AI risk taxonomies (IBM AI Risk Atlas, IBM Granite Guardian, MIT AI Risk Repository, NIST AI RMF GenAI Profile, AIR 2024, AILuminate Benchmark, Credo Unified Control Framework, OWASP Top 10 for LLM Apps) to evaluations, mitigations, and controls, supporting the generation of structured governance workflows
- AI for Safety
-
Roblox Guard 1.0
Roblox
LLM that helps safeguard unlimited text generation on Roblox
- AI for Safety
-
machine learning model that detects and moderates harmful content in real-time voice chat on Roblox; focuses on spoken language detection
- Classification
-
RocketChat CSAM
Center for Online Safety and Liberty
CSAM hash matching for RocketChat
- Hash matching
-
Safer by Thorn
Thorn
No description provided yet.
- Classification
- Review
-
No description provided yet.
- Classification
-
Sentinel
Roblox
Python library designed specifically for real-time detection of extremely rare classes of text by using contrastive learning principles
- Classification
-
ShieldGemma
Google DeepMind
AI safety toolkit designed to help detect and mitigate harmful or unsafe outputs in LLM applications
- AI for Safety
-
TMK
Meta
Visual similarity match for videos
- Hash matching
-
Toxic Prompt RoBERTa
Intel
BERT-based model for detecting toxic content in prompts to language models
- Classification
-
No description provided yet.
- Automated T&S
-
TrustedExecBench
OpenGuardrails
Security gateway providing a transparent reverse proxy for OpenAI APIs with integrated safety protection
- AI for Safety
-
No description provided yet.
- Automated T&S
-
No description provided yet.
- Investigation
-
VPDQ
Meta
Visual similarity match for videos using the PDQ algorithm
- Hash matching
The proposed dimensions of the formal map.
As the committee firms up the map, each tool will be re-classified along three dimensions. The neighbourhoods on the map are a first cut; these are the more granular structures the committee plans to publish alongside the dataset.
Functional utility
What the tool actually does — classification, hash matching, review workflow, identity assurance, transparency reporting, and so on.
Lifecycle position
Where in the safety lifecycle the tool acts — from preventative design, through detection, into responsive enforcement, and on to restorative measures.
Impact on user experience
Whether the tool is internal (used by reviewers, engineers, or analysts) or user-facing (felt directly by people on the platform), and how visibly it shapes their experience.
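One way to picture the three dimensions is as fields on a single catalogue record. The Python sketch below is illustrative only, not the committee's schema: `ToolRecord`, `Lifecycle`, `Audience`, and `filter_tools` are hypothetical names. The lifecycle stages follow the wording above, and the lifecycle and audience values assigned to the two sample tools are placeholder assumptions (apart from "NSFW filtering", which the catalogue itself describes as user-facing).

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    """Stages of the safety lifecycle, named as in the text above."""
    PREVENTATIVE_DESIGN = "preventative design"
    DETECTION = "detection"
    RESPONSIVE_ENFORCEMENT = "responsive enforcement"
    RESTORATIVE_MEASURES = "restorative measures"


class Audience(Enum):
    """Impact on user experience: who encounters the tool."""
    INTERNAL = "internal"
    USER_FACING = "user-facing"


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical record shape: one tool, three dimensions."""
    name: str
    maintainer: str
    functions: frozenset[str]         # functional utility, e.g. "Hash matching"
    lifecycle: frozenset[Lifecycle]   # a tool may act at more than one stage
    audience: Audience


def filter_tools(tools, *, function=None, stage=None, audience=None):
    """Return the tools that satisfy every criterion that was supplied."""
    matches = []
    for tool in tools:
        if function is not None and function not in tool.functions:
            continue
        if stage is not None and stage not in tool.lifecycle:
            continue
        if audience is not None and tool.audience is not audience:
            continue
        matches.append(tool)
    return matches


# Sample entries; the lifecycle values and PDQ's audience are assumptions
# made for illustration, not classifications published by the committee.
CATALOGUE = [
    ToolRecord(
        name="PDQ",
        maintainer="Meta",
        functions=frozenset({"Hash matching"}),
        lifecycle=frozenset({Lifecycle.DETECTION}),
        audience=Audience.INTERNAL,
    ),
    ToolRecord(
        name="NSFW filtering",
        maintainer="Individual - Navendu Pottekkat",
        functions=frozenset({"Classification"}),
        lifecycle=frozenset({Lifecycle.DETECTION}),
        audience=Audience.USER_FACING,
    ),
]
```

Calling `filter_tools(CATALOGUE, audience=Audience.USER_FACING)` would return only the browser extension, while `filter_tools(CATALOGUE, stage=Lifecycle.DETECTION)` would return both entries. Holding functions and lifecycle stages as sets lets one tool sit in several categories, much as some catalogue entries above already carry more than one topic tag.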