Open Knowledge Format (OKF)

Google's open spec for storing organizational knowledge as markdown plus YAML frontmatter, so AI agents and humans read the same files.

Open Knowledge Format (OKF) is an open specification from Google Cloud for representing the metadata, context, and curated knowledge that AI agents need, as a directory of plain markdown files with YAML frontmatter. No SDK, no query language, no runtime: an engineer can cat a concept and an LLM can read the same file verbatim. Announced 12th June 2026 by the Google Cloud Data Cloud team (Sam McVeety and Amir Hormati), it builds on Andrej Karpathy's "LLM wiki" idea of letting models manage a markdown wiki. The pitch: stop every agent builder and catalog vendor reinventing how context is stored.

The problem it solves

Organizational knowledge that agents need (table schemas, metric definitions, join paths, runbooks, deprecation notices) is scattered across incompatible systems: metadata catalogs with proprietary APIs, wikis, shared drives, code comments, and people's heads. Each agent builder reassembles that context from scratch, and each catalog vendor invents its own data model, so knowledge stays locked to the tool that created it. OKF makes the knowledge portable: it survives moving between systems, tools, and organizations because it is just files.

How it works

A bundle is a directory hierarchy, one markdown file per concept:

sales/
  index.md
  tables/
    orders.md
    customers.md
  metrics/
    weekly_active_users.md

Each file carries YAML frontmatter, then free markdown for the body (schema tables, joins, notes). Normal markdown links between files turn the directory into a graph of relationships, which is richer than the parent/child structure that folders alone can express.

---
type: BigQuery Table
title: Orders
description: One row per completed customer order.
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
timestamp: 2026-05-28T14:30:00Z
---
# Schema
| Column | Type | Description |
|--------|------|-------------|
| order_id | STRING | Globally unique order identifier. |
| customer_id | STRING | FK to customers. |
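
Because a concept is just frontmatter plus markdown, reading one takes very little code. The sketch below is not part of the spec or Google's reference code: split_frontmatter is a hypothetical helper, and it assumes frontmatter made only of flat key: value lines and inline [a, b] lists, as in the example above. Anything richer needs a real YAML parser.

```python
def split_frontmatter(text):
    """Split an OKF-style document into (frontmatter dict, markdown body).

    Assumption: only flat `key: value` lines and inline `[a, b]` lists.
    This is NOT a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return {}, text  # no closing delimiter: treat as plain markdown

    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        # Split on the first colon only, so URLs and timestamps stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value

    body = "\n".join(lines[end + 1:])
    return meta, body


DOC = """---
type: BigQuery Table
title: Orders
resource: https://console.cloud.google.com/bigquery?p=acme&d=sales&t=orders
tags: [sales, revenue]
---
# Schema
"""

meta, body = split_frontmatter(DOC)
print(meta["type"], meta["tags"])
```

Everything comes back as strings (or lists of strings); the timestamp field, for instance, would still need parsing by the caller.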

Conventions worth knowing:

| Element | Role |
|---------|------|
| type | The only mandatory frontmatter field |
| title, description, resource, tags, timestamp | Standard optional fields |
| index.md | Progressive disclosure as an agent walks the hierarchy |
| log.md | Chronological change history for a concept |

The full v0.1 spec fits on a single page.
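
With only one mandatory field, a conformance check is tiny. A rough sketch, using nothing beyond the Python standard library: files_missing_type and frontmatter_keys are hypothetical names, not from the spec. Whether index.md and log.md carry frontmatter themselves is not something these notes establish, so the skip parameter leaves that decision to the caller.

```python
import tempfile
from pathlib import Path


def frontmatter_keys(text):
    """Return the top-level frontmatter keys of a markdown document."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only top-level `key: value` lines count; indented lines are nested values.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return set()  # closing delimiter never found


def files_missing_type(bundle_dir, skip=()):
    """List bundle-relative paths of .md files whose frontmatter lacks `type`."""
    root = Path(bundle_dir)
    missing = []
    for path in sorted(root.rglob("*.md")):
        if path.name in skip:
            continue
        if "type" not in frontmatter_keys(path.read_text(encoding="utf-8")):
            missing.append(path.relative_to(root).as_posix())
    return missing


# Demo on a throwaway bundle: one valid concept, one missing `type`.
with tempfile.TemporaryDirectory() as tmp:
    tables = Path(tmp) / "sales" / "tables"
    tables.mkdir(parents=True)
    (tables / "orders.md").write_text(
        "---\ntype: BigQuery Table\ntitle: Orders\n---\n# Schema\n", encoding="utf-8"
    )
    (tables / "customers.md").write_text(
        "---\ntitle: Customers\n---\n", encoding="utf-8"
    )
    report = files_missing_type(tmp)

print(report)  # ['sales/tables/customers.md']
```

A check like this fits naturally in CI, since the bundle already lives in git.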

Design principles

  • Minimally opinionated. Only type is required; the producer owns the content model.
  • Producer and consumer are independent. The format is the contract, so the tool that writes a bundle and the agent that reads it are swappable.
  • Format, not platform. Vendor-neutral. Not tied to a cloud, database, model provider, or agent framework, and it never requires a proprietary account or SDK.
  • Lives in version control. Bundles sit in git next to the code they describe and grow like code.
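
To make "the format is the contract" concrete, here is a minimal sketch of a producer: a function that renders one concept file from plain Python data. render_concept and its parameters are hypothetical, and this is not Google's enrichment agent. It also assumes field values contain no characters that YAML would require quoting.

```python
def render_concept(type_, title, description, columns, tags=()):
    """Render one OKF-style concept file as a string.

    `columns` is a list of (name, column_type, description) tuples.
    Assumption: values are plain text that is safe to emit as unquoted YAML.
    """
    front = [
        "---",
        f"type: {type_}",  # the only mandatory field
        f"title: {title}",
        f"description: {description}",
    ]
    if tags:
        front.append("tags: [" + ", ".join(tags) + "]")
    front.append("---")

    body = [
        "# Schema",
        "| Column | Type | Description |",
        "|--------|------|-------------|",
    ]
    body += [f"| {name} | {col_type} | {desc} |" for name, col_type, desc in columns]
    return "\n".join(front + body) + "\n"


doc = render_concept(
    "BigQuery Table",
    "Orders",
    "One row per completed customer order.",
    [
        ("order_id", "STRING", "Globally unique order identifier."),
        ("customer_id", "STRING", "FK to customers."),
    ],
    tags=["sales", "revenue"],
)
print(doc)
```

Any consumer that understands the file layout can read this output without knowing which tool wrote it.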

BigQuery and agents

Google shipped reference implementations rather than just a spec:

  • A producer: an enrichment agent that walks a BigQuery dataset, drafts an OKF document per table and view, then runs a second LLM pass to add citations, schemas, and join paths.
  • A consumer: a static HTML visualizer that turns any bundle into an interactive graph in one self-contained file, no backend, no data leaving the page.
  • Three ready-to-browse sample bundles (GA4 e-commerce, Stack Overflow, Bitcoin public datasets).
  • Google's Knowledge Catalog (formerly Dataplex) now ingests OKF and serves it to agents.
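
A consumer can be equally small. The sketch below is not the reference visualizer; it only shows the underlying idea of turning markdown links between concept files into a graph. link_graph is a hypothetical helper, the sample file contents are invented for illustration, and the regex handles only simple inline [text](target) links.

```python
import posixpath
import re

# Inline markdown links: [text](target). Reference-style links are ignored.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def link_graph(files):
    """Map each file to the sorted list of bundle files it links to.

    `files` maps bundle-relative POSIX paths to markdown text.
    """
    graph = {}
    for path, text in files.items():
        targets = set()
        for href in LINK.findall(text):
            href = href.split("#", 1)[0]  # drop any heading anchor
            if "://" in href or not href.endswith(".md"):
                continue  # skip external links and non-concept targets
            # Resolve the link relative to the directory of the linking file.
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(path), href)
            )
            if resolved in files:
                targets.add(resolved)
        graph[path] = sorted(targets)
    return graph


# Invented sample content, laid out like the bundle shown earlier.
bundle = {
    "sales/tables/orders.md": (
        "FK to [customers](customers.md). "
        "Feeds [WAU](../metrics/weekly_active_users.md)."
    ),
    "sales/tables/customers.md": (
        "See [orders](orders.md) and [docs](https://example.com/a.md)."
    ),
    "sales/metrics/weekly_active_users.md": "No links here.",
}

graph = link_graph(bundle)
for source, targets in graph.items():
    print(source, "->", targets)
```

The resulting adjacency map is enough to feed a graph renderer or to let an agent follow relationships that the folder layout does not express.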

OKF v0.1 is positioned as a starting point, versioned for backward-compatible growth, open to outside implementations and contributions.

Why it matters for me

This is the pattern I already live. My notes site is markdown files with YAML frontmatter, authored and maintained partly by agents, kept in git. OKF formalizes exactly that substrate for agent-readable knowledge, which makes it a clean target if I ever want my own KBs (sales intelligence, product docs, client context) consumed by agents without a translation layer. The "same file is human-readable and agent-parseable" property is the whole point, and it is cheap to adopt because it is just files and conventions. Worth watching as the spec matures and other vendors adopt it. Related: Karpathy Claude Code Skills.

Further Reading

The announcement:

The spec and reference code: