The vocabulary of the AI-native data layer for the real-world economy — from the SEC filings we resolve to the agent protocols that consume them. 24 terms, each defined plainly and linked to where the data is live.
The filings and identifiers behind the data we resolve — the trapped, public-but-messy market data.
A 13F filing is a quarterly disclosure that institutional investment managers holding at least $100M in U.S.-listed (Section 13(f)) securities must file with the SEC within 45 days of each quarter end, listing their long equity positions.
AUM (assets under management) is the total market value of the investments a firm manages on behalf of its clients. It is a core measure of a manager's size and, for many filings, a regulatory threshold.
A CIK (Central Index Key) is the unique number the SEC assigns to every filer in EDGAR — each company, fund, or individual — used to retrieve all of that filer's submissions.
A CUSIP is a nine-character alphanumeric code that uniquely identifies a North American security (a specific stock or bond issue). The first six characters identify the issuer, the next two the issue, and the last is a check digit.
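The check digit can be recomputed from the first eight characters, which is a cheap way to catch a mistyped or truncated CUSIP. The sketch below assumes the standard "modulus 10 double-add-double" scheme (digits keep their value, letters map to 10 through 35, and every second character is doubled before its digits are summed). The function names are illustrative only.

```python
def cusip_check_digit(base: str) -> int:
    """Compute the check digit for the 8-character issuer + issue portion."""
    if len(base) != 8:
        raise ValueError("expected exactly 8 characters")
    total = 0
    for position, char in enumerate(base.upper(), start=1):
        if char in "0123456789":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char in "*@#":
            value = {"*": 36, "@": 37, "#": 38}[char]
        else:
            raise ValueError(f"invalid CUSIP character: {char!r}")
        if position % 2 == 0:
            value *= 2  # double every second character
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10


def is_valid_cusip(cusip: str) -> bool:
    """Return True if the string is 9 characters and its check digit matches."""
    cusip = cusip.strip().upper()
    if len(cusip) != 9 or cusip[8] not in "0123456789":
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[8])
    except ValueError:
        return False
```

A passing check only shows the code is well-formed. It does not confirm that the security exists.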
EDGAR (Electronic Data Gathering, Analysis, and Retrieval) is the SEC's public system through which U.S. public companies and investment managers file disclosures — including 13F, Form 4, 13D/G, and N-PORT — all freely available.
Form 4 is an SEC filing in which a company's insiders — directors, officers, and 10%+ owners — report changes in their holdings of the company's stock, due within two business days of the transaction.
Form N-PORT is the monthly portfolio-holdings report that U.S. registered funds, such as mutual funds and ETFs, file with the SEC. The report for the third month of each quarter is later released publicly, giving a detailed look inside fund portfolios.
Legal insider trading is the buying or selling of a company's stock by its own directors, officers, and 10%+ owners, openly disclosed on SEC Form 4 — distinct from illegal trading on material non-public information.
Point-in-time data captures a value exactly as it was known on a given historical date, with no later restatements applied — essential for honest backtesting because it prevents look-ahead bias.
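One way to picture a point-in-time lookup is an "as of" query over a revision history: given a date, return the value that was known then and ignore anything published later. The revision history and the `value_as_of` helper below are hypothetical and only illustrate the idea.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical revision history for one datapoint, sorted by the date
# each value became known: (known_on, value).
revisions = [
    (date(2024, 2, 14), 100.0),  # originally reported value
    (date(2024, 5, 15), 104.0),  # later restatement
]


def value_as_of(history, as_of):
    """Return the latest value known on or before `as_of`, or None if none was known yet."""
    known_dates = [known_on for known_on, _ in history]
    idx = bisect_right(known_dates, as_of)
    return history[idx - 1][1] if idx else None


# A backtest running "on" 2024-03-01 sees the original value, not the restatement.
print(value_as_of(revisions, date(2024, 3, 1)))
```

Keying each value by the date it became known, rather than the period it describes, is what keeps later restatements out of earlier simulated dates.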
Schedule 13D is the SEC filing an investor must submit within five business days of acquiring more than 5% of a class of a public company's voting shares with intent to influence or control it. It discloses the stake, the investor's intent, and the source of funding.
Schedule 13G is a short-form beneficial-ownership report for investors who cross 5% of a company without intent to influence control — used by index funds and other passive holders, with lighter deadlines than a 13D.
An accession number is the unique identifier EDGAR assigns to each individual submission — formatted like 0001193125-24-000123 — letting you cite and retrieve one exact filing.
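Because the format is fixed, an accession number is easy to validate and split before it is used in a lookup. The sketch below assumes the three segments are a 10-digit submitter ID, a two-digit year, and a six-digit sequence number, and that some EDGAR paths use the same digits with the dashes removed. The type and function names are illustrative.

```python
import re
from typing import NamedTuple

# Pattern for the dashed form, e.g. 0001193125-24-000123.
ACCESSION_RE = re.compile(r"(\d{10})-(\d{2})-(\d{6})")


class Accession(NamedTuple):
    submitter_id: str  # assumed: 10-digit ID of the submitting filer or agent
    year: str          # assumed: two-digit filing year
    sequence: str      # assumed: six-digit sequence within that year

    @property
    def undashed(self) -> str:
        """The same 18 digits with the dashes removed."""
        return f"{self.submitter_id}{self.year}{self.sequence}"


def parse_accession(text: str) -> Accession:
    """Split a dashed accession number into its parts, or raise ValueError."""
    match = ACCESSION_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid accession number: {text!r}")
    return Accession(*match.groups())
```

The segments stay as strings so that leading zeros are preserved.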
How that data is consumed: agents, the Model Context Protocol, and the grounding that keeps answers honest.
An AI agent is a system built around a language model that plans and takes actions through tools in a loop — observing results and deciding the next step — to accomplish a goal, rather than producing a single reply.
A bearer token (commonly an API key) is a secret credential sent in an HTTP request's Authorization header — as `Authorization: Bearer <key>` — to authenticate the caller and authorize access to an API or MCP server.
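As a minimal sketch, the header can be attached to a request with Python's standard library. The URL and key below are placeholders rather than real endpoints or credentials, and the request is only built here, not sent.

```python
import urllib.request


def build_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that carries the key as a bearer token."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


# Placeholder values: substitute a real endpoint and load the key from a
# secret store or environment variable rather than hard-coding it.
request = build_request("https://api.example.com/v1/filings", "YOUR_API_KEY")
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would send it. The key should only travel over HTTPS, since anyone holding it can act as the caller.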
Data provenance is the recorded origin and lineage of a datapoint — where it came from, when it was captured, and how it was transformed — so any value can be traced back to its primary source and verified.
Entity resolution is the process of determining when different records refer to the same real-world entity — for example, linking a fund's many name variants, legal entities, and CIKs to one identity — so data can be joined reliably.
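A very small sketch of the name-variant part of the problem: normalize each name to a comparison key, then group the records that share a key. Real resolution usually combines many more signals, such as identifiers, addresses, and filing history. The suffix list and sample records below are invented for illustration.

```python
import re
from collections import defaultdict

# Assumed, deliberately short list of legal-form suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "llc", "lp", "ltd", "corp", "co"}


def name_key(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    cleaned = name.lower().replace(".", "")        # "L.P." -> "lp"
    tokens = re.sub(r"[^a-z0-9 ]", " ", cleaned).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_records(records):
    """Group records whose names normalize to the same key."""
    groups = defaultdict(list)
    for record in records:
        groups[name_key(record["name"])].append(record)
    return dict(groups)


# Hypothetical records for the same manager under three spellings.
records = [
    {"name": "Example Capital Management, L.P."},
    {"name": "EXAMPLE CAPITAL MANAGEMENT LP"},
    {"name": "Example Capital Management"},
    {"name": "Other Fund Inc."},
]
groups = group_records(records)
print({key: len(members) for key, members in groups.items()})
```

Name matching alone can merge distinct entities that happen to share a name, which is why a stable identifier such as a CIK is the stronger join key when it is available.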
Grounding is constraining a language model's output to verifiable source data — retrieved documents, tool results, or cited records — rather than its internal memory, so claims can be traced back and checked.
A hallucination is when an AI model produces fluent, confident output that is factually wrong or unsupported by any real source — a particular danger for financial data, where an invented number looks just as authoritative as a real one.
An MCP server is a program that exposes tools, resources, and prompts over the Model Context Protocol, so that an AI client like Claude can discover and call them — for example, to fetch live market or filing data.
The Model Context Protocol (MCP) is an open standard, introduced by Anthropic in late 2024, that lets AI applications connect to external tools and data through one uniform client–server interface — so any compatible model can use any MCP server.
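On the wire, MCP messages follow JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below builds such a request as a plain dictionary. The tool name `get_filing` and its argument are hypothetical, since the tools a given server exposes are discovered at runtime.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per request


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, for illustration only.
message = tools_call_request(
    "get_filing", {"accession_number": "0001193125-24-000123"}
)
print(json.dumps(message, indent=2))
```

In practice an MCP client library handles this framing, along with the initialization handshake and the `tools/list` discovery call that precede it.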
A REST API is a conventional web interface where a client makes HTTP requests (GET, POST, …) to resource URLs and gets back data, usually JSON. It is the standard way software talks to a service — and the complement to an MCP server.
Retrieval-augmented generation (RAG) is a technique where an AI system fetches relevant documents or data at query time and places them in the model's context, so its answer is grounded in real sources rather than parametric memory.
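The retrieve-then-prompt flow can be sketched in a few lines. Here simple keyword overlap stands in for the embedding or search index a real system would use, and the documents and function names are made up for illustration. The output is the prompt that would be sent to a model, and no model is called.

```python
def score(query: str, document: str) -> int:
    """Count the distinct lowercase words shared by the query and the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Return the k documents with the highest keyword overlap."""
    return sorted(documents, key=lambda doc: score(query, doc), reverse=True)[:k]


def build_prompt(query: str, documents: list, k: int = 2) -> str:
    """Place the retrieved documents in the prompt ahead of the question."""
    sources = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(retrieve(query, documents, k), start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )


# Hypothetical mini-corpus.
corpus = [
    "Form 4 reports insider transactions",
    "A CUSIP identifies a security",
    "13F lists institutional holdings",
]
print(build_prompt("what does form 4 report", corpus, k=1))
```

Numbering the sources in the prompt gives the model something concrete to cite, which makes the answer easier to check afterwards.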
Structured output is forcing a language model to return data that conforms to a defined schema — typically JSON with specified fields and types — so the result can be parsed and used by software reliably instead of as free text.
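On the consuming side, the schema is only useful if the reply is actually checked against it. A minimal sketch using the standard library follows, with a hypothetical three-field schema. A production system would more likely use a full JSON Schema validator or a typed model library.

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"ticker": str, "shares": int, "as_of": str}


def parse_structured(raw: str) -> dict:
    """Parse a model reply and verify it has exactly the expected fields and types."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if data.keys() != SCHEMA.keys():
        raise ValueError(f"unexpected or missing fields: {sorted(data.keys() ^ SCHEMA.keys())}")
    for field, expected in SCHEMA.items():
        # Compare exact types so that True/False is not accepted as an int.
        if type(data[field]) is not expected:
            raise ValueError(f"{field!r} should be of type {expected.__name__}")
    return data


reply = '{"ticker": "XYZ", "shares": 100, "as_of": "2024-03-31"}'
print(parse_structured(reply))
```

Rejecting a malformed reply outright, rather than repairing it, keeps a bad value from flowing silently into downstream calculations.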
Tool calling (also "function calling") is when a language model, instead of answering directly, emits a structured request to invoke a named tool with arguments; the host runs it and feeds the result back to the model.
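The host side of that exchange can be sketched as a small dispatcher: look up the named tool, run it with the supplied arguments, and serialize the result for the model. The request shape, the `lookup_cik` tool, and its data are all hypothetical. Actual request formats vary by model provider.

```python
import json


def lookup_cik(name: str):
    """Hypothetical tool: map a filer name to a CIK using placeholder data."""
    directory = {"Example Capital Management": "0000000001"}  # not a real CIK
    return directory.get(name)


# Registry of tools the host is willing to run on the model's behalf.
TOOLS = {"lookup_cik": lookup_cik}


def run_tool_call(request_json: str) -> str:
    """Execute one structured tool request and return the result as JSON text."""
    request = json.loads(request_json)
    tool = TOOLS.get(request["name"])
    if tool is None:
        # Report the problem back to the model instead of crashing the loop.
        return json.dumps({"tool": request["name"], "error": "unknown tool"})
    result = tool(**request["arguments"])
    return json.dumps({"tool": request["name"], "result": result})


# What a model might emit instead of a direct answer.
model_request = '{"name": "lookup_cik", "arguments": {"name": "Example Capital Management"}}'
print(run_tool_call(model_request))
```

Restricting execution to an explicit registry means the model can only request tools the host has chosen to expose.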
Each term links to where the data is live. Programmatic access is available through the Arkolith API and MCP server: one key across SEC filings, insider activity, and market data, with every datapoint sourced to its origin.