Semantic Models
Semantic models are YAML files that describe your database tables as datasets with typed fields, relationships, and metrics. They follow the Open Semantic Interchange (OSI) spec and use DuckDB SQL syntax for expressions.
archmax uses the OSI YAML format as its internal storage format: every dataset, field, relationship, and metric is persisted as spec-compliant YAML on disk. When AI agents request model information through MCP tools, archmax converts the OSI YAML into a compressed markdown digest on the fly. This digest preserves all semantically relevant information (field types, descriptions, enums, relationships, examples) while using 3–5× fewer tokens than the equivalent YAML, which makes agent interactions cheaper and faster.
File Structure
Each project stores its semantic models under <ARCHMAX_DATA_DIR>/projects/<projectId>/src/. A model consists of a root file and per-dataset files:
```text
src/
├── ecommerce.yaml          # Root file: name, description, relationships, metrics
└── ecommerce/
    ├── orders.yaml         # Dataset: orders table
    ├── customers.yaml      # Dataset: customers table
    └── products.yaml       # Dataset: products table
```

Root File
The root YAML file contains model-level metadata:
```yaml
name: ecommerce
description: E-commerce data model for order analytics
ai_context:
  instructions: Use this model for revenue and order analysis
  synonyms:
    - sales model
    - shop data
relationships:
  - name: order_customer
    from_model: orders
    from_columns: [customer_id]
    to_model: customers
    to_columns: [customer_id]
    ai_context: Links orders to the customer who placed them
metrics:
  - name: total_revenue
    description: Sum of all order amounts
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: "SUM(orders.total_amount)"
    ai_context: Total revenue across all orders
```

Dataset Files
Each dataset maps to a database table or view:
```yaml
name: orders
source: shopify.public.orders
primary_key: [order_id]
description: Customer orders with line items and totals
ai_context:
  instructions: Each row is one order. Use total_amount for revenue calculations.
fields:
  - name: order_id
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: "order_id"
    custom_extensions:
      - vendor_name: COMMON
        data:
          data_type: INTEGER
  - name: total_amount
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: "total_amount"
    custom_extensions:
      - vendor_name: COMMON
        data:
          data_type: DECIMAL
  - name: created_at
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: "created_at"
    dimension:
      is_time: true
    custom_extensions:
      - vendor_name: COMMON
        data:
          data_type: TIMESTAMP
```

Dataset Groups
When a model contains many datasets, you can organize them into groups, which appear as visual bounding-box rectangles in the graph view. Groups are stored in the root YAML file’s custom_extensions:
```yaml
custom_extensions:
  - vendor_name: COMMON
    data: '{"dataset_groups":[{"id":"grp_abc12345","name":"Order Management","datasets":["orders","order_items","customers"],"color":"sage"}]}'
```

Managing Groups in the Graph View
- Right-click a dataset → “Create group” to start a new group containing that dataset
- Right-click a dataset → “Add to group” to move it into an existing group
- Right-click a dataset → “Remove from group” to ungroup it
- Right-click a group box → “Rename group” or “Delete group”
- Double-click a group label to rename it inline
The AI builder automatically creates groups when building models with 4 or more datasets, clustering by schema prefix, star-schema topology, or business domain.
Available Group Colors
Groups use a 4-color CI palette: sage, rose, blue, purple. Colors are assigned automatically when creating groups.
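As an illustration, a model with two groups in different palette colors might store both in the same JSON string, reusing the structure of the group example in Dataset Groups. The second group's id, name, and dataset list are hypothetical, and because colors are normally assigned automatically you would rarely set them by hand.

```yaml
# Root file (ecommerce.yaml): two dataset groups, each using a palette color.
# The second group ("grp_def67890", "Catalog") is a hypothetical example.
custom_extensions:
  - vendor_name: COMMON
    data: '{"dataset_groups":[{"id":"grp_abc12345","name":"Order Management","datasets":["orders","order_items","customers"],"color":"sage"},{"id":"grp_def67890","name":"Catalog","datasets":["products"],"color":"blue"}]}'
```

Each dataset appears in at most one group here, matching the "Add to group" action, which moves a dataset rather than copying it.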
AI Context
Every entity (model, dataset, field, relationship, metric) supports ai_context, which is either a plain string or a structured object with instructions, synonyms, and examples. This metadata is surfaced to AI agents through MCP tools, helping them understand what the data means and how to use it.
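The two forms might look like this on two fields of the orders dataset. The plain-string form matches the relationship example in the root file, and the structured form mirrors the model-level example. The shape of examples (a list of strings) is an assumption, so check an existing model before relying on it.

```yaml
fields:
  # Plain-string form: a single free-text hint for agents.
  # (expression and custom_extensions omitted for brevity)
  - name: customer_id
    ai_context: Identifier of the customer who placed the order
  # Structured form: separate instructions, synonyms, and examples.
  - name: total_amount
    ai_context:
      instructions: Use total_amount for revenue calculations.
      synonyms:
        - order value
        - order total
      # Assumption: examples given as a list of strings (hypothetical shape).
      examples:
        - "Total revenue last month"
        - "Average order value by customer"
```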
AI-Assisted Builder
The admin UI includes a chat-based AI agent that can build semantic models for you. Navigate to Semantic Models and start a new conversation describing the model you want. The agent handles:
- Schema discovery: list tables and columns from your connections
- Field mapping: create typed fields with correct SQL expressions
- Enum detection: find columns with limited distinct values
- Relationship inference: detect foreign keys and join paths
- Metric definition: suggest common aggregations
You can also create and edit models manually through the YAML files or the admin UI editor.
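When editing a dataset file by hand, a field does not have to map one-to-one to a column. Because expressions use DuckDB SQL, a derived field appended to the fields list in ecommerce/orders.yaml might look like the sketch below. The field name order_date is hypothetical, and this assumes archmax accepts an arbitrary scalar DuckDB expression in the expression slot.

```yaml
# Hypothetical derived field for ecommerce/orders.yaml (append under `fields:`).
- name: order_date
  description: Calendar date on which the order was placed
  expression:
    dialects:
      - dialect: ANSI_SQL
        # Converts the created_at timestamp to a DATE.
        expression: "CAST(created_at AS DATE)"
  dimension:
    is_time: true
  custom_extensions:
    - vendor_name: COMMON
      data:
        data_type: DATE
```

Marking the field with is_time lets it act as a time dimension alongside created_at, at day granularity instead of timestamp granularity.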
Publishing
After editing a model, click Publish to make it available to MCP clients. Publishing assembles the split source files (src/) into one optimized YAML file per model in the build/ directory. MCP tools in production always read from the published build.
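Conceptually, publishing stitches the root file and its dataset files into one document. The abbreviated sketch below assumes the build file inlines each dataset under a top-level datasets key and uses the hypothetical path build/ecommerce.yaml; the exact layout archmax writes, including what its optimization step changes, may differ.

```yaml
# Hypothetical build/ecommerce.yaml after Publish (abbreviated).
# Assumes dataset files are inlined under a top-level `datasets:` key.
name: ecommerce
description: E-commerce data model for order analytics
datasets:
  # Contents of src/ecommerce/orders.yaml
  - name: orders
    source: shopify.public.orders
    primary_key: [order_id]
    fields:
      - name: order_id
        expression:
          dialects:
            - dialect: ANSI_SQL
              expression: "order_id"
      # ... remaining fields omitted
  # ... customers and products datasets omitted
relationships:
  - name: order_customer
    from_model: orders
    from_columns: [customer_id]
    to_model: customers
    to_columns: [customer_id]
metrics:
  - name: total_revenue
    expression:
      dialects:
        - dialect: ANSI_SQL
          expression: "SUM(orders.total_amount)"
```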
How Agents See Your Models
AI agents never interact with the raw OSI YAML files. Instead, the MCP tools (get_semantic_model, get_datasets) convert the YAML into a compressed markdown digest before returning it to the agent. This conversion:
- Reduces token usage by 3–5×: the OSI YAML format includes verbose structures such as `expression.dialects[].expression`, `custom_extensions[].vendor_name`, and deeply nested `ai_context` objects. The digest flattens these into compact markdown tables and bullet lists.
- Preserves all agent-relevant context: field types, descriptions, enum values, example data, synonyms, relationships, metric expressions, and query instructions are all included.
- Supports pagination: large models with many datasets or fields are paginated so agents can drill into specific sections without loading everything at once.
For example, a field that looks like this in OSI YAML:
```yaml
- name: status
  expression:
    dialects:
      - dialect: ANSI_SQL
        expression: "status"
  description: Current order status
  custom_extensions:
    - vendor_name: COMMON
      data: '{"data_type":"VARCHAR","distinct_values":["pending","shipped","delivered"]}'
  ai_context:
    synonyms:
      - order state
    instructions: Filter on this field to segment by fulfillment stage
```

becomes a single line in the markdown digest:
```markdown
- **status** `VARCHAR` {pending, shipped, delivered} — Current order status | _order state_ | Note: Filter on this field to segment by fulfillment stage
```

This compression is what makes it practical to give agents full context about large models without burning through token budgets.