Data Federation
archmax uses DuckDB as an in-process query engine that can federate queries across multiple database connections. Each project gets its own DuckDB instance with connections attached as named schemas.
How It Works
When you add a connection to a project and mark it as active, archmax attaches it to the project’s DuckDB instance using DuckDB’s extension system:
| Database Type | DuckDB Extension | Registry |
|---|---|---|
| PostgreSQL | postgres | Core |
| MySQL | mysql | Core |
| SQL Server | mssql | Community |
| SQLite | sqlite | Core |
| DuckDB | Native | - |
| Iceberg REST Catalog | iceberg + httpfs | Core |
Each connection’s slug becomes the DuckDB schema alias. For example, a Postgres connection with the slug shopify exposes its tables as shopify.<schema>.<table>.
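The attach step can be sketched in DuckDB SQL (the exact statements archmax issues are internal; the connection details below are placeholders):

```sql
-- Install and load the core Postgres extension, then attach the
-- remote database under the connection's slug as the schema alias.
INSTALL postgres;
LOAD postgres;
ATTACH 'dbname=shop host=localhost user=app password=secret'
    AS shopify (TYPE postgres, READ_ONLY);

-- Tables are now addressable as <slug>.<schema>.<table>:
SELECT count(*) FROM shopify.public.orders;
```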
SQL Server (MSSQL) Connections
The SQL Server connector uses the DuckDB MSSQL community extension, which is installed automatically from the community registry.
When using Connection Details (host/port fields), archmax builds an ADO.NET connection string:
```
Server=host,port;Database=db;User Id=user;Password=pass;Encrypt=yes
```

When using Connection URI, you can pass any format the extension accepts (ADO.NET or URI) directly.
The Encrypt connection (TLS) toggle controls the Encrypt parameter and defaults to on. Disable it only for local development servers that don’t support TLS.
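A sketch of the resulting attach, assuming the community extension's ATTACH syntax and a placeholder server:

```sql
-- The MSSQL extension comes from the community registry, not core.
INSTALL mssql FROM community;
LOAD mssql;

-- ADO.NET-style connection string as built from the host/port
-- form fields; Encrypt=yes enables TLS (the default).
ATTACH 'Server=db.example.com,1433;Database=sales;User Id=app;Password=secret;Encrypt=yes'
    AS datev (TYPE mssql);
```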
Iceberg REST Catalog Connections
Iceberg connections let you query data in Apache Iceberg tables exposed through an Iceberg REST Catalog (Lakekeeper, Polaris, Cloudflare R2, etc.).
Unlike RDBMS connections, Iceberg uses a two-step attach: archmax creates a DuckDB secret with the bearer token, then attaches the catalog with the REST endpoint.
Required fields:
| Field | Description |
|---|---|
| Catalog Endpoint | URL of the Iceberg REST Catalog (e.g. https://catalog.example.com) |
| Warehouse | Warehouse identifier registered in the catalog |
| Bearer Token | Authentication token for the catalog |
Once attached, Iceberg namespaces and tables are available as <slug>.<namespace>.<table> and can be joined with any other connection type.
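The two-step attach described above maps onto DuckDB's secret manager and Iceberg extension roughly as follows (names, token, and endpoint are placeholders):

```sql
INSTALL iceberg;
INSTALL httpfs;
LOAD iceberg;
LOAD httpfs;

-- Step 1: create a secret holding the bearer token for the catalog.
CREATE SECRET lake_secret (
    TYPE iceberg,
    TOKEN 'my-bearer-token'
);

-- Step 2: attach the warehouse via the REST catalog endpoint.
ATTACH 'my_warehouse' AS lake (
    TYPE iceberg,
    SECRET lake_secret,
    ENDPOINT 'https://catalog.example.com'
);

-- Namespaces and tables resolve as <slug>.<namespace>.<table>:
SELECT * FROM lake.analytics.events LIMIT 10;
```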
Limitations:
- Read-only access (no INSERT/UPDATE/DELETE)
- Bearer token auth only (OAuth2 support planned)
- Backed by S3 or GCS object storage (other backends not yet supported by DuckDB)
Cross-Connection Queries
Once multiple connections are attached, you can query across them in a single SQL statement:
```sql
SELECT s.order_id, s.total_amount, d.invoice_number
FROM shopify.public.orders s
JOIN datev.dbo.invoices d ON s.order_id = d.external_order_id
WHERE s.created_at > '2024-01-01'
```

This is used internally by:
- The Data Browser for exploring connection schemas
- The Semantic Model Agent for discovering and mapping tables
- The MCP execute_query tool for AI agent queries (through scoped VIEWs)
Scoped VIEWs
When AI agents use execute_query, they don’t access raw connection schemas directly. Instead, each semantic model gets a set of VIEWs that expose only the fields defined in the model, acting as a curated and safe access layer.
Internally these VIEWs live in a _scope_<modelName> DuckDB schema, but agents never need to know this. The search_path is set automatically per query so agents write bare dataset names (e.g. FROM "orders") and the correct VIEW resolves transparently.
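A minimal sketch of this pattern (the model name sales and the field list are illustrative):

```sql
-- One schema per semantic model; VIEWs expose only modeled fields.
CREATE SCHEMA _scope_sales;
CREATE VIEW _scope_sales.orders AS
    SELECT order_id, total_amount, created_at
    FROM shopify.public.orders;

-- Set per query before running agent SQL, so bare dataset names
-- resolve to the scoped VIEWs rather than raw connection schemas.
SET search_path = '_scope_sales';
SELECT order_id, total_amount FROM "orders";
```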
Connection Lifecycle
- Active connections are attached to DuckDB and available for querying
- Inactive connections are detached from DuckDB but their configuration is preserved
- Deleted connections are soft-deleted and fully detached
DuckDB instances are created lazily per project and cached in memory for the lifetime of the server process.
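In DuckDB terms, deactivation and reactivation correspond to DETACH and a fresh ATTACH under the same slug (a sketch; the preserved configuration lives in archmax's own metadata, not in DuckDB):

```sql
-- Deactivating a connection detaches its schema alias.
DETACH shopify;

-- Re-activating re-attaches it under the same slug.
ATTACH 'dbname=shop host=localhost user=app password=secret'
    AS shopify (TYPE postgres, READ_ONLY);
```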