Databricks + App Orchid: Why You Need Both

June 1, 2026

Rehan Refai, Vice President Solutions


Technical leaders who've recently started their Databricks journey regularly ask us why they would need both Databricks and App Orchid. Databricks is a genuinely powerful data platform: Delta Lake for storage, Unity Catalog for governance, Spark for heavy compute, and an increasingly capable ML and AI runtime. It’s a great choice for teams doing serious data engineering, model training, and pipeline orchestration.

But getting accurate, multi-hop, conversational answers out of complex enterprise data is a different problem from data engineering.

Databricks gives you the raw materials. App Orchid puts them together so AI can deliver something worth consuming.

A world-class data engineering platform is not the same thing as a world-class context layer.

The short answer is that you’re best served using Databricks as a data platform and App Orchid as the context layer. I’ll summarize why:

Databricks' Metric Views and Unity Catalog business semantics require humans to manually author all semantic definitions before AI querying can begin. App Orchid automates that construction through ontology discovery.

Next-gen Genie (April 2026) adds cross-space routing and unstructured data connectors, but connecting to documents is not the same as reasoning across them ontologically.

Databricks joined the OSI portability initiative but hasn't shipped import/export tooling yet, so Metric Views built today are Databricks-native with no defined exit path.

The two tools are complementary: App Orchid extends Lakehouse value to the full enterprise knowledge graph rather than replacing Databricks.

Metric Views: Still Manual

Databricks' Unity Catalog Business Semantics, built on Metric Views and surfaced through AI/BI Genie, is one of the more thoughtfully designed semantic layers shipping from a major data platform vendor today. The architecture is sensible: define metrics declaratively in SQL, govern them in Unity Catalog, expose them consistently across dashboards, notebooks, AI agents, and natural language interfaces. The "define once, use everywhere" promise is real, within the Databricks ecosystem.

The constraints surface quickly when you move from a bounded analytics domain to the multi-domain, cross-system reasoning that enterprises actually need.

Metric Views are SQL-compiled definitions: measures, dimensions, joins, materialized aggregates. They're catalog objects, governed like tables, powering Genie's natural language interface with deterministic execution rather than inferred logic. That's a genuine improvement over LLMs guessing at join paths. But the semantic scope is bounded by what you can define in a Metric View, and Metric Views are bounded by what lives in Unity Catalog.

Cross-domain reasoning requires explicit authorship. If "customer lifetime value" depends on billing data in one schema, contract data in another, and churn signals from a third, someone has to build a Metric View that explicitly joins those sources, defines the calculation, and certifies the result. Databricks provides tooling to make that easier. Genie Code can bootstrap measure and dimension suggestions, and Builder mode offers a low-code UI for authoring. But the semantic definition work still happens upstream of the tooling. Genie is not discovering your business logic. It's executing logic that humans have encoded.
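To make that authorship burden concrete, here is a minimal Python sketch of what a hand-written, cross-schema definition of "customer lifetime value" involves. It is not Databricks Metric View syntax. The MetricDefinition and Join classes, the schema, table, and column names, and the formula itself are all hypothetical, chosen only to show that a person has to spell out every source, join key, and calculation.

```python
from dataclasses import dataclass, field


@dataclass
class Join:
    table: str  # fully qualified table name (hypothetical)
    alias: str
    on: str     # join condition, written by the author


@dataclass
class MetricDefinition:
    name: str
    source: str
    source_alias: str
    group_by: str
    expression: str
    joins: list = field(default_factory=list)

    def to_sql(self) -> str:
        """Render the hand-authored definition as one SELECT statement."""
        lines = [
            f"SELECT {self.group_by}, {self.expression} AS {self.name}",
            f"FROM {self.source} AS {self.source_alias}",
        ]
        for j in self.joins:
            lines.append(f"LEFT JOIN {j.table} AS {j.alias} ON {j.on}")
        lines.append(f"GROUP BY {self.group_by}")
        return "\n".join(lines)


clv = MetricDefinition(
    name="customer_lifetime_value",
    source="billing.invoices",
    source_alias="b",
    group_by="b.customer_id",
    # Illustrative formula only, not a recommended CLV calculation.
    expression="SUM(b.amount) * MAX(c.term_years) * (1 - MAX(s.churn_probability))",
    joins=[
        Join("contracts.agreements", "c", "c.customer_id = b.customer_id"),
        Join("signals.churn_scores", "s", "s.customer_id = b.customer_id"),
    ],
)

print(clv.to_sql())
```

Nothing in this sketch is discovered. The three sources, the two join conditions, and the expression are all inputs supplied by whoever wrote the definition, which is the point of the paragraph above.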

For organizations with mature, well-staffed data teams operating in a consolidated Lakehouse, this is a workable model. For everyone dealing with the gap between data availability and semantic coverage, which describes most enterprises, the bottleneck isn't compute or tooling. It's the throughput of semantic definition work that has to happen before any AI-powered querying can start.

App Orchid inverts this. Automated ontology discovery reads your schema, analyzes distributions, infers entity relationships, and surfaces a candidate knowledge graph. Human reviewers validate and refine rather than construct from zero. Metrics can be captured in “derived fields” once and used everywhere. It's the difference between editing a first draft and writing cold under stakeholder pressure.
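The general idea of proposing relationships for a human to review can be sketched in a few lines of Python. This is not App Orchid's algorithm. It is a simple value-overlap heuristic, and the tables, column names, and the candidate_relationships function are assumptions made up for the example.

```python
from itertools import permutations

# Toy tables: table name -> column name -> sample values (all hypothetical).
tables = {
    "customers": {"customer_id": [1, 2, 3, 4], "region": ["N", "S", "N", "E"]},
    "invoices": {"invoice_id": [10, 11, 12], "cust": [1, 1, 3], "amount": [50, 75, 20]},
}


def is_unique(values):
    """True when no value repeats, i.e. the column looks key-like."""
    return len(set(values)) == len(values)


def candidate_relationships(tables, min_overlap=1.0):
    """Propose child.column -> parent.column links for human review.

    A link is proposed when the parent column is unique and at least
    `min_overlap` of the distinct child values also appear in it.
    """
    candidates = []
    for (child, child_cols), (parent, parent_cols) in permutations(tables.items(), 2):
        for pcol, pvals in parent_cols.items():
            if not is_unique(pvals):
                continue
            pset = set(pvals)
            for ccol, cvals in child_cols.items():
                cset = set(cvals)
                if not cset:
                    continue  # nothing to compare for an empty column
                overlap = len(cset & pset) / len(cset)
                if overlap >= min_overlap:
                    candidates.append((f"{child}.{ccol}", f"{parent}.{pcol}", overlap))
    return candidates


print(candidate_relationships(tables))
# -> [('invoices.cust', 'customers.customer_id', 1.0)]
```

Note that the link is found even though the columns are named differently (cust versus customer_id). The output is a candidate, not a fact: a reviewer still confirms or rejects it, which matches the validate-and-refine workflow described above.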

Next-gen Genie: Real Progress, and the Gaps That Remain

On April 26, 2026, Databricks announced the next generation of Genie, which directly addresses two limitations worth acknowledging plainly.

Cross-space reasoning: Previously, Genie Spaces were isolated by dataset configuration, and cross-domain questions that touched multiple Spaces fell into a gap. The next-gen Genie changes this: from a single chat interface, it now draws on the most relevant trusted assets across certified Genie Spaces, governed dashboards, and Databricks Apps simultaneously, using metadata routing to prioritize higher-trust sources.

Unstructured data: The previous Genie architecture was bounded to structured data only. The next-gen Genie introduces built-in connectors to Google Drive and SharePoint, plus support for MCP connections, all managed through the Unity Catalog AI Gateway. This lets Genie pull context from unstructured enterprise knowledge sources alongside structured queries.

However, connectivity is not ontological understanding. When Genie connects to Google Drive or SharePoint via MCP, it retrieves document context to augment a response. It does not build a semantic relationship between that unstructured content and the entities, metrics, and relationships in your structured data. A document describing a maintenance policy doesn't become a first-class participant in your knowledge graph. It becomes retrievable context. That distinction matters when the question requires reasoning across domains, not just retrieving from them. "Which assets are at highest risk given recent service history and the maintenance thresholds in our contract templates" is a multi-hop inference problem, not a document retrieval problem.
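A toy Python sketch shows why the asset-risk question is a chain of hops rather than a lookup. The triples, asset names, relation names, and threshold values below are all invented for illustration, and the traversal is a generic graph walk, not a description of how either product works internally.

```python
# Hypothetical knowledge graph as (subject, relation, object) triples.
triples = [
    ("pump-1", "covered_by", "contract-A"),
    ("pump-2", "covered_by", "contract-A"),
    ("pump-3", "covered_by", "contract-B"),
    ("contract-A", "max_failures_per_year", 2),  # threshold stated in a contract document
    ("contract-B", "max_failures_per_year", 5),
    ("pump-1", "failures_last_year", 3),          # from structured service history
    ("pump-2", "failures_last_year", 1),
    ("pump-3", "failures_last_year", 4),
]


def lookup(subject, relation):
    """Return the first object for (subject, relation), or None if absent."""
    for s, r, o in triples:
        if s == subject and r == relation:
            return o
    return None


def assets_over_threshold():
    """Find assets whose failure count exceeds their contract's limit."""
    at_risk = []
    assets = sorted({s for s, r, _ in triples if r == "covered_by"})
    for asset in assets:
        contract = lookup(asset, "covered_by")             # hop 1: asset -> contract
        limit = lookup(contract, "max_failures_per_year")  # hop 2: contract -> threshold
        failures = lookup(asset, "failures_last_year")     # hop 3: asset -> service history
        if limit is not None and failures is not None and failures > limit:
            at_risk.append((asset, failures, limit))
    return at_risk


print(assets_over_threshold())
# -> [('pump-1', 3, 2)]
```

The answer only exists because the contract threshold is stored as a relationship attached to an entity. If the contract were merely a retrievable document, hop 2 would have nothing to traverse.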

The authorship burden for individual Genie Spaces is unchanged. The next-gen Genie's cross-space routing is only as good as the Spaces it routes across. Each Space still requires domain expert curation: datasets, sample queries, knowledge stores, verified metric logic. The new architecture doesn't reduce the work of building them. The bottleneck remains the upstream semantic definition work, not the routing layer above it.

MCP connections are additive integrations, not governed semantic assets. External connections through MCP surface into Genie via the Unity Catalog AI Gateway, which provides governance over the connection itself. But the external systems remain external. They're not registered in Unity Catalog, don't participate in lineage, and don't carry the same certification and trust signals as governed Metric Views. The governance boundary is real, even as the connectivity boundary moves.

The honest assessment: next-gen Genie is a substantial step forward for Databricks' enterprise reasoning story. It closes the cross-space gap and starts to address unstructured context. What it doesn't do is replace a semantic context graph, a structure that encodes relationships between entities and concepts, not just retrieval access to them. That gap shows up most clearly on questions that require multi-hop reasoning across structured and unstructured domains at the same time.

Unity Catalog: Excellent Governance, Bounded Reach

Unity Catalog is one of the more mature unified governance frameworks in the market: consistent RBAC across clouds and workspaces, attribute-based access control at the row and column level, automated PII classification, end-to-end lineage across pipelines and dashboards, and Delta Sharing for governed cross-organizational data exchange. These are real enterprise capabilities, and Databricks has invested heavily in making Unity Catalog the authoritative governance layer for the Lakehouse.

But Unity Catalog governs assets that live in Databricks. Its reach is defined by what's been registered, whether as managed tables, external tables, foreign tables via Lakehouse Federation, or shared data via Delta Sharing. Data that isn't registered sits outside its governance, outside its lineage, and outside what Genie can query.

Lakehouse Federation extends the reach, but with tradeoffs. Federation lets you register external data sources (Snowflake, SQL Server, PostgreSQL, BigQuery, MySQL) as foreign catalogs in Unity Catalog. Genie and Metric Views can then reference this external data. But the semantic layer you build on top of federated sources requires the same manual Metric View authorship as anything else. Federation gives you connectivity. It doesn't give you semantic intelligence about the external source.

The deeper issue is that Unity Catalog's governance model is Lakehouse-centric by design. Its lineage, quality monitoring, classification, and discovery features are deepest on data that lives in Delta Lake. Federated sources participate partially. Fully external systems (operational SaaS platforms, ERP backends, legacy data stores that will never be registered) sit outside the model entirely.

App Orchid's semantic knowledge graph is federated by architectural design, not as an extension. The ontology virtualizes intelligence across sources where data lives today, without requiring registration, movement, or schema translation into a central catalog. When the answer requires data from a system that isn't in Databricks, and for most enterprises some of it always will, App Orchid reaches it natively.

OSI: A Coalition Databricks Joined, Not One It Leads

In September 2025, Snowflake convened the Open Semantic Interchange (OSI) initiative to create a vendor-neutral, open-source specification for semantic metadata. Databricks joined the coalition alongside Salesforce, dbt Labs, BlackRock, and others. The v1.0 specification shipped January 27, 2026 under Apache 2.0. Interoperability and elimination of vendor lock-in are listed as core goals.

Databricks' own language around Business Semantics echoes this: "One of the key goals with Unity Catalog Business Semantics is to ensure customers can define business meaning in a way that is open, portable, and designed to work across their existing ecosystem, without lock-in."

The intent is right. But today's implementation reality is that Metric Views are Unity Catalog objects, defined in Databricks SQL, governed through Databricks RBAC, and consumed by Databricks surfaces: Genie, AI/BI Dashboards, Lakeflow, Databricks One. No vendor, including Databricks, has shipped native OSI import or export tooling yet.

The coalition is genuinely the right direction. Your semantic investment should compound over time. If it's tied to a platform's schema objects, the portability question becomes relevant the first time your architecture evolves.

App Orchid's ontology is designed around OSI-aligned constructs from the start: entities, relationships, metrics, context. We're building OSI import/export as a first-order commitment as the spec matures. Your business semantics travel with you.

The Right Architecture

Databricks versus App Orchid is the wrong framing. These are not competing platforms for the same problem.

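[Figure: layer-by-layer comparison of the two stacks. App Orchid: NLQ + visualizations; quick insights; full semantic layer; AI/ML dev integration; no data engineering tools listed; any data lake. Databricks: no NLQ + visualizations; no quick insights; Unity Catalog (limited semantics); AI/ML dev tools; data engineering tools; data lakehouse.]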

Databricks is an exceptional platform for data engineering, ML, and increasingly for governed BI. The next-gen Genie closes real gaps: cross-space routing, unstructured connectors, a unified business user experience. But what remains is the distinction between a retrieval-augmented analytics interface and a semantic knowledge graph. Genie retrieves and routes across curated assets. App Orchid encodes the relationships between those assets.

App Orchid sits above the data platform, automating ontology construction, federating natively across sources, enabling multi-hop inference across structured and unstructured domains, and building semantic portability in from day one. On top of Databricks, it extends the Lakehouse's value to the full enterprise knowledge graph. Without Databricks in the stack, it stands on its own.

App Orchid is not a replacement for Databricks. For organizations that have invested in the Lakehouse, it's the context layer that extends that investment across structured and unstructured data, registered and non-registered sources, and the full reasoning surface that modern enterprise AI requires.

App Orchid delivers automated ontology discovery, federated AI reasoning across structured and unstructured data, and enterprise-grade semantic intelligence, on top of Databricks or any data architecture. Request a demo.
