Posted by Amish Patel, Conduit VC | "The next compounding advantage in AI will be built on sensory, spatial, and domain-rich data streams that 'see, decide, and act.'"

Core Thesis

<aside> <img src="/icons/rocket_gray.svg" alt="/icons/rocket_gray.svg" width="40px" />

As open web text plateaus, the next compounding edge in AI will come from contextual, domain-specific, multimodal data tied to real-world sensing and action. The investable opportunity is in the capture, infrastructure, rights, and applications that turn streams from cameras, sensors, and machines into decisions and execution on factory floors, in hospitals, warehouses, farms, and defense.

</aside>

Why now (2025–2030)


Researchers have cautioned for years that the stock of human-written, high-quality web text is finite. Recent analyses project that "high-quality" language data could be exhausted around 2026, pressuring model builders to seek new sources and modalities.

At the same time, the world is getting flooded with sensor data. McKinsey forecasts that the potential economic value the IoT could unlock is large and growing: by 2030, it estimates the IoT could enable $5.5 trillion to $12.6 trillion in value globally, including value captured by consumers and customers of IoT products and services.

Fei-Fei Li put it simply: "Images dominate our lives, but they are the dark matter of our digital universe." That "dark matter" is increasingly becoming structured, labeled, and actionable through advances in embodied and spatial AI.

Sources: "Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data"; "IoT value set to accelerate through 2030: Where and how to capture it"; "Fei-Fei Li: How do we teach computers to understand the visual world?"

Physical World Data: From Language to World Models

The field is converging on architectures that build internal world models—systems that predict how the world will change under different actions, and plan accordingly. As Yann LeCun writes, "The world model module predicts possible future world states as a function of imagined action sequences proposed by the actor." This is not chat; it's perception → prediction → control.
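To make the perception → prediction → control loop concrete, here is a minimal sketch in Python. The names (WorldModel, perceive, plan, score) and the toy dynamics are illustrative assumptions, not any particular lab's API; the sketch only mirrors the structure in the quote above: an actor proposes imagined action sequences, the world model predicts their consequences, and the best sequence is executed.

```python
# Minimal sketch of a world-model planning loop (perception -> prediction -> control).
# All names and the toy dynamics below are hypothetical placeholders for illustration.
import random


class WorldModel:
    """Toy dynamics model: predicts the next state given a state and an action."""
    def predict(self, state, action):
        # A real system would use a learned dynamics network; here we fake it.
        return [s + a for s, a in zip(state, action)]


def perceive(sensor_frame):
    """Perception: map raw sensor data (camera, depth, telemetry) to a compact state."""
    return sensor_frame  # placeholder encoder


def score(state, goal):
    """Task cost: squared distance between a predicted state and the goal."""
    return sum((s - g) ** 2 for s, g in zip(state, goal))


def plan(model, state, goal, candidates=64, horizon=5):
    """Actor: sample imagined action sequences, roll them through the world model,
    and keep the sequence whose predicted end state best matches the goal."""
    best_seq, best_cost = None, float("inf")
    for _ in range(candidates):
        seq = [[random.uniform(-1, 1) for _ in state] for _ in range(horizon)]
        s = state
        for action in seq:
            s = model.predict(s, action)  # prediction: imagine the consequence
        cost = score(s, goal)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq


if __name__ == "__main__":
    model = WorldModel()
    state = perceive([0.0, 0.0])                     # perception
    actions = plan(model, state, goal=[1.0, -0.5])   # prediction + planning
    print("first action to execute (control):", actions[0])
```

In a production system the fake dynamics and random-shooting planner would be replaced by learned latent dynamics and a stronger planner, but the loop structure is the same.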

The through-line for investors: value is shifting to data with causal structure (physics, geometry, constraints) and the interfaces to act (robots, tools, enterprise workflows).

Two near-term realities (2025–2027)

1. The web-text well is running low; the data hunt goes physical.

As large players squeeze what remains of open text, they will intensify acquisition of contextual data: in-product telemetry, code interactions, and crucially, real-world sensory streams. Expect more licensing deals, vertical partnerships, and M&A for proprietary datasets and capture infrastructure, particularly in video, audio, telematics, industrial logs, and egocentric wearables.

2. The richest deposits are domain-specific, naturally multimodal, and tied to action (and ready to mine within the world's leading physical industries).

In manufacturing, energy, logistics, healthcare, defense, ag, and the built world, data is spatial + temporal (vision, depth, inertial, RF) and semantic (workflow, asset, and regulatory context). This is where Vision/Language/Action systems can translate seeing into doing—planning picks in a warehouse, replanning power dispatch after a fault, triaging patients from video, or tasking mobile robots under safety and ISO constraints.

Conduit's Perspective

<aside> <img src="/icons/light-bulb_gray.svg" alt="/icons/light-bulb_gray.svg" width="40px" />

Proprietary, context-rich, real-world data, which we call spatial intelligence (sensory and spatial data fused with business logic), is the next compounding asset for AI. It is harder to scrape, requires on-site integration, carries governance obligations, and creates durable moats through rights, pipelines, and performance.

This is where we are focusing. This is where we are building. This is where we will invest.

</aside>

What this means for investors (practical checklist)

<aside> <img src="/icons/light-bulb_gray.svg" alt="/icons/light-bulb_gray.svg" width="40px" />

Data Advantage, Not Model Vanity. Underwrite companies for data access, refresh rates, and exclusivity; focus on those that can create novel, proprietary foundation models for an industry, job, task, or analysis.

</aside>

<aside> <img src="/icons/light-bulb_gray.svg" alt="/icons/light-bulb_gray.svg" width="40px" />

Bet on Edge Embodiment and Rich Sensory Data (video, telemetry, etc.). Prioritize platforms that transform vision + telemetry into decisions and actions (robotics, inspection, autonomy, predictive control). Track RT-2/RT-X-style generalization metrics across unseen tasks and robot types.

</aside>