Blog / March 19, 2026
When the System Can't See the Patient: AI, Healthcare, and the Governance Gap That Actually Matters

What Kadıoğlu's team discovered at Fidelity Investments, when they deployed their modular AI framework across twelve open-source libraries, wasn't primarily a technical finding — it was an organizational one, and it has implications that extend well beyond financial services. Trust in that system didn't emerge from the sophistication of individual components, most of which were standard open-source tools, but from the transparency of how they connected to each other. When a recommendation engine passed a decision to a risk model, the chain of reasoning stayed visible at every handoff, and that visibility became the substrate for something more durable than accuracy: the capacity for the people using the system to inspect, debate, and contest its logic rather than simply receiving its outputs. The framework worked because every decision remained reversible and debatable, which is a different design philosophy than most enterprise AI operates on, and a more honest one.
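To make the shape of that design concrete, here is a minimal sketch, not Kadıoğlu's actual framework and with invented names (Decision, recommend, assess_risk), of what a visible handoff could look like in code: each component passes along not just its output but a plain-data record of why it produced that output, and the chain of those records travels with the decision.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Decision:
    """A decision plus the trail of reasoning that produced it."""
    value: Any                                        # the actual output
    trail: list[dict] = field(default_factory=list)   # one entry per handoff

    def handoff(self, module: str, rationale: str, **details) -> "Decision":
        """Record why this module produced or changed the value, then pass it on."""
        self.trail.append({"module": module, "rationale": rationale, **details})
        return self

def recommend(portfolio: dict) -> Decision:
    # Hypothetical recommendation step: flag an over-weighted asset class.
    decision = Decision(value={"action": "rebalance", "target": "equities"})
    return decision.handoff("recommendation_engine",
                            "equity allocation exceeds stated tolerance",
                            observed=portfolio["equity_pct"], tolerance=0.60)

def assess_risk(decision: Decision) -> Decision:
    # Hypothetical risk step: the handoff is recorded, not silent.
    return decision.handoff("risk_model",
                            "rebalance keeps drawdown estimate within policy",
                            estimated_drawdown=0.08)

final = assess_risk(recommend({"equity_pct": 0.72}))
for step in final.trail:   # anyone downstream can replay the chain of reasoning
    print(step["module"], "->", step["rationale"])
```

The specifics are beside the point; what matters is that the reasoning trail is ordinary data, which is what makes a decision inspectable, debatable, and reversible downstream.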
The relevance of this to healthcare becomes immediate when you consider what Da Silva's work on AI governance in clinical settings actually documents. Nurses already spend twenty-five percent of their time on administrative tasks — not because the work is complex in ways that require human judgment, but because the systems surrounding care delivery have accumulated inefficiencies that nobody has successfully rationalized — and introducing AI tools that operate as black boxes compounds the problem by adding another layer of opacity to an environment that is already difficult to navigate. The gap Da Silva identifies between AI development and real-world deployment isn't primarily a usability problem, though usability is part of it; it's a trust problem that develops in real time when healthcare professionals can't follow the reasoning behind a system's recommendations and therefore can't evaluate when to rely on them and when to override them. Herremans' finding that 34% of AI projects fail outright is, in healthcare contexts, not an abstraction — it is a description of what happens when governance structures built for the development environment encounter the actual conditions of care delivery and prove inadequate to them.
Chen's research on learning dynamics in online gaming environments adds something to this picture that isn't immediately obvious but becomes important once you see it. Players in the environments Chen studied developed skills both through their own experience and through observing others, and their engagement with the system was sustained not by perfect calibration of difficulty but by visibility into how the system evaluated their capabilities and matched them with opponents — the legibility of the system's logic, in other words, rather than the elegance of its outputs. The parallel to clinical settings is direct: what clinicians need from AI is not just a recommendation but enough transparency into how that recommendation was generated — what data was weighted most heavily, where the model's confidence is strong and where it wavers — to exercise informed judgment about when to follow it and when the particulars of the patient in front of them require something the model didn't account for.
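What that kind of transparency might look like at the interface is easier to see with a rough sketch. The field names below are hypothetical and not any particular vendor's API; the point is that the recommendation arrives as a structured payload carrying its own evidence and caveats rather than as a bare answer.

```python
# A hypothetical payload shape: the recommendation travels with the evidence
# a clinician needs to decide whether to follow it or override it.
recommendation = {
    "suggestion": "flag for early sepsis screen",
    "confidence": 0.71,                    # the model's own calibration, not ground truth
    "top_factors": [                       # what the model weighted most heavily
        {"feature": "lactate_trend", "weight": 0.34},
        {"feature": "resp_rate", "weight": 0.22},
        {"feature": "temp_delta_24h", "weight": 0.18},
    ],
    "low_confidence_because": [            # where the model's footing is weak
        "no recent white-cell count on record",
        "patient age outside the range dominant in training data",
    ],
}

def render_for_chart(rec: dict) -> str:
    """Format the payload so the reasoning is readable at the bedside."""
    factors = ", ".join(f"{f['feature']} ({f['weight']:.0%})" for f in rec["top_factors"])
    caveats = "; ".join(rec["low_confidence_because"])
    return (f"{rec['suggestion']}, confidence {rec['confidence']:.0%}\n"
            f"  weighted on: {factors}\n"
            f"  caveats: {caveats}")

print(render_for_chart(recommendation))
```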
What Human-Centered Actually Means — and Why Most AI Gets It Wrong
Human-centered AI has become a phrase that organizations deploy so freely it has nearly ceased to mean anything, which is a problem not just semantically but practically, because the gap between what the phrase claims and what most AI systems actually do is where a significant portion of the 34% failure rate Herremans documents lives. The distinction that matters is between systems designed to be understandable as a feature layered onto technical capability and systems designed from the beginning around the premise that human judgment is not a limitation to be worked around but the thing the system exists to augment. These produce different architectures, different governance structures, and different relationships between the people using the system and the system itself.
Chen's research is useful here precisely because gaming environments make visible a dynamic that is present everywhere but easier to study where the stakes are lower. The 23% increase in engagement she documented didn't come from better algorithm performance; it came from players sensing they were collaborating with something that respected their agency — that the system's logic was legible enough to be worked with, challenged, and adapted to in real time. What the research identifies as the alternative is something like learned helplessness: users who become dependent on recommendations they cannot evaluate, operating inside decision processes they cannot meaningfully participate in, which is a form of efficiency that produces its own inefficiencies as the humans in the system gradually lose the capacity to exercise the judgment the system was ostensibly designed to support.
Da Silva and her colleagues document this dynamic in healthcare with uncomfortable specificity. AI tools that performed impressively in controlled environments with clean datasets and defined parameters consistently failed when nurses tried to use them during twelve-hour shifts with incomplete information, competing priorities, and the kind of contextual complexity that clinical judgment navigates constantly but that training data rarely captures. The gap isn't between the AI's potential and its current capability — it's between the conditions the system was designed for and the conditions it was deployed into, which are genuinely different, and the difference is not incidental. Sigfrids and colleagues make the structural critique explicit: the concept of human-centered AI has been narrowed, in most implementations, to individual user experience when what it needs to encompass is community impact, systemic consequences, and the ways AI systems inevitably reshape power dynamics between institutions and the people those institutions are meant to serve.
What Kadıoğlu's modular framework achieved at Fidelity is, in this light, a model for something more than technical architecture. When different teams could inspect, debate, and modify components without requiring permission from a central authority, the system was honoring the distributed intelligence of the organization itself — treating the expertise of the people working with the system as a resource to be drawn on rather than a variable to be managed. That orientation, which is partly cultural and partly structural, is what human-centered AI actually requires, and it is what most implementations that use the phrase are not providing.
Governance That Learns: What Actually Works When Algorithms Meet Organizational Reality
The problem with most AI governance frameworks isn't that they're wrong about what matters — transparency, accountability, alignment with human values — it's that they're built as static structures trying to govern dynamic systems, and the mismatch between those two things produces the predictable failures that Herremans and Da Silva both document from different directions. Herremans finds that 34% of AI projects fail because governance structures couldn't adapt when real-world conditions diverged from design assumptions. Da Silva finds that regulatory frameworks built for traditional medical devices encounter fundamental difficulties when applied to AI systems that learn and evolve after deployment, creating compliance requirements that the technology's behavior doesn't stay still long enough to satisfy.
What Kadıoğlu's approach at Fidelity demonstrates is that governance becomes tractable when it's made granular rather than comprehensive — when instead of approving or rejecting AI systems as wholes, organizations create structures that can evaluate individual components, trace decision pathways through specific modules, and update policies at the level where change actually happens. When a recommendation engine's handoff to a risk model becomes a discrete governance event with its own accountability structure and audit trail, the governance scales with the system's complexity rather than breaking under it, because the unit of accountability matches the unit of decision. Chen's research on gaming environments shows the same principle operating differently: players trusted algorithmic matchmaking systems not because they understood the neural network architecture but because they could understand the system's logic well enough to work with it, challenge it when its assessments seemed wrong, and rely on it when its track record had earned that reliance.
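One way to picture a handoff as a discrete governance event, again as a sketch with invented names rather than a description of Fidelity's system: every boundary crossing between modules emits an auditable record, so governance can query and update policy at the same granularity at which decisions are actually made.

```python
import json
import time
import uuid

AUDIT_LOG = []   # stand-in for an append-only audit store

def governance_event(source: str, target: str, payload: dict,
                     policy_version: str, accountable_owner: str) -> dict:
    """Record a module-to-module handoff as its own accountable event."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "source_module": source,            # e.g. "recommendation_engine"
        "target_module": target,            # e.g. "risk_model"
        "policy_version": policy_version,   # which rules governed this handoff
        "accountable_owner": accountable_owner,
        "payload_keys": sorted(payload),    # enough to audit without copying the data
    }
    AUDIT_LOG.append(event)
    return event

# Policy can now be updated where change actually happens: per handoff,
# not per monolithic system approval.
governance_event("recommendation_engine", "risk_model",
                 {"action": "rebalance", "target": "equities"},
                 policy_version="portfolio-risk-2.3", accountable_owner="risk-team")
print(json.dumps(AUDIT_LOG[-1], indent=2))
```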
The more fundamental critique that Sigfrids and colleagues develop is that most AI governance optimizes for compliance rather than empowerment — it asks whether AI systems follow rules rather than whether they serve human flourishing, which are related questions but not the same question, and the difference matters for how governance is structured in practice. Compliance-focused governance treats humans as stakeholders to be protected from the potential harms of AI; empowerment-focused governance treats them as collaborators in an ongoing negotiation about how AI should behave as conditions change and as the system's actual performance in the world generates new information about what it's doing and what it isn't. The shift from one orientation to the other changes the entire architecture of oversight: instead of pre-approving systems and hoping they behave as expected, organizations build ongoing feedback loops where human judgment can reshape AI behavior based on real performance, and where exceptions and edge cases become learning opportunities rather than governance failures.
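A feedback loop of that kind can be surprisingly unglamorous in practice. The sketch below, with hypothetical names, simply records when a clinician overrides a recommendation and why, so that overrides accumulate as calibration data rather than disappearing as one-off exceptions.

```python
from collections import Counter

override_reasons = Counter()   # aggregated signal for the next calibration cycle

def record_decision(model_suggestion: str, clinician_action: str,
                    reason: str = "unspecified") -> None:
    """Capture agreement and overrides; overrides become data, not exceptions."""
    if clinician_action != model_suggestion:
        override_reasons[reason] += 1

# Over a review period, the pattern of overrides shows where the model's
# assumptions and the clinic's reality diverge.
record_decision("discharge", "keep overnight", reason="no family support at home")
record_decision("discharge", "keep overnight", reason="no family support at home")
record_decision("discharge", "discharge")

for reason, count in override_reasons.most_common():
    print(f"{count}x override: {reason}")
```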
Community as Co-Designer: Why Consultation Isn't Enough
Most organizations treat community engagement in AI development as a validation exercise — something conducted after the fundamental design decisions have been made, to surface objections that can be addressed before deployment and to produce documentation that stakeholder input occurred. What Sigfrids and colleagues found when they examined human-centered AI governance across different organizational contexts is that this approach consistently produces systems that reflect the values and priorities of their designers more than the values and priorities of the communities they're meant to serve, because the communities were brought in too late in the process to influence the choices that actually determine what the system does and whose interests it prioritizes.
Da Silva's healthcare research makes this concrete. When hospitals deployed AI diagnostic tools without involving nurses meaningfully in the design process, the systems consistently optimized for metrics that looked impressive in evaluation frameworks but created workflow bottlenecks that reduced care quality, because the AI was trained on outcomes data that couldn't capture the tacit knowledge nurses use to prioritize patients — the observations that don't make it into the record, the contextual judgments that don't have a field in the system, the institutional memory of what a particular patient's "normal" looks like before the numbers reflect a change. The nurses weren't brought in as co-designers; they were brought in as users, and the difference determined what the system was actually capable of understanding about care delivery.
Chen's analysis of skill formation in gaming environments offers an instructive comparison. Sustained engagement in those environments required not just transparency about how systems worked but genuine agency in shaping how they evolved — players who could influence game mechanics through feedback loops stayed engaged meaningfully longer than players who received notification about updates that had already been decided. The parallel to AI governance is not subtle: communities that participate in ongoing calibration of AI systems develop different relationships with those systems, and different capacities for working with them, than communities that are consulted during development phases and then handed the finished product. Kadıoğlu's modular framework at Fidelity demonstrates what it looks like when a system can genuinely be interrogated in practice: non-technical stakeholders could trace decision pathways from input to output, question the logic at specific steps, and propose modifications grounded in their domain expertise, which produced both better systems and more durable trust in them.
Herremans' analysis of why AI projects fail identifies "lack of investment in the right people" as a consistent factor, and the people she's describing are not primarily technical — they're people who understand the social context where AI will operate and who have the authority, not just the advisory role, to modify systems based on that understanding. The distinction between authority and advisory capacity is where most community engagement programs reveal their actual priorities: when community representatives can participate in ongoing calibration of AI systems as decision-makers rather than consultants, the governance is real; when they're brought in to validate choices already made, the engagement is a form of institutional communication rather than genuine co-design.
What Healthcare AI Actually Requires: Adaptability, Not Standardization
The distance between what Kadıoğlu's modular framework achieved at Fidelity and what meaningful healthcare transformation actually requires is not primarily technical: the open-source components, the interoperable architecture, the transparency at every handoff are all tractable. What's harder is the organizational commitment to let communities actually reshape the systems meant to serve them, not as a design phase but as an ongoing condition of deployment, which is a different relationship to control than most institutions are structured to support.
Da Silva's dual governance framework — regulatory oversight that ensures safety alongside contextual adaptation that responds to the lived reality of care delivery — points toward what this requires without fully resolving it. The administrative burden that keeps nurses from the work that requires their judgment is not a problem that better technology solves by itself; it's a problem that requires technology designed with enough flexibility to meet different care environments where they actually are, rather than where the training data assumed they would be. And here is where Herremans' finding about the 34% failure rate becomes most illuminating: the projects that fail aren't, for the most part, failing because the algorithms don't work. They're failing because the assumption that successful implementation looks the same across contexts — that you can optimize a system for one set of conditions and then deploy it broadly — runs directly against what clinical reality actually demands.
The modular approach matters in healthcare not for the reasons it's usually invoked but because it makes genuine adaptability possible: the same recommendation engine that helps one hospital system reduce readmission rates can help another increase patient autonomy in treatment decisions, both outcomes emerging from identical technical infrastructure configured differently to reflect different priorities and different definitions of what improvement means in a particular context. Chen's research suggests something encouraging about what becomes possible when systems are actually designed for learning rather than compliance — people develop capabilities in relationship with those systems that they didn't know they possessed, which is a different outcome than automation that removes the conditions for expertise to develop at all.
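A crude illustration of identical infrastructure configured differently, with hypothetical parameter names: the shared engine reads a site-level configuration that encodes what improvement means locally, so the definition of success is negotiated per deployment rather than baked into the code.

```python
# Hypothetical site configurations for the same underlying engine: the code is
# identical, but the local definition of improvement differs.
SITE_CONFIGS = {
    "hospital_a": {
        "objective": "reduce_30day_readmissions",
        "recommendation_threshold": 0.6,    # more aggressive follow-up prompts
        "require_clinician_signoff": True,
    },
    "hospital_b": {
        "objective": "maximize_patient_choice",
        "recommendation_threshold": 0.8,    # surface only high-confidence options
        "alternatives_to_present": 3,       # always offer multiple paths, not one answer
        "require_clinician_signoff": True,
    },
}

def configure_engine(site: str) -> dict:
    """Return the locally negotiated settings the shared engine will run under."""
    return SITE_CONFIGS[site]

print(configure_engine("hospital_a")["objective"])
print(configure_engine("hospital_b")["objective"])
```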
What healthcare AI requires, ultimately, is a willingness to measure success not by deployment metrics but by whether new forms of care become possible that weren't possible before — whether the system creates space for the kind of human judgment, contextual awareness, and collaborative reasoning that define quality care, or whether it narrows that space in the name of efficiency. The technology becomes transformative when it stops being designed to solve predetermined problems and starts being designed to support communities in addressing problems they're still in the process of discovering, which requires releasing a degree of control over how the technology evolves that most organizations find genuinely difficult, and which is precisely why so few healthcare AI deployments have delivered on the potential the research consistently identifies.