Safety and Alignment Leadership
AI Safety for the Real World

Stephen Lieberman
I work with organizations deploying AI at scale, at the moment when models stop behaving like deterministic software and emergent, real-world risks become the primary safety challenge. That is where emergence foresight becomes a leadership requirement for AI safety in the real world.
My work focuses on emergent capabilities and emergent risks, and on the gap between how alignment is evaluated and how it holds under real deployment pressure. I address the technical, organizational, and human conditions that determine whether AI safety survives at scale.
Available for fractional, remote, and in-house executive roles through 1023AI.

The Challenge
Deploying AI at scale creates a leadership problem that evaluation frameworks cannot solve
As capable systems scale, they produce emergent capabilities that were not designed, emergent risks that were not anticipated, and interaction effects with human systems that no evaluation framework fully captures. The gap between what the model was tested on and what it actually does, in messy organizational contexts, across human-AI teams, under real-world deployment pressure, is where the most serious safety failures live.
This is why safety at scale is not just a research problem. It is a leadership problem.
Static controls, compliance checklists, and point-in-time evaluations are designed for systems with stable behavior and enumerable failure modes. They were not designed for systems whose capabilities and risks emerge at scale, shift with deployment context, or interact with human behavior in ways that are invisible during testing.
The robustness gap
A system can appear safe in theory and still fail under real deployment pressure. I use the term robustness gap to describe the distance between nominal safety and real-world resilience. In high-stakes AI deployment environments, safety claims are exposed to shifting incentives, changing contexts, adversarial pressure, organizational fragmentation, and downstream effects that do not appear in controlled settings.
This is also where AI iatrogenics becomes dangerous. In medicine, iatrogenics refers to harm caused by the treatment itself. In AI, a narrow safety intervention can reduce one visible risk while creating new harms elsewhere, distorting incentives, increasing brittleness, or destabilizing the broader system.
In AI systems deployed at scale, emergent misalignment is the most significant expression of the robustness gap. Alignment that appeared solid at one capability level quietly breaks down as the system becomes more capable, through a process that standard evaluation pipelines were not designed to detect. Closing this gap requires leadership that understands how emergence, nonlinearity, and sociotechnical systems behave in the real world.
The Gap
When to bring me in
AI safety and alignment become a leadership function when:
If more than one of these describes your situation, a conversation is worth having.
Approach
How I approach AI safety and alignment
My work produces two things simultaneously: technical systems, and the organizational architectures that make those systems safe under real-world pressure. They are developed together, because the technical system shapes what the human system can do, and the human system shapes what the technical system needs to be.
I approach AI deployed at scale as a systems problem, a leadership problem, and a human problem.
Complex adaptive systems
AI deployed at scale does not operate in isolation. It interacts with organizations, incentives, feedback loops, and people in ways that produce behavior no single component was designed to generate. Safety is a property of the whole sociotechnical environment, not just the model.
Emergent capabilities and risks
Both capabilities and risks emerge through scale, interaction, and deployment context, not through design alone. This includes emergent misalignment, where alignment that held at one capability level degrades quietly as the system grows more capable. Safety strategy must be designed for a moving target.
Epistemic uncertainty
Leaders deploying AI at scale make real decisions under genuine uncertainty. That uncertainty is not a gap to be closed by better evaluation. At scale, in messy organizational contexts, and across human-AI teams, uncertainty is a structural feature of the domain.
Safety at scale
The real test is whether safety survives growth, speed, strategic pressure, and social consequence. That standard cannot be met by evaluation frameworks alone. It requires leadership that can govern the whole system as it scales.
Emergence foresight
Emergence foresight is the capacity to reason about what a system might become, not just what it currently is. Governing for the capability horizon, not just the current deployment state, is what distinguishes genuine AI safety leadership from point-in-time compliance.
Emergent AI safety
Safety itself can be treated as an emergent property of the broader sociotechnical system, not a fixed specification applied to the model. It must be cultivated across technical architecture, organizational design, human-AI teaming, and institutional governance simultaneously.
About
Stephen Lieberman
Stephen Lieberman leads AI safety and alignment work at the intersection of high-stakes AI deployment, emergent complex systems, institutional governance, and human consequence. Through 1023AI, he works with organizations as capable AI systems move from controlled research environments into the messy realities of real-world deployment.
He focuses on emergent capabilities and emergent risks, emergent misalignment, the robustness gap between evaluated safety and deployed safety, and the institutional and human conditions under which AI safety holds or disappears at scale.
His core view is that capable AI cannot be governed as if it were ordinary software. As systems scale, they become emergent complex systems shaped not only by model architecture but by interaction effects, organizational structure, human-AI teaming dynamics, and downstream social consequence.
Organizations scaling AI deployment need more than a policy specialist, more than an ethicist, and more than a narrow technical reviewer. They need leadership that can move between model behavior, executive judgment, institutional design, and real-world consequence.
Mission-critical technical and operational leadership
More than 20 years leading technical and operational teams across government, defense, academia, nonprofit, and industry. Senior leadership on Department of Defense and Veterans Affairs programs within funding environments exceeding $100 billion, spanning enterprise architecture, decision-support systems, security and compliance, cloud systems, and data strategy.
Defense, security, and international systems
At the Naval Postgraduate School, served as a DoD civilian program leader and Principal Investigator for programs in defense technology, modeling and simulation, and decision-support systems. Work included counterterrorism, counterinsurgency, peacekeeping operations, and international collaboration across more than 100 countries.
Recognized leadership in high-consequence environments
Led programs with multimillion-dollar budgets and worked directly with senior leaders across defense, government, and institutional settings.
Deep research foundation in complex systems
Research background spans modeling and simulation, agent-based modeling, network theory, human behavior forecasting, sociotechnical systems, cognitive neuroscience, and human-computer interaction. H-index of 7, more than 100 citations, and 8 highly influential citations (Semantic Scholar).
“...a ground-breaking tool that will benefit the U.S. government and our allies as we continue to combat terror.”
Human systems as core variables in AI safety
Most AI safety frameworks treat human systems as context rather than a core variable. Organizational dynamics, institutional incentives, and social structures determine whether safety holds or fails in deployment, and interventions that ignore these dimensions create new failure modes.
Sociotechnical and human-centered disciplinary grounding
Approach draws on sociotechnical systems theory, organizational behavior, industrial psychology, human-centered design, and macro social work: the disciplines that illuminate how people actually behave inside institutions under real pressure, and where AI governance must operate.
The Grand Challenge to Harness Technology for Social Good
Currently advancing AI safety research through the Doctor of Social Work program at the University of Southern California, supporting the Grand Challenge to Harness Technology for Social Good. The most significant gaps in real-world AI safety governance are not purely technical; they are organizational, institutional, and deeply human.
Executive leadership that is operational, not theoretical
Strategic and operational executive since 2005. President and Executive Director of a California technology nonprofit through a decade of sustained growth. CEO and C-suite roles across advisory, technology, and media.
For nearly a decade, applied the same complex adaptive systems methodology underlying defense and simulation research to proprietary quantitative trading in highly competitive derivatives markets. The approach treated market participants as agents in a sociotechnical system, with a failure-mode architecture designed from the ground up to identify and structurally constrain catastrophic risks before any capital was committed. The system ran in full simulation for three years before going live, reflecting the same discipline brought to every high-consequence domain: understand the system deeply, stress-test it rigorously, and proceed only when the failure modes are known. The result was consistent profitability across stable and volatile market regimes, including through acute systemic stress events.
That environment built a specific discipline. When the cost of being wrong is immediate, measurable, and unforgiving, decision-making under genuine uncertainty is not theoretical. Tail-risk architecture, cascade detection, human-machine interaction dynamics under stress, and regime-change recognition are capacities built and validated in a domain with no tolerance for error, and they are precisely the reasoning structures that real-world AI safety governance demands.
Selected institutions
Department of Defense, Department of State, U.S. Congress, FEMA, Northrop Grumman, Defense Manpower Data Center, Department of the Navy, Department of Veterans Affairs, Naval Postgraduate School, University of Southern California
Mission areas
Defense and security, counterterrorism, counterinsurgency, peacekeeping operations, health systems, decision-support systems, disaster recovery, nonprofit leadership, workforce development, digital inclusion, AI safety and alignment
Why 1023AI
The name references Avogadro's number (6.022 × 10²³), the scale at which immense collections of microscopic interactions give rise to emergent macroscopic behavior. That is not a metaphor for AI. It is a description of what actually happens. Scaling does not simply improve performance. It changes what the system is, what it can do, and what it can get wrong. Beyond a certain scale, aggregate behavior changes qualitatively, demanding a different approach.
The European Commission's official Guidelines under the EU AI Act arrive at the same order of magnitude, establishing 10²³ floating-point operations of training compute as the threshold at which AI models qualify as General Purpose AI and trigger mandatory regulatory oversight. That convergence is not coincidental. It marks the boundary where AI generality becomes real, emergent capabilities and emergent risks become the dominant safety challenge, and governance must cross the same threshold the model does. Safety at that scale requires leadership that understands emergence, not just evaluation. That is what my work is about.
Is your AI system ready for the real world?
I am open to conversations with organizations deploying AI at scale that are exploring fractional or in-house executive leadership in AI safety and alignment. The work is never generic; every engagement is shaped by the specific organization, its specific challenges, and the specific sociotechnical system it is operating within.
If your organization is navigating emergent capabilities or emergent risks, the gap between evaluated safety and real-world resilience, or the human and institutional conditions that determine whether safety holds at scale, reach out.
Safety at scale. That is what I do.