
Entanglement Learning (EL) 

A New Machine Learning Paradigm for Adaptive and Autonomous Agents

Entanglement Learning (EL) is a groundbreaking approach to reinforcement learning that revolutionizes how AI agents learn, adapt, and make decisions in complex, dynamic environments. Drawing from the principles of control and information theories, EL introduces a novel framework for quantifying and optimizing the mutual information between an agent's actions and the environment's states. By capturing the intricate dependencies and correlations in agent-environment interactions, EL enables the development of more efficient, adaptable, and resilient learning algorithms.

EL introduces a novel metric called agent-environment entanglement, which quantifies the mutual predictability between an agent's actions and the environment's states. By using entanglement as an intrinsic, universal reference set point, EL enables agents to establish an internal closed-loop feedback control mechanism. This allows agents to autonomously manage their interactions with dynamic operational environments post-deployment, adapting their behavior based on the evolving entanglement levels, with minimal reliance on external guidance and corrections.

 

Moreover, EL introduces the concept of the Information Genome (InfoGen), a task-specific representation of the learned information patterns and dependencies between an agent's actions and the environment's states. The InfoGen serves as a guiding blueprint, providing the agent with baseline estimates of probable actions in response to observed states and anticipated state changes following specific actions. By leveraging the InfoGen, agents can simulate possible state-action-next-state entanglement values and adjust their actions towards higher entanglement, leading to more effective learning and adaptation with minimal human intervention.

Entanglement Learning Hypothesis

The more complex, the more it is entangled!

The Entanglement Learning (EL) hypothesis proposes that as systems become more complex, their entanglement (the mutual predictability and interdependence between their actions and the environment's states) increases. Consequently, these highly entangled systems can channel more information from and to the environment, enabling them to become more effective at achieving their goals.

 

The EL hypothesis suggests that by quantifying system-environment entanglement, it is possible to use that value and its changes to guide and optimize systems' performance and adaptability.

[Figure: Low complexity, low entanglement]

[Figure: High complexity, high entanglement]

Entanglement Learning Basic Concept

Entanglement Learning considers the complete, closed-loop interaction between an agent and its environment, emphasizing the dual-channel communication and associated uncertainties. EL captures how the agent (left side) and the environment (right side) interact through two distinct channels: the agent channel, where actions are selected based on observed states, and the environment channel, where the environment responds with next states based on the agent's actions.

 

The agent channel captures the uncertainty and dependencies in the action-selection process, while the environment channel captures the uncertainty in the resulting state transitions. This continuous cycle of interaction highlights the mutual dependencies and information exchange, forming the basis for entanglement learning.

 

By understanding the information flow in the two channels and the associated uncertainties, we can better capture and quantify how agents adapt and learn effectively within dynamic environments.
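As a rough illustration of the two channels, the sketch below estimates the agent-channel uncertainty H(A|S) and the environment-channel uncertainty H(S'|A) from a logged sequence of interactions. The toy log, the state and action labels, and the function name `conditional_entropy` are assumptions made for this example; they are not part of the EL framework itself.

```python
from collections import Counter
from math import log2

def conditional_entropy(pairs):
    """Plug-in estimate of H(Y|X) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    marg_x = Counter(x for x, _ in pairs)
    # H(Y|X) = sum_{x,y} p(x,y) * log2( p(x) / p(x,y) )
    return sum((c / n) * log2(marg_x[x] / c) for (x, _), c in joint.items())

# Hypothetical interaction log of (state, action, next_state) triplets.
log = [("cold", "heat", "warm"), ("warm", "idle", "warm"),
       ("warm", "idle", "cold"), ("cold", "heat", "warm")]

# Agent channel: how uncertain is the action once the state is known?
h_agent = conditional_entropy([(s, a) for s, a, _ in log])
# Environment channel: how uncertain is the next state once the action is known?
h_env = conditional_entropy([(a, s2) for _, a, s2 in log])

print(f"H(A|S) = {h_agent:.3f} bits, H(S'|A) = {h_env:.3f} bits")
```

In this toy log the agent always picks the same action in a given state, so the agent channel carries no uncertainty, while the environment responds to "idle" in two different ways and so retains half a bit of uncertainty.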

Defining Entanglement Values

The core concept of Entanglement Learning (EL) is based on principles of information theory. The diagram depicts the relationships between states (S), actions (A), and next states (S') within an agent-environment interaction cycle.

 

The left section shows states (S) with their associated entropy 𝐻(𝑆), representing the uncertainty in the agent's perception. The central section shows actions (A) with entropy 𝐻(𝐴), indicating the uncertainty in action selection. The right section illustrates next states (S') with entropy 𝐻(𝑆′), reflecting the unpredictability of the environment's response. Conditional entropies 𝐻𝑎(𝐴∣𝑆) and 𝐻𝑣(𝑆′∣𝐴) measure uncertainties in action selection given a state and next state prediction given an action, respectively.

 

The mutual information terms 𝑀𝐼(𝐴;𝑆) and 𝑀𝐼(𝑆′;𝐴) capture the dependencies between states and actions, and between actions and next states, respectively. The overall mutual information 𝑀𝐼(𝑆,𝐴;𝑆′), highlighted in red, quantifies the entanglement, representing the mutual predictability and interdependency within the entire state-action-next-state loop. This comprehensive view enables EL to provide adaptive control signals, enhancing the agent's learning and decision-making processes by leveraging these information-theoretic insights.

[Figure: EL calculation diagram]

Entanglement:  ψ = MI (S,A;S')
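The quantities in the diagram are tied together by standard information-theoretic identities. The block below writes them out, assuming that H_a and H_v are ordinary conditional entropies and that the subscripts only label the agent and environment channels.

```latex
\begin{align}
MI(A;S)            &= H(A)  - H_a(A \mid S) \\
MI(S';A)           &= H(S') - H_v(S' \mid A) \\
\psi = MI(S,A;S')  &= H(S') - H(S' \mid S,A) \\
                   &= MI(S';A) + MI(S;S' \mid A)
\end{align}
```

The last line is the chain rule for mutual information. Since the conditional term MI(S;S'|A) is non-negative, ψ is never smaller than MI(S';A), and the two coincide when the next state depends on the current state only through the chosen action.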

Entanglement Learning Architecture 

The architecture of an Entanglement Learning (EL) enabled RL agent introduces an internal reference or a control set point, which enables the agent to autonomously adjust its behavior. Central to this architecture is the Semantic Matrix, containing the Information Genome (InfoGen), which calculates and stores entanglement metrics. These metrics serve as the intrinsic performance reference, allowing the agent to measure deviations from its optimal behavior and generate adaptive control signals through the Entanglement Controller (EC). This feedback mechanism distinguishes EL from other reinforcement learning approaches by enabling continuous, real-time adjustments to the agent's actions and policies based on the evolving state of the environment.

 

The EL-Agent interacts with the environment by observing states and executing actions, forming a closed-loop system. The entanglement metrics derived from the agent's interaction with the environment ensure that the agent can dynamically adapt to changing conditions, enhancing its autonomy and decision-making capabilities. This intrinsic feedback loop, powered by the EL framework, is intended to improve the agent's ability to maintain optimal performance in complex and dynamic environments after deployment.

EL Agents and Information/Human Digital Twins 

Entanglement Learning (EL) enabled Reinforcement Learning (RL) agents form the foundation for the development of Information Digital Twins and Human Digital Twins. These digital twins are advanced AI-based agents that leverage the principles of EL to capture, understand, and optimize the complex interactions and dynamics between systems or humans and their environments.

 

By incorporating EL-enabled RL agents, Information and Human Digital Twins can effectively learn, adapt, and make informed decisions in real-time, based on the evolving entanglement between the system or human they support and their surroundings. This enables these digital twins to serve as powerful tools for simulation, prediction, and optimization, across a wide range of domains, from industrial processes and supply chain management to healthcare and personalized services.

Entanglement Learning Algorithm

The following pseudo-algorithm illustrates the operational steps of an Entanglement Learning (EL) based agent, which uses the Information Genome (InfoGen) to optimize its decisions as it adapts to its environment:

  1. Initialization—Load the Information Genome (InfoGen) with historical or end-of-learning phase data.

  2. State Observation—The agent observes the current state from the environment.

  3. Action Selection—The agent's policy suggests an action based on the observed state.

  4. Expected EL Calculation—Retrieve expected next states and their entanglement values from InfoGen for the recommended action.

  5. Action Execution—The agent executes the recommended action.

  6. Action Update—Update action selection towards higher entanglement.

  7. State Transition and Entanglement Evaluation—Observe the actual next state and calculate the change in entanglement and asymmetry from InfoGen.

  8. Next State Analysis—InfoGen provides possible subsequent actions and their entanglement values for the observed next state.

  9. Entanglement Optimization—Choose the next action to compensate for the entanglement change, either increasing overall entanglement or restoring asymmetry.

  10. RL Agent Update—Update the policy and/or environment model based on entanglement changes to improve decision-making.

  11. InfoGen Update—Update the InfoGen entropies based on actual states and actions.

  12. Repeat—The cycle repeats with the new state and selected action, continually adjusting based on new data and entanglement metrics.

[Figure: EL algorithm diagram]

We know how to complete steps 1 to 5. However, we still need to understand how to complete steps 6 to 12, which is the focus of our current research.
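Steps 1 to 5 can be sketched in a few lines of Python. Everything named here is hypothetical: the `InfoGen` class is modelled as a plain table of (state, action, next state) counts, the "entanglement value" of a candidate next state is assumed to be its pointwise contribution log2 p(s'|s,a) / p(s') to MI(S,A;S'), and `env_reset`, `env_step`, and `policy` are placeholder callables. Steps 6 to 12 are deliberately left out, since they remain open.

```python
from collections import Counter, defaultdict
from math import log2

class InfoGen:
    """Hypothetical InfoGen: empirical counts of (state, action, next_state)."""

    def __init__(self, triplets):
        self.n = len(triplets)
        self.next_given_sa = defaultdict(Counter)   # (s, a) -> counts of s'
        self.next_marginal = Counter()              # s' -> count
        for s, a, s2 in triplets:
            self.next_given_sa[(s, a)][s2] += 1
            self.next_marginal[s2] += 1

    def expected_next(self, state, action):
        """Return {s': (p(s'|s,a), pointwise contribution to psi in bits)}."""
        counts = self.next_given_sa.get((state, action), Counter())
        total = sum(counts.values())
        out = {}
        for s2, c in counts.items():
            p_cond = c / total
            p_marg = self.next_marginal[s2] / self.n
            out[s2] = (p_cond, log2(p_cond / p_marg))
        return out

def run_steps_1_to_5(history, env_reset, env_step, policy):
    infogen = InfoGen(history)                       # 1. Initialization
    state = env_reset()                              # 2. State observation
    action = policy(state)                           # 3. Action selection
    expected = infogen.expected_next(state, action)  # 4. Expected EL calculation
    # 5. Action execution; the returned next state is what step 7 would observe.
    next_state = env_step(state, action)
    return state, action, expected, next_state

# Toy environment (an assumption): a bit that "flip" inverts and "stay" keeps.
history = [(0, "stay", 0), (0, "flip", 1), (1, "stay", 1), (1, "flip", 0)]
state, action, expected, next_state = run_steps_1_to_5(
    history,
    env_reset=lambda: 0,
    env_step=lambda s, a: 1 - s if a == "flip" else s,
    policy=lambda s: "flip",
)
print(state, action, expected, next_state)
```

A count table is the simplest possible stand-in and only works for small discrete state and action spaces; a state-action pair that never appears in the history simply yields an empty set of expectations.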

Join Us in Advancing Adaptive AI 

At this stage, we have developed and understood the initial steps (1-5) of the Entanglement Learning (EL) algorithm. The subsequent steps (6-12), involving how entanglement metrics are translated into control signals, are still under investigation. We invite researchers and practitioners with expertise in reinforcement learning, information theory, and adaptive systems to join our collaborative effort. Your contribution will be pivotal in advancing the development of a fully integrated EL framework, enhancing the adaptive capabilities of RL agents.

 

If you are interested in contributing to this cutting-edge research, please reach out to us to discuss potential collaboration opportunities. Together, we can push the boundaries of intelligent systems and their applications.

Entanglement Metrics

Entanglement (ψ):   

ψ = MI (S,A;S') 

 

Entanglement (ψ) is defined as the mutual information between the state-action pair (S, A) and the next state S', computed over state, action, next-state triplets (the entanglement tokens). It quantifies the overall level of interdependence and correlation between the agent's actions and the environment's states. A higher value of ψ indicates stronger entanglement, suggesting that the agent's actions are more closely coupled with the environment's dynamics.
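For discrete states and actions, ψ can be estimated directly from logged triplets. The sketch below uses a simple plug-in (frequency-count) estimator, which is one possible choice rather than a method prescribed by EL; the function names and the toy data are assumptions for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of MI(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # MI = sum_{x,y} p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def entanglement(triplets):
    """psi = MI(S,A;S'): the (state, action) pair is treated as one variable."""
    return mutual_information([((s, a), s2) for s, a, s2 in triplets])

# Hypothetical toy log: a deterministic environment where (s, a) fixes s'.
coupled = [(0, "stay", 0), (0, "flip", 1), (1, "stay", 1), (1, "flip", 0)]
# Hypothetical toy log: the next state ignores both state and action.
uncoupled = [(0, "a", 0), (0, "a", 1), (1, "b", 0), (1, "b", 1)]

psi_coupled = entanglement(coupled)
psi_uncoupled = entanglement(uncoupled)
print(f"coupled: {psi_coupled:.3f} bits, uncoupled: {psi_uncoupled:.3f} bits")
```

Plug-in estimates of mutual information are biased upward when the sample is small relative to the number of distinct triplets, so values from short logs should be read with caution.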

 

Differential Entanglement (Δψ):   

Δψ = ψ(t) - ψ(t-1) 

Differential Entanglement (Δψ) measures the change in entanglement between consecutive time steps. It is calculated as the difference between the entanglement at the current time step, ψ(t), and the entanglement at the previous time step, ψ(t-1). A positive value of Δψ indicates an increase in entanglement, while a negative value suggests a decrease. Monitoring Δψ helps to understand the dynamics of entanglement over time and identify significant changes in the agent-environment interaction.
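The text does not specify how ψ(t) is estimated at a single time step. One possible reading, used in the sketch below, is to estimate ψ over a sliding window of the most recent triplets and take the difference between consecutive windows. The window length, the function names, and the toy stream are all assumptions.

```python
from collections import Counter, deque
from math import log2

def psi(triplets):
    """Plug-in estimate of MI(S,A;S') in bits over a collection of triplets."""
    n = len(triplets)
    joint = Counter(((s, a), s2) for s, a, s2 in triplets)
    p_sa = Counter((s, a) for s, a, _ in triplets)
    p_s2 = Counter(s2 for _, _, s2 in triplets)
    return sum((c / n) * log2(c * n / (p_sa[sa] * p_s2[s2]))
               for (sa, s2), c in joint.items())

def differential_entanglement(stream, window=4):
    """Yield (t, psi_t, delta_psi) for each step after the first full window."""
    recent = deque(maxlen=window)   # keeps only the latest `window` triplets
    prev = None
    for t, triplet in enumerate(stream):
        recent.append(triplet)
        if len(recent) < window:
            continue
        current = psi(recent)
        if prev is not None:
            yield t, current, current - prev
        prev = current

# Hypothetical stream: "flip" works reliably for four steps, then fails once.
stream = [(0, "flip", 1), (1, "flip", 0), (0, "flip", 1), (1, "flip", 0),
          (0, "flip", 0), (0, "flip", 1)]
results = list(differential_entanglement(stream))
for t, psi_t, delta in results:
    print(f"t={t}  psi={psi_t:.3f}  delta_psi={delta:+.3f}")
```

In this stream the first Δψ is negative, because the window that includes the failed flip is less predictable than the one before it. A shorter window reacts faster to such changes but gives noisier estimates.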

 

Entanglement Asymmetry (Λψ):   

Λψ = MI (A;S) - MI (S';A)

 

Entanglement Asymmetry (Λψ) quantifies the difference between the mutual information between agent actions and environment states, MI (A;S), and the mutual information between subsequent environment states and agent actions, MI (S';A). It measures the asymmetry or directionality of the information flow between the agent and the environment. A positive value of Λψ indicates that the agent's actions are more influenced by the environment states, while a negative value suggests that the agent's actions have a stronger impact on shaping the future environment states. Λψ provides insights into the balance and directionality of the agent-environment interaction.

Where: 

  • MI (A;S) is the mutual information between agent actions (A) and environment states (S).

  • MI (S';A) represents the mutual information between subsequent environment states (S') and agent actions (A).

  • ψ(t) and ψ(t-1) represent the entanglement at the current and previous time steps, respectively.
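The asymmetry can be computed from the same logged triplets by estimating the two mutual information terms separately. This is a minimal sketch with a plug-in estimator; the function names and the toy log are assumptions for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Plug-in estimate of MI(X;Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def entanglement_asymmetry(triplets):
    """Lambda_psi = MI(A;S) - MI(S';A), estimated from (s, a, s') triplets."""
    mi_agent = mutual_information([(s, a) for s, a, _ in triplets])   # MI(A;S)
    mi_env = mutual_information([(a, s2) for _, a, s2 in triplets])   # MI(S';A)
    return mi_agent - mi_env

# Hypothetical log: the action is fully determined by the state,
# but the next state does not depend on the action at all.
log = [(0, "a", 0), (0, "a", 1), (1, "b", 0), (1, "b", 1)]
asymmetry = entanglement_asymmetry(log)
print(f"Lambda_psi = {asymmetry:+.3f} bits")
```

Here the result is positive: the states drive the actions completely, while the actions tell us nothing about the next state, which matches the interpretation of a positive Λψ given above.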

These three EL metrics, namely Entanglement (ψ), Differential Entanglement (Δψ), and Entanglement Asymmetry (Λψ), provide a comprehensive framework for understanding and quantifying the dynamics of agent-environment interactions in Entanglement Learning. By monitoring and optimizing these metrics, EL aims to develop agents that can effectively adapt to changing conditions, make intelligent decisions, and achieve robust performance in complex environments.
