meditator | yogi | casual naïve artist | self-taught
amateur handbalancer & contortionist-in-progress
earthling | hooooman | occasionally plant


๐Ÿ‘๏ธโ€๐Ÿ—จ๏ธ favorite piece of contemplative wisdom ๐Ÿ‘๏ธโ€๐Ÿ—จ๏ธ

"If your mind is empty...it is open to everything.
In the beginner's mind there are many possibilities, but in the expert's mind there are few."
—Shunryu Suzuki, “Zen Mind, Beginner's Mind”

Hi 🖖🏽

Don't hesitate to reach out if you want to chat about research: Schedule a meeting

Bio

Broadly, I am interested in the study of cognition in humans 🧠 and machines 🤖, aiming both to advance scientific knowledge and understanding, and to support the development of technology that enhances human health and well-being and minimizes suffering.

My research spans Reinforcement Learning (RL) and sequential decision-making, policy optimization, deep learning, and theoretical & computational neuroscience.

I use tools and perspectives from mathematics, statistics, optimization, machine/deep learning, numerical methods, and dynamical systems, as well as knowledge and insights from psychopharmacology.

Research questions

🤖 AI: RL, policy optimization, learning theory, deep & lifelong learning, dynamical systems
🧠 neuroscience: the study and understanding of the mechanisms underlying sensory, motor, or cognitive computations; neuroplasticity; psychedelic research
Fan of research mysteries 🔮 and fundamental questions 🦄

I appreciate research that offers a new understanding, insight, or connection across branches of research or fields, pointing toward a unifying truth.

Interests
  • reinforcement learning/policy optimization
  • deep/online/lifelong learning
  • computational neuroscience
  • psychedelic research
  • altered states of consciousness
  • causality
Education
  • PhD in Computer Science (AI/RL/neuro), 2019–present

    McGill University / Mila Quebec AI Institute

  • M.Sc. in Computer Science (AI), 2013

    University Politehnica of Bucharest

  • B.Sc. in Computer Science (math/CS), 2009

    University Politehnica of Bucharest

Research overview

I'm currently focusing on 🤖 acceleration for policy optimization in RL and 🧠 computational models of psychedelic action.

Past research chapters

I previously explored the idea of anticipating the future and adapting to it, and proposed a simple template for accelerating policy gradient algorithms by integrating foresight into the policy improvement step, via optimistic and adaptive policy updates. I defined optimism as predictive modeling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate the errors that accumulate from overshooting predictions or delayed responses to change. Currently, I am investigating acceleration within the general family of Policy Mirror Descent (PMD) algorithms, which covers a wide range of fundamental and novel methods in reinforcement learning.
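
As background, here is a minimal sketch of the generic PMD update, in its standard form from the literature (not the specific accelerated variant I study): at iteration k, the policy at each state s is improved by a regularized greedy step on its current action values,

    π_{k+1}(·|s) = argmax_{p ∈ Δ(A)} [ η_k ⟨Q^{π_k}(s,·), p⟩ − D(p, π_k(·|s)) ]

where Δ(A) is the probability simplex over actions, η_k is a step size, and D is a Bregman divergence; choosing D to be the KL divergence recovers natural-policy-gradient-style updates as a special case.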

I also studied policy optimization as a joint-maximization problem, and worked on a surrogate policy learning objective for the joint maximization of a policy and its value function. Practical implementations of policy-based algorithms rely on value functions, represented as neural networks, to compute the policy gradient; this introduces challenges related to the accuracy of the policy gradient, particularly in the low-capacity regimes characteristic of agents with bounded rationality.
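
For context, the dependence runs through the standard actor-critic form of the policy gradient (notation mine): the true action values are replaced by a learned critic Q_w,

    ∇_θ J(π_θ) ≈ E_{s∼d^{π_θ}, a∼π_θ(·|s)} [ ∇_θ log π_θ(a|s) · Q_w(s,a) ]

so any approximation error in Q_w, e.g. from a low-capacity network, directly biases the policy update; this is one motivation for treating the policy and its value function as a joint optimization problem.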

Previously, I studied credit assignment in value-based agents, focusing on how agents should model the environments they interact with, whether prospectively, using forethought, or retrospectively, using hindsight models and backward-looking mechanisms for adaptivity.

I then showed how these models can be extended to include selectivity via simple contextual attention-based mechanisms, and how such mechanisms can be learned from experience.

Traditionally, in machine learning, we often assume that the stream of future data will resemble the data seen so far, yet this assumption may not hold in the complexity of real-world settings, where the dynamics of the environment evolve in partially predictable ways over time and space. I have explored some of these challenges and proposed incorporating predictive knowledge of the agent's future behavior to mitigate them.

Industry experience

Research Scientist Intern 🌀 DeepMind Montreal
April 2022 – April 2023

Research Scientist Intern 🌀 DeepMind London
February 2021 – June 2021

Machine Learning Engineer 🌀 Apsisware (’17–’18) / Sparktech Software (’16)
January 2016 – January 2018, Bucharest, Romania

Software Engineer 🌀 Deutsche Bank (’15) / Misys (’14) / Cronian Labs (’13)
January 2013 – January 2015, Bucharest, Romania

Academic awards

IVADO's PhD Excellence Scholarship
Borealis AI Fellowship
Excellence Scholarship
Merit Scholarship

Recent Publications & Preprints

(2020). Lambda Successor Return Error. Biological and Artificial Reinforcement Learning Workshop at Neural Information Processing Systems (NeurIPS).

(2019). Option Discovery by Aiming to Predict. Proceedings of Reinforcement Learning and Decision Making (RLDM); Multi-Task and Lifelong Reinforcement Learning Workshop and Self-Supervised Learning Workshop at the International Conference on Machine Learning (ICML).
