research/design-protein

Design complex proteins de novo

Public
855 runs

Run time and cost

This model costs approximately $0.00022 to run on Replicate, or 4545 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 1 second.

Readme

FractalRL Computational Architecture: A Geometric Paradigm for Molecular Design

  1. Introduction and Computational Paradigm

The FractalRL architecture is an advanced computational paradigm that combines principles from fractal geometry, deep learning, and reinforcement learning. It is designed to address complex design problems, particularly in rapidly evolving pathological contexts. The founding idea is to integrate, within a single coherent pipeline, models capable of capturing the intrinsic self-similarity of biological systems, molecular generation mechanisms based on geometry-aware diffusion processes, and adaptive exploration strategies that maximize the evolutionary robustness of the identified solutions. More than an incremental advance, FractalRL represents a paradigm shift: protein sequences are treated not as discrete strings but as continuous objects on a Riemannian manifold, where biological function emerges from geometric properties.

  2. Theoretical Foundations and Data Representation

The framework rests on two mathematical pillars: fractal theory, which describes self-similar structures across scales, and discrete differential geometry, which quantifies the informational curvature of biological networks through discrete Ricci curvature formulations. These are complemented by optimal transport theory, used to define meaningful distances between distributions of molecular states. Together, these tools make it possible to build a latent representation that preserves the scale, topology, and symmetry properties of biochemical structures.
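For intuition only (this is an assumption on our part, not the model's released code), the discrete Ricci curvature mentioned above can be approximated with Ollivier's definition: the curvature of an edge is one minus the optimal-transport (W1) distance between the random-walk measures of its endpoints, divided by the edge length. A minimal sketch with networkx and SciPy, assuming uniform lazy random-walk measures and unit-length edges:

```python
# Hypothetical sketch: Ollivier-Ricci curvature on a small graph via optimal transport.
# Assumes uniform lazy random-walk measures and unit-length edges; not the model's code.
import numpy as np
import networkx as nx
from scipy.optimize import linprog

def node_measure(G, x, alpha=0.5):
    """Lazy random-walk measure: mass alpha at x, the rest spread over neighbours."""
    nbrs = list(G.neighbors(x))
    support = [x] + nbrs
    probs = [alpha] + [(1.0 - alpha) / len(nbrs)] * len(nbrs)
    return support, np.array(probs)

def wasserstein1(G, supp_a, a, supp_b, b):
    """W1 distance between two discrete measures, solved as a small transport LP."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    m, n = len(supp_a), len(supp_b)
    cost = np.array([[dist[u][v] for v in supp_b] for u in supp_a], dtype=float)
    # Equality constraints: row sums of the transport plan = a, column sums = b.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun

def ollivier_ricci(G, x, y, alpha=0.5):
    """kappa(x, y) = 1 - W1(mu_x, mu_y) / d(x, y) for an edge (x, y)."""
    sa, a = node_measure(G, x, alpha)
    sb, b = node_measure(G, y, alpha)
    return 1.0 - wasserstein1(G, sa, a, sb, b) / nx.shortest_path_length(G, x, y)

if __name__ == "__main__":
    G = nx.cycle_graph(6)
    print(ollivier_ricci(G, 0, 1))  # curvature of one edge of a 6-cycle
```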

At the center of the system is a block-based encoder, FractalNet, capable of processing a molecule at multiple resolutions simultaneously. During processing, the network computes the local fractal dimension with a box-counting scheme and uses it to modulate the network paths according to local geometric complexity. The result is a latent embedding whose information density reflects the multifractality of the sample, preserving spatial patterns that conventional architectures tend to miss.
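As a rough illustration of the box-counting idea (a sketch under our own assumptions, not the FractalNet encoder itself), the fractal dimension of a 3D point cloud such as atomic coordinates can be estimated as the slope of log N(ε) versus log(1/ε):

```python
# Illustrative sketch (not the FractalNet implementation): box-counting estimate
# of the fractal dimension of a 3D point cloud such as atomic coordinates.
import numpy as np

def box_counting_dimension(points, scales=(1.0, 0.5, 0.25, 0.125)):
    """Estimate fractal dimension as the slope of log N(eps) vs log(1/eps)."""
    points = np.asarray(points, dtype=float)
    points = points - points.min(axis=0)              # shift into the positive octant
    counts = []
    for eps in scales:
        boxes = np.floor(points / eps).astype(int)    # assign each point to a box
        counts.append(len(np.unique(boxes, axis=0)))  # N(eps) = number of occupied boxes
    # Linear fit: log N(eps) ~ D * log(1/eps) + c
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.random((2000, 3))                     # space-filling cloud -> D close to 3
    print(round(box_counting_dimension(cloud), 2))
```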

  3. Generative Process: Diffusion and Reinforcement Learning

The generative phase uses a dual approach that combines diffusion models and a reinforcement learning agent.

Geometry-Aware Diffusion: Generation exploits multiscale denoising models in which the noise coefficient is dynamically adapted according to the Ricci curvature computed on the molecular graph. In regions of high positive curvature (topologically central nodes), the perturbation is reduced to safeguard critical interactions, while where the curvature is nearly flat, the perturbation is amplified to explore alternative isomers.
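One plausible way to realize this coupling (an illustrative assumption, not the released implementation) is to map each node's curvature to a per-node noise scale and apply it during a single denoising step:

```python
# Hypothetical sketch: curvature-modulated noise for one diffusion step.
# kappa > 0 (topologically central nodes) -> damped noise; kappa ~ 0 -> amplified noise.
import numpy as np

def curvature_noise_scale(kappa, base_sigma=1.0, damp=0.5, boost=1.5, flat_tol=0.05):
    """Map per-node Ricci curvature to a per-node noise standard deviation."""
    kappa = np.asarray(kappa, dtype=float)
    scale = np.full_like(kappa, base_sigma)
    scale[kappa > flat_tol] *= damp                 # protect high-curvature regions
    scale[np.abs(kappa) <= flat_tol] *= boost       # explore where the geometry is flat
    return scale

def noisy_step(x, kappa, rng):
    """Add curvature-aware Gaussian noise to node features x (n_nodes x d)."""
    sigma = curvature_noise_scale(kappa)[:, None]
    return x + sigma * rng.standard_normal(x.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((5, 8))                 # 5 nodes, 8-dim features
    kappa = np.array([0.4, 0.02, -0.3, 0.0, 0.1])
    print(noisy_step(x, kappa, rng).shape)
```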

Exploration through Reinforcement Learning: The FractalRL agent interacts with the latent space through hierarchical policies. To avoid premature convergence toward local optima, candidate diversity is enforced with Determinantal Point Processes (DPPs), which maximize the separation between candidate actions. The reward function is a multi-objective functional that balances efficacy, toxicity, and resistance to mutations.
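As a sketch under our own assumptions (the objective names, weights, and RBF kernel are placeholders), diverse candidates can be selected with a greedy MAP approximation of a DPP, and the reward can be expressed as a weighted combination of the three objectives:

```python
# Hypothetical sketch: greedy DPP-style selection of diverse candidates plus a
# weighted multi-objective reward. Objective names and weights are placeholders.
import numpy as np

def greedy_dpp_select(embeddings, k, bandwidth=1.0):
    """Greedily pick k items that approximately maximize the determinant of
    their RBF similarity kernel (a standard DPP MAP heuristic)."""
    X = np.asarray(embeddings, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    L = np.exp(-d2 / (2.0 * bandwidth ** 2))        # RBF similarity kernel
    selected = [int(np.argmax(np.diag(L)))]
    while len(selected) < k:
        best, best_gain = None, -np.inf
        for i in range(len(X)):
            if i in selected:
                continue
            idx = selected + [i]
            gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]  # log det of the submatrix
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

def reward(efficacy, toxicity, mutation_resistance, w=(1.0, 0.5, 0.8)):
    """Multi-objective reward: reward efficacy and robustness, penalize toxicity."""
    return w[0] * efficacy - w[1] * toxicity + w[2] * mutation_resistance

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    cands = rng.standard_normal((50, 16))
    print(greedy_dpp_select(cands, k=5))
    print(reward(0.9, 0.1, 0.7))
```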

  4. Adaptive Learning and Improvement Mechanisms

To cope with the dynamism of contexts such as evolving diseases, the framework integrates a meta-learning layer based on MAML (Model-Agnostic Meta-Learning). When new pathogenic variants emerge, the system performs rapid adaptation cycles (inner loop), updating parameters with high plasticity. This strategy reduces the risk of catastrophic forgetting and allows the system to readapt from a small number of examples. Each design cycle, including failures, generates valuable data that is fed back into the system, creating a self-reinforcing improvement loop.
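A minimal sketch of a MAML-style inner loop, assuming PyTorch 2.x and a placeholder regression loss on a few support examples from a new variant (not the production adaptation code):

```python
# Minimal MAML inner-loop sketch (placeholder task and loss, not the released code):
# a few gradient steps on new-variant examples produce task-adapted parameters
# without overwriting the meta-learned initialization.
import torch
import torch.nn as nn
import torch.nn.functional as F

def inner_loop_adapt(model, x_support, y_support, steps=3, lr=1e-2):
    """Return task-adapted parameters after a few gradient steps on support data."""
    names = [n for n, _ in model.named_parameters()]
    params = [p.clone() for p in model.parameters()]
    for _ in range(steps):
        preds = torch.func.functional_call(model, dict(zip(names, params)), (x_support,))
        loss = F.mse_loss(preds, y_support)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        params = [p - lr * g for p, g in zip(params, grads)]  # keep graph for the outer loop
    return params

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
    x, y = torch.randn(16, 8), torch.randn(16, 1)
    adapted = inner_loop_adapt(model, x, y)
    print(len(adapted), adapted[0].shape)
```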

  5. Operational Pipeline and Scalability

The operational flow of the system is divided into four main phases; a toy orchestration sketch follows them:

Preprocessing: Integration of multi-omic datasets and construction of biological networks annotated with Ricci curvatures.

Training: Distributed training on GPU clusters, using hierarchical sampling and Bayesian optimization.

Generation: The diffusion models produce thousands of candidates, which the RL agent filters according to its diversity-promoting policy.

Validation: An in silico and in vitro validation cycle feeds its results back into the meta-learning system, establishing continuous improvement.
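The following skeleton illustrates how the four phases might be chained, with validation feedback closing the loop; every phase body is a placeholder rather than the production pipeline:

```python
# Illustrative skeleton of the four-phase flow (all phase bodies are placeholders,
# not the production pipeline): preprocess -> train -> generate -> validate,
# with validation results fed back for meta-learning.
from dataclasses import dataclass, field

@dataclass
class DesignCycle:
    networks: list = field(default_factory=list)    # curvature-annotated graphs
    model: str = ""                                 # handle to the trained generator/agent
    candidates: list = field(default_factory=list)  # generated molecules
    feedback: list = field(default_factory=list)    # validation outcomes

def preprocess(omics_datasets):
    """Integrate multi-omic data and build curvature-annotated networks."""
    return DesignCycle(networks=[f"network-{i}" for i, _ in enumerate(omics_datasets)])

def train(cycle):
    """Stand-in for distributed training with hierarchical sampling / Bayesian optimization."""
    cycle.model = "trained-generator-and-agent"
    return cycle

def generate(cycle, n_candidates=1000):
    """Diffusion proposes candidates; the RL agent keeps a diverse subset."""
    cycle.candidates = [f"candidate-{i}" for i in range(n_candidates)]
    return cycle

def validate(cycle):
    """In silico / in vitro results flow back to the meta-learning stage."""
    cycle.feedback = [(c, "pass") for c in cycle.candidates[:10]]
    return cycle

if __name__ == "__main__":
    cycle = validate(generate(train(preprocess(omics_datasets=["omics-A", "omics-B"]))))
    print(len(cycle.candidates), len(cycle.feedback))
```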

From the point of view of scalability, the architecture achieves a computational complexity of O(n log n) thanks to hierarchical processing and sparse curvature computation, a marked improvement over traditional methods with O(n³) complexity. Initial screening employs rapid fractal-based filtering to identify the most promising candidates, achieving throughput improvements of three orders of magnitude.
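As an illustration of such a pre-filter (the target dimension band and scoring are our own assumptions), candidates can be ranked by how close their precomputed fractal dimension lies to a target band; the dominant cost is the sort, which is itself O(n log n):

```python
# Hypothetical sketch of a rapid fractal pre-filter: rank candidates by how far
# their (precomputed) fractal dimension lies from a target band and keep the top k.
# Target band and tie-breaking are illustrative assumptions.
import numpy as np

def fractal_prefilter(fractal_dims, k, target=2.4, band=0.3):
    """Return indices of the k candidates whose fractal dimension is closest to the band."""
    d = np.asarray(fractal_dims, dtype=float)
    penalty = np.maximum(np.abs(d - target) - band, 0.0)      # zero inside the band
    order = np.argsort(penalty + 1e-3 * np.abs(d - target))   # O(n log n); ties lean toward target
    return order[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    dims = rng.uniform(1.5, 3.0, size=10_000)                 # one dimension per candidate
    keep = fractal_prefilter(dims, k=100)
    print(len(keep), dims[keep][:5])
```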

Copyright (C) 2025 (Lorenzo Bernardini Neri) D-AI Research and Research Partners - All Rights Reserved

This source code is provided for viewing and educational purposes only.

You may not use, copy, modify, reproduce, or distribute this software, in whole or in part, without the express written permission of the copyright holder.

All commercial rights, including but not limited to patent rights, are exclusively reserved by the copyright holder. Unauthorized use is strictly prohibited and may be subject to legal action.