Authors: Zian Li, Cai Zhou, Xiyuan Wang, Xingang Peng, Muhan Zhang
Abstract: Recent advancements in molecular generative models have demonstrated
substantial potential in accelerating scientific discovery, particularly in
drug design. However, these models often face challenges in generating
high-quality molecules, especially in conditional scenarios where specific
molecular properties must be satisfied. In this work, we introduce GeoRCG, a
general framework to enhance the performance of molecular generative models by
integrating geometric representation conditions. We decompose the molecule
generation process into two stages: first, generating an informative geometric
representation; second, generating a molecule conditioned on the
representation. Compared to directly generating a molecule, the relatively
easy-to-generate representation from the first stage guides the second-stage
generation to reach a high-quality molecule in a more goal-oriented and much
faster way. Leveraging EDM as the base generator, we observe significant
quality improvements in unconditional molecule generation on the widely used
QM9 and GEOM-DRUG datasets. More notably, in the challenging conditional
molecular generation task, our framework achieves an average 31% performance
improvement over state-of-the-art approaches, highlighting the superiority of
conditioning on semantically rich geometric representations over conditioning
on individual property values as in previous approaches. Furthermore, we show
that, with such representation guidance, the number of diffusion steps can be
reduced to as few as 100 while maintaining generation quality superior to
that achieved with 1,000 steps, thereby significantly accelerating the
generation process.
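
The two-stage decomposition described in the abstract can be illustrated with a
minimal Python sketch. This is not the authors' implementation: both samplers
are toy stand-ins (Gaussian draws and a simple pull toward a target), and the
names sample_representation, sample_molecule, and generate, along with every
parameter value other than the 100-step count, are assumptions made purely for
illustration.

```python
import random


def sample_representation(dim, rng):
    """Stage 1 (toy stand-in): draw a representation vector.

    A real system would sample from a learned distribution over
    geometric representations; here we simply use a standard normal.
    """
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sample_molecule(rep, n_atoms, n_steps, rng):
    """Stage 2 (toy stand-in): refine 3D coordinates conditioned on rep.

    Hypothetical conditioning rule: every atom is pulled toward a target
    point built from the first three entries of the representation,
    with injected noise that shrinks to zero over the steps.
    """
    target = rep[:3]
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]
    for step in range(n_steps):
        noise_scale = 1.0 - (step + 1) / n_steps
        coords = [
            [c + 0.1 * (t - c) + 0.01 * noise_scale * rng.gauss(0.0, 1.0)
             for c, t in zip(atom, target)]
            for atom in coords
        ]
    return coords


def generate(n_atoms=5, rep_dim=8, n_steps=100, seed=0):
    """Run stage 1, then stage 2 conditioned on the stage-1 output."""
    rng = random.Random(seed)
    rep = sample_representation(rep_dim, rng)
    mol = sample_molecule(rep, n_atoms, n_steps, rng)
    return rep, mol


if __name__ == "__main__":
    rep, mol = generate()
    print(len(rep), len(mol), len(mol[0]))
```

The point of the sketch is only the control flow: the second sampler never sees
a raw property value, just the representation produced by the first sampler.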
Source: http://arxiv.org/abs/2410.03655v1