License: arXiv.org perpetual non-exclusive license
arXiv:2402.06772v1 [q-bio.QM] 09 Feb 2024

Retrosynthesis Prediction via
Search in (Hyper) Graph

Zixun Lan11{}^{1}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT   Binjie Hong22{}^{2}start_FLOATSUPERSCRIPT 2 end_FLOATSUPERSCRIPT   Jiajun Zhu33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT   Zuo Zeng44{}^{4}start_FLOATSUPERSCRIPT 4 end_FLOATSUPERSCRIPT   Zhenfu Liu44{}^{4}start_FLOATSUPERSCRIPT 4 end_FLOATSUPERSCRIPT   Limin Yu33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT   Fei Ma1*1{}^{1*}start_FLOATSUPERSCRIPT 1 * end_FLOATSUPERSCRIPT
11{}^{1}start_FLOATSUPERSCRIPT 1 end_FLOATSUPERSCRIPT Department of Applied Mathematics, Xi’an Jiaotong-Liverpool University
22{}^{2}start_FLOATSUPERSCRIPT 2 end_FLOATSUPERSCRIPT Department of Computer Science, The University of Manchester
33{}^{3}start_FLOATSUPERSCRIPT 3 end_FLOATSUPERSCRIPT Electrical and Electronic Engineering, Xi’an Jiaotong-Liverpool University
44{}^{4}start_FLOATSUPERSCRIPT 4 end_FLOATSUPERSCRIPT Hours Technology Co., Ltd., Suzhou, China
{zixun.lan19, jiajun.zhu22}@student.xjtlu.edu.cn, [email protected],
{zuo.zeng, liuzhenfu}@woshikeji.net, {limin.yu, fei.ma}@xjtlu.edu.cn
*{}^{*}start_FLOATSUPERSCRIPT * end_FLOATSUPERSCRIPTcorresponding author
Abstract

Predicting reactants from a specified core product stands as a fundamental challenge within organic synthesis, termed retrosynthesis prediction. Recently, semi-template-based methods and graph-edits-based methods have achieved good performance in terms of both interpretability and accuracy. However, due to their mechanisms these methods cannot predict complex reactions, e.g., reactions with multiple reaction center or attaching the same leaving group to more than one atom. In this study we propose a semi-template-based method, the Retrosynthesis via Search in (Hyper) Graph (RetroSiG) framework to alleviate these limitations. In the proposed method, we turn the reaction center identification and the leaving group completion tasks as tasks of searching in the product molecular graph and leaving group hypergraph respectively. As a semi-template-based method RetroSiG has several advantages. First, RetroSiG is able to handle the complex reactions mentioned above by its novel search mechanism. Second, RetroSiG naturally exploits the hypergraph to model the implicit dependencies between leaving groups. Third, RetroSiG makes full use of the prior, i.e., one-hop constraint. It reduces the search space and enhances overall performance. Comprehensive experiments demonstrated that RetroSiG achieved competitive results. Furthermore, we conducted experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments verified the efficacy of specific elements, such as the one-hop constraint and the leaving group hypergraph.

Index Terms:
Retrosynthesis, Reinforcement Learning, Semi-Template-Based Method.

I Introduction

The retrosynthesis prediction aims to discern the appropriate reactants necessary for synthesizing a designated product molecule. It was initially formalized by [1] and has since evolved into a foundational issue within the realm of organic chemistry. The challenge lies in the vastness of the search space, encompassing a multitude of potential transformations, as noted by [2]. Thus, there has been a quest for a range of computer algorithms aimed at supporting seasoned chemists in their endeavors. Among them, machine learning based methods have played a vital role and achieved significant progress in this area recently [3].

The template-based and template-free methods have their own set of advantages and disadvantages. The template-based methodologies [3, 4, 5, 6, 7] tackle retrosynthesis prediction as a template retrieval problem. Following template retrieval, these approaches utilize cheminformatics tools like RDKit [8] to construct full reactions. Despite their high interpretability and assurance of molecule validity, they cannot predict reactions outside the template library. In contrast, the template-free approaches leverage deep generative models to produce reactants based on a given product directly. Since molecules can be represented using both graphs and SMILES sequences, existing techniques reframe retrosynthesis as either a sequence-to-sequence problem [9, 10, 11, 12, 13] or a graph-to-sequence problem [14]. Although these generative methods facilitate chemical reasoning within a more expansive reaction space, they lack interpretability.

The semi-template approaches effectively leverage the advantages inherent in generative models along with established chemical knowledge to enhance their methodology. Typical frameworks [15, 16, 17, 18, 19] within this category adhere to a common strategy: they first pinpoint the reaction center and convert the product into synthons. Then, a separate model completes the transformation of these synthons into reactants. These methods offer competitive accuracy and maintain interpretability due to their stage-wise nature. Nonetheless, they continue to face challenges in predicting reactions with multiple reaction center and fail in the scenario where the same leaving group is attached to more than one atom in a molecular graph. This paper refers to the reaction center comprised of multiple bonds or atoms as multiple raction center. Fig. 1 shows a reaction with multiple reaction center and attaching the same leaving group to more than one atom.

Recently, some methodologies [20, 21] have framed retrosynthesis prediction as the task of editing the product molecular graph to the reactant molecular graph. In this paper, we refer to this type of method as the graph edit sequence conditional generative model. The edit operations encompass actions such as Delete-bond, Change-bond, Change-atom, Attach-leaving-group, etc. These techniques extract edit sequences from ground-truth reaction data and then model the conditional probability distribution over these sequences given one product molecular graph. Despite its mitigation of some template-based method limitations, the graph edit sequence conditional generative model remains challenged. It cannot handle attaching the same leaving group to more than one atom in a molecular graph and also underutilizes the prior of chemical rules.

Refer to caption
Figure 1: Architecture overview. RretroSiG first identifies reaction center via (a) search in the product molecular graph and then completes leaving groups through (b) search in the leaving group hypergraph. Finally, RetroSiG converts predicted subgraph into reaction center SMARTS and leaving groups SMARTS, subsequently merging them to derive one retrosynthesis template. RetroSiG obtains the reactants by applying the merged template to the given product. At state t𝑡titalic_t, the red highlighted part represents the explored nodes, and the green nodes denote the action space after applying the one-hop constraint.

Motivated by the aforementioned constraints, we introduce the RetroSiG (Retrosynthesis via Search in (Hyper) Graph) framework (Fig. 1)—a semi-template-based approach—aimed at facilitating single-step retrosynthesis prediction. There are two phases in our RetroSiG, i.e., (a) search in the product molecular graph and (b) search in the leaving group hypergraph. (a) (Fig. 1a): We regard the reaction center identification as the search in the product molecular graph. A crucial observation dictates that the single or multiple reaction center should constitute a node-induced subgraph within the molecular product graph [22]. Therefore, RetroSiG utilizes a reinforcement learning agent with the graph encoder to explore the appropriate node-induced subgraph in the specific product molecular graph, identifying it as the predicted reaction center. At each stage, it contemplates the selection of one node within the molecular product graph as an action, incorporating it into the explored node-induced subgraph. As the reaction center graph maintains connectivity, we implement a one-hop constraint to streamline the search space and optimize the overall performance. (b) (Fig. 1b): We regard the leaving group completion as the search in the leaving group hypergraph. Since the unique leaving group patterns derived from the training dataset can cover 99.7% of the test set [17], it is feasible to predict leaving group patterns from the training leaving group corpus. Implicit dependencies exist among various leaving groups. Naturally, we can model these dependencies by creating a leaving group hypergraph. Each node denotes one unique leaving group pattern, while a hyperedge connects the corresponding leaving groups that co-occur within a template (reaction). Therefore, RetroSiG also utilizes a reinforcement learning agent with the hypergraph encoder to explore the appropriate node-induced subgraph in the leaving group hypergraph, selecting it as the resulting leaving groups. At each step, it contemplates the selection of one node within the leaving group hypergraph as an action, incorporating it into the explored node-induced subgraph. In the evaluation set, we observe that 98.9% of the leaving groups that co-occur within a template are contained within the same hyperedges. Thus, we make full use of the prior and impose a one-hop constraint to reduce the search space. Finally, we convert two predicted node-induced subgraphs into reaction center SMARTS and leaving groups SMARTS, subsequently merging them to derive one retrosynthesis template. We obtain the reactants by applying the retrosynthesis template to the given product [22]. Comprehensive experiments demonstrate that RetroSiG achieves a competitive result. Furthermore, we conduct experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments confirm the efficacy of specific elements, such as the one-hop constraint and the leaving group hypergraph. In summary, we outline our primary contributions as follows:

  • We propose, a semi-template-based method, the RetroSiG framework to conduct single-step retrosynthesis prediction. It has the capability to predicting reactions with multiple reaction center and attaching the same leaving group to more than one atom.

  • The key novelty of RetroSiG is viewing the reaction center identification and leaving group completion as the search in the product molecular graph and the leaving group hypergraph respectively. The hypergraph effectively represents the implicit dependencies among leaving groups, and the search in the graph makes full use of the prior, i.e., one-hop constraint.

  • Comprehensive experiments demonstrate that RetroSiG achieves a competitive result. Furthermore, we conduct experiments to show the capability of RetroSiG in predicting complex reactions. Ablation experiments confirm the efficacy of specific elements.

II Related Work

We summarize the comparison of existing single-step retrosynthesis in Table I. Compared with TB, RetroSiG does not need to perform Subgraph Matching (NP-Hard) since we can determine the exact location of the reaction center by the explored node-induced subgraph. Also, the specific position leads to the unique solution after applying the template, and thus RetroSiG does not need the additional model to ranking Reactants. RetroSiG can obtain Explicit Template and have the capability of predicting complex reactions with multiple reaction center (RC) or attaching the same leaving group (LG) to more than one atom.

TABLE I: Comparison of different baselines in seven dimensions.
Interpretability Subgraph Matching Ranking Reactants Explicit Template Multi-RC Attaching the Same LG to More Than One Atom Extrapolation Ability
TB
TF
STB
Graph2Edits
RetroSiG

II-A Graph Edit Sequence Conditional Generative Methods

MEGAN [20] and Graph2Edits [21] express a chemical reaction as a sequence of graph edits 𝑬=(e0,,et,,eT)𝑬subscript𝑒0subscript𝑒𝑡subscript𝑒𝑇\boldsymbol{E}=\left(e_{0},\cdots,e_{t},\cdots,e_{T}\right)bold_italic_E = ( italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ⋯ , italic_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , ⋯ , italic_e start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) that transform the product molecular graph into the reactant molecular graph. Here, the edits e𝑒eitalic_e encompass various actions such as Delete-bond, Change-bond, Change-atom, Attach-leaving-group, and so forth. The probability of a sequence of graph edits 𝑬=(e0,,et,,eT)𝑬subscript𝑒0subscript𝑒𝑡subscript𝑒𝑇\boldsymbol{E}=\left(e_{0},\cdots,e_{t},\cdots,e_{T}\right)bold_italic_E = ( italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ⋯ , italic_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , ⋯ , italic_e start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) can be factorized to the probabilities conditional on the product molecule 𝒑𝒑\boldsymbol{p}bold_italic_p and 𝑬<t=(e0,,et1)subscript𝑬absent𝑡subscript𝑒0subscript𝑒𝑡1\boldsymbol{E}_{<t}=\left(e_{0},\cdots,e_{t-1}\right)bold_italic_E start_POSTSUBSCRIPT < italic_t end_POSTSUBSCRIPT = ( italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ⋯ , italic_e start_POSTSUBSCRIPT italic_t - 1 end_POSTSUBSCRIPT ), i.e.,

p(𝑬𝒑)=t=1Tp(et𝒑,𝑬<t),T1formulae-sequence𝑝conditional𝑬𝒑superscriptsubscriptproduct𝑡1𝑇𝑝conditionalsubscript𝑒𝑡𝒑subscript𝑬absent𝑡𝑇1p(\boldsymbol{E}\mid\boldsymbol{p})=\prod_{t=1}^{T}p\left(e_{t}\mid\boldsymbol% {p},\boldsymbol{E}_{<t}\right),T\geq 1italic_p ( bold_italic_E ∣ bold_italic_p ) = ∏ start_POSTSUBSCRIPT italic_t = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT italic_p ( italic_e start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∣ bold_italic_p , bold_italic_E start_POSTSUBSCRIPT < italic_t end_POSTSUBSCRIPT ) , italic_T ≥ 1 (1)

When T=0𝑇0T=0italic_T = 0, 𝑬=(e0)𝑬subscript𝑒0\boldsymbol{E}=\left(e_{0}\right)bold_italic_E = ( italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ) and p(e0𝒑)=1𝑝conditionalsubscript𝑒0𝒑1p(e_{0}\mid\boldsymbol{p})=1italic_p ( italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ∣ bold_italic_p ) = 1 because e0subscript𝑒0e_{0}italic_e start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT is a fixed start token. These techniques extract edit sequences from ground-truth reaction data and then model the conditional probability distribution over these edit sequences given one product molecular graph. MEGAN [20] was the initial attempt to represent reactions as edits sequences for retrosynthesis prediction. However, this approach faced difficulties in reactant generation due to the atomic-level addition operations. It particularly struggled with reactions involving the attachment of large leaving groups. Graph2Edits [21] presents retrosynthesis as a process of reasoning involving reactions between products, intermediates, and reactants, achieved through a sequence of graph modifications. Importantly, in contrast to the MEGAN model, Graph2Edits substitutes the ”add-atom” actions with the attachment of substructures, reducing the number of generation steps and thereby enhancing reactant generation efficiency. These methods acquire knowledge of reaction transformation rules to a certain degree, thus improving their interpretability and generalization capabilities in complex reactions. However, It cannot predict reactions with attaching the same leaving group to more than one atom in a molecular graph and also underutilizes the prior of chemical rules.

II-B Template-based Methods (TB)

The template serves as a reaction representation that’s both more efficient and interpretable [22]. After establishing a library of reaction templates, the methods involve comparing a target molecule against these templates. Upon finding a match, the corresponding template is utilized to transform product molecules into reactant molecules. Numerous research efforts have introduced various strategies for prioritizing templates. In prior studies, [3] utilized molecular fingerprint similarity between the intended product and compounds in the dataset to prioritize potential templates. Meanwhile, [5] employed Neuralsym, a hybrid neural-symbolic model, to glean insights for a multi-class classification task focused on template selection. Additionally, [4] interpreted chemical insights within reaction templates as logical rules, utilizing graph embeddings to derive the conditional joint probability of these rules and reactants. Furthermore, [6] evaluated pertinent local templates within forecasted reaction centers of a target molecule. It also accounted for the non-local impacts of chemical reactions by integrating global reactivity attention. [7, 23] composes templates by selecting and annotating molecule subgraphs from training templates. Despite their interpretability, template-based methods have restricted applicability due to their incapability to forecast reactions beyond the template library. Moreover, their suitability diminishes when dealing with extensive template sets due to the challenges inherent in subgraph matching.

II-C Template-free Methods (TF)

Template-free Methods utilize deep generative models to generate reactant molecules directly. Prior research [9, 10, 11, 12, 24, 13] has employed the Transformer architecture [25], reconceptualizing the issue as a sequence-to-sequence translation task, transforming from product molecules to corresponding reactants. [14] replaces the initial sequence encoder with a graph encoder to guarantee the permutation invariance of SMILES representations. Although these generative methods facilitate chemical reasoning within a more expansive reaction space, they lack interpretability.

II-D Semi-template-based Methods (STB)

Semi-template-based approaches amalgamate the advantages of generative models with additional chemical knowledge. Considering that chemical reactions generally entail altering a limited segment of the molecular structure, most existing research approaches have divided retrosynthesis into a two-stage procedure: first, identifying the reaction center utilizing a graph neural network to generate synthons via molecular editing [26], and subsequently converting these synthons into reactants employing methods like a graph generative model [16], a Transformer [16, 18], or a subgraph selection model [17]. These methods offer competitive accuracy and maintain interpretability due to their stage-wise nature. Nonetheless, they continue to face challenges in predicting reactions with multiple reaction center. They cannot effectively address scenarios where the same leaving group is attached to more than one atom in a molecular graph [27, 28, 29, 30].

III Methodology

III-A Preliminaries

III-A1 Edge Graph Attention Network Layer

Given 𝐇(t)=[𝒉1(t);𝒉2(t);;𝒉n(t)]Rn×dsuperscript𝐇𝑡superscriptsubscript𝒉1𝑡superscriptsubscript𝒉2𝑡superscriptsubscript𝒉𝑛𝑡superscript𝑅𝑛𝑑\mathbf{H}^{(t)}=[\boldsymbol{h}_{1}^{(t)};\boldsymbol{h}_{2}^{(t)};\cdots;% \boldsymbol{h}_{n}^{(t)}]\in R^{n\times d}bold_H start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT = [ bold_italic_h start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ; bold_italic_h start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ; ⋯ ; bold_italic_h start_POSTSUBSCRIPT italic_n end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ] ∈ italic_R start_POSTSUPERSCRIPT italic_n × italic_d end_POSTSUPERSCRIPT and 𝐟ij(t)R1×dsuperscriptsubscript𝐟𝑖𝑗𝑡superscript𝑅1𝑑\mathbf{f}_{ij}^{(t)}\in R^{1\times d}bold_f start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT 1 × italic_d end_POSTSUPERSCRIPT are the node embedding matrix and the embedding of edge (i,j)𝑖𝑗(i,j)( italic_i , italic_j ) at the t𝑡titalic_t-th layer repectively, where 𝒉i(t)R1×dsuperscriptsubscript𝒉𝑖𝑡superscript𝑅1𝑑\boldsymbol{h}_{i}^{(t)}\in R^{1\times d}bold_italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT 1 × italic_d end_POSTSUPERSCRIPT is the node-level embedding for node i𝑖iitalic_i of the graph and is also the i𝑖iitalic_i-th row of 𝐇(t)superscript𝐇𝑡\mathbf{H}^{(t)}bold_H start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT, d𝑑ditalic_d is the dimension of node-level embedding and n𝑛nitalic_n is the number of nodes. 𝒉i(0)superscriptsubscript𝒉𝑖0\boldsymbol{h}_{i}^{(0)}bold_italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT and 𝒇ij(0)superscriptsubscript𝒇𝑖𝑗0\boldsymbol{f}_{ij}^{(0)}bold_italic_f start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( 0 ) end_POSTSUPERSCRIPT are the initial features of node and edge in molecular product graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT. EGAT injects the graph structure into the attention mechanism by performing masked attention, namely it only computes αijsubscript𝛼𝑖𝑗\alpha_{ij}italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT for nodes j𝒩i𝑗subscript𝒩𝑖j\in\mathcal{N}_{i}italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, where 𝒩isubscript𝒩𝑖\mathcal{N}_{i}caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is the first-order neighbors of node i𝑖iitalic_i in the graph:

𝒇ij(t+1)superscriptsubscript𝒇𝑖𝑗𝑡1\displaystyle\boldsymbol{f}_{ij}^{(t+1)}bold_italic_f start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT = LeakyReLU ([𝒉i(t)𝐖𝒇ij(t)𝒉j(t)𝐖]A),absent LeakyReLU delimited-[]superscriptsubscript𝒉𝑖𝑡𝐖normsuperscriptsubscript𝒇𝑖𝑗𝑡superscriptsubscript𝒉𝑗𝑡𝐖𝐴\displaystyle=\text{ LeakyReLU }\left(\left[\boldsymbol{h}_{i}^{(t)}\mathbf{W}% \left\|\boldsymbol{f}_{ij}^{(t)}\right\|\boldsymbol{h}_{j}^{(t)}\mathbf{W}% \right]A\right),= LeakyReLU ( [ bold_italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT bold_W ∥ bold_italic_f start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT ∥ bold_italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT bold_W ] italic_A ) , (2)
eijsubscript𝑒𝑖𝑗\displaystyle e_{ij}italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT =𝐚𝒇ij(t+1)T,αij=exp(eij)k𝒩iexp(eik),formulae-sequenceabsent𝐚superscriptsuperscriptsubscript𝒇𝑖𝑗𝑡1𝑇subscript𝛼𝑖𝑗subscript𝑒𝑖𝑗subscript𝑘subscript𝒩𝑖subscript𝑒𝑖𝑘\displaystyle=\mathbf{a}\cdot{\boldsymbol{f}_{ij}^{(t+1)}}^{T},\alpha_{ij}=% \frac{\exp\left(e_{ij}\right)}{\sum_{k\in\mathcal{N}_{i}}\exp\left(e_{ik}% \right)},= bold_a ⋅ bold_italic_f start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT , italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT = divide start_ARG roman_exp ( italic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ) end_ARG start_ARG ∑ start_POSTSUBSCRIPT italic_k ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_exp ( italic_e start_POSTSUBSCRIPT italic_i italic_k end_POSTSUBSCRIPT ) end_ARG ,

where eijRsubscript𝑒𝑖𝑗𝑅e_{ij}\in Ritalic_e start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ italic_R and αijRsubscript𝛼𝑖𝑗𝑅\alpha_{ij}\in Ritalic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT ∈ italic_R are a non-normalized attention coefficient and a normalized attention coefficient representing the weight of message aggregated from node j𝑗jitalic_j to node i𝑖iitalic_i respectively in the t𝑡titalic_t-th layer of EGAT, and \| is the concatenation operation. Besides,𝐖Rd×d𝐖superscript𝑅𝑑𝑑\mathbf{W}\in R^{d\times d}bold_W ∈ italic_R start_POSTSUPERSCRIPT italic_d × italic_d end_POSTSUPERSCRIPT, AR3d×d𝐴superscript𝑅3𝑑𝑑A\in R^{3d\times d}italic_A ∈ italic_R start_POSTSUPERSCRIPT 3 italic_d × italic_d end_POSTSUPERSCRIPT and 𝐚R1×d𝐚superscript𝑅1𝑑\mathbf{a}\in R^{1\times d}bold_a ∈ italic_R start_POSTSUPERSCRIPT 1 × italic_d end_POSTSUPERSCRIPT are learnable parameters in the t𝑡titalic_t-th layer.

EGAT [31] employs multi-head attention to stabilize the learning process of self-attention, similar to Transformer [25]. If there are K𝐾Kitalic_K heads, K𝐾Kitalic_K independent attention mechanisms execute the Eq. 2, and then their features are concatenated:

𝒉i(t+1)superscriptsubscript𝒉𝑖𝑡1\displaystyle\boldsymbol{h}_{i}^{(t+1)}bold_italic_h start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t + 1 ) end_POSTSUPERSCRIPT =MLP(k=1Kσ(j𝒩iαijk𝒉j(t)𝐖k))\displaystyle=\operatorname{MLP}\left(\|_{k=1}^{K}\sigma\left(\sum_{j\in% \mathcal{N}_{i}}\alpha_{ij}^{k}\boldsymbol{h}_{j}^{(t)}\mathbf{W}^{k}\right)\right)= roman_MLP ( ∥ start_POSTSUBSCRIPT italic_k = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K end_POSTSUPERSCRIPT italic_σ ( ∑ start_POSTSUBSCRIPT italic_j ∈ caligraphic_N start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT bold_italic_h start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ( italic_t ) end_POSTSUPERSCRIPT bold_W start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ) ) (3)

where \| represents concatenation, αijksuperscriptsubscript𝛼𝑖𝑗𝑘\alpha_{ij}^{k}italic_α start_POSTSUBSCRIPT italic_i italic_j end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT are normalized attention coefficients computed by the k𝑘kitalic_k-th learnable 𝐖kRd×dsuperscript𝐖𝑘superscript𝑅𝑑𝑑\mathbf{W}^{k}\in R^{d\times d}bold_W start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT italic_d × italic_d end_POSTSUPERSCRIPT, AkR3d×dsuperscript𝐴𝑘superscript𝑅3𝑑𝑑A^{k}\in R^{3d\times d}italic_A start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT 3 italic_d × italic_d end_POSTSUPERSCRIPT and 𝐚kR1×dsuperscript𝐚𝑘superscript𝑅1𝑑\mathbf{a}^{k}\in R^{1\times d}bold_a start_POSTSUPERSCRIPT italic_k end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT 1 × italic_d end_POSTSUPERSCRIPT following Eq. 2. Besides, MLPMLP\operatorname{MLP}roman_MLP denotes multi-perceptron.

III-A2 Hyper Graph Neural Network Layer

Specifically, for the l𝑙litalic_l-th layer in HGNN [32], it takes hypergraph 𝒢hgsubscript𝒢𝑔\mathcal{G}_{hg}caligraphic_G start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT’s incidence matrix HRn×eHsuperscript𝑅𝑛𝑒\mathrm{H}\in R^{n\times e}roman_H ∈ italic_R start_POSTSUPERSCRIPT italic_n × italic_e end_POSTSUPERSCRIPT and hidden representation matrix XlRn×dlsuperscriptX𝑙superscript𝑅𝑛superscript𝑑𝑙\mathrm{X}^{l}\in R^{n\times d^{l}}roman_X start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT italic_n × italic_d start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT as input, where n𝑛nitalic_n and e𝑒eitalic_e are the number of nodes and hyper-edges respectively. Then, the node representations in next layer will be computed as follows:

𝐗l+1=σ(𝐃v12𝐇𝐖𝐃e1𝐇𝐃v12𝐗l𝚯l)superscript𝐗𝑙1𝜎superscriptsubscript𝐃𝑣12superscriptsubscript𝐇𝐖𝐃𝑒1superscript𝐇topsuperscriptsubscript𝐃𝑣12superscript𝐗𝑙superscript𝚯𝑙\displaystyle\mathbf{X}^{l+1}=\sigma\left(\mathbf{D}_{v}^{-\frac{1}{2}}\mathbf% {HWD}_{e}^{-1}\mathbf{H}^{\top}\mathbf{D}_{v}^{-\frac{1}{2}}\mathbf{X}^{l}% \boldsymbol{\Theta}^{l}\right)bold_X start_POSTSUPERSCRIPT italic_l + 1 end_POSTSUPERSCRIPT = italic_σ ( bold_D start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT bold_HWD start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_H start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_D start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT bold_X start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT bold_Θ start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ) (4)

where σ()𝜎\sigma(\cdot)italic_σ ( ⋅ ) is the nonlinear activation function. 𝐃vRn×n,𝐃eRe×e,𝐖Re×eformulae-sequencesubscript𝐃𝑣superscript𝑅𝑛𝑛formulae-sequencesubscript𝐃𝑒superscript𝑅𝑒𝑒𝐖superscript𝑅𝑒𝑒\mathbf{D}_{v}\in R^{n\times n},\mathbf{D}_{e}\in R^{e\times e},\mathbf{W}\in R% ^{e\times e}bold_D start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT ∈ italic_R start_POSTSUPERSCRIPT italic_n × italic_n end_POSTSUPERSCRIPT , bold_D start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT ∈ italic_R start_POSTSUPERSCRIPT italic_e × italic_e end_POSTSUPERSCRIPT , bold_W ∈ italic_R start_POSTSUPERSCRIPT italic_e × italic_e end_POSTSUPERSCRIPT are the diagonal node degree, edge degree and edge weight matrices, respectively. 𝚯lRdl×dl+1superscript𝚯𝑙superscript𝑅superscript𝑑𝑙superscript𝑑𝑙1\mathbf{\Theta}^{l}\in R^{d^{l}\times d^{l+1}}bold_Θ start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT ∈ italic_R start_POSTSUPERSCRIPT italic_d start_POSTSUPERSCRIPT italic_l end_POSTSUPERSCRIPT × italic_d start_POSTSUPERSCRIPT italic_l + 1 end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT is a trainable parameter matrix.

III-B Search in the Product Molecular Graph

In this study, RDChiral [22] is employed to identify the super general reaction center. Consequently, it ensures that whether it’s a single or multiple reaction center, it must be a node-induced subgraph within the molecular product graph. A product molecular graph 𝒢p=(𝒱p,p)subscript𝒢𝑝subscript𝒱𝑝subscript𝑝\mathcal{G}_{p}=(\mathcal{V}_{p},\mathcal{E}_{p})caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = ( caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , caligraphic_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT ) is represented as a set of |𝒱p|subscript𝒱𝑝|\mathcal{V}_{p}|| caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT | nodes (atoms) and a set of |p|subscript𝑝|\mathcal{E}_{p}|| caligraphic_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT | edges (bonds). The reaction center graph 𝒢rc=(𝒱rc,rc)subscript𝒢𝑟𝑐subscript𝒱𝑟𝑐subscript𝑟𝑐\mathcal{G}_{rc}=(\mathcal{V}_{rc},\mathcal{E}_{rc})caligraphic_G start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT = ( caligraphic_V start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT , caligraphic_E start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT ) is a node-induced subgraph of the product molecular graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT such that 𝒱rc𝒱psubscript𝒱𝑟𝑐subscript𝒱𝑝\mathcal{V}_{rc}\subseteq\mathcal{V}_{p}caligraphic_V start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT ⊆ caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT and rc={(u,v)u,v𝒱rc,(u,v)p}subscript𝑟𝑐conditional-set𝑢𝑣formulae-sequence𝑢𝑣subscript𝒱𝑟𝑐𝑢𝑣subscript𝑝\mathcal{E}_{rc}=\left\{(u,v)\mid u,v\in\mathcal{V}_{rc},(u,v)\in\mathcal{E}_{% p}\right\}caligraphic_E start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT = { ( italic_u , italic_v ) ∣ italic_u , italic_v ∈ caligraphic_V start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT , ( italic_u , italic_v ) ∈ caligraphic_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT }. Given the product molecular graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT, reaction center identification aims to detect the corresponding reaction center graph 𝒢rcsubscript𝒢𝑟𝑐\mathcal{G}_{rc}caligraphic_G start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT, i.e. the node set 𝒱rcsubscript𝒱𝑟𝑐\mathcal{V}_{rc}caligraphic_V start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT of 𝒢rcsubscript𝒢𝑟𝑐\mathcal{G}_{rc}caligraphic_G start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT. Thus, we regard the reaction center identification as the search in the product molecular graph. At each stage, RetroSiG deliberates on selecting a node within the molecular product graph as an action, incorporating it into the explored node-induced subgraph. Ultimately, we transform the identified subgraph into the SMARTS of reaction center.

State Space: RetroSiG has a discrete state space 𝒮𝒮\mathcal{S}caligraphic_S. Each state stSsubscript𝑠𝑡𝑆s_{t}\in Sitalic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ italic_S is represented as st={𝒢p,𝒱^rct}subscript𝑠𝑡subscript𝒢𝑝superscriptsubscript^𝒱𝑟𝑐𝑡s_{t}=\{\mathcal{G}_{p},\hat{\mathcal{V}}_{rc}^{t}\}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT }, where 𝒱^rctsuperscriptsubscript^𝒱𝑟𝑐𝑡\hat{\mathcal{V}}_{rc}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT denotes the explored node set at step t𝑡titalic_t. Notably, 𝒱^rct=0=superscriptsubscript^𝒱𝑟𝑐𝑡0\hat{\mathcal{V}}_{rc}^{t=0}=\emptysetover^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t = 0 end_POSTSUPERSCRIPT = ∅.

Action Space: RetroSiG has two types of actions in its action space 𝒜𝒜\mathcal{A}caligraphic_A: (1) selecting one node v𝑣vitalic_v in the product molecular graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT, and (2) stop action, (i.e., doing nothing), denoted as STOP. Since the reaction center graph 𝒢rcsubscript𝒢𝑟𝑐\mathcal{G}_{rc}caligraphic_G start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT is connected, we impose a one-hop constraint on the first type of action. This constraint serves to reduce the search space and enhance overall performance. In other words, the action atsubscript𝑎𝑡a_{t}italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is to select one node v𝑣vitalic_v from the first-order neighbour of the current explored subgraph 𝒢^rctsuperscriptsubscript^𝒢𝑟𝑐𝑡\hat{\mathcal{G}}_{rc}^{t}over^ start_ARG caligraphic_G end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT or a STOP action implying stop of the exploration process. At step t𝑡titalic_t, action space is represented as 𝒜t>0={v|vONEHOP(𝒢^rct)}{STOP}superscript𝒜𝑡0conditional-set𝑣𝑣ONEHOPsuperscriptsubscript^𝒢𝑟𝑐𝑡STOP\mathcal{A}^{t>0}=\{v|v\in\operatorname{ONE-HOP}(\hat{\mathcal{G}}_{rc}^{t})\}% \cup\{\text{STOP}\}caligraphic_A start_POSTSUPERSCRIPT italic_t > 0 end_POSTSUPERSCRIPT = { italic_v | italic_v ∈ start_OPFUNCTION roman_ONE - roman_HOP end_OPFUNCTION ( over^ start_ARG caligraphic_G end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) } ∪ { STOP }. Notably, 𝒜0={v|v𝒱p}superscript𝒜0conditional-set𝑣𝑣subscript𝒱𝑝\mathcal{A}^{0}=\{v|v\in\mathcal{V}_{p}\}caligraphic_A start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = { italic_v | italic_v ∈ caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT }.

Refer to caption
Figure 2: a. Policy network architecture in the search in the product molecular graph. b. Policy network architecture in the search in the leaving group hypergraph.

Reward Function: At intermediate step t𝑡titalic_t, reward function \mathcal{R}caligraphic_R gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward 0. At the terminal step T𝑇Titalic_T, if the predicted 𝒱^rcTsuperscriptsubscript^𝒱𝑟𝑐𝑇\hat{\mathcal{V}}_{rc}^{T}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT is identical to the ground-truth 𝒱rcsubscript𝒱𝑟𝑐\mathcal{V}_{rc}caligraphic_V start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT, \mathcal{R}caligraphic_R gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward 1. Otherwise, it gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward of 0.

Policy Net (Fig. 2a): We use the EGAT [31] to obtain node-level embedding. Each atom u𝑢uitalic_u has an initial feature vector 𝐱usubscript𝐱𝑢\mathbf{x}_{u}bold_x start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT. Each bond (u,v)𝑢𝑣(u,v)( italic_u , italic_v ) has a feature vector 𝐱uvsubscript𝐱𝑢𝑣\mathbf{x}_{uv}bold_x start_POSTSUBSCRIPT italic_u italic_v end_POSTSUBSCRIPT. The specifics regarding the initial features can be located in Table II and Table III. For ease of reference, we symbolize the encoding process as EGAT()EGAT\mathrm{EGAT}(\cdot)roman_EGAT ( ⋅ ) and elaborate on the architectural specifics in Section III-A. The EGAT computes atom representations {𝐜utu𝒱p}conditional-setsuperscriptsubscript𝐜𝑢𝑡𝑢subscript𝒱𝑝\left\{\mathbf{c}_{u}^{t}\mid u\in\mathcal{V}_{p}\right\}{ bold_c start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∣ italic_u ∈ caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT } via

{𝐜ut}=EGAT(𝒢p,{𝐱u+𝕀[u𝒱^rct]𝐱explored},{𝐱uv}v𝒩(u)).superscriptsubscript𝐜𝑢𝑡EGATsubscript𝒢𝑝subscript𝐱𝑢𝕀delimited-[]𝑢superscriptsubscript^𝒱𝑟𝑐𝑡subscript𝐱𝑒𝑥𝑝𝑙𝑜𝑟𝑒𝑑subscriptsubscript𝐱𝑢𝑣𝑣𝒩𝑢\left\{\mathbf{c}_{u}^{t}\right\}=\operatorname{EGAT}\left(\mathcal{G}_{p},% \left\{\mathbf{x}_{u}+\mathbb{I}[u\in\hat{\mathcal{V}}_{rc}^{t}]\mathbf{x}_{% explored}\right\},\left\{\mathbf{x}_{uv}\right\}_{v\in\mathcal{N}(u)}\right).{ bold_c start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT } = roman_EGAT ( caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , { bold_x start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT + blackboard_I [ italic_u ∈ over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] bold_x start_POSTSUBSCRIPT italic_e italic_x italic_p italic_l italic_o italic_r italic_e italic_d end_POSTSUBSCRIPT } , { bold_x start_POSTSUBSCRIPT italic_u italic_v end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_v ∈ caligraphic_N ( italic_u ) end_POSTSUBSCRIPT ) .

(5)

In order to incorporate information about the explored node set 𝒱^rctsuperscriptsubscript^𝒱𝑟𝑐𝑡\hat{\mathcal{V}}_{rc}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT at step t𝑡titalic_t, we add a learnable embedding 𝐱exploredsubscript𝐱𝑒𝑥𝑝𝑙𝑜𝑟𝑒𝑑\mathbf{x}_{explored}bold_x start_POSTSUBSCRIPT italic_e italic_x italic_p italic_l italic_o italic_r italic_e italic_d end_POSTSUBSCRIPT to the initial feature of the explored node. 𝕀[u𝒱^rct]{0,1}maps-to𝕀delimited-[]𝑢superscriptsubscript^𝒱𝑟𝑐𝑡01\mathbb{I}[u\in\hat{\mathcal{V}}_{rc}^{t}]\mapsto\{0,1\}blackboard_I [ italic_u ∈ over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ↦ { 0 , 1 } be the predicate that indicates whether atom u𝑢uitalic_u is the element of 𝒱^rctsuperscriptsubscript^𝒱𝑟𝑐𝑡\hat{\mathcal{V}}_{rc}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_r italic_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. Here, 𝒩(u)𝒩𝑢\mathcal{N}(u)caligraphic_N ( italic_u ) refers to the neighboring atoms of atom u𝑢uitalic_u. The graph representation 𝐜𝒢tsuperscriptsubscript𝐜𝒢𝑡\mathbf{c}_{\mathcal{G}}^{t}bold_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT is formed by aggregating atom representations, i.e. 𝐜𝒢t=u𝒱p𝐜utsuperscriptsubscript𝐜𝒢𝑡subscript𝑢subscript𝒱𝑝superscriptsubscript𝐜𝑢𝑡\mathbf{c}_{\mathcal{G}}^{t}=\sum_{u\in\mathcal{V}_{p}}\mathbf{c}_{u}^{t}bold_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_u ∈ caligraphic_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_c start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. Finally, we calculate the logits sut1,sst1formulae-sequencesuperscriptsubscript𝑠𝑢𝑡superscript1superscriptsubscript𝑠𝑠𝑡superscript1s_{u}^{t}\in\mathbb{R}^{1},s_{s}^{t}\in\mathbb{R}^{1}italic_s start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , italic_s start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT for selecting one atom u𝑢uitalic_u and STOP action at each step t𝑡titalic_t through the fully connected layers:

sutsuperscriptsubscript𝑠𝑢𝑡\displaystyle s_{u}^{t}italic_s start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT =𝑾b(σ(𝑾acut+𝑩a))𝕀[u𝒜t],absentsubscript𝑾𝑏𝜎subscript𝑾𝑎superscriptsubscriptc𝑢𝑡subscript𝑩𝑎𝕀delimited-[]𝑢superscript𝒜𝑡\displaystyle=\boldsymbol{W}_{b}\left(\sigma\left(\boldsymbol{W}_{a}\mathrm{c}% _{u}^{t}+\boldsymbol{B}_{a}\right)\right)-\mathbb{I}\left[u\notin\mathcal{A}^{% t}\right]\infty,= bold_italic_W start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_σ ( bold_italic_W start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT roman_c start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + bold_italic_B start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) ) - blackboard_I [ italic_u ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ∞ , (6)
sstsuperscriptsubscript𝑠𝑠𝑡\displaystyle s_{s}^{t}italic_s start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT =𝑾b(σ(𝑾ac𝒢t+𝑩a))𝕀[STOP𝒜t],absentsubscript𝑾𝑏𝜎subscript𝑾𝑎superscriptsubscriptc𝒢𝑡subscript𝑩𝑎𝕀delimited-[]STOPsuperscript𝒜𝑡\displaystyle=\boldsymbol{W}_{b}\left(\sigma\left(\boldsymbol{W}_{a}\mathrm{c}% _{\mathcal{G}}^{t}+\boldsymbol{B}_{a}\right)\right)-\mathbb{I}\left[\mathrm{% STOP}\notin\mathcal{A}^{t}\right]\infty,= bold_italic_W start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_σ ( bold_italic_W start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT roman_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + bold_italic_B start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) ) - blackboard_I [ roman_STOP ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ∞ ,

where 𝑾asubscript𝑾𝑎\boldsymbol{W}_{a}bold_italic_W start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT and 𝑾bsubscript𝑾𝑏\boldsymbol{W}_{b}bold_italic_W start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT are the weights and 𝑩asubscript𝑩𝑎\boldsymbol{B}_{a}bold_italic_B start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is the bias. 𝕀[u𝒜t](𝕀[STOP𝒜t]){0,1}maps-to𝕀delimited-[]𝑢superscript𝒜𝑡𝕀delimited-[]STOPsuperscript𝒜𝑡01\mathbb{I}[u\notin\mathcal{A}^{t}]\,(\mathbb{I}[\text{STOP}\notin\mathcal{A}^{% t}])\mapsto\{0,1\}blackboard_I [ italic_u ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ( blackboard_I [ STOP ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ) ↦ { 0 , 1 } be the predicate that indicates whether atom u𝑢uitalic_u (STOP) is not in the action space 𝒜tsuperscript𝒜𝑡\mathcal{A}^{t}caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT at step t𝑡titalic_t. We add the -\infty- ∞ to the logit of the action not appearing in the action space 𝒜tsuperscript𝒜𝑡\mathcal{A}^{t}caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT at step t𝑡titalic_t. It ensures that the probability of an action not appearing in the action space equals 0 after SOFTMAXSOFTMAX\mathrm{SOFTMAX}roman_SOFTMAX operation.

TABLE II: Atom Features used in EGAT. All features are one-hot encoding.
Feature Description Size
Atom type Type of an atom by atomic number. 100
Total degree Degree of an atom including Hs. 6
Explicit valence Explicit valence of an atom. 6
Implicit valence Explicit valence of an atom. 6
Hybridization sp, sp2, sp3, sp3d, or sp3d2. 5
# Hs Number of bonded Hydrogen atom. 5
Formal charge Integer electronic charge assigned to atom. 5
Aromaticity Whether atom is part of aromatic system. 1
In ring Whether an atom is in ring 1
TABLE III: Bond features used in EGAT. All features are one-hot encoding.
Feature Description Size
Bond type Single, double, triple, or aromatic. 4
Conjugation Whether the bond is conjugated. 1
In ring Whether the bond is part of a ring. 1
Stereo None, any, E/Z or cis/trans. 6
Direction The direction of the bond. 3

III-C Search in the Leaving Group Hypergrah

In this paper, we use RDChiral [22] to extract the super general leaving groups from the training set and obtain unique leaving group patterns. Since the unique leaving group patterns derived from the training dataset can cover 99.7% of the test set [17], it is feasible to predict leaving group patterns from the training leaving group corpus. Implicit dependencies exist among various leaving groups. Naturally, we build a hypergraph to model these dependencies. Each node denotes one unique leaving group pattern, while a hyperedge connects the corresponding leaving groups that co-occur within a template. A leaving group hypergraph 𝒢hg=(𝒱hg,hg)subscript𝒢𝑔subscript𝒱𝑔subscript𝑔\mathcal{G}_{hg}=(\mathcal{V}_{hg},\mathcal{E}_{hg})caligraphic_G start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT = ( caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT , caligraphic_E start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT ) is represented as a set of |𝒱hg|subscript𝒱𝑔|\mathcal{V}_{hg}|| caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT | nodes (unique leaving group) and a set of |hg|subscript𝑔|\mathcal{E}_{hg}|| caligraphic_E start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT | hyperedges (co-occurrence). Given the product molecular graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT, leaving group completion aims to identify the corresponding leaving groups, i.e. the sub-node set 𝒱lgsubscript𝒱𝑙𝑔\mathcal{V}_{lg}caligraphic_V start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT of 𝒱hgsubscript𝒱𝑔\mathcal{V}_{hg}caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT. Thus, we regard the leaving group completion as the search in the leaving group hypergraph. RetroSiG considers choosing one node in the leaving group hypergraph and adding it to the explored node set as an action at each step. In the end, we transform the identified nodes into the SMARTS of leaving groups.

State Space: Similar to search in the product molecular graph, each state stSsubscript𝑠𝑡𝑆s_{t}\in Sitalic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∈ italic_S is represented as st={𝒢p,𝒢hg,𝒱^lgt}subscript𝑠𝑡subscript𝒢𝑝subscript𝒢𝑔superscriptsubscript^𝒱𝑙𝑔𝑡s_{t}=\{\mathcal{G}_{p},\mathcal{G}_{hg},\hat{\mathcal{V}}_{lg}^{t}\}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT , caligraphic_G start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT , over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT }, where 𝒱^lgtsuperscriptsubscript^𝒱𝑙𝑔𝑡\hat{\mathcal{V}}_{lg}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT denotes the explored node set at step t𝑡titalic_t. Notably, 𝒱^lgt=0=superscriptsubscript^𝒱𝑙𝑔𝑡0\hat{\mathcal{V}}_{lg}^{t=0}=\emptysetover^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t = 0 end_POSTSUPERSCRIPT = ∅.

Action Space: Here, RetroSiG also has two types of actions in its action space 𝒜𝒜\mathcal{A}caligraphic_A: (1) selecting one node v𝑣vitalic_v in the leaving group hypergraph 𝒢hgsubscript𝒢𝑔\mathcal{G}_{hg}caligraphic_G start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT, and (2) stop action, (i.e., doing nothing), denoted as STOP. In the evaluation set, 98.9% of the leaving groups that co-occur within a template are contained within the hgsubscript𝑔\mathcal{E}_{hg}caligraphic_E start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT. It implies that, during inference, the prediction is likely to be incorrect if the chosen leaving group originates from different hyperedges. Thus, we impose a one-hop constraint on the first type of action. This constraint serves to reduce the search space and enhance overall performance. In other words, the action atsubscript𝑎𝑡a_{t}italic_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is to select one node v𝑣vitalic_v from the first-order neighbour of the current explored node set 𝒱^lgtsuperscriptsubscript^𝒱𝑙𝑔𝑡\hat{\mathcal{V}}_{lg}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT or a STOP action. At step t𝑡titalic_t, action space is represented as 𝒜t>0={v|vONEHOP(𝒱^lgt}{STOP}\mathcal{A}^{t>0}=\{v|v\in\operatorname{ONE-HOP}(\hat{\mathcal{V}}_{lg}^{t}\}% \cup\{\text{STOP}\}caligraphic_A start_POSTSUPERSCRIPT italic_t > 0 end_POSTSUPERSCRIPT = { italic_v | italic_v ∈ start_OPFUNCTION roman_ONE - roman_HOP end_OPFUNCTION ( over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT } ∪ { STOP }. Notably, 𝒜0={v|v𝒱hg}superscript𝒜0conditional-set𝑣𝑣subscript𝒱𝑔\mathcal{A}^{0}=\{v|v\in\mathcal{V}_{hg}\}caligraphic_A start_POSTSUPERSCRIPT 0 end_POSTSUPERSCRIPT = { italic_v | italic_v ∈ caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT }.

Reward Function: At intermediate step t𝑡titalic_t, reward function \mathcal{R}caligraphic_R gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward 0. At the terminal step T𝑇Titalic_T, if the predicted 𝒱^lgTsuperscriptsubscript^𝒱𝑙𝑔𝑇\hat{\mathcal{V}}_{lg}^{T}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT is identical to the ground-truth 𝒱lgsubscript𝒱𝑙𝑔\mathcal{V}_{lg}caligraphic_V start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT, \mathcal{R}caligraphic_R gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward 1. Otherwise, it gives stsubscript𝑠𝑡s_{t}italic_s start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT a reward of 0.

Policy Net (Fig. 2b): We use the HGNN [32] to obtain node-level embedding. We assign each atom u𝑢uitalic_u in the hypergraph a randomly initialized learnable embedding 𝐟usubscript𝐟𝑢\mathbf{f}_{u}bold_f start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT. To simplify, we represent the encoding process as HGNN()HGNN\mathrm{HGNN}(\cdot)roman_HGNN ( ⋅ ) and elaborate on the architectural specifics in Section III-A. The HGNN computes atom representations {𝐡utu𝒱hg}conditional-setsuperscriptsubscript𝐡𝑢𝑡𝑢subscript𝒱𝑔\left\{\mathbf{h}_{u}^{t}\mid u\in\mathcal{V}_{hg}\right\}{ bold_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∣ italic_u ∈ caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT } via

{𝐡ut}=HGNN(𝒢hg,{𝐟u+𝕀[u𝒱^lgt]𝐟explored}).superscriptsubscript𝐡𝑢𝑡HGNNsubscript𝒢𝑔subscript𝐟𝑢𝕀delimited-[]𝑢superscriptsubscript^𝒱𝑙𝑔𝑡subscript𝐟𝑒𝑥𝑝𝑙𝑜𝑟𝑒𝑑\left\{\mathbf{h}_{u}^{t}\right\}=\operatorname{HGNN}\left(\mathcal{G}_{hg},% \left\{\mathbf{f}_{u}+\mathbb{I}[u\in\hat{\mathcal{V}}_{lg}^{t}]\mathbf{f}_{% explored}\right\}\right).{ bold_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT } = roman_HGNN ( caligraphic_G start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT , { bold_f start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT + blackboard_I [ italic_u ∈ over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] bold_f start_POSTSUBSCRIPT italic_e italic_x italic_p italic_l italic_o italic_r italic_e italic_d end_POSTSUBSCRIPT } ) . (7)

In order to incorporate information about the explored node set 𝒱^lgtsuperscriptsubscript^𝒱𝑙𝑔𝑡\hat{\mathcal{V}}_{lg}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT at step t𝑡titalic_t, we add a learnable embedding 𝐟exploredsubscript𝐟𝑒𝑥𝑝𝑙𝑜𝑟𝑒𝑑\mathbf{f}_{explored}bold_f start_POSTSUBSCRIPT italic_e italic_x italic_p italic_l italic_o italic_r italic_e italic_d end_POSTSUBSCRIPT to the initial embedding of the explored node. 𝕀[u𝒱^lgt]{0,1}maps-to𝕀delimited-[]𝑢superscriptsubscript^𝒱𝑙𝑔𝑡01\mathbb{I}[u\in\hat{\mathcal{V}}_{lg}^{t}]\mapsto\{0,1\}blackboard_I [ italic_u ∈ over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ↦ { 0 , 1 } be the predicate that indicates whether atom u𝑢uitalic_u is the element of 𝒱^lgtsuperscriptsubscript^𝒱𝑙𝑔𝑡\hat{\mathcal{V}}_{lg}^{t}over^ start_ARG caligraphic_V end_ARG start_POSTSUBSCRIPT italic_l italic_g end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. The graph representation 𝐡𝒢tsuperscriptsubscript𝐡𝒢𝑡\mathbf{h}_{\mathcal{G}}^{t}bold_h start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT is an aggregation of atom representations, i.e. 𝐡𝒢t=u𝒱hg𝐡utsuperscriptsubscript𝐡𝒢𝑡subscript𝑢subscript𝒱𝑔superscriptsubscript𝐡𝑢𝑡\mathbf{h}_{\mathcal{G}}^{t}=\sum_{u\in\mathcal{V}_{hg}}\mathbf{h}_{u}^{t}bold_h start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = ∑ start_POSTSUBSCRIPT italic_u ∈ caligraphic_V start_POSTSUBSCRIPT italic_h italic_g end_POSTSUBSCRIPT end_POSTSUBSCRIPT bold_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. We also use the EGAT [31] to obtain graph-level embedding 𝐜𝒢subscript𝐜𝒢\mathbf{c}_{\mathcal{G}}bold_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT of the product molecular graph 𝒢psubscript𝒢𝑝\mathcal{G}_{p}caligraphic_G start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT. Finally, we calculate the logits sut1,sst1formulae-sequencesuperscriptsubscript𝑠𝑢𝑡superscript1superscriptsubscript𝑠𝑠𝑡superscript1s_{u}^{t}\in\mathbb{R}^{1},s_{s}^{t}\in\mathbb{R}^{1}italic_s start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT , italic_s start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT for selecting one atom u𝑢uitalic_u and STOP action at each step t𝑡titalic_t through the fully connected layers:

sutsuperscriptsubscript𝑠𝑢𝑡\displaystyle s_{u}^{t}italic_s start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT =𝑾b(σ(𝑾a(𝐡ut+𝐜𝒢)+𝑩a))𝕀[u𝒜t],absentsubscript𝑾𝑏𝜎subscript𝑾𝑎superscriptsubscript𝐡𝑢𝑡subscript𝐜𝒢subscript𝑩𝑎𝕀delimited-[]𝑢superscript𝒜𝑡\displaystyle=\boldsymbol{W}_{b}\left(\sigma\left(\boldsymbol{W}_{a}\left(% \mathbf{h}_{u}^{t}+\mathbf{c}_{\mathcal{G}}\right)+\boldsymbol{B}_{a}\right)% \right)-\mathbb{I}\left[u\notin\mathcal{A}^{t}\right]\infty,= bold_italic_W start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_σ ( bold_italic_W start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + bold_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT ) + bold_italic_B start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) ) - blackboard_I [ italic_u ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ∞ , (8)
sstsuperscriptsubscript𝑠𝑠𝑡\displaystyle s_{s}^{t}italic_s start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT =𝑾b(σ(𝑾a(𝐡𝒢t+𝐜𝒢)+𝑩a))𝕀[STOP𝒜t],absentsubscript𝑾𝑏𝜎subscript𝑾𝑎superscriptsubscript𝐡𝒢𝑡subscript𝐜𝒢subscript𝑩𝑎𝕀delimited-[]STOPsuperscript𝒜𝑡\displaystyle=\boldsymbol{W}_{b}\left(\sigma\left(\boldsymbol{W}_{a}\left(% \mathbf{h}_{\mathcal{G}}^{t}+\mathbf{c}_{\mathcal{G}}\right)+\boldsymbol{B}_{a% }\right)\right)-\mathbb{I}\left[\mathrm{STOP}\notin\mathcal{A}^{t}\right]\infty,= bold_italic_W start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT ( italic_σ ( bold_italic_W start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ( bold_h start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT + bold_c start_POSTSUBSCRIPT caligraphic_G end_POSTSUBSCRIPT ) + bold_italic_B start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ) ) - blackboard_I [ roman_STOP ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ∞ ,

where 𝕀[u𝒜t](𝕀[STOP𝒜t]){0,1}maps-to𝕀delimited-[]𝑢superscript𝒜𝑡𝕀delimited-[]STOPsuperscript𝒜𝑡01\mathbb{I}[u\notin\mathcal{A}^{t}]\,(\mathbb{I}[\text{STOP}\notin\mathcal{A}^{% t}])\mapsto\{0,1\}blackboard_I [ italic_u ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ( blackboard_I [ STOP ∉ caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ] ) ↦ { 0 , 1 } be the predicate that indicates whether atom u𝑢uitalic_u (STOP) is not in the action space 𝒜tsuperscript𝒜𝑡\mathcal{A}^{t}caligraphic_A start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT at step t𝑡titalic_t. The -\infty- ∞ operation ensures that the probability of an action not appearing in the action space equals 0 after SOFTMAXSOFTMAX\mathrm{SOFTMAX}roman_SOFTMAX operation.

III-D Training and Inference

The goal of policy network aims to maximize the expected reward, enhance search quality across episodes and it is updated using Proximal Policy Optimization (PPO) [33]:

LCLIP(θ)=E^t[min(rt(θ)A^t,clip(rt(θ),1ϵ,1+ϵ)A^t)].superscript𝐿𝐶𝐿𝐼𝑃𝜃subscript^𝐸𝑡delimited-[]subscript𝑟𝑡𝜃subscript^𝐴𝑡clipsubscript𝑟𝑡𝜃1italic-ϵ1italic-ϵsubscript^𝐴𝑡L^{CLIP}(\theta)=\hat{E}_{t}[\min(r_{t}(\theta)\hat{A}_{t},\operatorname{clip}% \left(r_{t}(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_{t})].italic_L start_POSTSUPERSCRIPT italic_C italic_L italic_I italic_P end_POSTSUPERSCRIPT ( italic_θ ) = over^ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT [ roman_min ( italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_θ ) over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , roman_clip ( italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_θ ) , 1 - italic_ϵ , 1 + italic_ϵ ) over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] . (9)

Here, E^tsubscript^𝐸𝑡\hat{E}_{t}over^ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT represents the expected value at timestep t, rt(θ)subscript𝑟𝑡𝜃r_{t}(\theta)italic_r start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_θ ) is the ratio of the new policy and the old policy, and A^tsubscript^𝐴𝑡\hat{A}_{t}over^ start_ARG italic_A end_ARG start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is the estimated advantage at timestep t.

Inference is executed via beam search [34] employing a log-likelihood scoring function. For a beam size k𝑘kitalic_k, we can obtain two sets of k𝑘kitalic_k candidate solutions from two stages. Finally, we have k2superscript𝑘2k^{2}italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT candidate solutions. From the k2superscript𝑘2k^{2}italic_k start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT possibilities, we select k𝑘kitalic_k candidate solutions with the highest cumulative log-likelihood score as the top-k prediction results.

IV Evaluation

IV-A Data

We utilize the well-established retrosynthesis benchmark dataset, USPTO-50K [35], as the basis for evaluating our approach. This dataset comprises 50,016 reactions with accurate atom-mapping, categorized into ten distinct reaction types. Our methodology aligns with the data partition detailed in [3], wherein we allocate 40,000 reactions for training, 5,000 for validation, and 5,000 for testing. To mitigate potential information leakage inherent in the USPTO-50K dataset, as highlighted in previous studies [17, 15], we apply additional steps. This involves canonicalizing the product SMILES and reassigning the atom-mapping to the reactant atoms, following the methodology outlined in [17].

IV-B Evaluation Metric

In line with previous studies, we adopt top-k exact match accuracy as our evaluation metric. In our experiments, we utilized different values of k—1, 3, 5, and 10—for comparative analysis. This metric assesses whether the predicted set of reactants precisely matches the ground truth reactants.

IV-C Baseline

We contrast the predictive outcomes of RetroSiG against various template-based, template-free, and semi-template-based methodologies. Notably, we introduce semi-template-based methods including G2G [16], RetroXpert [15], RetroPrime [18], MEGAN [20], GraphRetro [17], and Graph2Edits [21] as primary baselines due to their exceptional performance. We also considered template-based models like Retrosim [3], Neuralsym [5], GLN [4], LocalRetro [6], and template-free models such as SCROP [10], Augmented Transformer [11], GTA [12], Graph2SMILES [14], Dual-TF [36], Retroformer [13] as robust baseline models for comparison.

IV-D Implementation Details

[37], DeepHypergraph111https://github.com/iMoonLab/DeepHypergraph (DHG), and PyTorch [38]. In the EGAT framework, we configure four identical four-head attentive layers with a hidden dimension of 512, while the HGNN structure consists of two layers with a hidden dimension of 512. Embedding sizes across the model are uniformly set at 512. We use ELU(x)=α(exp(x)1)ELU𝑥𝛼𝑥1\operatorname{ELU}(x)=\alpha(\exp(x)-1)roman_ELU ( italic_x ) = italic_α ( roman_exp ( italic_x ) - 1 ) for x0𝑥0x\leq 0italic_x ≤ 0 and x𝑥xitalic_x for x>0𝑥0x>0italic_x > 0 as our activation function where α=1𝛼1\alpha=1italic_α = 1 as our activation function. Our experiments are executed on a machine equipped with an Intel Xeon 4114 CPU and a Nvidia Titan GPU. Training parameters include a learning rate of 0.0001, 10,000,000 iterations, and the utilization of the Adam optimizer [39]. At intervals of 1000 iterations, we store checkpoints to identify the best-performing ones based on evaluation criteria.

IV-E Performance

Main Results: Table IV displays the top-k exact match accuracy outcomes obtained from the USPTO-50k benchmark. When the reaction class is unknown, our method achieves a 54.9% top-1 accuracy which is competitive and only 0.2% lower than the SOTA (Graph2Edits) result of 55.1%. However, for k equals 3 and 5, RetroSiG outperforms the performance of Graph2Edits. RetroSiG also beats LocalRetro by a margin of 1.5% and 0.1% respectively in top-1 and top-3 accuracy. With the reaction class given, our method achieves a 66.5% top-1 accuracy which is only 0.6% lower than the SOTA (Graph2Edits). For k equals 3, 5 and 10, RetroSiG outperforms the performance of Graph2Edits. We attribute it to the solid prior knowledge carried by the one-hop mechanism.

TABLE IV: Top-k𝑘kitalic_k accuracy for retrosynthesis prediction on USPTO-50K.
Model Top-k accuracy (%)
Reaction class unknown Reaction class known
k=1 3 5 10 1 3 5 10
Template-Based Methods

Retrosim

37.3 54.7 63.3 74.1 52.9 73.8 81.2 88.1

Neuralsym

44.4 65.3 72.4 78.9 55.3 76.0 81.4 85.1

GLN

52.5 69.0 75.6 83.7 64.2 79.1 85.2 90.0

LocalRetro

53.4 77.5 85.9 92.4 63.9 86.8 92.4 96.3
Template-Free Methods

SCROP

43.7 60.0 65.2 68.7 59.0 74.8 78.1 81.1

Aug.Transformer

53.2 - 80.5 85.2 - - - -

GTA

51.1 67.6 74.8 81.6 - - - -

Graph2SMILES

52.9 66.5 70.0 72.9 - - - -

Dual-TF

53.6 70.7 74.6 77.0 65.7 81.9 84.7 85.9
Semi-Template-Based Methods

G2G

48.9 67.6 72.5 75.5 61.0 81.3 86.0 88.7

RetroXpert

50.4 61.1 62.3 63.4 62.1 75.8 78.5 80.9

RetroPrime

51.4 70.8 74.0 76.1 64.8 81.6 85.0 86.9

MEGAN

48.1 70.7 78.4 86.1 60.7 82.0 87.5 91.6

GraphRetro

53.7 68.3 72.2 75.5 63.9 81.5 85.2 88.1

Graph2Edits

55.1 77.3 83.4 89.4 67.1 87.5 91.5 93.8

RetroSiG

54.9 77.6 84.1 89.0 66.5 87.9 92.0 94.1
Refer to caption
Figure 3: Analysis of complex samples.

Complex Sample Results: We investigate the performance effect of some complex reactions in the USPTO-50k, including reactions with multiple reaction center and attaching the same leaving group to more than one atom. As is shown in Fig. 3a, Most reactions with 1, 2 atoms in the reaction center account for the majority, which have 1121 (22.3%) and 3604 (72.0%) cases respectively. However, the reactions with 3, 4, 5, 6, and \geq7 account for a small proportion, which have 184 (3.7%), 29 (0.6%), 24 (0.5%), 36 (0.7%) and 9 (0.2%) cases respectively. Multiple reaction center consists of three or more atoms, and 92.6% (261 / 282) of the samples with multiple reaction center are with attaching the same leaving group to more than one atom. Thus, we report the top-10 accuracy of the number of atoms 3, 4, 5, 6, and \geq7 in Fig. 3b to investigate the performance of complex reactions. We observe that our model’s performance remains stable even as the number of atoms increases. It demonstrates that our model can handle these complex reactions.

TABLE V: Ablation Study.
Hypergraph Structure One-Hop Constraint(rc) One-Hop Constraint(lg) Top-1(%)
1⃝ 54.9
2⃝ 54.6
3⃝ 53.5
4⃝ 53.4
5⃝ 52.8

Ablation Study: In Fig. V, Hypergraph-Structure ✔denotes that we use HGNN and hypergraph structure to encode each node (leaving group) embedding. One-Hop Constraint ✔denotes that we use a one-hop constraint. 1⃝ outperforms 2⃝ and 3⃝, showing that one-hop constraint is effective. 2⃝ outperforms 3⃝, demonstrating that one-hop constraint in reaction center identification contributes more to performance improvement. 4⃝ outperforms 5⃝, implying that hypergraph effectively models dependencies between leaving groups.

Refer to caption
Figure 4: Visualization of Example Predictions.

IV-F Visualization of Example Predictions

In Fig. 4, we illustrate the model predictions alongside the corresponding ground truth for three specific cases. Fig. 4a shows an example with attaching the same leaving group to more than one atom. Moreover, the model identifies both the reaction center and leaving groups correctly. Fig. 4b, the model identifies the reaction center correctly, while the predicted leaving groups are incorrect. We hypothesize this is because carboxyl (COOH) is more common in the training set than acyl chloride (COCl). However, the prediction results of top-1 are also feasible chemically. Fig. 4c, the model incorrectly identifies the reaction center and consequently the leaving group, resulting in top-1 prediction being inconsistent with ground truth. Top-1 prediction is an open-loop operation, which is not necessarily harmful from the perspective of multi-step retrosynthesis. Interestingly, the reaction of top-1 prediction is to attach more than one leaving group to one synthon. Previous semi-template-base methods [17] could not predict such complex samples.

V Discussion and Conclusion

Limitations: One common drawback among all semi-template-based and graph-edit methods is their heavy dependence on atom-mapping details between reactants and products. Incorrect mapping will extract the wrong template. It will provide incorrect reaction center labels and leaving group labels during model training. We look forward to the continuous improvement of the atom-mapping algorithm and the extraction/application template algorithm of Rdchiral [22] in the community, which is very beneficial to the RetroSiG framework.

Future Works: There are three directions for future work. End2End: We need to design an end-to-end architecture to connect the two search processes so that the signal at the final step can update both stages. Decision Transformer : Our current policy net consists of a simple (hyper) graph encoder and MLP; hence, we must design a more powerful policy net. The ability of Transformer [25] has recently been proven in various fields, especially in large language models [40]. Naturally, Decision Transformer [41] is a choice as a policy net. Termination Reward: Upon completion, the environment currently incentivizes the agent solely when its prediction aligns precisely with the ground truth. However, there exist various valid methods to synthesize a product. It is crucial to devise a more reasonable reward mechanism at termination step.

References

  • [1] E. J. Corey, “The logic of chemical synthesis: multistep synthesis of complex carbogenic molecules (nobel lecture),” Angewandte Chemie International Edition in English, vol. 30, no. 5, pp. 455–465, 1991.
  • [2] C. M. Gothard, S. Soh, N. A. Gothard, B. Kowalczyk, Y. Wei, B. Baytekin, and B. A. Grzybowski, “Rewiring chemistry: algorithmic discovery and experimental validation of one-pot reactions in the network of organic chemistry,” Angewandte Chemie, vol. 124, no. 32, pp. 8046–8051, 2012.
  • [3] C. W. Coley, L. Rogers, W. H. Green, and K. F. Jensen, “Computer-assisted retrosynthesis based on molecular similarity,” ACS central science, vol. 3, no. 12, pp. 1237–1245, 2017.
  • [4] H. Dai, C. Li, C. Coley, B. Dai, and L. Song, “Retrosynthesis prediction with conditional graph logic network,” Advances in Neural Information Processing Systems, vol. 32, 2019.
  • [5] M. H. Segler and M. P. Waller, “Neural-symbolic machine learning for retrosynthesis and reaction prediction,” Chemistry–A European Journal, vol. 23, no. 25, pp. 5966–5971, 2017.
  • [6] S. Chen and Y. Jung, “Deep retrosynthetic reaction prediction using local reactivity and global attention,” JACS Au, vol. 1, no. 10, pp. 1612–1620, 2021.
  • [7] C. Yan, P. Zhao, C. Lu, Y. Yu, and J. Huang, “Retrocomposer: Composing templates for template-based retrosynthesis prediction,” Biomolecules, vol. 12, no. 9, p. 1325, 2022.
  • [8] G. Landrum, “Rdkit: Open-source cheminformatics. 2006,” Google Scholar, 2006.
  • [9] K. Lin, Y. Xu, J. Pei, and L. Lai, “Automatic retrosynthetic route planning using template-free models,” Chemical science, vol. 11, no. 12, pp. 3355–3364, 2020.
  • [10] S. Zheng, J. Rao, Z. Zhang, J. Xu, and Y. Yang, “Predicting retrosynthetic reactions using self-corrected transformer neural networks,” Journal of chemical information and modeling, vol. 60, no. 1, pp. 47–55, 2019.
  • [11] I. V. Tetko, P. Karpov, R. Van Deursen, and G. Godin, “State-of-the-art augmented nlp transformer models for direct and single-step retrosynthesis,” Nature communications, vol. 11, no. 1, p. 5575, 2020.
  • [12] S.-W. Seo, Y. Y. Song, J. Y. Yang, S. Bae, H. Lee, J. Shin, S. J. Hwang, and E. Yang, “Gta: Graph truncated attention for retrosynthesis,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, no. 1, 2021, pp. 531–539.
  • [13] Y. Wan, C.-Y. Hsieh, B. Liao, and S. Zhang, “Retroformer: Pushing the limits of end-to-end retrosynthesis transformer,” in International Conference on Machine Learning.   PMLR, 2022, pp. 22 475–22 490.
  • [14] Z. Tu and C. W. Coley, “Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction,” Journal of chemical information and modeling, vol. 62, no. 15, pp. 3503–3513, 2022.
  • [15] C. Yan, Q. Ding, P. Zhao, S. Zheng, J. Yang, Y. Yu, and J. Huang, “Retroxpert: Decompose retrosynthesis prediction like a chemist,” Advances in Neural Information Processing Systems, vol. 33, pp. 11 248–11 258, 2020.
  • [16] C. Shi, M. Xu, H. Guo, M. Zhang, and J. Tang, “A graph to graphs framework for retrosynthesis prediction,” in International conference on machine learning.   PMLR, 2020, pp. 8818–8827.
  • [17] V. R. Somnath, C. Bunne, C. Coley, A. Krause, and R. Barzilay, “Learning graph models for retrosynthesis prediction,” Advances in Neural Information Processing Systems, vol. 34, pp. 9405–9415, 2021.
  • [18] X. Wang, Y. Li, J. Qiu, G. Chen, H. Liu, B. Liao, C.-Y. Hsieh, and X. Yao, “Retroprime: A diverse, plausible and transformer-based method for single-step retrosynthesis predictions,” Chemical Engineering Journal, vol. 420, p. 129845, 2021.
  • [19] Z. Chen, O. R. Ayinde, J. R. Fuchs, H. Sun, and X. Ning, “G2retro as a two-step graph generative models for retrosynthesis prediction,” Communications Chemistry, vol. 6, no. 1, p. 102, 2023.
  • [20] M. Sacha, M. Błaz, P. Byrski, P. Dabrowski-Tumanski, M. Chrominski, R. Loska, P. Włodarczyk-Pruszynski, and S. Jastrzebski, “Molecule edit graph attention network: modeling chemical reactions as sequences of graph edits,” Journal of Chemical Information and Modeling, vol. 61, no. 7, pp. 3273–3284, 2021.
  • [21] W. Zhong, Z. Yang, and C. Y.-C. Chen, “Retrosynthesis prediction using an end-to-end graph generative architecture for molecular graph editing,” Nature Communications, vol. 14, no. 1, p. 3009, 2023.
  • [22] C. W. Coley, W. H. Green, and K. F. Jensen, “Rdchiral: An rdkit wrapper for handling stereochemistry in retrosynthetic template extraction and application,” Journal of chemical information and modeling, vol. 59, no. 6, pp. 2529–2537, 2019.
  • [23] J. Zhu, B. Hong, Z. Lan, and F. Ma, “Single-step retrosynthesis via reaction center and leaving groups prediction,” in 2023 16th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).   IEEE, 2023, pp. 1–5.
  • [24] E. Kim, D. Lee, Y. Kwon, M. S. Park, and Y.-S. Choi, “Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables,” Journal of Chemical Information and Modeling, vol. 61, no. 1, pp. 123–133, 2021.
  • [25] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [26] Z. Lan, Z. Zeng, B. Hong, Z. Liu, and F. Ma, “Rcsearcher: Reaction center identification in retrosynthesis via deep q-learning,” arXiv preprint arXiv:2301.12071, 2023.
  • [27] Z. Lan, L. Yu, L. Yuan, Z. Wu, Q. Niu, and F. Ma, “Sub-gmn: The subgraph matching network model,” arXiv preprint arXiv:2104.00186, 2021.
  • [28] Z. Lan, Y. Ma, L. Yu, L. Yuan, and F. Ma, “Aednet: Adaptive edge-deleting network for subgraph matching,” Pattern Recognition, vol. 133, p. 109033, 2023.
  • [29] Z. Lan, B. Hong, Y. Ma, and F. Ma, “More interpretable graph similarity computation via maximum common subgraph inference,” arXiv preprint arXiv:2208.04580, 2022.
  • [30] Z. Lan, L. Yu, L. Yuan, Z. Wu, Q. Niu, and F. Ma, “Sub-gmn: The neural subgraph matching network model,” in 2023 16th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI).   IEEE, 2023, pp. 1–7.
  • [31] K. Kamiński, J. Ludwiczak, M. Jasiński, A. Bukala, R. Madaj, K. Szczepaniak, and S. Dunin-Horkawicz, “Rossmann-toolbox: a deep learning-based protocol for the prediction and design of cofactor specificity in rossmann fold proteins,” Briefings in bioinformatics, vol. 23, no. 1, p. bbab371, 2022.
  • [32] Y. Feng, H. You, Z. Zhang, R. Ji, and Y. Gao, “Hypergraph neural networks,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, no. 01, 2019, pp. 3558–3565.
  • [33] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
  • [34] Y. Ma, Z. Lan, L. Zong, and K. Huang, “Global-aware beam search for neural abstractive summarization,” Advances in Neural Information Processing Systems, vol. 34, pp. 16 545–16 557, 2021.
  • [35] N. Schneider, N. Stiefl, and G. A. Landrum, “What’s what: The (nearly) definitive guide to reaction role assignment,” Journal of chemical information and modeling, vol. 56, no. 12, pp. 2336–2346, 2016.
  • [36] R. Sun, H. Dai, L. Li, S. Kearnes, and B. Dai, “Towards understanding retrosynthesis by energy-based models,” Advances in Neural Information Processing Systems, vol. 34, pp. 10 186–10 194, 2021.
  • [37] M. Wang, D. Zheng, Z. Ye, Q. Gan, M. Li, X. Song, J. Zhou, C. Ma, L. Yu, Y. Gai et al., “Deep graph library: A graph-centric, highly-performant package for graph neural networks,” arXiv preprint arXiv:1909.01315, 2019.
  • [38] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., “Pytorch: An imperative style, high-performance deep learning library,” Advances in neural information processing systems, vol. 32, 2019.
  • [39] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [40] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray et al., “Training language models to follow instructions with human feedback,” Advances in Neural Information Processing Systems, vol. 35, pp. 27 730–27 744, 2022.
  • [41] L. Chen, K. Lu, A. Rajeswaran, K. Lee, A. Grover, M. Laskin, P. Abbeel, A. Srinivas, and I. Mordatch, “Decision transformer: Reinforcement learning via sequence modeling,” Advances in neural information processing systems, vol. 34, pp. 15 084–15 097, 2021.