megnet.data.graph module

Abstract classes and utility operations for building graph representations and data loaders (known as Sequence objects in Keras).

Most users will not need to interact with this module.

class BaseGraphBatchGenerator(dataset_size: int, targets: numpy.ndarray, batch_size: int = 128, shuffle: bool = True)[source]

Bases: tensorflow.python.keras.utils.data_utils.Sequence

Base class for classes that generate batches of training data for MEGNet. Based on the Sequence class, which is the data loader equivalent for Keras.

Subclasses of this base class must implement the _generate_inputs() method, which generates the lists of graph descriptions for a batch.

The process_atom_feature() method and the related process_bond_feature() and process_state_feature() methods are used to modify the atom, bond, and global (state) features when creating a batch.

Parameters
  • dataset_size (int) – Number of entries in dataset

  • targets (ndarray) – Feature to be predicted for each network

  • batch_size (int) – Maximum batch size

  • shuffle (bool) – Whether to shuffle the data after each step
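
The batching behaviour these parameters describe can be sketched without Keras. The class below is a hypothetical stand-in, not the library's implementation: the name ToyBatchIndexer, its seed argument, and the reshuffle in on_epoch_end() are assumptions chosen to mirror the Sequence protocol (__len__, __getitem__, on_epoch_end). It only tracks which dataset indices fall into each batch.

```python
import math
import random


class ToyBatchIndexer:
    """Hypothetical stand-in that mimics Sequence-style batch indexing."""

    def __init__(self, dataset_size, batch_size=128, shuffle=True, seed=0):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.order = list(range(dataset_size))
        self._rng = random.Random(seed)  # seeded so the sketch is reproducible
        if shuffle:
            self._rng.shuffle(self.order)

    def __len__(self):
        # Number of batches; the last batch may be smaller than batch_size.
        return math.ceil(len(self.order) / self.batch_size)

    def __getitem__(self, batch_index):
        start = batch_index * self.batch_size
        return self.order[start:start + self.batch_size]

    def on_epoch_end(self):
        # Assumed behaviour: reshuffle so the next epoch sees a new order.
        if self.shuffle:
            self._rng.shuffle(self.order)


indexer = ToyBatchIndexer(dataset_size=10, batch_size=4, shuffle=False)
batches = [indexer[i] for i in range(len(indexer))]
```

With ten entries and a batch size of four, the sketch yields three batches, the last holding only two entries, which is why batch_size is described as a maximum.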

on_epoch_end()[source]

Method called at the end of every epoch.

process_atom_feature(x: numpy.ndarray) → numpy.ndarray[source]
process_bond_feature(x: numpy.ndarray) → numpy.ndarray[source]
process_state_feature(x: numpy.ndarray) → numpy.ndarray[source]
class Converter[source]

Bases: monty.json.MSONable

Base class for atom or bond converter

convert(d: Any) → Any[source]
class DummyConverter[source]

Bases: megnet.data.graph.Converter

Dummy converter as a placeholder

convert(d: Any) → Any[source]
class EmbeddingMap(feature_matrix: numpy.ndarray)[source]

Bases: megnet.data.graph.Converter

Convert an integer to a row vector in a feature matrix

Parameters

feature_matrix – (np.ndarray) A matrix of shape (N, M)

convert(int_array: numpy.ndarray) → numpy.ndarray[source]

Convert atomic numbers to row vectors in the feature_matrix.

Parameters

int_array – (1d array) number array of length L

Returns

(matrix) L*M matrix, where L is the length of int_array and M is the number of columns in feature_matrix
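
The lookup described above amounts to selecting rows of a matrix by integer index. A minimal sketch using plain NumPy indexing follows; the feature matrix values and the integer array are made up for illustration, and the snippet does not call the EmbeddingMap class itself.

```python
import numpy as np

# Hypothetical feature matrix: N = 4 rows, M = 3 features per row.
feature_matrix = np.arange(12, dtype=float).reshape(4, 3)

# Integer labels of length L = 5; each one selects a row of feature_matrix.
int_array = np.array([0, 2, 2, 3, 1])

# Integer-array indexing returns one row per entry, giving an (L, M) matrix.
embedded = feature_matrix[int_array]
```

Repeated integers simply repeat the corresponding row, so the output always has one row per input entry.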

class GaussianDistance(centers: numpy.ndarray = numpy.linspace(0, 5, 100), width=0.5)[source]

Bases: megnet.data.graph.Converter

Expand distances with a Gaussian basis whose functions sit at the given centers, with a default width of 0.5.

Parameters
  • centers – (np.array) centers of the Gaussian basis functions

  • width – (float) width of the Gaussian basis functions

convert(d: numpy.ndarray) → numpy.ndarray[source]

expand distance vector d with given parameters

Parameters

d – (1d array) distance array

Returns

(matrix) N*M matrix with N the length of d and M the length of centers
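
A Gaussian expansion of this kind can be sketched with NumPy broadcasting. The helper below is hypothetical (gaussian_expand is not part of the module) and assumes an unnormalized Gaussian of the form exp(-(d - c)^2 / width^2); the exact functional form used by the library is not stated here. The centers are 100 evenly spaced values between 0 and 5, matching the defaults in the class signature.

```python
import numpy as np


def gaussian_expand(d, centers, width=0.5):
    """Hypothetical helper: expand distances of shape (N,) on M Gaussian centers."""
    d = np.asarray(d, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Broadcasting (N, 1) against (1, M) gives an (N, M) matrix of differences.
    diff = d[:, None] - centers[None, :]
    # Assumed functional form: an unnormalized Gaussian in the difference.
    return np.exp(-(diff ** 2) / width ** 2)


centers = np.linspace(0, 5, 100)  # 100 evenly spaced centers between 0 and 5
expanded = gaussian_expand([0.0, 1.5, 5.0], centers, width=0.5)
```

Each row of the result describes one distance as a smooth profile over the centers, peaking at the center closest to that distance.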

class GraphBatchDistanceConvert(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: numpy.ndarray = None, batch_size: int = 128, is_shuffle: bool = True, distance_converter: megnet.data.graph.Converter = None)[source]

Bases: megnet.data.graph.GraphBatchGenerator

Generate batches of structures, with bond distances expanded using an expansor (the distance_converter).

Parameters
  • atom_features – (list of np.array) list of atom feature matrices

  • bond_features – (list of np.array) list of bond feature matrices

  • state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension

  • index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures

  • index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.

  • targets – (numpy array), N*1, where N is the number of structures

  • batch_size – (int) number of samples in a batch

  • is_shuffle – (bool) whether to shuffle the structure, default to True

  • distance_converter – (Converter) converter for processing the distances


process_bond_feature(x) → numpy.ndarray[source]
class GraphBatchGenerator(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: numpy.ndarray = None, batch_size: int = 128, is_shuffle: bool = True)[source]

Bases: megnet.data.graph.BaseGraphBatchGenerator

A generator class that assembles several structures (the number is set by batch_size) and forms (x, y) pairs for model training.

Parameters
  • atom_features – (list of np.array) list of atom feature matrices

  • bond_features – (list of np.array) list of bond feature matrices

  • state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension

  • index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures

  • index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.

  • targets – (numpy array), N*1, where N is the number of structures

  • batch_size – (int) number of samples in a batch
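
One common way to assemble several graphs into a single batch is to concatenate their atoms and bonds and shift each structure's bond indices by the number of atoms already placed. The sketch below illustrates that idea with plain lists; assemble_batch is a hypothetical helper, and whether the library performs exactly this offsetting is an assumption.

```python
def assemble_batch(atom_features, bond_features, index1_list, index2_list):
    """Hypothetical helper: merge several graphs into one disconnected graph."""
    atoms, bonds, index1, index2 = [], [], [], []
    offset = 0  # number of atoms already placed in the batch
    for a, b, i1, i2 in zip(atom_features, bond_features, index1_list, index2_list):
        # index1 and index2 must both have one entry per bond.
        if len(i1) != len(i2) or len(i1) != len(b):
            raise ValueError("index lists must match the number of bonds")
        atoms.extend(a)
        bonds.extend(b)
        # Shift bond endpoints so they point at rows of the merged atom list.
        index1.extend(i + offset for i in i1)
        index2.extend(i + offset for i in i2)
        offset += len(a)
    return atoms, bonds, index1, index2


# Two made-up structures: one with 2 atoms, one with 3 atoms.
atoms, bonds, index1, index2 = assemble_batch(
    atom_features=[[1, 8], [6, 1, 1]],
    bond_features=[[0.96, 0.96], [1.09, 1.09]],
    index1_list=[[0, 1], [0, 2]],
    index2_list=[[1, 0], [2, 0]],
)
```

The length check reflects the requirement that index2 has the same length M as the corresponding index1 for each structure.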


class StructureGraph(nn_strategy: Union[str, pymatgen.analysis.local_env.NearNeighbors] = None, atom_converter: megnet.data.graph.Converter = None, bond_converter: megnet.data.graph.Converter = None, **kwargs)[source]

Bases: monty.json.MSONable

This is a base class for converting structures into graphs or model inputs. The methods to be implemented are as follows:

  1. convert(self, structure)

    This is to convert a structure into a graph dictionary

  2. get_input(self, structure)

    This method converts a structure directly to a model input

  3. get_flat_data(self, graphs, targets)

    This method processes pairs of graphs and targets and outputs a model input list.

as_dict() → Dict[source]

A JSON serializable dict representation of an object.

convert(structure: pymatgen.core.structure.Structure, state_attributes: List = None) → Dict[source]

Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2 entries, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the atoms that form each bond, separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].

Parameters
  • state_attributes – (list) state attributes

  • structure – (pymatgen structure)

Returns

(dictionary)
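
To make the index-type representation concrete, the sketch below builds such a dictionary for a small, non-periodic set of points using a simple distance cutoff. It is a toy illustration only: toy_graph is a hypothetical helper, it ignores periodic boundaries and pymatgen entirely, and the key names follow the description above (with "state" added as an assumption) rather than the library's actual keys.

```python
import math


def toy_graph(atomic_numbers, coords, cutoff, state=None):
    """Hypothetical helper: index-type graph for a non-periodic set of points."""
    index1, index2, distance = [], [], []
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i == j:
                continue
            d = math.dist(ci, cj)
            if d <= cutoff:
                # Each bond is stored once per direction: i -> j and j -> i.
                index1.append(i)
                index2.append(j)
                distance.append(d)
    return {
        "node": list(atomic_numbers),  # atomic number Z of each atom
        "distance": distance,          # one entry per directed bond
        "index1": index1,
        "index2": index2,
        "state": state if state is not None else [[0, 0]],  # default state
    }


graph = toy_graph(
    atomic_numbers=[1, 8, 1],
    coords=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (3.0, 0.0, 0.0)],
    cutoff=1.5,
)
```

Only the first two atoms lie within the cutoff of each other, so the graph contains two directed bonds and the third atom has no neighbors.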

classmethod from_dict(d: Dict) → megnet.data.graph.StructureGraph[source]
Parameters

d – Dict representation.

Returns

MSONable class.

static get_atom_features(structure) → List[int][source]

Get atom features from a structure. Subclasses may override this method.

Parameters

structure – (Pymatgen.Structure) pymatgen structure

Returns

List of atomic numbers

get_flat_data(graphs: List[Dict], targets: List = None) → tuple[source]

Expand the graph dictionary to form a list of features and targets tensors. This is useful when the model is trained on assembled graphs on the fly.

Parameters
  • graphs – (list of dictionary) list of graph dictionary for each structure

  • targets – (list of float or list) Optional: corresponding target values for each structure

Returns

tuple(node_features, edges_features, global_values, index1, index2, targets)
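
The flattening step can be pictured as turning a list of per-structure dictionaries into one list per field. The sketch below is a hypothetical illustration: flatten_graphs is not the library function, the dictionary keys are assumed names, and omitting the targets entry when no targets are given is also an assumption.

```python
def flatten_graphs(graphs, targets=None):
    """Hypothetical helper: split a list of graph dicts into per-field lists."""
    nodes = [g["node"] for g in graphs]
    edges = [g["distance"] for g in graphs]
    states = [g["state"] for g in graphs]
    index1 = [g["index1"] for g in graphs]
    index2 = [g["index2"] for g in graphs]
    if targets is None:
        # Assumed behaviour: no targets entry when none are supplied.
        return nodes, edges, states, index1, index2
    if len(targets) != len(graphs):
        raise ValueError("need exactly one target per graph")
    return nodes, edges, states, index1, index2, list(targets)


# Two made-up graphs with assumed key names.
graphs = [
    {"node": [1, 8], "distance": [0.96, 0.96], "index1": [0, 1],
     "index2": [1, 0], "state": [[0, 0]]},
    {"node": [6], "distance": [], "index1": [], "index2": [], "state": [[0, 0]]},
]
flat = flatten_graphs(graphs, targets=[-1.2, 0.3])
```

Each returned list has one entry per structure, in the same order as the input graphs.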

get_input(structure: pymatgen.core.structure.Structure) → List[numpy.ndarray][source]

Turns a structure into model input

graph_to_input(graph: Dict) → List[numpy.ndarray][source]

Turns a graph into model input

Parameters

graph – (dict) Dictionary description of the graph

Returns

Inputs in the form needed by MEGNet

Return type

([np.ndarray])

class StructureGraphFixedRadius(nn_strategy: Union[str, pymatgen.analysis.local_env.NearNeighbors] = None, atom_converter: megnet.data.graph.Converter = None, bond_converter: megnet.data.graph.Converter = None, **kwargs)[source]

Bases: megnet.data.graph.StructureGraph

This class uses a shortcut that calls the find_points_in_spheres Cython function in pymatgen. It is orders of magnitude faster than previous implementations.

convert(structure: pymatgen.core.structure.Structure, state_attributes: List = None) → Dict[source]

Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2 entries, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the atoms that form each bond, separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].

Parameters
  • state_attributes – (list) state attributes

  • structure – (pymatgen structure)

Returns

(dictionary)

classmethod from_structure_graph(structure_graph: megnet.data.graph.StructureGraph) → megnet.data.graph.StructureGraphFixedRadius[source]
itemgetter_list(l, indices: List) → tuple[source]

Get the items of l at the given indices and return them as a tuple

Parameters
  • l – (list)

  • indices – (list) indices

Returns

(tuple)
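
The same behaviour can be sketched with the standard library's operator.itemgetter. The helper below, pick_items, is hypothetical and is not the module's implementation; it also handles the single-index and empty cases explicitly, since itemgetter returns a bare item for one index and cannot be built from zero indices.

```python
from operator import itemgetter


def pick_items(items, indices):
    """Hypothetical helper: return the entries of items at indices as a tuple."""
    if len(indices) == 0:
        return ()
    picked = itemgetter(*indices)(items)
    # itemgetter returns a bare item (not a tuple) when given a single index.
    return (picked,) if len(indices) == 1 else picked


letters = ["a", "b", "c", "d"]
several = pick_items(letters, [2, 0])
single = pick_items(letters, [1])
```

The items come back in the order of the indices, not in the order they appear in the list.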