megnet.data.graph module
Abstract classes and utility operations for building graph representations and data loaders (known as Sequence objects in Keras).
Most users will not need to interact with this module.
-
class
BaseGraphBatchGenerator
(dataset_size: int, targets: numpy.ndarray, batch_size: int = 128, shuffle: bool = True) Bases:
tensorflow.python.keras.utils.data_utils.Sequence
Base class for classes that generate batches of training data for MEGNet. Based on the Sequence class, which is the data loader equivalent for Keras.
Implementations of this base class must implement the
_generate_inputs()
method, which generates the lists of graph descriptions for a batch. The
process_atom_features()
function and related functions are used to modify the atom, bond, and global features when creating a batch.
- Parameters
dataset_size (int) – Number of entries in dataset
targets (ndarray) – Feature to be predicted for each network
batch_size (int) – Maximum batch size
shuffle (bool) – Whether to shuffle the data after each step
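As a rough illustration of the Sequence protocol this base class builds on, the sketch below uses a plain Python stand-in rather than the real Keras class, so it runs without TensorFlow. The class name ToyBatchGenerator, the seed argument, and the trivial _generate_inputs() body are hypothetical. Reshuffling in on_epoch_end() is an assumption about where shuffling happens, not a statement about MEGNet's code.

```python
import math
import random


class ToyBatchGenerator:
    """Hypothetical stand-in that follows the Sequence protocol (no TensorFlow)."""

    def __init__(self, dataset_size, targets, batch_size=128, shuffle=True, seed=0):
        self.dataset_size = dataset_size
        self.targets = targets
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # assumption: seeded for reproducibility
        self.order = list(range(dataset_size))

    def __len__(self):
        # Number of batches per epoch; the last batch may be smaller.
        return math.ceil(self.dataset_size / self.batch_size)

    def __getitem__(self, index):
        # Select the dataset entries that belong to batch `index`.
        start = index * self.batch_size
        batch_ids = self.order[start:start + self.batch_size]
        inputs = self._generate_inputs(batch_ids)
        batch_targets = [self.targets[i] for i in batch_ids]
        return inputs, batch_targets

    def on_epoch_end(self):
        # Reorder the entries so the next epoch sees different batches.
        if self.shuffle:
            self._rng.shuffle(self.order)

    def _generate_inputs(self, batch_ids):
        # A real subclass would return graph descriptions; the toy returns the ids.
        return list(batch_ids)


gen = ToyBatchGenerator(dataset_size=10, targets=[float(i) for i in range(10)],
                        batch_size=4, shuffle=False)
x, y = gen[2]  # the third and final batch holds the two leftover entries
```

With batch_size acting as a maximum, ten entries split into batches of 4, 4, and 2.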
-
class
DummyConverter
Bases:
megnet.data.graph.Converter
Dummy converter as a placeholder
-
class
EmbeddingMap
(feature_matrix: numpy.ndarray) Bases:
megnet.data.graph.Converter
Convert an integer to a row vector in a feature matrix
- Parameters
feature_matrix – (np.ndarray) A matrix of shape (N, M)
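The lookup described here can be sketched in a few lines of plain Python. ToyEmbeddingMap and its convert() method are hypothetical names chosen for illustration, and a list of lists stands in for the (N, M) NumPy matrix.

```python
class ToyEmbeddingMap:
    """Hypothetical sketch: map each integer to a row of a feature matrix."""

    def __init__(self, feature_matrix):
        # feature_matrix has N rows, each a feature vector of length M.
        self.feature_matrix = feature_matrix

    def convert(self, int_array):
        # Each integer selects the row with that index.
        return [self.feature_matrix[i] for i in int_array]


matrix = [[0.0, 0.0], [1.0, 0.5], [0.2, 0.9]]  # N = 3, M = 2
rows = ToyEmbeddingMap(matrix).convert([2, 0, 2])
```

Each input integer must be a valid row index, that is, in the range 0 to N - 1.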
-
class
GaussianDistance
(centers: numpy.ndarray = numpy.linspace(0, 5, 100), width=0.5) Bases:
megnet.data.graph.Converter
Expand distances in a Gaussian basis located at centers, with a default width of 0.5.
- Parameters
centers – (np.array) centers of the Gaussian basis functions
width – (float) width of the Gaussian basis functions
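A minimal sketch of a Gaussian distance expansion follows, in plain Python. The functional form exp(-(d - c)^2 / width^2) is an assumption made for illustration (other conventions include a factor of 2 in the denominator), and gaussian_expand is a hypothetical helper, not part of the module.

```python
import math


def gaussian_expand(distances, centers, width=0.5):
    """Expand each distance into one feature per Gaussian center.

    Assumed form: exp(-(d - c)**2 / width**2).
    """
    return [
        [math.exp(-((d - c) ** 2) / width ** 2) for c in centers]
        for d in distances
    ]


# 100 evenly spaced centers between 0 and 5, matching the default signature.
centers = [5.0 * i / 99 for i in range(100)]
features = gaussian_expand([0.0, 2.5], centers)
```

Each scalar distance becomes a vector with one entry per center, and the entry peaks where the distance coincides with a center.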
-
class
GraphBatchDistanceConvert
(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: numpy.ndarray = None, batch_size: int = 128, is_shuffle: bool = True, distance_converter: megnet.data.graph.Converter = None) Bases:
megnet.data.graph.GraphBatchGenerator
Generate batches of structures, with bond distances expanded by a distance converter
- Parameters
atom_features – (list of np.array) list of atom feature matrix,
bond_features – (list of np.array) list of bond features matrix
state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension
index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures
index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.
targets – (numpy array), N*1, where N is the number of structures
batch_size – (int) number of samples in a batch
is_shuffle – (bool) whether to shuffle the structure, default to True
distance_converter – (Converter) converter for processing the distances
-
class
GraphBatchGenerator
(atom_features: List[numpy.ndarray], bond_features: List[numpy.ndarray], state_features: List[numpy.ndarray], index1_list: List[int], index2_list: List[int], targets: numpy.ndarray = None, batch_size: int = 128, is_shuffle: bool = True) Bases:
megnet.data.graph.BaseGraphBatchGenerator
A generator class that assembles several structures (the number is set by batch_size) and forms (x, y) pairs for model training.
- Parameters
atom_features – (list of np.array) list of atom feature matrix,
bond_features – (list of np.array) list of bond features matrix
state_features – (list of np.array) list of [1, G] state features, where G is the global state feature dimension
index1_list – (list of integer) list of (M, ) one side atomic index of the bond, M is different for different structures
index2_list – (list of integer) list of (M, ) the other side atomic index of the bond, M is different for different structures, but it has to be the same as the corresponding index1.
targets – (numpy array), N*1, where N is the number of structures
batch_size – (int) number of samples in a batch
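One common way to assemble several graphs into a single batch is to concatenate their features and shift each structure's bond indices by the number of atoms that precede it. The sketch below illustrates that idea with plain Python lists. The function assemble_batch and the atom_to_graph and bond_to_graph outputs are hypothetical, and MEGNet's actual batching may differ in detail.

```python
def assemble_batch(atom_features, bond_features, state_features,
                   index1_list, index2_list):
    """Hypothetical sketch: merge per-structure graphs into one large graph."""
    atoms, bonds, states, index1, index2 = [], [], [], [], []
    atom_to_graph, bond_to_graph = [], []
    offset = 0  # number of atoms already placed in the batch
    structures = zip(atom_features, bond_features, state_features,
                     index1_list, index2_list)
    for g, (a, b, s, i1, i2) in enumerate(structures):
        atoms.extend(a)
        bonds.extend(b)
        states.extend(s)
        # Shift bond endpoints so they refer to positions in the merged atom list.
        index1.extend(i + offset for i in i1)
        index2.extend(i + offset for i in i2)
        # Record which structure each atom and bond came from.
        atom_to_graph.extend([g] * len(a))
        bond_to_graph.extend([g] * len(b))
        offset += len(a)
    return atoms, bonds, states, index1, index2, atom_to_graph, bond_to_graph


batch = assemble_batch(
    atom_features=[[1, 8], [6, 8, 8]],
    bond_features=[[0.96, 0.96], [1.16, 1.16, 1.16, 1.16]],
    state_features=[[[0, 0]], [[0, 0]]],
    index1_list=[[0, 1], [0, 0, 1, 2]],
    index2_list=[[1, 0], [1, 2, 0, 0]],
)
```

In this example the second structure's indices are shifted by 2, the atom count of the first structure, so every bond still points at its own atoms after merging.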
-
class
StructureGraph
(nn_strategy: Union[str, pymatgen.analysis.local_env.NearNeighbors] = None, atom_converter: megnet.data.graph.Converter = None, bond_converter: megnet.data.graph.Converter = None, **kwargs) Bases:
monty.json.MSONable
This is a base class for converting structures into graphs or model inputs. The methods to be implemented are as follows:
- convert(self, structure)
This converts a structure into a graph dictionary.
- get_input(self, structure)
This method converts a structure directly into a model input.
- get_flat_data(self, graphs, targets)
This method processes pairs of graphs and targets and outputs a model input list.
-
convert
(structure: pymatgen.core.structure.Structure, state_attributes: List = None) → Dict Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the atoms that form each bond and are separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].
- Parameters
state_attributes – (list) state attributes
structure – (pymatgen structure)
- Returns
(dictionary) graph dictionary
-
classmethod
from_dict
(d: Dict) → megnet.data.graph.StructureGraph
- Parameters
d – Dict representation.
- Returns
MSONable class.
-
static
get_atom_features
(structure) → List[int] Get atom features from the structure; subclasses may override this method
- Parameters
structure – (Pymatgen.Structure) pymatgen structure
- Returns
List of atomic numbers
-
get_flat_data
(graphs: List[Dict], targets: List = None) → tuple Expand the graph dictionaries to form lists of feature and target tensors. This is useful when the model is trained on graphs assembled on the fly.
- Parameters
graphs – (list of dictionary) list of graph dictionary for each structure
targets – (list of float or list) Optional: corresponding target values for each structure
- Returns
tuple(node_features, edges_features, global_values, index1, index2, targets)
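The flattening step can be pictured as turning a list of per-structure dictionaries into one list per field. In the sketch below, the dictionary keys ("atom", "bond", "state", "index1", "index2"), the function name flatten_graphs, and the choice to omit targets from the result when none are given are all assumptions made for illustration.

```python
def flatten_graphs(graphs, targets=None):
    """Hypothetical sketch: regroup graph dictionaries into per-field lists."""
    keys = ("atom", "bond", "state", "index1", "index2")  # assumed key names
    # One list per field, each with one entry per structure.
    columns = tuple([g[k] for g in graphs] for k in keys)
    if targets is None:
        return columns
    return columns + (list(targets),)


graphs = [
    {"atom": [1, 8], "bond": [0.96, 0.96], "state": [[0, 0]],
     "index1": [0, 1], "index2": [1, 0]},
    {"atom": [6], "bond": [], "state": [[0, 0]],
     "index1": [], "index2": []},
]
flat = flatten_graphs(graphs, targets=[1.5, -0.3])
```

The output order follows the documented return tuple: node features, edge features, global values, index1, index2, then targets.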
-
class
StructureGraphFixedRadius
(nn_strategy: Union[str, pymatgen.analysis.local_env.NearNeighbors] = None, atom_converter: megnet.data.graph.Converter = None, bond_converter: megnet.data.graph.Converter = None, **kwargs) Bases:
megnet.data.graph.StructureGraph
This class uses a shortcut that calls the find_points_in_spheres Cython function in pymatgen. It is orders of magnitude faster than previous implementations.
-
convert
(structure: pymatgen.core.structure.Structure, state_attributes: List = None) → Dict Take a pymatgen structure and convert it to an index-type graph representation. The graph will have node, distance, index1, and index2, where node is a vector of the atomic numbers (Z) of the atoms in the structure, and index1 and index2 mark the indices of the atoms that form each bond and are separated by distance. For state attributes, you can set structure.state = [[xx, xx]] beforehand; otherwise the algorithm uses the default [[0, 0]].
- Parameters
state_attributes – (list) state attributes
structure – (pymatgen structure)
- Returns
(dictionary) graph dictionary
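To make the index-type representation concrete, here is a simplified fixed-radius conversion for an isolated set of points. It is a brute-force sketch that ignores periodic images, which the pymatgen routine handles. The function fixed_radius_graph and the dictionary keys are hypothetical.

```python
import math


def fixed_radius_graph(atomic_numbers, coords, cutoff, state=None):
    """Hypothetical sketch: connect every pair of atoms within `cutoff`.

    Non-periodic and O(n^2); a real crystal also needs periodic images.
    """
    index1, index2, distances = [], [], []
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i == j:
                continue  # no self-bonds in this simplified version
            d = math.dist(ci, cj)
            if d <= cutoff:
                # Each bond is stored once per direction: (i, j) and (j, i).
                index1.append(i)
                index2.append(j)
                distances.append(d)
    return {
        "atom": list(atomic_numbers),
        "bond": distances,
        "state": state if state is not None else [[0, 0]],
        "index1": index1,
        "index2": index2,
    }


# A water-like arrangement: O at the origin, two H atoms 0.96 away.
graph = fixed_radius_graph(
    atomic_numbers=[8, 1, 1],
    coords=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.96, 0.0)],
    cutoff=1.0,
)
```

The two hydrogen atoms are about 1.36 apart, so with a cutoff of 1.0 only the O-H pairs appear, each in both directions.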
-
classmethod
from_structure_graph
(structure_graph: megnet.data.graph.StructureGraph) → megnet.data.graph.StructureGraphFixedRadius
-