# Introduction to Developing a TensorFlow Estimator Model with the DLRover Trainer

This document describes how to develop a TensorFlow Estimator model with the DLRover trainer.

## Develop a model with TensorFlow Estimator

[TensorFlow Estimator](https://www.tensorflow.org/guide/estimator) encapsulates training, evaluation, prediction and export for serving. DLRover supports both custom estimators and pre-made estimators. A DLRover program with an Estimator typically consists of the following steps.

### Define the feature and label columns in the conf

Each `Column` specifies a feature name, its data type and whether it is the label. The following snippet defines two columns in the [example](../../examples/tensorflow/criteo_deeprec/train_conf.py).

```python
train_set = {
    "reader": FileReader("test.data"),
    "columns": (
        Column.create(  # type: ignore
            name="x",
            dtype="float32",
            is_label=False,
        ),
        Column.create(  # type: ignore
            name="y",
            dtype="float32",
            is_label=True,
        ),
    ),
}
```

The first column is the feature `x` with type `float32`. The second column is the label `y`, also with type `float32`. `dlrover.trainer` builds the `input_fn` for the train set and the test set from these columns.

### Add a Custom Reader for TF Estimator in DLRover

In some cases, the readers provided by the DLRover trainer do not satisfy the user's needs. In that case, you need to develop a custom reader and set it in the conf.

#### Add a Custom Elastic Reader for TF Estimator in DLRover

The `__init__` method must accept a `path` argument. The key methods are `count_data` and `read_data_by_index_range`. `count_data` is called before training to report how many samples the dataset contains. During training, `read_data_by_index_range` is called to fetch training data.

```python
import numpy as np

from dlrover.trainer.tensorflow.reader.base_reader import ElasticReader


class FakeReader(ElasticReader):
    def __init__(self, path=None):
        self.count = 1
        super().__init__(path=path)

    def count_data(self):
        # Report the total number of samples before training starts.
        self._data_nums = 10

    def read_data_by_index_range(self, start_index, end_index):
        # Return the samples whose indices fall in [start_index, end_index).
        data = []
        for i in range(start_index, end_index):
            x = np.random.randint(1, 1000)
            y = 2 * x + np.random.randint(1, 5)
            d = "{},{}".format(x, y)
            data.append(d)
        return data
```

You need to instantiate your reader and set it in the conf. Here is an example:

```python
eval_set = {"reader": FakeReader("./eval.data"), "columns": train_set["columns"]}
```

#### Add a Custom Non-Elastic Reader for TF Estimator in DLRover

The key method is `iterator`. During training, `iterator` is called to fetch training data.

```python
class Reader:
    def __init__(self, path=None, batch_size=None):
        self._path = path
        self._batch_size = batch_size

    def get_data(self):
        # Your custom code to read samples goes here.
        while True:
            yield "1,1"

    def iterator(self):
        while True:
            for d in self.get_data():
                yield d
```

You need to instantiate your reader and set it in the conf. Here is an example:

```python
eval_set = {"reader": Reader("./eval.data"), "columns": train_set["columns"]}
```

### Instantiate the Estimator

The heart of every Estimator, whether pre-made or custom, is its model function `model_fn`, a method that builds the graphs for training, evaluation, and prediction. `dlrover.trainer` assumes the Estimator is a custom estimator, and pre-made estimators can be converted into custom estimators with little overhead.

#### Train a model from a custom estimator

When relying on a custom Estimator, you must write the model function yourself. Refer to the [tutorial](https://www.tensorflow.org/guide/estimator). A minimal sketch is shown below.
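The following is a minimal sketch of a custom Estimator, assuming the `x`/`y` columns defined in the conf above and a simple linear model. The class name `LinearRegressionEstimator` and the layer choices are illustrative and not part of the DLRover API; the sketch only shows the `model_fn(self, features, labels, mode, params)` shape that the adaptor example below also follows.

```python
import tensorflow as tf


class LinearRegressionEstimator(tf.estimator.Estimator):
    """A minimal custom Estimator sketch that fits y = w * x + b."""

    def model_fn(self, features, labels, mode, params):
        # `features` is a dict keyed by the column names from the conf.
        x = tf.reshape(features["x"], [-1, 1])
        predictions = tf.compat.v1.layers.dense(x, units=1)

        if mode == tf.estimator.ModeKeys.PREDICT:
            return tf.estimator.EstimatorSpec(mode, predictions=predictions)

        loss = tf.compat.v1.losses.mean_squared_error(
            tf.reshape(labels, [-1, 1]), predictions
        )
        if mode == tf.estimator.ModeKeys.EVAL:
            return tf.estimator.EstimatorSpec(mode, loss=loss)

        # TRAIN mode: minimize the loss with plain gradient descent.
        optimizer = tf.compat.v1.train.GradientDescentOptimizer(0.01)
        train_op = optimizer.minimize(
            loss, global_step=tf.compat.v1.train.get_global_step()
        )
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
```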
#### Train a model from pre-made estimators

You can convert an existing pre-made estimator by writing an adaptor that fits with `dlrover.trainer`. As shown above, the `model_fn` is the key part of an Estimator: during training and evaluation it is called with different modes and returns the corresponding graph. Thus, you can define a custom estimator whose `model_fn` acts as a wrapper around the pre-made estimator's `model_fn`. In the [DeepFMAdaptor](../../dlrover/trainer/examples/deepfm/DeepFMAdaptor.py) example, `DeepFMEstimator` in [`deepctr.estimator.models`](https://github.com/shenweichen/DeepCTR/tree/master/deepctr/estimator/models) is a pre-made estimator.

```python
import tensorflow as tf
from tensorflow import feature_column

from deepctr.estimator.models.deepfm import DeepFMEstimator


class DeepFMAdaptor(tf.estimator.Estimator):
    """Adaptor that wraps the pre-made DeepFMEstimator."""

    def model_fn(self, features, labels, mode, params):
        """
        features: dict, the key is the feature name and the value is a tensor.
        labels: tensor, corresponding to the column whose `is_label` is True.
        """
        # Build feature columns from the column names defined in the conf.
        x = feature_column.numeric_column("x")
        x_buckets = feature_column.bucketized_column(x, boundaries=[1, 3, 5])
        linear_feature_columns = [x_buckets]
        dnn_feature_columns = [x]
        self.estimator = DeepFMEstimator(
            linear_feature_columns,
            dnn_feature_columns,
            task=params["task"],
        )
        # Delegate to the pre-made estimator's model_fn.
        return self.estimator._model_fn(
            features, labels, mode, self.run_config
        )
```

### Saving object-based checkpoints with Estimator

By default, Estimators save checkpoints with variable names rather than the object graph described in the Checkpoint guide. The checkpoint hook is added by `dlrover.trainer.estimator_executor`.

### SavedModels from Estimators

Estimators export SavedModels through `tf.estimator.Estimator.export_saved_model`. The exporter hook is added by `dlrover.trainer.estimator_executor`. When the job is launched, `dlrover.trainer.estimator_executor` parses the conf and builds the `input_fn`, the estimator and the related hooks.

## Submit a Job to Train the Estimator Model

### Build an Image with Models

You can install dlrover in your image:

```bash
pip install dlrover[tensorflow] -U
```

Or you can build your image from the DLRover base image:

```dockerfile
FROM registry.cn-hangzhou.aliyuncs.com/intell-ai/dlrover:deeprec_criteo_v1
COPY model_zoo /home/model_zoo
```

```bash
docker build -t ${IMAGE_NAME} -f ${DockerFile} .
docker push ${IMAGE_NAME}
```

### Set the Command to Train the Model

We need to set the command of the PS and worker pods to train the model, as in the [DeepCTR example](../../examples/tensorflow/criteo_deeprec/autoscale_job.yaml):

```yaml
command:
  - /bin/bash
  - -c
  - "cd ./examples/tensorflow/criteo_deeprec \
    && python -m dlrover.trainer.entry.local_entry \
    --platform=Kubernetes --conf=train_conf.TrainConf \
    --enable_auto_scaling=True"
```

Then, we can submit the job with `kubectl`:

```bash
kubectl -n dlrover apply -f ${JOB_YAML_FILE}
```
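After the job is submitted, you can check its pods and logs with standard `kubectl` commands. The namespace matches the one used above; `${WORKER_POD_NAME}` is a placeholder for an actual pod name in your cluster.

```bash
# List the pods created for the job in the dlrover namespace.
kubectl -n dlrover get pods

# Follow the logs of one worker pod (replace the placeholder with a real pod name).
kubectl -n dlrover logs -f ${WORKER_POD_NAME}
```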