Reorg logical flow in train #37

chengfx · 2019-05-11T08:14:51Z

I have reorganized the logical flow in train.py to improve readability, scalability, and robustness. This is also the first piece of work toward adding an encoding cache mechanism. I have tested the regression, classification, and Chinese tasks locally, but more testing from folks would be welcome 😄 because this PR contains a lot of changes.
The current logical flow is:

Init
1. Conf
2. Problem
3. Other
  1. Finetune
Cache verification (use cache == True)
1. Check
  1. Cache conf
  2. Problem
  3. Embedding
  4. Encoding
2. Load
Data preprocessing (use cache == False or cache verification == False)
1. Build dictionary
2. Encoding
Environment preparation
1. Cache save (use cache == True)
  1. Create dir
  2. Conf
  3. Problem
  4. Embedding
  5. Encoding
2. Backup
Train phase
1. Init
  1. Model
  2. Loss
  3. Optimizer
2. Train
3. Test
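
The cache-related steps in the flow above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in train.py: the helper names (`md5_of`, `cache_is_valid`, `preprocess`, `save_cache`, `load_cache`, `prepare_data`), the file names `conf.json`, `problem.pkl` and `encoding.pkl`, and the dict-style `conf` are all assumptions made for the example, and the embedding step is left out for brevity.

```python
import hashlib
import json
import os
import pickle
import tempfile

# Hypothetical artifact names; the real cache layout may differ.
ARTIFACTS = ("problem.pkl", "encoding.pkl")


def md5_of(obj):
    """Fingerprint a JSON-serializable object (hypothetical helper)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def cache_is_valid(conf, cache_dir):
    """Cache verification: the conf fingerprint must match and every artifact must exist."""
    conf_path = os.path.join(cache_dir, "conf.json")
    if not os.path.isfile(conf_path):
        return False
    with open(conf_path, encoding="utf-8") as f:
        if json.load(f).get("conf_md5") != md5_of(conf):
            return False
    return all(os.path.isfile(os.path.join(cache_dir, name)) for name in ARTIFACTS)


def preprocess(corpus):
    """Data preprocessing: build a dictionary, then encode the corpus with it."""
    tokens = sorted({tok for line in corpus for tok in line.split()})
    vocab = {tok: idx for idx, tok in enumerate(tokens)}
    encoding = [[vocab[tok] for tok in line.split()] for line in corpus]
    return vocab, encoding


def save_cache(conf, vocab, encoding, cache_dir):
    """Cache save: create the directory, then write the conf fingerprint and artifacts."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(os.path.join(cache_dir, "conf.json"), "w", encoding="utf-8") as f:
        json.dump({"conf_md5": md5_of(conf)}, f)
    for name, obj in zip(ARTIFACTS, (vocab, encoding)):
        with open(os.path.join(cache_dir, name), "wb") as f:
            pickle.dump(obj, f)


def load_cache(cache_dir):
    """Load the cached dictionary and encoding, in the order given by ARTIFACTS."""
    loaded = []
    for name in ARTIFACTS:
        with open(os.path.join(cache_dir, name), "rb") as f:
            loaded.append(pickle.load(f))
    return tuple(loaded)


def prepare_data(conf, corpus, cache_dir):
    """Return (vocab, encoding, source); source is 'cache' or 'preprocessed'."""
    if conf["use_cache"] and cache_is_valid(conf, cache_dir):
        vocab, encoding = load_cache(cache_dir)
        return vocab, encoding, "cache"
    # Either caching is disabled or verification failed: preprocess from scratch.
    vocab, encoding = preprocess(corpus)
    if conf["use_cache"]:
        save_cache(conf, vocab, encoding, cache_dir)
    return vocab, encoding, "preprocessed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = {"use_cache": True, "max_vocab": 1000}
        corpus = ["a b c", "b c d"]
        print(prepare_data(conf, corpus, tmp)[2])  # first call preprocesses and saves
        print(prepare_data(conf, corpus, tmp)[2])  # second call can reuse the cache
```

Fingerprinting the conf is one simple way to invalidate the cache when settings change; the actual checks in this PR also cover the problem, embedding, and encoding artifacts.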

ljshou · 2019-05-11T08:25:54Z

Awesome 🐮, thanks for the hard work

woailaosang · 2019-05-14T12:48:46Z

problem.py


- self.input_dicts = dict()
+ # init


Readability is not good here, I think. I strongly suggest enumerating every task.

Hi @woailaosang, the readability improvement in this PR is specific to the training logical flow; I did less work on the other modules. Yes, enumerating every task is a good idea, but I think we also need to optimize the code to reduce repeated code and logic. Otherwise, you have to modify every place whenever you want to make a change.
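
One way to reconcile the two points in this thread (enumerate every task explicitly, but keep shared logic in one place) is a dispatch table keyed by an enum. The sketch below is only an illustration: `TaskType`, `TARGET_ENCODERS`, `encode_target`, and the three encoder functions are hypothetical names, not identifiers from problem.py.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical task enumeration used only for this example."""
    CLASSIFICATION = "classification"
    REGRESSION = "regression"
    SEQUENCE_TAGGING = "sequence_tagging"


def _encode_label_id(label, label_vocab):
    # Classification: map the label string to its integer id.
    return label_vocab[label]


def _encode_float(label, label_vocab):
    # Regression: the target is a number, so no vocabulary is needed.
    return float(label)


def _encode_tag_sequence(label, label_vocab):
    # Sequence tagging: map every whitespace-separated tag to its id.
    return [label_vocab[tag] for tag in label.split()]


# Every task is listed exactly once, so adding a task means adding one entry here.
TARGET_ENCODERS = {
    TaskType.CLASSIFICATION: _encode_label_id,
    TaskType.REGRESSION: _encode_float,
    TaskType.SEQUENCE_TAGGING: _encode_tag_sequence,
}


def encode_target(task_type, label, label_vocab=None):
    """Shared entry point: look up the per-task encoder and apply it."""
    try:
        encoder = TARGET_ENCODERS[task_type]
    except KeyError:
        raise ValueError(f"Unsupported task type: {task_type!r}") from None
    return encoder(label, label_vocab)
```

With this shape, callers only ever invoke `encode_target`, so a change to the shared path is made once, while the table keeps the per-task behavior visible at a glance.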

woailaosang · 2019-05-15T12:18:14Z

.gitignore

@@ -2,5 +2,7 @@
 *~
 *.pyc
 *.cache*
+*.vs*


What is the '.vs' directory?

It's just the VS Code configuration, not related to this project 😄

woailaosang · 2019-05-16T08:16:19Z

train.py

+ vocab_info, initialize = None, False
+ if not conf.pretrained_model_path:
+     vocab_info, initialize = get_vocab_info(problem, emb_matrix), True
+ print(initialize)


This 'print' line needs to be deleted, I think.

woailaosang · 2019-05-16T10:21:21Z

train.py

- # first time training, load problem from cache, and then backup the cache to model_save_dir/.necessary_cache/
- if conf.use_cache and os.path.isfile(conf.problem_path):
+ def load(self, conf, problem, emb_matrix):
+     # load dictionary when (not finetune) and (cache invalid)


load dictionary when (not finetune) and (cache valid)

thanks, done

ljshou · 2019-05-20T13:06:51Z

There are conflicts in these files:
problem.py
train.py

chengfx · 2019-05-22T08:49:12Z

There are conflicts in these files:
problem.py
train.py

done

Feixiang Cheng and others added 30 commits April 25, 2019 22:09

Add new config about knowledge distillation for query binary classifier

674526d

remove inferenced result in knowledge distillation for query binary classifier

59d6318

Add AUC.py in tools folder

b4c110e

Add test_data_path into conf_kdqbc_bilstmattn_cnn.json

891f43a

Modify AUC.py

8b1d100

Rename AUC.py into calculate_AUC.py

333bd98

Merge branch 'master' into dev/fecheng

b6523a7

Modify test&calculate AUC commands for Knowledge Distillation for Query Binary Classifier

74976c2

Merge branch 'master' into dev/fecheng

936d9fe

Add cpu_thread_num parameter in conf.training_params

8c6e61b

Rename cpu_thread_num into cpu_num_workers

69c0bca

update comments in ModelConf.py

fb11aba

Add cup_num_workers in model_zoo/advanced/conf.json

bbfcde2

Add the description of cpu_num_workers in Tutorial.md

153acd3

fix conflict

4c9380c

Merge branch 'master' into dev/fecheng

2ae9d4a

Update inference speed of compressed model

cff4cd3

Add ProcessorsScheduler Class

cf534ce

Merge branch 'master' into dev/fecheng

37d09d5

Add license in ProcessorScheduler.py

17b8447

use lazy loading instead of one-off loading

e087427

merge master

1fb0440

Remove Debug Info in problem.py

05ddcf8

use open instead of codecs.open

af6ea60

Merge branch 'master' into dev/fecheng

535649e

update the inference of build dictionary for classification

fb4e47b

add md5 function in common_utils.py

a3a0c25

add merge_encode_* function

889aa91

update typo

bab7f54

update typo

576b88d

chengfx added 3 commits May 11, 2019 15:42

reorg the logical flow in train.py

91440a5

Merge branch 'add_encoding_cache' into dev/fecheng

5a747e6

merge master

229622b

chengfx requested review from ljshou, ericwtlin, yangze01, woailaosang and adolphk-yk May 11, 2019 08:14

remove dummy comments in problem.py

49a32fe

woailaosang reviewed May 14, 2019

chengfx requested a review from woailaosang May 15, 2019 07:31

enumerate problem types in problem.py

627a80f

woailaosang reviewed May 15, 2019

remove data_encoding.py

fbf780d

woailaosang reviewed May 16, 2019

Modify comment and remove debug code

c735b45

chengfx requested a review from woailaosang May 20, 2019 07:35

woailaosang approved these changes May 20, 2019

merge master

d6566d4

ljshou merged commit dc013c3 into master May 22, 2019
