Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade pai restful API #2722

Merged
merged 22 commits into from
Jul 27, 2020
Merged

Upgrade pai restful API #2722

merged 22 commits into from
Jul 27, 2020

Conversation

SparkSnail
Copy link
Contributor

@SparkSnail SparkSnail commented Jul 23, 2020

  1. upgrade restserver v1 to v2
  2. remove password configuration in PAI mode, since in v2 the password related API is not supported.
  3. show error information when get unvalid token or unvalid http host.

@SparkSnail SparkSnail changed the title Upgrade pai API Upgrade pai restful API Jul 23, 2020
@ultmaster ultmaster changed the base branch from master to v1.7.1 July 24, 2020 09:17
@chicm-ms
Copy link
Contributor

would you add this change to fix mac pipeline? https://github.com/microsoft/nni/pull/2715/files

@@ -103,7 +103,6 @@ export namespace ValidationSchemas {
}),
pai_config: joi.object({ // eslint-disable-line @typescript-eslint/camelcase
userName: joi.string().min(1).required(),
passWord: joi.string().min(1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also update doc accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the PAIMode doc has already removed the password configuration, while it keeps in our code. Only need to remove code this time.

// Status code 200 for success
if ((error !== undefined && error !== null) || response.statusCode !== 200) {
const errorMessage: string = (error !== undefined && error !== null) ? error.message :
`PAI Training service: get job info for trial ${paiTrialJob.id} from PAI Cluster failed!, http code:${response.statusCode}, http body: ${JSON.stringify(_body)}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why _body needs JSON.stringify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the response data of PAI, seems this list API in v2 returns a object, and need to use JSON.stringify to decode the content, while the submit API returns a string directly, does not need to do this.

@@ -248,19 +244,6 @@ export class OpenPaiEnvironmentService extends EnvironmentService {
return deferred.promise;
}

private async refreshPlatform(): Promise<void> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the refresh logic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to remove, as it only refresh pai token. And it's a pattern for future platform, which need regular refresh credential.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@squirrelsc you mean we will add dedicated token refresh logic in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, if there are similar requirements, here is a reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, why remove pai token refresh? no need to refresh token any more on pai?

@SparkSnail
Copy link
Contributor Author

would you add this change to fix mac pipeline? https://github.com/microsoft/nni/pull/2715/files

updated.

@@ -178,7 +177,8 @@ abstract class PAITrainingService implements TrainingService {
const deferred: Deferred<void> = new Deferred<void>();

request(stopJobRequest, (error: Error, response: request.Response, _body: any) => {
if ((error !== undefined && error !== null) || response.statusCode >= 400) {
// Status code 202 for success.
if ((error !== undefined && error !== null) || response.statusCode !== 202) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better keep original status code. The HTTP status code is standard. < 400 is normal state. 4xx is permission issue, and 5xx are server errros. Don't hard code to 202 only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a case that when submit job to http request, pai returns a 3xx response, but the job is not submitted correctly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3xx means redirection, and it should be handled in most HTTP sdk internally. If it happens here, you can check if the sdk is not used correctly. For example, disabled auto redirect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, add followAllRedirects:true to handle redirect.

@SparkSnail SparkSnail merged commit 04deb54 into microsoft:v1.7.1 Jul 27, 2020
this.log.error(`PAI Training service: get job info for trial ${paiTrialJob.id} from PAI Cluster failed!`);
// Status code 200 for success
if ((error !== undefined && error !== null) || response.statusCode >= 400) {
// The job refresh time could be ealier than job submission, so it might return 404 error code, need refactor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why there is no log here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 404 error is always shown, it is caused by NNI's job refresh logic, not expected to shown. The original logic use statusCode>=500 to avoid log 404 error, it is not reasonable. We should refactor the logic to avoid refresh job while it is not submitted. If add a error log here, will cause confuse for users currently, will add a log here after fixing the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants