This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Test Failure] Clojure Integration #14415

Closed
perdasilva opened this issue Mar 13, 2019 · 15 comments

Comments

@perdasilva
Contributor

Description

It seems the Scala package tests are causing the Clojure Integration tests to fail on CI:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/master/405/pipeline

Seems related to #14402

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Test

@andrewfayres
Contributor

Yep, this was due to the Scala NDArray arange test failing. Going to go ahead and get this one closed.

@mxnet-label-bot [Clojure, Test, Flaky, Scala]

@andrewfayres
Contributor

@mxnet-label-bot add [Clojure, Test, Flaky, Scala]

@haojin2
Contributor

haojin2 commented Mar 15, 2019

@andrewfayres
Contributor

That's a different failure.

Retrieving nrepl/bencode/1.0.0/bencode-1.0.0.jar from clojars
Retrieving nrepl/nrepl/0.5.3/nrepl-0.5.3.jar from clojars
Could not find artifact origami:origami:jar:4.0.0-3 in central (https://repo1.maven.org/maven2/)
Could not find artifact origami:origami:jar:4.0.0-3 in clojars (https://repo.clojars.org/)
Could not transfer artifact origami:origami:jar:4.0.0-3 from/to vendredi (https://repository.hellonico.info/repository/hellonico/): Read timed out
Could not transfer artifact origami:origami:pom:4.0.0-3 from/to vendredi (https://repository.hellonico.info/repository/hellonico/): Read timed out
This could be due to a typo in :dependencies, file system permissions, or network issues.
If you are behind a proxy, try setting the 'http_proxy' environment variable.

Looks like an issue with the origami repo. It had an outage last week and may be experiencing another one. There's an open issue (#14394) about reducing the Clojure integ tests' reliance on this repo.
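For anyone triaging a similar log, here is a minimal Python sketch of how the artifact errors quoted above could be told apart from a genuine test failure. It is not part of the MXNet CI; the function names `summarize_artifact_errors` and `looks_like_repo_outage` are hypothetical, and the pattern is an assumption based only on the four error lines shown in this comment.

```python
import re

# Pattern inferred from the error lines quoted in this thread (an assumption,
# not a complete grammar of Leiningen/Maven resolver messages).
ARTIFACT_ERROR = re.compile(
    r"^Could not (?P<kind>find|transfer) artifact (?P<artifact>\S+)"
    r"(?: in | from/to )(?P<repo>\S+) \((?P<url>[^)]+)\)"
    r"(?:: (?P<reason>.+))?$"
)

# Excerpt of the log from the comment above.
SAMPLE_LOG = """\
Retrieving nrepl/nrepl/0.5.3/nrepl-0.5.3.jar from clojars
Could not find artifact origami:origami:jar:4.0.0-3 in central (https://repo1.maven.org/maven2/)
Could not find artifact origami:origami:jar:4.0.0-3 in clojars (https://repo.clojars.org/)
Could not transfer artifact origami:origami:jar:4.0.0-3 from/to vendredi (https://repository.hellonico.info/repository/hellonico/): Read timed out
Could not transfer artifact origami:origami:pom:4.0.0-3 from/to vendredi (https://repository.hellonico.info/repository/hellonico/): Read timed out
"""


def summarize_artifact_errors(log_text):
    """Return one dict per 'Could not find/transfer artifact' line."""
    errors = []
    for line in log_text.splitlines():
        match = ARTIFACT_ERROR.match(line.strip())
        if match:
            errors.append(match.groupdict())
    return errors


def looks_like_repo_outage(log_text):
    """Heuristic: a transfer failure that timed out suggests an unreachable repo."""
    return any(
        error["kind"] == "transfer"
        and "timed out" in (error["reason"] or "").lower()
        for error in summarize_artifact_errors(log_text)
    )


if __name__ == "__main__":
    for error in summarize_artifact_errors(SAMPLE_LOG):
        print(error["kind"], error["artifact"], error["repo"], error["reason"])
    print("repo outage suspected:", looks_like_repo_outage(SAMPLE_LOG))
```

The two "Could not find" lines only say the artifact is not hosted on central or clojars; the timeouts against the vendredi repository are what point to an outage rather than a typo in :dependencies.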

@haojin2
Contributor

haojin2 commented Mar 16, 2019

@andrewfayres Please address this ASAP as I'm experiencing another occurrence on my other PR here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-14445/2/pipeline Thanks!

@andrewfayres
Contributor

@haojin2 That's not related to this issue, and I don't own this code.

The CI failure you're seeing is due to clojure integ tests having a dependency on the origami repo which is apparently experiencing an outage. This happened last week and @gigasquid opened a PR (#14379) to disable those tests. The repo came back online before the PR got merged so she closed it instead.

I did go ahead and open a new PR (#14448) which has the same changes and will disable the integ tests. Feel free to take a look and review it. I'll post a message in the slack mxnet-clojure channel to hopefully get a committer to look.

@haojin2
Contributor

haojin2 commented Mar 16, 2019

@andrewfayres Thanks for the details, please lemme know when the workaround has been merged so that I can re-trigger my builds of all my PRs. Have a great weekend

@lanking520
Member

Merged it now. Let's wait for somebody to fix it.

@lanking520 lanking520 removed the Scala label Mar 20, 2019
@gigasquid
Member

This has been resolved with the long term solution discussed in #14394 - so closing this one

@haojin2
Contributor

haojin2 commented Oct 20, 2019

@gigasquid Seems like this test is failing again, this time for a different reason:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16537/11/pipeline

lein test imclassification.train-mnist-test
Starting Training of MNIST ....
Running with context devices of [#object[org.apache.mxnet.Context 0x5e67a490 cpu(0)]]
[15:44:21] src/io/iter_mnist.cc:110: MNISTIter: load 60000 images, shuffle=1, shape=(10,784)
[15:44:22] src/io/iter_mnist.cc:110: MNISTIter: load 10000 images, shuffle=1, shape=(10,784)
WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis
WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis
WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis
WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis
WARN  org.apache.mxnet.DataDesc: Found Undefined Layout, will use default index 0 for batch axis
INFO  org.apache.mxnet.module.BaseModule: Epoch[0] Train-accuracy=0.13231666
INFO  org.apache.mxnet.module.BaseModule: Epoch[0] Time cost=7499
INFO  org.apache.mxnet.module.BaseModule: Epoch[0] Validation-accuracy=0.338
INFO  org.apache.mxnet.module.BaseModule: Epoch[1] Train-accuracy=0.71955
INFO  org.apache.mxnet.module.BaseModule: Epoch[1] Time cost=6314
INFO  org.apache.mxnet.module.BaseModule: Epoch[1] Validation-accuracy=0.8542
INFO  org.apache.mxnet.module.Module: Saved checkpoint to target/test-0002.params
Finish fit

lein test :only imclassification.train-mnist-test/mnist-two-epochs-test

FAIL in (mnist-two-epochs-test) (train_mnist_test.clj:38)
expected: (= (file-to-filtered-seq "test/test-symbol.json.ref") (file-to-filtered-seq "target/test-symbol.json"))
  actual: (not (= ("{" "  \"nodes\": [" "    {" "      \"op\": \"null\", " "      \"name\": \"data\", " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc1_weight\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc1_bias\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc1\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]" "    }, " "    {" "      \"op\": \"Activation\", " "      \"name\": \"relu1\", " "      \"attrs\": {\"act_type\": \"relu\"}, " "      \"inputs\": [[3, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc2_weight\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc2_bias\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc2\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": [[4, 0, 0], [5, 0, 0], [6, 0, 0]]" "    }, " "    {" "      \"op\": \"Activation\", " "      \"name\": \"relu2\", " "      \"attrs\": {\"act_type\": \"relu\"}, " "      \"inputs\": [[7, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc3_weight\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc3_bias\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc3\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": [[8, 0, 0], [9, 0, 0], [10, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"softmax_label\", " "      \"inputs\": []" "    }, 
" "    {" "      \"op\": \"SoftmaxOutput\", " "      \"name\": \"softmax\", " "      \"inputs\": [[11, 0, 0], [12, 0, 0]]" "    }" "  ], " "  \"arg_nodes\": [0, 1, 2, 5, 6, 9, 10, 12], " "  \"node_row_ptr\": [" "    0, " "    1, " "    2, " "    3, " "    4, " "    5, " "    6, " "    7, " "    8, " "    9, " "    10, " "    11, " "    12, " "    13, " "    14" "  ], " "  \"heads\": [[13, 0, 0]], " "}") ("{" "  \"nodes\": [" "    {" "      \"op\": \"null\", " "      \"name\": \"data\", " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc1_weight\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc1_bias\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc1\", " "      \"attrs\": {\"num_hidden\": \"128\"}, " "      \"inputs\": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]" "    }, " "    {" "      \"op\": \"Activation\", " "      \"name\": \"relu1\", " "      \"attrs\": {\"act_type\": \"relu\"}, " "      \"inputs\": [[3, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc2_weight\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc2_bias\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc2\", " "      \"attrs\": {\"num_hidden\": \"64\"}, " "      \"inputs\": [[4, 0, 0], [5, 0, 0], [6, 0, 0]]" "    }, " "    {" "      \"op\": \"Activation\", " "      \"name\": \"relu2\", " "      \"attrs\": {\"act_type\": \"relu\"}, " "      \"inputs\": [[7, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"fc3_weight\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"null\", " "      
\"name\": \"fc3_bias\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"FullyConnected\", " "      \"name\": \"fc3\", " "      \"attrs\": {\"num_hidden\": \"10\"}, " "      \"inputs\": [[8, 0, 0], [9, 0, 0], [10, 0, 0]]" "    }, " "    {" "      \"op\": \"null\", " "      \"name\": \"softmax_label\", " "      \"inputs\": []" "    }, " "    {" "      \"op\": \"SoftmaxOutput\", " "      \"name\": \"softmax\", " "      \"inputs\": [[11, 0, 0], [12, 0, 0]]" "    }" "  ], " "  \"arg_nodes\": [0, 1, 2, 5, 6, 9, 10, 12], " "  \"node_row_ptr\": [" "    0, " "    1, " "    2, " "    3, " "    4, " "    5, " "    6, " "    7, " "    8, " "    9, " "    10, " "    11, " "    12, " "    13, " "    14" "  ], " "  \"heads\": [[13, 0, 0]], " "  \"attrs\": {" "    \"is_np_shape\": [\"int\", 0], " "  }" "}")))



Ran 1 tests containing 1 assertions.
1 failures, 0 errors.

@haojin2 haojin2 reopened this Oct 20, 2019
@haojin2
Contributor

haojin2 commented Oct 20, 2019

@gigasquid Could you help with identifying the cause so that we could fix this ASAP? Thanks!

@gigasquid
Member

looking into it

@gigasquid
Member

The failing test is a bit too brittle: it verifies the saved MNIST model by comparing the saved symbol file against a reference file. A new attribute, "attrs" {"is_np_shape" ["int" 0]}, has been added to the saved symbol, so the comparison now fails. This test needs to be reworked. In the meantime I will open a PR against your PR to disable it: reminisce#19
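As an illustration of what a less brittle check could look like, here is a minimal Python sketch that parses both symbol files and ignores the top-level "attrs" section before comparing. This is an assumption about one possible rework, not the fix that was actually made to the Clojure test; the helper names `comparable_symbol` and `same_network` are hypothetical, and the sample symbols are trimmed-down stand-ins for the files in the failure output.

```python
import json

# Trimmed-down stand-ins for test/test-symbol.json.ref and target/test-symbol.json.
REFERENCE = """
{
  "nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "fc1_weight", "attrs": {"num_hidden": "128"}, "inputs": []}
  ],
  "arg_nodes": [0, 1],
  "heads": [[1, 0, 0]]
}
"""

ACTUAL = """
{
  "nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "fc1_weight", "attrs": {"num_hidden": "128"}, "inputs": []}
  ],
  "arg_nodes": [0, 1],
  "heads": [[1, 0, 0]],
  "attrs": {"is_np_shape": ["int", 0]}
}
"""


def comparable_symbol(symbol_json, ignored_keys=("attrs",)):
    """Parse a saved symbol and drop top-level keys that are expected to vary.

    Only top-level keys are dropped; per-node "attrs" such as num_hidden
    are kept, so real changes to the network are still detected.
    """
    symbol = json.loads(symbol_json)
    return {key: value for key, value in symbol.items() if key not in ignored_keys}


def same_network(reference_json, actual_json):
    """Compare two saved symbols while ignoring top-level metadata."""
    return comparable_symbol(reference_json) == comparable_symbol(actual_json)


if __name__ == "__main__":
    print("same network:", same_network(REFERENCE, ACTUAL))
```

Comparing parsed structures instead of filtered text lines means a newly added metadata attribute such as is_np_shape no longer breaks the test, while a change to a layer (for example a different num_hidden) still does.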

@haojin2
Copy link
Contributor

haojin2 commented Oct 20, 2019

@gigasquid Okay, I see the cause. But no hurry on the fix for the test; I think @reminisce has also made changes to get that test working now. Thanks a lot for your prompt reply! I'll also close the issue now.

@haojin2 haojin2 closed this as completed Oct 20, 2019
7 participants