Tests not passed #14

Open
Jerry-Master opened this issue Jan 18, 2024 · 12 comments

@Jerry-Master

I followed your instructions and got one test not passed. It says the following:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.530s

FAILED (failures=1)

What could be causing it?

@jiakai0419

jiakai0419 commented Jan 18, 2024

I also encountered the same problem.

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.162605
-1.1860729455947876

- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 37.484s

FAILED (failures=1)

@Faultiness

Same error

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 126.464s

FAILED (failures=1)

@thtrieu
Collaborator

thtrieu commented Jan 20, 2024

It seems the meliad library is not numerically stable: it gives different scores for different users.
I will put a note in the README (a8a1dc7).
For now, the small difference in score does not seem to affect run.sh or the other tests in run_tests.sh,
so I will let this test fail while we learn more about the meliad implementation and outputs.
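
To put the "small difference in score" in numbers, here is a minimal sketch that compares the scores reported in this thread against the expected values from the failing assertion, using an absolute tolerance instead of exact equality. The helper names and the 0.05 tolerance are hypothetical choices for illustration; they are not part of lm_inference_test.py, and nothing here says such a tolerance is safe for the model's downstream behavior.

```python
import math

# Expected scores, copied from the right-hand side of the failing assertion.
EXPECTED = [-1.1860729455947876, -1.1022869348526]

# Actual scores, copied from the tracebacks posted in this thread.
REPORTED = {
    "Jerry-Master": [-1.1633697, -1.122621],
    "jiakai0419": [-1.162605, -1.1078743],
    "jackliugithub": [-1.1898218, -1.1082345],
    "yfcai1116": [-1.1563942, -1.1297226],
    "aemartinez": [-1.1831452, -1.112445],
    "faraday": [-1.1527003, -1.1230755],
}


def max_abs_deviation(actual, expected):
    """Largest element-wise absolute difference between two score lists."""
    return max(abs(a - e) for a, e in zip(actual, expected))


def scores_close(actual, expected, abs_tol=0.05):
    """True if both lists have equal length and agree within abs_tol.

    abs_tol=0.05 is an assumed value picked for this sketch only.
    """
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


if __name__ == "__main__":
    for user, scores in REPORTED.items():
        deviation = max_abs_deviation(scores, EXPECTED)
        print(f"{user}: max deviation {deviation:.4f}, "
              f"within tolerance: {scores_close(scores, EXPECTED)}")
```

Inside a unittest.TestCase, the same idea can be expressed per element with assertAlmostEqual(actual, expected, delta=...), which is a standard unittest method.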

@jackliugithub

same here:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1898218
-1.1860729455947876

- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.937s

@yfcai1116

same here

Traceback (most recent call last):
  File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1563942
-1.1860729455947876

- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 62.007s

FAILED (failures=1)

@soxziw

soxziw commented Jan 29, 2024

@thtrieu I have encountered the same error here. Indeed, it does not affect other tests in run_tests.sh and the orthocenter problem in run.sh.

However, it does not succeed on Olympiad geometry problems. For example, when solving 2019 p6, the program terminates early with "DD+AR failed to solve the problem." and no new LM output is generated. There is no reason or error to trace back, which is strange. (Full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)

Could you please share the specific hardware you used to run all the Olympiad geometry problems successfully? (I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2 × NVIDIA A10 (24 GB), but fail to reproduce the results.)

@robotzheng

robotzheng commented Feb 8, 2024

======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.584s

FAILED (failures=1)

Ubuntu 18.04, pytorch2.1-cu11.8, A100-80G

@aemartinez

Same here, the only test that does not pass when executing bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.

My specific numbers:

AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1831452
-1.1860729455947876

- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]

My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0
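
Since the scores appear to vary with the machine, a consistent setup report makes the numbers easier to compare. Below is a small sketch that gathers the same kind of details using only the Python standard library. The function names and the default package list are assumptions made for illustration; adjust the list to whatever your environment actually installs.

```python
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("jax", "jaxlib", "tensorflow", "flax")):
    """Build a plain-text summary of OS, CPU architecture, Python, and packages.

    The default package names are a guess at what is relevant here,
    not a list taken from the project's requirements.
    """
    lines = [
        f"OS: {platform.platform()}",
        f"Machine: {platform.machine()}",
        f"Python: {sys.version.split()[0]}",
    ]
    lines += [f"{name}: {package_version(name)}" for name in packages]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Pasting this output next to the failing scores would show whether the same OS, accelerator, and library versions produce the same numbers.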

@soxziw

soxziw commented Feb 20, 2024

Problems solved using Colab!

@faraday

faraday commented Mar 6, 2024

@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1527003
-1.1860729455947876

- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]

What kind of instance or GPU did you get when your tests were passing?

@soxziw

soxziw commented Mar 9, 2024

The free TPU

@TriedTired99

Hello, did you change to a different version of jax? I am unable to use the TPU with the dependencies listed in requirements.txt. I don't know whether this is caused by Meliad, which has also kept me from reproducing the paper's results on GPU or CPU.
