Tests not passed #14

Jerry-Master · 2024-01-18T16:05:57Z

I followed your instructions and one test did not pass. The output is:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/array50tb/projects/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 82.530s

FAILED (failures=1)

What could be causing it?

jiakai0419 · 2024-01-18T16:25:02Z

I also encountered the same problem.

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bytedance/repo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.162605, -1.1078743] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.162605
-1.1860729455947876

- [-1.162605, -1.1078743]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 37.484s

FAILED (failures=1)

Faultiness · 2024-01-19T09:39:48Z

Same error

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/export/data/username/imo/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

----------------------------------------------------------------------
Ran 2 tests in 126.464s

FAILED (failures=1)

thtrieu · 2024-01-20T01:28:46Z

It seems the meliad library is not numerically stable and gives different scores for different users.
I will put a note in the README (a8a1dc7).
For now, the small difference in score does not seem to affect run.sh or any of the other tests in run_tests.sh,
so I will let this test fail while we learn more about the meliad implementation and its outputs.
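
The mismatches reported so far are roughly 0.02 in absolute terms, which an exact list comparison such as assertEqual will always reject. As a minimal sketch of a tolerance-based check (this is not the repository's test and not a fix the maintainers have adopted; the helper names and the 0.05 tolerance are assumptions chosen only for illustration), using the numbers from the tracebacks above:

```python
import math

# Expected scores hard-coded in the failing assertion (copied from the tracebacks).
EXPECTED = [-1.1860729455947876, -1.1022869348526]


def max_abs_diff(actual, expected=EXPECTED):
    """Largest absolute difference between paired scores."""
    return max(abs(a - e) for a, e in zip(actual, expected))


def scores_close(actual, expected=EXPECTED, abs_tol=0.05):
    """True if both lists have equal length and each pair is within abs_tol.

    abs_tol=0.05 is an assumed, illustrative tolerance, not a project setting.
    """
    return len(actual) == len(expected) and all(
        math.isclose(a, e, rel_tol=0.0, abs_tol=abs_tol)
        for a, e in zip(actual, expected)
    )


# Scores reported by two users earlier in this thread.
reported = [
    [-1.1633697, -1.122621],
    [-1.162605, -1.1078743],
]

for scores in reported:
    print(scores, max_abs_diff(scores), scores_close(scores))
```

Inside a unittest.TestCase the same idea could be expressed with assertAlmostEqual and its delta argument, applied element by element. Whether any tolerance is acceptable here depends on how sensitive the downstream search is to these scores, which this thread does not establish.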

jackliugithub · 2024-01-20T17:49:30Z

same here:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/Users/Documents/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1898218, -1.1082345] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1898218
-1.1860729455947876

- [-1.1898218, -1.1082345]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.937s

yfcai1116 · 2024-01-29T09:00:07Z

same here

Traceback (most recent call last):
  File "/home/user/python_code/alphageometry-main/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1563942, -1.1297226] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1563942
-1.1860729455947876

- [-1.1563942, -1.1297226]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 62.007s

FAILED (failures=1)

soxziw · 2024-01-29T10:50:07Z

@thtrieu I have encountered the same error here. Indeed, it does not affect the other tests in run_tests.sh or the orthocenter problem in run.sh.

However, solving the Olympiad geometry problems does not succeed for me. For example, on 2019 p6 the program terminates early with "DD+AR failed to solve the problem." and no new LM output is generated. There is no error or traceback to follow, which is strange. (Full output log: https://drive.google.com/file/d/1btni6zroBbDLz6OBMpifTyj74bjL5fFy/view?usp=drive_link)

Could you please tell me the specific hardware you used to run all the Olympiad geometry problems successfully? I use Ubuntu 20.04, Python 3.10.12, a 64-core vCPU, and 2x NVIDIA A10 (24GB), but I fail to reproduce the results.

robotzheng · 2024-02-08T06:10:55Z

======================================================================
FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)

Traceback (most recent call last):
  File "/home/notebook/code/personal/80306170/AGI/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1633697, -1.122621] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1633697
-1.1860729455947876

- [-1.1633697, -1.122621]
+ [-1.1860729455947876, -1.1022869348526]

Ran 2 tests in 82.584s

FAILED (failures=1)

My setup: Ubuntu 18.04, pytorch2.1-cu11.8, A100-80G

aemartinez · 2024-02-20T19:49:03Z

Same here: the only test that fails when running bash run_tests.sh is test_lm_score_may_fail_numerically_for_external_meliad.

My specific numbers:

AssertionError: Lists differ: [-1.1831452, -1.112445] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1831452
-1.1860729455947876

- [-1.1831452, -1.112445]
+ [-1.1860729455947876, -1.1022869348526]

My setup: Apple M1, macOS Ventura 13.6.1, Python 3.10.8, tensorflow 2.13.0

soxziw · 2024-02-20T23:43:44Z

Problems solved using Colab!

faraday · 2024-03-06T19:40:56Z

@soxziw I'm running this in Google Colab and I got the exact same failure when running run_tests.sh:

FAIL: test_lm_score_may_fail_numerically_for_external_meliad (__main__.LmInferenceTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/content/alphageometry/lm_inference_test.py", line 82, in test_lm_score_may_fail_numerically_for_external_meliad
    self.assertEqual(
AssertionError: Lists differ: [-1.1527003, -1.1230755] != [-1.1860729455947876, -1.1022869348526]

First differing element 0:
-1.1527003
-1.1860729455947876

- [-1.1527003, -1.1230755]
+ [-1.1860729455947876, -1.1022869348526]

What kind of instance or GPU did you get when your tests were passing?

soxziw · 2024-03-09T02:59:05Z

The free TPU

TriedTired99 · 2024-04-12T17:18:45Z

Hello, did you switch to a different version of JAX? I am unable to use the TPU with the dependencies listed in requirements.txt. I don't know whether this is caused by Meliad, which has also prevented me from reproducing the results in the paper on GPU or CPU.
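
When comparing results across machines, it can help to confirm which backend JAX actually selected before suspecting Meliad. A minimal sketch, assuming JAX is importable in the environment (the helper name describe_jax_backend is hypothetical and not part of the repository):

```python
def describe_jax_backend():
    """Return (backend_name, device_descriptions), or None if JAX is not installed."""
    try:
        import jax
    except ImportError:
        return None
    # jax.devices() lists the devices of the default backend;
    # jax.default_backend() names that backend, e.g. "cpu", "gpu", or "tpu".
    devices = [f"{d.platform}:{d.device_kind}" for d in jax.devices()]
    return jax.default_backend(), devices


print(describe_jax_backend())
```

If the reported backend is "cpu" on a machine with an accelerator, the installed jaxlib build probably does not match the hardware. That would be a separate problem from the score mismatch discussed in this thread.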

Ehisnet mentioned this issue Jan 30, 2024

TESTING OF ALPHAGEOMETRY #60

Open
