rating and depth ? #20

Open
tissatussa opened this issue Jun 16, 2024 · 11 comments

Comments

@tissatussa

about this position :

[image: shallow-red-Nxd4-bad]

here Shallow-Red (White) played 14.Nxd4, which is a blunder .. at first glance we might understand the idea : White wants to win this pawn, backed by a combination : when Black captures the undefended Nd4 with the Bf6, White can play Qd5 with a double attack on Ra8 and Bd4, which seems to ensure recapturing one of these pieces, but then Black can play Bg4!, also with a dual purpose : this counter move covers the Ra8 and threatens mate on e2 .. White can avoid that loss, but Black will play Rd8, again with a dual purpose : this attacks the Qd5 and covers the Bd4 .. all this is a bit unusual but the moves are forced and rather easy to see (for an experienced chess player, like me -ahum..) but Shallow-Red fails, it didn't even play Qd5 after Bxd4, it played Bf4 and lost this game, simply being a piece down.

i don't know up to which depth the engine goes, there's no display of that in the CuteChess GUI .. i propose you try to implement this - it might also give you more insight into what the engine's doing : after how many ms will a first (often) bad move be discarded and other ones appear as bestmove? with MultiPV this calculation process is even more visible, although it's normally not used in CuteChess games .. you'd add such a UCI info line to the output.
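For reference, a minimal sketch of what such a UCI `info` line could look like when built in Rust. The field keywords (`depth`, `multipv`, `score cp`, `nodes`, `time`, `pv`) come from the UCI protocol; the `PvLine` struct, the `format_info` function and all numbers are hypothetical and not taken from the ShallowRed source.

```rust
/// One principal variation reported by the search.
/// Hypothetical type for illustration, not part of ShallowRed.
struct PvLine {
    depth: u32,
    multipv: u32,       // 1-based rank of this line when MultiPV > 1
    score_cp: i32,      // score in centipawns from the engine's point of view
    nodes: u64,
    time_ms: u64,
    moves: Vec<String>, // moves in UCI long algebraic notation, e.g. "f3d4"
}

/// Build a UCI `info` line from a finished search result.
fn format_info(line: &PvLine) -> String {
    format!(
        "info depth {} multipv {} score cp {} nodes {} time {} pv {}",
        line.depth,
        line.multipv,
        line.score_cp,
        line.nodes,
        line.time_ms,
        line.moves.join(" ")
    )
}

fn main() {
    // Invented numbers; the moves spell out Nxd4 Bxd4 Qd5 from the game above.
    let line = PvLine {
        depth: 6,
        multipv: 1,
        score_cp: -420,
        nodes: 150_000,
        time_ms: 310,
        moves: vec!["f3d4".into(), "f6d4".into(), "b3d5".into()],
    };
    println!("{}", format_info(&line));
}
```

A GUI such as CuteChess reads the depth and score from lines of this shape, which is where the `+0.40/12` style annotations in the PGN below come from.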

the combinations i showed are not very deep - i'm surprised Shallow-Red made this mistake, it being just a tactic .. do you have an idea of its rating? - this was a 3m2s game.

are you a (good) chess player yourself? i've been a club player for years (but still below 1900).
how do you judge the games played by your engine? (only) by statistics? - a rating can be determined against many (weak) engines.

i guess its rating is no more than 1200.
respect though - it's all about the idea :-)

[image: HypnosCEDR-part2]

a minor code change may easily result in a much higher rating .. can i help? - i'm mainly thinking about special positions, to test 'behaviour' .. i'm into chess engine programming, though only in general, i've not created a serious engine myself up to now.

i await your answer !

Replay the game

[ i'm on Xubuntu 22.04 ]

[Event "engine vs engine"]
[Site "Holland"]
[Date "2024.06.16"]
[Round "?"]
[White "Shallow Red v0.3.1"]
[Black "Wowl v1.3.8"]
[Result "0-1"]
[ECO "A00"]
[GameDuration "00:08:56"]
[Opening "Dunst (Sleipner, Heinrichsen) Opening"]
[PlyCount "168"]
[TimeControl "180+2"]

1. Nc3 {4.0s} d5 {+0.40/12 6.3s} 2. e4 {4.1s} d4 {+1.12/12 6.2s} 3. Nd5 {4.1s}
e6 {+1.11/12 6.2s} 4. Nf4 {4.2s} e5 {+1.05/12 6.1s} 5. Nd3 {4.2s}
Nc6 {+0.98/11 6.0s} 6. Nf3 {4.3s} Bd6 {+1.21/12 6.0s} 7. c3 {4.3s}
Kf8 {+0.99/10 6.0s} 8. cxd4 {4.4s} exd4 {+0.64/13 5.8s} 9. e5 {4.4s}
Qe8 {+0.64/11 5.7s} 10. Be2 {4.5s} Nxe5 {+1.70/13 5.6s} 11. Ndxe5 {4.6s}
Bxe5 {+1.46/12 5.5s} 12. Qb3 {4.6s} Bf6 {+1.78/11 5.4s} 13. d3 {4.7s}
b6 {+1.87/12 5.3s} 14. Nxd4 {4.7s} Bxd4 {+4.20/12 5.2s} 15. Bf4 {4.8s}
Bb7 {+4.76/11 5.1s} 16. Qc4 {4.9s} Bxg2 {+5.40/10 5.1s} 17. Qxd4 {4.9s}
Rd8 {+6.50/12 5.0s} 18. Qb4+ {5.0s} c5 {+7.59/13 4.9s} 19. Qc3 {5.1s}
Bxh1 {+8.06/11 4.8s} 20. Be5 {5.2s} f6 {+8.93/12 4.7s} 21. Bc7 {5.2s}
Bf3 {+9.18/13 4.7s} 22. Qc2 {5.3s} Bxe2 {+9.25/12 4.6s} 23. Qxe2 {5.4s}
Qxe2+ {+9.42/14 4.5s} 24. Kxe2 {5.5s} Rd7 {+9.42/13 4.5s} 25. Bf4 {5.6s}
Ne7 {+9.42/11 4.4s} 26. Rd1 {5.7s} Kf7 {+9.59/11 4.3s} 27. Kf3 {5.8s}
Re8 {+10.12/11 4.3s} 28. b3 {5.9s} Nf5 {+10.34/11 4.2s} 29. Be3 {6.0s}
h5 {+10.55/12 4.2s} 30. h3 {6.2s} g5 {+10.64/11 4.1s} 31. Rd2 {6.3s}
Rde7 {+11.33/12 4.2s} 32. Rd1 {6.4s} Nxe3 {+13.39/13 4.5s} 33. fxe3 {6.6s}
Rxe3+ {+14.22/13 4.3s} 34. Kf2 {6.7s} Re2+ {+14.22/12 4.2s} 35. Kg1 {6.9s}
Rxa2 {+15.12/13 3.8s} 36. Rf1 {7.1s} Ree2 {+15.53/11 4.0s} 37. Rc1 {6.6s}
h4 {+15.55/11 3.7s} 38. Rd1 {6.2s} Re3 {+17.06/12 3.7s} 39. Kf1 {5.7s}
Rxh3 {+18.32/13 3.6s} 40. Kg1 {5.4s} g4 {+19.57/12 3.5s} 41. Rc1 {5.0s}
Re3 {+19.66/11 3.5s} 42. d4 {4.7s} cxd4 {+21.69/12 3.4s} 43. Rf1 {4.5s}
d3 {+24.15/11 3.4s} 44. Rb1 {4.2s} h3 {+33.61/14 3.4s} 45. Rc1 {4.0s}
d2 {+857.83/14 3.3s} 46. Rc7+ {3.8s} Kg8 {+833.30/13 3.3s} 47. Rc8+ {1.0s}
Kh7 {+833.67/15 3.3s} 48. Rc7+ {1.0s} Kg6 {+833.69/15 3.2s} 49. Rd7 {1.0s}
h2+ {+833.69/13 3.2s} 50. Kh1 {1.0s} g3 {+833.69/13 3.2s} 51. Rg7+ {1.0s}
Kxg7 {+833.70/14 3.2s} 52. Kg2 d1=Q+ {+833.70/13 3.1s} 53. Kh3
Qf1+ {+833.70/1 0.092s} 54. Kh4 Qf4+ {+833.70/1 0.078s} 55. Kh3
Qf5+ {+833.70/1 0.073s} 56. Kh4 Qg5+ {+833.70/1 0.054s} 57. Kh3
Qh6+ {+833.70/1 0.047s} 58. Kg4 {1.0s} h1=Q {+833.70/1 0.030s} 59. b4 {1.7s}
f5+ {+834.13/7 0.16s} 60. Kxf5 Qd5+ {+834.13/1 0s} 61. Kg4 Qh4+ {+834.13/1 0s}
62. Kxh4 Qc4+ {+834.13/1 0s} 63. Kh3 Qe6+ {+833.73/13 2.2s} 64. Kh4
Rh2+ {+833.73/1 0.002s} 65. Kg5 {2.6s} Qd7 {+833.73/1 0.004s} 66. Kf4 {0.57s}
Qd4+ {+833.73/3 0.005s} 67. Kf5 Qh4 {+833.73/1 0.002s} 68. b5 {2.6s}
Qh5+ {+833.73/3 0.004s} 69. Kf4 {2.5s} Qxb5 {+833.73/1 0.005s} 70. Kxe3 {1.8s}
Re2+ {+833.68/11 3.5s} 71. Kd4 a6 {+833.68/9 0.27s} 72. Kc3
Qb2+ {+833.68/1 0.003s} 73. Kd3 {1.0s} Qd2+ {+833.68/1 0.003s} 74. Kc4 {1.1s}
Re4+ {+833.68/1 0.003s} 75. Kb3 Qd4 {+833.68/1 0.004s} 76. Ka2
Qb2+ {+833.68/3 0.015s} 77. Kxb2 {1.0s} a5 {+833.68/1 0.001s} 78. Kc3 {1.0s}
g2 {+833.64/15 4.1s} 79. Kd3 {1.0s} Rg4 {+833.65/15 4.0s} 80. Ke3 {1.0s}
g1=Q+ {+833.67/14 4.0s} 81. Kf3 {1.0s} Rg3+ {+833.70/13 4.0s} 82. Ke4 {1.0s}
Qc5 {+833.70/12 3.9s} 83. Kf4 {1.0s} Qe3+ {+833.72/13 3.9s} 84. Kf5
Rg5# {+833.74/13 3.8s, Black mates} 0-1
@15jgme
Owner

15jgme commented Jun 16, 2024

Hi @tissatussa thanks for this issue.
First up, if you want to contribute in any way that would be really super; I appreciate any contributions (and these issues). I'm working on a different project right now, but I'll help you out as much as I can.
There may not be too much low hanging fruit but I'll try to take a look soon.

Regarding this issue, I personally find blunders like these hard to debug. I suspect it is related to the bug fixed with v3.1 (I forgot to update the Cargo.toml for the uci wrapper so you probably pulled the one with the engine at v3.0, I'll update it later today).
Essentially we have a cache to avoid looking down the whole search tree multiple times. I had found that some blunders were caused by cached evaluations from a really shallow search being reused for a deep search. So that blunder might have been an old cached evaluation from the end of an earlier search (e.g. depth 1) that was not booted out.
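The usual guard against this kind of bug is to store the search depth with each cached score and to refuse entries that are shallower than the current request. Below is a minimal sketch of that idea. It assumes a plain `HashMap` keyed by a position hash; `Entry`, `Cache`, `probe` and `store` are hypothetical names and do not describe how ShallowRed's cache is actually implemented.

```rust
use std::collections::HashMap;

/// A cached search result. Hypothetical layout for illustration.
#[derive(Clone, Copy)]
struct Entry {
    depth: u32, // remaining depth the score was searched to
    score: i32, // centipawns
}

/// Toy cache keyed by a position hash.
struct Cache {
    table: HashMap<u64, Entry>,
}

impl Cache {
    fn new() -> Self {
        Cache { table: HashMap::new() }
    }

    /// Return a cached score only if it was searched at least as deep as requested.
    fn probe(&self, key: u64, depth: u32) -> Option<i32> {
        self.table
            .get(&key)
            .filter(|e| e.depth >= depth)
            .map(|e| e.score)
    }

    /// Store a result unless a strictly deeper one is already cached.
    fn store(&mut self, key: u64, entry: Entry) {
        let keep_old = self
            .table
            .get(&key)
            .map_or(false, |old| old.depth > entry.depth);
        if !keep_old {
            self.table.insert(key, entry);
        }
    }
}

fn main() {
    let mut cache = Cache::new();
    // A depth-1 result left over from the end of an earlier search.
    cache.store(42, Entry { depth: 1, score: 150 });
    println!("depth 1 request: {:?}", cache.probe(42, 1));
    println!("depth 6 request: {:?}", cache.probe(42, 6));
}
```

With this check the depth-6 request misses the stale depth-1 entry and the position is searched again instead of inheriting the shallow score.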

Regarding the rating, you can check out the bot's lichess. It's around 1700-1800 against the other bots. There are quite a lot of bots on lichess so it's the best benchmark I can get. Blunders like this definitely make it look pretty silly, but it doesn't really know much about chess, just about doing a minimax search! I did build an NNUE evaluation function to give it more insight into tactics but this greatly slowed the search down and reduced its rating to around 1300.
For the depth, it varies based on timing but the uci binary should spit out a log file where you can take a look.

As for myself, I'm actually a pretty poor chess player (not great at visualizing what is being attacked), I actually made ShallowRed because my friends are all quite good players.

Looking forward to working with you @tissatussa I'm sure we can get the rating up a bit 🙂.
When you make a PR to the engine I'll get the bot back up and running on lichess so we can hopefully see a difference in rating.

(Regarding multiPV, let me get back to you once I've thought about it)

@tissatussa
Author

tissatussa commented Jun 16, 2024

another strange move appeared .. i was just playing a 15m10s game against v0.3.1 and this position occurred (i had the Black pieces) :

[image: fen]

in fact i'm lost here : i'll lose the Rook for its kNight and i will probably not survive .. but here Shallow-Red played the 'friendly' 36.Qxc6+ ??, so i captured the Queen and won this game.

what happened ? feels like a joke :-)

replay the game

[Event "human vs engine"]
[Site "Holland"]
[Date "2024.06.16"]
[Round "?"]
[White "Shallow Red v0.3.1"]
[Black "Roelof Berkepeis"]
[Result "0-1"]
[ECO "A00"]
[Opening "Dunst (Sleipner, Heinrichsen) Opening"]
[TimeControl "900+10"]

1. Nc3 {20s} c6 {16s} 2. Nf3 {20s} d5 {47s} 3. d4 {21s} g6 {80s} 4. a3 {21s}
Bg7 {15s} 5. e4 {21s} a6 {51s} 6. Bd3 {21s} Bg4 {27s} 7. e5 {22s} e6 {10s}
8. h3 {22s} Bf5 {14s} 9. Bg5 {22s} Ne7 {18s} 10. g4 {22s} Bxd3 {4.7s}
11. Qxd3 {23s} a5 {35s} 12. Bf6 {23s} Bxf6 {24s} 13. exf6 {23s} Ng8 {2.2s}
14. g5 {24s} h6 {3.6s} 15. gxh6 {24s} Qxf6 {14s} 16. Ne5 {24s} Nxh6 {86s}
17. Qd2 {25s} Nf5 {126s} 18. O-O-O {25s} Nd6 {45s} 19. Qe3 {25s} Na6 {48s}
20. Ng4 {26s} Qg7 {9.7s} 21. Qg5 {26s} Kd7 {41s} 22. Kb1 {27s} c5 {53s}
23. Ne5+ {27s} Kc8 {23s} 24. f4 {27s} Nf5 {21s} 25. dxc5 {28s} Nxc5 {14s}
26. Qg1 {28s} d4 {60s} 27. Ne2 {29s} Rd8 {107s} 28. Nxd4 {30s} Qf8 {17s}
29. Nxf5 {30s} gxf5 {4.7s} 30. Rxd8+ {31s} Kxd8 {4.3s} 31. b4 {31s} axb4 {6.1s}
32. Qd4+ {32s} Kc7 {4.4s} 33. axb4 {33s} Ne4 {21s} 34. Rd1 {34s} Ra6 {35s}
35. Qc4+ {35s} Rc6 {24s} 36. Qxc6+ {36s} bxc6 {6.8s} 37. Rd7+ {33s} Kc8 {8.4s}
38. Rd4 {31s} c5 {15s} 39. bxc5 {29s} Qxc5 {15s} 40. Rxe4 {27s} fxe4 {6.1s}
41. Kb2 {25s} f6 {8.7s} 42. Ng4 {24s} f5 {13s} 43. Ne5 {22s} Qb4+ {14s}
44. Ka1 {21s} Qe1+ {24s} 45. Ka2 {20s} Qg3 {6.1s} 46. Kb2 {19s} Qxf4 {1.9s}
47. Nf7 {1.0s} e3 {17s} 48. Nd6+ {1.0s} Qxd6 {3.8s} 49. Kb3 {1.0s} e2 {2.4s}
50. Kc3 {1.0s} e1=Q+ {2.4s} 51. Kb2 {1.0s} Qdb4+ {2.6s} 52. Ka2
Qeb1# {2.1s, Black mates} 0-1

@tissatussa
Author

tissatussa commented Jun 16, 2024

MultiPV is nice but not needed in the first place, however, it may give insight during development. But only if we look at practical play, not the statistics .. -do you count on those?

i guess i will not be making PRs .. i'm into programming but not Rust, i can instruct though .. i often tried to play such Lichess bots but they are almost always offline ..

@tissatussa
Author

btw. your binary takes 1.5 GB of memory, which is a lot : what's done with it?

@tissatussa
Author

the fact that ShallowRed often starts a game by moving the 2 kNights onto their natural squares indicates to me some PST function may induce that, but playing like this needs good understanding of pawns-in-the-center : often the Nc3 is a worse move, because it blocks the c-pawn which can be crucial at the following moves.

i know minimax but HCE (evaluation) should do the pruning, shouldn't it? And is it easy for you to implement that UCI info string?

@tissatussa
Author

tissatussa commented Jun 18, 2024

do you know "Test Suites" ?

a nice and exciting way to see how strong the engine is and what mistakes it can make. I just found this README at https://github.com/Tearth/Inanis/tree/v1.3.0 , see also README.md

i have some experience with this and i can help .. are you interested ?

a puzzle can have multiple 'solutions'. Programs exist to perform such Tests, and many Suites exist. However, it's up to the programmer to interpret the results and adjust some code accordingly. EPD puzzles exist with only (/also) 'am' : 'Avoid Move' - as opposed to 'bm' : 'Best Move'.
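To make the `bm` / `am` distinction concrete, here is a small sketch that pulls both opcodes out of one EPD record. An EPD line is the first four FEN fields followed by `opcode operand...;` operations. The `EpdGoal` type, the `parse_epd` function and the sample record are made up for illustration, and the sketch assumes no quoted operand contains a `;`.

```rust
/// Best-move / avoid-move lists extracted from one EPD record.
/// Hypothetical type for illustration.
#[derive(Debug, Default, PartialEq)]
struct EpdGoal {
    best_moves: Vec<String>,  // operands of `bm`
    avoid_moves: Vec<String>, // operands of `am`
}

/// Split an EPD line into its position part and its `bm` / `am` operands.
fn parse_epd(line: &str) -> Option<(String, EpdGoal)> {
    // Four position fields, then everything else is the operation list.
    let parts: Vec<&str> = line.trim().splitn(5, ' ').collect();
    if parts.len() < 4 {
        return None;
    }
    let position = parts[..4].join(" ");
    let ops = parts.get(4).copied().unwrap_or("");

    let mut goal = EpdGoal::default();
    for op in ops.split(';') {
        let mut words = op.split_whitespace();
        match words.next() {
            Some("bm") => goal.best_moves.extend(words.map(String::from)),
            Some("am") => goal.avoid_moves.extend(words.map(String::from)),
            _ => {} // ignore other opcodes such as `id`
        }
    }
    Some((position, goal))
}

fn main() {
    // Made-up record, not taken from any published suite.
    let record = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - \
                  bm e4 d4; am Nc3; id \"made-up.001\";";
    if let Some((position, goal)) = parse_epd(record) {
        println!("{position}");
        println!("best: {:?}, avoid: {:?}", goal.best_moves, goal.avoid_moves);
    }
}
```

A test runner would then count a puzzle as passed when the engine's move is in `best_moves` (if any are given) and not in `avoid_moves`.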

@15jgme
Owner

15jgme commented Jun 20, 2024

> MultiPV is nice but not needed in the first place, however, it may give insight during development. But only if we look at practical play, not the statistics .. -do you count on those?
>
> i guess i will not be making PRs .. i'm into programming but not Rust, i can instruct though .. i often tried to play such Lichess bots but they are almost always offline ..

Hi @tissatussa, could you clarify what you mean by practical play vs statistics?
The Lichess bot was my main interest, but I need to do a bit of server work to get it running reliably again.

For additions to the bot, there are a couple of small cleanup changes I'm happy to make, but it's not my personal main focus right now. I might work on a nicer NNUE eval function (see this repo for the old one), or some other additions like endgame tables, but I likely won't come back to performance improvements myself for a while. I would be completely happy to help you out however, if you'd like to tinker with the engine. I don't have too much free time, but I hope I could get back to your PRs and issues in a reasonable amount of time. I understand you're deterred by Rust, but especially with an existing project it should be a nice language to pick up. If you are interested, changes like this with uci parsing would be good quick changes to get started.

@15jgme
Owner

15jgme commented Jun 20, 2024

> btw. your binary takes 1.5 GB of memory, which is a lot : what's done with it?

This is cache allocation, the amount is somewhat arbitrary, and the default value is here. That is just a default though, whatever is controlling the cache (the uci-wrapper layer for example) has the real control over it.
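As a rough illustration of how a cache size in megabytes translates into table entries, here is a sketch under stated assumptions. The 16-byte `Entry` layout is hypothetical and the real engine's entry may be a different size. Many UCI engines let the GUI set this size through a `Hash` option (`setoption name Hash value <MB>`); whether the uci-wrapper exposes that option is not stated in this thread.

```rust
use std::mem::size_of;

/// Hypothetical cache entry; `repr(C)` pins the layout to 16 bytes.
#[allow(dead_code)]
#[repr(C)]
struct Entry {
    key: u64,       // position hash
    score: i32,     // centipawns
    depth: u8,      // depth the score was searched to
    flag: u8,       // exact / lower bound / upper bound
    best_move: u16, // packed move
}

/// Number of entries that fit in `megabytes` MiB of cache.
fn entries_for(megabytes: usize) -> usize {
    megabytes * 1024 * 1024 / size_of::<Entry>()
}

fn main() {
    // 1536 MiB is taken here as the "1.5 GB" mentioned above.
    for mb in [64, 256, 1536] {
        println!("{mb} MiB -> {} entries", entries_for(mb));
    }
}
```

With this layout, 256 MiB already holds about 16.8 million entries, so a smaller default mainly costs cache hits in long searches rather than correctness.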

@15jgme
Owner

15jgme commented Jun 20, 2024

> do you know "Test Suites" ?
>
> a nice and exciting way to see how strong the engine is and what mistakes it can make. I just found this README at https://github.com/Tearth/Inanis/tree/v1.3.0 , see also README.md
>
> i have some experience with this and i can help .. are you interested ?
>
> a puzzle can have multiple 'solutions'. Programs exist to perform such Tests, and many Suites exist. However, it's up to the programmer to interpret the results and adjust some code accordingly. EPD puzzles exist with only (/also) 'am' : 'Avoid Move' - as opposed to 'bm' : 'Best Move'.

The engine has a test suite for puzzles here. I believe cargo test should run it. It is not very extensive, but is sufficient for regression checks so far.

@15jgme
Owner

15jgme commented Jun 20, 2024

> the fact that ShallowRed often starts a game by moving the 2 kNights onto their natural squares indicates to me some PST function may induce that, but playing like this needs good understanding of pawns-in-the-center : often the Nc3 is a worse move, because it blocks the c-pawn which can be crucial at the following moves.
>
> i know minimax but HCE (evaluation) should do the pruning, shouldn't it? And is it easy for you to implement that UCI info string?

Yes, a PST is the cause of that move. I suggest taking a look through the src folder; it should provide lots of details about the engine. For example, there is a psqt.rs file which contains the PST. The lichess bot plays from an opening book so I don't end up using the psqt at the start very often.
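To show why a piece-square table pushes the knights out early, here is a sketch with an illustrative knight table. The values are NOT those in `psqt.rs`; they are generic textbook-style numbers, and `KNIGHT_PST` and `knight_bonus` are hypothetical names.

```rust
/// Illustrative knight piece-square table in centipawns, rank 8 in the first row.
/// These are not ShallowRed's values.
const KNIGHT_PST: [i32; 64] = [
    -50, -40, -30, -30, -30, -30, -40, -50,
    -40, -20,   0,   0,   0,   0, -20, -40,
    -30,   0,  10,  15,  15,  10,   0, -30,
    -30,   5,  15,  20,  20,  15,   5, -30,
    -30,   0,  15,  20,  20,  15,   0, -30,
    -30,   5,  10,  15,  15,  10,   5, -30,
    -40, -20,   0,   5,   5,   0, -20, -40,
    -50, -40, -30, -30, -30, -30, -40, -50,
];

/// Bonus for a white knight on a square written like "c3".
fn knight_bonus(square: &str) -> Option<i32> {
    let bytes = square.as_bytes();
    if bytes.len() != 2 {
        return None;
    }
    let file = bytes[0].checked_sub(b'a')? as usize;
    let rank = bytes[1].checked_sub(b'1')? as usize;
    if file > 7 || rank > 7 {
        return None;
    }
    // Row 0 of the table is rank 8, so flip the rank.
    Some(KNIGHT_PST[(7 - rank) * 8 + file])
}

fn main() {
    for (from, to) in [("b1", "c3"), ("g1", "f3")] {
        let gain = knight_bonus(to).unwrap() - knight_bonus(from).unwrap();
        println!("N{from}-{to}: {gain:+} cp");
    }
}
```

With a table like this, developing either knight gains half a pawn of static score on move one, and nothing in the table knows that Nc3 blocks the c-pawn. That knowledge would have to come from other evaluation terms or an opening book.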

The evaluation function is not responsible for the pruning itself, it instead is only responsible for returning the evaluation. The search function performs alpha-beta pruning based on the evaluations coming in. More uci info could possibly be added, but only at the first few moves of the search for speed reasons. The engine searches millions of moves and printing out the same amount of strings would grind it to a halt. Even the check for whether or not our depth is too great to print the string might be unfavorable.
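The split between evaluation and pruning can be seen in a toy negamax search with alpha-beta. This is a sketch over a hand-built tree, not ShallowRed's search: `Tree`, `evaluate` and `search` are hypothetical names, and leaf scores are given from the point of view of the side to move at the leaf.

```rust
/// Toy game tree: a leaf holds a static score, a node holds child positions.
enum Tree {
    Leaf(i32),
    Node(Vec<Tree>),
}

/// Stand-in for the evaluation function: it only reports a score.
fn evaluate(score: i32) -> i32 {
    score
}

/// Negamax with alpha-beta pruning. `visited` counts evaluated leaves.
fn search(tree: &Tree, mut alpha: i32, beta: i32, visited: &mut u32) -> i32 {
    match tree {
        Tree::Leaf(score) => {
            *visited += 1;
            evaluate(*score)
        }
        Tree::Node(children) => {
            let mut best = i32::MIN + 1;
            for child in children {
                // The child's score is from the opponent's side, so negate it.
                let score = -search(child, -beta, -alpha, visited);
                best = best.max(score);
                alpha = alpha.max(score);
                if alpha >= beta {
                    break; // cutoff: the search decides to prune, not the eval
                }
            }
            best
        }
    }
}

fn main() {
    let tree = Tree::Node(vec![
        Tree::Node(vec![Tree::Leaf(3), Tree::Leaf(5)]),
        Tree::Node(vec![Tree::Leaf(2), Tree::Leaf(9)]),
    ]);
    let mut visited = 0;
    let score = search(&tree, i32::MIN + 1, i32::MAX, &mut visited);
    println!("score {score}, leaves evaluated {visited} of 4");
}
```

Here the first reply guarantees 3, and after the leaf worth 2 is seen in the second reply the remaining leaf (9) is skipped. The evaluation values drive the cutoff, but the comparison against `alpha` and `beta` lives entirely in `search`.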

@tissatussa
Author

> could you clarify what you mean by practical play vs statistics?

many chess engine programmers only use the results of bullet self-play, they don't examine / study real (longer) games, because they're not chess players themselves.

> ..work on a nicer NNUE eval function..

i didn't know ShallowRed has an NNUE, does it really ?
i'm not into that, i only know HCE.

> ..if you'd like to tinker with the engine..

i was not planning to do so, but i might.

> ..you're deterred by rust, but especially with an existing project it should be a nice language to pickup..

Rust must be a nice language, indeed this might be a good start.

> (about the 1.5 GB memory usage) ..this is cache allocation..

but why that large ? Many engines use up to 250 MB max ..

> ..engine has a test suite for puzzles [..] it is not very extensive, but is sufficient for regression checks..

OK, but i don't know what you mean by 'regression checks', although i've seen that term often ..

> ..the evaluation function is not responsible for the pruning itself, it instead is only responsible for returning the evaluation. The search function performs alpha-beta pruning based on the evaluations coming in.

i'm aware of this mechanism, for me this comes down to the evaluation values determining the pruning, although indirectly - as you explain .. never mind.

> ..more uci info could possibly be added..

somehow it's possible to output UCI info during search, most engines do : they output this info (only) when a depth is fully finished.
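A sketch of that pattern, assuming an iterative-deepening driver: the inner search prints nothing, and one `info` line is emitted per completed depth, so the output cost is a handful of lines per move rather than one per node. `search_to_depth` is a hypothetical stand-in that returns invented scores and moves, not real ShallowRed output.

```rust
/// Hypothetical stand-in for a fixed-depth search: returns (score_cp, best_move).
/// The values are invented: shallow depths like Nxd4 (f3d4), deeper ones reject it.
fn search_to_depth(depth: u32) -> (i32, &'static str) {
    if depth < 4 {
        (110, "f3d4")
    } else {
        (15, "e1g1")
    }
}

/// Run iterative deepening and return one UCI `info` line per completed depth.
fn iterative_deepening(max_depth: u32) -> Vec<String> {
    let mut lines = Vec::new();
    for depth in 1..=max_depth {
        // Nothing is printed inside the search itself.
        let (score, best) = search_to_depth(depth);
        lines.push(format!("info depth {} score cp {} pv {}", depth, score, best));
    }
    lines
}

fn main() {
    for line in iterative_deepening(5) {
        println!("{line}");
    }
}
```

Reading such output shows at which depth a first, often bad, move is replaced by a better one.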


2 participants