Shell script to count chess game outcomes
Problem
I came across this blog post by Adam Drake from around a year ago which is now making the rounds again.
I made some improvements to his code, but wish to see if there are additional tweaks that could be made to make it run even faster.
The task is to extract chess game results from PGN files. The files contain sequences of games, where each has a header which contains a "Result" line like this:
[Result "1-0"]
[Result "0-1"]
[Result "1/2-1/2"]
These three results indicate a white win, a black win, and a draw, respectively. The task is to simply collect and report a summary of these results.
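To make the task concrete, here is a naive baseline on a tiny, invented sample. It is not the solution under review below, just a sketch of the expected input and output; the sample file and the `Event` names are made up for illustration, and `grep -c -F` is used only because it is the simplest way to count fixed strings.

```shell
#!/bin/sh
# Hypothetical sample: header lines only, invented for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[Event "Example A"]
[Result "1-0"]

[Event "Example B"]
[Result "0-1"]

[Event "Example C"]
[Result "1/2-1/2"]

[Event "Example D"]
[Result "1-0"]
EOF

# grep -c counts matching lines; -F treats the pattern as a fixed string.
white=$(grep -c -F '[Result "1-0"]' "$sample")
black=$(grep -c -F '[Result "0-1"]' "$sample")
draws=$(grep -c -F '[Result "1/2-1/2"]' "$sample")

# Report: total games, white wins, black wins, draws.
baseline="$((white + black + draws)) $white $black $draws"
echo "$baseline"
rm -f "$sample"
```

This baseline reads the input three times, once per outcome, which is why a single-pass approach is the more interesting one to tune.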
Here is my solution to be reviewed:
find . -type f -name '*.pgn' -print0 |
xargs -0 mawk -F '[-"]' '/Result/ { ++a[$2]; }
END { print a["1"]+a["0"]+a["1/2"], a["1"], a["0"], a["1/2"] }'
I was skeptical of using find over just listing the files in the reference data set, but my timings indicate that this is actually faster than a shell wildcard (Bash 4.3.11(1)-release).
tripleee@xvbvntv:ChessData$ time find . -type f -name '*.pgn' | wc -l
3025
real 0m0.014s
user 0m0.008s
sys 0m0.011s
tripleee@xvbvntv:ChessData$ time printf '%s\n' */*.pgn | wc -l
3025
real 0m0.037s
user 0m0.032s
sys 0m0.010s
The optimization I originally had in mind was to close the data file after reading the Result line, but as it turns out, the reference data set files contain multiple games, and thus multiple results (and the game portion is a lot smaller than I thought it would be).
tripleee@xvbvntv:ChessData$ time find . -type f -name '*.pgn' -print0 |
> xargs -0 mawk -F '[-"]' '/Result/ { ++a[$2]; }
> END { print a["1"]+a["0"]+a["1/2"], a["1"], a["0"], a["1/2"] }'
6829065 2602614 1974505 2251946
real 0m50.232s
user 0m19.820s
sys 0m2.542s
This is as far as I got. (An earlier version, based on the blog post, attempted parallel processing, but removing that was the biggest performance improvement I made.) I don't think swit
Solution
I don't see the reason for chaining with -print0 | xargs -0. It's better and simpler to use -exec:
find . -type f -name '*.pgn' -exec mawk -F '[-"]' '...' {} +
I don't see a way to make the AWK code faster,
but:
- Some of the double-quoting is unnecessary
- I would add a space around operators for somewhat better readability
- A semicolon can be dropped
Like this:
/Result/ { ++a[$2] }
END { print a[1] + a[0] + a["1/2"], a[1], a[0], a["1/2"] }
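Putting the two suggestions together, here is a minimal sketch that runs the -exec form with the tidied AWK program against a throwaway directory. The directory layout, file names, and game headers are invented for illustration, and plain awk stands in for mawk on the assumption that mawk may not be installed everywhere.

```shell
#!/bin/sh
# Hypothetical data set: two invented PGN files in subdirectories.
dir=$(mktemp -d)
mkdir "$dir/a" "$dir/b"
printf '%s\n' '[Event "A1"]' '[Result "1-0"]' '' \
              '[Event "A2"]' '[Result "1/2-1/2"]' > "$dir/a/one.pgn"
printf '%s\n' '[Event "B1"]' '[Result "0-1"]' '' \
              '[Event "B2"]' '[Result "1-0"]' '' \
              '[Event "B3"]' '[Result "1/2-1/2"]' > "$dir/b/two.pgn"

# -exec ... {} + hands find's matches to awk in batches, as xargs would.
# Array subscripts in awk are strings, so a[1] and a["1"] are the same element.
result=$(find "$dir" -type f -name '*.pgn' -exec awk -F '[-"]' '
/Result/ { ++a[$2] }
END { print a[1] + a[0] + a["1/2"], a[1], a[0], a["1/2"] }' {} +)

echo "$result"   # total, white wins, black wins, draws
rm -rf "$dir"
```

One caveat that applies to both -exec ... {} + and xargs: if the file list is too long for a single command line, awk is run more than once and each run prints its own summary line, so the lines would then need to be summed.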
Context
StackExchange Code Review Q#78086, answer score: 3