Well, it's code that uses machine learning to analyze Mario Kart race result screenshots and dump all the results into a spreadsheet. Basically it's a very silly, highly specific OCR routine.
I wrote it after playing a bunch of Mario Kart online and wondering how the game awards points after a race. It's obvious that you gain more points for beating players rated higher than you, and lose more points for losing to players rated lower than you, but the exact rules aren't clear. I thought I might be able to work out the pattern if I could analyze the data. I haven't really figured it out yet (I'll probably post more about that another time), but maybe someone else wants to take a crack at it. If nothing else, this might be useful as an example of a simple machine learning task.
The stats shown after a race are an easy target for simple machine learning since the digits are very distinct and consistent in appearance.
The first step is to extract the relevant pixel data from the race results screen. We need:
- Each digit for each player's current rating (called "VR", for Versus Rating)
- Each digit for the number of points awarded to each player
- The sign (+ or -) of the awarded points, indicating a gain or a loss
This is a supervised classification problem, so the next step is to create a training data set. I did this already by manually labeling a bunch of the pixel data, for use in a classifier. This classification problem is not particularly hard and I suspect the choice of classifier doesn't make a whole lot of difference, but I chose a multi-class support vector machine (SVM). SVMs are really binary classifiers (distinguish A from B only), so scikit-learn implements a one-vs-rest scheme for the multi-class case. That just means that, for the classifier to succeed, a given digit's pixel data need to be separable from the pixel data of all the other digits.
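To make the one-vs-rest idea concrete, here's a minimal sketch using scikit-learn's LinearSVC, which applies the one-vs-rest scheme by default for multi-class problems. The random blobs below just stand in for the real digit pixel data:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy stand-in for digit pixel data: 3 "digit" classes, 20 features each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 20)) for c in range(3)])
y = np.repeat([0, 1, 2], 30)

# LinearSVC is a binary classifier at heart; for >2 classes scikit-learn
# fits one binary SVM per class (that class vs. all the rest).
clf = LinearSVC(C=0.1)
clf.fit(X, y)

# One weight vector per class confirms the one-vs-rest scheme.
print(clf.coef_.shape)   # (3, 20)
print(round(clf.score(X, y), 3))
```

Each of the three weight vectors separates one class from everything else, which is exactly the "separable from all the other digits" requirement described above.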
Rather than feed the raw pixels into the classifier, they're processed first by:
- Converting to grayscale
- Applying a sharpening filter to accentuate edges
- Reducing the dimensionality using factor analysis
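The three steps above can be sketched roughly like this with PIL and scikit-learn. The function name, image sizes, and random demo images are all just for illustration; the project's actual code may differ:

```python
import numpy as np
from PIL import Image, ImageFilter
from sklearn.decomposition import FactorAnalysis

def preprocess(imgs, n_components=15):
    """Grayscale + sharpen each image, flatten, then reduce dimensionality."""
    feats = []
    for img in imgs:
        g = img.convert('L')                  # grayscale
        s = g.filter(ImageFilter.SHARPEN)     # accentuate edges
        feats.append(np.asarray(s, dtype=float).ravel())
    X = np.array(feats)
    fa = FactorAnalysis(n_components=n_components, random_state=0)
    return fa.fit_transform(X)                # low-dimensional scores

# Tiny demo with random "digit" images (18x12, like the points digits).
imgs = [Image.fromarray(np.random.randint(0, 255, (18, 12), dtype=np.uint8))
        for _ in range(20)]
scores = preprocess(imgs, n_components=5)
print(scores.shape)   # (20, 5)
```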
The model fitting process uses cross-validation to choose the number of components to retain from the factor analysis, and to tune the regularization strength of the SVM. It selects the best-performing model and exposes it to the user.
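That tuning step can be sketched with a scikit-learn pipeline plus a grid search over both knobs. This is a hypothetical stand-in with toy data, not the project's exact code:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import FactorAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Toy stand-in for the flattened digit pixels: 3 classes, 50 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(30, 50)) for c in range(3)])
y = np.repeat([0, 1, 2], 30)

# Factor analysis feeds its low-dimensional scores into a linear SVM.
pipe = Pipeline([('fa', FactorAnalysis(random_state=0)),
                 ('svm', SVC(kernel='linear'))])

# Cross-validation picks both the number of retained factors and C.
grid = {'fa__n_components': [4, 8], 'svm__C': [0.01, 0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The `best_estimator_` attribute is the refit winning model, analogous to what MKDataCompiler exposes to the user after tuning.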
To run this, you'll need:
- Mario Kart 8 Deluxe screenshots. I captured mine using a Switch Lite. I suspect a full-sized Switch would be fine, too, but I didn't try it. You can also just use my screenshots.
- Python 3 (I'm using 3.7.1)
- PIL
- NumPy
- scikit-learn
- pandas (and xlrd or openpyxl)
- joblib
- pigeon, only if you want to label your own training data for some reason
The high-level MKDataCompiler class is the easiest place to start, and it's very simple to use. Just give it training data and your screenshots, and get a pandas DataFrame.
```python
from MKDataCompiler import MKDataCompiler
import glob

paths_to_labeled = {'vr_digits': 'vr_digits_labeled.xlsx',
                    'pts_digits': 'pts_digits_labeled.xlsx',
                    'pts_signs': 'pts_signs_labeled.xlsx'}

mkdc = MKDataCompiler( paths_to_labeled, n_jobs=4)
df = mkdc.compile( glob.glob( 'images-redacted/*.jpg'))
df.head(20).fillna('')
```

```
tuning vr_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=20 and C=0.1
tuning pts_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=15 and C=0.01
tuning pts_signs...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=4 and C=0.1
```
| race | rank | VR | points | is user |
|---|---|---|---|---|
| 1 | 1 | 13453 | 20 | |
| | 2 | 10178 | 16 | |
| | 3 | 2417 | 21 | |
| | 4 | 10937 | 4 | |
| | 5 | 10342 | -2 | |
| | 6 | 10243 | -7 | x |
| | 7 | 10455 | -16 | |
| | 8 | 10010 | -19 | |
| 2 | 1 | 10311 | 27 | |
| | 2 | 10119 | 21 | |
| | 3 | 12154 | 17 | |
| | 4 | 17105 | 3 | |
| | 5 | 9956 | 9 | |
| | 6 | 11916 | 2 | |
| | 7 | 11778 | -2 | |
| | 8 | 9104 | -1 | |
| | 9 | 10265 | -9 | x |
| | 10 | 12190 | -18 | |
| | 11 | 11686 | -24 | |
| | 12 | 1007 | -2 | |
That's it! It works well; the cross-validation result suggests the classifier has perfect accuracy. It's not super surprising, but pretty cool.
The biggest trouble with this classification task has to do with detecting blank spaces that don't contain digits. Blanks can occur for a few reasons: if there are fewer than 12 players, if a VR rating has fewer than 5 digits, or if the points awarded are single-digit. In any of those cases, some random patch of the game's background scenery ends up in the extracted pixel data.
This issue made it tough to get perfect accuracy at first; there were always a couple of samples that got misclassified. My first solution was to just add more training data, which is why there is such a stupidly large number of labeled samples in my training set. That didn't really help, though, so I started playing around with pre-processing and found that applying a sharpening filter made a big difference: it makes each digit stand out better against the background scenery.
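To see why sharpening helps, here's a toy example with PIL's built-in SHARPEN filter on a made-up digit crop (the sizes and pixel values here are invented, not taken from the real screenshots):

```python
import numpy as np
from PIL import Image, ImageFilter

# A hypothetical digit crop: a bright "stroke" on a dim background.
arr = np.full((18, 12), 60, dtype=np.uint8)
arr[4:14, 5:7] = 230                       # the "digit"
img = Image.fromarray(arr)

# PIL's sharpening kernel boosts local contrast at edges, which helps
# the digit stand out from whatever scenery is behind it.
sharp = img.filter(ImageFilter.SHARPEN)

before = np.asarray(img, dtype=int)
after = np.asarray(sharp, dtype=int)
# The step across the digit's left edge gets larger after sharpening:
print(before[8, 5] - before[8, 4], after[8, 5] - after[8, 4])
```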
You don't need to use the high-level MKDataCompiler class. Using the classifiers directly might also be interesting:
```python
from MKImageClassifier import MKImageClassifier
import matplotlib.pyplot as plt
import numpy as np

clf_pts_digits = MKImageClassifier( 'pts_digits_labeled.xlsx', 'pts_digits', n_splits=10)
_ = clf_pts_digits.tune( n_jobs=4)
```

```
tuning pts_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=15 and C=0.01
```
```python
fig,axs = plt.subplots( ncols=3, figsize=(13,4))
for d in np.unique( clf_pts_digits.mdl.y_train):
    mask = (clf_pts_digits.mdl.y_train == d)
    _ = axs[0].scatter( clf_pts_digits.mdl.scores[mask,1], clf_pts_digits.mdl.scores[mask,3], s=2)
    _ = axs[0].set_xlabel( 'Factor 2', size=16)
    _ = axs[0].set_ylabel( 'Factor 4', size=16)
    _ = axs[1].scatter( clf_pts_digits.mdl.scores[mask,3], clf_pts_digits.mdl.scores[mask,6], s=2)
    _ = axs[1].set_xlabel( 'Factor 4', size=16)
    _ = axs[1].set_ylabel( 'Factor 7', size=16)
    _ = axs[2].scatter( clf_pts_digits.mdl.scores[mask,0], clf_pts_digits.mdl.scores[mask,6], s=2)
    _ = axs[2].set_xlabel( 'Factor 1', size=16)
    _ = axs[2].set_ylabel( 'Factor 7', size=16)
plt.tight_layout()
```
Each dot is the low-dimensional representation of one sample of digit pixels. Different samples of the same digit are shown in the same color, and they tend to cluster together (which is why the classifier can tell them apart). Cluster separation isn't great along every factor/dimension, so I picked a couple of the more interesting planes for plotting. The clusters look elongated in the low-D space; there's probably a way to adjust the normalization to fix that, but it works well anyway, so it's Probably Fine as-is.
```python
clf_pts_signs = MKImageClassifier( 'pts_signs_labeled.xlsx', 'pts_signs', n_splits=10)
_ = clf_pts_signs.tune( n_jobs=4)

clf_vr_digits = MKImageClassifier( 'vr_digits_labeled.xlsx', 'vr_digits', n_splits=10)
_ = clf_vr_digits.tune( n_jobs=4)
```

```
tuning pts_signs...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=4 and C=0.1
tuning vr_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=20 and C=0.1
```
```python
from MKImageLoader import load_images

vr_digits, pts_digits, pts_signs, user_ranks = load_images( glob.glob( 'images-redacted\\*.jpg'))
vr_digits_hat = clf_vr_digits.predict( vr_digits)
pts_digits_hat = clf_pts_digits.predict( pts_digits)
pts_signs_hat = clf_pts_signs.predict( pts_signs)
```
```python
img = 4
fig,axs = plt.subplots( nrows=12, ncols=8, figsize=(9,13))
for rank in range(12):
    for digit in range(5):
        axs[rank,digit+3].imshow( vr_digits[:,:,digit,rank,img])
        axs[rank,digit+3].set_title( vr_digits_hat[digit,rank,img], fontsize=26)
    _ = axs[rank,0].imshow( pts_signs[:,:,0,rank,img])
    _ = axs[rank,0].set_title( pts_signs_hat[0,rank,img], fontsize=26)
    _ = axs[rank,1].imshow( pts_digits[:,:,0,rank,img])
    _ = axs[rank,1].set_title( pts_digits_hat[0,rank,img], fontsize=26)
    _ = axs[rank,2].imshow( pts_digits[:,:,1,rank,img])
    _ = axs[rank,2].set_title( pts_digits_hat[1,rank,img], fontsize=26)
for ax in axs.flatten():
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
```
You can use the image loader class, MKImageLoader, if you want more direct access to the pixel data.
```python
from MKImageLoader import MKImageLoader

mkil = MKImageLoader( 'images-redacted\\2020082317571100-16851BE00BC6068871FE49D98876D6C5.jpg')
mkil.main_region
```

```python
mkil.player_regions[0]
```
The image loader auto-detects which place you came in, so you can track yourself easily. This also matters because the loader has to invert the colors of your own stats; otherwise they wouldn't be white-on-a-black-background like everyone else's.
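The inversion itself is simple with PIL. Here's a sketch on a made-up crop; the assumption that your own row reads as dark digits on a bright strip is mine, inferred from the description above:

```python
import numpy as np
from PIL import Image, ImageOps

# Hypothetical highlighted crop: dark digit on a bright background.
arr = np.random.randint(180, 255, (18, 12), dtype=np.uint8)   # bright background
arr[4:14, 5:7] = 30                                           # dark digit
row = Image.fromarray(arr)

inverted = ImageOps.invert(row)   # 255 - pixel, per channel
out = np.asarray(inverted)
print(out[8, 5], out[0, 0] < 100)  # digit now bright, background now dark
```

After this, the user's digits look like everyone else's, so a single classifier handles all twelve rows.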
```python
mkil.player_regions[mkil.user_rank]
```

```python
vr_digits = mkil.get_vr_digits()
type( vr_digits), vr_digits.shape
```

```
(numpy.ndarray, (23, 15, 5, 12))
```
```python
fig,axs = plt.subplots( ncols=5, figsize=(4,1))
for i,ax in enumerate( axs):
    ax.imshow( vr_digits[:,:,i,mkil.user_rank])
```

```python
pts_signs = mkil.get_pts_signs()
pts_digits = mkil.get_pts_digits()
pts_signs.shape, pts_digits.shape
```

```
((13, 13, 1, 12), (18, 12, 2, 12))
```
```python
fig,axs = plt.subplots( ncols=3, figsize=(2.5,1))
_ = axs[0].imshow( pts_signs[:,:,0,mkil.user_rank])
_ = axs[1].imshow( pts_digits[:,:,0,mkil.user_rank])
_ = axs[2].imshow( pts_digits[:,:,1,mkil.user_rank])
```
The resulting numpy arrays have essentially the same dimensions as above, except with one extra trailing dimension that stacks the images:
```python
all_vr_digits, all_pts_digits, all_pts_signs, all_user_ranks = load_images( glob.glob( 'images-redacted\\*.jpg'))
type(all_vr_digits), all_vr_digits.shape # height,width,ndigit,nplayer,nimages
```

```
(numpy.ndarray, (23, 15, 5, 12, 747))
```

```python
type(all_user_ranks), len(all_user_ranks)
```

```
(list, 747)
```
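Under the hood, stacking per-image arrays like this amounts to numpy's `stack` along a new trailing axis. A quick sketch with dummy arrays (the loop count of 3 is arbitrary):

```python
import numpy as np

# Per-image digit arrays: (height, width, ndigit, nplayer), as above.
per_image = [np.zeros((23, 15, 5, 12)) for _ in range(3)]

# A new trailing axis gives (height, width, ndigit, nplayer, nimages).
all_imgs = np.stack(per_image, axis=-1)
print(all_imgs.shape)   # (23, 15, 5, 12, 3)
```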
You don't need to do this, but it's here if you want to for some reason. You can label your own data any way you like, but I used pigeon and thought it was pretty convenient. The MKImageLabeler class is provided to help with this:
```python
from MKImageLabeler import MKImageLabeler

mklbl = MKImageLabeler( glob.glob( 'images-redacted\\*.jpg'))
mklbl.label_vr_digits( 5)
```

```python
df_vr_digits = mklbl.vr_digits_as_df()
df_vr_digits
```
| | rank | digit | path | label |
|---|---|---|---|---|
| 0 | 0 | 0 | images-redacted\2020082314144200-16851BE00BC60... | 1 |
| 1 | 0 | 1 | images-redacted\2020082314144200-16851BE00BC60... | 3 |
| 2 | 0 | 2 | images-redacted\2020082314144200-16851BE00BC60... | 4 |
| 3 | 0 | 3 | images-redacted\2020082314144200-16851BE00BC60... | 5 |
| 4 | 0 | 4 | images-redacted\2020082314144200-16851BE00BC60... | 3 |
```python
df_vr_digits.to_excel( 'vr_digits_labeled_example.xlsx', index=False)
```

```python
mklbl.label_pts_digits( 5)
```

```python
df_pts_digits = mklbl.pts_digits_as_df()
df_pts_digits
```
| | rank | digit | path | label |
|---|---|---|---|---|
| 0 | 0 | 0 | images-redacted\2020082314144200-16851BE00BC60... | 2 |
| 1 | 0 | 1 | images-redacted\2020082314144200-16851BE00BC60... | 0 |
| 2 | 1 | 0 | images-redacted\2020082314144200-16851BE00BC60... | 1 |
| 3 | 1 | 1 | images-redacted\2020082314144200-16851BE00BC60... | 6 |
| 4 | 2 | 0 | images-redacted\2020082314144200-16851BE00BC60... | 2 |
```python
mklbl.label_pts_signs( 5)
```

```python
df_pts_signs = mklbl.pts_signs_as_df()
df_pts_signs
```
| | rank | digit | path | label |
|---|---|---|---|---|
| 0 | 0 | 0 | images-redacted\2020082314144200-16851BE00BC60... | + |
| 1 | 1 | 0 | images-redacted\2020082314144200-16851BE00BC60... | + |
| 2 | 2 | 0 | images-redacted\2020082314144200-16851BE00BC60... | + |
| 3 | 3 | 0 | images-redacted\2020082314144200-16851BE00BC60... | + |
| 4 | 4 | 0 | images-redacted\2020082314144200-16851BE00BC60... | - |
TBD