
MK-OCR

Table of Contents

  • What/Why
  • How
  • Requirements
  • Usage
  • License

What/Why

Well, it's code that uses machine learning to analyze Mario Kart race result screenshots and dump all the results into a spreadsheet. Basically it's a very silly, highly specific OCR routine.

I wrote it after playing a bunch of Mario Kart online and wondering how the game awards points after a race. It's obvious that you get more points for beating players with a better rating than you, and lose more points for losing to someone with a worse rating than you, but the exact rules aren't clear. I thought I might be able to work out the pattern if I could analyze the data. I haven't figured it out really (I'll probably post more about that another time), but maybe someone else wants to take a crack at it. If nothing else, this might be useful as an example of a simple machine learning task.

How

The stats shown after a race are an easy target for simple machine learning since the digits are very distinct and consistent in appearance.

The first step is to extract the relevant pixel data from the race results screen. We need:

  • Each digit for each player's current rating (called "VR", for Versus Rating)
  • Each digit for the number of points awarded to each player
  • The sign (+ or -) of the awarded points, designating a gain or a loss
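
In case it helps, here's a minimal sketch of what "extracting the pixel data" means in practice. The filename and crop coordinates below are placeholders, not the regions MKImageLoader actually uses:

from PIL import Image
import numpy as np

# Purely illustrative: placeholder filename and made-up box coordinates.
img = Image.open('images-redacted/example.jpg')
digit_box = (1050, 120, 1065, 143)                  # (left, upper, right, lower) for one digit
digit_pixels = np.asarray(img.crop(digit_box).convert('L'))
digit_pixels.shape                                  # (23, 15), like the arrays shown later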

This is a supervised classification problem, so the next step is to create a training data set. I did this already by manually labeling a bunch of the pixel data, for use in a classifier. This classification problem is not particularly hard and I suspect the choice of classifier doesn't make a whole lot of difference, but I chose a multi-class support vector machine (SVM). SVMs are really binary classifiers (distinguish A from B only), so scikit-learn implements a one-vs-rest scheme for the multi-class case. That just means that, for the classifier to succeed, a given digit's pixel data need to be separable from the pixel data of all the other digits.
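
For a rough sense of what that means in scikit-learn terms, here's a tiny sketch of a one-vs-rest SVM. It illustrates the idea rather than the project's exact model setup:

from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# One binary SVM per class: each learns "this digit" vs. "every other digit".
# X_train would be flattened pixel vectors, y_train the digit labels (hypothetical names).
ovr_svm = OneVsRestClassifier(SVC(C=0.1))
# ovr_svm.fit(X_train, y_train); ovr_svm.predict(X_new)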

Rather than feed the raw pixels into the classifier, they're processed first by:

  • applying a sharpening filter, which makes each digit stand out against the background scenery (more on this below)
  • flattening and normalizing each digit's pixel region into a feature vector
  • reducing the dimensionality of those feature vectors with factor analysis

The model fitting process uses cross-validation to choose the number of components to retain from the factor analysis and to tune the regularization strength of the SVM. It then selects the best-performing model and exposes it to the user.
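
The tuning is handled inside MKImageClassifier, but a minimal sketch of that kind of search, assuming a FactorAnalysis-plus-SVM pipeline and hypothetical X and y arrays, looks something like this:

from sklearn.pipeline import Pipeline
from sklearn.decomposition import FactorAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# X: flattened pixel vectors, y: digit labels (hypothetical names).
pipe = Pipeline([('fa', FactorAnalysis()), ('svm', SVC())])
grid = GridSearchCV(
    pipe,
    param_grid={'fa__n_components': [4, 15, 20],   # how many factors to keep
                'svm__C': [0.01, 0.1]},            # SVM regularization strength
    cv=10, n_jobs=4)
# grid.fit(X, y); grid.best_estimator_ would then be the model to keep.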

Requirements

  • Mario Kart 8 Deluxe screenshots. I captured mine using a Switch Lite. I suspect a full-sized Switch would be fine, too, but I didn't try it. You can also just use my screenshots.
  • Python3 (I'm using 3.7.1)
  • PIL
  • NumPy
  • scikit-learn
  • pandas (and xlrd or openpyxl)
  • joblib
  • pigeon, only if you want to label your own training data for some reason

Usage

High-level functions

The high-level MKDataCompiler class is the easiest place to start, and it's very simple to use. Just give it training data and your screenshots, and get a pandas DataFrame.

from MKDataCompiler import MKDataCompiler
import glob
paths_to_labeled = {'vr_digits':  'vr_digits_labeled.xlsx',
                    'pts_digits': 'pts_digits_labeled.xlsx',
                    'pts_signs':  'pts_signs_labeled.xlsx'}
mkdc = MKDataCompiler( paths_to_labeled, n_jobs=4)
df = mkdc.compile( glob.glob( 'images-redacted/*.jpg'))
df.head(20).fillna('')
tuning vr_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=20 and C=0.1
tuning pts_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=15 and C=0.01
tuning pts_signs...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=4 and C=0.1
              VR  points is user
race rank
1    1     13453      20
     2     10178      16
     3      2417      21
     4     10937       4
     5     10342      -2
     6     10243      -7       x
     7     10455     -16
     8     10010     -19
2    1     10311      27
     2     10119      21
     3     12154      17
     4     17105       3
     5      9956       9
     6     11916       2
     7     11778      -2
     8      9104      -1
     9     10265      -9       x
     10    12190     -18
     11    11686     -24
     12     1007      -2

That's it! It works well; the cross-validation results suggest the classifiers have perfect accuracy. It's not super surprising, but pretty cool.

The biggest trouble with this classification task has to do with detecting blank spaces that don't contain digits. Blanks can occur for a few reasons: if there are fewer than 12 players, if a "VR" rating has fewer than 5 digits, or if the points awarded are single-digit. In any of those scenarios, some random part of the game's background visuals ends up in the extracted data.

This issue made it tough at first to get perfect accuracy; there were always a couple of samples that got misclassified. My first solution was just to add more training data, which is why there is such a stupidly large number of labeled samples in my training set. That didn't really help, though, so I started playing around with pre-processing and found that applying a sharpening filter to the image made a big difference: it makes each digit stand out much better against the background scenery.
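
The filter itself is nothing fancy. Here's a hedged sketch of that kind of pre-processing with PIL; the exact filter and parameters used in the loader may differ, and the filename is a placeholder:

from PIL import Image, ImageFilter
import numpy as np

img = Image.open('images-redacted/example.jpg')          # placeholder filename
sharpened = img.filter(ImageFilter.SHARPEN)               # digit edges pop against busy backgrounds
pixels = np.asarray(sharpened.convert('L'), dtype=float)  # grayscale feature values for the classifier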

Using the individual classifiers

You don't need to use the high-level MKDataCompiler class. Using the classifiers directly might also be interesting:

from MKImageClassifier import MKImageClassifier
import matplotlib.pyplot as plt
import numpy as np

Train the SVM for the points-awarded digits

clf_pts_digits = MKImageClassifier( 'pts_digits_labeled.xlsx', 'pts_digits', n_splits=10)
_ = clf_pts_digits.tune( n_jobs=4)
tuning pts_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=15 and C=0.01

Plot the low-dimensional training data features (i.e. scores from factor analysis)

fig,axs = plt.subplots( ncols=3, figsize=(13,4))
for d in np.unique( clf_pts_digits.mdl.y_train):
    mask = (clf_pts_digits.mdl.y_train == d)
    _ = axs[0].scatter( clf_pts_digits.mdl.scores[mask,1], clf_pts_digits.mdl.scores[mask,3], s=2)
    _ = axs[0].set_xlabel( 'Factor 2', size=16)
    _ = axs[0].set_ylabel( 'Factor 4', size=16)
    
    _ = axs[1].scatter( clf_pts_digits.mdl.scores[mask,3], clf_pts_digits.mdl.scores[mask,6], s=2)
    _ = axs[1].set_xlabel( 'Factor 4', size=16)
    _ = axs[1].set_ylabel( 'Factor 7', size=16)
    
    _ = axs[2].scatter( clf_pts_digits.mdl.scores[mask,0], clf_pts_digits.mdl.scores[mask,6], s=2)
    _ = axs[2].set_xlabel( 'Factor 1', size=16)
    _ = axs[2].set_ylabel( 'Factor 7', size=16)
plt.tight_layout()

Each dot is the low-dimensional representation of one sample of digit pixels. Different samples of the same digit are shown in the same color, and they tend to cluster together (which is why the classifier can tell them apart). Cluster separation isn't great along every factor/dimension, so I selected a couple of the more interesting planes for plotting. The clusters appear elongated in the low-D space; there's probably a way to adjust the normalization to fix this, but it works well anyway, so it's Probably Fine as-is.

Fit the other two classifiers

clf_pts_signs = MKImageClassifier( 'pts_signs_labeled.xlsx', 'pts_signs', n_splits=10)
_ = clf_pts_signs.tune( n_jobs=4)

clf_vr_digits = MKImageClassifier( 'vr_digits_labeled.xlsx', 'vr_digits', n_splits=10)
_ = clf_vr_digits.tune( n_jobs=4)
tuning pts_signs...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=4 and C=0.1
tuning vr_digits...done.
classification accuracy w/ 10-fold xval: 100.0% using ncomp=20 and C=0.1

Load a bunch of images to classify (using MKImageLoader, more on that below)...

import glob
from MKImageLoader import load_images
vr_digits, pts_digits, pts_signs, user_ranks = load_images( glob.glob( 'images-redacted\\*.jpg'))

...and run the loaded data through each classifier

vr_digits_hat  = clf_vr_digits.predict( vr_digits)
pts_digits_hat = clf_pts_digits.predict( pts_digits)
pts_signs_hat  = clf_pts_signs.predict( pts_signs)

Take a look at the predictions

img = 4
fig,axs = plt.subplots( nrows=12, ncols=8, figsize=(9,13))
for rank in range(12):
    for digit in range(5):
        axs[rank,digit+3].imshow( vr_digits[:,:,digit,rank,img])
        axs[rank,digit+3].set_title( vr_digits_hat[digit,rank,img], fontsize=26)
    
    _ = axs[rank,0].imshow( pts_signs[:,:,0,rank,img])
    _ = axs[rank,0].set_title( pts_signs_hat[0,rank,img], fontsize=26)
    
    _ = axs[rank,1].imshow( pts_digits[:,:,0,rank,img])
    _ = axs[rank,1].set_title( pts_digits_hat[0,rank,img], fontsize=26)
    
    _ = axs[rank,2].imshow( pts_digits[:,:,1,rank,img])
    _ = axs[rank,2].set_title( pts_digits_hat[1,rank,img], fontsize=26)
for ax in axs.flatten():
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()

Work with image data directly

You can use the image loader class, MKImageLoader, if you want more direct access to the pixel data.

from MKImageLoader import MKImageLoader
mkil = MKImageLoader( 'images-redacted\\2020082317571100-16851BE00BC6068871FE49D98876D6C5.jpg')
mkil.main_region

View the winner's pixels

mkil.player_regions[0]

View your own pixels

The image loader auto-detects which place you came in, so you can track yourself easily. This is also important because the loader has to invert the colors of your own stats; otherwise they wouldn't be white-on-a-black-background like all the other players'.

mkil.player_regions[mkil.user_rank]
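
For reference, that inversion boils down to flipping 8-bit pixel values. Conceptually it's something like this (a sketch with a hypothetical region variable, not the actual MKImageLoader code):

import numpy as np

# For 8-bit pixel data, color inversion is just subtraction from the maximum value.
region = np.asarray(some_player_region, dtype=np.uint8)   # hypothetical image-like region
inverted = 255 - region                                   # highlighted row becomes white-on-black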

Extract every VR rating as a numpy array of image data (height-by-width-by-digit-by-rank)

vr_digits = mkil.get_vr_digits()
type( vr_digits), vr_digits.shape
(numpy.ndarray, (23, 15, 5, 12))

Inspect individual VR digits from numpy array

fig,axs = plt.subplots( ncols=5, figsize=(4,1))
for i,ax in enumerate( axs):
    ax.imshow( vr_digits[:,:,i,mkil.user_rank])

Do the same with the sign and digits of points awarded

pts_signs = mkil.get_pts_signs()
pts_digits = mkil.get_pts_digits()
pts_signs.shape, pts_digits.shape
((13, 13, 1, 12), (18, 12, 2, 12))
fig,axs = plt.subplots( ncols=3, figsize=(2.5,1))
_ = axs[0].imshow( pts_signs[:,:,0,mkil.user_rank])
_ = axs[1].imshow( pts_digits[:,:,0,mkil.user_rank])
_ = axs[2].imshow( pts_digits[:,:,1,mkil.user_rank])

Load a whole folder full of images

The resulting numpy arrays have essentially the same dimensions as above, except one more dimension is added to stack multiple images.

all_vr_digits, all_pts_digits, all_pts_signs, all_user_ranks = load_images( glob.glob( 'images-redacted\\*.jpg'))
type(all_vr_digits), all_vr_digits.shape # height,width,ndigit,nplayer,nimages
(numpy.ndarray, (23, 15, 5, 12, 747))
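
In other words, the per-image arrays (each 23 x 15 x 5 x 12 for the VR digits) are presumably just stacked along a new trailing axis, e.g.:

import numpy as np

# Illustrative only: three dummy per-image arrays stacked the way load_images' output is shaped.
per_image = [np.zeros((23, 15, 5, 12)) for _ in range(3)]
stacked = np.stack(per_image, axis=-1)
stacked.shape   # (23, 15, 5, 12, 3)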

This also returns a list of the user's ranks for each race (rank goes from 0-11, not 1-12)

type(all_user_ranks), len(all_user_ranks)
(list, 747)

Label your own training data

You don't need to do this, but it's here if you want to for some reason. You can label your own data any way you like, but I used pigeon and thought it was pretty convenient (a bare-bones pigeon example is sketched at the end of this section). The MKImageLabeler class is provided to help with this:

from MKImageLabeler import MKImageLabeler
mklbl = MKImageLabeler( glob.glob( 'images-redacted\\*.jpg'))

Label the digits for VR ratings (capped at 5 samples just for the example)

mklbl.label_vr_digits( 5)

Convert your labeled data to a pandas DataFrame

df_vr_digits = mklbl.vr_digits_as_df()
df_vr_digits
   rank  digit                                                path label
0     0      0  images-redacted\2020082314144200-16851BE00BC60...     1
1     0      1  images-redacted\2020082314144200-16851BE00BC60...     3
2     0      2  images-redacted\2020082314144200-16851BE00BC60...     4
3     0      3  images-redacted\2020082314144200-16851BE00BC60...     5
4     0      4  images-redacted\2020082314144200-16851BE00BC60...     3
df_vr_digits.to_excel( 'vr_digits_labeled_example.xlsx', index=False)

Repeat for the digits and signs of points awarded

mklbl.label_pts_digits( 5)

df_pts_digits = mklbl.pts_digits_as_df()
df_pts_digits
   rank  digit                                                path label
0     0      0  images-redacted\2020082314144200-16851BE00BC60...     2
1     0      1  images-redacted\2020082314144200-16851BE00BC60...     0
2     1      0  images-redacted\2020082314144200-16851BE00BC60...     1
3     1      1  images-redacted\2020082314144200-16851BE00BC60...     6
4     2      0  images-redacted\2020082314144200-16851BE00BC60...     2
mklbl.label_pts_signs( 5)

df_pts_signs = mklbl.pts_signs_as_df()
df_pts_signs
   rank  digit                                                path label
0     0      0  images-redacted\2020082314144200-16851BE00BC60...     +
1     1      0  images-redacted\2020082314144200-16851BE00BC60...     +
2     2      0  images-redacted\2020082314144200-16851BE00BC60...     +
3     3      0  images-redacted\2020082314144200-16851BE00BC60...     +
4     4      0  images-redacted\2020082314144200-16851BE00BC60...     -
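
And, as promised, here's what driving pigeon directly looks like if you'd rather skip MKImageLabeler. The list of crops and the 'blank' label are hypothetical; this is just the general pattern, not MKImageLabeler's internals:

from pigeon import annotate
from PIL import Image
from IPython.display import display

# digit_crops: a hypothetical list of 2-D uint8 pixel arrays to label
annotations = annotate(
    digit_crops,
    options=[str(d) for d in range(10)] + ['blank'],
    display_fn=lambda crop: display(Image.fromarray(crop)))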

License

TBD

About

image processing + machine learning pipeline to extract race stats from screenshots of Mario Kart. "But Why?"
