GitHub - thecomeonman/CodaBonito: Functions to aid football / soccer analysis

This document offers a brief introduction to the functions in this library. Some functions are fairly straightforward to use, but for others you may need to look up the help entry before trying them out.

Disclaimer / Call for Inputs

This library is very much in development. It is a compilation of code I’ve written across different projects, which is why there may be syntactical inconsistencies, such as some functions using player names and others using player IDs as a reference for players. Some of this is also code whose output I have never published, so it may need additional arguments. I have, however, tried to give enough documentation for each function so anyone trying to use the library should be well equipped with instructions and a basic understanding of how the functions work.

Any input and feedback is welcome. If you’re on GitHub, head to github.com/thecomeonman/CodaBonito; otherwise get in touch on Twitter.

How to get started

Install R from https://cran.r-project.org

Open R and run these commands in the console:

install.packages("devtools")
library(devtools)
install_github("thecomeonman/CodaBonito")

And you’re ready to run the examples below!

Data

I have added some fake data along with the package to be able to better explain the usage of these functions.

dtPlayerMetrics - aggregated data for players is typically in a format similar to this, with some extra details about the team they play for, their age, etc.

PlayerName	TeamName	playerId	Metric1	Metric2	Metric3	Metric4	Metric5	Metric6	Metric7
gjn xfv	jsw	1	2.229299	0.5955696	1.0000000	0.8763470	0.7329688	3.158645	0.0000013
yqp bfe	rzu	2	3.097161	0.9443782	0.0029271	0.8706489	0.8115634	2.880184	0.0805346
rjs mrx	svk	3	3.132211	0.1286577	0.0021049	0.9959918	1.0587961	6.371049	0.2331633
jtw fqd	rdz	4	2.440632	0.5247019	0.9977317	0.4593465	1.4070274	4.061111	0.0000364
gja jvi	bhj	5	3.325477	0.9318757	0.0000363	0.9999948	2.2463044	7.112237	0.0271460
mol euq	yza	6	2.483550	0.4419821	1.0000000	0.9936560	1.3460208	2.276407	0.0845131

dtMetricCategorisation - some metadata about the metrics.
- variableLabel is the name that will be displayed in charts for that metric
- variableCategory is the grouping of variables used in some visualisations, like fNormalisedValueChart
- HighValueIsBad is marked true for variables where a high value is bad. Variables such as fouls and goals conceded would be true.

variable	variableLabel	variableCategory	HighValueIsBad	suffix
Metric1	Metric 1	Offense	FALSE
Metric2	Metric 2	Offense	FALSE
Metric3	Metric 3	Defense	FALSE	%
Metric4	Metric 4	Offense	TRUE
Metric5	Metric 5	Defense	FALSE
Metric7	Metric 7	Defense	FALSE	%
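One way a flag like HighValueIsBad can be used is to flip the direction of a percentile so that a higher percentile always means better. The sketch below is an illustration in base R and not the package's internal code; fPercentileSketch and the foul counts are hypothetical.

```r
# Hypothetical helper, not part of CodaBonito: empirical percentile of each
# value, flipped when a high value is bad so that 1 is always the best.
fPercentileSketch = function(vnValues, bHighValueIsBad = FALSE) {
   if (bHighValueIsBad) {
      vnValues = -vnValues
   }
   ecdf(vnValues)(vnValues)
}

# made-up fouls per game for four players
vnFouls = c(1, 2, 3, 4)

# treated as a metric where high is good
print(fPercentileSketch(vnFouls))

# treated as a metric where high is bad: the player with fewest fouls ranks best
print(fPercentileSketch(vnFouls, bHighValueIsBad = TRUE))
```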

dtPasses - passing data.
- x, y denote the start coordinates of the pass
- endX, endY denote the end coordinates of the pass
- passLength is the length of the pass
- passAngle is the angle of the pass in radians ( 180 degrees = pi radians ) where 0 is along the pitch from defense to offense
- Success is 1 for a successful pass, 0 for a failed pass

playerId	x	y	endX	endY	passLength	passAngle	Success	recipientPlayerId
1	8.187907	56.49550	10.14677	65.23493	8.956269	2.9552227	1	3
1	4.806998	23.82662	32.60867	74.70806	57.981493	1.3998821	1	3
1	6.449829	28.04108	47.98204	41.56566	43.678820	0.5959662	1	8
1	8.502368	25.26964	39.34758	50.24656	39.689717	1.0575427	1	8
1	14.719737	53.69758	64.79699	11.99930	65.165003	-1.3106447	1	2
1	11.117120	49.90731	25.42474	8.75858	43.565195	-2.1075148	1	8
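If your own passing data only has start and end coordinates, passLength and passAngle can be derived from them. The sketch below uses base R and made-up coordinates. It assumes x runs along the length of the pitch towards the attacking end, so that an angle of 0 points from defense to offense; the sign convention for left and right is also an assumption, and the angles in the bundled fake data do not necessarily follow it.

```r
# made-up passes: start (x, y) and end (endX, endY) coordinates
dfPassSketch = data.frame(
   x = c(0, 10),
   y = c(0, 40),
   endX = c(3, 10),
   endY = c(4, 45)
)

# straight line distance between start and end
dfPassSketch$passLength = sqrt(
   (dfPassSketch$endX - dfPassSketch$x) ^ 2 +
   (dfPassSketch$endY - dfPassSketch$y) ^ 2
)

# atan2 returns radians in (-pi, pi], with 0 along the positive x direction
dfPassSketch$passAngle = atan2(
   dfPassSketch$endY - dfPassSketch$y,
   dfPassSketch$endX - dfPassSketch$x
)

print(dfPassSketch)
```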

dtFormation - Coordinates as per the formation

playerId	x	y
1	15	40
2	35	20
3	35	60
8	60	40
9	90	40

dtPlayerLabels - Player labels

playerId	playerName
1	asd qwe
2	qwe rty
3	ghj zxc
8	fgh rty
9	cvb dfg

lTrackingData$dtTrackingData - Tracking data

Tag	Player	Frame	X	Y	Time_s	VelocityX	VelocityY
Away	AwayPlayer1	0	80.69826	55.22010	0.0	0.000000	0.0000000
Away	AwayPlayer1	1	80.35326	55.11603	0.2	-1.724991	-0.5203465
Away	AwayPlayer1	2	80.00826	55.01196	0.4	-1.724991	-0.5203465
Away	AwayPlayer1	3	79.66327	54.90789	0.6	-1.724991	-0.5203465
Away	AwayPlayer1	4	79.31827	54.80382	0.8	-1.724991	-0.5203465
Away	AwayPlayer1	5	78.97327	54.69975	1.0	-1.724991	-0.5203465
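The velocity columns in the sample rows above are consistent with a simple finite difference of position over time. The sketch below reproduces them in base R for the first three rows; it is an illustration under that assumption, not necessarily how the package derives them.

```r
# first three rows of the sample above for AwayPlayer1
dfTrackingSketch = data.frame(
   Frame = c(0, 1, 2),
   X = c(80.69826, 80.35326, 80.00826),
   Y = c(55.22010, 55.11603, 55.01196),
   Time_s = c(0.0, 0.2, 0.4)
)

# change in position divided by change in time; the first frame has no
# previous frame, so it is set to 0 as in the sample
dfTrackingSketch$VelocityX = c(
   0,
   diff(dfTrackingSketch$X) / diff(dfTrackingSketch$Time_s)
)
dfTrackingSketch$VelocityY = c(
   0,
   diff(dfTrackingSketch$Y) / diff(dfTrackingSketch$Time_s)
)

print(dfTrackingSketch)
```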

Visualisations

geom_pitch

geom_pitch draws the pitch markings, with further customisations available

library(CodaBonito)
library(ggplot2)

pPitch = ggplot() + geom_pitch()

print(pPitch)

You can add whatever layers you want on top of it, as with any regular ggplot2 plot

# adding passing data on top now
pPitch = pPitch +
   geom_point(
      data = dtPasses,
      aes(x = x , y = y)
   )

print(pPitch)

theme_pitch

If you aren’t interested in having the axis markings, etc., use theme_pitch

pPitch = pPitch +
   theme_pitch()

print(pPitch)

fStripChart

pStripChart = fStripChart (
   dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample',
   vnExpand = c(-0.3, -0.03, 1.2, 1.3)
)

print(pStripChart)

fBeeswarmChart

pBeeswarmChart = fBeeswarmChart (
   dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)

print(pBeeswarmChart)

fPercentileBarChart

pPercentileBarChart = fPercentileBarChart(
   dtDataset = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)
print(pPercentileBarChart)

fPercentileBarChart with AbsoluteIndicator

Percentiles can be a little misleading if the underlying numbers aren’t uniformly distributed. You can add annotations for an indicator of the absolute spread of the values and where this particular player’s values fall within that spread.

pPercentileBarChart = fPercentileBarChart(
   dtDataset = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample',
   # vnQuantileMarkers = c(0.01, 0.25, 0.5, 0.75, 0.99),
   bAddAbsoluteIndicator = T
)

print(pPercentileBarChart)
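To see why a percentile on its own can mislead, consider a metric with one outlier. The numbers below are made up, and the calculation is a base R sketch of the idea rather than what bAddAbsoluteIndicator does internally.

```r
# made-up metric values for five players, one of them an outlier
vnValues = c(1, 2, 3, 4, 100)
nPlayerValue = 4

# share of players at or below this player's value
nPercentile = mean(vnValues <= nPlayerValue)

# where the value sits between the lowest and highest observed values
nRangePosition = (nPlayerValue - min(vnValues)) /
   (max(vnValues) - min(vnValues))

print(c(percentile = nPercentile, rangePosition = nRangePosition))
```

The player is at the 80th percentile but sits at only about 3% of the way between the lowest and highest values, which is the kind of gap an absolute indicator makes visible.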

fRadarPercentileChart

I disapprove of radar charts. They are a bad visualisation, prone to misinterpretation. They seem to be the accepted norm for comparing players, though, which is why I had to sell out and include an implementation in the package. I’ve added a warning which states that you should use one of the other visualisations instead, as those are better structured.

pRadarPercentileChart = fRadarPercentileChart (
   dtPlayerMetrics = dtPlayerMetrics,
   vcColumnsToIndex = c('playerId','PlayerName','TeamName'),
   dtMetricCategorisation = dtMetricCategorisation,
   iPlayerId = 2,
   cTitle = 'Sample'
)
#> Warning in fRadarPercentileChart(dtPlayerMetrics = dtPlayerMetrics,
#> vcColumnsToIndex = c("playerId", : Radar charts can be misleading. Use
#> fPercentileBarChart instead.
print(pRadarPercentileChart)

fPlotSonar

pPlotSonar = fPlotSonar(
   dtPassesToPlot = dtPasses,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nZoomFactor = NULL,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = F,
   cTitle = NULL
)
print(pPlotSonar)

# Sonar broken up by pitch area
dtPassesByPitchArea = dtPasses[,
   list(
      playerId,
      passLength,
      passAngle,
      x,
      y,
      Success,
      xBucket = (
         ifelse(
            x %/% 20 == 120 %/% 20,
            ( x %/% 20 ) - 1,
            x %/% 20
         ) * 20
      ) + 10,
      yBucket = (
         ifelse(
            y %/% 20 == 80 %/% 20,
            ( y %/% 20 ) - 1,
            y %/% 20
         ) * 20
      ) + 10
   )
]

pPlotSonarVariation1 = fPlotSonar(
   dtPassesToPlot = dtPassesByPitchArea,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nZoomFactor = NULL,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = T,
   cTitle = 'Sample by Area of Pitch'
)
print(pPlotSonarVariation1)
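The xBucket and yBucket expressions above split the pitch into 20 x 20 blocks and label each pass with the centre of its block, folding coordinates that sit exactly on the far edge into the last block. The same logic is written out below as a standalone base R function so it can be tried on a few values; fBucketCentreSketch is a hypothetical name and not part of the package.

```r
# Hypothetical helper mirroring the xBucket / yBucket expressions above
fBucketCentreSketch = function(vnCoordinate, nLimit, nBlockSize = 20) {

   # index of the block that each coordinate falls in, starting from 0
   viBlock = vnCoordinate %/% nBlockSize

   # a coordinate exactly on the far edge would start a new block,
   # so it is moved back into the last block
   viBlock = ifelse(
      viBlock == nLimit %/% nBlockSize,
      viBlock - 1,
      viBlock
   )

   # centre of the block
   viBlock * nBlockSize + nBlockSize / 2

}

print(fBucketCentreSketch(c(0, 19.9, 20, 119, 120), nLimit = 120))
print(fBucketCentreSketch(c(0, 79, 80), nLimit = 80))
```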

# Sonar broken up by player, placed at their median passing location
dtPassesByPlayer = merge(
   dtPasses,
   merge(
      dtPasses[,
         list(
            xBucket = median(x),
            yBucket = median(y)
         ),
         list(
            playerId
         )
      ],
      dtPlayerLabels[,
         list(
            playerId,
            bucketLabel = playerName
         )
      ],
      c(
         'playerId'
      )
   ),
   c(
      'playerId'
   )
)

pPlotSonarVariation2 = fPlotSonar (
   dtPassesToPlot = dtPassesByPlayer,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nYLimit = 80,
   nXLimit = 120,
   bAddPitchBackground = T,
   cTitle = 'Sample By Median Position On Pitch'
)
print(pPlotSonarVariation2)

# Sonar broken up by player, placed at the location dictated by their role
# in the formation

dtPassesByPlayerFormation = merge(
   dtPasses,
   merge(
      dtFormation[,
         list(
            xBucket = x,
            yBucket = y,
            playerId
         )
      ],
      dtPlayerLabels[,
         list(
            playerId,
            bucketLabel = playerName
         )
      ],
      c(
         'playerId'
      )
   ),
   'playerId'
)
pPlotSonarVariation3 = fPlotSonar(
   dtPassesToPlot = dtPassesByPlayerFormation,
   iBlocksInFirstRing = 4,
   iNbrRings = 8,
   nXLimit = 120,
   nYLimit = 80,
   bAddPitchBackground = T,
   cTitle = 'Sample By Formation'
)
print(pPlotSonarVariation3)

fPassNetworkChart

pPassNetworkChart = fPassNetworkChart(
   dtPasses,
   dtPlayerLabels
)
print(pPassNetworkChart)
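A pass network is usually built on counts of passes between each passer and recipient. The base R sketch below shows that aggregation on the six sample rows of dtPasses listed earlier; it illustrates the general idea and is not necessarily what fPassNetworkChart does internally. The names dfPassesSketch, dfEdgesSketch and passCount are made up for this example.

```r
# passer / recipient pairs taken from the sample rows of dtPasses
dfPassesSketch = data.frame(
   playerId = c(1, 1, 1, 1, 1, 1),
   recipientPlayerId = c(3, 3, 8, 8, 2, 8),
   Success = c(1, 1, 1, 1, 1, 1)
)

# number of successful passes for each passer-recipient pair
dfEdgesSketch = aggregate(
   Success ~ playerId + recipientPlayerId,
   data = dfPassesSketch[dfPassesSketch$Success == 1, ],
   FUN = sum
)
names(dfEdgesSketch)[3] = 'passCount'

print(dfEdgesSketch)
```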

fXgBuildUpComparison

pXgBuildUpComparison = fXgBuildUpComparison(
   dtXg,
   dtTeamLabels
)
print(pXgBuildUpComparison)

fDrawVoronoi

WIP using the same data structure as https://github.com/metrica-sports/sample-data which you can parse with fParseTrackingDataBothTeams

You can draw a Voronoi diagram for a single frame

pVoronoi = fDrawVoronoiFromTable(
   lTrackingData$dtTrackingData[Frame == min(Frame)],
   nXLimit = 120,
   nYLimit = 80
)

print(pVoronoi)
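A Voronoi diagram assigns every point on the pitch to the player closest to it. The base R sketch below approximates that on a coarse grid with made-up player positions, just to show the idea; the package function draws the actual cells, and all names here are hypothetical.

```r
# made-up player positions on a 120 x 80 pitch
dfPlayersSketch = data.frame(
   Player = c('A', 'B', 'C'),
   X = c(30, 90, 60),
   Y = c(40, 40, 10),
   stringsAsFactors = FALSE
)

# coarse grid of points covering the pitch
dfGridSketch = expand.grid(
   X = seq(0, 120, by = 10),
   Y = seq(0, 80, by = 10)
)

# for each grid point, find the player at the smallest squared distance
dfGridSketch$NearestPlayer = sapply(
   seq_len(nrow(dfGridSketch)),
   function(i) {
      vnDistanceSquared = (dfPlayersSketch$X - dfGridSketch$X[i]) ^ 2 +
         (dfPlayersSketch$Y - dfGridSketch$Y[i]) ^ 2
      dfPlayersSketch$Player[which.min(vnDistanceSquared)]
   }
)

# number of grid points closest to each player
print(table(dfGridSketch$NearestPlayer))
```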

And if you have multiple frames -

# pitch dimensions, same as in the single-frame example above
nXLimit = 120
nYLimit = 80

voronoiOutput = fDrawVoronoiFromTable(
   lTrackingData$dtTrackingData,
   nXLimit = nXLimit,
   nYLimit = nYLimit,
   UseOneFrameEvery = 1,
   DelayBetweenFrames = 5
)


if ( !interactive() ) {

   qwe = suppressWarnings(
      file.remove('./README_files/figure-markdown_strict/Voronoi.gif')
   )
   rm(qwe)

   qwe = file.copy(
      voronoiOutput,
      './README_files/figure-markdown_strict/Voronoi.gif'
   )

   rm(qwe)

}

The Friends of Tracking pitch control model -

lPitchControl = fGetPitchControlProbabilities (
    lData = lTrackingData,
    viTrackingFrame = lTrackingData$dtTrackingData[, unique(Frame)[5]],
    nYLimit = nYLimit,
    nXLimit = nXLimit,
    iGridCellsX = nXLimit / 3
)
    
pPlotPitchControl = fPlotPitchControl(
    lPitchControl
)

print(pPlotPitchControl)

Some of my other experiments with tracking data are here - https://github.com/thecomeonman/MakingFriendsWithTrackingData

Logic and Algorithms

fEMDDetailed

A function to calculate earth mover’s distance. It offers more flexibility and transparency than emdist::emd.

Any distance matrix can be used to calculate EMD, but emdist::emd insists on getting the raw distributions with only up to four dimensions. fEMDDetailed only requires a distance matrix between each combination of observations in the two datasets, irrespective of the nature of the data.

library(data.table)

# Two random datasets of three dimensions
a = data.table(matrix(runif(21), ncol = 3))
b = data.table(matrix(runif(30), ncol = 3))

# adding serial numbers to each observation
a[, SNO := .I]
b[, SNO := .I]

# evaluating distance between all combinations of data in the two datasets
a[, k := 'k']
b[, k := 'k']
dtDistances = merge(a,b,'k',allow.cartesian = T)
dtDistances[,
   Distance := (
      (( V1.x - V1.y) ^ 2) +
      (( V2.x - V2.y) ^ 2) +
      (( V3.x - V3.y) ^ 2)
   ) ^ 0.5
]

# getting EMD between these datasets
lprec = fEMDDetailed(
   SNO1 = dtDistances[, SNO.x],
   SNO2 = dtDistances[, SNO.y],
   Distance = dtDistances[, Distance]
)

print(fGetEMDFromDetailedEMD(lprec))
#> [1] 0.4185668

# This value should be the same as that computed by emdist package's emd function.
# EMD needs the weightage of each point, which is assigned as equal in our
# function, so giving 1/N weightage to each data point
# emdist::emd(
#    as.matrix(
#       a[, list(1/.N, V1,V2,V3)]
#    ),
#    as.matrix(
#       b[, list(1/.N, V1,V2,V3)]
#    )
# )
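For intuition about what the number means, there is a special case that can be worked out by hand: in one dimension, with the same number of equally weighted observations in both datasets, the cheapest way to move one set onto the other pairs the smallest value with the smallest, the second smallest with the second smallest, and so on. The base R sketch below uses made-up values for that special case only; the example above has unequal dataset sizes and three dimensions, which is where the general function is needed.

```r
# two made-up one-dimensional datasets with the same number of observations
vnA = c(0, 3, 1)
vnB = c(8, 5, 6)

# pair the sorted values and average the distance moved per observation
nEMD1D = mean(abs(sort(vnA) - sort(vnB)))

print(nEMD1D)
```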

On the topic of transparency, one of the things I find very useful is that you can now see how much distance is being contributed by each observation.

dtDistances[, EMDWeightage := get.variables(lprec)]
ggplot(dtDistances) +
   geom_point(
      data = dtDistances,
      aes(
         x = factor(SNO.x),
         y = factor(SNO.y),
         size = Distance,
         color = EMDWeightage
      )
   ) +
   scale_colour_continuous(
      low = 'black',
      high = 'red'
   ) +
   coord_fixed() +
   xlab('SNO.x') +
   ylab('SNO.y')

Data Parsing

You will find fJsonToListOfTables, fJsonToTabular, fParseTrackingData useful. They aren’t glamorous enough to be demoed here but the documentation should help you use those functions.
