This article demonstrates the basic use of the msmbayes
package to fit Bayesian multi-state models to longitudinal data
consisting of intermittent observations of a state.
It shows that msmbayes
can reproduce the true
parameter values for a simulated dataset, within an expected margin of
simulation error, and gives similar results to
msm
.
It gives some general hints and warnings about Bayesian implementation of these models, e.g. priors, computational challenges. However this is the subject of ongoing research, so it is somewhat vague.
A general introduction to the theory and practice of multi-state
modelling is given in the documentation and course notes for the msm
package.
msmbayes
essentially fits Bayesian versions of some of
the models in msm
. However msmbayes
has more
limited features - see the front page for a more detailed
comparison.
These examples all use a simulated dataset designed to mimic a longitudinal study where people are repeatedly tested for an infection.
We assume a two-state multi-state model, with states “test positive” and “test negative”.
The data are simulated from a continuous-time Markov model with the following transition intensity matrix. Expressed in days or months, respectively, this is
library(msmbayes)
Qtrue <- rbind(c(-1/180, 1/180),
c( 1/10, -1/10))
Qtruemo <- Qtrue*365/12
That is, everyone starts with no infection, then,
the mean time until the next infection is 180 days (6 months) (the mean sojourn time in state 1)
the mean time with infection is 10 days (0.33 months) (the mean sojourn time in state 2)
We then suppose that 100 people are tested every 28 days, and assume
the test is a perfect indicator of infection. The final dataset stores
the state at each test time in the variable state
.
We also simulate some covariates, including sex and age, and a state
outcome statec
which depends on these covariates (see below).
Note: age is expressed in units of (years - 50)/10. Centering around a typical value (50 years), and scaling to a unit of interest (10 years) will help with both MCMC computation and interpretation of parameters.
head(infsim)
## subject days months state sex age10 statec statep statepc
## 1 1 0 0.00 1 male -0.31322691 1 1 1
## 2 1 28 0.92 1 male 0.09182166 2 1 1
## 3 1 56 1.84 1 male -0.41781431 1 1 1
## 4 1 84 2.76 1 male 0.79764040 1 1 1
## 5 1 112 3.68 1 male 0.16475389 2 1 1
## 6 1 140 4.60 1 male -0.41023419 2 1 2
We analyse this dataset with continuous-time multistate models, where
we assume transitions between states can happen at any time, and not
just at the observation times. In this demonstration, when we fit the
models, we pretend that we only know the state at the time of each test,
and that we don’t know the true times of infection and recovery. This is
the typical style of data that the msm
package is used for
— where the state is only known at a series of arbitrary times.
Note: Both
msm
andmsmbayes
allow any number of states and structure of allowed transitions. This includes models with or without “absorbing” states, such as death. However,msmbayes
has some limitations, listed on the front page. For example, the “exact death times” observation scheme (where we suppose the time of death is known, but the state immediately before death is unknown) is not supported.
First we demonstrate fitting the basic two-state Markov multi-state model with no covariates. There are two unknown parameters: the transition intensities between 1-2 and 2-1.
The first argument to msmbayes
is the dataset, and
additional named arguments indicate the names of the columns in the data
that contain the state, the time of observation and the subject
(individual) identifier.
Note: unlike in
msm()
, the names of variables in the data must be quoted as strings, not “bare” variable names.
Q <- rbind(c(0, 1),
c(1, 0))
draws <- msmbayes(data=infsim, state="state", time="months", subject="subject",
Q=Q)
The argument Q
to msmbayes()
is a square
matrix that indicates the transition structure. This is in the same
format as msm()
:
The number of rows (or columns) indicates the number of states, here 2.
The diagonal of this matrix is ignored - what you specify on the diagonal doesn’t matter.
The off-diagonal entries of Q
which
are 1 indicate the transitions that are allowed in
continuous time (here, state 1 to state 2, and 2 to 1).
The off-diagonal entries of Q
which
are 0 indicate the transitions that are not allowed in
continuous time.
The parameters of the model, \(q_{rs}\), are transition intensities in
continuous time. These are not probabilities, but rates. In particular,
they are not probabilities of transition over an interval of time, as in
a discrete-time Markov model. See, e.g. the msm
course notes for more discussion of this distinction.
To interpret these values, note that \(q_{rs}\) is the rate at which transitions to \(s\) are observed for a population in state \(r\). So \(1/q_{rs}\) is the mean time to the next transition to \(s\) that would be observed in a population in state \(r\), if we were to observe one person at a time (switching to observing a different person if a “competing event”, i.e. a transition to a state other than \(s\), happens)
In a Bayesian model, prior distributions must be defined for all
parameters. In msmbayes
, normal priors are used for the log
transition intensities. The mean and standard deviation of these priors
can be set through the priors
argument. This can be done in
various alternative ways:
priors <- list(
msmprior("logq(1,2)", mean=-1, sd=5),
msmprior("logq(1,2)", mean=-1, sd=5)
)
We can also specify two out of the median
(i.e. 50%
quantile), the lower
95% quantile or the upper
95% quantile for any of these quantities. Perhaps the easiest to
interpret is \(1/q_{rs}\), the mean
time to event \(s\) for people in \(r\), supplied here as
time(r,s)
. Only two quantiles should be provided, because
this allows a unique normal distribution on \(\log(q_{rs})\) to be deduced. Different
specifications can be mixed for different parameters, e.g.
priors <- list(
msmprior("q(2,1)", lower=0.001, upper=5),
msmprior("time(1,2)", median=10, upper=30)
)
Note: the prior on
time
represents a belief about the average in a population - not a distribution for individual outcomes. Here, it means that we expect the average time from state 2 to state 1 is 10 months, but this average could be as high as 30. We are not saying that we expect to see individual times to events of up to 30.
For any parameters not supplied in the priors
argument,
a normal distribution with a mean of -2 and a standard deviation of 2
will be used for \(\log(q_{rs})\). This
implies a 95% credible interval of between \(\exp(-6)=0.002\) and \(\exp(2)=7\) for the event rate, equivalent
to a mean time to event of between 0.1 and 400. This is appropriately
vague in many applications, but a more thoughtful choice is recommended
in practice. Note the prior depends on the time unit (e.g. days or
months).
The object priors
is supplied as the priors
argument to msmbayes
,
e.g. msmbayes(..., priors=priors,...)
The msmbayes()
function uses Stan to draw a sample from the joint
posterior distribution of the model parameters. By default, MCMC is
used, but faster approximations are available (see below).
This returns an object in the draws
format defined by
the posterior
R package. This format is understood by
various Bayesian R packages. For example, we can use the
bayesplot
package to check that the MCMC chains have
converged, using trace-plots of the main parameters (here the log
transition intensities, labelled logq
), and to examine the
posterior distributions. Trace plots should look flat and fuzzy, like a
sequence of independent draws from the same distribution, if the chains
have converged (as here).
library(bayesplot)
mcmc_trace(draws, pars=c("logq[1]","logq[2]"))
mcmc_dens(draws, pars=c("logq[1]","logq[2]"))
To summarise the basic parameter estimates (transition intensities
here) use the summary()
function. This also gives the
“Rhat” convergence diagnostic, which should be close to 1.0 if the MCMC
estimation has converged (see here for
more explanation.)
summary(draws)
## # A tibble: 2 × 5
## name from to value rhat
## <chr> <int> <int> <rvar[1d]> <dbl>
## 1 q 1 2 0.19 ± 0.24 1.01
## 2 q 2 1 4.23 ± 5.24 1.01
Transition intensities are hard to interpret in isolation. However we can obtain from them the mean sojourn times in each state. The mean sojourn times are estimated to be (within estimation error) close to the true values of 6 and 0.33 months, as expected.
mean_sojourn(draws)
## # A tibble: 2 × 2
## state value
## <int> <rvar[1d]>
## 1 1 6.45 ± 1.709
## 2 2 0.29 ± 0.076
See also qdf()
to get just the intensities, or
qmatrix()
for the same quantities in a matrix rather than
data frame format.
The value
column of data frames returned by these
functions is of type rvar
. This data type is from the
posterior
package. It is designed to contain random
variables, such as uncertain quantities in Bayesian models. Instead of a
single value, it stores a sample from the quantity’s distribution. Here
this is the posterior distribution. It is printed here as a posterior
mean \(\pm\) one standard
deviation.
The summary
function can be used on these data frames to
extract posterior summary statistics from the object in the
value
column. With no further arguments, the default
summary statistics from the posterior
package will be used.
Or the user can specify their own functions, as in the following call.
See the posterior
package documentation
for more examples.
summary(mean_sojourn(draws), mean, median, ~quantile(.x,c(0.025, 0.975)))
## state mean median 2.5% 97.5%
## 1 1 6.454005 6.6578060 1.96612006 9.1542458
## 2 2 0.292686 0.3040692 0.08669233 0.4129587
To demonstrate how to fit a Markov multi-state model with covariates, some data are simulated from a model with covariates.
The variable statec
in the infsim
data is
simulated from a continuous-time Markov model with intensities \(q_{rsi} = q_{rs}^{(0)} \exp(\beta_{rs}^{(male)}
male_i + \beta_{rs}^{(age)} age_i)\), where \(i\) is an individual, \(male_i\) is a binary indicator of whether
they are male, and \(age_i\) is their
age. The log hazard ratios for the effects of covariates on the 1-2
transition are \((\beta_{12}^{(male)},
\beta_{12}^{(age)}) = (2,1)\), and for the 2-1 transition, \((\beta_{21}^{(male)}, \beta_{21}^{(age)}) =
(0,-1)\).
\(q_{rs}^{(0)}\) is the intensity
for a “reference” or baseline individual, defined as a 50 year old
woman. The same values as above are used for
this. In the data infsim
, age is expressed in units of 10
years, and centered around age 50. Centering around a value of interest
helps with interpretation of the intercept term \(q_{rs}^{(0)}\). Centering and scaling
covariates to a roughly unit scale also tends to help with computation,
especially for MCMC.
To include covariates in a msmbayes
model, a
covariates
argument is included. The following model
supposes that sex and age are predictors of both the 1-2 and 2-1
transition intensities. Weakly informative priors are defined for the
covariate effects, through specifying upper 95% credible limits of 50
for the hazard ratios.
priors <- list(msmprior("hr(age10,1,2)",median=1,upper=50),
msmprior("hr(age10,2,1)",median=1,upper=50),
msmprior("hr(sexmale,1,2)",median=1,upper=50),
msmprior("hr(sexmale,2,1)",median=1,upper=50))
draws <- msmbayes(infsim, state="statec", time="months", subject="subject",
Q=Q, covariates = list(Q(1,2)~sex+age10, Q(2,1)~sex+age10),
fit_method="pathfinder",
priors = priors)
The syntax of the covariates
argument is a list of
formulae. Each formula has
a left-hand side that looks like Q(r,s)
indicating
which \(r \rightarrow s\) transition
intensity is the outcome,
a right-hand side giving the predictors as a linear model formula.
For example, Q(2,1) ~ sex + age
indicates that \(\log(q_{21})\) is an additive linear
function of age and sex.
Note: This syntax is slightly different from
msm
, where we would use a named list of formulae with empty “outcome” left-hand sides, e.g.covariates = list("1-2" = ~sex+age, "2-1" = ~ sex+age)
.
After fitting a msmbayes
model with covariates, the
summary
function on the model object returns the estimates
of the baseline intensities \(q_{rs}^{(0)}\) and the log hazard ratios
\(\beta_{rs}\), indicated here by
covariate name and transition.
summary(draws)
## # A tibble: 6 × 5
## name from to value rhat
## <chr> <dbl> <dbl> <rvar[1d]> <dbl>
## 1 q 1 2 0.21 ± 0.073 1.00
## 2 q 2 1 4.76 ± 1.630 0.999
## 3 sexmale 1 2 1.86 ± 0.408 1.00
## 4 age10 1 2 1.12 ± 0.322 1.00
## 5 sexmale 2 1 -0.42 ± 0.401 0.999
## 6 age10 2 1 -0.98 ± 0.395 1.00
The estimates of the log hazard ratios are close to the true values of 2, 1, 0, and -1 used to simulate the data, within estimation error.
The function hr
can be used to summarise the hazard
ratios \(\exp(\beta_{rs})\). Note the
posterior distributions are skewed, hence the posterior mean of the
hazard ratio is different from the \(\exp()\) of the posterior mean of the log
hazard ratio. The mode or median may be preferred as a point
estimate.
hr(draws)
## from to name value
## 1 1 2 sexmale 6.94 ± 2.76
## 2 1 2 age10 3.22 ± 0.98
## 3 2 1 sexmale 0.71 ± 0.27
## 4 2 1 age10 0.40 ± 0.15
priors
argument, as above, either by specifying the mean
and SD on the log hazard ratios, or quantiles of either the hazard
ratios or the log hazard ratios. Note the specific covariate should be
referred to as well as the transition. For example, to define priors for
the effect of age on the 1-2 and 2-1 transition rates respectively:priors <- list(msmprior("loghr(age,1,2)", mean=0, sd=2),
msmprior("hr(age,2,1)", median=1, upper=5))
For any not specified through priors
, a default normal
prior with mean 0 and SD 10 is used for the log hazard ratio. Be careful
of this prior choice. Bear in mind the likely size of covariate effects
when choosing the prior variance. Make sure the prior represents any
evidence about the effect that exists outside the data.
The fit_method
argument specifies the quoted name of
a function in cmdstanr
to be used to do the sampling. “sample”
is the default. Users can pass arguments to these functions in to
msmbayes
to control sampling, e.g. initial values, parallel
processing…
fit_method="laplace"
uses a Laplace
approximation which determines the posterior mode exactly, then uses
a normal approximation to the distribution around this mode. MCMC
reveals some skewed posterior distributions in this case, so the Laplace
approximation is not ideal. However, it is much faster than
MCMC.
fit_method="pathfinder"
implements a variational
approximation to the posterior. This is designed to be better than
Laplace, and better than the standard variational method. It runs much
faster than MCMC in this case, and is used in the example
above.
fit_method="variational"
runs quickly, similar to
the Laplace method, but seems to give less nuanced posterior
distributions than the Pathfinder method.
This example works with Stan’s default MCMC initial values, which
are sampled from the priors. Users could also supply the
init
argument to msmbayes
, which is passed to
Stan.
If the model fits the data badly, then the sampling may be bad with any method.
Sampling can also be bad when covariate values are on different scales. Centering and scaling is usually sensible, as mentioned.
Compare with the same model fitted in the msm
package by
maximum likelihood. Awkwardly, the optimiser used by msm
needs to be tuned (with the fnscale
option) to find the
global optimum.
library(msm)
msm(statec~months, subject=subject, data=infsim, qmatrix=Q,
covariates=list("1-2"=~sex+age10, "2-1"=~sex+age10),
control=list(fnscale=1000, maxit=10000))
##
## Call:
## msm(formula = statec ~ months, subject = subject, data = infsim, qmatrix = Q, covariates = list(`1-2` = ~sex + age10, `2-1` = ~sex + age10), control = list(fnscale = 1000, maxit = 10000))
##
## Maximum likelihood estimates
## Baselines are with covariates set to their means
##
## Transition intensities with hazard ratios for each covariate
## Baseline sexmale
## State 1 - State 1 -0.5114 (-1.2556,-0.2083)
## State 1 - State 2 0.5114 ( 0.2083, 1.2556) 4.5467 (1.5166,13.631)
## State 2 - State 1 3.7258 ( 1.5021, 9.2415) 0.4722 (0.1407, 1.585)
## State 2 - State 2 -3.7258 (-9.2415,-1.5021)
## age10
## State 1 - State 1
## State 1 - State 2 2.3576 (0.68633,8.099)
## State 2 - State 1 0.2687 (0.07144,1.011)
## State 2 - State 2
##
## -2 * log-likelihood: 2625.65
exp(c(2,1,0,-1))
## [1] 7.3890561 2.7182818 1.0000000 0.3678794
The estimates of the hazard ratios (note, not the log hazard ratios here) are within estimation error of the true values, but they differ from those of the Bayesian model, due to the influence of the prior distributions. This shows that the information from the data is not strong enough to overcome the influence of the prior, and suggests that priors should be thoughtfully chosen and transparently communicated if this were a real example.
One common problem with multi-state models for
intermittently-observed data is that the likelihood surface is awkwardly
shaped. When using msm
, this can lead to the optimiser
appearing to “converge”, but to a local optimum, or “saddle point”,
rather than the true global maximum of the likelihood. Symptoms of this
problem may include very large confidence intervals (which happens here
if fnscale
is not set), or a warning related to the Hessian
matrix being non-invertible. More seriously, there may be convergence to
the wrong optimum but with no obvious symptoms - this can only be
detected by running the optimiser many times with different initial
values.
An advantage of the Bayesian estimation methods used here is that after sampling, we can see the full shape of the posterior distribution. In this example, the posterior is skewed for all the model parameters, and there is evidence that it may be “multimodal”, i.e. has several peaks.
mcmc_dens(draws, pars=c("logq[1]","logq[2]","loghr[1]","loghr[2]","loghr[3]","loghr[4]"),
adjust=1)
See the vignette about advanced multi-state
models in msmbayes
.