GOMPERTZ REGRESSION MODEL WITH GAMMA FRAILTY : A STUDY ON THE APPLICATION IN LUNG CANCER

Survival models with frailty are used when some variables are nonavailable to explain the occurrence time of an event of interest. This non-availability may be considered as a random effect related to unobserved covariates, or that cannot be measured, such as environmental or genetic factors. This paper focuses on the Gamma-Gompertz (denoted by G-G) model that is one of a class of models that investigate the effects of unobservable heterogeneity. We assume that the baseline mortality rate in the G-G model is the Gompertz model, in which mortality increases exponentially with age and the frailty is a fixed property of the individual, and the distribution of frailty is a gamma distribution. The proposed methodology uses the Laplace transform to find the unconditional survival function in the individual frailty. Estimation is based on maximum likelihood methods and this distribution is compared with its particular case. A simulation study examines the bias, the mean squared errors and the coverage probabilities considering various samples sizes and censored data. A real example with lung cancer data illustrates the applicability of the methodology, where we compared the G-G and without frailty models via criteria which select the best fitted model to the data.

1 Introduction Vaupel et al. (1979) introduced the term frailty to indicate that different individuals are at risks even though on the surface they may appear to be quite 1 Universidade Federal de São Carlos -UFSCAR, Departamento de Estatística, CEP: 13.565-905, São Carlos, SP, Brasil.E-mail: vera@ufscar.br;tcristina.md@gmail.com 2 Universidade Federal de Goiás -UFG, Instituto de Matemática e Estatística, CEP: 74.690-900 Goiânia, GO, Brazil.E-mail: edinhomilani@hotmail.com similar with respect to measurable attributes such as age, gender, weight, etc.They used the term frailty to represent an unobservable random effect shared by subjects with similar (unmeasured) risks in the analysis of mortality rates.A random effect describes excess risk or frailty for distinct categories, such as individuals or families, over and above any measured covariates.Thus random effects, or frailty models, have been introduced into the statistical literature in an attempt to account for the existence of unmeasured attributes such as genotype that do introduce heterogeneity into a study population.A common approach to the analysis of survival data is to assume a homogeneous population of individuals with the same covariate structure.However, it is clear that individuals identical in many respects such as age, sex and treatment may differ in unmeasured ways, only because of genotypical differences.
It is easy to see that it is important to consider the effect of ignoring frailty in any study where the existence of such heterogeneity may be present.More formally, a heterogeneous population can be sometimes modeled as a mixture problem with an underlying random variable called frailty.This random effect or frailty is introduced in the baseline hazard rate (HR) additively or multiplicatively.Several authors have studied the use of multiplicative frailty models, which represent a generalization of the Cox model (COX, 1972).Andersen (1993) and Houggard (1995) presented a review of the multiplicative frailty models in the classical perspective, whereas Sinha and Dey (1997) presented a review of these models under the Bayesian point of view.Some authors have studied models with univariate frailty.For example, Aalen and Tretli (1999) applied the compound-Poisson distribution to data from testicular cancer, Henderson and Oman (1999) studied the consequence of ignoring the frailty in the fitting, Tomazella et al. (2008) presented an approach involving objective Bayesian reference analysis to the frailty model with survival time univariate, Hanagal and Sharma (2012) considered the shared gamma frailty model with Gompertz distribution as baseline hazard for bivariate survival times, Sharma and Hanagal (2014) proposed frailty regression models in Gompertz mixture distributions and assume the distribution of frailty as gamma or inverse Gaussian or positive stable or power variance function distribution and, Milani et al. (2015) proposed a frailty model with non-proportional hazard.
Suppose that T is the occurence time of a event of the subject (for example, the time to infection for a kidney patient using portable dialysis) and the x is a covariate; then, the probability density of T might be modeled conditional on v, an unobserved non-measurable random variable, called frailty, which is intended to allow for individual variation.This representation can be symbolized by f (t; x, v).Under this representation, the occurence-time distribution can be considered to be continuous mixture induced by the frailty v.
The frailty term of the model is random and a distribution must be assumed for it.Due to the way as the frailty term acts on the HR, natural candidates to the frailty distribution must be supposed as continuous and time independent, such as gamma, inverse Gaussian, log-normal and Weibull distributions (HOUGGARD, 1995).In this paper, we focus on gamma distribution and use the G-G frailty model.The frailty in the frailty model is assumed to follow a gamma distribution and baseline HR is the Gompertz model.
The Gompertz distribution is widely used to fit data from clinical trials and construct life tables in actuarial area.Various authors have studied the parameters estimation of this model.Among others, Garg et al. (1970) obtained the maximum likelihood estimators and properties for the parameters and Lenart (2012) presented a revision about Gompertz distribution.The G-G model is one of a class of models that investigate the effect of unobserved heterogeneity that is either unobservable or unobserved on mortality rates.
Our goal is analyze a dataset that represents a patient's lifetime with lung cancer when we adopt the G-G regression model in a possible cure rate scenario.In this methodology, we use the Laplace transform of the frailty density to obtain the population (or unconditional) survival function.The use of this transformation makes it easier to obtain survival function.
This paper is organized as follows.In Section 2 we present review of frailty model, in Section 3 we present Gompertz regression model and in Section 4 we present Gompertz regression model with gamma frailty and the estimation about the parameters of this model.In section 5 we present a simulation study of the proposed model considering right-censored and no censored data to some samples sizes and in Section 6 we present an application on a real lung cancer dataset.Finally, in Section 7 we make some concluding remarks.

Frailty models
Consider an unobserved source of heterogeneity, which is not readily captured by a covariate on a univariate frailty model.It extends the Cox model, such that the hazard function (HF) of a patient depends on an unobservable value of the random variable V , which acts multiplicatively on the baseline HF.Therefore, the conditional HR of V random variable to pacient i available at time t, given V = v i , is given by where v i is the frailty of the patient i and h 0 (t) is a baseline HF.Note that (1) is known as the Clayton model (CLAYTON, 1991).From (1), note that the HR of the patient i decreases if v i < 1 and increases if v i > 1.The corresponding conditional survival function (SF) can be obtained from (1) as where H 0 (t) = t 0 h 0 (s)ds is the baseline cumulative hazard function.Let (t, δ) be the observed data for a sample of size n, where t i is the occurrencetime of the event of interest and δ i is the indicator of censoring, that is, δ i = 1 if the observation is the time to the event of interest and δ i = 0 if it is right censored, for i = 1, . . ., n.Then, from (1) and (2), the corresponding likelihood function is where µ is the vector of parameters, t = (t 1 , . . ., t n ), v = (v 1 , . . ., v n ) and δ = (δ 1 , . . ., δ n ).Now, conditional on the unobserved frailties v, the likelihood function given in (3) forms the basis for the parameters estimation.The frailties must be integrated out (in closed form or by numerical or stochastic integration, depending on the frailty distribution) to get a likelihood function not depending on unobserved quantities of the type (4)

Unconditional hazard and survival functions
The unconditional (population) SF of T can be obtained by integrating S T |V =vi (t) given in (2) on the frailty v.This function may be viewed as the SF of patients randomly drawn from the population under study (see KLEIN and MOESCHBERGER, 2003;AALEN et al., 2008 andWIENKE, 2011).Unconditional HF and SF can be obtained with the Laplace transform (HOUGAARD, 1984).Then, when seeking distributions for the frailty variable V , it is natural to use frailty distributions with an explicit Laplace transform, because it facilitates the use of traditional maximum likelihood method for parameter estimation.To obtain the unconditional SF, we need to integrate out the frailty component as where f V (v) is the probability density function (PDF) of the V and S T |V (t) is the conditional SF given in (2).
In general, the Laplace transform of real argument s of a function f (x) is given by be the frailty PDF and s = H 0 (t).Then, according to (6), we obtained the Laplace transform of the unconditional SF as Note that (7) conducts to the same form as the unconditional SF given in (5) (see VAUPEL et al., 1979 andWIENKE, 2011).The frailties random variables v i are usually assumed to be independent and identically distributed.As mentioned, the frailty distribution can be gamma, inverse Gaussian or Weibull, which have simple Laplace transforms and then are convenient to use.In this paper we considered a reparametrized version of the gamma distribution traditionally used in frailty models.

Gompertz regression model
The HF of the Gompertz model, in which mortality increases exponentially with age t, is given by h(t|b, α) = be αt , where b denotes the level of the force of mortality at age t = 0 and α the rate of aging.We considered a continuous random variable T with a Gompertz PDF with location parameter b and shape parameter α, The truncated distribution yields a proper density function by rescaling the α parameter to correspond to t = 0 (GARG et al., 1970 andLENART, 2012).The distribution function is Considering b = exp(x β) in ( 8) we have the Gompertz regression model or time-dependent proportional hazard model.This model is defined by the HF given by h(t|α, where α is a measure of the time effect, β =(β 0 , β 1 , . . ., β k ) is a vector of k + 1 unknown parameters measuring the influence of the k covariates x = (1, x 1 , . . ., x k ) and, t represents the univariate survival time of a unit or individual.
The behavior of the HF (9) takes several forms, according to the value of α: for α > 0, the hazard function is increasing; for α < 0, the hazard function is decreasing and for α = 0, the hazard function is constant.The Figure 1 (left) shows some examples of possible shapes of the hazard function.
When the survival times of the n individuals are observed, the ratio of the hazard function of two individuals, i and j, with i = j and i, j = 1, . . ., n, with different covariates vector is given by Note that the time effect disappears in equation ( 10) and hence the proportionality becomes evident.
From equation ( 9) the SF is given by Observe that the function in ( 11) also has its behavior determined by the value of α.For α > 0, S(0|α, β) = 1 and S(∞|α, β) = lim t−→∞ S(t|α, β) = 0, in other words, the survival function is proper.For α < 0, S(0|α, β) = 1 and S(∞|α, β) = 0, the survival function is improper, that is, when α < 0 we have a model for cure rate or long duration, with the cure rate, p, given by Some examples of hazard functions (left) and survival functions are shown in Figure 1 (right).Using the equations ( 9) and ( 11), the PDF of T is given by when the survival function is proper.Let (t i , x i , δ i ) be n the observed times, δ i is an indicator variable and x i is a covariates vector, i = 1, . . ., n.The likelihood function for right-censored data is given by The maximum likelihood estimators (MLEs) are obtained by direct maximization of equation ( 12) or through the log-likelihood function.The asymptotic confidence intervals are obtained assuming asymptotic normality of the MLEs.

Gompertz regression model with gamma frailty
Considering the frailty model given in (1) and the equation ( 9), the HF of the ith individual with the multiplicative frailty term v i (v i > 0) is given by, interpreted as the conditional hazard function of the ith individual given v i and the respective conditional SF is given by In models with multiplicative frailty, we are considering that different individuals have different frailties.Then, the individuals who present the highest values of variable v i tend to die earlier than the individuals who present the lowest values of the same variable.
The frailty model not only explains the heterogeneity among individuals, but it also allows to assess the covariates effect that for some reason were not considered at the fitting.
The value v is not observed, which is why we assume that v is an observation of the random variable V with a given probability density function.In the literature the gamma, lognormal, Weibull and inverse Gaussian distributions are the most used (HOUGAARD, 1995).In this paper we considered that V has a gamma distribution with parameters τ > 0 and η > 0, G(τ, η), with density function written as Considering univariate times, if we built the likelihood function using the hazard and survival functions given in ( 13) and ( 14), respectively, we would have more parameters than observations, so we need to calculate the hazard and unconditional survival functions.In the context of proportional hazard, according to Elbers and Ridder (1982), when working with frailty it is necessary that the random effect distribution has finite mean for the model to be identifiable.This way, in order to keep the identifiability of the model it is convenient to take the distribution with mean 1.
To get the unconditional SF we need to calculate where f (v) is the PDF in (15).In order to calculate the unconditional SF we use the Laplace transform since both have the same shape.The Laplace transform of the gamma distribution (15) with parameters (1/θ, 1/θ) (WIENKE, 2011) and considering s a real argument, is given by Substituting s = H(t) in the equation ( 16), we obtain the unconditional SF, given by The behavior of the SF is determined by the value α.For α > 0 the survival function is proper and for α < 0 it is improper, then we have a long duration model with the cure rate, p, given by From equation ( 17) we get the correspondent hazard function, given by h(t|α, β, θ) = e αt+x β 1 + θ α e x β (e αt − 1) .
In this case we have hazard function of the G-G model that is an attractive explanation for the widely observed pattern of decelerating increase in mortality with age, in both humans and other species (e.g., VAUPEL et al., 1979).Some examples of hazard (18) and survival (17) functions are shown in Figure 2.

Inference
Let (t i , x i , δ i ) be n the observed times, δ i is an indicator variable and x i is a covariates vector, i = 1, . . ., n.The likelihood function for right-censored data, constructed from the equations ( 17) and ( 18), is given by where t = (t 1 , . . ., t n ), x = (x 1 , . . ., x n ) and δ = (δ 1 , . . ., δ n ) is arandom variable censoring indicator.The log-likelihood function (from ( 19)) is given by log The MLEs are obtained by direct maximization of equation ( 19) or by maximization of (20).In the case of uncensored times, the log of the likelihood function and the first and second derivatives of the α, β and θ parameters are shown in Appendix.
The asymptotic confidence intervals are obtained assuming asymptotic normality of the maximum likelihood estimators.The comparison between the Gompertz regression model and Gompertz regression model with gamma frailty is made by considering the Akaike information criterion (AIC) (AKAIKE, 1974) and the Bayesian information criterion (BIC) (SCHWARZ, 1978).The lowest AIC and BIC indicate the best fitted model to data.

Simulation study
In this work, the main concern of the simulation study is to assess the mean absolute bias (MAB) and mean squared error (MSE) of the MLEs, as well as the coverage probabilities of the asymptotic confidence intervals for the parameters of the frailty model.We generated 5,000 samples for each sample size (n = 50, 100, 200 and 300).The parameters were fixed at α = 0.1, β 0 = 1, β 1 = −1.8 and θ = 0.5 and the dummy covariate was generated from a Bernoulli distribution with success probability equal to 0.6.The choice of the parameters was made considering the form of the risk function and for the α > 0 the survival function is proper.The right-censored times were generated from an exponential distribution with mean equals to 16.94 for 10% of censoring and 3.70 for 30% of censoring.The software R (R Core Team, 2017) was used in the analysis.
For each sample we obtained the maximum likelihhod estimatives and the asymptotic 95% confidence intervals.Using these values, we calculated MAB, MSE and the coverage probability (CP).For example, for the parameter α there is where α * is the true value of the parameter α and α j is the maximum likelihood estimate of α in the jth sample, and CP is given by the quotient between the number of intervals containing the true parameter value and the total number of intervals constructed.The MAB and MSE are shown in the Table 1 and the CP in Figure 3.We observed that both MAB and MSE metrics decrease with increasing sample size, for the three scenarios of censoring.We also noted that when the amount of censoring increases, the value of the metrics also increases.Comparing the values of the metrics of the parameters β 0 and β 1 with the values of α and θ, we observed that, the relative increase of the parameters that measure the effect of covariates is almost always smaller than the other parameters, when increasing the amount of censoring present in the sample.
Regardless of the amount of censoring in the sample, the coverage probabilities of parameters seem to converge to the nominal level.We observed that the convergence of the coverage probability of the parameter θ seems to be slower than the coverage probability of other parameters.

Lung cancer data
To illustrate the applicability of the proposed model, we adopted the dataset of the annual incidence of lung cancer in Northern Ireland, between 01/01/1991 to 09/30/1992 (WILKINSON, 1995).In this period, 900 cases of lung cancer were identified, however, we excluded all individuals with missing information on some covariates from the analysis, resulting in 751 patients to have their lifetime analyzed (in months).
We considered for the fitting only the categorical covariates the sodium level (X 1 ) with the categories < 136mmol/l and >=136mmol/l and, albumen level (X 2 ), with categories <35g/l and >=35g/l.We verified the assumption of proportional hazards of these covariates using the graphical method presented in Colosimo and Giolo (2006).The result of the method is shown in Figure 4 and it is possible to note that both covariates show proportional hazard.We fitted the Gompertz regression and G-G models for the dataset, using the possible combinations of the covariates (X 1 only, X 2 only, and, X 1 with X 2 , Tables 4, 5 and 3).We adopted the criteria AIC and BIC to select which model best fitted the data.The results are shown in Table 2.We observed that the model that best fits the data in both criteria is the Gompertz regression model with gamma frailty with the covariates X 1 and X 2 because the values of these criteria are the lowest in these models.Comparing only the scenarios with the same covariates adopted, the model with frailty (G-G) is preferred in all the settings.The maximum likelihood estimates and asymptotic 95% confidence intervals of the Gompertz and G-G models, with covariates X 1 and X 2 are presented in Table 3.We observed changes in the maximum likelihood estimates of the parameters that measure the effect of time and of the covariates, for the models with and without frailty.We also noted, that all parameters are significant, including the parameter θ, which measures the heterogeneity of the individuals.We observed that in the fitted frailty model, there is a reduction of heterogeneity present in the data.For example, in the various scenarios (Table 2) when we fitted the frailty model with X 1 only covariate (Table 4), we got θ = 1.08, when we fitted the model without frailty with X 2 only covariate, we got θ = 0.90 (Table 5), and when we used the two covariates with X 1 and X 2 in the frailty model, we got θ = 0.89 (Table 3).
In Figure 5 we presented the survival functions estimated by Kaplan-Meier (KME) (KAPLAN and MEIER, 1958) and the frailty model.We observed through the graphics that both curves are close, indicating that the G-G model showed a good fit for the data.

Final comments
In this paper we have studied a model where the gamma distribution is employed, in the Gompertz regression model, to describe the unobserved heterogeneity.We have explicitly derived the unconditional hazard rate and the survival functions using the Laplace transformation.To study this model, we presented a simulation study and a real example on lung cancer, which is compared to the modeling without frailty via selection criteria to determine which model best fits the data.More specifically, in the simulation study we considered the presence of frailties, as well as different percentage of censured data (0%, 10% and 30%) and samples sizes (n = 50, 100, 200 and 300).The metrics used to compare the adjusted values with real values are MAB and MSE, and we observed that when the censoring percentage is fixed and the sample size is increased, measures decrease.In addition, when n increased we observed that the estimates for parameters were very close to the real values.This fact occurred in all the studied.We noted that for the parameters α, β 1 and β 0 , the CP is very close to the nominal for n greater than or equal to 200.However, for the parameter θ, the CP has only achieved this for n = 300, in all studied censoring levels.In the model with fragility, the simulation study showed good properties of MLEs, giving which grants us confidence in stating that the estimation of the effect of time and covariates is important and make it possible to explain the data more accurately.
In case a lung cancer data, the use of the Gompertz regression model with gamma fragility, explained the heterogeneity present in the data when there are risks factors not observed.Using the criteria AIC and BIC, there is evidence in favour of this model, when compared to model without frailty (Figure 5).That is, the G-G model with sodium level (X 1 ) and albumen level (X 2 ) covariates fits better the dataset.Also, the use of important covariates on modeling causes decreases in heterogeneity, which is showed by parameter θ.It is important to note that in both cases, simulation study and application, the G-G model best describes the behavior of data and captures the fragility.Also, we focus on the α > 0 case, that is, the survival function is proper.Future work may be performed for the case where the survival function is improper, that is, in long-term models.

Appendix
In the case of uncensored lifetimes, the log of the likelihood function is given by, log(L(α, β, θ|t, x) = − x i .
In the same way, from (21) we wrote the second derivatives to the θ, α and β parameters are given respectively by,

Figure 1 -
Figure 1 -Different forms of the hazard function (left) and survival function (right)to the Gompertz regression model.

Figure 2 -
Figure 2 -Different forms of the hazard function (left) and survival function (right)for the G-G model.

Figure 4 -
Figure 4 -Proportional hazard assumption for the sodium level (left) and albumen level (right) covariates.

Table 2 -
Results of the criteria for the fitted models

Table 3 -
The results of the fit of the without and with frailties models

Table 4 -
The results of the fit of the without and with frailties models with X 1

Table 5 -
The results of the fit of the without and with frailties models with X 2