
###### Date: 2019-05-06 11:15

Midterm Project

Names:

Student IDs:

1 Problem Description

Read the article "Maximum Likelihood Algorithms for Generalized Linear Mixed Models" (McCulloch, 1997) and try to understand the basic concepts of the generalized linear mixed model (GLMM). Section 3.1 of that paper describes a Monte Carlo EM (MCEM) method for deriving maximum likelihood estimates (MLEs). Please use software of your choice (Matlab, R, C++, etc.) to perform the following simulation study and answer the related questions.

1.1 Model and Notations

In this project, we consider a clustering problem. Suppose we observe n subjects, each of which is a binary process, i.e. the response Y_ij = 0 or 1, i = 1, ..., n, j = 1, ..., T. Here n is the number of subjects and T is the length of observation. In general, T might vary across subjects, and time points may also differ; in this project, however, we simply assume that all subjects share a common time length and common time points. We also assume that these subjects belong to two clusters. For each cluster, the conditional expectation of the response variable is:

P_ij ≡ E(Y_ij | U_i = 1, X_{1,ij}, Z_{1,i}) = g1(β1 X_{1,ij} + Z_{1,i})

P_ij ≡ E(Y_ij | U_i = 2, X_{2,ij}, Z_{2,i}) = g1(β2 X_{2,ij} + Z_{2,i})    (1)

where U is the cluster membership, and X_{c,ij} and Z_{c,i} (c = 1, 2) are fixed and random effects, respectively. The link function g1(x) = exp(x) / (1 + exp(x)) is given. In a typical clustering problem, U is usually unknown, and hence we treat U as another random effect.
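The inverse-logit link g1 is easy to get wrong numerically for large |x|. As a minimal illustration (the assignment allows any software; Python and the function name `g1` are our choices here), a stable implementation evaluates the exponential only of a non-positive argument:

```python
import math

def g1(x):
    """Inverse-logit link g1(x) = exp(x) / (1 + exp(x)),
    written so exp() is only ever called on a non-positive
    argument, avoiding overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)
```

For example, g1(0) = 0.5, and g1(±1000) returns 1.0 / 0.0 without overflow, whereas the textbook form exp(x)/(1+exp(x)) would raise an overflow error at x = 1000.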

For the random effects, we assume that Z_{c,i} ~ N(0, σ_c²) and P(U = 1) = π1 (so π2 = 1 − π1). The parameters to be estimated are θ = {β1, β2, σ1, σ2, π1}. Treating the random effects as missing data, one can write the complete-data likelihood function as

L(θ) = ∏_{i=1}^{n} ∏_{c=1}^{2} [ π_c f_c(Z_{c,i}) ∏_{j=1}^{T} f_c(Y_ij | Z_{c,i}) ]^{ω_ic},

where f_c(Z_{c,i}) is the N(0, σ_c²) density function, f_c(Y_ij | Z_{c,i}) = P_ij^{Y_ij} (1 − P_ij)^{1 − Y_ij}, and ω_ic is the dummy variable of U_i, i.e.

ω_ic = 1 if subject i belongs to cluster c, and 0 otherwise.
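The complete-data log-likelihood is what the E-step later averages over Monte Carlo draws, so it is worth writing out as code. The following Python sketch evaluates it under the Bernoulli/Normal specification above; the array shapes, argument names, and the function name `complete_loglik` are all our own illustrative choices, not prescribed by the project:

```python
import numpy as np

def complete_loglik(Y, X, Z, omega, beta, sigma, pi):
    """Complete-data log-likelihood for the two-cluster Bernoulli GLMM.

    Y     : (n, T) binary responses
    X     : (2, n, T) fixed-effect covariates X_{c,ij}
    Z     : (2, n) random effects Z_{c,i}
    omega : (n, 2) cluster indicators omega_{ic} in {0, 1}
    beta, sigma, pi : length-2 sequences (beta_c, sigma_c, pi_c)
    """
    ll = 0.0
    for c in range(2):
        eta = beta[c] * X[c] + Z[c][:, None]          # linear predictor
        p = 1.0 / (1.0 + np.exp(-eta))                # P_ij = g1(eta)
        # Bernoulli terms: sum_j log f_c(Y_ij | Z_{c,i})
        bern = np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p), axis=1)
        # Normal density of Z_{c,i} ~ N(0, sigma_c^2), on the log scale
        norm = (-0.5 * np.log(2 * np.pi * sigma[c] ** 2)
                - Z[c] ** 2 / (2 * sigma[c] ** 2))
        ll += np.sum(omega[:, c] * (np.log(pi[c]) + bern + norm))
    return ll
```

Each subject contributes log π_c + Σ_j log f_c(Y_ij | Z_{c,i}) + log f_c(Z_{c,i}) for its cluster c only, since ω_ic zeroes out the other cluster's terms.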

1.2 Simulation Setup and Requirement

Generate 100 simulations. In each simulation, set n = 100 and T = 10. The true parameter values are: β1 = β2 = 1, π1 = 0.6, σ1 = 2, and σ2 = 10.
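One way to generate a single simulated data set under these true values is sketched below in Python (the RNG seed, array layout, and variable names are our own choices; any software permitted by the assignment would do):

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility
n, T = 100, 10
beta = (1.0, 1.0)
sigma = (2.0, 10.0)
pi1 = 0.6

# Fixed effects X_{c,ij} ~ N(0, 1); in the full study these are generated
# once and reused for all 100 simulations.
X = rng.standard_normal((2, n, T))

# Cluster membership U_i with P(U = 1) = pi1, P(U = 2) = 1 - pi1
U = rng.choice([1, 2], size=n, p=[pi1, 1 - pi1])

# Random effects Z_{c,i} ~ N(0, sigma_c^2), one column per cluster
Z = np.column_stack([rng.normal(0.0, s, size=n) for s in sigma])

# Binary responses Y_ij ~ Bernoulli(g1(beta_c X_{c,ij} + Z_{c,i}))
Y = np.empty((n, T), dtype=int)
for i in range(n):
    c = U[i] - 1
    p = 1.0 / (1.0 + np.exp(-(beta[c] * X[c, i] + Z[i, c])))
    Y[i] = rng.binomial(1, p)
```

Repeating this block (with fresh U, Z, Y but the same X) gives the 100 replicate data sets.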

Before you start, use N(0, 1) to generate the fixed effects X, and use them for all 100 simulations. Please follow the paper mentioned earlier and use MCEM to evaluate the log-likelihood function. In the E-step, perform K = 500 Gibbs sampling sweeps incorporating a Metropolis-Hastings step, and drop the first 100 draws as a burn-in procedure.
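To make the E-step concrete, the sketch below draws one subject's random effect Z from its conditional f(Z | Y, U = c) ∝ f_c(Y | Z) · N(Z; 0, σ_c²) with a random-walk Metropolis-Hastings chain, keeping K − 100 post-burn-in draws. This is a hedged illustration, not McCulloch's exact algorithm: the symmetric normal proposal, its step size, and all function names are our assumptions. In the full MCEM, U_i would also be sampled from its conditional within each Gibbs sweep.

```python
import numpy as np

def bern_loglik(y, x, z, beta):
    """Sum over j of log f_c(y_j | z), with p_j = g1(beta * x_j + z)."""
    eta = beta * x + z
    p = 1.0 / (1.0 + np.exp(-eta))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def mh_z_samples(y, x, beta, sigma, K=500, burn=100, step=1.0, rng=None):
    """Random-walk Metropolis-Hastings draws of one subject's random
    effect Z from f(Z | Y, U = c); returns the K - burn retained draws."""
    rng = np.random.default_rng() if rng is None else rng
    z = 0.0
    draws = []
    for k in range(K):
        z_new = z + step * rng.standard_normal()      # symmetric proposal
        # log acceptance ratio: likelihood terms + N(0, sigma^2) prior terms
        log_ratio = (bern_loglik(y, x, z_new, beta) - z_new**2 / (2 * sigma**2)
                     - bern_loglik(y, x, z, beta) + z**2 / (2 * sigma**2))
        if np.log(rng.uniform()) < log_ratio:         # MH accept/reject
            z = z_new
        if k >= burn:                                 # drop first `burn` draws
            draws.append(z)
    return np.array(draws)
```

The M-step would then maximize the Monte Carlo average of the complete-data log-likelihood over these retained draws.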


(1) Please write down the form of the Monte Carlo average of the log-likelihood (which you are going to evaluate).

(2) Please write down the details of the EM algorithm you use for this simulation, especially the Metropolis-Hastings steps.

(3) How can you accelerate your EM algorithm? What improvement, if any, do you observe?

(4) Try different numbers of simulations: 200, 300, ..., 1000, and plot the corresponding MSE.