 #### 聯系方式 #### 您當前位置：首頁 >> OS作業OS作業

###### 日期：2018-09-24 01:46

Lab Assignment #1 - Estimation

Due: Monday, 9/24 at the beginning of class.

Disclaimer: Labs typically take more time to complete than a regular homework assignment.

In this lab you will:

? Explore bias & variance;

? Numerically calculate MLE’s.

Format:

The lab has two sections which will walk you through each topic and example code (there is a .R file

containing all of code provided in this lab). At the end of each section, you will find a box with a lab

question. This is to be completed for your homework due Monday 9/24. You must include all of your code

and relevant graphs.

Remember: to get the help file for any R function, just type ?functionname in the console.

Section #1: Bias & Variance

Part I

Now that we are all experts at simulating data and writing for loops, we will apply this knowledge to some

dart throwing.

Recall from lecture:

?

?θ is a random variable.

It is a function of X1, . . . , Xn and will therefore produce a different estimate from each different sample.

? As a RV, ?θ has a sampling distribution.

This is the distribution of values of ?θ produced from all possible samples of the same size from the

population.

? We would like ?θ to be unbiased.

The sampling distribution of ?θ is centered at θ.

? We would like ?θ to have small variance.

The sampling distribution of ?θ is concentrated as much as possible around θ.

We are going to simulate data to provide estimates of θ that have varying properties (low bias/low variance,

high bias/high variance, low bias/high variance, high bias/low variance).

To begin, we must first create the target. Run the following code:

t = seq(0, 2*pi, length = 1000)

# Circle centered at (0,0) with radius 2

coords = t(rbind(sin(t)*2, cos(t)*2))

plot(coords, type = ’l’, xlab = "", ylab = "", xaxt = ’n’, yaxt = ’n’,

A. Flynt, Department of Mathematics, Bucknell University, MATH 304 1

xlim = c(-3, 3), ylim = c(-3, 3))

# Circle centered at (0,0) with radius 1.75

coords.out = t(rbind(sin(t)*1.75, cos(t)*1.75))

lines(coords.out, type = ’l’)

# Circle centered at (0,0) with radius 1

c.1 = t(rbind(sin(t)*1, cos(t)*1))

lines(c.1, type = ’l’)

# Circle centered at (0,0) with radius .75

c.2 = t(rbind(sin(t)*.75, cos(t)*.75))

lines(c.2, type = ’l’)

# Circle centered at (0,0) with radius .25

coords.in = t(rbind(sin(t)*.25, cos(t)*.25))

lines(coords.in, type = ’l’)

# Circle centered at (0,0) with radius .1

coords.in.in = t(rbind(sin(t)*.1, cos(t)*.1))

# Color in center circle green

polygon(coords.in.in, col = ’green’)

Notice that the outer circle of the target has a radius of 2. When simulating data, we are going to make

draws from two separate Uniform distributions. One that represents our x-coordinate and one that

represents our y-coordinate. If we want to produce estimates centered on the bullseye (low bias) that could

occur at the far edges of the target (high variance), then we will simulate data from two Uniform

distributions with parameters -2 and 2 as follows:

# Initiate vectors for (x, y) coordinates

x.ran = y.ran = NULL

for(i in 1:10){

# Randomly draw a dart from a Uniform(-2, 2).

# Here, left of the bullseye is negative and right of the bullseye is positive.

x.ran[i] = runif(1, -2, 2)

# Here, above the bullseye is positive and below the bullseye is negative.

y.ran[i] = runif(1, -2, 2)

# Draw the dart at the (x,y) coordinates from the previous draw

points(x.ran, y.ran, pch="X", cex=2, col = ’black’)

# Wait 2 seconds between each throw

Sys.sleep(2)

}

A. Flynt, Department of Mathematics, Bucknell University, MATH 304 2

You now have 10 black X’s on your target that represent observations with low bias and high variance.

We can calculate the bias and variance as follows:

# Bias of throws. Just the sum of the average in each (x,y) direction

Bias.black = mean(x.ran) + mean(y.ran)

# Variance of throws. Just the sum of the variance of each (x,y) direction

Variance.black = var(x.ran) + var(y.ran)

Lab Question #1

You will now generate darts for the scenarios of low variance/low bias (color red), low variance/high bias

(color blue), high variance/high bias (color orange). For each set of darts, calculate bias and variance

# Legend for bias of each scenario

legend("topleft", title = "Bias", legend=round(c(Bias.black, Bias.red,

Bias.blue, Bias.orange), 2), col = c("black", "red", "blue", "orange"), pch="X")

# Legend for variance of each scenario

legend("topright", title = "Variance", legend = round(c(Variance.black,

Variance.red, Variance.blue, Variance.orange), 2), col = c("black", "red",

"blue", "orange"), pch="X")

Part II

We are now going to run some simulations based on the example that we used in the “Properties of Point

Estimators” packet from lecture.

Let X1, X2, . . . , Xn

iid～ Uniform(0, θ). Recall, that ?θMOM = 2Xˉ and ?θMLE? =

n+1

n X(n) are two unbiased

estimators for θ.

We are going to run a simulation to see how these two estimators perform.

The idea is to draw random samples of size 25 from a Uniform(0, 12). For each sample, compute 2ˉx and

26/25 × max{x1, x2, . . . , xn} and record those values. Repeat this 1000 times.

Lab Question #2

(a) Write and run the code for the simulation described above.

(b) Create two histograms of the 1000 replications, one for each estimate.

Make sure that your histograms have the same x and y axes (this can be done with the arguments

xlim = c(a, b) and ylim = c(d, e), where you have to decide on a, b, d, e).

(c) Which estimator exhibits the smallest amount of variability? Is that what you expected? Why or

why not?

A. Flynt, Department of Mathematics, Bucknell University, MATH 304 3

Section #2: Calculating MLE’s

In this final section, we will calculate maximum likelihood estimators for particular distributions using R.

As always, the first thing to do is write down the log-likelihood function. We have not spent time on creating

functions yet, so I will just show you the syntax for how this works:

# You will first give your function a name.

# Then specify the necessary parameters (pars) and data (object) for your function.

# You will open the function ({)

# Then you will specify the form of your log-likelihood (logl)

# You need to tell the function what it should return when you use it

# Note: the function that we will use to optimize our likelihood only minimizes

# functions, therefore we will return the negative log-likelihood

# Finally, close your function (})

name = function(pars, object){

logl = loglikelihood function

return(-logl)

}

Example

Let X1, X2, . . . , Xn

iid～ Poisson(λ). Then, the log-likelihood function is given by:

`(λ) = Xni=1xiln(λ) ? nλ ?Xni=1ln(xi!).

Since the last term does not include the parameter λ, it can be safely ignored for optimization. Thus, the

kernel of the log-likelihood function is:

`(λ) ∝Xni=1xiln(λ) ? nλ.

We can program this function using the following syntax:

poisson.lik = function(lambda, x){

logl = sum(x)*log(lambda) - length(x)*lambda

return(-logl)

}

Once the log-likelihood function has been declared, then the nlm command can be invoked. The minimum

specification of this command is nlm(log-likelihood, starting values, data). Here, starting values

is your (vector of) starting value(s) for the optimization of your parameter.

We will now simulate a data set of 1000 observations from a Poisson(λ = 3.25) and numerically find the

maximum likelihood estimate.

# Set the seed so that we all generate the same data set.

set.seed(288)

poiss.data = rpois(1000, 3.25)

A. Flynt, Department of Mathematics, Bucknell University, MATH 304 4

# Note that we are using 1 as the starting value.

out = nlm(poisson.lik, 1, x = poiss.data)

pois.mle = out\$estimate

pois.mle

#  3.242

How did we do? Not too bad considering the true value of λ was 3.25. Is this the value that you were

expecting? Why or why not?

Lab Question #3

Recall that on HW #9, I asked you to write down (but not solve) the equations that you would need to

maximize to find the MLE’s for (α, β) from a Gamma distribution. These partial derivatives are not

something that we would want to do by hand (taking the derivative of the gamma function is not too

fun), so instead, we will find the MLE’s numerically.

(a) Write the code for gamma.lik, the log-likelihood function of a Gamma(α, β). Note, that this

function will now have a vector for the pars argument.

(b) Read in “gamma data.txt” from Moodle and optimize this data with starting values of (1, 1) for

(α, β).

Note: if you use read.table(), R will read in the data as a data frame and your likelihood

function will not be able to handle an object of that class. To avoid this issue, you will have to pull

off the x column from the data and pass that into your likelihood function.

(c) What are your MLE’s for (α, β)?

A. Flynt, Department of Mathematics, Bucknell University, MATH 304 5

25选5一等奖多少钱