maximum entropy PS

Initial code from @bartosz-bogulas on the maximum entropy propensity score as described in Chapter 11 in Kim, J. K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC [link](https://www.routledge.com/Statistical-Methods-for-Handling-Incomplete-Data/Kim-Shao/p/book/9780367280543).

```r
#' Maximum Entropy Propensity Score Estimator
#'
#' @description
#' Fits a maximum entropy propensity score estimator for non-probability surveys
#' based on a density ratio model. The estimator is obtained by solving the
#' moment condition:
#' \deqn{\sum_{i=1}^{N} \delta_i\exp(\phi_0+\phi_1^\top x_i)(x_i-\hat{\bar{x}}_0)=0,}
#' where
#' \deqn{\hat{\bar{x}}_0=\frac{1}{\hat{N}_0}\Biggl(\sum_{i \in A} d_i\,x_i-\sum_{i=1}^{N}\delta_i\,x_i\Biggr), \quad \hat{N}_0=\sum_{i \in A} d_i-N_1,}
#' and $N_1=\sum_{i=1}^{N} \delta_i$.
#'
#' The intercept $\phi_0$ is determined by
#' \deqn{\phi_0=\log(N_1)-\log\Bigl\{\sum_{i=1}^{N}\delta_i\exp(\phi_1^\top x_i)\Bigr\}.}
#'
#' The final propensity score estimator for the population mean of $y$ is given by
#' \deqn{\hat{\theta}_{PS2}=\frac{1}{N}\sum_{i=1}^{N}\delta_i\left\{1+\frac{\hat{N}_0}{N_1}\exp(\phi_0+\phi_1^\top x_i)\right\}y_i.}
#'
#' @param X Matrix of covariates.
#' @param delta Binary indicator vector; 1 indicates membership in the non-probability sample.
#' @param d Numeric vector of design weights for the probability sample.
#' @param y Numeric vector of outcome values.
#' @param start_phi1 Numeric vector of starting values for $\phi_1$ (default is a zero vector).
#' @param maxit Maximum number of iterations for the solver (default 100).
#' @param tol Convergence tolerance for the solver (default 1e-6).
#' @param nleqslv_method Character; method for \code{nleqslv} optimization (default "Broyden").
#' @param nleqslv_global Character; global strategy for \code{nleqslv} (default "dbldog").
#' @param nleqslv_xscalm Character; scaling method for \code{nleqslv} (default "auto").
#'
#' @return A list with components:
#' \itemize{
#'   \item \code{phi}: A list with \code{phi0} and \code{phi1} (the estimated parameters).
#'   \item \code{theta_PS2}: The final propensity score estimator for the population mean of $y$.
#'   \item \code{diagnostics}: The output from the \code{nleqslv} solver.
#' }
#'
#' @importFrom nleqslv nleqslv
#'
#' @noRd
est_method_maxent <- function(X, delta, d, y,
                              start_phi1 = rep(0, ncol(X)),
                              maxit = 100,
                              tol = 1e-6,
                              nleqslv_method = "Broyden",
                              nleqslv_global = "dbldog",
                              nleqslv_xscalm = "auto") {
  
  # Convert X to a matrix and determine the number of covariates
  X <- as.matrix(X)
  p <- ncol(X)
  
  # Compute Horvitz-Thompson totals
  N_total <- sum(d)         
  N1 <- sum(delta)          
  N0_hat <- sum(d) - N1     
  
  # Compute the target covariate mean for the probability sample
  x_bar0 <- (colSums(d * X) - colSums(delta * X)) / N0_hat
  
  # Define the moment function
  moment_fun <- function(phi1) {
    eta <- as.vector(X %*% phi1)
    phi0 <- log(N1) - log(sum(delta * exp(eta)))
    # Compute moment vector over covariates
    moments <- colSums(delta * exp(phi0 + eta) * (X - matrix(rep(x_bar0, each = nrow(X)), nrow = nrow(X))))
    moments
  }
  
  # Solve for theta_1 using nleqslv
  sol <- nleqslv::nleqslv(x = start_phi1, fn = moment_fun,
                          method = nleqslv_method,
                          global = nleqslv_global,
                          xscalm = nleqslv_xscalm,
                          control = list(maxit = maxit, ftol = tol))

  phi1_hat <- sol$x
  eta_hat <- as.vector(X %*% phi1_hat)
  phi0_hat <- log(N1) - log(sum(delta * exp(eta_hat)))
  
  # Compute the final propensity weight component
  ps_weight <- 1 + (N0_hat / N1) * exp(phi0_hat + eta_hat)
  
  # Compute the final propensity score estimator for the population mean of y
  theta_PS2 <- sum(delta * ps_weight * y) / N_total
  
  # Result
  result <- list(
    estimator = "ipw",                
    estimator_method = "maxent",      
    phi = list(phi0 = phi0_hat, phi1 = phi1_hat),
    theta_PS2 = theta_PS2,            
    diagnostics = sol,                
    output = data.frame(mean = theta_PS2, SE = NA), 
    confidence_interval = data.frame(lower_bound = NA, upper_bound = NA),
    svydesign = NULL,                 
    control = list(control_inference = list(vars_selection = "false", var_method = "analytic")),
    pop_size_fixed = FALSE,           
    y = list(y),                      
    call = match.call()               
  )
  
  # We set the class so that print.nonprob and print.nonprob_method will be used
  class(result) <- "nonprob"
  
  return(result)
}
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

maximum entropy PS #77

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

maximum entropy PS #77

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions