Skip to content

maximum entropy PS #77

@BERENZ

Description

@BERENZ

Initial code from @bartosz-bogulas on the maximum entropy propensity score as described in Chapter 11 in Kim, J. K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC link.

#' Maximum Entropy Propensity Score Estimator
#'
#' @description
#' Fits a maximum entropy propensity score estimator for non-probability surveys
#' based on a density ratio model. The estimator is obtained by solving the
#' moment condition:
#' \deqn{\sum_{i=1}^{N} \delta_i\exp(\phi_0+\phi_1^\top x_i)(x_i-\hat{\bar{x}}_0)=0,}
#' where
#' \deqn{\hat{\bar{x}}_0=\frac{1}{\hat{N}_0}\Biggl(\sum_{i \in A} d_i\,x_i-\sum_{i=1}^{N}\delta_i\,x_i\Biggr), \quad \hat{N}_0=\sum_{i \in A} d_i-N_1,}
#' and \(N_1=\sum_{i=1}^{N} \delta_i\).
#'
#' The intercept \(\phi_0\) is determined by
#' \deqn{\phi_0=\log(N_1)-\log\Bigl\{\sum_{i=1}^{N}\delta_i\exp(\phi_1^\top x_i)\Bigr\}.}
#'
#' The final propensity score estimator for the population mean of \(y\) is given by
#' \deqn{\hat{\theta}_{PS2}=\frac{1}{N}\sum_{i=1}^{N}\delta_i\left\{1+\frac{\hat{N}_0}{N_1}\exp(\phi_0+\phi_1^\top x_i)\right\}y_i.}
#'
#' @param X Matrix of covariates.
#' @param delta Binary indicator vector; 1 indicates membership in the non-probability sample.
#' @param d Numeric vector of design weights for the probability sample.
#' @param y Numeric vector of outcome values.
#' @param start_phi1 Numeric vector of starting values for \(\phi_1\) (default is a zero vector).
#' @param maxit Maximum number of iterations for the solver (default 100).
#' @param tol Convergence tolerance for the solver (default 1e-6).
#' @param nleqslv_method Character; method for \code{nleqslv} optimization (default "Broyden").
#' @param nleqslv_global Character; global strategy for \code{nleqslv} (default "dbldog").
#' @param nleqslv_xscalm Character; scaling method for \code{nleqslv} (default "auto").
#'
#' @return A list with components:
#' \itemize{
#'   \item \code{phi}: A list with \code{phi0} and \code{phi1} (the estimated parameters).
#'   \item \code{theta_PS2}: The final propensity score estimator for the population mean of \(y\).
#'   \item \code{diagnostics}: The output from the \code{nleqslv} solver.
#' }
#'
#' @importFrom nleqslv nleqslv
#'
#' @noRd
est_method_maxent <- function(X, delta, d, y,
                              start_phi1 = rep(0, ncol(X)),
                              maxit = 100,
                              tol = 1e-6,
                              nleqslv_method = "Broyden",
                              nleqslv_global = "dbldog",
                              nleqslv_xscalm = "auto") {
  
  # Convert X to a matrix and determine the number of covariates
  X <- as.matrix(X)
  p <- ncol(X)
  
  # Compute Horvitz-Thompson totals
  N_total <- sum(d)         
  N1 <- sum(delta)          
  N0_hat <- sum(d) - N1     
  
  # Compute the target covariate mean for the probability sample
  x_bar0 <- (colSums(d * X) - colSums(delta * X)) / N0_hat
  
  # Define the moment function
  moment_fun <- function(phi1) {
    eta <- as.vector(X %*% phi1)
    phi0 <- log(N1) - log(sum(delta * exp(eta)))
    # Compute moment vector over covariates
    moments <- colSums(delta * exp(phi0 + eta) * (X - matrix(rep(x_bar0, each = nrow(X)), nrow = nrow(X))))
    moments
  }
  
  # Solve for theta_1 using nleqslv
  sol <- nleqslv::nleqslv(x = start_phi1, fn = moment_fun,
                          method = nleqslv_method,
                          global = nleqslv_global,
                          xscalm = nleqslv_xscalm,
                          control = list(maxit = maxit, ftol = tol))

  phi1_hat <- sol$x
  eta_hat <- as.vector(X %*% phi1_hat)
  phi0_hat <- log(N1) - log(sum(delta * exp(eta_hat)))
  
  # Compute the final propensity weight component
  ps_weight <- 1 + (N0_hat / N1) * exp(phi0_hat + eta_hat)
  
  # Compute the final propensity score estimator for the population mean of y
  theta_PS2 <- sum(delta * ps_weight * y) / N_total
  
  # Result
  result <- list(
    estimator = "ipw",                
    estimator_method = "maxent",      
    phi = list(phi0 = phi0_hat, phi1 = phi1_hat),
    theta_PS2 = theta_PS2,            
    diagnostics = sol,                
    output = data.frame(mean = theta_PS2, SE = NA), 
    confidence_interval = data.frame(lower_bound = NA, upper_bound = NA),
    svydesign = NULL,                 
    control = list(control_inference = list(vars_selection = "false", var_method = "analytic")),
    pop_size_fixed = FALSE,           
    y = list(y),                      
    call = match.call()               
  )
  
  # We set the class so that print.nonprob and print.nonprob_method will be used
  class(result) <- "nonprob"
  
  return(result)
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions