Initial code from @bartosz-bogulas on the maximum entropy propensity score as described in Chapter 11 in Kim, J. K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC link.
#' Maximum Entropy Propensity Score Estimator
#'
#' @description
#' Fits a maximum entropy propensity score estimator for non-probability surveys
#' based on a density ratio model. The estimator is obtained by solving the
#' moment condition:
#' \deqn{\sum_{i=1}^{N} \delta_i\exp(\phi_0+\phi_1^\top x_i)(x_i-\hat{\bar{x}}_0)=0,}
#' where
#' \deqn{\hat{\bar{x}}_0=\frac{1}{\hat{N}_0}\Biggl(\sum_{i \in A} d_i\,x_i-\sum_{i=1}^{N}\delta_i\,x_i\Biggr), \quad \hat{N}_0=\sum_{i \in A} d_i-N_1,}
#' and \(N_1=\sum_{i=1}^{N} \delta_i\).
#'
#' The intercept \(\phi_0\) is determined by
#' \deqn{\phi_0=\log(N_1)-\log\Bigl\{\sum_{i=1}^{N}\delta_i\exp(\phi_1^\top x_i)\Bigr\}.}
#'
#' The final propensity score estimator for the population mean of \(y\) is given by
#' \deqn{\hat{\theta}_{PS2}=\frac{1}{N}\sum_{i=1}^{N}\delta_i\left\{1+\frac{\hat{N}_0}{N_1}\exp(\phi_0+\phi_1^\top x_i)\right\}y_i.}
#'
#' @param X Matrix of covariates.
#' @param delta Binary indicator vector; 1 indicates membership in the non-probability sample.
#' @param d Numeric vector of design weights for the probability sample.
#' @param y Numeric vector of outcome values.
#' @param start_phi1 Numeric vector of starting values for \(\phi_1\) (default is a zero vector).
#' @param maxit Maximum number of iterations for the solver (default 100).
#' @param tol Convergence tolerance for the solver (default 1e-6).
#' @param nleqslv_method Character; method for \code{nleqslv} optimization (default "Broyden").
#' @param nleqslv_global Character; global strategy for \code{nleqslv} (default "dbldog").
#' @param nleqslv_xscalm Character; scaling method for \code{nleqslv} (default "auto").
#'
#' @return A list with components:
#' \itemize{
#' \item \code{phi}: A list with \code{phi0} and \code{phi1} (the estimated parameters).
#' \item \code{theta_PS2}: The final propensity score estimator for the population mean of \(y\).
#' \item \code{diagnostics}: The output from the \code{nleqslv} solver.
#' }
#'
#' @importFrom nleqslv nleqslv
#'
#' @noRd
est_method_maxent <- function(X, delta, d, y,
start_phi1 = rep(0, ncol(X)),
maxit = 100,
tol = 1e-6,
nleqslv_method = "Broyden",
nleqslv_global = "dbldog",
nleqslv_xscalm = "auto") {
# Convert X to a matrix and determine the number of covariates
X <- as.matrix(X)
p <- ncol(X)
# Compute Horvitz-Thompson totals
N_total <- sum(d)
N1 <- sum(delta)
N0_hat <- sum(d) - N1
# Compute the target covariate mean for the probability sample
x_bar0 <- (colSums(d * X) - colSums(delta * X)) / N0_hat
# Define the moment function
moment_fun <- function(phi1) {
eta <- as.vector(X %*% phi1)
phi0 <- log(N1) - log(sum(delta * exp(eta)))
# Compute moment vector over covariates
moments <- colSums(delta * exp(phi0 + eta) * (X - matrix(rep(x_bar0, each = nrow(X)), nrow = nrow(X))))
moments
}
# Solve for theta_1 using nleqslv
sol <- nleqslv::nleqslv(x = start_phi1, fn = moment_fun,
method = nleqslv_method,
global = nleqslv_global,
xscalm = nleqslv_xscalm,
control = list(maxit = maxit, ftol = tol))
phi1_hat <- sol$x
eta_hat <- as.vector(X %*% phi1_hat)
phi0_hat <- log(N1) - log(sum(delta * exp(eta_hat)))
# Compute the final propensity weight component
ps_weight <- 1 + (N0_hat / N1) * exp(phi0_hat + eta_hat)
# Compute the final propensity score estimator for the population mean of y
theta_PS2 <- sum(delta * ps_weight * y) / N_total
# Result
result <- list(
estimator = "ipw",
estimator_method = "maxent",
phi = list(phi0 = phi0_hat, phi1 = phi1_hat),
theta_PS2 = theta_PS2,
diagnostics = sol,
output = data.frame(mean = theta_PS2, SE = NA),
confidence_interval = data.frame(lower_bound = NA, upper_bound = NA),
svydesign = NULL,
control = list(control_inference = list(vars_selection = "false", var_method = "analytic")),
pop_size_fixed = FALSE,
y = list(y),
call = match.call()
)
# We set the class so that print.nonprob and print.nonprob_method will be used
class(result) <- "nonprob"
return(result)
}
Initial code from @bartosz-bogulas on the maximum entropy propensity score as described in Chapter 11 in Kim, J. K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC link.