Drop NAs in the ESTIMATE() function #17

@fuleky

Dear Andrea,

Thank you for your work on the bimets package. I am not sure if you accept feature requests, but the one I outline below would be really handy. Thank you for considering it. If there is an existing workaround, please let me know.

Feature request: Drop NAs in the ESTIMATE() function

Background:
The ESTIMATE function expects data with no NAs within TSRANGE. In practice, variables have different start and end dates. GETRANGE can find the intersection of the dates, but a typical model contains various lags of variables, and GETRANGE does not take those lags into account.

Problem:
ESTIMATE throws an error when it encounters NAs.
TSRANGE can be set globally in the ESTIMATE function, but this risks collinearity.

Example:
Say there are three equations:

  1. first eq: the data for the first equation has a shorter range, e.g. 2000-2020.
  2. second eq: the data for the second equation has the same range as the first, but the equation contains lags of variables, e.g. TSLAG(x, 4).
  3. third eq: the third equation contains a (dummy) indicator variable that equals 1 before the data range of the first two equations, i.e. ones before 2000 and zeros after.

Setting the global TSRANGE to overlap with the range of equation 1 will result in a variable with only zeros in equation 3 (apparent collinearity). And equation 2 will throw an error because the lags are not accommodated by TSRANGE.
Setting the global TSRANGE to overlap with the range of equation 3 will result in missing values in equation 1, since the range would start before 2000.
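
To make the example concrete, here is a minimal base-R sketch of the three situations, assuming quarterly data. It does not call bimets at all; the series names and values are made up for illustration, and stats::lag() merely stands in for TSLAG(x, 4).

```r
# Illustrative quarterly series; names and values are made up, no bimets calls.
x <- ts(seq_len(84), start = c(2000, 1), frequency = 4)   # 2000Q1-2020Q4
d <- ts(c(rep(1, 40), rep(0, 84)),                        # 1 before 2000, 0 after
        start = c(1990, 1), frequency = 4)

# cbind() on ts objects pads to the union of the time ranges with NA.
m <- cbind(x, stats::lag(x, -4), d)
colnames(m) <- c("x", "x_lag4", "d")

# Range of equations 1 and 2: the lag has leading NAs and the dummy is constant.
short <- window(m, start = c(2000, 1), end = c(2020, 4))
n_na_lag <- sum(is.na(short[, "x_lag4"]))
dummy_all_zero <- all(short[, "d"] == 0)

# Range of equation 3: x is missing before 2000.
long <- window(m, start = c(1990, 1), end = c(2020, 4))
n_na_x <- sum(is.na(long[, "x"]))
```

With the shorter range, the lagged regressor is NA in its first four quarters and the dummy is identically zero; with the longer range, x is NA for every quarter before 2000.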

Proposed solution:
These problems could be solved by having the following options/arguments in the ESTIMATE function:

  1. A logical argument to drop data rows (time periods) containing NAs from each equation. For this to accommodate lags, the full model matrix with all the lags should be first assembled before rows with NAs are dropped. If this is implemented at the equation level, that would still allow TSRANGE to vary from equation to equation. (Internal NAs could still throw an error).
  2. A logical argument to drop data columns (variables) that cause collinearity when setting TSRANGE globally (just like stats::lm() does).

The two could also be combined to just drop all NAs in the model matrix before estimation. Or this could also just be a separate function that can be applied to a model with data. The function could print messages listing equations whose data and TSRANGE were altered.
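
As a rough illustration of the behavior I have in mind, here is a base-R sketch for a single equation. It is only an assumption about how this could work, not bimets code: trim_na_edges() is a hypothetical helper, the data are made up, and stats::lm() is used to show the column-dropping behavior mentioned in item 2.

```r
# Hypothetical helper (not part of bimets): drop leading/trailing rows with NAs
# from an assembled model matrix, but still fail on internal NAs.
trim_na_edges <- function(m) {
  ok <- rowSums(is.na(m)) == 0
  if (!any(ok)) stop("no complete rows in the model matrix")
  first <- which(ok)[1]
  last <- tail(which(ok), 1)
  if (!all(ok[first:last])) stop("internal NAs in the model matrix")
  tt <- time(m)
  window(m, start = tt[first], end = tt[last])
}

# Made-up quarterly data: y and x cover 2000Q1-2020Q4, d is 1 before 2000.
y <- ts(sin(seq_len(84)), start = c(2000, 1), frequency = 4)
x <- ts(seq_len(84), start = c(2000, 1), frequency = 4)
d <- ts(c(rep(1, 40), rep(0, 84)), start = c(1990, 1), frequency = 4)

# Assemble the full model matrix, lags included, before dropping any rows.
m <- cbind(y, stats::lag(x, -4), d)
colnames(m) <- c("y", "x_lag4", "d")

trimmed <- trim_na_edges(m)
message("Range used: ", paste(start(trimmed), collapse = "Q"),
        " to ", paste(end(trimmed), collapse = "Q"))

# lm() keeps going when a regressor is collinear and reports NA for it.
fit <- lm(y ~ x_lag4 + d, data = as.data.frame(trimmed))
dropped <- names(which(is.na(coef(fit))))
message("Regressors dropped for collinearity: ", paste(dropped, collapse = ", "))
```

Because the rows are dropped only after the lagged column has been built, the usable range shrinks to the quarters where every regressor is observed, and the all-zero dummy is reported rather than causing an error.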
