This repo allows users to construct a "standard" set of point verification results using harp, a set of R packages developed within the ACCORD consortium for analysis, verification, and visualisation of NWP data. The typical RRJVSP workflow for deterministic or ensemble experiments is the following:
- Convert HARMONIE-AROME vfld and vobs files to sqlite tables.
- Use the sqlite data to perform point verification using harp over a standard set of groupings (e.g. `lead_time`, `SID`) for surface and upper-air parameters.
- The point verification produces harp `.rds` files which can be visualised using `harpVis::shiny_plot_point_verif`.
- If desired, a set of local png files are also produced for a standard set of verification scores. These pngs can be visualised in the simple shiny app provided in the repo.
- If desired, produce a scorecard comparing two models. While this functionality is available in the scripts, some care must be taken in the construction and interpretation of scorecards. See here for more information. The default configuration settings outlined in these scripts are only indicative.
Little to no prior knowledge of harp is assumed, although users are encouraged to familiarise themselves with harp using the 2022 and 2024 training courses. Note that harp provides many more utilities than just NWP verification.
The instructions for installing harp can be found here and in the 2024 tutorial page. This should install fairly easily on the atos supercomputer at ECMWF.
To install packages required by this repo, you can use:

```r
pkg_list <- c("here", "argparse", "yaml", "dplyr", "tidyr",
              "purrr", "forcats", "stringr", "RColorBrewer", "grid",
              "gridExtra", "pracma", "RSQLite", "scales", "pals",
              "shiny", "shinyWidgets", "lubridate", "scico", "cowplot", "sf")
for (pkg in pkg_list) {
  install.packages(pkg)
}
```

Note that installation of the package sf can sometimes be problematic. These scripts only require sf for polygon station filtering. If you do not require polygon filtering, then sf does not have to be installed. If you try to use polygon filtering and sf is not found, polygon filtering will simply be skipped.
In order to install the latest version of harp from the main branch on GitHub, you can use:

```r
install.packages("remotes")
library(remotes)
install_github("harphub/harp")
```

If you want to use a specific release of harp, you can use:

```r
install_github("harphub/harp@v0.2.2")
```

or a specific branch of a single package:

```r
install_github("harphub/harpPoint@develop")
```

All development versions are in the "develop" branch for each of the harp packages. Note that harp installation can take some time, and you will be asked about updating dependencies.
Occasionally the harp installation can fail with a message like:

```
Downloading GitHub repo andrew-MET/harp@master
Error: HTTP error 403.
API rate limit exceeded for 130.226.71.190. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)
```

or:

```
Error: Failed to install 'unknown package' from GitHub:
```

If this happens, you need to create a personal access token. Follow the instructions here to get yourself a token. Once you have it, run:

```r
gitcreds::gitcreds_set()
```

and enter the token when prompted. Then use:

```r
usethis::edit_r_environ()
```

This will open an editor for the file .Renviron, where you can add the line GITHUB_PAT=<the token from above>. You will be asked to restart R for this to take effect. Then try to install harp again.
To update harp (or a single package, like harpPoint) to the latest main branch, do the following:

```r
library("remotes")
install_github("harphub/harpPoint")
```

You generally have the option to update package dependencies.
Create a configuration file for your project using the structure provided in the example configs. The config file is split into options used for pre-processing (pre), verification (verif), visualisation/output (post), and scorecards (scorecards). A description of these options is given below. Note that the paths specified should already exist on your system, as the scripts generally do not create parent directories.
| Type | Option name | Description | Example (s) |
|---|---|---|---|
| verif | project_name | A descriptive name for the verification project. This is the same as the project name variable PROJECT in monitor's configuration file scr/Env_exp. | project_name: "TEST" |
| verif | fcst_model | The names of the forecast models to consider. | fcst_model: - model_A - model_B |
| verif | lead_time | Lead times to consider in the verification, e.g. 0 to 48 in steps of 3 hours. seq and c() will be parsed and evaluated by R. | lead_time: seq(0,48,3) |
| verif | lead_time_UA | If you want to use specific lead times for upper-air variables. The default is NULL, which means that upper-air variables will use the lead times specified by verif:lead_time. Useful if you are verifying a list of surface and upper-air parameters, e.g. "T2m,T,Pmsl", and want to change the lead times used without having to change the config. | Default: lead_time_UA: NULL Use hours 0-24 in steps of 1 for upper-air variables: lead_time_UA: seq(0,24,1) |
| verif | by_step | Interval (in hours) between cycles to be used for verification, i.e. cycles considered are start_date to end_date in steps of verif:by_step. | by_step: "6h" |
| verif | fcst_type | Forecast type, either det or eps. | fcst_type: "det" |
| verif | domains | What SID lists to verify (equivalent to SURFSELECTION/TEMPSELECTION in monitor, where the station lists are defined in scr/selection.pm). The choice "All" will use all stations available in the data. SID lists, either defined explicitly or based on a geographic area, can be defined in fn_station_selection (see the documentation below for more information). Alternatively you can select a single SID by using its SID number as a domain. | domains: - "All" Verify two domains: domains: - "IRL" - "IE_EN" Verify domain "IRL" and a single station (SID=3953): domains: - "IRL" - "3953" |
| verif | domains_UA | What SID lists to verify for upper-air parameters (following same convention as "domains" above). This may be useful if you want to use only certain domains for upper-air verification e.g. a large region containing more radiosondes. If this option is missing or set to NULL then it will default to "domains". | To use "domains" for "domains_UA": domains_UA: - NULL To verify surface parameters over multiple domains but upper-air parameters over one: domains: - "IRL" - "IE_EN" domains_UA: - "IE_EN" |
| verif | members | What ensemble members to use in the verification for each verif:fcst_model. Set to NULL to use all the members available in each ensemble. Include a list here if the members to be used differ for each of the verif:fcst_model. | For a deterministic model, or to read all members in an ensemble: members: - NULL Specific members from different ensembles: members: - c(1,2,3) - c(4,5,6) |
| verif | lags | How to lag the ensemble members for each ensemble. Include a list if the lags differ for each of the verif:fcst_model. Set to "0h" for no lagging. The parent cycles for lagging are determined by the start date, end date, and verif:by_step. See harpIO::read_point_forecast for more information. | For a deterministic model: lags: - "0h" For 3h lagging in the first ensemble, no lagging in the second: lags: - c("0h","3h") - "0h" |
| verif | shifts | Experimental and for deterministic models only! How to shift forecasts forwards/backwards in time for each model. Include a list if the shifts differ for each of the verif:fcst_model. Set to NULL to do nothing (i.e. default behaviour, equivalent to shifting by zero hours). Use digits only, where the shift is assumed to be in hours. Positive values shift forwards in time, negative values backwards. | Default: shifts: - NULL Do not shift the first model, but shift the second by 6h forwards in time: shifts: - "0" - "6" |
| verif | num_ref_members | Number of reference members to use when computing fair scores for an ensemble (e.g. "Inf"). Not used for deterministic models. See harpPoint::ens_verify for more details. | num_ref_members: "Inf" |
| verif | ua_fcst_cycle | For upper air variables, do you want to group by fcst_cycle? Either TRUE or FALSE (default). | ua_fcst_cycle: FALSE |
| verif | ua_restrict_vh | For upper air variables, do you want to restrict to valid hours 00, 06, 12, and 18Z? Either TRUE (default) or FALSE. | ua_restrict_vh: TRUE |
| verif | force_valid_thr | Set to TRUE if you really want threshold scores over valid times/hours. FALSE is generally sufficient. | force_valid_thr: FALSE |
| verif | lt_split | Set to TRUE if you want to split the auxiliary scores (e.g. scatterplots) into "short" (<=24) and "long" (>24) leadtime ranges. Results using all leadtimes will always be generated. | lt_split: FALSE |
| verif | fcst_path | Path to the forecast sqlite tables generated from the vfld. | fcst_path: "/path/to/FCTABLE" |
| verif | obs_path | Path to the observation sqlite tables generated from the vobs. | obs_path: "/path/to/OBSTABLE" |
| verif | verif_path | A root directory where the rds verification files are stored. For a given project, these rds files will be located in verif:verif_path/verif:project_name. | verif_path: "/path/to/rds" |
| pre | fcst_model | Carry out the vfld to sqlite conversion for these forecast models. Note that this can differ from verif:fcst_model (e.g. the sqlite tables may already exist for some models, and you may want to skip regenerating these). | fcst_model: - model_A - model_B |
| pre | lead_time | Which lead times to use when converting the vfld. Can differ from verif:lead_time. | lead_time: seq(0,48,1) |
| pre | vfld_path | Path to where the vfld files are stored (note that the full file names to be read are constructed from pre:vfld_path and pre:vfld_template). The vfld files for each pre:fcst_model should exist in this directory. | vfld_path: "/path/to/vfld" |
| pre | vfld_by | Interval (in hours) between cycles to be used when generating the sqlite tables from vfld, i.e. read vfld files from start_date to end_date in steps of pre:vfld_by. This can differ from verif:by_step. | vfld_by: "3h" |
| pre | vfld_template | The template used to construct the vfld file names. See harpIO::show_file_templates. For deterministic/ensemble experiments, the standard choices are "vfld" and "vfld_eps". If different templates are to be used for different forecast models, then include a pre:vfld_template entry corresponding to each forecast model. Custom templates can also be created. | Standard: vfld_template: - "vfld" Custom: vfld_template: - "{fcst_model}/{YYYY}/{MM}/vfld{fcst_model}{YYYY}{MM}{DD}{HH}{LDT2}" |
| pre | vobs_path | Path to where the vobs files are stored. | vobs_path: "/path/to/vobs" |
| pre | vobs_by | Interval (in hours) between the vobs files, i.e. read vobs files from start_date to end_date in steps of pre:vobs_by. | vobs_by: "1h" |
| pre | custom_asl_path | Path to a specific HARMONIE-AROME allsynop.list to use when converting the vfld data. If not specified, harp's default station list is used (harpCore::station_list). | custom_asl_path: "/path/to/allsynop.list" |
| pre | members | Which members to read for each ensemble. If only one option is set for members, it will be repeated for each pre:fcst_model. Set to NULL for a deterministic experiment. Note: this variable is different from verif:members as the two can differ in general. | For deterministic models: members: - NULL For ensembles: members: - seq(0,6) |
| pre | lags | How the ensemble members are lagged for each pre:fcst_model. For a given model, pre:lags should be the same length as pre:members. Include a list if the lags differ for each of the pre:fcst_model. Note: this variable is different from verif:lags as the notation used for each is different. | For deterministic models: lags: - NULL For ensemble models: lags: - c(0,0,0,0,3,3,3) |
| pre | params | Which parameters to consider when generating the sqlite tables from vfld. See harpIO::show_harp_parameters and harpIO::harp_params for recognised vfld parameters in harp. Set to NULL to convert everything found. | params: - T2m - Td2m - T To convert everything found in the vfld: params: - NULL |
| post | plot_output | A root directory indicating where to save the png files generated during the verification process. For a given project, the png files will be stored in post:plot_output/verif:project_name. Set to "default" to use verif:verif_path/archive. | plot_output: "/path/to/png" |
| post | create_png | Flag to indicate if png files should be generated. | create_png: TRUE |
| post | save_vofp | Flag to indicate if the verification object used for creating pngs should be saved. Useful for generating images at a later stage. Files will be stored in post:plot_output/verif:project_name. Default is FALSE. | save_vofp: TRUE |
| post | save_sidrds | Flag to indicate if the verification object used for SID scores (i.e. maps) should be saved. May be useful for investigating issues with certain stations. Files will be stored in post:plot_output/verif:project_name. Default is FALSE. | save_sidrds: TRUE |
| post | cmap | What palette to use for the line plots. Choose from a palette in "RColorBrewer" or a function from the "pals" package e.g. "trubetskoy". Defaults to "Set2" from RColorBrewer if not specified. | cmap: "trubetskoy" |
| post | cmap_hex | What palette to use for scatterplots. Choose from "paired" (RColorBrewer), "magma", "viridis", or a scico palette (see scico::scico_palette_names). Defaults to "magma". | cmap_hex: "bukavu" |
| post | map_cbar_d | Logical flag to indicate if a discrete colourbar (with fixed breaks and bounds) should be used in the station map plots, as opposed to a continuous one where the bounds change depending on the data. Defaults to "FALSE" if not specified. | map_cbar_discrete: TRUE |
| scorecards | create_scrd | Logical switch to produce scorecard data, save the data, and plot the scorecard. | create_scrd: TRUE |
| scorecards | ref_model | The reference model for the scorecards. | ref_model: "old_model" |
| scorecards | fcst_model | The "new" model for the scorecards (i.e. score differences are scorecards:fcst_model - scorecards:ref_model). | fcst_model: "new_model" |
| scorecards | parameters | Which parameters to plot in the scorecards (parameters appear in the order listed here). For upper-air variables, if variable "X" is included in the verification, then you can include individual pressure levels in the scorecard plots by using "X750", etc. | parameters: - T2m - Td2m - T925 - T700 |
| scorecards | scores | Which of the deterministic or ensemble summary scores to plot in the scorecard. These appear from left to right in the scorecard. | Deterministic: scores: - "bias" - "rmse" Ensemble: scores: - "mean_bias" - "spread" |
| scorecards | domains | Construct scorecards for these domains only (provided they exist in verif:domains or verif:domains_UA). | domains: - "All" |
| scorecards | pooled_by | How to pool the data for bootstrapping. See harpPoint::bootstrap_verify for more details. | pooled_by: fcst_dttm |
| scorecards | numboot | Number of bootstrap replicates. See harpPoint::bootstrap_verify for more details. Should be "large". | numboot: 1000 |
| scorecards | parallel | Use parallel processing when performing the bootstrapping. | parallel: FALSE |
| scorecards | num_cores | Number of cores to use if parallel=TRUE. | num_cores: 1 |
| scorecards | plot_signif | As well as the scorecard, plot the actual score difference as a function of leadtime for each parameter and score (provided that scorecards:create_scrd is TRUE). Statistical significance of the score difference is also indicated. Uses fn_plot_signif_diff.R. | plot_signif: TRUE |
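To make the table above concrete, a minimal deterministic configuration might look like the following sketch. This is illustrative only: the model names and paths are placeholders, and the exact nesting should be checked against the example configs provided in the repo.

```yaml
# Illustrative minimal config (deterministic). Option names follow the
# table above; the precise structure is an assumption -- compare with
# the example configs in config_files before use.
verif:
  project_name: "TEST"
  fcst_model:
    - model_A
    - model_B
  fcst_type: "det"
  lead_time: seq(0,48,3)   # parsed and evaluated by R
  by_step: "6h"
  domains:
    - "All"
  fcst_path: "/path/to/FCTABLE"
  obs_path: "/path/to/OBSTABLE"
  verif_path: "/path/to/rds"
pre:
  fcst_model:
    - model_A
    - model_B
  lead_time: seq(0,48,1)
  vfld_path: "/path/to/vfld"
  vfld_by: "3h"
  vfld_template:
    - "vfld"
  vobs_path: "/path/to/vobs"
  vobs_by: "1h"
  params:
    - T2m
post:
  plot_output: "default"   # i.e. use verif:verif_path/archive
  create_png: TRUE
scorecards:
  create_scrd: FALSE
```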
Typically the vfld and vobs files need to be converted to sqlite tables before starting the verification process.
The pre-processing of vfld files in particular can take some time, especially for ensemble experiments. However, the tables only need to be generated once. As such, this step can be skipped once the tables are in place. The vfld2sql.R script takes the following command line inputs (required arguments in bold, optional arguments are in italics):
- **-config_file**: The config file in the `config_files` directory (no default).
- **-start_date**: The first forecast cycle to process (in YYYYMM, YYYYMMDD, or YYYYMMDDHH format (preferred), with no default). If YYYYMM is given then the first day and cycle of the month (i.e. 01/00Z) is assumed. If YYYYMMDD is given then the first cycle (i.e. 00Z) of YYYYMMDD is assumed.
- **-end_date**: The last forecast cycle to process (in YYYYMM, YYYYMMDD, or YYYYMMDDHH format (preferred), with no default). If YYYYMM is given then the last day and cycle of the month (i.e. lastday/23Z) is assumed. If YYYYMMDD is given then the last cycle (i.e. 23Z) of YYYYMMDD is assumed.
- *-use_custom_asl*: Logical flag to use a custom HARMONIE-AROME allsynop.list (as specified in the config file) when converting the vfld (default=FALSE). This is useful when your vfld files contain more stations than harp's default list of stations (`harpCore::station_list`). The script will automatically convert the allsynop.list into something readable by harp. If set to FALSE, or not specified in the config file, harp's default station list is used.
- *-remove_m_elev*: Logical flag to skip writing the model elevation to the sqlite files (default=FALSE). This may be useful for ensemble experiments when members have different model elevations.
- *-params_list*: A comma-separated list of parameters to convert (default="MISSING"). This will override whatever is set for `pre:params` in your config file, which may be useful in certain circumstances. If this argument is not specified then `pre:params` will be used (i.e. the default behaviour).
- *-use_obs_elev*: Logical flag to use the station elevation from the OBSTABLE (i.e. `verif:obs_path`) when performing the height correction for T2m or Ps (surface pressure). This option is included as the observation station elevation can differ significantly from that in a stations file (e.g. an "allsynop.list" file or harp's default `harpCore::station_list`). Set to FALSE by default, i.e. just use the elevation from the stations file.
Typical usage:
./vfld2sql.R -config_file config_files/config_det_example.yml -start_date YYYYMMDDHH -end_date YYYYMMDDHH -use_custom_asl TRUE -remove_m_elev TRUE
This converts all cycles from start_date to end_date in steps of pre:vfld_by (in the config file). By default the output sqlite files are stored in:
{verif:fcst_path}/{pre:fcst_model}/{YYYY}/{MM}/FCTABLE_{parameter}_{YYYY}{MM}_{HH}.sqlite
for each {parameter} in pre:params (or for every parameter in the vfld if pre:params = NULL).
Similarly, to create the observation sqlite tables run:
./vobs2sql.R -config_file config_files/config_det_example.yml -start_date YYYYMMDDHH -end_date YYYYMMDDHH
Note that the start and end dates here correspond to valid times (and thus end_date should extend beyond the last forecast cycle converted). By default the output sqlite files are stored in:
{verif:obs_path}/OBSTABLE_{YYYY}.sqlite
This OBSTABLE will contain all observations available in the vobs files.
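As a quick sanity check of an OBSTABLE (or FCTABLE), you can inspect it with the sqlite3 command-line tool, assuming it is available on your system. In a real workflow you would point `OBSTABLE` at your own file (e.g. `/path/to/OBSTABLE/OBSTABLE_2024.sqlite`); for illustration, the snippet below creates a tiny stand-in database, and the table name `SYNOP` and its columns are assumptions about the layout — use `.tables` on your own file to see what is actually present.

```shell
# Stand-in for a real OBSTABLE; replace with your own path in practice.
OBSTABLE=$(mktemp)
sqlite3 "$OBSTABLE" "CREATE TABLE SYNOP (SID INTEGER, validdate INTEGER, T2m REAL);"
sqlite3 "$OBSTABLE" "INSERT INTO SYNOP VALUES (3953, 1704067200, 281.2);"

# List the tables present in the file ...
sqlite3 "$OBSTABLE" ".tables"
# ... and count the rows in one of them.
sqlite3 "$OBSTABLE" "SELECT COUNT(*) FROM SYNOP;"
```

The same two queries work unchanged on the real OBSTABLE produced by vobs2sql.R.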
Once the configuration file is set and the sqlite tables are created, the point verification can be carried out.
This parameter list file is used to specify the parameters considered by the point_verif.R script and their associated options, in particular:
- scale_fcst: Forecast scaling (e.g. Kelvin to degrees). If you only want to apply the forecast scaling to certain models in `verif:fcst_model` for certain parameters, this can be controlled by the flag `models_to_scale` in your parameter list file (e.g. set_params.R, see below).
- scale_obs: Observation scaling.
- thresholds: Thresholds used when computing threshold skill scores.
- obsmin_val/obsmax_val: Min/max observation values allowed.
- fctmax_val: Max forecast values allowed (experimental).
- error_sd: Number of standard deviations used in `harpPoint::check_obs_against_fcst`.
- models_to_scale: What specific models to scale using `scale_fcst` for this parameter. If missing or NULL for a given parameter, the same scaling will be applied to all models specified by `verif:fcst_model` in the config file if `scale_fcst` is specified. If `models_to_scale` contains a model which is not found in the forecast data, the point_verif.R script will abort. For example, suppose you read in two models ("Model_A" and "Model_B") and you only want to scale "Model_A" for T2m and "Model_B" for S10m. Then you should add `models_to_scale = c("Model_A")` under the T2m list and `models_to_scale = c("Model_B")` under the S10m list. For all other parameters where `scale_fcst` is specified, the same scaling will be applied to both "Model_A" and "Model_B".
- vc: The vertical coordinate. This is required to recognise upper-air variables over multiple levels, e.g. `vc = "pressure"` for parameter T. For surface variables (e.g. T2m) or for a single level (e.g. T925), there is no need to set `vc`.
Typically this file does not need to be changed. By default point_verif.R reads parameter options from this file, but a custom parameter file can also be used by passing the -params_file option to point_verif.R.
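To illustrate, entries in a custom parameter list file might look like the sketch below. The option names are taken from the list above, but the surrounding structure (a named list per parameter) and the sign/size of the scalings are assumptions — check them against verification/set_params.R before use.

```r
# Hypothetical parameter list entries. The named-list structure and the
# specific values (e.g. an additive Kelvin -> Celsius offset) are
# assumptions for illustration only; see verification/set_params.R.
params <- list(
  T2m = list(
    scale_fcst      = -273.15,             # assumed additive offset, K -> degC
    scale_obs       = -273.15,
    thresholds      = c(-10, 0, 10, 20, 25),
    obsmin_val      = -60,
    obsmax_val      = 60,
    error_sd        = 6,
    models_to_scale = c("Model_A")         # only scale Model_A for T2m
  ),
  T = list(
    scale_fcst = -273.15,
    scale_obs  = -273.15,
    vc         = "pressure"                # upper-air variable over multiple levels
  )
)
```

With a layout like this, a single-level parameter such as T925 that is absent from the file would inherit its settings from the T entry, as described below.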
This script takes the following command line inputs (required arguments in bold, optional arguments are in italics):
- **-config_file**: The config file in the `config_files` directory (no default).
- **-start_date**: The first forecast cycle to process (in YYYYMM, YYYYMMDD, or YYYYMMDDHH format (preferred), with no default). If YYYYMM is given then the first day and cycle of the month (i.e. 01/00Z) is assumed. If YYYYMMDD is given then the first cycle (i.e. 00Z) of YYYYMMDD is assumed.
- **-end_date**: The last forecast cycle to process (in YYYYMM, YYYYMMDD, or YYYYMMDDHH format (preferred), with no default). If YYYYMM is given then the last day and cycle of the month (i.e. lastday/23Z) is assumed. If YYYYMMDD is given then the last cycle (i.e. 23Z) of YYYYMMDD is assumed.
- *-params_file*: The parameter list file containing parameter scalings, thresholds, etc. (default="verification/set_params.R"). Note: if you are making use of the `models_to_scale` option in the parameter list file, it is best to create a new parameter list file and call it explicitly when running point_verif.R. Sharing a parameter list file with specific model scalings across different projects may be dangerous due to common models, e.g. Model A is scaled for T2m in project X, but not in project Y. Unfortunately creating new parameter lists for specific projects does introduce some code duplication.
- *-params_list*: Which parameters to verify (default="T2m"). This should be a comma-separated string of parameters, for example "T2m,S10m,T,S,T925". Generally these parameters must exist in the parameter list file, otherwise they will be skipped. The only exception is parameters at a single upper-air level, e.g. T925. In this case, if these parameters do not appear explicitly in the parameter list file, the scripting will use information from the corresponding upper-air parameter in the parameter list file, e.g. T925 will take its settings from T, S500 from S, etc. This does not happen if e.g. T925 appears explicitly in the parameter list file. If `params_list` is not specified, it defaults to "T2m". Note that for `params_list="All"` all parameters in the parameter list file are considered in the verification (this is NOT recommended in general).
- *-dynamic_sid_gen*: A logical flag to generate SID lists corresponding to `verif:domains` or `verif:domains_UA` during the verification process (default=TRUE). Different `domains` options are defined in fn_station_selection.pm. This flag replaces the old methodology of reading SID lists from a static file (i.e. verification/sid_lists.rds). This old (now deprecated) method can be activated by switching this flag to FALSE.
- *-plot_dynamic_sid*: A logical flag to plot a map of the stations used for each domain and parameter (default=FALSE). This is only relevant when dynamic_sid_gen=TRUE.
- *-mod_def_rds*: A logical flag to prepend the project name to harp's default rds filenames (default=FALSE). Not generally required.
- *-add_proj_png*: A logical flag to prepend the project name to the default png filenames (default=FALSE). Not generally required.
- *-rolling_verif*: A logical flag to indicate "rolling" verification (default=FALSE). If TRUE, rolling verification will produce a reduced set of png files and will not produce rds files or scorecards. Generally rolling verification is restricted to a "short" (e.g. 7 days) near-real-time period. This option is not compatible with gen_sc_only=TRUE.
- *-gen_sc_only*: A logical flag to run scorecard generation (and plotting) only (default=FALSE). This may be useful in cases where point verification results have already been generated. This option is not compatible with rolling_verif=TRUE.
- *-use_fixed_dates*: A logical flag to use the input `start_date` and `end_date` when naming the directories and png files associated with this verification (default=TRUE). If set to FALSE, the data generated will use start and end dates corresponding to the first and last `fcst_dttm`, respectively, used in the verification for a given parameter. Therefore if set to FALSE, data may be stored in different directories for different parameters if the first and last `fcst_dttm` differ (this can happen in particular for precipitation). This option does not change the start/end dates in the rds filenames.
- *-skip_sid_verif*: A logical flag to skip the SID verification used for generating map scores (default=FALSE). Useful for quick runs.
- *-skip_thresh_verif*: A logical flag to skip the threshold verification (default=FALSE). This will override whatever is set for thresholds in your parameters file. Useful for quick runs.
Alternatively, the script can be sourced directly from within R/RStudio. In this case, the arguments will be read from the verification/source_options.R file. This interactive mode is useful for interrogating the data.
Typical usage:
./point_verif.R -config_file config_files/config_det_example.yml -start_date YYYYMMDDHH -end_date YYYYMMDDHH -params_file verification/set_params.R -params_list "T2m,Q"
This will run the verification using cycles from start_date to end_date in steps of verif:by_step for parameters T2m and Q.
The following QC checks of the forecast and observation data are carried out by the script:
- Forecast values above `fctmax_val` are removed (if this variable is set).
- Observation values below `obsmin_val` or above `obsmax_val` are removed (if these variables are set).
- `harpPoint::check_obs_against_fcst` is run to remove observations which are more than `error_sd` standard deviations away from the forecast.
- Station report frequency is computed and stations in the bottom 1% are removed. This acts to discard stations with very few observation reports.
By default the script assumes the following verification groupings for surface and upper-air variables (see harpCore::make_verif_groups() for more information):
- Surface: Data is grouped by `fcst_cycle` and the SID grouping specified by the `verif:domains` option in the config file (this is stored under the variable `station_group` in the scripts). Verification scores are then computed as a function of leadtime, valid date, and valid hour.
- Upper air: Data is grouped by the SID grouping specified by the `verif:domains_UA` option, and scores are computed as a function of leadtime and valid hour. If `verif:domains_UA` is missing from the config (or set to NULL), then `verif:domains` will be used for the SID grouping.
point_verif.R will produce standard harp .rds files which contain the full suite of verification scores available in harp by default. These files will be stored in:
{verif:verif_path}/{verif:project_name}/harpPointVerif.harp.{parameter}.harp.{start_date}-{end_date}.harp.{forecast_model_1}.model.{forecast_model_2}.model...{forecast_model_N}.rds
Typically the filenames for the harp rds files should not be changed as the harp shiny app assumes a set format. Note that while the filenames do not contain information about the verif:domains or verif:domains_UA considered, the domain selection is included in the rds files under the station_group variable.
If post:create_png: TRUE, then a suite of standard verification scores are plotted as png files for local visualisation. These local files will also include plots which are not available in harp's shiny app, such as forecast timeseries and station bias/rmse maps. These files should appear in:
{post:plot_output}/{verif:project_name}/{start_date}-{end_date}/long_file_names.png
(the filenames are somewhat convoluted and should not be changed as a strict structure is assumed in the local shiny app). See fn_png_name in fn_plot_helpers.R for more information on the convention used.
If you are also generating a scorecard (scorecards:create_scrd: TRUE), then all the underlying scorecard data (for each domain) will be saved to:
{verif:verif_path}/{verif:project_name}/harpScData-{start_date}-{end_date}-{scorecards:ref_model}-{scorecards:fcst_model}.rds
Scorecard plots will also be available in
{post:plot_output}/{verif:project_name}/{start_date}-{end_date}/*scard*.png
The scorecard data can be passed to fn_scorecard_signif.R for visualisation if desired. If scorecards:plot_signif: TRUE, these images will be available in the same directory (search for *sdiff*.png).
The rds files can be visualised by using harp's built-in shiny app:
shiny_plot_point_verif(start_dir={verif:verif_path}/{verif:project_name})
For example, when loading in a surface parameter, you should see groups for station_group and fcst_cycle and different options for the "Time" axis.
To browse the png images easily, a simple shiny app is provided with a similar interface to the "monitor" tool. This shiny app is defined in visualization/visapp/app.R. Some small edits to app.R may be needed in order to run the app locally, as detailed below.
- The variable `img_dir` is set to NULL by default, which just looks for files in the `sample_images` directory provided in visapp. Set `img_dir <- {plot_output}` instead to point to the images available there after verification.
- Optional: By default the variable `app_dir` is set to `here("visualization/visapp")` and this should correctly point to the `visapp` directory, provided that `here()` starts in this repository. If this is not the case for some reason, you can instead hardcode `app_dir` to the location of the `visapp` directory on your system.
- Optional: Add your project name to `all_experiment_names.R` in the `visapp` directory, following the convention given, in order to change the display name of the project in the app. If your project is not listed in `all_experiment_names.R`, it will still appear in the app but with a default display name.
- Optional: There is a `smr_ind` variable which changes the date display in the app to cater for dates categorised into "Rolling", "Monthly", and "Seasonal" periods. This assumes that data in `plot_output` is stored under directories following the structure `Monthly/project_name/start_date-end_date`, and the same for `Rolling` and `Seasonal`. This switch can generally be left as FALSE.
Note that you can have images for multiple projects in plot_output and switch easily between projects in the app. The app will read all directories which exist in plot_output; therefore, to remove a project from the app you need to remove its project_name directory from plot_output. The app can then be launched by opening app.R in RStudio and hitting "Run App". Alternatively, from a terminal, cd into the visualization directory, open R, and use:

```r
library(shiny)
runApp("visapp")
```

In many cases you may want to associate a given forecast model with a fixed colour in the png plots, e.g. "model_A" is always coloured red. To do this, search for model_names in visualization/fn_plot_helpers.R, which defaults to:
# Some fixed model names which the verification scripts know about.
# The default below considers 20 known models, corresponding to the number
# available from "trubetskoy" in the pals packages (less last two colours).
# The number of colours available in RColorBrewer is typically less, while
# pals offers more options.
model_names <- c("c1",
"c2",
"c3",
"c4",
"c5",
"c6",
"c7",
"c8",
"c9",
"c10",
"c11",
"c12",
"c13",
"c14",
"c15",
"c16",
"c17",
"c18",
"c19",
"c20")

Then take a look at your chosen colour palette (as specified by cmap in your config file) from the pals or RColorBrewer packages and replace "cX" by your model name for the colour you want. Note that the index of your model in model_names is associated with the colour at that index in the palette. For example, if you use cmap:trubetskoy in your config file and set:
model_names <- c("model_A",
"c2",
"c3",
"model_B",
...

then "model_A" will always be in red and "model_B" will always be in blue, i.e. the 1st and 4th colours of the trubetskoy palette. If some or all of your forecast models are not listed in model_names, the colours associated with these models will default to the first ones available in the palette. For example, suppose you have forecast models "model_A", "model_B", and "model_C" and model_names is set as per the example above. In this case "model_A" will be in red, "model_B" will be in blue, and "model_C" will be in green (i.e. the 2nd colour of the trubetskoy palette, as the 1st colour is already taken by "model_A").
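As a hedged sketch of this index-based mapping (not the repo's actual code; it assumes the pals package, where `trubetskoy(n)` returns the first n palette colours):

```r
# Sketch of index-based colour assignment; assumes the pals package.
library(pals)

model_names <- c("model_A", "c2", "c3", "model_B")
cols <- stats::setNames(trubetskoy(length(model_names)), model_names)

cols["model_A"] # 1st trubetskoy colour (red, per the text above)
cols["model_B"] # 4th trubetskoy colour (blue)
```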
It is possible to run the shiny web server directly on atos and open a local browser to visualise the results in img_dir using the launch_visapp_atos.R script.
- Soft link the path with the image results in the sample_images directory:
cd visapp/sample_images/
ln -sf some_path_somewhere local_name_of_the_path
cd ../..
module load R #mind the version where you installed harp
Rscript launch_visapp_atos.R
- Open a new terminal and ssh to one of the atos login nodes
ssh -L 9999:localhost:9999 ac6-102
- Open the browser at http://127.0.0.1:9999/ (the address should show up after doing step 1), or simply click on the link displayed in the terminal. This will open the shiny app in a browser on your local PC while the png files remain on the remote server. You might need to install the shinyWidgets package.
The same applies to visualising the results interactively using the rds files. In this case use launch_dynamicapp_atos.R instead of launch_visapp_atos.R in step 1.
The SID lists used during verification are controlled by the verif:domains and verif:domains_UA options in the config file (the latter applying to upper-air variables only, if specified). These domain variables are essentially equivalent to SURFSELECTION and TEMPSELECTION in monitor. Setting:
domains:
- "All"
will use all stations available in your data, and this will generally be sufficient for many cases. However, there may be cases where verification over a specific subset of SIDs is desired. In monitor, various SID lists are defined in scr/selection.pm, whereas here the SID lists are controlled by verification/fn_station_selection. Information regarding the default domains/station lists available, and how to add new ones, is given below. Note that you can verify a single station by simply passing its SID as an option to verif:domains or verif:domains_UA, e.g. using
domains:
- "3953"
- "3969"
will verify the individual stations 3953 (Valentia) and 3969 (Dublin Airport). Passing a list of stations e.g. c("3953","3969") directly in verif:domains will not work - you should add this SID list to verification/fn_station_selection as per the instructions below.
Note: A new method to handle station selection has been added in which SID lists are generated on the fly during the verification process. The old methodology read SID lists from a static file; this method is no longer recommended and is now deprecated. The documentation below covers both methodologies, starting with the new recommended method.
There are a number of default domains/lists already available in fn_station_selection.R which can be used for the option verif:domains or verif:domains_UA. These are categorised as follows:
These lists are specified explicitly in fn_station_selection.R, e.g.
DKlist <- stats::setNames(tibble::as_tibble(
c(06030,06041,06043,06049,06052,06058,06060,06070,06072,06073,
06074,06079,06080,06081,06096,06102,06104,06110,06116,06118,
06119,06120,06123,06126,06135,06138,06141,06149,06151,06154,
06156,06165,06168,06170,06180,06181,06190,06193)),
"SID")

The current specific lists available are (see the function for SIDs used):
- DKlist
- DKland
- DKcoast
- DKupdated
- Greenland
- NorthSea
- NNS (North NorthSea)
- EstoniaSynop
- LatviaSynop
- LithuaniaSynop
- GreenlandIcecap
- NEUcoast
- EWGLAM
- RussiaCoast
- RussiaInland
- NorthAmericaInland
- AirportSynop
- Coastal_All
To use any of these in your config file, simply use e.g.
domains:
- "DKlist"
- "EWGLAM"

Note: The Coastal_All list is indicative only and should be changed by users based on their needs. At the moment it only contains stations pertinent to UWC-W.
Domains based on lat/lon bounding boxes are defined in the variable latlon_reserved_names, i.e.:
latlon_reserved_names <- c("DINI","IE_EN","IS","NL","NL_OP","DK","IRL","SCD",
"FR","DE","Alps")

The bounding boxes themselves can be found in the function (see therein for north, south, east, and west), e.g.:
} else if (domain == "IE_EN") { #Ireland+UK bounding box
slat <- 50.0
wlon <- -11.0
nlat <- 60.0
elon <- 2.0

To use any of these in your config file, simply use e.g.
domains:
- "IE_EN"
- "IRL"

Various country-specific WMO SID ranges are already defined in the function, e.g. "FranceSynop" and "GermanySynop". These are listed in the variable synop_reserved_names, i.e.:
synop_reserved_names <- paste0(c("France","Germany","Ireland","Norway",
"Sweden","Finland","Iceland","Greenland",
"Denmark","Netherlands","Spain",
"Switzerland","UnitedKingdom"),
"Synop")

The SID ranges are defined in the subfunction filter_stations as:
if (grepl("France",domain,fixed = TRUE)) {
  SID_min <- 7001
  SID_max <- 7998
} else if (grepl("Germany",domain,fixed = TRUE)) {
  SID_min <- 10001
  SID_max <- 10998

To use any of these in your config file, simply use e.g.
domains:
- "IrelandSynop"
- "SpainSynop"

You can also generate SID lists based on polygon files (as in monitor). The polygon files should be stored under the directory verification/poly_files. To use a polygon file called DOMAIN.poly as your verification domain, you simply need to:
- Ensure `DOMAIN.poly` exists in `verification/poly_files`.
- Use the following in your config file:
domains:
- "DOMAIN"

In certain cases you may encounter a message saying that the specified poly file is not valid, in which case an SID list will not be generated for that domain.
Several quality control checks are carried out to ensure consistency of SID lat/lon in the observations and forecast:
- A given SID is checked to see if it has multiple lat/lon positions in the observation data frame (i.e. the OBSTABLE). If this is the case, and if the lat/lon values differ significantly, this station will not be used in the verification.
- A given SID is checked to see if the lat/lon position in the observation data frame (i.e. the OBSTABLE) matches that in the forecast data frame (i.e. the FCTABLE). If the observation and forecast positions differ significantly, then this station will not be used in the verification. This is controlled by the logical flag "multlatlon_rmv".
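As a hedged illustration of the second check (not the repo's actual implementation; the 0.05 degree tolerance is hypothetical), the idea is to drop SIDs whose observation and forecast positions disagree:

```r
library(dplyr)

# Toy OBSTABLE/FCTABLE positions; SID 2 disagrees in latitude
obs <- tibble::tibble(SID = 1:3, lat = c(54.30, 54.70, 55.20), lon = c(-7.59, -7.58, -6.95))
fc  <- tibble::tibble(SID = 1:3, lat = c(54.30, 54.90, 55.20), lon = c(-7.59, -7.58, -6.95))

tol <- 0.05 # degrees; hypothetical tolerance
bad_sids <- inner_join(obs, fc, by = "SID", suffix = c("_obs", "_fc")) |>
  filter(abs(lat_obs - lat_fc) > tol | abs(lon_obs - lon_fc) > tol) |>
  pull(SID)
bad_sids # SID 2 would be excluded from verification
```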
You can define SIDs which should always be removed from the verification process by adding them to the stations_always_rmv variable in fn_station_selection.R. In addition, you can also define stations which should be removed for certain parameters, as controlled by the variable stations_param_rmv. For example, if you want to remove SID=1234 for T2m and SID=5678 for Pmsl, use:
} else if (param == "T2m") {
stations_param_rmv <- c(1234)
} else if (param == "Pmsl") {
stations_param_rmv <- c(5678)

Filtering for additional parameters can be added as required.
For a given domain/SID list, denoted as X, you can select stations which have elevation above or below a given height Y by using the following convention:
- "X_ELEVA_Y": This will use all stations in domain/SID list X above (>) elevation Y m.
- "X_ELEVB_Y": This will use all stations in domain/SID list X below (<=) elevation Y m. Note that stations with missing elevation (typically denoted as -99, which comes from the vobs files) will always be removed in this case. If no stations are found which meet the criteria "X_ELEVA/B_Y" then this domain will be skipped.
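For example, assuming hypothetical elevation thresholds, to verify DKlist stations above 500 m and IRL stations at or below 200 m you could use:

```
domains:
- "DKlist_ELEVA_500"
- "IRL_ELEVB_200"
```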
For a given domain/SID list, denoted as X, you can filter to coastal and inland stations by using the following convention:
- "X_COASTAL": This will use all stations in domain/SID list X which are found in the "Coastal_All" SID list (see function for SIDs used).
- "X_INLAND": This will use all stations in domain/SID list X which are not found in the "Coastal_All" SID list. If no stations are found which meet the "X_COASTAL" condition then domain "X_COASTAL" will be skipped.
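For example, to split the IRL bounding-box stations into coastal and inland subsets:

```
domains:
- "IRL_COASTAL"
- "IRL_INLAND"
```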
You can add a new domain/SID list definition by editing fn_station_selection.R as follows. Note that the name of your new domain should not conflict with any existing domain/list name!
Simply add it to fn_station_selection.R following the same convention used.
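As a sketch, a hypothetical new explicit list following the DKlist convention above (the name MyIrishList is illustrative; register it in the function following whatever convention the existing lists use):

```r
# Hypothetical explicit SID list, following the DKlist convention
MyIrishList <- stats::setNames(tibble::as_tibble(
  c(3953, 3969)),
  "SID")
```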
To add a new domain based on a lat/lon bounding box, the domain needs to be added to fn_station_selection.R with values for slat, nlat and wlon, elon, following the convention used. The new domain name should also be added to the variable latlon_reserved_names (desirable but not strictly required for this methodology).
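As a sketch, a hypothetical bounding-box domain (name and coordinates illustrative) would follow the IE_EN pattern shown earlier:

```r
} else if (domain == "NewBox") { # hypothetical domain
  slat <- 45.0
  wlon <- -5.0
  nlat <- 50.0
  elon <- 5.0
```

Remember to also add "NewBox" to latlon_reserved_names.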
A new SID list based on a SID range can be added by:
- Include "NewList" in `synop_reserved_names` by adding it to the vector concatenated with "Synop", e.g. `paste0(c("France","Germany","Ireland","Norway","NewList"`.
- Add SID_min/max corresponding to "NewList" to `filter_stations` following the convention used.
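As a sketch, the corresponding addition to filter_stations might look like this (the SID range is illustrative only):

```r
} else if (grepl("NewList",domain,fixed = TRUE)) {
  SID_min <- 26001 # hypothetical WMO block for "NewListSynop"
  SID_max <- 26998
```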
A new domain based on a polygon file can be used simply by adding the polygon definition to verification/poly_files. The new domain name should also be added to the variable poly_reserved_names (desirable but not strictly required for this methodology).
When this methodology is used, the SID lists used for each parameter will be saved to files in the verification directory with names "dynamic_param_sid_lists.rds". This allows users to see exactly what stations were used in the verification process. You can also plot a map of the stations used for each variable by using the option "-plot_dynamic_sid TRUE" when running point_verif.R. These will also be stored in the verification directory under the names "dynamic_param_domain_stationmap.png".
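To inspect these after a run, the saved lists can be read back in R; the file name here assumes the "dynamic_param_sid_lists.rds" pattern above with param set to T2m:

```r
library(here)

# Assumed file name: "param" in the pattern replaced by the parameter name
sid_lists <- readRDS(here("verification", "dynamic_T2m_sid_lists.rds"))
names(sid_lists) # one entry per domain used for T2m
```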
WARNING: ALL INFORMATION HENCEFORTH IS DEPRECATED
To view the default stationlists available, use:
source(here("verification/fn_station_selection.R"))
sid_lists <- fn_station_selection("all_domains")

There are 39 SID lists currently available (names matching *rmv* can be ignored here):
> names(sid_lists)
[1] "DKlist" "DKland" "DKcoast" "DKupdated" "Greenland"
[6] "NorthSea" "NNS" "EstoniaSynop" "LatviaSynop" "LithuaniaSynop"
[11] "mll_rmv_obstable" "mll_rmv_vfldlist" "DINI" "IE_EN" "IS"
[16] "NL" "NL_OP" "DK" "IRL" "SCD"
[21] "FR" "DE" "Alps" "FranceSynop" "GermanySynop"
[26] "IrelandSynop" "NorwaySynop" "SwedenSynop" "FinlandSynop" "IcelandSynop"
[31] "GreenlandSynop" "DenmarkSynop" "NetherlandsSynop" "SpainSynop" "SwitzerlandSynop"
[36] "UnitedKingdomSynop" "Baltex" "Gotaland" "Latvia_Lithuania" "Norrland"
[41] "Sweden" "stations_to_rmv"
> sid_lists$DKlist
# A tibble: 38 × 1
SID
<dbl>
1 6030
2 6041
3 6043
4 6049
5 6052
6 6058
7 6060
8 6070
9 6072
10 6073
# ℹ 28 more rows
# ℹ Use `print(n = ...)` to see more rows
> sid_lists$IRL
# A tibble: 113 × 5
SID lat lon elev elevmap
<int> <dbl> <dbl> <dbl> <chr>
1 3901 54.3 -7.59 72 2: 10-100m
2 3904 54.7 -7.58 49 2: 10-100m
3 3907 55.2 -6.95 6 1: <10m
4 3911 54.7 -6.82 225 3: 100-250m
5 3915 54.9 -6.46 64 2: 10-100m
6 3916 55.2 -6.15 156 3: 100-250m
7 3917 54.7 -6.23 63 2: 10-100m
8 3918 54.5 -6.33 18 2: 10-100m
9 3923 54.2 -6.50 161 3: 100-250m
10 3951 51.5 -9.42 21 2: 10-100m
# ℹ 103 more rows
# ℹ Use `print(n = ...)` to see more rows

These names correspond to the possible values for verif:domains (along with the "All" option). To visualise the SID lists, you can use:
fn_station_selection("all_domains",plot_domains = TRUE)

This will produce pngs in the verification directory of this repo for each SID list.
To add a specific list of SIDs to use as a new verif:domains option, simply add it to fn_station_selection.R following the same convention, e.g.:
#=================================================#
# USER-DEFINED STATIONLISTS (AS IN MONITOR)
# ADD SPECIFIC LISTS HERE AS REQUIRED
#=================================================#
# Denmark lists
DKlist <- stats::setNames(tibble::as_tibble(
c(06030,06041,06043,06049,06052,06058,06060,06070,06072,06073,
06074,06079,06080,06081,06096,06102,06104,06110,06116,06118,
06119,06120,06123,06126,06135,06138,06141,06149,06151,06154,
06156,06165,06168,06170,06180,06181,06190,06193)),
"SID")
# Add this to all_station_lists
all_station_lists[["DKlist"]] <- DKlist
# New list
newlist <- stats::setNames(tibble::as_tibble(
c(1234,5678)),
"SID")
all_station_lists[["newlist"]] <- newlist
You can then immediately use:
domains:
- "newlist"
in your config file. Note that the name of your new list should not conflict with any existing station list name!
You can define SIDs which should always be removed from the verification process by adding them to the stations_always_rmv variable in fn_station_selection.R. In addition, you can also define stations which should be removed for certain parameters, as controlled by the variable stations_param_rmv. For example, if you want to remove SID=1234 for T2m and SID=5678 for Pmsl, use:
} else if (param == "T2m") {
stations_param_rmv <- c(1234)
} else if (param == "Pmsl") {
stations_param_rmv <- c(5678)

Filtering for additional parameters can be added as required.
Generally the easiest way of updating or adding a SID list based on a geographic region is to raise an issue in the repo. However some details are given below.
fn_station_selection.R can also generate SID lists based on lat/lon bounding boxes, which are contained in verification/sid_lists.rds. The lat/lon bounding boxes used can be viewed in fn_station_selection.R, e.g.:
} else if (domain == "IRL") {
slat <- 51
wlon <- -11
nlat <- 55.5
elon <- -6

Generation of these SID lists requires the HARMONIE-AROME allsynop.list and alltemp.list files used when producing the vfld. The location of these allsynop/temp.list files is specified by the sl_dir argument to the script. Optionally a user can also pass the OBSTABLE used in the verification to the script, as specified by the argument sql_file. If an OBSTABLE is provided, then only common stations in the OBSTABLE and allsynop/temp.list will be used when generating the SID lists (which is generally recommended). Finally, an extra quality control check can be carried out by comparing the SID lat/lon in the OBSTABLE against that in the allsynop/temp.list for consistency (omitting the SID if the position differs significantly). This is controlled by the multlatlon_rmv argument to the script.
To add a new SID list based on a lat/lon bounding box, the domain first needs to be added to the script with values for slat, nlat and wlon, elon. The new domain name also needs to be added to the variable latlon_reserved_names. Then to generate the list, or indeed to update an existing SID list in verification/sid_lists.rds, you can run:
fn_station_selection("all_domains",
generate_domains = TRUE,
domains_to_gen = "All",
plot_domains = TRUE,
sl_dir = "/path/to/allsynop.list",
sql_file = "/path/to/OBSTABLE",
domain_file = "sid_lists.rds",
multlatlon_rmv = TRUE)
Providing an OBSTABLE and setting multlatlon_rmv=TRUE is generally recommended. The new SID lists will be saved to a temporary file for comparison with the existing sid_lists.rds, which can then be overwritten if everything looks fine. Plots of the stations in each domain will also be generated in the verification directory. Note that if an OBSTABLE is used in the generation process (i.e. sql_file is specified), the SID lists will only contain stations which appear in your observation database. Therefore if new stations are added to your OBSTABLE at a later point, you may need to regenerate the SID lists and update the file sid_lists.rds.
An alternative to using a lat/lon bounding box is to use a country-specific WMO SID range. This is already done for multiple SID lists, e.g. "FranceSynop" and "GermanySynop" are listed in synop_reserved_names. These lists are stored in sid_lists.rds and the SID ranges are defined in the subfunction filter_stations as:
if (grepl("France",domain,fixed = TRUE)) {
  SID_min <- 7001
  SID_max <- 7998
} else if (grepl("Germany",domain,fixed = TRUE)) {
  SID_min <- 10001
  SID_max <- 10998

A new SID list based on a SID range can be added by:
- Include "NewList" in `synop_reserved_names` by adding it to the vector concatenated with "Synop", e.g. `paste0(c("France","Germany","Ireland","Norway","NewList"`.
- Add SID_min/max corresponding to "NewList" to `filter_stations` as per the samples given.
- Regenerate `sid_lists.rds` (see the example for lat/lon bounding boxes above).
You can also generate SID lists based on polygon files. These SID lists are also stored in sid_lists.rds, which already contains some defaults (e.g. Baltex). To do so, you need to:
- Create a "NewList.poly" file (e.g. like Baltex.poly in monitor). A simple sample file is given in the repo.
- Specify the path to the poly file via the argument `poly_dir` (which defaults to the `verification` directory).
- Include your "NewList" in `poly_reserved_names` in the script.
- Regenerate `sid_lists.rds`, e.g.:
fn_station_selection("Baltex",
generate_domains = TRUE,
domains_to_gen = "Baltex",
plot_domains = TRUE,
sl_dir = "/path/to/allsynop.list",
poly_dir = "/path/to/poly_files",
sql_file = "/path/to/OBSTABLE",
domain_file = "sid_lists.rds",
multlatlon_rmv = TRUE)
In certain cases you may encounter a message saying that the specified poly file is not valid, in which case an SID list will not be generated for that domain.
Finally, by default point_verif.R does not run SID list generation based on bounding boxes or SID ranges and instead calls fn_station_selection.R which reads existing lists in sid_lists.rds. If you want to read from a different filename, say tmp_sid_list.rds, simply change the function call in point_verif.R to:
# Get domains from station_selection
cs_list <- fn_station_selection(base::setdiff(domains_to_run,"All"),
                                param = prm_name,
                                domain_file = "tmp_sid_list.rds")