Post Processing Drafted Manuscript Records
Title: Instructions for Executing Manuscript Post-Processing
Author: William L. Potter
Date: 2021-12-07
To promote rapid development, encoders initially create stripped-down skeleton XML records that contain only the data that cannot be machine-generated. These are stored in files under `/data/3_drafts/EncoderName/` (cf. the page describing encoder workflow). An XQuery script converts these XML snippets into valid TEI manuscript records containing all the requisite project metadata. This page walks you through executing this XQuery script. The final section gives an overview of the remaining steps in the data pipeline once records have been post-processed.
N.B. All files and directories are given as paths relative to the root of the wright-catalogue repository. The location of that root on your machine depends on where you cloned the repository.
This overview is a quick reference to the process; fuller details follow below.
- Set BaseX to preserve whitespace using the global option `set chop false`
- Set the input and output directories in `/parameters/config.xml`
- Open and execute the post-processing driver script, `/drivers/mss-post-processing-driver.xq`
- Check for script errors or remaining validation errors; commit changes to GitHub
- Move ms part records for separate editing; move processed records to `/data/5_finalized` for editing
  - optional: copy processed records from `/data/5_finalized` to a shared Box folder for editing
- Once edited (and returned from Box if needed), move edited records to https://github.com/srophe/srophe-app-data/. Commit changes on both repos
To ensure the post-processing script preserves whitespace in mixed content, BaseX's global `chop` option must be turned off:
- Ensure that the Input Bar is open (menu `View > Input Bar`).
- In the Input Bar, type `set chop false`.
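Why this setting matters can be seen in a small Python sketch. According to the BaseX documentation, chopping trims leading and trailing whitespace from text nodes and discards text nodes that end up empty; in mixed content, the single space between two adjacent elements is exactly such a node. The `chop` function below is a rough stand-in written for illustration, not BaseX's own code, and the sample snippet is made up.

```python
import xml.etree.ElementTree as ET

# Made-up mixed-content snippet: the space between the two <persName>
# elements is a whitespace-only text node (the tail of the first one).
SNIPPET = "<p>Letter of <persName>John</persName> <persName>Mark</persName>.</p>"


def chop(elem):
    """Roughly mimic chopping: trim every text/tail node and drop the
    ones that become empty. Illustration only, not BaseX code."""
    for node in elem.iter():
        if node.text is not None:
            node.text = node.text.strip() or None
        if node.tail is not None:
            node.tail = node.tail.strip() or None
    return elem


def text_content(elem):
    """Concatenate all descendant text, as a reader would see it."""
    return "".join(elem.itertext())


preserved = ET.fromstring(SNIPPET)
chopped = chop(ET.fromstring(SNIPPET))

print(text_content(preserved))  # Letter of John Mark.
print(text_content(chopped))    # Letter ofJohnMark.
```

With chopping on, the two names run together, which is why the option must be off before the driver script runs.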
The file `/parameters/config.xml` lets you specify both the input directory, where the files you want to post-process are stored, and the output directory, where the results will be saved. It also lists the directories the script checks to ensure that you do not reprocess a record; this list has already been set and should only be edited in rare circumstances.
- Set the input directory. XPath: `/config/inputDirectory`. Use a relative path from the wright-catalogue repository, e.g. `/data/3_drafts/WilliamPotter`.
- Set the output directory. XPath: `/config/outputDirectory`. Use a relative path from the wright-catalogue repository, e.g. `/data/4_to_be_checked/postProcessingOutputs/`.
- (Optional) Edit the ignored directory list. Add, edit, or delete directories from which the script should pull a list of already-processed records. XPath: `/config/ignoredDirectoryList/directory`. Use a relative path from the wright-catalogue repository, e.g. `/data/5_finalized/`.
The file `/parameters/config-proj.xml` contains project-level metadata that rarely changes but may occasionally need to be updated. A separate page will detail how to configure these data.
The post-processing script may now be run.
- Open the driver script, `/drivers/mss-post-processing-driver.xq`, in BaseX.
- Execute the script.
- Once any script errors are addressed (see below), commit the new files to GitHub.
If the script encounters an error, it saves an error report instead of a TEI XML file. Report such errors using the repository's issue tracker: tag the issue with the bug and post-processing labels and assign @wlpotter. Please also include the following information (attached to the issue or pasted into the comments):
- the configuration file, `/parameters/config.xml`
- the error report(s) created by the script
- any other relevant information
Besides script errors, validation errors may remain after post-processing. These should be caught and corrected before the records are edited.
- Using oXygen's project window (menu `Window > Project`), navigate to the directory where the processed records are stored.
- Right-click (Ctrl+click on macOS) the directory and select `Validate > Validate`.
- If validation reports errors, correct them and commit those changes.
After the post-processing script has successfully executed and validation errors have been corrected, the manuscript records are ready for editing.
- Manuscript part records are currently split off for separate processing (cf. Nesting Manuscript Part Records into a Composite File). At present these are stored in `/data/4_to_be_checked/need-ms-parts/`.
- Full manuscript records can be moved to `/data/5_finalized`.
- Use the Wright Decoder to assign the editors who will edit the records.
  - Note: for ease of access, manuscripts that are ready to edit may be copied into a Box or Dropbox folder, edited there, and copied back to the GitHub directory. Provided the files in GitHub were not changed in any other way, replacing them with the edited files will produce a diff that can be used to verify the accuracy of the proofreader's edits.
- Edited records may be moved to the Srophe data repository. They should also be moved from `/data/5_finalized` to `/data/5_finalized/TransferredToDev`. Commits should be made on both repositories.
  - Note: at present, the manuscript records in the Srophe data repository exist only on its dev branch.
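The diff mentioned in the Box/Dropbox note is what `git diff` shows once the edited files replace the originals in the working copy. The same kind of unified diff can be sketched with Python's `difflib`, for instance to review a single record outside Git; the helper name `edit_report` and the sample record are hypothetical.

```python
import difflib


def edit_report(original, edited, name="record.xml"):
    """Return a unified diff of a record before and after proofreading."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True),
        edited.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


# Made-up sample record: one title corrected by a proofreader.
ORIGINAL = "<msItem>\n  <title>Hymns</title>\n</msItem>\n"
EDITED = "<msItem>\n  <title>Hymns and prayers</title>\n</msItem>\n"

print(edit_report(ORIGINAL, EDITED, "ms.xml"), end="")
```

Only lines the proofreader touched appear with `-`/`+` markers, which is what makes the diff useful for checking edits, provided nothing else changed the files in the meantime.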