Post Processing Drafted Manuscript Records

Title: Instructions for Executing Manuscript Post-Processing
Author: William L. Potter
Date: 2021-12-07

Executing the Post-Processing Scripts for Drafted Manuscript Records

To promote rapid development, encoders initially create stripped-down skeleton XML records that contain only the data that cannot be machine generated. These are stored in files under /data/3_drafts/EncoderName/ (cf. the page describing encoder workflow). An XQuery script converts these XML snippets into valid TEI manuscript records containing all the requisite project metadata. This page walks you through executing that XQuery script. The final section gives an overview of the remaining steps in the data pipeline once records have been post-processed.

N.B. All files and directories are given as paths relative to the root of the wright-catalogue repository, whose location depends on where you stored your clone of the repository on your local machine.

Overview of Post-Processing Steps

This overview is provided as a quick reference to the process. Fuller details follow below.

Set BaseX to preserve whitespace using the global option: set chop false
Set the input and output directories in /parameters/config.xml
Open and execute the post-processing driver script, /drivers/mss-post-processing-driver.xq
Check for script errors and remaining validation errors; commit changes to GitHub
Move manuscript part records aside for separate editing; move the other processed records to /data/5_finalized for editing
- optional: copy processed records from /data/5_finalized to the shared Box folder for editing
Once records are edited (and returned from Box if needed), move them to https://github.com/srophe/srophe-app-data/ and commit the changes on both repositories

Configurations and Setup

BaseX White Space Options

To ensure the post-processing script preserves white space in mixed-content situations, the global BaseX options need to be changed.

Ensure that the Input Bar is open (under the menu View > Input Bar).
Into the Input Bar, type set chop false.
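
To see why this setting matters, the following minimal Python sketch imitates what chopping does to mixed content. It is an illustration only: it does not call BaseX, the chop_whitespace helper is hypothetical, and the sample fragment is invented rather than taken from a project record.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content fragment; not taken from a project record.
SAMPLE = "<p>Written by <persName>John</persName> in <date>1200</date></p>"


def chop_whitespace(elem):
    """Strip leading and trailing whitespace from every text node, in place.

    This imitates whitespace chopping for illustration; it is not BaseX code.
    """
    for node in elem.iter():
        if node.text is not None:
            node.text = node.text.strip()
        if node.tail is not None:
            node.tail = node.tail.strip()
    return elem


preserved = ET.fromstring(SAMPLE)
chopped = chop_whitespace(ET.fromstring(SAMPLE))

print("".join(preserved.itertext()))  # Written by John in 1200
print("".join(chopped.itertext()))    # Written byJohnin1200
```

With chopping, the spaces that separate running text from inline elements are lost, so words run together in the processed record.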

Editing the Configuration Files

The file /parameters/config.xml lets you specify both the input directory, where the files you want to post-process are stored, and the output directory, where the results will be saved. It also holds a list of directories that the script checks so that it does not reprocess a record; this list has already been set and should be edited only in rare circumstances.

Set the input directory. XPath: /config/inputDirectory. Use a path relative to the wright-catalogue repository, e.g. /data/3_drafts/WilliamPotter.
Set the output directory. XPath: /config/outputDirectory. Use a path relative to the wright-catalogue repository, e.g. /data/4_to_be_checked/postProcessingOutputs/.
(Optional) Edit the ignored directory list. Add, edit, or delete the directories from which the script pulls its list of already-processed records. XPath: /config/ignoredDirectoryList/directory. Use a path relative to the wright-catalogue repository, e.g. /data/5_finalized/.
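
As a sanity check before running the script, the configured paths can be read back and resolved against your local clone. The Python sketch below is built only from the three XPaths listed above. The sample XML is an assumed minimal shape (the real config.xml may use namespaces or contain further elements), and read_config is a hypothetical helper, not part of the repository.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed minimal shape of /parameters/config.xml, reconstructed from the
# XPaths above; the real file may differ.
SAMPLE_CONFIG = """\
<config>
  <inputDirectory>/data/3_drafts/WilliamPotter</inputDirectory>
  <outputDirectory>/data/4_to_be_checked/postProcessingOutputs/</outputDirectory>
  <ignoredDirectoryList>
    <directory>/data/5_finalized/</directory>
  </ignoredDirectoryList>
</config>
"""


def read_config(xml_text, repo_root):
    """Return the configured directories resolved against the repository root."""
    root = ET.fromstring(xml_text)
    repo = Path(repo_root)

    def resolve(value):
        # Config paths begin with "/" but are relative to the repository.
        return repo / value.strip().lstrip("/")

    def required(tag):
        value = root.findtext(tag)
        if value is None or not value.strip():
            raise ValueError(f"config has no usable <{tag}> element")
        return resolve(value)

    ignored = root.findall("ignoredDirectoryList/directory")
    return {
        "input": required("inputDirectory"),
        "output": required("outputDirectory"),
        "ignored": [resolve(d.text or "") for d in ignored],
    }


# "wright-catalogue" stands in for wherever your clone lives.
settings = read_config(SAMPLE_CONFIG, "wright-catalogue")
for name in ("input", "output"):
    status = "exists" if settings[name].is_dir() else "missing"
    print(f"{name}: {settings[name]} ({status})")
```

To check the real file, pass the text of parameters/config.xml and the path of your clone instead of the sample values.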

The file /parameters/config-proj.xml contains project-level metadata that rarely changes but may need to be updated. A separate page will detail how to configure these data.

Open and Execute the Post-Processing Driver Script

The post-processing script may now be run.

Open the driver script, /drivers/mss-post-processing-driver.xq, in BaseX.
Execute the script.
Once any script errors have been addressed (see below), commit the new files to GitHub.

Error and Validation Checking

If the script encounters an error, it will save an error report rather than a TEI XML file. Report errors using the repository's issue tracker. Tag these issues with the bug and post-processing labels and assign them to @wlpotter. Please also include the following information (attached to the issue or pasted into a comment):

the configuration file /parameters/config.xml
the error report(s) created by the script
other information as relevant
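
When many records are processed at once, it can help to list which output files are TEI records and which are not. The Python sketch below rests on assumptions this page does not confirm: that error reports are written as .xml files alongside the records, and that they do not have a TEI root element. The classify_outputs helper and the sample file names are hypothetical; adjust the glob pattern if the reports use another extension or location.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

TEI_ROOT = "{http://www.tei-c.org/ns/1.0}TEI"


def classify_outputs(output_dir):
    """Split the .xml files in output_dir into TEI records and everything else."""
    records, others = [], []
    for path in sorted(Path(output_dir).glob("*.xml")):
        try:
            root_tag = ET.parse(path).getroot().tag
        except ET.ParseError:
            root_tag = None  # not well-formed, so it needs attention too
        (records if root_tag == TEI_ROOT else others).append(path.name)
    return records, others


# Demonstration on throwaway files with invented names and contents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1.xml").write_text(
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"/>', encoding="utf-8"
    )
    Path(tmp, "2.xml").write_text(
        "<error>hypothetical report</error>", encoding="utf-8"
    )
    records, others = classify_outputs(tmp)

print("TEI records:", records)
print("Needs attention:", others)
```

Files in the second list are candidates for attaching to the issue; this check does not replace schema validation.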

In addition to script errors, it may be the case that validation errors remain after manuscript post-processing. These validation errors should be caught and corrected before manuscript editing.

Using oXygen's project window (menu Window > Project), navigate to the directory where the processed records are stored
Right click/ctrl+click and select Validate > Validate.
If validation reports any errors, correct them and commit those changes.

Post-Post-Processing

After the post-processing script has successfully executed and validation errors have been corrected, the manuscript records are ready for editing.

Manuscript part records are currently split off for separate processing (cf. Nesting Manuscript Part Records into a Composite File). At present these are stored in /data/4_to_be_checked/need-ms-parts/.
Full manuscript records can be moved to /data/5_finalized.
Using the Wright Decoder, assign the editors who will edit the records.
- Note: for ease of access, manuscripts which are ready to edit may be copied into a Box or Dropbox folder, edited there, and copied back to the GitHub directory. Provided the files in GitHub were not changed in any other way, replacing them with the edited files will generate a .diff report that may be used to verify the accuracy of the proofreader's edits.
Edited records may be moved to the Srophe data repository. They should also be moved from /data/5_finalized to /data/5_finalized/TransferredToDev. Commits should be made on both repositories.
- Note: at present, the manuscript records in the Srophe data repository exist only on the dev branch.
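
The local half of the last step, moving edited records from /data/5_finalized into /data/5_finalized/TransferredToDev, can be sketched in Python as follows. The transfer_records helper is hypothetical and not part of the repository. It defaults to a dry run, and it leaves copying to the Srophe data repository and the commits on both repositories as manual steps.

```python
import shutil
import tempfile
from pathlib import Path


def transfer_records(finalized_dir, file_names, dry_run=True):
    """Move the named records into finalized_dir/TransferredToDev.

    Every file is checked before anything is moved, so a bad name
    cannot leave the directory half-transferred.
    """
    source = Path(finalized_dir)
    target = source / "TransferredToDev"
    pairs = [(source / name, target / name) for name in file_names]

    for src, dst in pairs:
        if not src.is_file():
            raise FileNotFoundError(f"{src} was not found")
        if dst.exists():
            raise FileExistsError(f"{dst} already exists; refusing to overwrite")

    if not dry_run:
        target.mkdir(exist_ok=True)
        for src, dst in pairs:
            shutil.move(str(src), str(dst))
    return [dst for _, dst in pairs]


# Demonstration in a throwaway directory with an invented file name.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example.xml").write_text("<TEI/>", encoding="utf-8")
    planned = transfer_records(tmp, ["example.xml"])  # dry run: nothing moves
    transfer_records(tmp, ["example.xml"], dry_run=False)
    was_moved = Path(tmp, "TransferredToDev", "example.xml").is_file()

print("Planned destination file:", planned[0].name)
print("Moved:", was_moved)
```

Run it first with the default dry_run=True to review the planned destinations, then again with dry_run=False once the list looks right.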
