If you don't already have `git subrepo` installed, follow the [git subrepo installation instructions](https://github.com/ingydotnet/git-subrepo#installation).
Then add the latest shared scripts to the pathogen repo by running:
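The exact clone command is not shown here; the sketch below assumes the canonical `nextstrain/shared` repo URL and uses the `shared/vendored` subdirectory referenced later in this README:

```
git subrepo clone https://github.com/nextstrain/shared shared/vendored
```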
Check the parent commit hash in the `shared/vendored/.gitrepo` file and make
sure the commit exists in the commit history. Update to the appropriate parent
commit hash if needed.
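A minimal sketch of that check, assuming `git subrepo`'s standard `.gitrepo` layout (a git-config-format file whose `subrepo.parent` key records the parent commit):

```shell
# Read the recorded parent commit from the .gitrepo file (git-config format)
parent=$(git config --file shared/vendored/.gitrepo subrepo.parent)

# Verify that commit exists in this repo's history
if git cat-file -e "${parent}^{commit}" 2>/dev/null; then
    echo "parent commit ${parent} exists"
else
    echo "parent commit ${parent} not found; update .gitrepo" >&2
fi
```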
## Scripts

Scripts for supporting workflow automation that don’t really belong in any of our existing tools.

- [assign-colors](scripts/assign-colors) - Generate colors.tsv for augur export based on ordering, color schemes, and what exists in the metadata. Used in the phylogenetic or nextclade workflows.
- [notify-on-diff](scripts/notify-on-diff) - Send Slack message with diff of a local file and an S3 object
- [notify-on-job-fail](scripts/notify-on-job-fail) - Send Slack message with details about failed workflow job on GitHub Actions and/or AWS Batch
- [notify-on-job-start](scripts/notify-on-job-start) - Send Slack message with details about workflow job on GitHub Actions and/or AWS Batch
- [notify-on-record-change](scripts/notify-on-record-change) - Send Slack message with details about line count changes for a file compared to an S3 object's metadata `recordcount`.

  If the S3 object's metadata does not have `recordcount`, the script will attempt to download the S3 object and count the lines locally, which only supports `xz`-compressed S3 objects.
- [notify-slack](scripts/notify-slack) - Send message or file to Slack
- [s3-object-exists](scripts/s3-object-exists) - Used to prevent 404 errors during S3 file comparisons in the notify-* scripts
- [trigger](scripts/trigger) - Triggers downstream GitHub Actions via the GitHub API using repository_dispatch events.
- [trigger-on-new-data](scripts/trigger-on-new-data) - Triggers downstream GitHub Actions if the provided `upload-to-s3` outputs do not contain the `identical_file_message`
A hacky way to ensure that we only trigger downstream phylogenetic builds if the S3 objects have been updated.
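For orientation, a `repository_dispatch` event is just a POST to the GitHub REST API; a hypothetical invocation (repo name, event type, and token are placeholders, and the actual trigger scripts may differ in detail):

```
curl -fsS -X POST \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer ${GITHUB_TOKEN}" \
  https://api.github.com/repos/nextstrain/<pathogen-repo>/dispatches \
  -d '{"event_type": "rebuild"}'
```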
NCBI interaction scripts that are useful for fetching public metadata and sequences.

- [fetch-from-ncbi-entrez](scripts/fetch-from-ncbi-entrez) - Fetch metadata and nucleotide sequences from [NCBI Entrez](https://www.ncbi.nlm.nih.gov/books/NBK25501/) and output to a GenBank file.
Useful for pathogens with metadata and annotations in custom fields that are not part of the standard [NCBI Datasets](https://www.ncbi.nlm.nih.gov/datasets/) outputs.

Historically, some pathogen repos used the undocumented NCBI Virus API through [fetch-from-ncbi-virus](https://github.com/nextstrain/shared/blob/c97df238518171c2b1574bec0349a55855d1e7a7/fetch-from-ncbi-virus) to fetch data. However, we've opted to drop the NCBI Virus scripts due to https://github.com/nextstrain/shared/issues/18.
Potential Nextstrain CLI scripts

- [sha256sum](scripts/sha256sum) - Used to check if files are identical in upload-to-s3 and download-from-s3 scripts.
- [cloudfront-invalidate](scripts/cloudfront-invalidate) - CloudFront invalidation is already supported in the [nextstrain remote command for S3 files](https://github.com/nextstrain/cli/blob/a5dda9c0579ece7acbd8e2c32a4bbe95df7c0bce/nextstrain/cli/remote/s3.py#L104).
This exists as a separate script to support CloudFront invalidation when using the upload-to-s3 script.
- [upload-to-s3](scripts/upload-to-s3) - Upload file to AWS S3 bucket with compression based on file extension in S3 URL.
Skips upload if the local file's hash is identical to the S3 object's metadata `sha256sum`.

  Adds the following user-defined metadata to the uploaded S3 object:

  - `sha256sum` - hash of the file generated by [sha256sum](scripts/sha256sum)
  - `recordcount` - the line count of the file
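As an illustration only (not the actual upload-to-s3 implementation), the two metadata values can be computed like so:

```shell
# Illustrative only: compute the two metadata values for a sample file
printf 'record1\nrecord2\nrecord3\n' > example.ndjson

sha256sum_value=$(sha256sum example.ndjson | cut -d ' ' -f 1)  # -> `sha256sum` metadata
recordcount=$(wc -l < example.ndjson)                          # -> `recordcount` metadata

echo "sha256sum=${sha256sum_value} recordcount=${recordcount}"
```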
- [download-from-s3](scripts/download-from-s3) - Download file from AWS S3 bucket with decompression based on file extension in S3 URL.
Skips download if the local file already exists and has a hash identical to the S3 object's metadata `sha256sum`.

## Snakemake

Snakemake workflow functions that are shared across many pathogen workflows that don’t really belong in any of our existing tools.
- [config.smk](snakemake/config.smk) - Shared functions for handling workflow configs.
- [remote_files.smk](snakemake/remote_files.smk) - Exposes the `path_or_url` function which will use Snakemake's storage plugins to download/upload files to remote providers as needed.
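A hypothetical usage sketch (the rule, file names, and S3 URL are invented; only `path_or_url` comes from this repo, and its exact signature may differ):

```
include: "shared/vendored/snakemake/remote_files.smk"

rule example:
    input:
        metadata=path_or_url("s3://nextstrain-data/files/workflows/example/metadata.tsv.zst"),
```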
## Software requirements
Some scripts may require Bash ≥4. If you are running these scripts on macOS, the builtin Bash (`/bin/bash`) does not meet this requirement. You can install [Homebrew's Bash](https://formulae.brew.sh/formula/bash) which is more up to date.
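A quick sketch of a version guard for such scripts (`BASH_VERSINFO` and `BASH_VERSION` are standard Bash builtins):

```shell
#!/usr/bin/env bash
# Fail fast if the interpreter is older than Bash 4
if (( BASH_VERSINFO[0] < 4 )); then
    echo "This script requires Bash >= 4; found ${BASH_VERSION}" >&2
    exit 1
fi
echo "Bash ${BASH_VERSION} is new enough"
```

Using `#!/usr/bin/env bash` rather than `/bin/bash` lets the script pick up Homebrew's newer Bash on macOS when it comes first on `PATH`.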