InDel_Toolkit

Overview

Useful scripts for dealing with sequencing data from insertional and deletional mutational scanning experiments.

Features

stickleback: A Python script for mapping long-read sequencing data containing engineered insertion sequences.
smelt: A Python script for parsing and counting insertions and deletions of a defined size in SAM files.

Installation

Prerequisites

Python 3.0+
And Python packages, via pip:

pip install numpy
pip install pandas
pip install Bio
pip install Levenshtein

multiprocessing is part of the Python 3 standard library, so it needs no separate pip install.

Module Versions Tested:

numpy: '1.26.4' * Note: the toolkit was developed and run before the release of NumPy 2.x. No conflicts are expected, but please notify us of any usage issues.
pandas: '2.2.2'
multiprocessing: '2.6.2.1'
Bio: '1.7.1'
levenshtein: '0.25.1'

To install these tested module versions, via pip:

pip install numpy==1.26.4
pip install pandas==2.2.2
pip install Bio==1.7.1
pip install Levenshtein==0.25.1
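
To confirm that the installed packages match the tested versions, a small check along these lines may help. This is a sketch, not part of the toolkit: the function name `check_versions` is hypothetical, the distribution names are assumed to match the names used in the pip commands above, and `importlib.metadata` requires Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version

# Tested versions listed above. multiprocessing is omitted because it
# ships with the Python 3 standard library.
# Assumption: distribution names match the names used in the pip commands.
TESTED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
    "Bio": "1.7.1",
    "Levenshtein": "0.25.1",
}


def check_versions(tested=TESTED):
    """Return {package: (installed_version_or_None, tested_version)}."""
    report = {}
    for name, wanted in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{name}: installed={installed} tested={wanted} [{status}]")
```

A "differs" status is not necessarily a problem; it only flags that the environment is not the one the scripts were tested in.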

Installing

Clone the repository:

git clone https://github.com/QVEU/InDel_Toolkit.git

Navigate to the project directory:
```
cd InDel_Toolkit
```

Command-Line Usage

stickleback Example

stickleback was designed to map long-read sequencing data (Nanopore or PacBio) from engineered libraries containing any defined insertion introduced via the SPINE pipeline.

python stickleback.py <pathto/input.sam (str)> <query sequence (str)> </path/to/templateFasta (str)> [Min Read Length (int)] [Max Read Length (int)]

# <> = required argument, [] = optional argument
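
To illustrate what the optional read-length arguments could be used for, here is a minimal sketch of filtering SAM records by sequence length. It is an assumption about the general idea, not stickleback's actual code: the function name `iter_sam_reads` and the parameters `min_len` and `max_len` are hypothetical, and the script's real defaults are not documented here. The only facts relied on are from the SAM format: header lines start with `@`, fields are tab-separated, QNAME is column 1, and SEQ is column 10.

```python
def iter_sam_reads(lines, min_len=0, max_len=None):
    """Yield (read_name, sequence) for SAM records within a length range.

    `lines` is any iterable of SAM text lines, for example an open file.
    `min_len` and `max_len` are hypothetical stand-ins for the optional
    Min/Max Read Length arguments.
    """
    for line in lines:
        if line.startswith("@"):  # header lines begin with '@'
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        name, seq = fields[0], fields[9]  # QNAME is column 1, SEQ is column 10
        if seq == "*":  # sequence not stored in this record
            continue
        if len(seq) < min_len:
            continue
        if max_len is not None and len(seq) > max_len:
            continue
        yield name, seq


if __name__ == "__main__":
    toy = [
        "@HD\tVN:1.6\n",
        "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\tref\t1\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    ]
    # Keep only reads of at least 5 bases.
    print(list(iter_sam_reads(toy, min_len=5)))
```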

To identify reads with insertions from an example SAM:

> cd InDel_Toolkit/
> python lib/stickleback.py test/stickleback_test.sam agcgggagaccggggtctctgagcg lib/templates/puc19-ev71-twtainan1998_4643-bsmbi-and-bsai-free-deleted-1-annotations-1-7471.fasta 


----------------=============-----------------
--==--==--==--==   ><```º>   ==--==--==--==--=
==--==--==--==-- stickleback --==--==--==--==-
----------------=============-----------------


Query Length: 25
Template Length: 7471
Template-Query Distance: 9
Mapping Reads of Size 25 to 7471 with a cutoff of 7

Loading SAM: test/stickleback_test.sam
Total Candidate Reads: 250

Outfile: test/stickleback_test_stickleback.csv

1. Computing minimum distance hit position for 247 reads.
Query: AGCGGGAGACCGGGGTCTCTGAGCG
2. Mapping minimum distance site on template sequence...
done.

Mapped hits in 36 reads.
Done in 0.03501284917195638 minutes.
Wrote test/stickleback_test_stickleback.csv.
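
The log above reports a "minimum distance hit position" for each read. One way to picture that step is a sliding window that compares every query-length stretch of a read to the query and keeps the closest one. The sketch below is an illustration under assumptions, not stickleback's implementation: `best_window` and `max_dist` are hypothetical names, the edit distance is written in plain Python so the example is self-contained (the toolkit itself installs the Levenshtein package), and the exact meaning of the reported cutoff is not described in this README.

```python
def levenshtein(a, b):
    """Edit distance between two strings (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_window(read, query, max_dist=None):
    """Return (0-based position, distance) of the read window closest to `query`.

    Returns None if the read is shorter than the query, or if the best
    distance exceeds `max_dist` (a hypothetical cutoff parameter).
    Ties are resolved in favour of the leftmost window.
    """
    read, query = read.upper(), query.upper()
    k = len(query)
    if len(read) < k:
        return None
    best_pos, best_dist = 0, k + 1  # distance between two k-mers is at most k
    for pos in range(len(read) - k + 1):
        d = levenshtein(read[pos:pos + k], query)
        if d < best_dist:
            best_pos, best_dist = pos, d
    if max_dist is not None and best_dist > max_dist:
        return None
    return best_pos, best_dist


if __name__ == "__main__":
    query = "agcgggagaccggggtctctgagcg"  # query from the example command
    read = "TTTT" + query.upper() + "CCCC"  # toy read with the query embedded
    print(best_window(read, query))  # (4, 0)
```

This brute-force scan costs one distance computation per window, which is fine for a sketch but slow for long reads.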

smelt Example

smelt complements stickleback and is used to tabulate deletions across coding sequences. It takes a SAM file (from Illumina sequencing or other high-accuracy NGS methods), reads the CIGAR strings, and identifies reads with deletions of a given size relative to the reference. To resolve ambiguities, it identifies these deletions on the translated sequence relative to a translated nucleotide reference. It is therefore specifically designed for engineered libraries in which specific codons are deleted.
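
As a rough illustration of the CIGAR-reading step, the sketch below walks a CIGAR string and reports where deletions of exactly the requested size start on the reference. It is not smelt's code: the function name `find_deletions` is hypothetical. It relies only on the SAM conventions that POS is 1-based and that the `M`, `D`, `N`, `=`, and `X` operations advance the reference position.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference sequence.
CONSUMES_REF = set("MDN=X")


def find_deletions(cigar, pos, size):
    """Return 1-based reference start positions of deletions of exactly `size` bases.

    `cigar` is the CIGAR string (SAM column 6) and `pos` is the 1-based
    leftmost mapping position (SAM column 4).
    """
    hits = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "D" and length == size:
            hits.append(ref_pos)  # first deleted reference base
        if op in CONSUMES_REF:
            ref_pos += length
    return hits


if __name__ == "__main__":
    # 10 matched bases starting at position 100, then a 3-base deletion.
    print(find_deletions("10M3D20M", 100, 3))  # [110]
```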

python smelt.py <input_file.sam (str)> <Deletion Size (int)> <output.csv (str)>

# <> = required argument

To identify reads with 3 bp deletions from an example SAM:

cd InDel_Toolkit/lib/
python smelt.py ../test/smelt_test.sam 3 ../test/smelt_test.csv
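
The reason for calling deletions on the translated sequence can be shown with a toy case: an aligner may place the same 3 bp deletion at several nucleotide offsets, yet every placement yields the same read and the same protein. The sketch below is an assumption-laden illustration, not smelt's algorithm: `translate` and `deleted_residue` are hypothetical helpers, the standard genetic code is assumed, and reporting the leftmost consistent residue is a convention chosen here for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt):
    """Translate an in-frame nucleotide string; unknown codons become 'X'."""
    nt = nt.upper().replace("U", "T")
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def deleted_residue(ref_nt, read_nt):
    """Return (1-based residue index, amino acid) for a single-codon deletion.

    Returns None if the read is not the reference minus exactly one residue.
    When several residues are consistent (for example within a run of
    identical amino acids), the leftmost one is reported.
    """
    ref_aa, read_aa = translate(ref_nt), translate(read_nt)
    if len(read_aa) != len(ref_aa) - 1:
        return None
    for i in range(len(ref_aa)):
        if ref_aa[:i] + ref_aa[i + 1:] == read_aa:
            return i + 1, ref_aa[i]
    return None


if __name__ == "__main__":
    ref = "ATGGCTGCTAAA"  # translates to MAAK
    # Removing 3 bases at offsets 3, 4, 5, or 6 gives the same read.
    reads = {ref[:s] + ref[s + 3:] for s in (3, 4, 5, 6)}
    print(reads)  # {'ATGGCTAAA'}
    print(deleted_residue(ref, "ATGGCTAAA"))  # (2, 'A')
```

At the nucleotide level the four placements look like different events; at the protein level they collapse into one call.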

Referencing

Please cite: Bakhache, W., Orr, W., McCormick, L. & Dolan, P., Uncovering Structural Plasticity of Enterovirus A through Deep Insertional and Deletional Scanning. Research Square (Preprint) (2024).

Contributing

Steps to contribute:

Fork the repository.
Create a new branch (git checkout -b feature-branch).
Commit your changes (git commit -am 'Add new feature').
Push to the branch (git push origin feature-branch).
Open a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Email: [email protected]

Twitter: @drptdolan

GitHub Issues: Submit an issue

Acknowledgements

The authors are thankful for minimap2.
