Commit 6de68a5

update
1 parent 1d3b64e commit 6de68a5

8 files changed

Lines changed: 406 additions & 8 deletions

.github/copilot-instructions.md

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
1+
2+
3+
4+
# ProForma Notation - Basic Summary
5+
6+
1 - Never write summary documentation unless specifically asked.
7+
2 - Check the Makefile for commands.
8+
9+
## Documentation & Comments
10+
11+
### Docstring Format
12+
13+
Use **Google-style docstrings** but keep them minimal - type hints handle the rest.
14+
15+
**Simple function:**
16+
```python
17+
def calculate_mass(sequence: str, charge: int = 1) -> float:
18+
    """Calculate the mass-to-charge ratio of a peptide."""
19+
```
20+
21+
**When you need more detail:**
22+
```python
23+
def find_isotopes(mz: float, tolerance: float = 0.01) -> list[Peak]:
24+
    """Find isotopic peaks within the tolerance window.
25+
26+
    Uses a greedy algorithm to identify the most intense peaks first,
27+
    then searches for their isotopic patterns.
28+
    """
29+
```
30+
31+
**Classes:**
32+
```python
33+
class Peptide:
34+
    """Represents a peptide sequence with ProForma modifications."""
35+
```
36+
37+
### What to Document
38+
39+
- **One-line summary** for all public functions/classes
40+
- **Additional details** only when the implementation is non-obvious
41+
- **Don't repeat** what's already in type hints
42+
- **Private functions** (`_name`) can skip docstrings if obvious
43+
44+
### Building Docs
45+
```bash
46+
cd docs
47+
make html
48+
# View at docs/_build/html/index.html
49+
```
50+
51+
See **proforma.schema.json** for the full ProForma 2.0 JSON object specification.
52+
53+
## What is ProForma?
54+
55+
ProForma is a **standardized text notation for representing peptides and proteins with modifications**. It's designed to be both human-readable and machine-parsable, allowing scientists to precisely describe modified peptide sequences in mass spectrometry data.
56+
57+
## Core Concept
58+
59+
Think of it as a way to write: **"amino acid sequence + where modifications are located + what those modifications are"**
60+
61+
## Basic Examples
62+
63+
### 1. Simple Unmodified Peptide
64+
```
65+
PEPTIDE
66+
```
67+
Just amino acids using standard one-letter codes (A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y)
68+
69+
### 2. Peptide with Modification
70+
```
71+
PEM[Oxidation]TIDE
72+
```
73+
- Methionine (M) is oxidized
74+
- Modifications go in square brackets `[]` right after the modified amino acid
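
The bracket rule above can be sketched in Python. This is a minimal illustration, assuming single-letter residues and non-nested `[...]` tags only; `split_residues` is a hypothetical helper name, not part of this repository's API.

```python
from __future__ import annotations

import re

# One uppercase residue, optionally followed by a single [..] tag.
# Simplified: no nested brackets, no terminal, labile, or global modifications.
_RESIDUE = re.compile(r"([A-Z])(?:\[([^\[\]]+)\])?")


def split_residues(proforma: str) -> list[tuple[str, str | None]]:
    """Return (residue, modification-or-None) pairs for a simple ProForma string."""
    pairs = []
    pos = 0
    while pos < len(proforma):
        match = _RESIDUE.match(proforma, pos)
        if match is None:
            raise ValueError(f"unexpected character at index {pos}: {proforma[pos]!r}")
        pairs.append((match.group(1), match.group(2)))
        pos = match.end()
    return pairs


for residue, mod in split_residues("PEM[Oxidation]TIDE"):
    print(residue, mod)
```

Each modification stays attached to the residue immediately before it, which is the only positional information this simplified form needs.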
75+
76+
### 3. Multiple Modifications
77+
```
78+
PEM[Oxidation]TIS[Phospho]DE
79+
```
80+
- M is oxidized
81+
- S is phosphorylated
82+
83+
### 4. Terminal Modifications
84+
```
85+
[Acetyl]-PEPTIDE
86+
[iTRAQ4plex]-PEPTIDE-[Amidated]
87+
```
88+
- N-terminal modifications: `[mod]-` before sequence
89+
- C-terminal modifications: `-[mod]` after sequence
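
A rough Python sketch of how the terminal forms above might be separated from the residue body. It assumes at most one tag per terminus and no nested brackets; `split_termini` is a hypothetical name used only for illustration.

```python
import re

# Optional "[mod]-" prefix, the residue body, optional "-[mod]" suffix.
_TERMINI = re.compile(r"^(?:\[([^\[\]]+)\]-)?(.*?)(?:-\[([^\[\]]+)\])?$")


def split_termini(proforma):
    """Return (n_term_mod, body, c_term_mod); a missing terminus is None."""
    match = _TERMINI.match(proforma)
    if match is None:
        raise ValueError(f"cannot split termini of {proforma!r}")
    return match.groups()


print(split_termini("[iTRAQ4plex]-PEPTIDE-[Amidated]"))
```

The body returned here can still contain residue-level tags such as `M[Oxidation]`; only the hyphen-delimited terminal tags are stripped.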
90+
91+
## Ways to Specify Modifications
92+
93+
ProForma supports multiple ways to describe the same modification:
94+
95+
```
96+
EM[Oxidation]TIDE # By name (Unimod)
97+
EM[UNIMOD:35]TIDE # By accession number
98+
EM[+15.995]TIDE # By mass change
99+
EM[Formula:O]TIDE # By chemical formula
100+
```
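
As an illustration of the four forms listed above, the following Python sketch guesses which form a tag uses from its text alone. The function name and the returned labels are assumptions made for this example; tags with any other prefix (for example `Glycan:`) are reported as `"other"` rather than interpreted.

```python
import re


def classify_modification(tag: str) -> str:
    """Classify the text inside [...] as formula, mass, accession, name, or other."""
    if tag.startswith("Formula:"):
        return "formula"
    if re.fullmatch(r"[+-]\d+(?:\.\d+)?", tag):
        return "mass"  # signed mass delta such as +15.995
    if re.fullmatch(r"[A-Za-z][A-Za-z-]*:\d+", tag):
        return "accession"  # controlled-vocabulary ID such as UNIMOD:35
    if ":" in tag:
        return "other"  # prefixed forms this sketch does not handle
    return "name"


for tag in ("Oxidation", "UNIMOD:35", "+15.995", "Formula:O"):
    print(tag, "->", classify_modification(tag))
```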
101+
102+
## Key Advanced Features
103+
104+
### Ambiguous Modification Position
105+
When you know a modification exists but not exactly where:
106+
```
107+
[Phospho]?PEPTIDE # Phospho is somewhere, location unknown
108+
```
109+
110+
### Multiple Possible Sites
111+
```
112+
PEPT[Phospho#g1]IS[#g1]DE # Phospho is on either T or S
113+
```
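
One way to read the `#label` groups is to collect every residue position that shares a label. The Python sketch below does that under simplifying assumptions: non-nested brackets, and any tag containing `#` is treated as a group member (so cross-link labels such as `#XL1` would be collected too). `candidate_sites` is a hypothetical helper, not the project's parser.

```python
from __future__ import annotations

from collections import defaultdict


def candidate_sites(proforma: str) -> dict[str, tuple[str | None, list[int]]]:
    """Map each group label to (modification name, 0-based candidate residue indices)."""
    names: dict[str, str] = {}
    sites: dict[str, list[int]] = defaultdict(list)
    index = -1  # index of the most recently seen residue
    pos = 0
    while pos < len(proforma):
        if proforma[pos] == "[":
            end = proforma.index("]", pos)
            name, sep, label = proforma[pos + 1:end].partition("#")
            if sep:  # tag carries a group label
                sites[label].append(index)
                if name:  # only the first tag of a group names the modification
                    names[label] = name
            pos = end + 1
        else:
            index += 1
            pos += 1
    return {label: (names.get(label), idx) for label, idx in sites.items()}


print(candidate_sites("PEPT[Phospho#g1]IS[#g1]DE"))
```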
114+
115+
### Labile Modifications
116+
Modifications that fall off during fragmentation:
117+
```
118+
{Glycan:Hex}PEPTIDE # Glycan present but lost in MS2
119+
```
120+
121+
### Cross-linked Peptides
122+
Cross-linked peptides are partially handled at the parsing level but will not be implemented in the codebase. Don't worry about this too much.
123+
```
124+
PEPTK[#XL1]IDE//SEQK[#XL1] # Two peptides linked together
125+
```
126+
127+
### Chimeric Spectra
128+
Multiple peptides in the same spectrum.
129+
Chimeric spectra are partially handled at the parsing level but will not be implemented in the codebase. Don't worry about this too much.
130+
```
131+
PEPTIDE+SEQUENCE # Two co-eluting peptides
132+
```
133+
134+
### Charge States
135+
```
136+
PEPTIDE/2 # Charge state +2
137+
```
138+
139+
### Charge Adducts
140+
```
141+
PEPTIDE/[Na+:z+1] # Sodium adduct with +1 charge
142+
PEPTIDE/[Na+:z+1^2] # Sodium adduct added 2 times (total charge: +2)
143+
EPT[Formula:Zn:z+2]IDE/[Na:z+1^2] # Total charge: +4 (Zn +2, plus 2 x Na +1)
144+
145+
```
146+
147+
A charge state and a charge adduct cannot be specified together.
148+
149+
```
150+
PEPTIDE/[Na+:z+1^2] # Sodium adduct with +1 charge, added 2 times
151+
```
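
The either/or rule for the part after `/` can be sketched in Python as follows. This is a hedged illustration: the adduct pattern is inferred only from the examples above (`[formula:z+N]` with an optional `^count`), `total_charge` is a hypothetical name, and charged modifications inside the sequence (such as `Formula:Zn:z+2`) and the `//` cross-link separator are not handled.

```python
from __future__ import annotations

import re

# Adduct form inferred from the examples: [Na+:z+1] or [Na+:z+1^2].
_ADDUCT = re.compile(
    r"^\[(?P<formula>[^:\]]+):z(?P<charge>[+-]\d+)(?:\^(?P<count>\d+))?\]$"
)


def total_charge(proforma: str) -> int | None:
    """Return the charge implied by the suffix after '/', or None if there is none."""
    _, sep, suffix = proforma.rpartition("/")
    if not sep:
        return None
    if re.fullmatch(r"-?\d+", suffix):
        return int(suffix)  # plain charge state, e.g. PEPTIDE/2
    match = _ADDUCT.match(suffix)
    if match is None:
        raise ValueError(f"unrecognised charge suffix: {suffix!r}")
    # Charge per adduct times the number of times it is added.
    return int(match["charge"]) * int(match["count"] or 1)


for example in ("PEPTIDE/2", "PEPTIDE/[Na+:z+1]", "PEPTIDE/[Na+:z+1^2]"):
    print(example, total_charge(example))
```

Because the suffix is matched as either an integer or a single adduct tag, a string that tries to combine both forms is rejected with a `ValueError`.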
152+
153+
154+
155+
## Compliance Levels
156+
157+
ProForma has different levels of complexity:
158+
159+
1. **Base-ProForma** - Simple sequences with basic modifications
160+
2. **Level 2-ProForma** - Adds ambiguity, formulas, delta masses
161+
3. **Extensions** - Specialized features for:
162+
- Top-down proteomics
163+
- Cross-linking
164+
- Glycoproteomics
165+
- Advanced complexity
166+
167+
## Common Use Cases
168+
169+
### Bottom-up Proteomics
170+
```
171+
[Acetyl]-EM[Oxidation]EVTSES[Phospho]PEK
172+
```
173+
Typical tryptic peptide with PTMs
174+
175+
### Top-down Proteomics
176+
```
177+
<[Oxidation]@M>FULLPROTEINSEQUENCE...
178+
```
179+
Full protein with fixed modifications
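
To make the fixed-modification prefix concrete, here is a small Python sketch that expands `<[mod]@X>` onto every matching residue. It assumes the remaining sequence contains plain residues only (no other bracketed tags) and that multiple targets, if present, are comma-separated; `apply_global_modification` is a hypothetical helper for illustration.

```python
import re

# Prefix form: <[Mod]@M> or, assumed here, <[Mod]@C,M> for several targets.
_GLOBAL = re.compile(r"^<\[([^\[\]]+)\]@([A-Z](?:,[A-Z])*)>")


def apply_global_modification(proforma: str) -> str:
    """Rewrite a global fixed modification as explicit per-residue tags."""
    match = _GLOBAL.match(proforma)
    if match is None:
        return proforma  # no global prefix, nothing to expand
    mod = match.group(1)
    targets = set(match.group(2).split(","))
    body = proforma[match.end():]
    # Assumes body holds plain residues only; letters inside tags would be altered.
    return "".join(f"{aa}[{mod}]" if aa in targets else aa for aa in body)


print(apply_global_modification("<[Oxidation]@M>MEM"))
```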
180+
181+
### Glycopeptide
182+
```
183+
NEEYN[Glycan:Hex5HexNAc4]K
184+
```
185+
N-glycosylation site
186+
187+
### Cross-linking
188+
```
189+
PEPTK[XLMOD:02001#XL1]IDE//SEQK[#XL1]
190+
```
191+
DSS cross-link between two lysines
192+
193+
## Why ProForma?
194+
195+
**Before ProForma:** Everyone used different formats to describe modified peptides
196+
- Hard to share data
197+
- Hard to write software that works with different tools
198+
- Ambiguous representations
199+
200+
**With ProForma:** Standard notation means:
201+
- Data can be easily exchanged between labs
202+
- Software tools can interoperate
203+
- Unambiguous communication of results
204+
- Integration with databases (Unimod, PSI-MOD, etc.)
205+
206+
## Key Design Principles
207+
208+
1. **Human readable** - Scientists can read and understand it
209+
2. **Machine parsable** - Software can reliably parse it
210+
3. **Extensible** - Can add new features as needs evolve
211+
4. **Precise** - Captures uncertainty and ambiguity when present
212+
5. **Standards-based** - Uses controlled vocabularies (Unimod, PSI-MOD, etc.)
213+
214+

.github/workflows/draft-pdf.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
name: Draft PDF
2+
on:
3+
  push:
4+
    branches:
5+
      - paper
6+
    paths:
7+
      - 'paper/**'
8+
  workflow_dispatch:
9+
jobs:
10+
  paper:
11+
    runs-on: ubuntu-latest
12+
    name: Paper Draft
13+
    steps:
14+
      - name: Checkout
15+
        uses: actions/checkout@v4
16+
      - name: Build draft PDF
17+
        uses: openjournals/openjournals-draft-action@master
18+
        with:
19+
          journal: joss
20+
          # This should be the path to the paper within your repo.
21+
          paper-path: paper/paper.md
22+
      - name: Upload
23+
        uses: actions/upload-artifact@v4
24+
        with:
25+
          name: paper
26+
          # This is the output path where Pandoc will write the compiled
27+
          # PDF. Note, this should be the same directory as the input
28+
          # paper.md
29+
          path: paper/paper.pdf
Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,53 @@
1+
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
2+
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
3+
4+
name: Python package
5+
6+
on:
7+
  push:
8+
    paths:
9+
      - 'src/**'
10+
      - 'tests/**'
11+
  workflow_dispatch:
12+
13+
jobs:
14+
  build:
15+
16+
    runs-on: ubuntu-latest
17+
    strategy:
18+
      fail-fast: false
19+
      matrix:
20+
        python-version: ["3.12"]
21+
22+
    steps:
23+
      - uses: actions/checkout@v4
24+
      - name: Set up Python ${{ matrix.python-version }}
25+
        uses: actions/setup-python@v5
26+
        with:
27+
          python-version: ${{ matrix.python-version }}
28+
      - name: Install uv
29+
        uses: astral-sh/setup-uv@v4
30+
      - name: Install just
31+
        uses: extractions/setup-just@v2
32+
      - name: Install dependencies
33+
        run: just install-all
34+
      - name: Lint with ruff
35+
        run: just lint
36+
      - name: Type check with ty
37+
        run: just check
38+
      - name: Test with pytest
39+
        run: just test-cov codecov-tests
40+
      - name: Upload coverage reports to Codecov
41+
        uses: codecov/codecov-action@v5
42+
        with:
43+
          token: ${{ secrets.CODECOV_TOKEN }}
44+
          slug: tacular-omics/paftacular
45+
          fail_ci_if_error: false
46+
      - name: Upload test results to Codecov
47+
        if: ${{ !cancelled() }}
48+
        uses: codecov/codecov-action@v5
49+
        with:
50+
          token: ${{ secrets.CODECOV_TOKEN }}
51+
          slug: tacular-omics/paftacular
52+
          report_type: test_results
53+
          fail_ci_if_error: false
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
# This workflow will upload a Python Package using Twine when a release is created
2+
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
3+
4+
# This workflow uses actions that are not certified by GitHub.
5+
# They are provided by a third-party and are governed by
6+
# separate terms of service, privacy policy, and support
7+
# documentation.
8+
9+
name: Upload Python Package
10+
11+
on:
12+
  release:
13+
    types: [published]
14+
15+
permissions:
16+
  contents: read
17+
18+
jobs:
19+
  deploy:
20+
21+
    runs-on: ubuntu-latest
22+
23+
    steps:
24+
      - uses: actions/checkout@v4
25+
      - name: Set up Python
26+
        uses: actions/setup-python@v5
27+
        with:
28+
          python-version: '3.x'
29+
      - name: Install uv
30+
        uses: astral-sh/setup-uv@v4
31+
      - name: Build package with uv
32+
        run: uv build
33+
      - name: Publish package
34+
        uses: pypa/gh-action-pypi-publish@release/v1
35+
        with:
36+
          user: __token__
37+
          password: ${{ secrets.PYPI_API_TOKEN }}

.gitignore

Lines changed: 52 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,47 @@
1+
# Python-generated files
2+
__pycache__/
3+
*.py[oc]
4+
build/
5+
dist/
6+
wheels/
7+
*.egg-info
8+
9+
# IDEs and editors
10+
.idea/
11+
.vscode/
12+
.zed/
13+
14+
# Virtual environments
15+
.venv
16+
17+
docs/_build/*
18+
19+
# Testing and coverage reports
20+
try*.py
21+
.coverage
22+
.ruff_cache/
23+
.mypy_cache/
24+
.pytest_cache/
25+
.ipynb_checkpoints/
26+
htmlcov/
27+
uv.lock
28+
29+
# Other
30+
coverage.xml
31+
htmlcov/
32+
.coverage
33+
*.cover
34+
.pytest_cache/
35+
junit.xml
36+
37+
38+
*GNOme.obo
39+
*PSI-MOD.obo
40+
*UNIMOD.obo
41+
*XLMod.obo
42+
43+
*try_*.py
44+
1 45
# Byte-compiled / optimized / DLL files
2 46
__pycache__/
3 47
*.py[cod]
@@ -113,3 +157,11 @@ dmypy.json
113 157
# editors
114 158
.vscode/
115 159
.idea/
160+
161+
# Other
162+
coverage.xml
163+
htmlcov/
164+
.coverage
165+
*.cover
166+
.pytest_cache/
167+
junit.xml
