
Update data model, regex, grammar, and JSON schema and add graphic generation code for isotopic fine structure #18

Merged
mobiusklein merged 5 commits into main from feature/isotopic_fine_structure
Mar 28, 2025

@mobiusklein mobiusklein commented Mar 6, 2025

This covers the discussion from the previous two sessions on isotopic fine structure support. It also adds notebook code for generating a figure to demonstrate the effect of resolution on peak shape and isotopic fine structure decomposition.

image

Current TODOs:

  • Finish the JSON serialization and deserialization logic for isotopic variants
  • Revisit the topic of isotopic decomposition or otherwise abdicate responsibility for defining how to compute fine structure average weighting for +xiA

@mobiusklein mobiusklein marked this pull request as ready for review March 7, 2025 01:55
@mobiusklein mobiusklein requested a review from edeutsch March 7, 2025 01:55

mobiusklein commented Mar 7, 2025

@douweschulte this implements the annotation data model and parsing. If you want to go into how you think the calculation should be done, go ahead.

Otherwise this should be ready to go.


edeutsch commented Mar 7, 2025

The figure looks excellent! Are you able to compute and annotate the difference in ppm between the centroid and 13C for the resolving power 50000 peak?


@edeutsch edeutsch left a comment

I think I'd suggest allowing any amino acid letter A-Z, but otherwise great, thanks!

Comment thread specification/grammars/annotation.lark Outdated
@@ -38,6 +38,14 @@ NUMBER : DIGIT ("." (DIGIT)+)?

AMINO_ACID : "A" | "R" | "N" | "D" | "C" | "E" | "Q" | "G" | "H" | "K" | "M" | "F" | "P" | "S" | "T" | "W" | "Y" | "V" | "I" | "L"

Is there a strong reason to limit ourselves to these? The next most common would be selenocysteine (U). Plenty of human proteins have selenocysteine, e.g.:
https://www.uniprot.org/uniprotkb/P07203/entry#sequences
UWPR also defines O:
https://proteomicsresource.washington.edu/protocols06/masses.php
J is defined as I or L:
https://www.bioinformatics.org/sms2/iupac.html
for which you can calculate an exact mass.
One could argue that the rest you can't calculate a mass for because they're ambiguous.
But still, why not allow them?
Oh, I think ProForma formally encourages X to have a 0 mass and you can do X[+123.0222] to specify some other artificial amino acid (of which there are plenty?) that might be common in synthetic peptides.

Why not just allow all A-Z?
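
A minimal sketch of what the suggestion would change, written with Python's `re` module rather than the repository's actual regex or Lark grammar. The pattern names `CANONICAL` and `ANY_LETTER` and the helper `accepted` are hypothetical and exist only for this illustration; `CANONICAL` mirrors the 20 letters in the `AMINO_ACID` terminal quoted above.

```python
import re

# The 20 residues listed in the AMINO_ACID terminal quoted in this review thread.
CANONICAL = re.compile(r"[ARNDCEQGHKMFPSTWYVIL]")
# Hypothetical relaxed alternative proposed in the comment: any letter A-Z.
ANY_LETTER = re.compile(r"[A-Z]")


def accepted(pattern, residue):
    """Return True if the single-letter residue code matches the pattern."""
    return pattern.fullmatch(residue) is not None


# U (selenocysteine), O, J (I or L), and X are the codes raised in the comment.
for residue in "UOJX":
    print(residue, accepted(CANONICAL, residue), accepted(ANY_LETTER, residue))
```

Under the restricted terminal, a sequence containing U, O, J, or X would fail to parse at all, regardless of whether a mass can be computed for it.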

@edeutsch edeutsch requested a review from henryhlam March 7, 2025 15:45
@mobiusklein

I missed the first comment and the question in it.

In this example, the $\delta\ \text{m/z}$ at 50,000 resolution between iA and i13C is 0.0002 Da. For larger molecules and different compositions, the $\delta$ will change.

For instance, if you were to use a different composition like C23H14Cl2N6O8S2 instead, you'd get:

image

which has a $\delta\ \text{m/z}$ of -0.0004 Da for iA vs i13C.

image
image

So the difference is vanishingly small, but in specific scenarios at high resolution there is information available that would be lost otherwise. Leaving the computation of $\delta$ as "an exercise for the reader" is a little frustrating because it makes it hard to know that you are using exactly the same number the writer used, but it is also impractical to encode.

Of course, if you're using a text encoding of a floating point number, assuming everything beyond the fourth or fifth significant figure is noise is pragmatic and is unlikely to make a substantial difference in any case.
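
For readers wondering what such an "exercise" might look like, here is one possible reading, assuming the averaged peak is the intensity-weighted mean m/z of the unresolved fine-structure peaks. This PR does not define that calculation, so the function name `weighted_average_mz` and the peak list are hypothetical, and the intensities are placeholders rather than computed abundances.

```python
def weighted_average_mz(peaks):
    """Intensity-weighted mean m/z of (mz, intensity) pairs."""
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return sum(mz * intensity for mz, intensity in peaks) / total


# Hypothetical fine-structure peaks as (m/z, relative intensity).
# The intensities are placeholders for illustration, not real abundances.
fine_structure = [(512.1729, 2.0), (512.1792, 20.0), (512.1821, 0.1)]

average = weighted_average_mz(fine_structure)
delta = average - 512.1792  # offset of the averaged peak from the dominant peak
print(f"average m/z = {average:.4f}, delta = {delta:.6f}")
```

Which peaks get merged depends on the resolving power, so two writers using different assumptions would arrive at slightly different averages, which is the reproducibility concern raised above.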

@mobiusklein

Sleep deprivation, request for clarification: did you want the change in m/z between the average peak and the isotopic peak to be included in the figure directly? It'll be ugly, I'd rather show that as a table. Too much line noise in small text disturbing the smooth beauty of the spectrum itself.

| Resolution | Isotope | Delta m/z | Average peak m/z | Isotope m/z |
|---|---|---|---|---|
| 50000 | $^{13}$C | -0.00022 | 512.1789 | 512.1792 |
| 50000 | $^{2}$H | -0.00315 | 512.1789 | 512.1821 |
| 50000 | $^{15}$N | 0.006096 | 512.1789 | 512.1729 |
| 50000 | $^{17}$O | -0.00109 | 512.1789 | 512.18 |
| 50000 | $^{33}$S | 0.003743 | 512.1789 | 512.1752 |
| 100000 | $^{13}$C | -8.60E-06 | 512.1792 | 512.1792 |
| 100000 | $^{2}$H | -0.00293 | 512.1792 | 512.1821 |
| 100000 | $^{15}$N | 0.006311 | 512.1792 | 512.1729 |
| 100000 | $^{17}$O | -0.00087 | 512.1792 | 512.18 |
| 100000 | $^{33}$S | 0.003958 | 512.1792 | 512.1752 |
| 200000 | $^{13}$C | 5.17E-06 | 512.1792 | 512.1792 |
| 200000 | $^{2}$H | -0.00292 | 512.1792 | 512.1821 |
| 200000 | $^{15}$N | 8.37E-05 | 512.1729 | 512.1729 |
| 200000 | $^{17}$O | -0.00086 | 512.1792 | 512.18 |
| 200000 | $^{33}$S | -0.00227 | 512.1729 | 512.1752 |
| 1000000 | $^{13}$C | 1.49E-09 | 512.1792 | 512.1792 |
| 1000000 | $^{2}$H | -0.00292 | 512.1792 | 512.1821 |
| 1000000 | $^{15}$N | -1.02E-10 | 512.1729 | 512.1729 |
| 1000000 | $^{17}$O | -0.00086 | 512.1792 | 512.18 |
| 1000000 | $^{33}$S | -0.00235 | 512.1729 | 512.1752 |
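
Since the original question asked for the difference in ppm, the following sketch converts the Delta m/z column to ppm for the 50000 resolution rows. The helper name `to_ppm` is hypothetical, and the choice of the isotope m/z as the reference in the denominator is an assumption; the inputs are the rounded values from the table above.

```python
def to_ppm(delta_mz, reference_mz):
    """Express an m/z difference in parts per million of a reference m/z."""
    return delta_mz / reference_mz * 1e6


# (isotope, delta m/z, isotope m/z) copied from the 50000 resolution rows.
rows_50k = [
    ("13C", -0.00022, 512.1792),
    ("2H", -0.00315, 512.1821),
    ("15N", 0.006096, 512.1729),
    ("17O", -0.00109, 512.18),
    ("33S", 0.003743, 512.1752),
]

for isotope, delta_mz, isotope_mz in rows_50k:
    print(f"{isotope:>3}: {to_ppm(delta_mz, isotope_mz):+.2f} ppm")
```

For the $^{13}$C row this comes out a little under half a ppm in magnitude, which matches the "vanishingly small" description earlier in the thread.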


edeutsch commented Mar 10, 2025

I was just thinking of something like this:
image

which I think illustrates the approximate scale of the difference between +i and +iA for the specific molecule selected as an example (which might be considered typical of a ~y6+i fragment of a peptide?)

Comment thread specification/annotation-schema.json Outdated
Comment thread specification/grammars/annotation.lark Outdated
@mobiusklein

image

@mobiusklein

image

@edeutsch

Hi @mobiusklein, it might be stacking the deck a bit with extra sulfurs and nitrogens, but how about something like y4{CMNR}, which I think is something like C18H35N8O6S2? I wonder if that delta may be as much as 1 ppm?

@mobiusklein

I had been planning something a bit less extreme, taken from the IgG peptide GLEWVAVMSYNGNNK:
image

Here's your y ion:
image

I added a variant of the annotation code that relies less on magic numbers. This demonstrates nicely that the same resolving power gets you narrower peaks at a smaller m/z than at a larger m/z. I will address the notes about the JSON schema before pushing the code changes.
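
The peak-width observation follows from the usual definition of resolving power as $R = (\text{m/z}) / \text{FWHM}$. A minimal sketch, assuming that definition and a resolving power that stays constant across the m/z range (a simplification that the notebook code may not share); the function name `fwhm` and the example m/z values other than 512.1792 are hypothetical.

```python
def fwhm(mz, resolving_power):
    """Peak full width at half maximum, assuming R = (m/z) / FWHM."""
    return mz / resolving_power


# Same resolving power, different m/z: the peak widens in proportion to m/z.
for mz in (256.0, 512.1792, 1024.0):
    print(f"m/z {mz:>9.4f}: FWHM = {fwhm(mz, 50_000):.5f}")
```

Under this assumption the width at m/z 512.1792 and a resolving power of 50000 is about 0.0102, which is wider than the spacing between several of the isotope m/z values tabulated earlier in the thread.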

@edeutsch

great, thanks. I'd favor the more extreme so that it's clearer that it can make a substantial difference, but whatever you prefer is fine. thanks!

@mobiusklein mobiusklein merged commit 3ad9e36 into main Mar 28, 2025
12 checks passed
4 participants