Deprecate get_most_severe_consequence_for_summary in favor of more flexible get_most_severe_csq_from_multiple_csq_lists #714
- Allow lof_flags to be missing in addition to lof_flags == "" for lof == "HC" to have no flag penalty - Pass `csq_order` to `add_most_severe_consequence_to_consequence`
…ble to use in `default_generate_gene_lof_matrix`
… into jg/fix_process_consequences
…nsequence_to_consequence more flexible
…sq_lists and make use of it in process_consequences
… into jg/fix_process_consequences # Conflicts: # gnomad/utils/vep.py
…s` to `filter_to_most_severe_consequences` and clean them up
…f `get_most_severe_consequence_for_summary`
…ttps://github.com/broadinstitute/gnomad_methods into jg/deprecate_get_most_severe_consequence_for_summary
…ttps://github.com/broadinstitute/gnomad_methods into jg/deprecate_get_most_severe_consequence_for_summary
… if it's empty after filtering
- most_severe_consequence: Most severe consequence for variant.
- lof: Whether the variant is a loss-of-function variant.
- no_lof_flags: Whether the variant has any LOFTEE flags (True if no flags).
this returns either True or None right? why not True or False?
I made some changes, so I'm not sure what the result was before, but now in my test on 2 partitions of the exomes result HT, I get {False: 21, True: 580, None: 78785}. Let me know if you are still seeing no False
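For anyone following this thread, here is a minimal plain-Python sketch of how a three-valued `no_lof_flags` could produce True, False, and None counts like the ones above. It assumes the value is missing when there is no LOFTEE `lof` annotation, and that a missing or empty `lof_flags` counts as "no flags" (the latter is taken from the first commit message). The helper name `no_lof_flags_value` is hypothetical and is not the Hail code in `gnomad/utils/vep.py`.

```python
from collections import Counter
from typing import Optional


def no_lof_flags_value(lof: Optional[str], lof_flags: Optional[str]) -> Optional[bool]:
    """Hypothetical stand-in for the no_lof_flags field (assumed semantics)."""
    # Assumption: without a LOFTEE annotation there is nothing to flag.
    if lof is None:
        return None
    # A missing or empty lof_flags both count as "no flags".
    return lof_flags is None or lof_flags == ""


# Toy rows of (lof, lof_flags); the values are illustrative only.
rows = [
    ("HC", None),
    ("HC", ""),
    ("HC", "SINGLE_EXON"),
    (None, None),
]
counts = Counter(no_lof_flags_value(lof, flags) for lof, flags in rows)
```

With these semantics, False only appears when a LOFTEE annotation exists and carries at least one flag, which may be why it is rare in a small test.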
tests/utils/test_vep.py
Outdated
)

# Test the function
result = get_most_severe_csq_from_multiple_csq_lists(vep_expr)
Also tested with prioritize_loftee_no_flags set to True and to False. Does it make sense that I got the same result regardless of how this param was set?
With the current vep_expr, yes. I added some changes to it, and a test for it
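A rough plain-Python sketch of why a prioritization flag can have no visible effect on some inputs. It assumes that prioritizing "no flags" means preferring unflagged consequences when any exist and otherwise falling back to all of them. `pick_most_severe`, the dict keys, and the two-term severity list are hypothetical and only illustrate the idea, not the actual Hail logic.

```python
from typing import List, Optional

# Illustrative severity order, most severe first (a small assumed subset).
SEVERITY = ["stop_gained", "frameshift_variant"]


def pick_most_severe(csqs: List[dict], prioritize_no_flags: bool) -> Optional[str]:
    """Return the most severe term, optionally preferring unflagged entries."""
    candidates = csqs
    if prioritize_no_flags:
        # Prefer entries without flags; fall back to everything if none exist.
        unflagged = [c for c in csqs if c["no_lof_flags"] is True]
        candidates = unflagged or csqs
    if not candidates:
        return None
    return min(candidates, key=lambda c: SEVERITY.index(c["csq"]))["csq"]


# All entries are unflagged, so the flag cannot change the outcome.
uniform = [
    {"csq": "stop_gained", "no_lof_flags": True},
    {"csq": "frameshift_variant", "no_lof_flags": True},
]
# The more severe entry is flagged, so the flag changes the outcome.
mixed = [
    {"csq": "stop_gained", "no_lof_flags": False},
    {"csq": "frameshift_variant", "no_lof_flags": True},
]
```

Under these assumptions the parameter only matters for inputs like `mixed`, so a test expression needs at least one flagged and one unflagged consequence to exercise it.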
)

# Build the case expression to determine the most severe consequence.
ms_csq_expr = hl.case(missing_false=True)
why use case expressions?
Is there a reason not to? If you have a clearer solution, I'm happy to make modifications, just let me know what you're thinking.
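As background for this question, a case chain evaluates its conditions in order and returns the value attached to the first one that holds. The plain-Python analog below sketches that behavior, including the `missing_false=True` idea of treating a missing (None) condition as False. `first_match` and the field names are hypothetical, and this is not how Hail implements `hl.case`.

```python
from typing import Callable, List, Optional, Tuple

# A case is a (predicate, value) pair; predicates may return None ("missing").
Case = Tuple[Callable[[dict], Optional[bool]], str]


def first_match(row: dict, cases: List[Case]) -> Optional[str]:
    """Return the value of the first case whose predicate is exactly True."""
    for predicate, value in cases:
        # A None predicate is treated as False, so evaluation moves on.
        if predicate(row) is True:
            return value
    # No case matched: the result is missing.
    return None


# Cases are listed from most to least severe; field names are illustrative.
cases: List[Case] = [
    (lambda r: r.get("has_stop_gained"), "stop_gained"),
    (lambda r: r.get("has_missense"), "missense_variant"),
]

result = first_match({"has_stop_gained": None, "has_missense": True}, cases)
```

Because the order of the cases encodes severity, the first hit is also the most severe term present.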
gnomad/utils/vep.py
Outdated
)

# Initialize the lof struct with missing values.
lof_expr = hl.struct(lof=hl.missing(hl.tstr), no_lof_flags=hl.missing(hl.tbool))
so lof and no_lof_flags will always be None if prioritize_loftee/prioritize_loftee_no_flags are False, even if lof flags are present?
I modified this a bit, let me know if the changes make sense
… into jg/deprecate_get_most_severe_consequence_for_summary
@pytest.mark.parametrize(
    "prioritize_protein_coding, prioritize_loftee, prioritize_loftee_no_flags, additional_order_field, additional_order, expected_most_severe_csq, expected_polyphen_prediction",
    [
        (False, False, False, None, None, None, None),
why only set expected_most_severe_csq if it's "stop_gained"?
else polyphen_prediction
),
)

# Define csq, protein_coding, lof, no_lof_flags, and polyphen_prediction.
additional_order=additional_order,
)

expected_dict = hl.Struct(
how were the values for the expected dict decided?
+ (["lof"] if prioritize_loftee else [])
+ (
    ["no_lof_flags"]
    if prioritize_loftee_no_flags or prioritize_loftee
i find it confusing when no_lof_flags is True in cases where prioritize_loftee is True and prioritize_loftee_no_flags is False and there are flags present
:return: ArrayExpression of the consequences that match the most severe
    consequence.
"""
# Get the dtype of the csq_expr ArrayExpression elements
Suggested change:
- # Get the dtype of the csq_expr ArrayExpression elements
+ # Get the dtype of the csq_expr ArrayExpression elements.
(True, True, False, *polyphen_params, None, "possibly_damaging"),
(True, False, True, *polyphen_params, None, "possibly_damaging"),
(False, True, True, *polyphen_params, "stop_gained", None),
# Need to figure out class too large error
Also adds:
- `filter_to_most_severe_consequences`, which is used by `get_most_severe_csq_from_multiple_csq_lists`
- `loftee_labels` and `no_lof_flags` parameters to `filter_vep_transcript_csqs_expr` for filtering by loftee labels and flags.

Depends on #713
updates_to_get_most_severe_consequence_for_summary.html.zip: testing to make sure the same results are returned for `get_most_severe_consequence_for_summary`
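
To illustrate the idea behind `filter_to_most_severe_consequences`, here is a minimal plain-Python sketch, assuming a consequence list is filtered to the entries that carry its most severe term according to a `csq_order`-style list. The function name `filter_to_most_severe`, the three-term order, the transcript IDs, and the choice to return `None` with an empty list when nothing matches are all assumptions for illustration, not the Hail implementation in this PR.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Illustrative order, most severe first (a small assumed subset).
CSQ_ORDER = ("stop_gained", "missense_variant", "synonymous_variant")


def filter_to_most_severe(
    csqs: List[Dict[str, str]],
    csq_order: Sequence[str] = CSQ_ORDER,
) -> Tuple[Optional[str], List[Dict[str, str]]]:
    """Return the most severe term and the entries that carry it."""
    rank = {term: i for i, term in enumerate(csq_order)}
    # Ignore entries whose term is not in the order at all.
    known = [c for c in csqs if c["most_severe_consequence"] in rank]
    if not known:
        # Assumption: nothing to report when no entry has a known term.
        return None, []
    best = min((c["most_severe_consequence"] for c in known), key=rank.__getitem__)
    return best, [c for c in known if c["most_severe_consequence"] == best]


# Hypothetical transcript consequences.
csqs = [
    {"transcript_id": "T1", "most_severe_consequence": "missense_variant"},
    {"transcript_id": "T2", "most_severe_consequence": "stop_gained"},
    {"transcript_id": "T3", "most_severe_consequence": "stop_gained"},
]
best, kept = filter_to_most_severe(csqs)
```

Returning both the term and the matching entries keeps the per-transcript annotations available for any further prioritization.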