Simultaneous Batch Correction and DESeq Comparisons between Cell Subsets across Batches #376

@Willt1128

Hello, I am trying to use PyDESeq2 with scRNA-seq data from 4 separate sequencing batches. I want to compare individual cell types across batches; each cell type has 3 replicates per batch, obtained by pseudobulking data from 3 separate organoids per batch. I also want to correct for batch effects, but I am finding that I cannot do both batch correction and statistical comparisons between cell types across batches.
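For readers unfamiliar with the setup, here is a minimal sketch of the pseudobulking step described above: raw counts are summed over all cells that share a batch, organoid, and cell type, giving one sample per combination. The cell records, organoid labels, gene names, and the `pseudobulk` helper are hypothetical placeholders for illustration, not my actual data or pipeline.

```python
from collections import defaultdict

# Hypothetical per-cell records: (batch, organoid, cell_type, raw counts per gene).
cells = [
    ("D250WT", "org1", "astrocytes", {"GFAP": 5, "SOX2": 1}),
    ("D250WT", "org1", "astrocytes", {"GFAP": 3, "SOX2": 0}),
    ("D250WT", "org1", "neurons", {"GFAP": 0, "SOX2": 2}),
    ("D250WT", "org2", "astrocytes", {"GFAP": 4, "SOX2": 1}),
]


def pseudobulk(cells):
    """Sum raw counts over all cells sharing (batch, organoid, cell_type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for batch, organoid, cell_type, counts in cells:
        sample = (batch, organoid, cell_type)
        for gene, n in counts.items():
            totals[sample][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}


samples = pseudobulk(cells)
for sample, genes in samples.items():
    print(sample, genes)
```

Each key of `samples` corresponds to one pseudobulked replicate, and the summed values remain raw integer counts.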

From other forum posts regarding DESeq2 and PyDESeq2, I understand that batch effects must be accounted for by including batch in one's design (e.g., design = "~batch + cell_type"). I also understand that batch correction cannot be done before running PyDESeq2 because it requires the raw counts data as an input.

One thought I had was to add a column to adata.obs called comparison_group, which combines batch and cell type (e.g., for astrocytes in batch D250WT, "D250WT_astrocytes"). I could then run PyDESeq2 with either design = "~comparison_group" or design = "~batch + comparison_group". Unsurprisingly, design = "~batch + comparison_group" produces a Singular Matrix error, so DESeq cannot complete. Using design = "~comparison_group" allows DESeq to run, but I am concerned that this is not equivalent to modeling 'batch' as a covariate in the design, given that there are many different cell types within each batch.
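To illustrate why I think the Singular Matrix error appears, here is a minimal pure-Python sketch that builds the design matrices by hand and checks their rank. It assumes treatment (drop-first) dummy coding with an intercept, which may not match exactly what PyDESeq2 does internally; the batch name "BatchB", the two cell types, and all helper functions are hypothetical.

```python
from fractions import Fraction


def dummy_columns(values, prefix):
    """Treatment-code a factor: one 0/1 column per level, dropping the first level."""
    levels = sorted(set(values))
    return {f"{prefix}[{lvl}]": [int(v == lvl) for v in values] for lvl in levels[1:]}


def design_matrix(factors):
    """Build an intercept plus dummy columns for each factor (name -> values)."""
    n = len(next(iter(factors.values())))
    cols = {"Intercept": [1] * n}
    for name, values in factors.items():
        cols.update(dummy_columns(values, name))
    return cols


def rank(cols):
    """Matrix rank via exact Gaussian elimination (no floating-point error)."""
    rows = [[Fraction(x) for x in row] for row in zip(*cols.values())]
    r = 0
    for c in range(len(cols)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # this column depends on earlier ones
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


# Hypothetical layout: 2 batches x 2 cell types x 3 pseudobulk replicates.
batches = ["D250WT", "BatchB"]
cell_types = ["astrocytes", "neurons"]
batch = [b for b in batches for ct in cell_types for _ in range(3)]
cell_type = [ct for b in batches for ct in cell_types for _ in range(3)]
comparison_group = [f"{b}_{ct}" for b, ct in zip(batch, cell_type)]

designs = {
    "~batch + cell_type": design_matrix({"batch": batch, "cell_type": cell_type}),
    "~batch + comparison_group": design_matrix(
        {"batch": batch, "comparison_group": comparison_group}
    ),
    "~comparison_group": design_matrix({"comparison_group": comparison_group}),
}
for formula, cols in designs.items():
    print(formula, "columns:", len(cols), "rank:", rank(cols))
```

In this toy layout, "~batch + comparison_group" has 5 columns but rank 4, because the batch indicator equals the sum of the comparison_group indicators belonging to that batch. The other two designs are full rank.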

I also considered subsetting adata before running DESeq so that it contains only the pseudobulked samples of one cell type, then running DESeq separately for each cell type. I don't believe this is a good solution, because the effectiveness of the batch correction would then depend on the batch effect being homogeneous across cell types.

Does anyone know how I can do an effective batch correction while also doing a DESeq (including statistical comparisons) between specific cell types (i.e., subsets of each batch) across different batches?

Please let me know if anything needs clarification. In case it is helpful, I have included an example screenshot of my metadata for 2 batches. Thank you very much.

[Image: screenshot of example metadata for 2 batches]
