
Explore/fix IterativeCleanup multivalue handling limitation and document #231

@kspurgin


The problem

Initial worksheet to client:

| field_collection_site | site_qualifier | sitenumberorname_orig                        |
|-----------------------+----------------+----------------------------------------------|
| CA-PLA-2907           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-PLA-2908           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-PLA-38?            |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-YUB-5              |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |

Returned worksheet looks like:

| field_collection_site | site_qualifier | sitenumberorname_orig                        |
|-----------------------+----------------+----------------------------------------------|
| CA-PLA-2907           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-PLA-2908           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-PLA-38             | uncertain      | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |
| CA-YUB-5              |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 |

*__returned_compiled looks like:

| field_collection_site | site_qualifier | sitenumberorname_orig                         | corrected                            |
|-----------------------+----------------+-----------------------------------------------+--------------------------------------|
| CA-PLA-2907           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05  |                                      |
| CA-PLA-2908           |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05  |                                      |
| CA-PLA-38             | uncertain      | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05  | field_collection_site|site_qualifier |
| CA-YUB-5              |                | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05  |                                      |

THEN, because *__corrections keeps only rows that actually contain corrections, we get:

| field_collection_site | site_qualifier | sitenumberorname_orig                         | corrected                            |
|-----------------------+----------------+-----------------------------------------------+--------------------------------------|
| CA-PLA-38             | uncertain      | CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05  | field_collection_site|site_qualifier |

*__base_job_cleaned then merges "CA-PLA-38" into all 4 rows of the base job (the job the worksheet is based on), since all four rows share the same sitenumberorname_orig value:

| sitenumberorname_orig                        | field_collection_site | site_qualifier |
|----------------------------------------------+-----------------------+----------------|
| CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 | CA-PLA-38             | uncertain      |
| CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 | CA-PLA-38             | uncertain      |
| CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 | CA-PLA-38             | uncertain      |
| CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05 | CA-PLA-38             | uncertain      |

This is the data returned by *__final and merged into the migration project from there.

We have lost the three other site values (CA-PLA-2907, CA-PLA-2908, and CA-YUB-5).
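
To make the failure mode concrete, here is a minimal Python sketch of the flow above. It is a hypothetical, simplified model, not the IterativeCleanup implementation: the function names only echo the job names, and the assumption that *__base_job_cleaned matches corrections on sitenumberorname_orig is inferred from the tables, not taken from the source.

```python
# Hypothetical, simplified model of the jobs above; not the IterativeCleanup code.
ORIG = "CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05"
SITES = ["CA-PLA-2907", "CA-PLA-2908", "CA-PLA-38?", "CA-YUB-5"]

# Base job: four rows that all share the same multivalued original value.
base_rows = [
    {"sitenumberorname_orig": ORIG, "field_collection_site": site, "site_qualifier": ""}
    for site in SITES
]


def corrections(returned_compiled):
    """Keep only rows that record a correction (like *__corrections)."""
    return [row for row in returned_compiled if row["corrected"]]


def base_job_cleaned(base, corrs, key="sitenumberorname_orig"):
    """Apply each correction to every base row whose key matches (assumed key)."""
    lookup = {corr[key]: corr for corr in corrs}
    cleaned = []
    for row in base:
        new_row = dict(row)
        corr = lookup.get(row[key])
        if corr:
            # :corrected lists the fields the client changed, pipe-delimited
            for field in corr["corrected"].split("|"):
                new_row[field] = corr[field]
        cleaned.append(new_row)
    return cleaned


# Returned worksheet: only the third row was changed by the client.
returned = [dict(row, corrected="") for row in base_rows]
returned[2].update(
    field_collection_site="CA-PLA-38",
    site_qualifier="uncertain",
    corrected="field_collection_site|site_qualifier",
)

final = base_job_cleaned(base_rows, corrections(returned))
print([row["field_collection_site"] for row in final])
# ['CA-PLA-38', 'CA-PLA-38', 'CA-PLA-38', 'CA-PLA-38']
```

Because every base row carries the same multivalued key, the single correction row matches, and overwrites, all of them.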

Potential solutions

Make *__corrections return not just rows with corrections, but the most recent corrected row for each whole-row match

NOPE.

This doesn't change how *__base_job_cleaned merges in corrections based on the :corrected field.

Remind self whether the collate settings might be useful for this

todo

Redo the format of mod.base_job

todo

This is likely what needs to be done. This limitation, and a workaround pattern for it, also need to be documented.
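
One possible shape for a reformatted base job, sketched in Python under stated assumptions: explode the multivalued field so each individual value gets its own row and a unique key, merge corrections on that per-value key, then collapse rows back into one multivalued row per source value. The names clean_key, source_value, explode_base, merge_corrections, and collapse are all hypothetical, and keying on position assumes value order is stable between iterations.

```python
# Hypothetical workaround sketch: one row per individual value, keyed per value.
DELIM = "|"
FIELDS = ("field_collection_site", "site_qualifier")


def explode_base(rows, field="sitenumberorname_orig"):
    """Split each multivalued row into one row per value with a unique key."""
    out = []
    for row in rows:
        for idx, value in enumerate(row[field].split(DELIM)):
            out.append({
                "clean_key": f"{row[field]}#{idx}",  # original value + position
                "source_value": row[field],
                "field_collection_site": value,
                "site_qualifier": "",
            })
    return out


def merge_corrections(rows, corrs):
    """Apply each correction only to the row with the same clean_key."""
    lookup = {corr["clean_key"]: corr for corr in corrs}
    merged = []
    for row in rows:
        new_row = dict(row)
        corr = lookup.get(row["clean_key"])
        if corr:
            for field in corr["corrected"].split(DELIM):
                new_row[field] = corr[field]
        merged.append(new_row)
    return merged


def collapse(rows):
    """Rejoin per-value rows into one multivalued row per source value."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["source_value"], []).append(row)
    return [
        {"source_value": src, **{f: DELIM.join(r[f] for r in group) for f in FIELDS}}
        for src, group in grouped.items()
    ]


ORIG = "CA-PLA-2907|CA-PLA-2908|CA-PLA-38?|CA-YUB-05"
base = explode_base([{"sitenumberorname_orig": ORIG}])
fix = {
    "clean_key": f"{ORIG}#2",
    "field_collection_site": "CA-PLA-38",
    "site_qualifier": "uncertain",
    "corrected": "field_collection_site|site_qualifier",
}
final = collapse(merge_corrections(base, [fix]))
print(final[0]["field_collection_site"])  # CA-PLA-2907|CA-PLA-2908|CA-PLA-38|CA-YUB-05
print(final[0]["site_qualifier"])  # ||uncertain|
```

Empty site_qualifier values are kept in the joined output (||uncertain|) so the two fields stay aligned by position.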

Fancy magic added to the IterativeCleanup mixin

There's probably some way to automagically handle this or add a setting/mode to deal with it. But heck if I have time to dive into all that right now.

Remember, this problem (and how you might solve it without creating OTHER problems) is complicated by the fact that any solution must also handle additional iterations of the worksheet being returned.
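
If corrections were keyed per value, compiling several returned iterations could let the most recent correction for each key win without disturbing the other values. A hypothetical Python sketch (not how *__returned_compiled actually works), assuming iterations are supplied oldest first and each row carries a hypothetical per-value clean_key:

```python
def compile_iterations(iterations):
    """Merge returned worksheets, oldest first; the latest correction per
    clean_key wins, and keys untouched later keep earlier corrections."""
    compiled = {}
    for rows in iterations:
        for row in rows:
            if row["corrected"]:  # skip rows the client did not change
                compiled[row["clean_key"]] = row
    return list(compiled.values())


# Hypothetical placeholder data: iteration 2 revises value #2 and corrects value #3.
iteration_1 = [
    {"clean_key": "orig#2", "field_collection_site": "v2-first", "corrected": "field_collection_site"},
]
iteration_2 = [
    {"clean_key": "orig#2", "field_collection_site": "v2-second", "corrected": "field_collection_site"},
    {"clean_key": "orig#3", "field_collection_site": "v3-first", "corrected": "field_collection_site"},
]
compiled = compile_iterations([iteration_1, iteration_2])
print([row["field_collection_site"] for row in compiled])  # ['v2-second', 'v3-first']
```

One gap in this sketch: a later row with an empty corrected value is skipped, so it cannot undo an earlier correction.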
