catalog_to_dd.py: input and output of dt.cc #516
erhmestel wants to merge 6 commits into eqcorrscan:develop
Correlations from existing_corr_file added to write_correlations & compute_differential_times. Default None skips the new sections of code.
1. existing_corr_file is read in with a new function, _read_correlation_file, which returns a dictionary of existing event pairs keyed by core event, each with a list of paired events.
2. The existing_pairs dictionary is used to remove events already paired with the core event from sub_catalog so they are not recalculated.
3. existing_corr_file is read in and written to the output file, followed by the newly calculated correlations.
Added option of output_filename to write_correlations. Default "dt.cc".
Works as is, but could be more completely implemented. Things to do:
- implement checking whether there are stations missing from existing correlations
- speed up saving existing correlations output (do I need to write out every line individually?)
- sort output by event numbers (interleaving existing with new)
- check existing correlations for those which wouldn't be calculated otherwise?
Same changes as previous commit, but made to master version of code, rather than older one I was working with.
Comma error removed
with open(existing_corr_file, "r") as in_f:
    for line in in_f:
        f.write(line)
## TODO: speed up saving existing correlations output (do I need to write out every line individually?)
If you want to just dump everything you can do something like:
with open(blah) as in_f:
    f.write("\n".join(l for l in in_f))
assuming that your lines don't already have a newline character at the end of them. If they do (lines read by iterating over a file keep their trailing newline), then you can just ''.join(in_f) and write that string.
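Another option for the same job, sketched here rather than taken from the PR, is to let the standard library stream the file across in chunks. `copy_existing_correlations` is a hypothetical helper name, and `out_f` is assumed to be the already-open output file (the `f` in the snippet under review).

```python
import shutil


def copy_existing_correlations(existing_corr_file, out_f):
    """Stream an existing correlation file into an already-open output file.

    Hypothetical helper for illustration; not part of catalog_to_dd.py.
    """
    with open(existing_corr_file, "r") as in_f:
        # copyfileobj copies in chunks, so the whole file is never held in
        # memory and no per-line write calls are needed.
        shutil.copyfileobj(in_f, out_f)
```

This keeps memory use flat for large dt.cc files, whereas joining every line into one string first builds the whole file in memory before writing.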
Thanks for this @erhmestel! Can you update your branch to track develop (press the "Update branch" button underneath all the test ticks, then pull the updated branch back to your local computer) please?
This looks good, even if stickler doesn't think so. I like using PyCharm for this kind of big project and I set up a ruler at 80 characters long (I forget how to do this, but this SO answer has it). PyCharm is pretty good at removing trailing whitespace for you, and you can set up code linters (flake8 is the one run by stickler, I think) which will highlight code that isn't PEP8-friendly.
I would go for cleaning that up, then work on writing a short test that tests this new behaviour. Once we have a test then we can play around with optimising. If there are any edge cases that you can think of that might result in odd behaviour (e.g. when no new events have been added, or if events are in the correlation file but not in the catalog provided for the update, ...) then having tests for them would be helpful as well, so that we can avoid code crashing as much as possible!
I should also use this as an opportunity to better document:
If you have any suggestions of where those docs should go (e.g. where you would look for them) then let me know, otherwise we can put them in all the relevant places.
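As a rough idea of what tests for those edge cases might look like, here is a minimal sketch. It assumes the filtering step can be reduced to plain integer event ids; `events_to_correlate` is a hypothetical stand-in for illustration and is not a function in catalog_to_dd.py.

```python
def events_to_correlate(core_id, candidate_ids, existing_pairs):
    """Return the candidate event ids that still need correlating with core_id.

    Hypothetical helper: existing_pairs maps a core event id to the list of
    event ids already paired with it in the existing correlation file.
    """
    already_paired = set(existing_pairs.get(core_id, []))
    return [
        ev for ev in candidate_ids
        if ev != core_id and ev not in already_paired
    ]


# One new event (4) has been added since the existing file was written.
assert events_to_correlate(1, [1, 2, 3, 4], {1: [2, 3]}) == [4]
# No new events have been added: nothing is left to compute.
assert events_to_correlate(1, [1, 2, 3], {1: [2, 3]}) == []
# Event 99 is in the correlation file but not in the catalog: it is ignored.
assert events_to_correlate(1, [1, 2], {1: [2, 99]}) == []
# The core event has no entry in the existing file at all.
assert events_to_correlate(5, [5, 6], {}) == [6]
```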
@erhmestel - I'm going to try and push out a new release of EQcorrscan next week and I would like to include this in there. Do you know what still needs to be done to this to get it in, and do you need me to do any of those things?
What does this PR do?
Editing catalog_to_dd.py to allow input of existing correlation measurements and selection of output filename. Defaults keep everything the same as it currently is.
- existing_corr_file option added to write_correlations & compute_differential_times. Default None skips new sections of code.
- existing_corr_file read in with new function: _read_correlation_file returning an existing event pairs dictionary keyed to core event, with list of paired events.
- existing_pairs dictionary used to remove events paired with core event from sub_catalog so they are not recalculated.
- existing_corr_file read in & output into output file, followed by newly calculated correlations.
- output_filename option added to write_correlations. Default "dt.cc".
Why was it initiated? Any relevant Issues?
To prevent having to recalculate correlations each time new events are added & to allow the output of different correlation files without overwriting.
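For readers unfamiliar with the idea, the existing-pairs lookup described in this PR could be sketched roughly as follows. This is not the PR's `_read_correlation_file`: `read_existing_pairs` is a hypothetical name, and the sketch assumes the usual HypoDD dt.cc layout, where each event pair starts with a header line of the form `# id1 id2 otc` followed by per-station measurement lines.

```python
from collections import defaultdict


def read_existing_pairs(existing_corr_file):
    """Map each core event id to the event ids already paired with it.

    Hypothetical sketch; assumes pair headers look like "# id1 id2 otc".
    """
    existing_pairs = defaultdict(list)
    with open(existing_corr_file, "r") as f:
        for line in f:
            # Only header lines name an event pair; per-station
            # measurement lines are skipped.
            if not line.startswith("#"):
                continue
            fields = line.lstrip("#").split()
            if len(fields) < 2:
                continue  # malformed header, ignore it
            core_id, paired_id = int(fields[0]), int(fields[1])
            existing_pairs[core_id].append(paired_id)
    return dict(existing_pairs)
```

With a dictionary like this in hand, any event listed under a core event can be dropped from that core event's sub-catalog before new correlations are computed.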
Not extensively tested. In a small amount of testing the code works as is, but it could be more completely implemented. Things to do:
PR Checklist
- develop base branch selected?
- CHANGES.md.
- CONTRIBUTORS.md.
Boxes left unchecked when I don't know