Replace integrate.quad with native spline integration method #299

tomaskontrimas wants to merge 5 commits into master
Conversation
Omitting this check (skyllh/skyllh/analyses/i3/publicdata_ps/signalpdf.py, lines 86 to 95 in d6a927a) would speed up create_analysis further, to 9.16 s. It is kind of nice to have, but I'm not sure if people will try to create their own.
Yes, the correct normalization of the signal PDF should be enforced somehow, at least if people want to contribute to the development. So a unit test that ensures a normalized PDF sounds like a good alternative to me. I remember that I (or Martin) put the check there while I was developing the signal energy PDF for the public data: it is an annoying sum of splines, and the normalization needs to be carefully checked. But that works now, and I think it should always work when you build an energy PDF from the smearing matrix...
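A minimal sketch of what such a unit test could look like. The class and method names here are hypothetical, and the Gaussian stand-in PDF is only for illustration; in SkyLLH the object under test would be the energy PDF built from the smearing matrix.

```python
import unittest

import numpy as np
from scipy import integrate


class TestSignalEnergyPDFNormalization(unittest.TestCase):
    """Verify that an energy PDF integrates to one, so the per-call
    integrity check can stay out of create_analysis."""

    def test_pdf_is_normalized(self):
        # Hypothetical stand-in PDF; a unit Gaussian in log10(E) is
        # normalized by construction.
        def pdf(log10_e):
            return np.exp(-0.5 * (log10_e - 3.0) ** 2) / np.sqrt(2.0 * np.pi)

        integral = integrate.quad(pdf, -np.inf, np.inf)[0]
        self.assertAlmostEqual(integral, 1.0, places=6)
```

Run with `python -m unittest` so the check happens once in CI rather than on every analysis construction.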
```python
if norm:
    self.norm = integrate.quad(self.__call__, self.x_min, self.x_max, limit=200, full_output=1)[0]
# The spline is defined only on the `x` (bin centers) interval by construction.
# We chose not to extrapolate out-of-range (oor) values.
```
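For reference, a small sketch of the equivalence being exploited here, with a hypothetical stand-in for the `FctSpline1D` setup (the `x`/`y` values are invented): the PCHIP spline's native `integrate` method evaluates its piecewise polynomials analytically and should match the adaptive `quad` result over the same interval, without the repeated spline evaluations.

```python
import numpy as np
from scipy import integrate, interpolate

# Hypothetical stand-in for FctSpline1D: a PCHIP spline over bin centers.
x = np.linspace(0.0, 10.0, 51)        # bin centers
y = np.exp(-0.5 * (x - 5.0) ** 2)     # positive function values
spl_f = interpolate.PchipInterpolator(x, y, extrapolate=False)

# Old normalization: adaptive quadrature over the spline evaluation.
norm_quad = integrate.quad(
    lambda t: float(spl_f(t)), x[0], x[-1], limit=200, full_output=1)[0]

# New normalization: the spline's native piecewise-polynomial integral.
norm_native = float(spl_f.integrate(x[0], x[-1]))

print(norm_quad, norm_native)
```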
Why use bin centers instead of bin edges here? Is it because of PchipInterpolator?
Yes, the `__call__(self, x, oor_value=0)` method sets values outside the bin-center range to 0, but calling the native `self.spl_f.integrate` method (which provides this speedup) fails outside of those bounds. I think the `self.norm` result is the same by definition, though.
We could use the extrapolate option:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PchipInterpolator.integrate.html#scipy.interpolate.PchipInterpolator.integrate
but then it wouldn't match the PDF evaluation setup, so both would need to be updated.
So, in other words, the current method of integration extends to the bin edges, but it's just adding zeros, whereas the new one simply integrates up to where values might not be zero?
The fit of NGC 1068 could reveal any difference, if there is one, because there are many signal events. So if there is some discrepancy due to the normalization, it will add up there.
> So, in other words, the current method of integration extends to the bin edges, but it's just adding zeros, whereas the new one simply integrates up to where values might not be zero?

Yes.
Blocked by #302
Pull request overview
This PR speeds up the public-data point-source analysis initialization (`create_analysis`) by removing expensive `scipy.integrate.quad` calls and replacing them with spline-native integration routines.
Changes:
- Switch `FctSpline1D` normalization from `integrate.quad` to `PchipInterpolator.integrate`.
- Switch effective-area integration in `PDAeff.get_detection_prob_for_decnu` from `quad(splev)` to `interpolate.splint`.
- Remove a `quad`-based normalization integrity check in `PDSignalEnergyPDF` (and related SciPy imports).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `skyllh/analyses/i3/publicdata_ps/utils.py` | Uses `PchipInterpolator.integrate` for faster spline normalization. |
| `skyllh/analyses/i3/publicdata_ps/signalpdf.py` | Drops the `quad` import and removes the `quad`-based normalization integrity check. |
| `skyllh/analyses/i3/publicdata_ps/aeff.py` | Replaces `quad`-based spline integration with `interpolate.splint` for detection probabilities. |
```diff
  spl = interpolate.splrep(x, y, k=1, s=0)

  def _eval_spl_func(x):
      return interpolate.splev(x, spl, der=0, ext=1)

- norm = integrate.quad(_eval_spl_func, enu_range_min, enu_range_max, limit=200, full_output=1)[0]
+ norm = interpolate.splint(enu_range_min, enu_range_max, spl)

  enu_min = np.atleast_1d(enu_min)
  enu_max = np.atleast_1d(enu_max)

  det_prob = np.empty((len(enu_min),), dtype=np.double)
  for i in range(len(enu_min)):
-     integral = integrate.quad(_eval_spl_func, enu_min[i], enu_max[i], limit=200, full_output=1)[0]
-     det_prob[i] = integral / norm
+     det_prob[i] = interpolate.splint(enu_min[i], enu_max[i], spl) / norm
```
From the docs: `splint` silently assumes that the spline function is zero outside the data interval (a, b), which I think is better than the suggested implementation, and it still passes the existing analysis tests.
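A small toy check of that documented behavior (invented `x`/`y` values, mirroring the `k=1, s=0` setup from `aeff.py`): extending the integration limits past the data interval should not change the `splint` result, since the spline is treated as zero outside it.

```python
import numpy as np
from scipy import interpolate

# Toy linear spline (k=1, s=0), mirroring the aeff.py setup.
x = np.linspace(1.0, 9.0, 9)
y = np.ones_like(x)   # flat function: integral equals the interval length, 8
spl = interpolate.splrep(x, y, k=1, s=0)

inner = interpolate.splint(1.0, 9.0, spl)   # integral over the data interval
full = interpolate.splint(0.0, 10.0, spl)   # limits extend past the data interval

# Per the docs, splint treats the spline as zero outside the data
# interval, so extending the limits adds nothing.
print(inner, full)
```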
```python
# Add the PDF axes.
self.add_axis(PDFAxis(name='log_energy', vmin=self.log10_reco_e_min, vmax=self.log10_reco_e_max))
```
Covered this in the integration test.
Co-authored-by: Copilot Autofix powered by AI <[email protected]>
The d6a927a change speeds up the create_analysis call from 16.5 s to 12.6 s.
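A rough micro-benchmark sketch of where that speedup comes from (a toy spline, not the actual create_analysis workload): `splint` evaluates the spline's integral analytically in a single FITPACK call, while `quad` repeatedly calls back into a Python function that wraps `splev`.

```python
import timeit

import numpy as np
from scipy import integrate, interpolate

# Toy spline standing in for the effective-area spline.
x = np.linspace(2.0, 9.0, 200)
y = np.exp(-x)
spl = interpolate.splrep(x, y, k=1, s=0)

def via_quad():
    # Old approach: adaptive quadrature over a Python callback.
    return integrate.quad(
        lambda t: interpolate.splev(t, spl, der=0, ext=1),
        2.0, 9.0, limit=200, full_output=1)[0]

def via_splint():
    # New approach: native analytic integral of the spline.
    return interpolate.splint(2.0, 9.0, spl)

t_quad = timeit.timeit(via_quad, number=200)
t_splint = timeit.timeit(via_splint, number=200)
print(f"quad: {t_quad:.4f} s, splint: {t_splint:.4f} s")
```

Both paths integrate the same spline, so the results agree while the native call avoids the per-evaluation Python overhead.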