
Replace integrate.quad with native spline integration method #299

Open
tomaskontrimas wants to merge 5 commits into master from update_integrals

Conversation

@tomaskontrimas (Collaborator)

The d6a927a change speeds up the create_analysis call from 16.5 s to 12.6 s.

@tomaskontrimas tomaskontrimas self-assigned this Apr 13, 2026
@tomaskontrimas tomaskontrimas added the performance Performance could be improved label Apr 13, 2026
@tomaskontrimas (Collaborator, Author)

Omitting this check:

    # Check integrity.
    integral = (
        integrate.quad(
            self.f_e_spl.evaluate,
            self.log10_reco_e_min,
            self.log10_reco_e_max,
            limit=200,
            full_output=1,
        )[0]
        / self.f_e_spl.norm
    )
    if not np.isclose(integral, 1):
        raise ValueError(
            f'The integral over log10_reco_e of the energy term must be '
            f'unity! But it is {integral}!')

would speed up create_analysis further, to 9.16 s.

It is kind of nice to have, but I'm not sure if people will try to create their own PDSignalEnergyPDF; otherwise it should always pass by construction from PDSignalEnergyPDFSet, I think. Maybe we could cover it with create_analysis tests for both the 10yr and 14yr datasets?
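For context, a minimal sketch (toy data and toy values, not skyllh's actual FctSpline1D or PDSignalEnergyPDF classes) of why the spline-native integral can replace the quad-based check: both compute the integral of the same piecewise polynomial, but the native method uses the spline coefficients directly instead of many Python-level function evaluations.

```python
# Toy sketch (assumed setup, not skyllh's actual classes): the quad-based
# normalization integral and the spline's native integral agree, but the
# native call avoids repeated evaluations of the spline under quad.
import numpy as np
from scipy import integrate
from scipy.interpolate import PchipInterpolator

log10_e = np.linspace(2.0, 8.0, 50)              # hypothetical bin centers
pdf_vals = np.exp(-0.5 * (log10_e - 5.0) ** 2)   # toy energy-PDF values

spl = PchipInterpolator(log10_e, pdf_vals)

# quad repeatedly evaluates the spline; the native method integrates the
# piecewise-polynomial representation in one pass.
norm_quad = integrate.quad(
    spl, log10_e[0], log10_e[-1], limit=200, full_output=1)[0]
norm_native = float(spl.integrate(log10_e[0], log10_e[-1]))

assert np.isclose(norm_quad, norm_native)
```

The two results agree to quadrature precision, which is why dropping the quad-based integrity check (or moving it into a test) does not change the analysis.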

@chiarabellenghi (Member)

Yes, the correct normalization of the signal PDF should be enforced somehow, at least if people want to contribute to the development. So a unittest that ensures a normalized PDF sounds like a good alternative to me.

I remember that I (or Martin) put it there when I was developing the signal energy PDF for the public data, as it is an annoying sum of splines and the normalization needs to be carefully checked. But that works now, and I think it should always work when you build an energy PDF from the smearing matrix...

    if norm:
        self.norm = integrate.quad(
            self.__call__, self.x_min, self.x_max,
            limit=200, full_output=1)[0]
        # The spline is defined only in the `x` (bin centers) interval by
        # construction. We chose to not extrapolate out-of-range (oor) values.
Member

Why use bin centers instead of bin edges here? Is it because of PchipInterpolator?

Collaborator Author

Yes, the __call__(self, x, oor_value=0) method sets values outside of the bin-center range to 0, but calling the native self.spl_f.integrate method (which provides this speedup) fails outside of the bounds. I think the self.norm result is the same by definition, though.

We could use the extrapolate option:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.PchipInterpolator.integrate.html#scipy.interpolate.PchipInterpolator.integrate
but then it wouldn't match the PDF evaluation setup, so both would need to be updated.
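A toy illustration of the bounds behavior discussed here (hypothetical data, not the skyllh code): with extrapolate=False, PchipInterpolator.integrate yields NaN as soon as a limit lies outside the data interval, whereas the default (extrapolate=True) would silently extrapolate the end intervals.

```python
# Toy sketch of the out-of-bounds behavior (hypothetical data).
import numpy as np
from scipy.interpolate import PchipInterpolator

x = np.linspace(0.0, 1.0, 11)   # stand-in for the bin-center grid
y = 1.0 + x ** 2

spl = PchipInterpolator(x, y, extrapolate=False)

inside = float(spl.integrate(x[0], x[-1]))    # well-defined
outside = float(spl.integrate(-0.5, x[-1]))   # NaN: no extrapolation

assert np.isfinite(inside)
assert np.isnan(outside)
```

Turning extrapolation on would make the out-of-bounds integral finite, but then the integral and the zero-padded __call__ evaluation would disagree, which is the mismatch described above.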

Member

So, in other words, the current method of integration extends to the bin edges, but it's just adding zeros, whereas the new one simply integrates up to where values might not be zero?

Member

The fit of NGC 1068 could reveal any difference, if there is one, because there are many signal events. So if there is some discrepancy due to the normalization, it will add up there.

Collaborator Author

> So, in other words, the current method of integration extends to the bin
> edges, but it's just adding zeros, whereas the new one simply integrates up
> to where values might not be zero?

yes

@tomaskontrimas (Collaborator, Author)

Blocked by #302

@tomaskontrimas tomaskontrimas marked this pull request as ready for review April 16, 2026 19:04
Copilot AI review requested due to automatic review settings April 16, 2026 19:05
Copilot AI (Contributor) left a comment

Pull request overview

This PR speeds up the public-data point source analysis initialization (create_analysis) by removing expensive scipy.integrate.quad calls and replacing them with spline-native integration routines.

Changes:

  • Switch FctSpline1D normalization from integrate.quad to PchipInterpolator.integrate.
  • Switch effective-area integration in PDAeff.get_detection_prob_for_decnu from quad(splev) to interpolate.splint.
  • Remove a quad-based normalization integrity check in PDSignalEnergyPDF (and related SciPy imports).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File                                           Description
skyllh/analyses/i3/publicdata_ps/utils.py      Uses PchipInterpolator.integrate for faster spline normalization.
skyllh/analyses/i3/publicdata_ps/signalpdf.py  Drops the quad import and removes the quad-based normalization integrity check.
skyllh/analyses/i3/publicdata_ps/aeff.py       Replaces quad-based spline integration with interpolate.splint for detection probabilities.


Comment on lines 324 to +333

        spl = interpolate.splrep(x, y, k=1, s=0)

        def _eval_spl_func(x):
            return interpolate.splev(x, spl, der=0, ext=1)

    -   norm = integrate.quad(
    -       _eval_spl_func, enu_range_min, enu_range_max,
    -       limit=200, full_output=1)[0]
    +   norm = interpolate.splint(enu_range_min, enu_range_max, spl)

        enu_min = np.atleast_1d(enu_min)
        enu_max = np.atleast_1d(enu_max)

        det_prob = np.empty((len(enu_min),), dtype=np.double)
        for i in range(len(enu_min)):
    -       integral = integrate.quad(
    -           _eval_spl_func, enu_min[i], enu_max[i],
    -           limit=200, full_output=1)[0]
    -       det_prob[i] = integral / norm
    +       det_prob[i] = interpolate.splint(enu_min[i], enu_max[i], spl) / norm
@tomaskontrimas (Collaborator, Author), Apr 16, 2026

From the docs: "splint silently assumes that the spline function is zero outside the data interval (a, b)."

which I think is better than the suggested implementation, and it still passes the existing analysis tests.
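A toy check of that splint behavior (illustrative data only, not the skyllh effective-area tables): widening the integration limits beyond the knot interval adds nothing, and the result matches quad over splev with ext=1 (which returns zero out of bounds), so the replacement is equivalent to the old quad-based code.

```python
# Toy check: splint treats the spline as zero outside its knot interval,
# matching quad over splev(..., ext=1) on a wider range.
import numpy as np
from scipy import integrate, interpolate

x = np.linspace(1.0, 10.0, 20)
y = np.sin(np.pi * (x - 1.0) / 9.0) ** 2   # toy curve, zero at both ends

spl = interpolate.splrep(x, y, k=1, s=0)

full = interpolate.splint(x[0], x[-1], spl)
wider = interpolate.splint(x[0] - 5.0, x[-1] + 5.0, spl)
assert np.isclose(full, wider)             # out-of-range part contributes zero

# The old quad-based approach, with ext=1 forcing zeros outside the knots:
quad_val = integrate.quad(
    lambda t: interpolate.splev(t, spl, der=0, ext=1),
    x[0] - 5.0, x[-1] + 5.0, limit=200, full_output=1)[0]
assert np.isclose(full, quad_val, rtol=1e-4)
```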

Comment on lines 82 to 84

    # Add the PDF axes.
    self.add_axis(PDFAxis(
        name='log_energy',
        vmin=self.log10_reco_e_min,
        vmax=self.log10_reco_e_max))
Collaborator Author

Covered it in the integration test.

Comment thread skyllh/analyses/i3/publicdata_ps/utils.py Outdated
Co-authored-by: Copilot Autofix powered by AI <[email protected]>

Labels

performance Performance could be improved

3 participants