astrophot.fit package
Submodules
astrophot.fit.base module
- class astrophot.fit.base.BaseOptimizer(model: AstroPhot_Model, initial_state: Sequence = None, relative_tolerance: float = 0.001, fit_window: Window | None = None, **kwargs)[source]
Bases:
object
Base optimizer object that other optimizers inherit from. Ensures consistent signature for the classes.
- Parameters:
model – an AstroPhot_Model object that will have its (unlocked) parameters optimized [AstroPhot_Model]
initial_state – optional initialization for the parameters as a 1D tensor [tensor]
max_iter – maximum allowed number of iterations [int]
relative_tolerance – tolerance for counting success steps as: 0 < (Chi2^2 - Chi1^2)/Chi1^2 < tol [float]
- static chi2contour(n_params: int, confidence: float = 0.682689492137) float [source]
Calculates the chi^2 contour for the given number of parameters.
- Parameters:
n_params (int) – The number of parameters.
confidence (float, optional) – The confidence interval (default is 0.682689492137).
- Returns:
The calculated chi^2 contour value.
- Return type:
float
- Raises:
RuntimeError – If unable to compute the Chi^2 contour for the given number of parameters.
- chi2min() float [source]
Returns the minimum value of chi^2 loss in the loss history.
- Returns:
Minimum value of chi^2 loss.
- Return type:
float
- fit() BaseOptimizer [source]
- Raises:
NotImplementedError – Error is raised if this method is not implemented in a subclass of BaseOptimizer.
astrophot.fit.gp module
astrophot.fit.gradient module
- class astrophot.fit.gradient.Grad(model: AstroPhot_Model, initial_state: Sequence = None, **kwargs)[source]
Bases:
BaseOptimizer
A gradient descent optimization wrapper for AstroPhot_Model objects.
The default method is “NAdam”, a variant of the Adam optimization algorithm. This optimizer uses a combination of gradient descent and Nesterov momentum for faster convergence. The optimizer is instantiated with a set of initial parameters and optimization options provided by the user. The fit method performs the optimization, taking a series of gradient steps until a stopping criteria is met.
- Parameters:
model (AstroPhot_Model) – an AstroPhot_Model object with which to perform optimization.
initial_state (torch.Tensor, optional) – an optional initial state for optimization.
method (str, optional) – the optimization method to use for the update step. Defaults to “NAdam”.
patience (int or None, optional) – the number of iterations without improvement before the optimizer will exit early. Defaults to None.
optim_kwargs (dict, optional) – a dictionary of keyword arguments to pass to the pytorch optimizer.
- model
the AstroPhot_Model object to optimize.
- Type:
- current_state
the current state of the parameters being optimized.
- Type:
torch.Tensor
- iteration
the number of iterations performed during the optimization.
- Type:
int
- loss_history
the history of loss values at each iteration of the optimization.
- Type:
list
- lambda_history
the history of parameter values at each iteration of the optimization.
- Type:
list
- optimizer
the PyTorch optimizer object being used.
- Type:
torch.optimizer
- patience
the number of iterations without improvement before the optimizer will exit early.
- Type:
int or None
- method
the optimization method being used.
- Type:
str
- optim_kwargs
the dictionary of keyword arguments passed to the PyTorch optimizer.
- Type:
dict
- fit() BaseOptimizer [source]
Perform an iterative fit of the model parameters using the specified optimizer.
The fit procedure continues until a stopping criteria is met, such as the maximum number of iterations being reached, or no improvement being made after a specified number of iterations.
astrophot.fit.hmc module
- class astrophot.fit.hmc.HMC(model: AstroPhot_Model, initial_state: Sequence | None = None, max_iter: int = 1000, **kwargs)[source]
Bases:
BaseOptimizer
Hamiltonian Monte-Carlo sampler wrapper for the Pyro package.
This MCMC algorithm uses gradients of the Chi^2 to more efficiently explore the probability distribution. Consider using the NUTS sampler instead of HMC, as it is generally better in most aspects.
More information on HMC can be found at: https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo, https://arxiv.org/abs/1701.02434, and http://www.mcmchandbook.net/HandbookChapter5.pdf
- Parameters:
model (AstroPhot_Model) – The model which will be sampled.
initial_state (Optional[Sequence], optional) – A 1D array with the values for each parameter in the model. These values should be in the form of “as_representation” in the model. Defaults to None.
max_iter (int, optional) – The number of sampling steps to perform. Defaults to 1000.
epsilon (float, optional) – The length of the integration step to perform for each leapfrog iteration. The momentum update will be of order epsilon * score. Defaults to 1e-5.
leapfrog_steps (int, optional) – Number of steps to perform with leapfrog integrator per sample of the HMC. Defaults to 20.
inv_mass (float or array, optional) – Inverse Mass matrix (covariance matrix) which can tune the behavior in each dimension to ensure better mixing when sampling. Defaults to the identity.
progress_bar (bool, optional) – Whether to display a progress bar during sampling. Defaults to True.
prior (distribution, optional) – Prior distribution for the parameters. Defaults to None.
warmup (int, optional) – Number of warmup steps before actual sampling begins. Defaults to 100.
hmc_kwargs (dict, optional) – Additional keyword arguments for the HMC sampler. Defaults to {}.
mcmc_kwargs (dict, optional) – Additional keyword arguments for the MCMC process. Defaults to {}.
astrophot.fit.iterative module
- class astrophot.fit.iterative.Iter(model: ~astrophot.models.core_model.AstroPhot_Model, method: ~astrophot.fit.base.BaseOptimizer = <class 'astrophot.fit.lm.LM'>, initial_state: ~numpy.ndarray | None = None, max_iter: int = 100, method_kwargs: ~typing.Dict[str, ~typing.Any] = {}, **kwargs: ~typing.Dict[str, ~typing.Any])[source]
Bases:
BaseOptimizer
Optimizer wrapper that performs optimization iteratively.
This optimizer applies a different optimizer to a group model iteratively. It can be used for complex fits or when the number of models to fit is too large to fit in memory.
- Parameters:
model – An AstroPhot_Model object to perform optimization on.
method – The optimizer class to apply at each iteration step.
initial_state – Optional initial state for optimization, defaults to None.
max_iter – Maximum number of iterations, defaults to 100.
method_kwargs – Keyword arguments to pass to method.
**kwargs – Additional keyword arguments.
- ndf
Degrees of freedom of the data.
- method
The optimizer class to apply at each iteration step. Default: Levenberg-Marquardt
- method_kwargs
Keyword arguments to pass to method.
- iteration
The number of iterations performed.
- lambda_history
A list of the states at each iteration step.
- loss_history
A list of the losses at each iteration step
- fit() BaseOptimizer [source]
Fit the models to the target.
- sub_step(model: AstroPhot_Model) None [source]
Perform optimization for a single model.
- Parameters:
model – The model to perform optimization on.
- class astrophot.fit.iterative.Iter_LM(model: AstroPhot_Model, initial_state: Sequence | None = None, chunks: int | tuple = 50, max_iter: int = 100, method: str = 'random', LM_kwargs: dict = {}, **kwargs: Dict[str, Any])[source]
Bases:
BaseOptimizer
Optimization wrapper that call LM optimizer on subsets of variables.
Iter_LM takes the full set of parameters for a model and breaks them down into chunks as specified by the user. It then calls Levenberg-Marquardt optimization on the subset of parameters, and iterates through all subsets until every parameter has been optimized. It cycles through these chunks until convergence. This method is very powerful in situations where the full optimization problem cannot fit in memory, or where the optimization problem is too complex to tackle as a single large problem. In full LM optimization a single problematic parameter can ripple into issues with every other parameter, so breaking the problem down can sometimes make an otherwise intractable problem easier. For small problems with only a few models, it is likely better to optimize the full problem with LM as, when it works, LM is faster than the Iter_LM method.
- Parameters:
chunks (Union[int, tuple]) – Specify how to break down the model parameters. If an integer, at each iteration the algorithm will break the parameters into groups of that size. If a tuple, should be a tuple of tuples of strings which give an explicit pairing of parameters to optimize, note that it is allowed to have variable size chunks this way. Default: 50
method (str) – How to iterate through the chunks. Should be one of: random, sequential. Default: random
astrophot.fit.lm module
- class astrophot.fit.lm.LM(model, initial_state: Sequence | None = None, max_iter: int = 100, relative_tolerance: float = 1e-05, **kwargs)[source]
Bases:
BaseOptimizer
The LM class is an implementation of the Levenberg-Marquardt optimization algorithm. This method is used to solve non-linear least squares problems and is known for its robustness and efficiency.
The Levenberg-Marquardt (LM) algorithm is an iterative method used for solving non-linear least squares problems. It can be seen as a combination of the Gauss-Newton method and the gradient descent method: it works like the Gauss-Newton method when the parameters are approximately near a minimum, and like the gradient descent method when the parameters are far from their optimal values.
The cost function that the LM algorithm tries to minimize is of the form:
\[f(\boldsymbol{\beta}) = \frac{1}{2}\sum_{i=1}^{N} r_i(\boldsymbol{\beta})^2\]where \(\boldsymbol{\beta}\) is the vector of parameters, \(r_i\) are the residuals, and \(N\) is the number of observations.
The LM algorithm iteratively performs the following update to the parameters:
\[\boldsymbol{\beta}_{n+1} = \boldsymbol{\beta}_{n} - (J^T J + \lambda diag(J^T J))^{-1} J^T \boldsymbol{r}\]- where:
\(J\) is the Jacobian matrix whose elements are \(J_{ij} = \frac{\partial r_i}{\partial \beta_j}\),
\(\boldsymbol{r}\) is the vector of residuals \(r_i(\boldsymbol{\beta})\),
\(\lambda\) is a damping factor which is adjusted at each iteration.
When \(\lambda = 0\) this can be seen as the Gauss-Newton method. In the limit that \(\lambda\) is large, the \(J^T J\) matrix (an approximation of the Hessian) becomes subdominant and the update essentially points along \(J^T \boldsymbol{r}\) which is the gradient. In this scenario the gradient descent direction is also modified by the \(\lambda diag(J^T J)\) scaling which in some sense makes each gradient unitless and further improves the step. Note as well that as \(\lambda\) gets larger the step taken will be smaller, which helps to ensure convergence when the initial guess of the parameters are far from the optimal solution.
Note that the residuals \(r_i\) are typically also scaled by the variance of the pixels, but this does not change the equations above. For a detailed explanation of the LM method see the article by Henri Gavin on which much of the AstroPhot LM implementation is based:
@article{Gavin2019, title={The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems}, author={Gavin, Henri P}, journal={Department of Civil and Environmental Engineering, Duke University}, volume={19}, year={2019} }
as well as the paper on LM geodesic acceleration by Mark Transtrum:
@article{Tanstrum2012, author = {{Transtrum}, Mark K. and {Sethna}, James P.}, title = "{Improvements to the Levenberg-Marquardt algorithm for nonlinear least-squares minimization}", year = 2012, doi = {10.48550/arXiv.1201.5885}, adsurl = {https://ui.adsabs.harvard.edu/abs/2012arXiv1201.5885T}, }
The damping factor \(\lambda\) is adjusted at each iteration: it is effectively increased when we are far from the solution, and decreased when we are close to it. In practice, the algorithm attempts to pick the smallest \(\lambda\) that is can while making sure that the \(\chi^2\) decreases at each step.
The main advantage of the LM algorithm is its adaptability. When the current estimate is far from the optimum, the algorithm behaves like a gradient descent to ensure global convergence. However, when it is close to the optimum, it behaves like Gauss-Newton, which provides fast local convergence.
In practice, the algorithm is typically implemented with various enhancements to improve its performance. For example, the Jacobian may be approximated with finite differences, geodesic acceleration can be used to speed up convergence, and more sophisticated strategies can be used to adjust the damping factor \(\lambda\).
The exact performance of the LM algorithm will depend on the nature of the problem, including the complexity of the function f(x), the quality of the initial estimate x0, and the distribution of the data.
The LM class implemented for AstroPhot takes a model, initial state, and various other optional parameters as inputs and seeks to find the parameters that minimize the cost function.
- Parameters:
model – The model to be optimized.
initial_state (Sequence) – Initial values for the parameters to be optimized.
max_iter (int) – Maximum number of iterations for the algorithm.
relative_tolerance (float) – Tolerance level for relative change in cost function value to trigger termination of the algorithm.
fit_parameters_identity – Used to select a subset of parameters. This is mostly used internally.
verbose – Controls the verbosity of the output during optimization. A higher value results in more detailed output. If not provided, defaults to 0 (no output).
max_step_iter (optional) – The maximum number of steps while searching for chi^2 improvement on a single Jacobian evaluation. Default is 10.
curvature_limit (optional) – Controls how cautious the optimizer is for changing curvature. It should be a number greater than 0, where smaller is more cautious. Default is 1.
Ldn (Lup and) – These adjust the step sizes for the damping parameter. Default is 5 and 3 respectively.
L0 (optional) – This is the starting damping parameter. For easy problems with good initialization, this can be set lower. Default is 1.
acceleration (optional) – Controls the use of geodesic acceleration, which can be helpful in some scenarios. Set 1 for full acceleration, 0 for no acceleration. Default is 0.
Here is some basic usage of the LM optimizer:
import astrophot as ap # build model # ... # Initialize model parameters model.initialize() # Fit the parameters result = ap.fit.lm(model, verbose = 1) # Check that a minimum was found print(result.message) # See the minimum chi^2 value print(f"min chi2: {result.res_loss()}") # Update parameter uncertainties result.update_uncertainty() # Extract multivariate Gaussian of uncertainties mu = result.res() cov = result.covariance_matrix
- property covariance_matrix: Tensor
The covariance matrix for the model at the current parameters. This can be used to construct a full Gaussian PDF for the parameters using: \(\mathcal{N}(\mu,\Sigma)\) where \(\mu\) is the optimized parameters and \(\Sigma\) is the covariance matrix.
- fit() BaseOptimizer [source]
This performs the fitting operation. It iterates the LM step function until convergence is reached. Includes a message after fitting to indicate how the fitting exited. Typically if the message returns a “success” then the algorithm found a minimum. This may be the desired solution, or a pathological local minimum, this often depends on the initial conditions.
- step(chi2) Tensor [source]
Performs one step of the LM algorithm. Computes Jacobian, infers hessian and gradient, solves for step vector and iterates on damping parameter magnitude until a step with some improvement in chi2 is found. Used internally.
- update_hess_grad(natural=False) None [source]
Updates the stored hessian matrix and gradient vector. This can be used to compute the quantities in thier natural parameter represntation. During normal optimization the hessian and gradient are computed in a re-mapped parameter space where parameters are defined form -inf to inf.
astrophot.fit.mhmcmc module
- class astrophot.fit.mhmcmc.MHMCMC(model: AstroPhot_Model, initial_state: Sequence | None = None, max_iter: int = 1000, **kwargs)[source]
Bases:
BaseOptimizer
Metropolis-Hastings Markov-Chain Monte-Carlo sampler, based on: https://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm . This is a naive implimentation of a standard MCMC, it is far from optimal and should not be used for anything but the most basic scenarios.
- Parameters:
model (AstroPhot_Model) – The model which will be sampled.
initial_state (Optional[Sequence]) – A 1D array with the values for each parameter in the model. Note that these values should be in the form of “as_representation” in the model.
max_iter (int) – The number of sampling steps to perform. Default 1000
epsilon (float or array) – The random step length to take at each iteration. This is the standard deviation for the normal distribution sampling. Default 1e-2
- static accept(log_alpha)[source]
Evaluates randomly if a given proposal is accepted. This is done in log space which is more natural for the evaluation in the step.
- property acceptance
Returns the ratio of accepted states to total states sampled.
astrophot.fit.minifit module
- class astrophot.fit.minifit.MiniFit(model: ~astrophot.models.core_model.AstroPhot_Model, downsample_factor: int = 1, max_pixels: int = 10000, method: ~astrophot.fit.base.BaseOptimizer = <class 'astrophot.fit.lm.LM'>, initial_state: ~numpy.ndarray | None = None, method_kwargs: ~typing.Dict[str, ~typing.Any] = {}, **kwargs: ~typing.Dict[str, ~typing.Any])[source]
Bases:
BaseOptimizer
- fit() BaseOptimizer [source]
- Raises:
NotImplementedError – Error is raised if this method is not implemented in a subclass of BaseOptimizer.
astrophot.fit.nuts module
- class astrophot.fit.nuts.NUTS(model: AstroPhot_Model, initial_state: Sequence | None = None, max_iter: int = 1000, **kwargs)[source]
Bases:
BaseOptimizer
No U-Turn Sampler (NUTS) implementation for Hamiltonian Monte Carlo (HMC) based MCMC sampling.
This is a wrapper for the Pyro package: https://docs.pyro.ai/en/stable/index.html
The NUTS class provides an implementation of the No-U-Turn Sampler (NUTS) algorithm, which is a variation of the Hamiltonian Monte Carlo (HMC) method for Markov Chain Monte Carlo (MCMC) sampling. This implementation uses the Pyro library to perform the sampling. The NUTS algorithm utilizes gradients of the target distribution to more efficiently explore the probability distribution of the model.
More information on HMC and NUTS can be found at: https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo, https://arxiv.org/abs/1701.02434, and http://www.mcmchandbook.net/HandbookChapter5.pdf
- Parameters:
model (AstroPhot_Model) – The model which will be sampled.
initial_state (Optional[Sequence], optional) – A 1D array with the values for each parameter in the model. These values should be in the form of “as_representation” in the model. Defaults to None.
max_iter (int, optional) – The number of sampling steps to perform. Defaults to 1000.
epsilon (float, optional) – The step size for the NUTS sampler. Defaults to 1e-3.
inv_mass (Optional[Tensor], optional) – Inverse Mass matrix (covariance matrix) for the Hamiltonian system. Defaults to None.
progress_bar (bool, optional) – If True, display a progress bar during sampling. Defaults to True.
prior (Optional[Distribution], optional) – Prior distribution for the model parameters. Defaults to None.
warmup (int, optional) – Number of warmup (or burn-in) steps to perform before sampling. Defaults to 100.
nuts_kwargs (Dict[str, Any], optional) – A dictionary of additional keyword arguments to pass to the NUTS sampler. Defaults to {}.
mcmc_kwargs (Dict[str, Any], optional) – A dictionary of additional keyword arguments to pass to the MCMC function. Defaults to {}.
- fit(state
Optional[torch.Tensor] = None, nsamples: Optional[int] = None, restart_chain: bool = True) -> ‘NUTS’: Performs the MCMC sampling using a NUTS HMC and records the chain for later examination.
astrophot.fit.oldlm module
- class astrophot.fit.oldlm.LM_Constraint(constraint_func: Callable[[Tensor, Any], Tensor], constraint_args: tuple = (), representation_parameters: bool = False, out_len: int = 1, reduce_ndf: float = 0.0, weight: Tensor | None = None, reference_value: Tensor | None = None, **kwargs)[source]
Bases:
object
Add an arbitrary constraint to the LM optimization algorithm.
Expresses a constraint between parameters in the LM optimization routine. Constraints may be used to bias parameters to have certain behaviour, for example you may require the radius of one model to be larger than that of another, or may require two models to have the same position on the sky. The constraints defined in this object are fuzzy constraints and so can be broken to some degree, the amount of constraint breaking is determined my how informative the data is and how strong the constraint weight is set. To create a constraint, first construct a function which takes as argument a 1D tensor of the model parameters and gives as output a real number (or 1D tensor of real numbers) which is zero when the constraint is satisfied and non-zero increasing based on how much the constraint is violated. For example:
- def example_constraint(P):
return (P[1] - P[0]) * (P[1] > P[0]).int()
which enforces that parameter 1 is less than parameter 0. Note that we do not use any control flow “if” statements and instead incorporate the condition through multiplication, this is important as it allows pytorch to compute derivatives through the expression and performs far faster on GPU since no communication is needed back and forth to handle the if-statement. Keep this in mind while constructing your constraint function. Also, make sure that any math operations are performed by pytorch so it can construct a computational graph. Bayond the requirement that the constraint be differentiable, there is no limitation on what constraints can be built with this system.
- Parameters:
constraint_func (Callable[torch.Tensor, torch.Tensor]) – python function which takes in a 1D tensor of parameters and generates real values in a tensor.
constraint_args (Optional[tuple]) – An optional tuple of arguments for the constraint function that will be unpacked when calling the function.
weight (torch.Tensor) – The weight of this constraint in the range (0,inf). Smaller values mean a stronger constraint, larger values mean a weaker constraint. Default 1.
representation_parameters (bool) – if the constraint_func expects the parameters in the form of their representation or their standard value. Default False
out_len (int) – the length of the output tensor by constraint_func. Default 1
reference_value (torch.Tensor) – The value at which the constraint is satisfied. Default 0.
reduce_ndf (float) – Amount by which to reduce the degrees of freedom. Default 0.
- jacobian(model: AstroPhot_Model)[source]
- class astrophot.fit.oldlm.oldLM(model: AstroPhot_Model, initial_state: Sequence = None, max_iter: int = 100, fit_parameters_identity: tuple | None = None, **kwargs)[source]
Bases:
BaseOptimizer
based heavily on: @article{gavin2019levenberg,
title={The Levenberg-Marquardt algorithm for nonlinear least squares curve-fitting problems}, author={Gavin, Henri P}, journal={Department of Civil and Environmental Engineering, Duke University}, volume={19}, year={2019}
}
The Levenberg-Marquardt algorithm bridges the gap between a gradient descent optimizer and a Newton’s Method optimizer. The Hessian for the Newton’s Method update is too complex to evaluate with automatic differentiation (memory scales roughly as parameters^2 * pixels^2) and so an approximation is made using the Jacobian of the image pixels wrt to the parameters of the model. Automatic differentiation provides an exact Jacobian as opposed to a finite differences approximation.
Once a Hessian H and gradient G have been determined, the update step is defined as h which is the solution to the linear equation:
(H + L*I)h = G
where L is the Levenberg-Marquardt damping parameter and I is the identity matrix. For small L this is just the Newton’s method, for large L this is just a small gradient descent step (approximately h = grad/L). The method implimented is modified from Gavin 2019.
- Parameters:
model (AstroPhot_Model) – object with which to perform optimization
initial_state (Optional[Sequence]) – an initial state for optimization
epsilon4 (Optional[float]) – approximation accuracy requirement, for any rho < epsilon4 the step will be rejected. Default 0.1
epsilon5 (Optional[float]) – numerical stability factor, added to the diagonal of the Hessian. Default 1e-8
constraints (Optional[Union[LM_Constraint,tuple[LM_Constraint]]]) – Constraint objects which control the fitting process.
L0 (Optional[float]) – initial value for L factor in (H +L*I)h = G. Default 1.
Lup (Optional[float]) – amount to increase L when rejecting an update step. Default 11.
Ldn (Optional[float]) – amount to decrease L when accetping an update step. Default 9.
- property covariance_matrix: Tensor
- fit()[source]
- Raises:
NotImplementedError – Error is raised if this method is not implemented in a subclass of BaseOptimizer.
- update_J_AD() None [source]
Update the jacobian using automatic differentiation, produces an accurate jacobian at the current state.
- update_J_Broyden(h, Yp, Yph) None [source]
Use the Broyden update to approximate the new Jacobian tensor at the current state. Less accurate, but far faster.
- update_J_natural() None [source]
Update the jacobian using automatic differentiation, produces an accurate jacobian at the current state. Use this method to get the jacobian in the parameter space instead of representation space.
- update_h() Tensor [source]
Solves the LM update linear equation (H + L*I)h = G to determine the proposal for how to adjust the parameters to decrease the chi2.