Supervised machine learning for analysing spectra of exoplanetary atmospheres



The use of machine learning is becoming ubiquitous in astronomy1,2,3, but remains rare in the study of the atmospheres of exoplanets. Given the spectrum of an exoplanetary atmosphere, standard methods sweep through a multi-parameter space in real time to find the best-fit model4,5,6. Known as atmospheric retrieval, this technique originates in the Earth and planetary sciences7. Such methods are very time-consuming and necessarily involve a compromise between physical and chemical realism and computational feasibility. Machine learning has previously been used to determine which molecules to include in the model, but the retrieval itself was still performed using standard methods8. Here, we report an adaptation of the ‘random forest’ method of supervised machine learning9,10, trained on a precomputed grid of atmospheric models, which retrieves full posterior distributions of the molecular abundances and the cloud opacity. Using a precomputed grid shifts a large part of the computational burden offline. We demonstrate our technique on a transmission spectrum of the hot gas-giant exoplanet WASP-12b using a five-parameter model: temperature, a constant cloud opacity and the volume mixing ratios (relative abundances) of water, ammonia and hydrogen cyanide11. We obtain results consistent with those of the standard nested-sampling retrieval method. We also estimate the sensitivity of the measured spectrum to the model parameters and quantify the information content of the spectrum. Our method can be straightforwardly applied with more sophisticated atmospheric models, and a single trained random forest can interpret an ensemble of spectra without retraining.
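The random-forest retrieval idea can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: the forward model `toy_transit_depth`, the 13-bin wavelength grid, the prior bounds and the noise level are all hypothetical, the model has three parameters rather than five, and the posterior is approximated by the spread of per-tree predictions, which is one simple construction and not necessarily the one used in this work. The forest's `feature_importances_` attribute gives one weight per wavelength bin, a rough analogue of the sensitivity analysis shown in Fig. 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical wavelength bins standing in for a WFC3 spectrum (microns).
WAVELENGTHS = np.linspace(1.1, 1.7, 13)
# Hypothetical uniform priors: temperature [K], log10 water VMR, log10 cloud opacity.
PRIOR_LOW = np.array([500.0, -13.0, -13.0])
PRIOR_HIGH = np.array([2900.0, 0.0, 0.0])
NOISE = 5e-5  # assumed per-bin uncertainty on the transit depth


def toy_transit_depth(params):
    """Toy forward model (NOT a physical atmosphere): one water-like band whose
    amplitude grows with temperature and water abundance and is muted by clouds."""
    temp, log_water, log_cloud = params
    band = np.exp(-(((WAVELENGTHS - 1.4) / 0.05) ** 2))
    amplitude = 1e-3 * (temp / 2900.0) * (log_water + 13.0) / 13.0
    muting = 1.0 / (1.0 + 10.0 ** (log_cloud + 3.0))
    return 0.0138 + amplitude * muting * band


def build_training_grid(n_models, rng):
    """Offline step: draw parameters from the prior, evaluate the forward model
    and add noise so the forest is trained on realistic-looking spectra."""
    params = rng.uniform(PRIOR_LOW, PRIOR_HIGH, size=(n_models, 3))
    spectra = np.array([toy_transit_depth(p) for p in params])
    spectra += rng.normal(0.0, NOISE, size=spectra.shape)
    return spectra, params


def retrieve(forest, observed_spectrum):
    """Online step: one prediction per tree, used here as approximate posterior samples."""
    x = observed_spectrum.reshape(1, -1)
    return np.array([tree.predict(x)[0] for tree in forest.estimators_])


rng = np.random.default_rng(42)
spectra, params = build_training_grid(2000, rng)

# Multi-output regression: spectrum (features) -> model parameters (targets).
forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=5, random_state=0)
forest.fit(spectra, params)

truth = np.array([1500.0, -4.0, -6.0])
observed = toy_transit_depth(truth) + rng.normal(0.0, NOISE, size=WAVELENGTHS.size)
samples = retrieve(forest, observed)  # shape (n_trees, 3)

print("posterior mean:", samples.mean(axis=0))
print("posterior std: ", samples.std(axis=0))
print("importance per bin:", forest.feature_importances_)
```

Because the grid and the forest are built once, each additional observed spectrum costs only a forest evaluation, which is the offline/online split that makes retrieving an ensemble of spectra cheap.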

Fig. 1: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.
Fig. 2: Posterior distributions of the volume mixing ratios, temperature and cloud opacity obtained from the nested-sampling retrieval.
Fig. 3: True versus random forest predicted values of the five parameters in our transmission spectrum model.
Fig. 4: Feature importance plots associated with the machine-learning retrieval analysis of the WFC3 transmission spectrum of WASP-12b.

We acknowledge partial financial support from the Center for Space and Habitability (P.M.-N. and K.H.), the University of Bern International 2021 PhD Fellowship (C.F.), the PlanetS National Center of Competence in Research (K.H.), the Swiss National Science Foundation (R.S., C.F. and K.H.), the European Research Council via a Consolidator Grant (K.H.) and the Swiss-based MERAC Foundation (K.H.).

Author contributions

P.M.-N. led the development of computer codes used for this study, performed the machine-learning-related calculations, participated in the experimental design and made the majority of the figures. C.F. computed the grid of atmospheric models used as the training set, participated in the experimental design and performed the nested-sampling retrievals. R.S. co-led the scientific vision and experimental design and co-wrote the manuscript. K.H. co-led the scientific vision and experimental design and led the writing and typesetting of the manuscript.

