Inferring stellar metallicity and elemental abundances from kinematic and spectroscopic data using machine learning: Implications for exoplanet host stars

Adibekyan, V.; Soares, B. M. T. B.; Sousa, S. G.; Santos, N. C.; Delgado-Mena, E.; Minchev, I.; Chertovskih, R.; Martirosyan, Zh.; Israelian, G.; Hakobyan, A. A.
Bibliographical reference

Astronomy and Astrophysics

Advertised on:
6
2026
Number of authors
10
IAC number of authors
1
Citations
0
Refereed citations
0
Description
Context. Elemental abundances of FGK stars can be derived routinely from high-resolution optical spectra, but this remains considerably more difficult for cooler stars. Even so, estimates of the chemical composition of planet-host stars are central to studies of planet formation and interior composition. Machine-learning methods offer a practical route for inferring otherwise inaccessible abundances from more widely available stellar data.
Aims. We investigate how much information on stellar metallicity and selected elemental abundances is encoded in kinematic and orbital properties, and whether these empirical relations can be used to estimate abundances and abundance ratios relevant for the characterisation of exoplanet hosts.
Methods. We used a large APOGEE DR17 sample of red giant stars as the main training set and an independent HARPS sample of nearby FGK dwarfs for external validation. From astrometry and radial velocities, we derived Galactic velocities, orbital parameters, and stellar population-membership probabilities. We benchmarked several machine-learning regressors, optimised the strongest models, and analysed feature importance using gain-based metrics, permutation importance, single-feature models, and SHapley Additive exPlanations (SHAP) values. We also explored the prediction of C and O from Mg, Si, and [Fe/H] and derived simple empirical relations between selected abundance ratios (Fe/Si, Mg/Si, C/O, and Fe/O) and metallicity.
Results. Kinematic information alone recovers only a limited fraction of the variance in stellar metallicity, with a clear performance ceiling at a root mean square error (RMSE) of ∼0.20 dex. The most informative predictor is the maximum vertical orbital excursion, Zmax, followed by radial orbital parameters. When [Fe/H] is combined with kinematic information, the abundances of C, O, Mg, and Si are predicted significantly more accurately than with the baseline approximation [X/H] = [Fe/H]. In contrast, when C and O are predicted from Mg, Si, and [Fe/H], most of the predictive power is already contained in the elemental abundances themselves, with Mg being the dominant contributor, and the addition of kinematic information provides little improvement. The trained models reproduced the main abundance trends associated with Galactic chemical evolution. We found that the slopes of the relations between Fe/Si, Mg/Si, C/O, and Fe/O and metallicity differ slightly between the HARPS and APOGEE samples, with fractional differences generally below 17%.
Conclusions. Stellar kinematics and orbital parameters contain only limited information on stellar metallicity, and the predictive performance is primarily constrained by the intrinsic information content of the input features and not by the choice of the model architecture. Nevertheless, when combined with metallicity, these parameters enable accurate predictions of elemental abundances, including those that are difficult to determine directly. This provides a practical framework for extending chemical characterisation to stars for which detailed abundance determinations are challenging.
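The comparison against the baseline approximation [X/H] = [Fe/H] and the fractional slope differences quoted in the Results can be illustrated with a small sketch. This is a hypothetical example and not the authors' pipeline: the abundance values are invented for illustration, a one-variable least-squares line stands in for the machine-learning regressors, and the definition of the fractional difference (relative to a chosen reference slope) is an assumption, since the abstract does not state how it was computed.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences (in dex)."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fractional_difference(slope, reference_slope):
    """Assumed definition: |slope - reference| / |reference|."""
    return abs(slope - reference_slope) / abs(reference_slope)


# Hypothetical [Fe/H] and [Mg/H] values (dex), invented for illustration only.
feh = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.3]
mgh = [-0.50, -0.30, -0.18, -0.05, 0.00, 0.15, 0.22]

# Baseline approximation: predict [Mg/H] as exactly [Fe/H].
baseline_rmse = rmse(feh, mgh)

# Stand-in model: a straight line fitted to the same (in-sample) data.
slope, intercept = fit_line(feh, mgh)
fitted = [slope * f + intercept for f in feh]
fitted_rmse = rmse(fitted, mgh)

print(f"baseline RMSE: {baseline_rmse:.3f} dex")
print(f"fitted RMSE:   {fitted_rmse:.3f} dex")
```

Because the identity line is one of the lines the fit can choose, the fitted RMSE here can never exceed the baseline on the data used for fitting. A meaningful comparison needs data the model has not seen, which is the role the independent HARPS sample plays in the study.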
Type