Carrasco-Davis, R., Reyes, E., Valenzuela, C., Forster, F., Estevez, P. A., Pignata, G., et al. (2021). Alert Classification for the ALeRCE Broker System: The Real-time Stamp Classifier. Astron. J., 162(6), 231.
Abstract: We present a real-time stamp classifier of astronomical events for the Automatic Learning for the Rapid Classification of Events broker, ALeRCE. The classifier is based on a convolutional neural network, trained on alerts ingested from the Zwicky Transient Facility (ZTF). Using only the science, reference, and difference images of the first detection as inputs, along with the metadata of the alert as features, the classifier is able to correctly classify alerts from active galactic nuclei, supernovae (SNe), variable stars, asteroids, and bogus classes, with high accuracy (~94%) in a balanced test set. In order to find and analyze SN candidates selected by our classifier from the ZTF alert stream, we designed and deployed a visualization tool called SN Hunter, where relevant information about each possible SN is displayed for the experts to choose among candidates to report to the Transient Name Server database. From 2019 June 26 to 2021 February 28, we have reported 6846 SN candidates to date (11.8 candidates per day on average), of which 971 have been confirmed spectroscopically. Our ability to report objects using only a single detection means that 70% of the reported SNe occurred within one day after the first detection. ALeRCE has only reported candidates not otherwise detected or selected by other groups, therefore adding new early transients to the bulk of objects available for early follow-up. Our work represents an important milestone toward rapid alert classifications with the next generation of large etendue telescopes, such as the Vera C. Rubin Observatory.
Elorrieta, F., Eyheramendy, S., & Palma, W. (2019). Discrete-time autoregressive model for unequally spaced time-series observations. Astron. Astrophys., 627, 11 pp.
Abstract: Most time-series models assume that the data come from observations that are equally spaced in time. However, this assumption does not hold in many diverse scientific fields, such as astronomy, finance, and climatology, among others. There are some techniques that fit unequally spaced time series, such as the continuous-time autoregressive moving average (CARMA) processes. These models are defined as the solution of a stochastic differential equation. It is not uncommon in astronomical time series for the time gaps between observations to be large. Therefore, an alternative suitable approach to modeling astronomical time series with large gaps between observations should be based on the solution of a difference equation of a discrete process. In this work we propose a novel model to fit irregular time series called the complex irregular autoregressive (CIAR) model that is represented directly as a discrete-time process. We show that the model is weakly stationary and that it can be represented as a state-space system, allowing efficient maximum likelihood estimation based on the Kalman recursions. Furthermore, we show via Monte Carlo simulations that the finite sample performance of the parameter estimation is accurate. The proposed methodology is applied to light curves from periodic variable stars, illustrating how the model can be implemented to detect poor adjustment of the harmonic model. This can occur when the period has not been accurately estimated or when the variable stars are multiperiodic. Last, we show how the CIAR model, through its state space representation, allows unobserved measurements to be forecast.
Elorrieta, F., Eyheramendy, S., Palma, W., & Ojeda, C. (2021). A novel bivariate autoregressive model for predicting and forecasting irregularly observed time series. Mon. Not. Roy. Astron. Soc., 505(1), 1105–1116.
Abstract: In several disciplines, it is common to find time series measured at irregular observational times. In particular, in astronomy there are a large number of surveys that gather information over irregular time gaps and in more than one passband. Some examples are Pan-STARRS, ZTF, and also the LSST. However, current commonly used time series models that estimate the time dependence in astronomical light curves consider the information of each band separately (e.g., CIAR, IAR, and CARMA models), disregarding the dependence that might exist between different passbands. In this paper, we propose a novel bivariate model for irregularly sampled time series, called the Bivariate Irregular Autoregressive (BIAR) model. The BIAR model assumes an autoregressive structure on each time series; it is stationary, and it allows estimation of the autocorrelation, the cross-correlation and the contemporary correlation between two unequally spaced time series. We implemented the BIAR model on light curves, in the g and r bands, obtained from the ZTF alerts processed by the ALeRCE broker. We show that if the light curves of the two bands are highly correlated, the bivariate model forecasts and predicts more accurately than a similar method that uses only univariate information. Further, the estimated parameters of the BIAR are useful to characterize long-period variable stars and to distinguish between classes of stochastic objects, providing promising features that can be used for classification purposes.
Eyheramendy, S., Saa, P. A., Undurraga, E. A., Valencia, C., Lopez, C., Mendez, L., et al. (2021). Screening of COVID-19 cases through a Bayesian network symptoms model and psychophysical olfactory test. iScience, 24(12), 103419.
Abstract: The sudden loss of smell is among the earliest and most prevalent symptoms of COVID-19 when measured with a clinical psychophysical test. Research has shown the potential impact of frequent screening for olfactory dysfunction, but existing tests are expensive and time consuming. We developed a low-cost ($0.50/test) rapid psychophysical olfactory test (KOR) for frequent testing and a model-based COVID-19 screening framework using a Bayes Network symptoms model. We trained and validated the model on two samples: suspected COVID-19 cases in five healthcare centers (n = 926; 33% prevalence, 309 RT-PCR confirmed) and healthy miners (n = 1,365; 1.1% prevalence, 15 RT-PCR confirmed). The model predicted COVID-19 status with 76% and 96% accuracy in the healthcare and miners samples, respectively (healthcare: AUC = 0.79 [0.75-0.82], sensitivity: 59%, specificity: 87%; miners: AUC = 0.71 [0.63-0.79], sensitivity: 40%, specificity: 97%, at 0.50 infection probability threshold). Our results highlight the potential for low-cost, frequent, accessible, routine COVID-19 testing to support society's reopening.
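The accuracy, sensitivity, and specificity quoted in this abstract are standard confusion-matrix summaries of predicted infection probabilities cut at a threshold (0.50 in the abstract). The following is a minimal sketch of that computation, not the authors' code: the function name, the toy probabilities and labels, and the choice to count a probability exactly at the threshold as positive are all assumptions made for illustration.

```python
def screening_metrics(probabilities, truths, threshold=0.5):
    """Accuracy, sensitivity and specificity for thresholded probabilities.

    probabilities: predicted infection probabilities (hypothetical values).
    truths: 1 for a confirmed case, 0 otherwise.
    A probability >= threshold counts as a positive call (an assumption).
    """
    tp = fp = tn = fn = 0
    for p, truth in zip(probabilities, truths):
        predicted_positive = p >= threshold
        if predicted_positive and truth == 1:
            tp += 1
        elif predicted_positive and truth == 0:
            fp += 1
        elif not predicted_positive and truth == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy data, invented purely to exercise the function.
toy_probabilities = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]
toy_truths = [1, 1, 1, 0, 0, 0]
metrics = screening_metrics(toy_probabilities, toy_truths)
print(metrics)
```

In a low-prevalence sample, accuracy is dominated by the true negatives, which is consistent with the miners sample showing high accuracy alongside low sensitivity.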
Forster, F., Cabrera-Vives, G., Castillo-Navarrete, E., Estevez, P. A., Sanchez-Saez, P., Arredondo, J., et al. (2021). The Automatic Learning for the Rapid Classification of Events (ALeRCE) Alert Broker. Astron. J., 161(5), 242.
Abstract: We introduce the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker, an astronomical alert broker designed to provide a rapid and self-consistent classification of large etendue telescope alert streams, such as that provided by the Zwicky Transient Facility (ZTF) and, in the future, the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST). ALeRCE is a Chilean-led broker run by an interdisciplinary team of astronomers and engineers working to become intermediaries between survey and follow-up facilities. ALeRCE uses a pipeline that includes the real-time ingestion, aggregation, cross-matching, machine-learning (ML) classification, and visualization of the ZTF alert stream. We use two classifiers: a stamp-based classifier, designed for rapid classification, and a light curve-based classifier, which uses the multiband flux evolution to achieve a more refined classification. We describe in detail our pipeline, data products, tools, and services, which are made public for the community. Since we began operating our real-time ML classification of the ZTF alert stream in early 2019, we have grown a large community of active users around the globe. We describe our results to date, including the real-time processing of 1.5 × 10^8 alerts, the stamp classification of 3.4 × 10^7 objects, the light-curve classification of 1.1 × 10^6 objects, the report of 6162 supernova candidates, and different experiments using LSST-like alert streams. Finally, we discuss the challenges ahead in going from a single stream of alerts such as ZTF to a multistream ecosystem dominated by LSST.
Ko, Y., Peng, E. W., Cote, P., Ferrarese, L., Liu, C. Z., Longobardi, A., et al. (2022). The Next Generation Virgo Cluster Survey. XXXIII. Stellar Population Gradients in the Virgo Cluster Core Globular Cluster System. Astrophys. J., 931(2), 120.
Abstract: We present a study of the stellar populations of globular clusters (GCs) in the Virgo Cluster core with a homogeneous spectroscopic catalog of 692 GCs within a major-axis distance R_maj = 840 kpc from M87. We investigate radial and azimuthal variations in the mean age, total metallicity, [Fe/H], and alpha-element abundance of blue (metal-poor) and red (metal-rich) GCs using their co-added spectra. We find that the blue GCs have a steep radial gradient in [Z/H] within R_maj = 165 kpc, with roughly equal contributions from [Fe/H] and [alpha/Fe], and flat gradients beyond. By contrast, the red GCs show a much shallower gradient in [Z/H], which is entirely driven by [Fe/H]. We use GC-tagged Illustris simulations to demonstrate an accretion scenario where more massive satellites (with more metal- and alpha-rich GCs) sink further into the central galaxy than less massive ones, and where the gradient flattening occurs because of the low GC occupation fraction of low-mass dwarfs disrupted at larger distances. The dense environment around M87 may also cause the steep [alpha/Fe] gradient of the blue GCs, mirroring what is seen in the dwarf galaxy population. The progenitors of red GCs have a narrower mass range than those of blue GCs, which makes their gradients shallower. We also explore spatial inhomogeneity in GC abundances, finding that the red GCs to the northwest of M87 are slightly more metal-rich. Future observations of GC stellar population gradients will be useful diagnostics of halo merger histories.
Lardone, M. C., Busch, A. S., Santos, J. L., Miranda, P., Eyheramendy, S., Pereira, A., et al. (2020). A Polygenic Risk Score Suggests Shared Genetic Architecture of Voice Break With Early Markers of Pubertal Onset in Boys. J. Clin. Endocrinol. Metab., 105(3), E349–E357.
Abstract: Context: Voice break, as a landmark of advanced male puberty in genome-wide association studies (GWAS), has revealed that pubertal timing is a highly polygenic trait. Although voice break is easily recorded in large cohorts, it holds quite low precision as a marker of puberty. In contrast, gonadarche and pubarche are early and clinically well-defined measures of puberty onset. Objective: To determine whether a polygenic risk score (PRS) of alleles that confer risk for voice break associates with age at gonadarche (AAG) and age at pubarche (AAP) in Chilean boys. Experimental Design: Longitudinal study. Subjects and Methods: 401 boys from the Growth and Obesity Chilean Cohort Study (n = 1194; 49.2% boys). Main Outcome Measures: Biannual clinical pubertal staging including orchidometry. AAG and AAP were estimated by censoring methods. Genotyping was performed using the Multi-Ethnic Global Array (Illumina). Using GWAS summary statistics from the UK-Biobank, 29 significant and independent single nucleotide polymorphisms associated with age at voice break were extracted. Individual PRS were computed as the sum of risk alleles weighted by the effect size. Results: The PRS was associated with AAG (beta=0.01, P = 0.04) and AAP (beta=0.185, P = 0.0004). In addition, boys within the 20% highest PRS experienced gonadarche and pubarche 0.55 and 0.67 years later than those in the lowest 20%, respectively (P = 0.013 and P = 0.007). Conclusions: Genetic variants identified in large GWAS on age at VB significantly associate with age at testicular growth and pubic hair development, suggesting that these events share a genetic architecture across ethnically distinct populations.
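The score construction described in this abstract (the sum of risk alleles weighted by effect size) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the study's pipeline: the SNP identifiers, effect sizes, and genotype dosages below are hypothetical placeholders, and the function name is invented for the example.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts for one individual.

    dosages: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2).
    effect_sizes: dict mapping SNP id -> per-allele effect size taken from
        GWAS summary statistics (hypothetical values in this example).
    """
    missing = set(effect_sizes) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for SNPs: {sorted(missing)}")
    return sum(effect_sizes[snp] * dosages[snp] for snp in effect_sizes)


# Hypothetical three-SNP panel; the study used 29 SNPs.
effect_sizes = {"snp_a": 0.10, "snp_b": 0.05, "snp_c": 0.20}
individual = {"snp_a": 2, "snp_b": 0, "snp_c": 1}

score = polygenic_risk_score(individual, effect_sizes)
print(f"PRS = {score:.2f}")
```

Scores computed this way for every individual can then be ranked, for example to compare the highest and lowest 20% as the abstract reports.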
Sanchez-Saez, P., Lira, H., Marti, L., Sanchez-Pi, N., Arredondo, J., Bauer, F. E., et al. (2021). Searching for Changing-state AGNs in Massive Data Sets. I. Applying Deep Learning and Anomaly-detection Techniques to Find AGNs with Anomalous Variability Behaviors. Astron. J., 162(5), 206.
Abstract: The classic classification scheme for active galactic nuclei (AGNs) was recently challenged by the discovery of the so-called changing-state (changing-look) AGNs. The physical mechanism behind this phenomenon is still a matter of open debate and the samples are too small and of serendipitous nature to provide robust answers. In order to tackle this problem, we need to design methods that are able to detect AGNs right in the act of changing state. Here we present an anomaly-detection technique designed to identify AGN light curves with anomalous behaviors in massive data sets. The main aim of this technique is to identify CSAGN at different stages of the transition, but it can also be used for more general purposes, such as cleaning massive data sets for AGN variability analyses. We used light curves from the Zwicky Transient Facility data release 5 (ZTF DR5), containing a sample of 230,451 AGNs of different classes. The ZTF DR5 light curves were modeled with a Variational Recurrent Autoencoder (VRAE) architecture, that allowed us to obtain a set of attributes from the VRAE latent space that describes the general behavior of our sample. These attributes were then used as features for an Isolation Forest (IF) algorithm that is an anomaly detector for a “one class” kind of problem. We used the VRAE reconstruction errors and the IF anomaly score to select a sample of 8809 anomalies. These anomalies are dominated by bogus candidates, but we were able to identify 75 promising CSAGN candidates.
Sanchez-Saez, P., Reyes, I., Valenzuela, C., Forster, F., Eyheramendy, S., Elorrieta, F., et al. (2021). Alert Classification for the ALeRCE Broker System: The Light Curve Classifier. Astron. J., 161(3), 141.
Abstract: We present the first version of the Automatic Learning for the Rapid Classification of Events (ALeRCE) broker light curve classifier. ALeRCE is currently processing the Zwicky Transient Facility (ZTF) alert stream, in preparation for the Vera C. Rubin Observatory. The ALeRCE light curve classifier uses variability features computed from the ZTF alert stream and colors obtained from AllWISE and ZTF photometry. We apply a balanced random forest algorithm with a two-level scheme where the top level classifies each source as periodic, stochastic, or transient, and the bottom level further resolves each of these hierarchical classes among 15 total classes. This classifier corresponds to the first attempt to classify multiple classes of stochastic variables (including core- and host-dominated active galactic nuclei, blazars, young stellar objects, and cataclysmic variables) in addition to different classes of periodic and transient sources, using real data. We created a labeled set using various public catalogs (such as the Catalina Surveys and Gaia DR2 variable stars catalogs, and the Million Quasars catalog), and we classify all objects with >= 6 g-band or >= 6 r-band detections in ZTF (868,371 sources as of 2020 June 9), providing updated classifications for sources with new alerts every day. For the top level we obtain macro-averaged precision and recall scores of 0.96 and 0.99, respectively, and for the bottom level we obtain macro-averaged precision and recall scores of 0.57 and 0.76, respectively. Updated classifications from the light curve classifier can be found at the ALeRCE Explorer website (
Vicuna, L., Barrientos, E., Norambuena, T., Alvares, D., Gana, J. C., Leiva-Yamaguchi, V., et al. (2023). New insights from GWAS on BMI-related growth traits in a longitudinal cohort of admixed children with Native American and European ancestry. iScience, 26(2), 106091.
Abstract: Body-mass index (BMI) is a hallmark of adiposity. In contrast with adulthood, the genetic architecture of BMI during childhood is poorly understood. The few genome-wide association studies (GWAS) on children have been performed almost exclusively in Europeans and at single ages. We performed cross-sectional and longitudinal GWAS for BMI-related traits on 904 admixed children with mostly Mapuche Native American and European ancestries. We found regulatory variants of the immune gene HLA-DQB3 strongly associated with BMI at 1.5–2.5 years old. A variant in the sex-determining gene DMRT1 was associated with the age at adiposity rebound (Age-AR) in girls (P = 9.8 × 10^-9). BMI was significantly higher in Mapuche than in Europeans between 5.5 and 16.5 years old. Finally, Ag
Vicuna, L., Fernandez, M. I., Vial, C., Valdebenito, P., Chaparro, E., Espinoza, K., et al. (2019). Adaptation to Extreme Environments in an Admixed Human Population from the Atacama Desert. Genome Biol. Evol., 11(9), 2468–2479.
Abstract: Inorganic arsenic (As) is a toxic xenobiotic and carcinogen associated with severe health conditions. The urban population from the Atacama Desert in northern Chile was exposed to extremely high As levels (up to 600 μg/l) in drinking water between 1958 and 1971, leading to increased incidence of urinary bladder cancer (BC), skin cancer, kidney cancer, and coronary thrombosis decades later. Besides, the Andean Native-American ancestors of the Atacama population were previously exposed for millennia to elevated As levels in water (~120 μg/l) for at least 5,000 years, suggesting adaptation to this selective pressure. Here, we performed two genome-wide selection tests (PBSn1 and an ancestry-enrichment test) in an admixed population from Atacama, to identify adaptation signatures to As exposure acquired before and after admixture with Europeans, respectively. The top second variant selected by PBSn1 was associated with LCE4A-C1orf68, a gene that may be involved in the immune barrier of the epithelium during BC. We performed association tests between the top PBSn1 hits and BC occurrence in our population. The strongest association (P = 0.012) was achieved by the LCE4A-C1orf68 variant. The ancestry-enrichment test detected highly significant signals (P = 1.3 × 10^-9) mapping MAK16, a gene with important roles in ribosome biogenesis during the G1 phase of the cell cycle. Our results contribute to a better understanding of the genetic factors involved in adaptation to the pathophysiological consequences of As exposure.
Vicuna, L., Klimenkova, O., Norambuena, T., Martinez, F. I., Fernandez, M. I., Shchur, V., et al. (2020). Postadmixture Selection on Chileans Targets Haplotype Involved in Pigmentation, Thermogenesis and Immune Defense against Pathogens. Genome Biol. Evol., 12(8), 1459–1470.
Abstract: Detection of positive selection signatures in populations around the world is helping to uncover recent human evolutionary history as well as the genetic basis of diseases. Most human evolutionary genomic studies have been performed in European, African, and Asian populations. However, populations with Native American ancestry have been largely underrepresented. Here, we used a genome-wide local ancestry enrichment approach complemented with neutral simulations to identify postadmixture adaptations undergone by admixed Chileans through gene flow from Europeans into local Native Americans. The top significant hits (P = 2.4 × 10^-7) are variants in a region on chromosome 12 comprising multiple regulatory elements. This region includes rs12821256, which regulates the expression of KITLG, a well-known gene involved in lighter hair and skin pigmentation in Europeans as well as in thermogenesis. Another variant from that region is associated with the long noncoding RNA RP11-13A1.1, which has been specifically involved in the innate immune response against infectious pathogens. Our results suggest that these genes were relevant for adaptation in Chileans following the Columbian exchange.
Vicuna, L., Norambuena, T., Miranda, J. P., Pereira, A., Mericq, V., Ongaro, L., et al. (2021). Novel loci and Mapuche genetic ancestry are associated with pubertal growth traits in Chilean boys. Hum. Genet., 140(12), 1651–1661.
Abstract: Puberty is a complex developmental process that varies considerably among individuals and populations. Genetic factors explain a large proportion of the variability of several pubertal traits. Recent genome-wide association studies (GWAS) have identified hundreds of variants involved in traits that result from body growth, like adult height. However, they do not capture many genetic loci involved in growth changes over distinct growth phases. Further, such GWAS have been mostly performed in Europeans, but we do not know how these findings relate to other continental populations. In this study, we analyzed the genetic basis of three pubertal traits; namely, peak height velocity (PV), age at PV (APV) and height at APV (HAPV). We analyzed a cohort of 904 admixed Chilean children and adolescents with European and Mapuche Native American ancestries. Height was measured on roughly a 6-month basis from childhood to adolescence between 2006 and 2019. We predict that the difference in HAPV between a European and a Mapuche adolescent is 4.3 cm higher in the European (P = 0.042) and APV is 0.73 years later for the European compared with the Mapuche adolescent on average (P = 0.023). Further, by performing a GWAS on 774,433 single-nucleotide polymorphisms, we identified a genetic signal harboring 3 linked variants significantly associated with PV in boys (P < 5 × 10^-8). This signal has never been associated with growth-related traits.