|
Aranis, A., de la Cruz, R., Montenegro, C., Ramirez, M., Caballero, L., Gomez, A., et al. (2022). Meta-Estimation of Araucanian Herring, Strangomera bentincki (Norman, 1936), Biological Indicators in the Central-South Zone of Chile (32 degrees-47 degrees LS). Front. Mar. Sci., 9, 886321.
Abstract: Araucanian herring, Strangomera bentincki, is ecologically and economically important. Its complexity, like that of other pelagic fish, arises from seasonal population changes related to distribution with different spatial dynamics and demographic fractions, subject to strong environmental and fishing exploitation variations. This implies the necessity for a thorough understanding of biological processes, which are interpreted with the help of various activities that directly or indirectly allow adequate indicators to be inferred and delivered. These activities facilitate a correct technical analysis and consistent conclusions for resource management and administration. In this context, the present study identified and addressed the need to integrate information on Araucanian herring lengths made available in historical series from commercial fleet fishing and sources such as special monitoring, hydroacoustic cruises, and monitoring during closed seasons. The study focused on methodologies widely used in biostatistics that allow analyzing the feasibility of integrating data from different origins, focused on evaluating the correct management of size structures that vary by origin, sample size, and volumes extracted. We call this tool meta-estimation. It estimates the integration of biological-fishery size indicators that originated mainly from commercial fishing and research fisheries for the central-south pelagic fishery, using catch data from January to July 2018.
|
|
|
Blanco, K., Salcidua, S., Orellana, P., Sauma, T., Leon, T., Lopez-Steinmetz, L. C., et al. (2023). Systematic review: fluid biomarkers and machine learning methods to improve the diagnosis from mild cognitive impairment to Alzheimer's disease. Alzheimer's Res. Ther., Early Access.
Abstract: Mild cognitive impairment (MCI) is often considered an early stage of dementia, with estimated rates of progression to dementia up to 80–90% after approximately 6 years from the initial diagnosis. Diagnosis of cognitive impairment in dementia is typically based on clinical evaluation, neuropsychological assessments, cerebrospinal fluid (CSF) biomarkers, and neuroimaging. The main goal of diagnosing MCI is to determine its cause, particularly whether it is due to Alzheimer's disease (AD). However, only a limited percentage of the population has access to etiological confirmation, which has led to the emergence of peripheral fluid biomarkers as a diagnostic tool for dementias, including MCI due to AD. Recent advances in biofluid assays have enabled the use of sophisticated statistical models and multimodal machine learning (ML) algorithms for the diagnosis of MCI based on fluid biomarkers from CSF, peripheral blood, and saliva, among others. This approach has shown promise for identifying specific causes of MCI, including AD. After a PRISMA analysis, 29 articles revealed a trend towards using multimodal algorithms that incorporate additional biomarkers such as neuroimaging, neuropsychological tests, and genetic information. Particularly, neuroimaging is commonly used in conjunction with fluid biomarkers for both cross-sectional and longitudinal studies. Our systematic review suggests that cost-effective longitudinal multimodal monitoring data, representative of diverse cultural populations and utilizing white-box ML algorithms, could be a valuable contribution to the development of diagnostic models for AD due to MCI. Clinical assessment and biomarkers, together with ML techniques, could prove pivotal in improving diagnostic tools for MCI due to AD.
|
|
|
Celis, P., de la Cruz, R., Fuentes, C., & Gomez, H. W. (2021). Survival and Reliability Analysis with an Epsilon-Positive Family of Distributions with Applications. Symmetry, 13(5), 908.
Abstract: We introduce a new class of distributions called the epsilon-positive family, which can be viewed as a generalization of the distributions with positive support. The construction of the epsilon-positive family is motivated by the ideas behind the generation of skew distributions using symmetric kernels. This new class of distributions has as special cases the exponential, Weibull, log-normal, log-logistic and gamma distributions, and it provides an alternative for analyzing reliability and survival data. An interesting feature of the epsilon-positive family is that it can be viewed as a finite scale mixture of positive distributions, facilitating the derivation and implementation of EM-type algorithms to obtain maximum likelihood estimates (MLE) with (un)censored data. We illustrate the flexibility of this family to analyze censored and uncensored data using two real examples. One of them was previously discussed in the literature; the second one consists of a new application to model recidivism data of a group of inmates released from the Chilean prisons during 2007. The results show that this new family of distributions has a better performance fitting the data than some common alternatives such as the exponential distribution.
|
|
|
de la Cruz, R., Fuentes, C., & Padilla, O. (2023). A Bayesian Mixture Cure Rate Model for Estimating Short-Term and Long-Term Recidivism. Entropy, 25(1), 56.
Abstract: Mixture cure rate models have been developed to analyze failure time data where a proportion never fails. For such data, standard survival models are usually not appropriate because they do not account for the possibility of non-failure. In this context, mixture cure rate models assume that the studied population is a mixture of susceptible subjects who may experience the event of interest and non-susceptible subjects that will never experience it. More specifically, mixture cure rate models are a class of survival time models in which the probability of an eventual failure is less than one and both the probability of eventual failure and the timing of failure depend (separately) on certain individual characteristics. In this paper, we propose a Bayesian approach to estimate parametric mixture cure rate models with covariates. The probability of eventual failure is estimated using a binary regression model, and the timing of failure is determined using a Weibull distribution. Inference for these models is attained using Markov Chain Monte Carlo methods under the proposed Bayesian framework. Finally, we illustrate the method using data on the return-to-prison time for a sample of prison releases of men convicted of sexual crimes against women in England and Wales and we use mixture cure rate models to investigate the risk factors for long-term and short-term survival of recidivism.
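As a rough illustration of the model structure this abstract describes (not the authors' implementation), the population survival function of a mixture cure rate model with a Weibull timing component can be written as S(t) = (1 - p) + p * exp(-(t / scale)^shape), where p is the probability of eventual failure. The sketch below assumes a logistic link for the binary regression; the function names, the link, and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def failure_probability(x, beta):
    """Probability of eventual failure from a binary regression.

    A logistic link is ASSUMED here; the abstract only says
    "binary regression model".
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-eta))


def weibull_survival(t, shape, scale):
    """Weibull survival function for susceptible subjects."""
    return math.exp(-((t / scale) ** shape))


def mixture_cure_survival(t, x, beta, shape, scale):
    """Population survival: cured fraction plus susceptible survivors."""
    p = failure_probability(x, beta)
    return (1.0 - p) + p * weibull_survival(t, shape, scale)


# Hypothetical covariates and coefficients, for illustration only.
x = [1.0, 0.5]
beta = [-0.2, 0.8]
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, mixture_cure_survival(t, x, beta, shape=1.5, scale=2.0))
```

As t grows, the curve levels off at 1 - p rather than at zero, which is the feature that distinguishes a cure rate model from a standard survival model.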
|
|
|
de la Cruz, R., Meza, C., Narria, N., & Fuentes, C. (2022). A Bayesian Change Point Analysis of the USD/CLP Series in Chile from 2018 to 2020: Understanding the Impact of Social Protests and the COVID-19 Pandemic. Mathematics, 10(18), 3380.
Abstract: Exchange rates are determined by factors such as interest rates, political stability, confidence, the current account on balance of payments, government intervention, economic growth and relative inflation rates, among other variables. In October 2019, an increased climate of citizen discontent with current social policies resulted in a series of massive protests that ignited important political changes in Chile. This event along with the global COVID-19 pandemic were two major factors that affected the value of the US dollar and produced sudden changes in the typically stable USD/CLP (Chilean Peso) exchange rate. In this paper, we use a Bayesian approach to detect and locate change points in the currency exchange rate process in order to identify and relate these points with the important dates related to the events described above. The implemented method can successfully detect the onset of the social protests, the beginning of the COVID-19 pandemic in Chile and the economic reactivation in the US and Europe. In addition, we evaluate the performance of the proposed MCMC algorithms using a simulation study implemented in Python and R.
|
|
|
de la Cruz, R., Padilla, O., Valle, M. A., & Ruz, G. A. (2021). Modeling Recidivism through Bayesian Regression Models and Deep Neural Networks. Mathematics, 9(6), 639.
Abstract: This study aims to analyze and explore criminal recidivism with different modeling strategies: one based on an explanation of the phenomenon and another based on a prediction task. We compared three common statistical approaches for modeling recidivism: the logistic regression model, the Cox regression model, and the cure rate model. The parameters of these models were estimated from a Bayesian point of view. Additionally, for prediction purposes, we compared the Cox proportional model, a random survival forest, and a deep neural network. To conduct this study, we used a real dataset that corresponds to a cohort of individuals which consisted of men convicted of sexual crimes against women in 1973 in England and Wales. The results show that the logistic regression model tends to give more precise estimations of the probabilities of recidivism both globally and with the subgroups considered, but at the expense of running a model for each moment of the time that is of interest. The cure rate model with a relatively simple distribution, such as Weibull, provides acceptable estimations, and these tend to be better with longer follow-up periods. The Cox regression model can provide the most biased estimations with certain subgroups. The prediction results show the deep neural network's superiority compared to the Cox proportional model and the random survival forest.
|
|
|
de la Cruz, R., Salinas, H. S., & Meza, C. (2022). Reliability Estimation for Stress-Strength Model Based on Unit-Half-Normal Distribution. Symmetry, 14(4), 837.
Abstract: Many lifetime distribution models have successfully served as population models for risk analysis and reliability mechanisms. We propose a novel estimation procedure of stress-strength reliability in the case of two independent unit-half-normal distributions with different shape parameters, which can fit asymmetrical data with either positive or negative skew. We obtain the maximum likelihood estimator of the reliability, its asymptotic distribution, and exact and asymptotic confidence intervals. In addition, confidence intervals of model parameters are constructed by using bootstrap techniques. We study the performance of the estimators based on Monte Carlo simulations, the mean squared error, average bias and length, and coverage probabilities. Finally, we apply the proposed reliability model to a data analysis of burr measurements on iron sheets.
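For readers unfamiliar with the quantity being estimated, stress-strength reliability is R = P(X < Y), the probability that a random stress X falls below a random strength Y. The sketch below estimates R by plain Monte Carlo. It is not the estimator proposed in the paper, and it does not use the unit-half-normal distribution: exponential distributions are swapped in as an assumption because their closed form, R = a / (a + b) for rates a (stress) and b (strength), gives something to check against. All names and values are hypothetical.

```python
import random


def estimate_reliability(sample_stress, sample_strength, n, seed=0):
    """Monte Carlo estimate of R = P(stress < strength).

    sample_stress and sample_strength are callables that take a
    random.Random instance and return one independent draw each.
    """
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(n) if sample_stress(rng) < sample_strength(rng)
    )
    return hits / n


# ASSUMPTION: exponential stand-ins, not the unit-half-normal model.
rate_stress, rate_strength = 2.0, 1.0
r_hat = estimate_reliability(
    lambda rng: rng.expovariate(rate_stress),
    lambda rng: rng.expovariate(rate_strength),
    n=200_000,
)
r_exact = rate_stress / (rate_stress + rate_strength)
print(f"Monte Carlo: {r_hat:.4f}  closed form: {r_exact:.4f}")
```

Passing the samplers as callables keeps the estimator independent of the distribution, so a sampler for another model could be substituted without changing the rest.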
|
|
|
Gaskins, J. T., Fuentes, C., & De la Cruz, R. (2022). A Bayesian nonparametric model for classification of longitudinal profiles. Biostatistics, Early Access.
Abstract: Across several medical fields, developing an approach for disease classification is an important challenge. The usual procedure is to fit a model for the longitudinal response in the healthy population, a different model for the longitudinal response in the diseased population, and then apply Bayes' theorem to obtain disease probabilities given the responses. Unfortunately, when substantial heterogeneity exists within each population, this type of Bayes classification may perform poorly. In this article, we develop a new approach by fitting a Bayesian nonparametric model for the joint outcome of disease status and longitudinal response, and then we perform classification through the clustering induced by the Dirichlet process. This approach is highly flexible and allows for multiple subpopulations of healthy, diseased, and possibly mixed membership. In addition, we introduce a Markov chain Monte Carlo sampling scheme that facilitates the assessment of the inference and prediction capabilities of our model. Finally, we demonstrate the method by predicting pregnancy outcomes using longitudinal profiles on the human chorionic gonadotropin beta subunit hormone levels in a sample of Chilean women being treated with assisted reproductive therapy.
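The "usual procedure" mentioned in this abstract, the baseline the paper improves on rather than its proposed nonparametric method, can be sketched as follows: given one response model per population, Bayes' theorem gives P(diseased | y) = prior * f_D(y) / (prior * f_D(y) + (1 - prior) * f_H(y)). The example below assumes, purely for illustration, independent normal observations with a constant mean and standard deviation in each group; real longitudinal models would include time trends and within-subject correlation. All names and numbers are hypothetical.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of a normal distribution at y."""
    return -0.5 * math.log(2 * math.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2


def disease_probability(responses, prior, healthy, diseased):
    """Posterior probability of disease given a subject's responses.

    healthy and diseased are (mean, sd) pairs; observations are
    ASSUMED independent and identically distributed within a group.
    """
    log_h = math.log(1.0 - prior) + sum(
        normal_logpdf(y, *healthy) for y in responses
    )
    log_d = math.log(prior) + sum(
        normal_logpdf(y, *diseased) for y in responses
    )
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_h, log_d)
    w_h, w_d = math.exp(log_h - m), math.exp(log_d - m)
    return w_d / (w_h + w_d)


# Hypothetical subject with three measurements.
print(disease_probability([1.8, 2.1, 1.6], prior=0.3,
                          healthy=(0.0, 1.0), diseased=(2.0, 1.0)))
```

With a single model per group, a heterogeneous population is forced into one mean and one spread, which is the weakness the abstract points to.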
|
|
|
Hernandez-Rocha, C., Chahuan, J., Uslar, T., Salas, R., Sepúlveda, I., Pavez, C., et al. (2024). Relative survival and cause-specific mortality of a Chilean Inflammatory Bowel Disease cohort. In Journal of Crohn's and Colitis (Vol. 18, p. I2016).
|
|
|
Marquez, M., Meza, C., Lee, D. J., & De la Cruz, R. (2023). Classification of longitudinal profiles using semi-parametric nonlinear mixed models with P-Splines and the SAEM algorithm. Stat. Med., Early Access.
Abstract: In this work, we propose an extension of a semiparametric nonlinear mixed-effects model for longitudinal data that incorporates more flexibility with penalized splines (P-splines) as smooth terms. The novelty of the proposed approach consists of the formulation of the model within the stochastic approximation version of the EM algorithm for maximum likelihood, the so-called SAEM algorithm. The proposed approach takes advantage of the formulation of a P-spline as a mixed-effects model and the use of the computational advantages of the existing software for the SAEM algorithm for the estimation of the random effects and the variance components. Additionally, we developed a supervised classification method for these non-linear mixed models using an adaptive importance sampling scheme. To illustrate our proposal, we consider two studies on pregnant women where two biomarkers are used as indicators of changes during pregnancy. In both studies, information about the women's pregnancy outcomes is known. Our proposal provides a unified framework for the classification of longitudinal profiles that may have important implications for the early detection and monitoring of pregnancy-related changes and contribute to improved maternal and fetal health outcomes. We show that the proposed models improve the analysis of this type of data compared to previous studies. These improvements are reflected both in the fit of the models and in the classification of the groups.
|
|
|
Ruiz, E., Yushimito, W. F., Aburto, L., & de la Cruz, R. (2024). Predicting passenger satisfaction in public transportation using machine learning models. Transp. Res. A Policy Pract., 181, 103995.
Abstract: Enhancing the understanding of passenger satisfaction in public transportation is crucial for operators to refine transit services and to establish and elevate quality standards. While many researchers have tackled this issue using diverse tools and methods, the prevalent approach involves surveys with discrete choice models or structural equations. However, a common limitation of these models lies in their inherent assumptions and predefined relationships between dependent and independent variables. To address these limitations, we introduce a novel perspective by harnessing machine learning (ML) models to gauge and predict passenger satisfaction. ML models are advantageous when dealing with complex, non-linear relationships and massive datasets, and do not rely on predefined assumptions. Thus, in this paper, we evaluate four ML models for the prediction of ratings of the quality of transit service. These models were calibrated using data from the Transantiago bus system in Chile. Among the ML models, the Random Forest model emerges as the most effective, showcasing its ability to analyze and predict passengers' satisfaction levels. We delve deeper into its capabilities by examining the impact of three pivotal variables on passengers' score ratings: waiting time, bus occupation, and bus speed. The Random Forest model is able to capture threshold values for these variables that significantly influence or have no effect on passenger preferences.
|
|