Aranis, A., de la Cruz, R., Montenegro, C., Ramirez, M., Caballero, L., Gomez, A., et al. (2022). Meta-Estimation of Araucanian Herring, Strangomera bentincki (Norman, 1936), Biological Indicators in the Central-South Zone of Chile (32°–47° S). Front. Mar. Sci., 9, 886321.
Abstract: Araucanian herring, Strangomera bentincki, is ecologically and economically important. Its complexity, like that of other pelagic fish, arises from seasonal population changes related to distribution, with different spatial dynamics and demographic fractions subject to strong environmental and fishing-exploitation variations. This calls for a thorough understanding of the underlying biological processes, which are interpreted through various activities that, directly or indirectly, allow adequate indicators to be inferred and delivered. These activities facilitate a sound technical analysis and consistent conclusions for resource management and administration. In this context, the present study identified and addressed the need to integrate information on Araucanian herring lengths available in historical series from commercial-fleet fishing and from sources such as special monitoring, hydroacoustic cruises, and monitoring during closed seasons. The study focused on methodologies widely used in biostatistics for analyzing the feasibility of integrating data of different origins, with emphasis on the correct handling of size structures that vary by origin, sample size, and extracted volume. We call this tool meta-estimation. It integrates biological-fishery size indicators originating mainly from commercial fishing and research fisheries for the central-south pelagic fishery, using catch data from January to July 2018.
Araya, H., Bahamonde, N., Fermin, L., Roa, T., & Torres, S. (2023). On the Consistency of Least Squares Estimator in Models Sampled at Random Times Driven by Long Memory Noise: The Jittered Case. Stat. Sin., 33(1), 331–351.
Abstract: In numerous applications, data are observed at random times. Our main purpose is to study a model observed at random times that incorporates a long-memory noise process with fractional Brownian Hurst exponent H. We propose a least squares estimator for a linear regression model with long-memory noise and a random sampling scheme called "jittered sampling". Specifically, there is a fixed sampling rate 1/N, contaminated by an additive noise (the jitter) governed by a probability density function supported on [0, 1/N]. The strong consistency of the estimator is established, with a convergence rate depending on N and the Hurst exponent. A Monte Carlo analysis supports the relevance of the theory and produces additional insights, considering several levels of long-range dependence (varying the Hurst index) and two different jitter densities.
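As a rough numerical companion to the sampling scheme described in the abstract, the following sketch simulates jittered observation times (nominal rate 1/N plus a uniform jitter supported on [0, 1/N]) and fits a line by least squares. Plain i.i.d. Gaussian noise stands in for the paper's long-memory noise, and all variable names are illustrative, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Jittered sampling": nominal sampling rate 1/N, with each observation
# time perturbed by an additive jitter whose density lives on [0, 1/N].
N = 200
jitter = rng.uniform(0.0, 1.0 / N, size=N)   # jitter density supported on [0, 1/N]
t = np.arange(N) / N + jitter                # jittered observation times

# Linear regression y = a + b*t + noise, observed at the random times.
# (i.i.d. noise here; the paper treats long-memory noise instead.)
a_true, b_true = 1.0, 2.0
y = a_true + b_true * t + 0.1 * rng.standard_normal(N)

# Least squares estimator of (a, b) from the jittered sample.
X = np.column_stack([np.ones(N), t])
a_hat, b_hat = np.linalg.lstsq(X, y, rcond=None)[0]
```

With a well-behaved jitter density the design matrix stays well conditioned, so the plain least squares solve recovers the line closely even at moderate N.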
Araya, H., Bahamonde, N., Fermin, L., Roa, T., & Torres, S. (2023). On the Consistency of the Least Squares Estimator in Models Sampled at Random Times Driven by Long Memory Noise: The Renewal Case. Stat. Sin., 33(1), 1–26.
Abstract: In this study, we prove the strong consistency of the least squares estimator in a randomly sampled linear regression model with long-memory noise and an independent set of random times given by renewal-process sampling. Additionally, we illustrate how to work with a random number of observations up to time T = 1. A simulation study illustrates the behavior of the different terms, as well as the performance of the estimator under various values of the Hurst parameter H.
de la Cruz, R., Padilla, O., Valle, M. A., & Ruz, G. A. (2021). Modeling Recidivism through Bayesian Regression Models and Deep Neural Networks. Mathematics, 9(6), 639.
Abstract: This study analyzes and explores criminal recidivism with two modeling strategies: one aimed at explaining the phenomenon and another aimed at prediction. We compared three common statistical approaches for modeling recidivism: the logistic regression model, the Cox regression model, and the cure rate model. The parameters of these models were estimated from a Bayesian point of view. Additionally, for prediction purposes, we compared the Cox proportional hazards model, a random survival forest, and a deep neural network. To conduct this study, we used a real dataset corresponding to a cohort of men convicted of sexual crimes against women in 1973 in England and Wales. The results show that the logistic regression model tends to give more precise estimates of the probabilities of recidivism, both globally and within the subgroups considered, but at the expense of fitting a separate model for each time point of interest. The cure rate model with a relatively simple distribution, such as the Weibull, provides acceptable estimates, and these tend to improve with longer follow-up periods. The Cox regression model can produce the most biased estimates for certain subgroups. The prediction results show the deep neural network's superiority over the Cox proportional hazards model and the random survival forest.
Henriquez, P. A., & Ruz, G. A. (2017). Extreme learning machine with a deterministic assignment of hidden weights in two parallel layers. Neurocomputing, 226, 109–116.
Abstract: The extreme learning machine (ELM) is a machine learning technique based on a single-hidden-layer feedforward neural network (SLFN). However, traditional ELM and its variants rely on a random assignment of the hidden weights drawn from a uniform distribution, followed by calculation of the output weights using the least-squares method. This paper proposes a new architecture consisting of one non-linear layer placed in parallel with another non-linear layer, each with independent input weights. We explore a deterministic assignment of the hidden weight values using low-discrepancy sequences (LDSs). The simulations are performed with Halton and Sobol sequences. The results for regression and classification problems confirm the advantages of the proposed method, called the PL-ELM algorithm, with its deterministic assignment of hidden weights. Moreover, the PL-ELM algorithm with deterministic generation using LDSs can be extended to other modified ELM algorithms.
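A minimal sketch of the core idea, assuming a standard single-layer ELM rather than the paper's two-parallel-layer PL-ELM architecture: hidden weights come from a Halton low-discrepancy sequence instead of a uniform random draw, and output weights are solved by least squares. Sizes, data, and names are illustrative, not the authors' code.

```python
import numpy as np

def halton(n, dims):
    """First n points of a Halton low-discrepancy sequence in [0, 1)^dims."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    out = np.empty((n, dims))
    for d in range(dims):
        base = primes[d]
        for i in range(n):
            f, r, k = 1.0, 0.0, i + 1
            while k > 0:             # radical-inverse expansion in the prime base
                f /= base
                r += f * (k % base)
                k //= base
            out[i, d] = r
    return out

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))            # toy inputs
y = np.sin(X[:, 0]) + X[:, 1] ** 2           # toy regression target

h = 20                                       # hidden neurons
W = 2.0 * halton(h, X.shape[1]) - 1.0        # deterministic hidden weights in [-1, 1)
b = 2.0 * halton(h, 1).ravel() - 1.0         # deterministic hidden biases
H = np.tanh(X @ W.T + b)                     # hidden-layer activations

# Output weights by least squares, as in standard ELM.
beta = np.linalg.lstsq(H, y, rcond=None)[0]
pred = H @ beta
```

Swapping `halton` for a Sobol generator gives the paper's other deterministic variant; everything after the weight assignment is unchanged.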
Henriquez, P. A., & Ruz, G. A. (2018). A non-iterative method for pruning hidden neurons in neural networks with random weights. Appl. Soft Comput., 70, 1109–1121.
Abstract: Neural networks with random weights have the advantage of fast computation in both training and testing. However, one of the main challenges of single-layer feedforward neural networks is selecting the optimal number of neurons in the hidden layer, since too few/too many neurons lead to underfitting/overfitting. Adapting Garson's algorithm, this paper introduces a new, efficient, and fast non-iterative algorithm for selecting the number of hidden neurons in randomization-based neural networks. The proposed approach has three steps: (1) train the network with h hidden neurons, (2) apply Garson's algorithm to the hidden-layer matrix, and (3) prune the hidden layer, removing neurons based on the harmonic mean. Our experiments on regression and classification problems confirmed that combining the pruning technique with these types of neural networks improved their predictive performance in terms of mean square error and accuracy. Additionally, we tested the proposed pruning method with neural networks trained under sequential learning algorithms, where the Random Vector Functional Link network generally obtained the best predictive performance compared with online sequential versions of extreme learning machines and single-hidden-layer neural networks with random weights.
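The three steps above can be sketched as follows. The relevance score is our reading of a Garson-style measure applied to a random-weight network, and the toy data are assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Step 0: a toy random-weight network with h hidden neurons, one output.
X = rng.standard_normal((200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(200)

h = 30
W = rng.uniform(-1, 1, size=(h, X.shape[1]))   # (1) random hidden weights
b = rng.uniform(-1, 1, size=h)
H = np.tanh(X @ W.T + b)
beta = np.linalg.lstsq(H, y, rcond=None)[0]    # output weights by least squares

# (2) Garson-style relevance of each hidden neuron: absolute input-weight
# mass times the magnitude of its output weight, normalized to sum to one.
rel = np.abs(W).sum(axis=1) * np.abs(beta)
rel = rel / rel.sum()

# (3) Prune neurons whose relevance falls below the harmonic mean of relevances.
hmean = len(rel) / np.sum(1.0 / rel)
keep = rel >= hmean
H_pruned = H[:, keep]
beta_pruned = np.linalg.lstsq(H_pruned, y, rcond=None)[0]
```

Because the harmonic mean is dragged down by small relevances, this threshold tends to keep most of the informative neurons while discarding near-useless ones, and the whole procedure needs no retraining loop.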
Henriquez, P. A., & Ruz, G. A. (2019). Noise reduction for near-infrared spectroscopy data using extreme learning machines. Eng. Appl. Artif. Intell., 79, 13–22.
Abstract: Near-infrared (NIR) spectroscopy is an effective approach for predicting chemical properties and is typically applied in the petrochemical, agricultural, medical, and environmental sectors. NIR spectra are usually of very high dimension and contain huge amounts of information, most of which is irrelevant to the target problem and some of which is simply noise. Thus, it is not an easy task to discover the relationship between NIR spectra and the predictive variable. This kind of regression analysis is, however, one of the main topics of machine learning, so machine learning techniques play a key role in NIR-based analytical approaches. Pre-processing of NIR spectral data has become an integral part of chemometrics modeling; its objective is to remove physical phenomena (noise) from the spectra in order to improve the regression or classification model. In this work, we propose to reduce the noise using extreme learning machines, which have shown good predictive performance in regression applications as well as in large-dataset classification tasks. For this, we use a novel algorithm called C-PL-ELM, whose architecture is based on one non-linear layer placed in parallel with another non-linear layer. Using the soft-margin loss function concept, we incorporate two Lagrange multipliers with the objective of accounting for the noise in the spectral data. Six real-life datasets were analyzed to illustrate the performance of the developed models. The results for regression and classification problems confirm the advantages of the proposed method in terms of root mean square error and accuracy.
Hughes, S., Moreno, S., Yushimito, W. F., & Huerta-Canepa, G. (2019). Evaluation of machine learning methodologies to predict stop delivery times from GPS data. Transp. Res. Pt. C-Emerg. Technol., 109, 289–304.
Abstract: In last-mile distribution, logistics companies typically arrange and plan their routes based on broad estimates of stop delivery times (i.e., the time spent at each stop to deliver goods to final receivers). If these estimates are not accurate, the level of service is degraded, as the promised time window may not be satisfied. The purpose of this work is to assess the feasibility of machine learning techniques for predicting stop delivery times. This is done by testing a wide range of machine learning techniques (including different types of ensembles) to (1) predict the stop delivery time and (2) determine whether the total stop delivery time will exceed a predefined threshold (a classification approach). For the assessment, all models are trained using information generated from GPS data collected in Medellín, Colombia, and compared to hazard duration models. The results are threefold. First, regression-based machine learning approaches are no better than conventional hazard duration models with respect to the absolute errors of the stop-delivery-time predictions. Second, when the problem is addressed by a classification scheme in which the prediction indicates whether a stop time will exceed a predefined threshold, a basic K-nearest-neighbor model outperforms hazard duration models and other machine learning techniques in both accuracy and F1 score (the harmonic mean of precision and recall). Third, the prediction of the exact duration can be improved by combining the classifiers with prediction models or hazard duration models in a two-level scheme (first classification, then prediction). However, the improvement depends largely on correct classification at the first level.
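To make the classification framing concrete, here is a self-contained toy version of the basic K-nearest-neighbor approach the abstract highlights. The synthetic features stand in for GPS-derived ones, and the threshold and names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-ins for GPS-derived stop features (e.g., parcel count, hour of day);
# the label records whether the stop time exceeds a fixed threshold.
X = rng.standard_normal((300, 2))
stop_time = 5.0 + 3.0 * X[:, 0] + rng.standard_normal(300)
y = (stop_time > 5.0).astype(int)            # exceeds-threshold label

X_train, y_train = X[:250], y[:250]
X_test, y_test = X[250:], y[250:]

def knn_predict(Xtr, ytr, Xte, k=5):
    """Basic K-nearest-neighbor majority vote with Euclidean distance."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]   # indices of the k closest stops
    return (ytr[nearest].mean(axis=1) > 0.5).astype(int)

acc = (knn_predict(X_train, y_train, X_test) == y_test).mean()
```

In the two-level scheme the abstract describes, a classifier like this would run first, and a separate duration model would then refine the estimate within the predicted class.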
Muñoz-Herrera, S., & Suchan, K. (2022). Constrained Fitness Landscape Analysis of Capacitated Vehicle Routing Problems. Entropy, 24(1), 53.
Abstract: Vehicle Routing Problems (VRP) comprise many variants obtained by adding constraints to the original problem that represent diverse system characteristics. The different variants are widely studied in the literature; however, the impact these constraints have on the structure of the search space associated with the problem is unknown, as is their influence on the performance of the search algorithms used to solve it. This article explores how assignment constraints (such as a limited vehicle capacity) affect the VRP by disturbing the network structure defined by the solution space and the local operators in use. The research focuses on Fitness Landscape Analysis for the multiple Traveling Salesman Problem (m-TSP) and the Capacitated VRP (CVRP). We propose a new Fitness Landscape Analysis measure that provides valuable information for characterizing the structure of the fitness landscape under specific scenarios, and we obtain several relationships between the fitness landscape's structure and algorithmic performance.
Simon, F., Ordonez, J., Reddy, T. A., Girard, A., & Muneer, T. (2016). Developing multiple regression models from the manufacturer's ground-source heat pump catalogue data. Renew. Energy, 95, 413–421.
Abstract: The performance of ground-source heat pumps (GSHPs), often expressed as the power drawn and/or the coefficient of performance (COP), depends on several operating parameters. Manufacturers usually publish such data in tables for certain discrete values of the operating fluid temperatures and flow-rate conditions. In actual applications, such as dynamic simulations of heat pump systems integrated into buildings, there is a need to determine equipment performance under operating conditions other than those listed. This paper describes a simplified methodology for predicting the performance of GSHPs using multiple regression (MR) models applied to manufacturer data. We find that fitting second-order MR models with eight statistically significant x-variables to 36 observations appropriately selected from the manufacturer catalogue can predict the system's global behavior with good accuracy. For the three GSHPs studied, the external prediction errors of the MR models identified with this methodology are 0.2%, 0.9%, and 1% for heating capacity (HC) predictions and 2.6%, 4.9%, and 3.2% for COP predictions. No correlation is found between the residuals and the response, thus validating the models. The operational approach appears to be a reliable tool for integration into dynamic simulation codes, as the method is applicable to any GSHP catalogue data.
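A small sketch of fitting a second-order MR model to catalogue-style data, assuming synthetic temperatures and a two-variable model rather than the paper's eight x-variables (the variable names and the generating formula are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical catalogue-style data: heating capacity as a function of
# source-side and load-side fluid temperatures, at 36 catalogue points.
t_source = rng.uniform(-5, 15, size=36)
t_load = rng.uniform(25, 45, size=36)
hc = 10 + 0.3 * t_source - 0.1 * t_load + 0.01 * t_source * t_load

# Second-order multiple regression design matrix:
# intercept, linear, interaction, and quadratic terms.
X = np.column_stack([
    np.ones(36), t_source, t_load,
    t_source * t_load, t_source ** 2, t_load ** 2,
])
coef, *_ = np.linalg.lstsq(X, hc, rcond=None)
pred = X @ coef
```

Once fitted, `X @ coef` evaluated at arbitrary temperatures interpolates between the discrete catalogue points, which is exactly what a dynamic building simulation needs.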
Song, J. W., Wei, P. F., Valdebenito, M. A., Faes, M., & Beer, M. (2021). Data-driven and active learning of variance-based sensitivity indices with Bayesian probabilistic integration. Mech. Syst. Signal Process., 163, 108106.
Abstract: Variance-based sensitivity indices play an important role in scientific computation and data mining; thus, the significance of developing numerical methods for efficient and reliable estimation of these indices from (expensive) computer simulators and/or data cannot be overstated. In this article, the estimation of these sensitivity indices is treated as a statistical inference problem. Two principal lemmas are first proposed as rules of thumb for making the inference. After that, the posterior features of all the (partial) variance terms involved in the main and total effect indices are analytically derived (though not in closed form) based on Bayesian Probabilistic Integration (BPI). This yields a data-driven method for estimating the sensitivity indices as well as the associated discretization errors. Further, to improve the efficiency of the method for expensive simulators, an acquisition function, named Posterior Variance Contribution (PVC), is used to realize optimal designs of experiments, from which an adaptive BPI method is established. The application of this framework is illustrated for the calculation of the main and total effect indices, but the two principal lemmas also apply to the calculation of interaction effect indices. The performance of the development is demonstrated by an illustrative numerical example and three engineering benchmarks with finite element models.
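For orientation, here is a plain Monte Carlo "pick-freeze" estimate of a first-order variance-based index on a toy model; the paper's BPI approach is a Bayesian, sample-efficient alternative to this kind of brute-force estimator (the model and sample size here are our assumptions, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy model with two independent uniform inputs on [0, 1].
def f(x):
    return x[:, 0] + 0.5 * x[:, 1]

n = 100_000
A = rng.uniform(size=(n, 2))
B = rng.uniform(size=(n, 2))
AB = A.copy()
AB[:, 1] = B[:, 1]       # keep x1 from A, resample x2 -> isolates x1's effect

# First-order Sobol index of x1: Cov(f(A), f(AB)) / Var(f(A)).
yA, yAB = f(A), f(AB)
S1 = np.cov(yA, yAB)[0, 1] / np.var(yA, ddof=1)
# Analytic value here: Var(x1) / (Var(x1) + 0.25 * Var(x2)) = 0.8
```

The 100,000 model runs this estimator consumes are exactly what BPI's posterior inference and PVC-driven adaptive designs aim to avoid when each run is an expensive finite element solve.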
Vega-Briones, J., de Jong, S., Galleguillos, M., & Wanders, N. (2023). Identifying driving processes of drought recovery in the southern Andes natural catchments. J. Hydrol. Reg. Stud., 47, 101369.
Abstract: Study region: The natural river basins of Chile. Study focus: Drought effects on terrestrial ecosystems produce hydroclimatic stress of variable extent. In particular, hydrological drought duration, together with catchment characteristics and climatology, can provide a better understanding of recovery. This study focuses on the impacts of the multi-year drought experienced in Chile for more than a decade. The recovery of relevant catchment variables is used to quantify drought termination (DT) and drought termination duration (DTD) after hydrological drought. A composite analysis of natural catchments using the CAMELS-CL data set, covering discharge (1988–2020), k-NDVI (2000–2020), and soil moisture (1991–2020), provides the average response of the recovery after severe droughts. New hydrological insights for the region: This study demonstrates that local catchment properties can explain the recovery of the studied variables after a hydrological drought. Explanatory variables from CAMELS-CL were used to derive the DT with random forest regression (RFR), achieving strong correlations of 0.92, 0.84, and 0.89 for discharge, vegetation productivity, and soil moisture, respectively. The discharge patterns show longer recovery in shrubland-dominated environments with less precipitation and higher temperatures, in central Chile, while higher latitudes with higher vegetation cover, increasing precipitation, and lower temperatures present shorter recovery times. Vegetation productivity shows longer recovery over highly vegetated mountains in central Chile. The spatial distribution of soil moisture recovery presented patterns connected with discharge recovery. This work enables the identification of drought vulnerability, which is valuable for managing water resources and ecosystems and helps to predict drought recovery periods in regions lacking observations.