
Hughes, S., Moreno, S., Yushimito, W. F., & Huerta-Canepa, G. (2019). Evaluation of machine learning methodologies to predict stop delivery times from GPS data. Transp. Res. Pt. C-Emerg. Technol., 109, 289–304.
Abstract: In last-mile distribution, logistics companies typically arrange and plan their routes based on broad estimates of stop delivery times (i.e., the time spent at each stop to deliver goods to final receivers). If these estimates are inaccurate, the level of service degrades, as the promised time window may not be met. The purpose of this work is to assess the feasibility of machine learning techniques for predicting stop delivery times. This is done by testing a wide range of machine learning techniques (including different types of ensembles) to (1) predict the stop delivery time and (2) determine whether the total stop delivery time will exceed a predefined time threshold (a classification approach). For the assessment, all models are trained on information generated from GPS data collected in Medellín, Colombia, and compared to hazard duration models. The results are threefold. First, the assessment shows that regression-based machine learning approaches are no better than conventional hazard duration models in terms of the absolute errors of the predicted stop delivery times. Second, when the problem is addressed by a classification scheme in which the prediction indicates whether a stop time will exceed a predefined time, a basic k-nearest-neighbor model outperforms hazard duration models and other machine learning techniques in both accuracy and F1 score (the harmonic mean of precision and recall). Third, the prediction of the exact duration can be improved by combining the classifiers with prediction models or hazard duration models in a two-level scheme (first classification, then prediction). However, the improvement depends largely on correct classification at the first level.
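The classification approach that the abstract reports as the winner is conceptually simple. The sketch below shows threshold classification with a k-nearest-neighbor vote, using made-up features (package count, walking distance from the parked vehicle) rather than the paper's actual GPS-derived variables:

```python
import numpy as np

# Hypothetical training data: each row describes one stop by two features
# (number of packages, distance walked from the vehicle in metres); the
# label says whether the stop exceeded a predefined time threshold.
X_train = np.array([[2, 10.0], [3, 15.0], [8, 40.0], [9, 55.0], [1, 5.0], [7, 35.0]])
y_train = np.array([0, 0, 1, 1, 0, 1])  # 1 = stop time exceeded the threshold

def knn_predict(x, X, y, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return int(np.round(y[nearest].mean()))

print(knn_predict(np.array([8, 45.0]), X_train, y_train))  # a long-looking stop -> 1
print(knn_predict(np.array([2, 8.0]), X_train, y_train))   # a short-looking stop -> 0
```

In the paper's two-level scheme, this first-level yes/no decision would then route the stop to a separate duration predictor for each class.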



Henriquez, P. A., & Ruz, G. A. (2019). Noise reduction for near-infrared spectroscopy data using extreme learning machines. Eng. Appl. Artif. Intell., 79, 13–22.
Abstract: The near-infrared (NIR) spectra technique is an effective approach to predicting chemical properties and is typically applied in the petrochemical, agricultural, medical, and environmental sectors. NIR spectra are usually of very high dimension and contain huge amounts of information. Most of this information is irrelevant to the target problem and some is simply noise, so it is not an easy task to discover the relationship between NIR spectra and the predictive variable. This kind of regression analysis, however, is one of the main topics of machine learning, and machine learning techniques therefore play a key role in NIR-based analytical approaches. Preprocessing of NIR spectral data has become an integral part of chemometrics modeling; its objective is to remove physical phenomena (noise) from the spectra in order to improve the regression or classification model. In this work, we propose to reduce the noise using extreme learning machines, which have shown good predictive performance in regression applications as well as in large-dataset classification tasks. For this, we use a novel algorithm called C-PL-ELM, whose architecture is based on one nonlinear layer in parallel with another nonlinear layer. Using the soft-margin loss function concept, we incorporate two Lagrange multipliers with the objective of accounting for the noise in the spectral data. Six real-life datasets were analyzed to illustrate the performance of the developed models. The results for regression and classification problems confirm the advantages of the proposed method in terms of root mean square error and accuracy.
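To give a feel for noise-tolerant ELM regression, here is a minimal sketch using a plain ridge-regularized output layer on synthetic data. The soft-margin constant C plays the role of the regularization described in the abstract; this is a common noise-tolerant ELM variant, not the paper's exact C-PL-ELM (which additionally uses a parallel layer and two Lagrange multipliers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a noisy NIR regression task: a smooth target
# observed with additive noise (the "physical phenomena" to be removed).
X = np.linspace(0, 1, 150).reshape(-1, 1)
clean = np.sin(2 * np.pi * X).ravel()
y = clean + rng.normal(0, 0.2, 150)

h = 40
W = rng.normal(size=(1, h))       # random input-to-hidden weights (ELM style)
b = rng.normal(size=h)
H = np.tanh(X @ W * 4 + b)        # hidden-layer output matrix

# Regularized output weights: beta = (H'H + I/C)^(-1) H'y, the ridge /
# soft-margin form of the usual ELM least-squares solution.
C = 1.0
beta = np.linalg.solve(H.T @ H + np.eye(h) / C, H.T @ y)

mse_fit = np.mean((H @ beta - clean) ** 2)   # error against noise-free target
mse_noisy = np.mean((y - clean) ** 2)        # noise level present in the data
print(round(mse_fit, 4), round(mse_noisy, 4))
```

The fitted curve tracks the clean signal rather than the noise, which is the behaviour the paper exploits for spectral preprocessing.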



Lobos, F., Goles, E., Ruivo, E. L. P., de Oliveira, P. P. B., & Montealegre, P. (2018). Mining a Class of Decision Problems for One-dimensional Cellular Automata. J. Cell. Autom., 13(5-6), 393–405.
Abstract: Cellular automata are locally defined, homogeneous dynamical systems, discrete in space, time and state variables. Within the context of one-dimensional, binary cellular automata operating on cyclic configurations of odd length, we consider the following general decision problem: if the initial configuration satisfies a given property, the lattice should converge to the fixed point of all 1s, and to the fixed point of all 0s otherwise. Two problems in this category have been widely studied in the literature, the parity problem [1] and the density classification task [4]. We are interested in determining all cellular automaton rules with neighborhood sizes of 2, 3, 4 and 5 cells (i.e., radius r of 0.5, 1, 1.5 and 2) that solve decision problems of this type. We have demonstrated a theorem which, for any given rule in those spaces, ensures the non-existence of fixed points other than the all-0s and all-1s configurations for configurations of size larger than 2^(2r), provided that the rule does not support different fixed points for any configuration of size smaller than or equal to 2^(2r). In addition, we have a proposition that ensures the convergence of any initial configuration to only the all-0s or the all-1s configuration if the rule complies with given conditions. By means of theoretical and computational approaches, we determined that: for the rule spaces defined by radius 0.5 and r = 1, only 1 and 2 rules, respectively, converge to the all-1s or the all-0s configuration from any initial configuration, and they recognize the same language; and for the rule space defined by radius r = 1.5, 40 rules satisfy this condition and recognize 4 different languages. Finally, for the radius-2 space, out of the 4,294,967,296 different rules, we were able to filter the space down significantly, to 40,941 candidate rules. We hope such extensive mining will unveil new decision problems of the type widely studied in the literature.
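The experimental setup of the paper is easy to reproduce in miniature: evolve a rule under fully parallel update on a cyclic configuration and check whether it reaches the all-0s or all-1s fixed point. The sketch below uses elementary rule 232 (local majority) purely to illustrate the machinery; it is not one of the solver rules identified in the paper:

```python
import numpy as np

def step(config, rule, r=1):
    """One fully parallel update of a cyclic 1-D binary CA. `rule` maps each
    (2r+1)-cell neighborhood, read as a binary number with the leftmost cell
    as the most significant bit (Wolfram numbering), to the next state."""
    n = len(config)
    new = np.empty(n, dtype=int)
    for i in range(n):
        idx = 0
        for j in range(-r, r + 1):
            idx = (idx << 1) | config[(i + j) % n]
        new[i] = (rule >> idx) & 1
    return new

def converges_to_uniform(config, rule, r=1, max_steps=100):
    """Iterate until an all-0s or all-1s fixed point is reached (or give up)."""
    for _ in range(max_steps):
        if config.all():
            return 1
        if not config.any():
            return 0
        config = step(config, rule, r)
    return None

# Rule 232 on odd-length cyclic configurations: majority of 1s goes to all 1s
# here, minority of 1s goes to all 0s.
print(converges_to_uniform(np.array([1, 1, 1, 0, 1, 1, 0]), 232))
print(converges_to_uniform(np.array([0, 0, 0, 1, 0, 0, 0]), 232))
```

The paper's mining amounts to running checks of this kind (plus the theoretical filters of the theorem and proposition) over entire rule spaces.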



Henriquez, P. A., & Ruz, G. A. (2018). A non-iterative method for pruning hidden neurons in neural networks with random weights. Appl. Soft. Comput., 70, 1109–1121.
Abstract: Neural networks with random weights have the advantage of fast computational time in both training and testing. However, one of the main challenges of single-layer feedforward neural networks is the selection of the optimal number of neurons in the hidden layer, since too few/too many neurons lead to underfitting/overfitting. Adapting Garson's algorithm, this paper introduces a new efficient and fast non-iterative algorithm for the selection of hidden-layer neurons in randomization-based neural networks. The proposed approach is divided into three steps: (1) train the network with h hidden neurons, (2) apply Garson's algorithm to the matrix of the hidden layer, and (3) prune hidden-layer neurons based on the harmonic mean. Our experiments in regression and classification problems confirmed that combining the pruning technique with these types of neural networks improved their predictive performance in terms of mean square error and accuracy. Additionally, we tested our proposed pruning method with neural networks trained under sequential learning algorithms, where the Random Vector Functional Link obtained, in general, the best predictive performance compared to online sequential versions of extreme learning machines and single-hidden-layer neural networks with random weights.
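The three steps above can be sketched in a few lines. This is only one plausible reading of the method: the importance score below is a Garson-style product of absolute input and output weights, and the harmonic-mean threshold is our interpretation of the pruning criterion, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: fit y = sin(x) with a random-weight network.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()

h = 50                                    # step (1): train with h hidden neurons
W = rng.normal(size=(1, h))               # random input-to-hidden weights
b = rng.normal(size=h)
H = np.tanh(X @ W + b)                    # hidden-layer output matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

# step (2): Garson-style importance of each hidden neuron — the absolute
# input weight it carries, scaled by its absolute output weight, normalized.
imp = np.abs(W).sum(axis=0) * np.abs(beta)
imp /= imp.sum()

# step (3): keep only neurons whose importance exceeds the harmonic mean
# of all importances, then re-solve the output weights once.
threshold = h / (1.0 / imp).sum()
keep = imp > threshold
Hp = H[:, keep]
beta_p, *_ = np.linalg.lstsq(Hp, y, rcond=None)

mse = np.mean((Hp @ beta_p - y) ** 2)
print(keep.sum(), "neurons kept, MSE:", round(mse, 4))
```

Because both the importance computation and the threshold are closed-form, the whole pruning pass is non-iterative, which is the selling point of the method.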



Montalva-Medel, M., de Oliveira, P. P. B., & Goles, E. (2018). A portfolio of classification problems by one-dimensional cellular automata, over cyclic binary configurations and parallel update. Nat. Comput., 17(3), 663–671.
Abstract: Decision problems addressed by cellular automata have been historically expressed either as determining whether initial configurations belong to a given language, or as classifying the initial configurations according to a property of them. Unlike traditional approaches in language recognition, classification problems have typically relied upon cyclic configurations and fully parallel (two-way) update of the cells, which renders the action of the cellular automaton relatively less controllable and more difficult to analyse. Although the notion of cyclic languages has been studied in the wider realm of formal languages, only recently has a more systematic attempt come into play with respect to cellular automata with fully parallel update. With the goal of contributing to this effort, we propose a unified definition of classification problem for one-dimensional, binary cellular automata, in which various known problems are couched and from which novel ones are defined, and we analyse the solvability of the new problems. Such a unified perspective aims at increasing existing knowledge about classification problems solved by cellular automata over cyclic configurations and parallel update.



Henriquez, P. A., & Ruz, G. A. (2017). Extreme learning machine with a deterministic assignment of hidden weights in two parallel layers. Neurocomputing, 226, 109–116.
Abstract: Extreme learning machine (ELM) is a machine learning technique based on the single-hidden-layer feedforward neural network (SLFN). However, traditional ELM and its variants rely on a random assignment of the hidden weights using a uniform distribution, followed by the calculation of the output weights using the least-squares method. This paper proposes a new architecture based on one nonlinear layer in parallel with another nonlinear layer, with independent input weights. We explore a deterministic assignment of the hidden weight values using low-discrepancy sequences (LDSs). The simulations are performed with Halton and Sobol sequences. The results for regression and classification problems confirm the advantages of the proposed method, called the PL-ELM algorithm, with the deterministic assignment of hidden weights. Moreover, the PL-ELM algorithm with deterministic generation using LDSs can be extended to other modified ELM algorithms.
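The idea of replacing random hidden weights with low-discrepancy points is easy to demonstrate. The sketch below builds a Halton-style sequence from scratch and uses it to set the weights of two parallel nonlinear layers, solved by least squares as in the ELM family; the choice of bases, scaling, and activations is ours, not the paper's exact construction:

```python
import numpy as np

def halton(n, base):
    """First n terms of the van der Corput sequence in the given base
    (one coordinate of a Halton sequence)."""
    seq = np.empty(n)
    for i in range(n):
        f, r, k = 1.0, 0.0, i + 1
        while k > 0:
            f /= base
            r += f * (k % base)
            k //= base
        seq[i] = r
    return seq

# Toy data: learn y = x^2 on [-1, 1].
X = np.linspace(-1, 1, 100).reshape(-1, 1)
y = (X ** 2).ravel()

h = 20
# Deterministic hidden weights and biases from low-discrepancy points mapped
# to [-1, 1], using a different prime base for each parameter set.
W1 = (2 * halton(h, 2) - 1).reshape(1, h)
W2 = (2 * halton(h, 3) - 1).reshape(1, h)
b1 = 2 * halton(h, 5) - 1
b2 = 2 * halton(h, 7) - 1

# Two nonlinear layers in parallel, concatenated, then least-squares output.
H = np.hstack([np.tanh(X @ W1 + b1), 1.0 / (1.0 + np.exp(-(X @ W2 + b2)))])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
mse = np.mean((H @ beta - y) ** 2)
print("MSE:", round(mse, 6))
```

Unlike the standard ELM, every run of this network is identical, since nothing is sampled; that reproducibility is one practical appeal of the deterministic assignment.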



Ruz, G. A. (2016). Improving the performance of inductive learning classifiers through the presentation order of the training patterns. Expert Syst. Appl., 58, 1–9.
Abstract: Although the development of new supervised learning algorithms for machine learning techniques is mostly oriented towards improving predictive power or classification accuracy, the capacity to understand how the classification process is carried out is of great interest for many applications in business and industry. Inductive learning algorithms, like the Rules family, induce semantically interpretable classification rules in the form of if-then rules. Although the effectiveness of the Rules family has been studied thoroughly and new, improved versions are constantly being developed, one important drawback is the effect of the presentation order of the training patterns, which has not previously been studied in depth. In this paper this issue is addressed, first by studying empirically the effect of random presentation orders on the number of rules and the generalization power of the resulting classifier. Then a presentation-order method for the training examples is proposed which combines a clustering stage with a new density measure developed specifically for this problem. The results, using benchmark datasets and a real application of wood defect classification, show the effectiveness of the proposed method. Also, since the presentation-order method is employed as a preprocessing stage, the simplicity of the Rules family is not affected; instead it enables the generation of fewer and more accurate rules, which can have a direct impact on the performance and usefulness of the Rules family in an expert system context.
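A cluster-then-order preprocessing stage of the kind described can be sketched as follows. The density score here (mean distance to the other members of the same cluster, dense patterns first) is a stand-in for the paper's purpose-built measure, and the clustering is a bare-bones k-means:

```python
import numpy as np

rng = np.random.default_rng(2)

# Unordered training patterns: two loose groups in 2-D.
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

def presentation_order(X, n_clusters=2, iters=10):
    """Order training patterns: cluster first (plain k-means), then sort each
    cluster's members by a simple density score — mean distance to the other
    members, ascending — so 'core' patterns are presented before outliers."""
    centers = X[[0, len(X) // 2]]          # deterministic initial centers
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) for c in range(n_clusters)])
    order = []
    for c in range(n_clusters):
        idx = np.flatnonzero(labels == c)
        d = np.array([np.linalg.norm(X[idx] - X[i], axis=1).mean() for i in idx])
        order.extend(idx[np.argsort(d)])
    return np.array(order)

order = presentation_order(X)
print(order[:5])  # densest patterns of the first cluster lead the sequence
```

The returned permutation would then dictate the order in which examples are fed to the rule-induction algorithm, leaving the learner itself untouched.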



Henderson, R. G., Verougstraete, V., Anderson, K., Arbildua, J. J., Brock, T. O., Brouwers, T., et al. (2014). Interlaboratory validation of bioaccessibility testing for metals. Regul. Toxicol. Pharmacol., 70(1), 170–181.
Abstract: Bioelution assays are fast, simple alternatives to in vivo testing. In this study, the intra- and inter-laboratory variability in bioaccessibility data generated by bioelution tests was evaluated in synthetic fluids relevant to oral, inhalation, and dermal exposure. Using one defined protocol, five laboratories measured metal release from cobalt oxide, cobalt powder, copper concentrate, Inconel alloy, leaded brass alloy, and nickel sulfate hexahydrate. Standard deviations of repeatability (s_r) and reproducibility (s_R) were used to evaluate the intra- and inter-laboratory variability, respectively. Examination of the s_R:s_r ratios demonstrated that, while gastric and lysosomal fluids had reasonably good reproducibility, other fluids did not show as good concordance between laboratories. Relative standard deviation (RSD) analysis showed more favorable reproducibility outcomes for some data sets; overall, results varied more between than within laboratories. RSD analysis of s_r showed good within-laboratory variability for all conditions except some metals in interstitial fluid. In general, these findings indicate that absolute bioaccessibility results in some biological fluids may vary between different laboratories. However, for most applications, measures of relative bioaccessibility are needed, diminishing the requirement for high inter-laboratory reproducibility in absolute metal releases. The inter-laboratory exercise suggests that the degrees of freedom within the protocol need to be addressed.
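For readers unfamiliar with the s_r and s_R statistics, the following toy computation shows the standard ISO 5725-style decomposition on entirely hypothetical bioaccessibility numbers (the paper's own data are not reproduced here):

```python
import numpy as np

# Hypothetical metal-release results (% released): 4 labs, 3 replicates each.
data = np.array([
    [10.1, 10.4, 10.2],
    [11.0, 10.8, 11.2],
    [ 9.8, 10.0,  9.9],
    [10.5, 10.6, 10.4],
])
p, n = data.shape

# Repeatability variance: pooled within-laboratory variance.
s_r2 = data.var(axis=1, ddof=1).mean()

# Between-laboratory variance, from the spread of the laboratory means.
lab_means = data.mean(axis=1)
s_L2 = max(lab_means.var(ddof=1) - s_r2 / n, 0.0)

# Reproducibility combines both sources: s_R^2 = s_r^2 + s_L^2.
s_R2 = s_r2 + s_L2
print(round(np.sqrt(s_r2), 3), round(np.sqrt(s_R2), 3), round(np.sqrt(s_R2 / s_r2), 2))
```

A large s_R:s_r ratio, as in this toy table, signals that laboratories agree with themselves much better than with each other, which is the pattern the study reports for some fluids.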



Pham, D. T., & Ruz, G. A. (2009). Unsupervised training of Bayesian networks for data clustering. Proc. R. Soc. A-Math. Phys. Eng. Sci., 465(2109), 2927–2948.
Abstract: This paper presents a new approach to the unsupervised training of Bayesian network classifiers. Three models have been analysed: the Chow and Liu (CL) multinets; the tree-augmented naive Bayes; and a new model called the simple Bayesian network classifier, which is more robust in its structure learning. To perform the unsupervised training of these models, the classification maximum likelihood criterion is used. The maximization of this criterion is derived for each model under the classification expectation-maximization (EM) algorithm framework. To test the proposed unsupervised training approach, 10 well-known benchmark datasets have been used to measure clustering performance. For comparison, the results for the k-means and EM algorithms, as well as those obtained when the three Bayesian network classifiers are trained in a supervised way, are also analysed. A real-world image processing application is presented as well, dealing with the clustering of wood board images described by 165 attributes. Results show that the proposed learning method, in general, outperforms traditional clustering algorithms and, in the wood board image application, the CL multinets obtained a 12 per cent increase, on average, in clustering accuracy when compared with the k-means method and a 7 per cent increase, on average, when compared with the EM algorithm.
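Classification EM differs from ordinary EM in that each point is hard-assigned to its most likely class at every iteration. The sketch below applies it to the simplest possible Bayesian network classifier, a Gaussian naive Bayes, on synthetic blobs; the paper's models (CL multinets, tree-augmented naive Bayes) have richer structure, and the median-split initialization is our own convenience:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two well-separated 2-D Gaussian blobs stand in for unlabelled data.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

def cem_naive_bayes(X, k=2, iters=20):
    """Classification EM for a Gaussian naive Bayes model: the C-step assigns
    each point to its most likely class (hard assignment); the M-step then
    re-estimates per-class means, variances and priors from those labels."""
    # crude initial hard labels: split on the first feature's median
    z = (X[:, 0] > np.median(X[:, 0])).astype(int)
    for _ in range(iters):
        mu = np.array([X[z == c].mean(axis=0) for c in range(k)])
        var = np.array([X[z == c].var(axis=0) + 1e-6 for c in range(k)])
        prior = np.array([(z == c).mean() for c in range(k)])
        # per-class log-likelihood, features treated as independent
        ll = np.stack([
            -0.5 * (((X - mu[c]) ** 2 / var[c]) + np.log(var[c])).sum(axis=1)
            + np.log(prior[c] + 1e-12)
            for c in range(k)
        ], axis=1)
        z = ll.argmax(axis=1)  # classification step
    return z

z = cem_naive_bayes(X)
print(len(set(z[:50].tolist())), len(set(z[50:].tolist())))
```

The hard assignment makes each iteration behave like a probabilistic generalization of a k-means step, which is why the paper compares against k-means and soft EM as baselines.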

