Ferran, S., Beghelli, A., Huerta-Canepa, G., & Jensen, F. (2018). Correctness assessment of a crowdcoding project in a computer programming introductory course. Comput. Appl. Eng. Educ., 26(1), 162–170.
Abstract: Crowdcoding is a programming model that outsources the implementation of a software project to the crowd. As educators, we think that crowdcoding could be leveraged as part of the learning path of engineering students in an introductory computer programming course to solve local community problems. The benefits are twofold: on the one hand, the students practice the concepts learned in class, and on the other hand, they participate in real-life problems. Nevertheless, several challenges arise when developing a crowdcoding platform, the first being how to check the correctness of students' code without placing an extra burden on the professors in the course. To overcome this issue, we propose a novel system that neither resorts to expert review nor requires knowing the right answers beforehand. The proposed scheme automatically clusters the students' codes based solely on the output they produce. Our initial results show that, provided certain conditions hold, the largest cluster contains the same codes selected as correct by automated and human testing.
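The clustering-by-output idea described in the abstract can be sketched roughly as follows; the function names and the toy student solutions are hypothetical illustrations, not taken from the actual platform:

```python
from collections import defaultdict

def cluster_by_output(codes, test_inputs):
    """Group candidate solutions by the outputs they produce on shared inputs.

    `codes` maps a student id to a callable solution (hypothetical names).
    No reference answer is needed: solutions that behave identically on the
    test inputs end up in the same cluster.
    """
    clusters = defaultdict(list)
    for student, solution in codes.items():
        try:
            signature = tuple(solution(x) for x in test_inputs)
        except Exception:
            signature = ("<error>",)  # crashing submissions form their own cluster
        clusters[signature].append(student)
    return clusters

# Toy example: two equivalent implementations and one wrong one.
codes = {
    "alice": lambda x: x * 2,
    "bob":   lambda x: x + x,
    "carol": lambda x: x ** 2,
}
clusters = cluster_by_output(codes, [1, 2, 3])
largest = max(clusters.values(), key=len)  # → ["alice", "bob"]
```

Under the paper's assumption, the largest cluster tends to coincide with the set of correct solutions.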

Hughes, S., Moreno, S., Yushimito, W. F., & Huerta-Canepa, G. (2019). Evaluation of machine learning methodologies to predict stop delivery times from GPS data. Transp. Res. Pt. C-Emerg. Technol., 109, 289–304.
Abstract: In last-mile distribution, logistics companies typically arrange and plan their routes based on broad estimates of stop delivery times (i.e., the time spent at each stop to deliver goods to final receivers). If these estimates are not accurate, the level of service is degraded, as the promised time window may not be satisfied. The purpose of this work is to assess the feasibility of machine learning techniques to predict stop delivery times. This is done by testing a wide range of machine learning techniques (including different types of ensembles) to (1) predict the stop delivery time and (2) determine whether the total stop delivery time will exceed a predefined time threshold (classification approach). For the assessment, all models are trained using information generated from GPS data collected in Medellín, Colombia, and compared to hazard duration models. The results are threefold. First, the assessment shows that regression-based machine learning approaches are not better than conventional hazard duration models with respect to the absolute error of the predicted stop delivery times. Second, when the problem is addressed by a classification scheme, in which the prediction indicates whether a stop time will exceed a predefined threshold, a basic k-nearest-neighbor model outperforms hazard duration models and other machine learning techniques in both accuracy and F1 score (the harmonic mean of precision and recall). Third, the prediction of the exact duration can be improved by combining the classifiers with prediction models or hazard duration models in a two-level scheme (first classification, then prediction). However, the improvement depends largely on correct classification at the first level.
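The two-level scheme (first classification, then class-specific prediction) can be sketched with synthetic data; the features, threshold, and models below are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                       # stand-in features derived from GPS data
y = np.abs(X @ np.array([2.0, 1.0, 0.5])) + 5.0     # stand-in stop delivery times (minutes)
threshold = 6.0
labels = (y > threshold).astype(int)                # does the stop exceed the threshold?

def knn_classify(x, k=5):
    """Level 1: k-nearest-neighbor vote on whether the stop exceeds the threshold."""
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    return int(labels[nearest].mean() > 0.5)

# Level 2: one least-squares regressor per class (intercept column appended).
A = np.hstack([X, np.ones((len(X), 1))])
coef = {c: np.linalg.lstsq(A[labels == c], y[labels == c], rcond=None)[0]
        for c in (0, 1)}

def predict_stop_time(x):
    c = knn_classify(x)                  # first classify...
    return np.append(x, 1.0) @ coef[c]   # ...then predict within that class

pred = predict_stop_time(X[0])
```

As the abstract notes, the quality of the final duration estimate hinges on the first-level classifier routing the stop to the right regressor.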
Keywords: Machine learning; Stop delivery time; Classification; Regression; Hazard duration; GPS

Poirrier, M., Moreno, S., & Huerta-Canepa, G. (2021). Robust h-index. Scientometrics, 126, 1969–1981.
Abstract: The h-index is the most widely used measurement of impact for researchers. Sites such as Web of Science, Google Scholar, Microsoft Academic, and Scopus leverage it to show and compare the impact of authors. The h-index can be described in simple terms: it is the highest h for which an author has h papers, each with a number of citations greater than or equal to h. Unfortunately, some researchers, in order to inflate their productivity artificially, manipulate their h-index using different techniques such as self-citation. Even though it is relatively simple to discard self-citations, increasingly sophisticated methods to artificially inflate this index appear every day. One of these methods is collaborative citation, in which a researcher A indiscriminately cites another researcher B, with whom A has a previous collaboration, increasing B's h-index. This work presents a new robust generalization of the h-index, called the rh-index, that minimizes the impact of new collaborative citations while maintaining the importance of citations that precede the collaborative work. To demonstrate the usefulness of the proposed index, we analyze its effect over 600 Chilean researchers. Our results show that, while some of the most cited researchers were barely affected, demonstrating their robustness, another group of authors shows a substantial reduction in comparison to their original h-index.

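The definition of the h-index quoted in the abstract translates directly into code; this is a generic sketch of the standard index, not the paper's rh-index implementation:

```python
def h_index(citations):
    """Highest h such that the author has h papers with at least h citations each."""
    cites = sorted(citations, reverse=True)
    h = 0
    while h < len(cites) and cites[h] >= h + 1:
        h += 1
    return h

h_index([10, 8, 5, 4, 3])  # → 4: four papers have >= 4 citations, the fifth has only 3
```

The rh-index proposed in the paper generalizes this computation by down-weighting citations that arrive after a collaboration between the citing and cited authors.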
Rojas, F., Wanke, P., Coluccio, G., Vega-Vargas, J., & Huerta-Canepa, G. F. (2020). Managing slow-moving item: a zero-inflated truncated normal approach for modeling demand. PeerJ Comput. Sci., 6, 22 pp.
Abstract: This paper proposes a slow-moving item management method for service enterprise inventory models that uses intermittent demand per unit time and the lead-time demand of items. Our method uses the zero-inflated truncated normal statistical distribution, which makes it possible to model intermittent demand per unit time with a mixed statistical distribution. We conducted numerical experiments, based on an algorithm used to forecast intermittent demand over a fixed lead time, to show that our proposed distribution improves the performance of the continuous-review inventory model with shortages. We evaluated multi-criteria elements (total cost, fill rate, shortage quantity per cycle, and the adequacy of the statistical distribution of the lead-time demand) for decision analysis using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). We confirmed that our method improves the performance of the inventory model in comparison to other commonly used approaches such as simple exponential smoothing and Croston's method. We found an interesting association between the intermittency of demand per unit time, the square root of this same parameter, and reorder-point decisions that can be explained with a classical multiple linear regression model. We confirmed that the variability parameter of the zero-inflated truncated normal distribution used to model intermittent demand is positively related to the reorder-point decision. Our study examines decision analysis using an illustrative example. Our suggested approach is original and valuable and, in the case of slow-moving item management for service companies, allows for the verification of decision-making using multiple criteria.
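A zero-inflated truncated normal draw, as used above to model intermittent demand, can be sketched as a two-part mixture: zero with some probability, otherwise a normal draw truncated to be non-negative. This is a minimal illustration, with rejection sampling standing in for a proper truncated-normal sampler, and is not the paper's estimation method:

```python
import numpy as np

def zi_truncnorm_sample(n, p_zero, mu, sigma, rng=None):
    """Sample intermittent demand: zero with probability p_zero, otherwise a
    normal(mu, sigma) draw truncated at zero (negative draws are resampled)."""
    if rng is None:
        rng = np.random.default_rng()
    demand = np.zeros(n)
    nonzero = rng.random(n) >= p_zero        # which periods see any demand at all
    draws = rng.normal(mu, sigma, size=nonzero.sum())
    while (neg := draws < 0).any():          # rejection step enforces truncation at zero
        draws[neg] = rng.normal(mu, sigma, size=neg.sum())
    demand[nonzero] = draws
    return demand

# Illustrative parameters: 70% of periods have zero demand, the rest ~N(20, 5) truncated.
sample = zi_truncnorm_sample(10_000, p_zero=0.7, mu=20, sigma=5,
                             rng=np.random.default_rng(1))
```

The zero-inflation probability captures the intermittency of demand per unit time, while mu and sigma describe the size of demand when it does occur.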
