Ferran, S., Beghelli, A., Huerta-Canepa, G., & Jensen, F. (2018). Correctness assessment of a crowdcoding project in a computer programming introductory course. Comput. Appl. Eng. Educ., 26(1), 162–170.
Abstract: Crowdcoding is a programming model that outsources the implementation of a software project to the crowd. As educators, we think that crowdcoding could be leveraged as part of the learning path of engineering students in a computer programming introductory course to solve local community problems. The benefits are twofold: on the one hand, the students practice the concepts learned in class and, on the other hand, they participate in real-life problems. Nevertheless, several challenges arise when developing a crowdcoding platform, the first one being how to check the correctness of students' code without placing an extra burden on the professors of the course. To overcome this issue, we propose a novel system that neither resorts to expert review nor requires knowing the right answers beforehand. The proposed scheme automatically clusters the students' code submissions based solely on the output they produce. Our initial results show that, as long as some conditions hold, the largest cluster contains the same submissions selected as correct by automated and human testing.
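A minimal sketch of the general idea of grouping submissions by the output they produce on a shared set of test inputs, assuming each submission can be run as a function (hypothetical names; this is not the paper's actual platform code or clustering criterion):

from collections import defaultdict

def cluster_by_output(submissions, test_inputs):
    """Group student submissions by the tuple of outputs they produce.

    submissions: dict mapping a student id to a callable (hypothetical setup)
    test_inputs: list of inputs fed to every submission
    """
    clusters = defaultdict(list)
    for student_id, func in submissions.items():
        try:
            signature = tuple(func(x) for x in test_inputs)
        except Exception:
            signature = ("<error>",)  # crashing submissions form their own group
        clusters[signature].append(student_id)
    # Under the paper's working assumption, the largest cluster is taken
    # as the presumed-correct group of submissions.
    largest = max(clusters.values(), key=len)
    return largest, clusters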
Hughes, S., Moreno, S., Yushimito, W. F., & Huerta-Canepa, G. (2019). Evaluation of machine learning methodologies to predict stop delivery times from GPS data. Transp. Res. Pt. C-Emerg. Technol., 109, 289–304.
Abstract: In last mile distribution, logistics companies typically arrange and plan their routes based on broad estimates of stop delivery times (i.e., the time spent at each stop to deliver goods to final receivers). If these estimates are not accurate, the level of service is degraded, as the promised time window may not be satisfied. The purpose of this work is to assess the feasibility of machine learning techniques to predict stop delivery times. This is done by testing a wide range of machine learning techniques (including different types of ensembles) to (1) predict the stop delivery time and (2) determine whether the total stop delivery time will exceed a predefined time threshold (classification approach). For the assessment, all models are trained using information generated from GPS data collected in Medellín, Colombia, and compared to hazard duration models. The results are threefold. First, the assessment shows that regression-based machine learning approaches are not better than conventional hazard duration models with respect to the absolute error of the stop delivery time predictions. Second, when the problem is addressed by a classification scheme, in which the prediction indicates whether a stop time will exceed a predefined threshold, a basic K-nearest-neighbor model outperforms hazard duration models and other machine learning techniques in both accuracy and F1 score (the harmonic mean of precision and recall). Third, the prediction of the exact duration can be improved by combining the classifiers with the prediction models or hazard duration models in a two-level scheme (first classification, then prediction). However, the improvement depends largely on correct classification at the first level.
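A minimal sketch of the two-level idea described above (classify first, then apply a separate duration model per class), using scikit-learn and synthetic placeholder data; the features, threshold, and second-level regressors are illustrative assumptions, not the paper's actual setup:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier

# Placeholder stop-level features and observed stop times in minutes;
# in the paper these are derived from GPS data, not generated randomly.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
stop_minutes = np.abs(rng.normal(10, 5, size=500))
threshold = 15.0  # hypothetical service-time threshold

# Level 1: classify whether a stop will exceed the threshold.
exceeds = stop_minutes > threshold
clf = KNeighborsClassifier(n_neighbors=5).fit(X, exceeds)

# Level 2: one duration model per predicted class
# (plain linear regressions as stand-ins for the paper's models).
pred_exceeds = clf.predict(X)
reg_short = LinearRegression().fit(X[~pred_exceeds], stop_minutes[~pred_exceeds])
reg_long = LinearRegression().fit(X[pred_exceeds], stop_minutes[pred_exceeds])

def predict_stop_time(x):
    """Route a new stop through the classifier, then the matching regressor."""
    x = np.asarray(x).reshape(1, -1)
    model = reg_long if clf.predict(x)[0] else reg_short
    return float(model.predict(x)[0])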
Poirrier, M., Moreno, S., & Huerta-Canepa, G. (2021). Robust h-index. Scientometrics, 126, 1969–1981.
Abstract: The h-index is the most widely used measurement of impact for researchers. Sites such as Web of Science, Google Scholar, Microsoft Academic, and Scopus leverage it to show and compare the impact of authors. The h-index can be described in simple terms: it is the largest h for which an author has h papers each cited at least h times. Unfortunately, some researchers, in order to inflate their productivity artificially, manipulate their h-index using techniques such as self-citation. Even though it is relatively simple to discard self-citations, more sophisticated methods to artificially increase this index appear every day. One of these methods is collaborative citation, in which a researcher A indiscriminately cites another researcher B, with whom A has a previous collaboration, increasing the latter's h-index. This work presents a new robust generalization of the h-index, called the rh-index, that minimizes the impact of new collaborative citations while maintaining the importance of citations made prior to the collaborative work. To demonstrate the usefulness of the proposed index, we analyze its effect over 600 Chilean researchers. Our results show that, while some of the most cited researchers were barely affected, demonstrating the robustness of their impact, another group of authors shows a substantial reduction in comparison to their original h-index.
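A minimal sketch of the standard h-index computation as defined above (plain Python, illustrative only; the rh-index generalization proposed in the paper is not reproduced here):

def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Example: papers with 10, 8, 5, 4, 3 citations give h = 4.
assert h_index([10, 8, 5, 4, 3]) == 4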
Rojas, F., Wanke, P., Coluccio, G., Vega-Vargas, J., & Huerta-Canepa, G. F. (2020). Managing slow-moving item: a zero-inflated truncated normal approach for modeling demand. PeerJ Comput. Sci., 6, 22 pp.
Abstract: This paper proposes a slow-moving item management method for service enterprise inventory models with intermittent demand per unit time and lead-time demand. Our method uses the zero-inflated truncated normal statistical distribution, which makes it possible to model intermittent demand per unit time using a mixed statistical distribution. We conducted numerical experiments, based on an algorithm used to forecast intermittent demand over a fixed lead time, to show that our proposed distribution improves the performance of the continuous review inventory model with shortages. We evaluated multi-criteria elements (total cost, fill rate, shortage quantity per cycle, and the adequacy of the statistical distribution of the lead-time demand) for decision analysis using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). We confirmed that our method improves the performance of the inventory model in comparison to other commonly used approaches such as simple exponential smoothing and Croston's method. We found an interesting association between the intermittency of demand per unit of time, the square root of this same parameter, and reorder point decisions, which could be explained using a classical multiple linear regression model. We confirmed that the variability parameter of the zero-inflated truncated normal distribution used to model intermittent demand was positively related to the reorder point decision. Our study examined the decision analysis using an illustrative example. Our suggested approach is original and valuable and, in the case of slow-moving item management for service companies, allows for the verification of decision-making using multiple criteria.
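A minimal sketch of simulating per-period demand from a zero-inflated truncated normal distribution, using scipy.stats.truncnorm as a stand-in for the paper's fitted model; the parameter values below are placeholders, not estimates from the paper:

import numpy as np
from scipy.stats import truncnorm

def sample_zitn_demand(n_periods, p_zero, mu, sigma, lower=0.0, rng=None):
    """Zero-inflated truncated normal: with probability p_zero demand is 0,
    otherwise it is drawn from a normal truncated below at `lower`."""
    rng = rng or np.random.default_rng()
    a = (lower - mu) / sigma  # standardized lower bound expected by truncnorm
    positive = truncnorm.rvs(a, np.inf, loc=mu, scale=sigma,
                             size=n_periods, random_state=rng)
    zeros = rng.random(n_periods) < p_zero
    return np.where(zeros, 0.0, positive)

# Example: 30% zero-demand periods, mean 12 units, sd 4, truncated at 0.
demand = sample_zitn_demand(1000, p_zero=0.3, mu=12, sigma=4)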