Osorio-Valenzuela, L., Pereira, J., Quezada, F., & Vasquez, O. C. (2019). Minimizing the number of machines with limited workload capacity for scheduling jobs with interval constraints. Appl. Math. Model., 74, 512–527.
Abstract: In this paper, we consider a parallel machine scheduling problem in which machines have a limited workload capacity and jobs have deadlines and release dates. The problem is motivated by the operation of energy storage management systems for microgrids under emergency conditions and generalizes some problems that have already been studied in the literature for their theoretical value. In this work, we propose heuristic and exact algorithms to solve the problem. The heuristics are adaptations of classical bin packing heuristics in which additional conditions on the feasibility of a solution are imposed, whereas the exact method is a branch-and-price approach. The results show that the branch-andprice approach is able to optimally solve random instances with up to 250 jobs within a time limit of one hour, while the heuristic procedures provide near optimal solution within reduced running times. Finally, we also provide additional complexity results for a special case of the problem. (C) 2019 Elsevier Inc. All rights reserved.
|
Quezada, F. A., Navarro, C. A., Romero, M., & Aguilera, C. (2023). Modeling GPU Dynamic Parallelism for self similar density workloads. Future Gener. Comput. Syst., 145, 239–253.
Abstract: Dynamic Parallelism (DP) is a GPU programming abstraction that can make parallel computation more efficient for problems that exhibit heterogeneous workloads. With DP, GPU threads can launch kernels with more threads, recursively, producing a subdivision effect where resources are focused on the regions that exhibit more parallel work. Doing an optimal subdivision process is not trivial, as the combination of different parameters play a relevant role in the final performance of DP. Also, the current programming abstraction of DP relies on kernel recursion, which has performance overhead. This work presents a new subdivision cost model for problems that exhibit self similar density (SSD) workloads, useful for finding efficient subdivision schemes. Also, a new subdivision implementation free of recursion overhead is presented, named Adaptive Serial Kernels (ASK). Using the Mandelbrot set as a case study, the cost model shows that optimal performance is achieved when using {g -32, r -2, B -32} for the initial subdivision, recurrent subdivision and stopping size, respectively. Experimental results agree with the theoretical parameters, confirming the usability of the cost model. In terms of performance, the ASK approach runs up to -60% faster than DP in the Mandelbrot set, and up to 12x faster than a basic exhaustive implementation, whereas DP is up to 7.5x faster. In terms of energy efficiency, ASK is up to -2x and -20x more energy efficient than DP and the exhaustive approach, respectively. These results put the subdivision cost model and the ASK approach as useful tools for analyzing the potential improvement of subdivision based approaches and for developing more efficient GPU-based libraries or fine-tune specific codes in research teams.
|