The problem was that gathering all the temperatures for each city, which had to happen before I could launch the CUDA reduction kernels, required a full parse of the input data.
That meant my final solution would be slower than the C++ baseline, since the baseline already does that same full parse anyway.