-"The test programs compute the Euclidean norm of a vector of dimension $D=50000000$. The computation is rerun for $N=100$ times and the shortes measured time is reported. Choosing the run with the shortest time models the case when the computation code is already present in the I-cache (the I-cache is hot).\n",

+"The test programs compute the Euclidean norm of a vector of dimension $D=50000000$. The computation is rerun for $N=100$ times and the shortest measured time is reported. Choosing the run with the shortest time models the case when the computation code is already present in the I-cache (the I-cache is hot).\n",

"\n",

"### Experiment\n",

"As a baseline for the subsequent measurements and comparisons, we measured the runtime for a single-thread norm computation to be $T_1 = 0.12s$. \n",

...

...

@@ -5242,7 +5242,7 @@

"metadata": {},

"source": [

"### Dynamic schedule\n",

-"The plot above shows the running time in the same cases as for dynamic schedule. Adding the nowait flag to a dynamic schedule did not produce any wrong results. I suspect that in this case the ammount of work allocated to each thread is equal between loop iterations so the same thread got to work on the same chunk of the array both in initialization and the norm computation. \n",

+"The plot above shows the running time in the same cases as for dynamic schedule. Adding the nowait flag to a dynamic schedule did not produce any wrong results. I suspect that in this case the amount of work allocated to each thread is equal between loop iterations so the same thread got to work on the same chunk of the array both in initialization and the norm computation. \n",

"\n",

"As for the dynamic schedule, the variant using multiple parallel regions is faster. "