\title{A3: Concurrency} \input{preamble.tex}

In this assignment, experiments with multithreaded searches in containers are performed, measured and evaluated. A few attempts at using concepts for some generalized utility functions were made, and any comments on these (their success or lack thereof) are highly welcome. The utility functions are the following:
\begin{itemize}
\item The \texttt{measure} function and \texttt{Measure} class from the previous assignment, which perform time measurement and print the result.
\item \texttt{get\_random\_vector}, a templated function that generates and returns a \texttt{std::vector} of a given size with random values in a specified range.
\item \texttt{get\_random\_strings}, a function that generates a \texttt{std::vector} of \texttt{std::string}s of a specified length (all the same length) with random characters picked from a specified range.
\end{itemize}
The two generator functions are likely far more generalized than necessary for this exercise, but the idea is to be able to re-use them in future exercises, to keep the code in the \texttt{main} function readable, and to play around a bit with concepts.

\bigskip

The first task is to implement a \texttt{find\_all} function that returns a \texttt{std::vector} of the keys found in the supplied input container. As the wording later in the assignment is ``How large must the \textbf{array} be,\ldots'', I decided to make the function work on any container type by using the range concept instead of hard-coding it to \texttt{std::vector}. Any comments on how this has been done are more than welcome. At the top of the \texttt{main} function there are two (commented-out) examples of other container types I tried using with the functions: \texttt{std::array} and a C-style array.

\bigskip

The next task was to write a 10-way parallel \texttt{find\_all} with the same functionality as the first one, but executing 10 threads, each working on a tenth of the input container.
I generalized this to \textit{n} threads in my first implementation (I hadn't seen that this was a requirement further down in the assignment). The threads are started through \texttt{std::async(std::launch::async, ...)}, and each thread initially blocks on a \texttt{std::condition\_variable} so that the execution time can be measured only after all the threads have been spawned. Each thread then simply executes the initial non-threaded implementation of \texttt{find\_all} on its respective sub-range and returns its result in its own \texttt{std::vector} through a \texttt{std::future}. Finally, all sub-range results are accumulated in one \texttt{std::vector} and returned to the \texttt{main} function.

\bigskip

Vectors of 10 million elements, of both \texttt{int}s and \texttt{std::string}s, are used for the tests. Each of the two is tested with the single-threaded \texttt{find\_all} and with the multithreaded \texttt{find\_all} using 1 through 16 threads. Each run is timed and printed to the console; since the output is rather large, it is attached in the \texttt{console.log} file alongside this report.

\bigskip

The machine used for the tests is an Intel i7 with 4 cores, each hyper-threaded, giving 8 ``virtual'' cores, each sharing parts of the cache with one of the others.

\bigskip

Looking at the results for \texttt{int}: the non-threaded search took about 100\,ms and (not surprisingly) so did the 1-threaded search. From there, performance improved with each added thread before flattening out.
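The gated \textit{n}-thread scheme described above can be sketched as follows. This is a simplified illustration fixed to \texttt{int}, with names assumed here; the condition-variable gate is what lets the measurement start only after all tasks have been launched:

```cpp
#include <condition_variable>
#include <cstddef>
#include <future>
#include <mutex>
#include <vector>

// Sketch (assumed names): split `data` into n_threads sub-ranges, search
// each in its own std::async task, and merge the per-thread results.
// Every task first blocks on a condition_variable until `go` is set,
// so a timer started just before notify_all() excludes thread spawning.
std::vector<int> parallel_find_all(const std::vector<int>& data, int key,
                                   unsigned n_threads)
{
    if (n_threads == 0) n_threads = 1; // guard against division by zero

    std::mutex m;
    std::condition_variable cv;
    bool go = false;

    std::vector<std::future<std::vector<int>>> futures;
    const std::size_t chunk = data.size() / n_threads;

    for (unsigned i = 0; i < n_threads; ++i) {
        const std::size_t begin = i * chunk;
        const std::size_t end =
            (i + 1 == n_threads) ? data.size() : begin + chunk;
        futures.push_back(std::async(std::launch::async, [&, begin, end] {
            { // block until the start signal is given
                std::unique_lock lock(m);
                cv.wait(lock, [&] { return go; });
            }
            std::vector<int> found; // per-thread sub-result
            for (std::size_t j = begin; j < end; ++j)
                if (data[j] == key)
                    found.push_back(data[j]);
            return found;
        }));
    }

    { // release all threads at once; the measurement would start here
        std::lock_guard lock(m);
        go = true;
    }
    cv.notify_all();

    std::vector<int> result; // accumulate all sub-range results
    for (auto& f : futures)
        for (int v : f.get())
            result.push_back(v);
    return result;
}
```

In the assignment code each task instead calls the generic \texttt{find\_all} on its sub-range; the inline loop above stands in for that call to keep the sketch self-contained.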
\begin{figure}
\caption{\label{ints} Execution times vs.\ number of threads when searching for \texttt{int}s.}
\includegraphics[scale=0.55]{a3/int.pdf}
\end{figure}
\begin{figure}
\caption{\label{strings} Execution times vs.\ number of threads when searching for \texttt{std::string}s.}
\includegraphics[scale=0.55]{a3/string.pdf}
\end{figure}
As seen in figure \ref{ints}, the performance increase stagnates already at around 4 threads, suggesting that the memory pipeline/cache is a bottleneck that makes the threads spend much of their time waiting for data rather than executing instructions.

\bigskip

Similar results are seen for the \texttt{std::string} searches, although there seems to be more computation going on, so the memory/cache bottleneck is less dominant. As can be seen in figure \ref{strings}, the graph flattens out a bit at around 4 threads but keeps improving until all 8 hyperthreads are active.

\end{document}