\title{A4: List vs Vector} \input{preamble.tex} In this exercise I had the idea that I wanted to experiment with co-routine generator for the random numbers but discovered the \texttt{std::generator} is not supplied with my compiler. Instead I found what seemed like a similar implementation here: \texttt{https://en.cppreference.com/w/cpp/language/coroutines}, which I have copied to the \texttt{generator.h} file for use in my program. I am aware that the random number generator already kind-of works similarly to a co-routine generator seen from the outside, so the whole experiment is a bit pointless from a software design perspective. For this exercise I also re-designed my measurement tool from the previous assignments to better support re-use of data across measurements. And, finally, I made a class for storing and outputting measurement points as octave/matlab programs for easier inclusion into the report. This I put in its own file \texttt{octave.cc} with corresponding header \texttt{octave.h}. It is by no means very clever, but made to suit my needs for this assignment. \bigskip \noindent{}The main code for exercise is in the file called \texttt{list\_vs\_vector.cc} \bigskip The random number generation for use in insertion I used the co-routine as described above, inserting, while checking if the number already exists. For the ditto for removal indices, I used a \texttt{uniform\_distribution} thereby being able to gradually reduce the span as the container would be expected to shrink 1 in size for each iteration. Both random number generators were seeded with the iteration number (each size, \texttt{N}, is being tested three times) to make sure the three iterations use unique sets of numbers. For the smaller sizes of \texttt{N} (the number of items to be inserted/removed) i print out the numbers to verify their correctness. These numbers can be seen in the log files in the zip file. The program has been compiled and executed with both a built-in int as the value type of the three containers and a struct padding a single int with chars up to 1k. The log files for the executions are named \texttt{console.log} and \texttt{console-padding.log} respectively. \bigskip \noindent{}For the insertion and removal two templated functions were written: \footnotesize\begin{lstlisting}[language=C++] template void sorted_insert(C& container, T element) { auto it = std::begin(container); while(it != std::end(container) && *it < element) { ++it; } container.insert(it, element); } template void remove_index(C& container, int index) { auto it = std::begin(container); for(int i = 0; i < index; ++i) { ++it; } container.erase(it); } \end{lstlisting}\normalsize With the exception of set insertion, these two functions are used for all the timed container operations, ie. no clever stuff should be going on in the index/iterator lookups. \bigskip In \textit{figure \ref{insert}} the insertion times for \texttt{int}s in the three container types \texttt{std::vector}, \texttt{std::list} and \texttt{std::set} can be seen. I was expecting the \texttt{std::vector} to outperform the othr two because of data locality, and this is indeed the case. I was however surprised how much it outperformed them, but this might be because the compiler could see through my ``slow'' iterator looping and convert that into the more optimized \texttt{vec.begin() + n} random access iterator lookup. At the same time the bad performance of \texttt{std::list} puzzled me. But it might be because the list has to allocate each of its elements on the freestore, which, on the other hand, the \texttt{std::set} also had to do. So I'm not really sure what is going on there. Similarly in \textit{figure \ref{remove}} the performance of the three containers are compared when removing the \texttt{int}s again by randomized index. Again I was expecting the \texttt{std::vector} to outperform the other two, but not as much as it did. I was however expecting the \texttt{std::set} to be slightly slower than the \texttt{std::list} because of the extra needed work for rebalancing the tree. \begin{figure}[!tbp] \makebox[\textwidth][c]{% \subfloat[Insert]{\scalebox{1.7}{\includegraphics[scale=0.35]{a4/insert.pdf}}\label{insert}} \quad \subfloat[Removal]{\scalebox{1.7}{\includegraphics[scale=0.35]{a4/remove.pdf}}\label{remove}} } \caption{Plot of operations on the three container types using regular \texttt{int}s} \end{figure} \clearpage To see how much impact the data locality has on the performance of the three container types, the experiments are repeated with a much bigger value type. I decided to create a \texttt{struct}, \texttt{PaddedInt}, with the needed member functions for it to be storable inside the three containers: \footnotesize\begin{lstlisting}[language=C++] struct PaddedInt { PaddedInt(int i) : i(i) {} int operator=(int other) { i = other; return i; } bool operator<(int other) const { return i < other; } bool operator<(const PaddedInt& other) const { return i < other.i; } int i; char padding[1024 - sizeof(int)]{}; // pad up to 1k }; \end{lstlisting}\normalsize It adds padding to each stored item so that it ends up with a size of 1024 bytes. \begin{figure}[!tbp] \makebox[\textwidth][c]{% \subfloat[Insert of padded int]{\scalebox{1.7}{\includegraphics[scale=0.35]{a4/insert-padded.pdf}}\label{insert-padded}} \quad \subfloat[Removal of padding int]{\scalebox{1.7}{\includegraphics[scale=0.35]{a4/remove-padded.pdf}}\label{remove-padded}} } \caption{Plot of operations on the three container types using structure with and \texttt{int} and padding up to 1kb} \end{figure} With this configuration a decrease in the performance of the \texttt{std::vector} is expected as opposed to the \texttt{std::list} and \texttt{std::set} which only carry pointers to their items. \bigskip Looking at \textit{figure \ref{insert-padded}} the performance of the insertion indeed seem to have become a lot worse for the \texttt{std::vector}. Looking at the absolute performance of the \texttt{std::list} and comparing it to \textit{figure \ref{insert}} it has become $\sim{}10\%$ slower. \texttt{std::set} on the other hand seem to be largely unaffected by the increase in value type size. This is likely because of its $O(log(n))$ lookup/insertion times, which means that it only have to do a fraction of the number of fetches from the free-store compared to the other two (the full data of the \texttt{PaddedInt} \texttt{struct} is needed for each comparison). Removal of the padded elements from the \texttt{std::list} and \texttt{std::set} are expected to be unaffected by the increase if value type size, since they both just store pointers, and because the iterations used to find the index is not needing data from the value type; only the next pointer is needed. On the other hand, \texttt{std::vector} will have to do a lot copying of data (the entire tail will have to be copied one index to the left), when erasing an item in the middle, which amounts to a lot of data. As can be seen in \textit{figure \ref{remove-padded}}, as expected, the performances of \texttt{std::list} and \texttt{std::set} are close to the \texttt{int} counterparts. And, also as expected, the \texttt{std::vector} is really being slowed down. \end{document}