\title{Essay: Applying Contemporary C++ in Systems Without Free-Store}
\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{fancyhdr}
\usepackage{listings}
\usepackage{subfig}
\usepackage{biblatex}
\addbibresource{references.bib}
\renewcommand{\floatpagefraction}{.8}%
\captionsetup[subfigure]{labelformat=simple, labelsep=colon}
\pagestyle{fancy}
\author{Bent Bisballe Nyeng <deva@aasimon.org> University of Aarhus}
\begin{document}
\maketitle

\begin{abstract}
In this essay I want to examine to which extend it is possible to use
free-store allocating constructs from the standard template library
(STL) and C++ core-language in enviroments without access to a
free-store.
\end{abstract}

\section{Introduction}
C++ contains a lot of helpful constructs that can be widely used,
including in environments without a free-store, such
as \texttt{concepts}, \texttt{module}s, \texttt{template}s in general,
and functions from \texttt{algorithm} in particular but some parts of
the language and the STL is off-limits when building applications in
environments without free-store such as the perhaps obvious, but
useful \texttt{std::vector} or \texttt{std::string}, but also the less
obvious co-routines\cite{belson} or storing lambdas
in \texttt{std::function}s\cite{elbeno}.
This also inherently means that RAII cannot be used for managing memory
allocations (such as smart-pointers), but can still be used for
managing other types of resources, such as locks or hardware
peripheral access.

\subsection{Dynamic Memory Allocation}\label{dyn}

There can be many reasons for not allocating on the free-store, either
by convention; ``no allocations allowed after the engines has
started'', or because the hardware or operating system doesn't
have a virtual memory abstraction, ie. doesn't have a memory
management unit (MMU)\cite{tannenbaum}, and therefore, over time, is
at risk of fragmenting the memory ultimately leading to memory
depletion\cite{weis}.

In the case of memory fragmentation one might argue that it is not the
allocation that is the problem but rather the free'ing since this is
when the fragmentation happens. The problem, shown in
figure \ref{frag}, might be possible to work around in singular
concrete cases, but cannot be solved in general without the page
indirections of the virtual memory\cite{weis}.

\begin{figure}
\makebox[\textwidth][c]{%
\includegraphics[scale=0.8]{fragmentation.pdf}}
\caption{\textit{(a) visualizes the full, free, memory of a system.
Then, in (b), 4 equal-sized chunks of memory has been allocated
filling up the whole memory. In (c) chunk 2 and 4 has been free'd and
finally, in (d), a chunk which can fit in the total amount of
free memory, is being allocated but fails because of memory
fragmentation.}}
\label{frag}
\end{figure}

This can to some degree be prevented by monotonic allocations where
each allocation always has the same size and therefore can be re-used
directly after being freed. This might work for some special use-cases,
where objects of similar size are being stored in a pool but is not
applicable in software in general and certainly not for the dynamic
allocations in the STL or the core-language which will have varying
sizes.

In particular, a lambda stored in a \texttt{std::function} might
allocate memory on the free-store if the lambda captures exceeds the
size of the (compiler dependent) small-buffer optimization (SBO)
buffer inside the \texttt{std::functions}\cite{elbeno}. The resulting
allocation will have the size of the captured
data. \texttt{std::string}s works in much the same way with its small
string optimization (SSO) which are also compiler dependent in size.

\subsection{Free-Standing}

Work is being done to improve on the ``free-standing C++''
specification towards, among other things, making it run on systems
without free-store by isolating the parts of the STL that can be used
entirely without allocating along with not supporting exceptions and
without the need for run-time type information\cite{craig}.

Working with the resulting reduce sub-set of the available features,
however, is not well suited for making contemporary C++ applications;
Too many of the core components are simply missing to be able to call it
truly contemporary.
Instead, the ideal solution would be to find ways to be able to use all (or at
least more) of the features, but with a potential known set of restictions or
limitations.

\section{Method}
In the following, 3 methods for managing memory allocations will be
examined, and their suitability for real-life applications be
evaluated:
\begin{itemize}
\item Do nothing special, but try to make the compiler fail
      compilation if an unintentional \texttt{new} or \texttt{delete}
      is being called, at least preventing accidental allocations.
\item Use Custom Allocators for the STL components that supports it.
\item Overloading the \texttt{new} and \texttt{delete} operators to
      use stack allocated memory instead of the free-store for all
      allocation.
\end{itemize}

\noindent{}Each of the three will be evaluated with
\texttt{std::string}, \texttt{std::vector}, \texttt{std::function} and
a simple example of a generator
co-routine\footnote{No \texttt{std::generator} was supplied with my
tool-chain so I had to use a similar implementation
from \cite{generator} which can be found in the \texttt{generator.h}
file.}.\\
The experiments are done on a linux PC using the gcc-11.2 compiler.

\section{Experiments}

This section describes the work done during the three experiments
along with any numerical or analytical results found.

\subsection{Detecting Allocations}\label{detect}

The simplest way of addressing the problem of allocation with no
free-store is to simply only use stack allocation constructs, or store all
objects as static globals. this could be by using
the \texttt{std::array} or by using placement \texttt{new} for
allocating with existing buffers.
But as mentioned in certain areas of the C++ language dynamic
allocation might occur without the developer knowing about it.
Searches on The Internet\texttrademark{} has identified no ways direct
ways of instructing the compiler that ``no allocations allowed, fail
if one is made''. The best option I have been able to find is to use a
combination of odd arguments to prevent linking with the standard
libraries (primarily libc and libstdc++), provide some specially
required functions, and then get a linker error if the \texttt{new}
or \texttt{delete} operator functions were needed\cite{cs107e} - not
very practical - \textit{and} one would have to compile the program
twice to first do the check and then to actually create the binary.
The reworked free-standing specification might make this possible in
the future, similarly to how exceptions and run-time type information
can be disabled with \texttt{-fno-exceptions} and \texttt{-fno-rtti},
but that doesn't seem to be around the corner.
Note that there is an existing \texttt{-ffreestanding} argument, but
this follows a different definition of free-standing which allows
dynamic allocation, so this is not a solution.

As mentiond in section \ref{dyn}, some constructs have built-in
optimizations that make them store their data in local members,
instead of on the free-store through a pointer, until some limit is
reached making it possible to use for example \texttt{std::string}s
or \texttt{std::function}s as long as they don't require more than $N$
bytes, where $N$ is a compiler dependent.

To detect if or when a free-store allocation is done (ie. when the $N$
is exceeded) a simple overload of \texttt{new} operator can be made,
simply throwing an exeption if called.
This will lead to a run-time error and not a compile-time error, but
at least it can assists in finding the $N$s for a specific compiler.

The complete code for this experiment can be found in the
\texttt{noalloc.cc} file. For reference the code for the \texttt{new}
operator is listed here:

\footnotesize\begin{lstlisting}[language=C++]
void* operator new(std::size_t)
{
  throw std::bad_alloc();
}
\end{lstlisting}\normalsize

For each of the components, the sizes of $N$ were increased until the
compiled program started throwing the \texttt{std::bad\_alloc} exceptions.
The following table shows thise impirically deduced sizes of $N$ for my
specific compiler for the components supporting SSO or SBO along with
comments about the general component behaviour.\\

\noindent\begin{tabular}{| l | c | l |}
\hline
Component & $N$ & Comments \\
\hline
\texttt{std::string}   & $16$ & Exception propagates to caller. \\
\texttt{std::vector}   & N/A  & No SBO, exception propagates to caller. \\
\texttt{std::function} & $16$ & Exception doesn't seem to propagate to caller\\
                       &      & (possible gcc bug?). \\
co-routine             & N/A  & No SBO, exception propagates to caller. \\
\hline
\end{tabular}\\

16 bytes for storing is not much, but for the \texttt{std::function}
it might be sufficient for most simple lambda captures. For
the \texttt{std::string}s it might also be reasonable for many uses,
but since the strings are more dynamic in nature it is easy to
accidentally break the barrier after a string has been created.

\subsection{Custom Allocator}

Custom allocators are classes that can be used as replacements for the
normal \texttt{new} and \texttt{delete} operators for a single object
instance.

This feature is only supported by some of the components in the STL,
mainly the containers, and can therefore only be used as a solution to
a sub-set of the allocations in an application.
In particular they are good for container types that use monotonic
allocations where all allocations for the same size. Here a freed
chunk of memory is guaranteed to always be possible to re-use so no
fragmentation will occur.

For this experiment a custom allocator is written that uses an
internal member buffer and always returns that when an allocation is
requested.
It throws an exception if the allocation size exceeds the buffer size.
The code can be found in the \texttt{custom.cc} file.
A summized version of the custom allocator, \texttt{StackAllaocator},
can be seen below. It was inspired by the answer to this
article \cite{mapo}, heavily modified to take a stack size as a
template argument:

\footnotesize\begin{lstlisting}[language=C++]
template <typename T, std::size_t S>
struct StackAllocator
{
  ...
  pointer allocate(size_type n)
  {
    if(n > S) throw std::bad_alloc();
    return buf;
  }

  void deallocate(void*, size_type) {}

private:
  T buf[S];
};
\end{lstlisting}\normalsize

The allocator can be used in the following ways:

\footnotesize\begin{lstlisting}[language=C++]
std::basic_string<char, std::char_traits<char>,
                  StackAllocator<char, 32>> str(31, 'a');

std::vector<int, StackAllocator<int, 10>> vec{42};

// Pre-c++17 syntax (not verified)
//std::function<std::allocator_arg, StackAllocator<int, 10>, int()> f;
\end{lstlisting}\normalsize

\texttt{std::string} works but need to use
the basetype, \texttt{std::basic\_string}, for the instantiation,
which is a bit clumsy.
This can be made a bit clearer to read by using a templated type
indirection only exposing the buffer size template argument:

\footnotesize\begin{lstlisting}[language=C++]
template<std::size_t S>
using String = std::basic_string<char, std::char_traits<char>,
                                 StackAllocator<char, S>>;
\end{lstlisting}\normalsize

\texttt{std::vector} on the other hand works works beautifully. The
syntax is clean and easy to understand.

In C++17 the custom allocator support in \texttt{std::function} were
removed - so this is no longer supported\cite{P0302R0}.

Co-routines doesn't support custom allocators directly but they can be
made to use a similar construct by adding a \texttt{operator new}
implementation inside their \texttt{promise\_type}\cite{coroutines}.
In the experiment it was done by modifying the \texttt{Generator}
class, adding the operator along with a template argument specifying
the buffer size.
This new implementation resides in \texttt{generator\_stack.h} and the
main change is shown here:

\footnotesize\begin{lstlisting}[language=C++]
template <typename T, std::size_t S>
struct Generator
{
  ...
  struct promise_type // required
  {
    ...
    void* operator new(std::size_t n)
    {
      static char buf[S];
      if(n < S) throw std::bad_alloc();
      return buf;
    }
  };
};
\end{lstlisting}\normalsize

The allocator, and therefore the buffer it carries, is stored within
the object, so the lifetime of the buffer is guaranteed to live as
long as the object using it.
On the other hand the developer needs to guarantee that no
re-allocations are ever made with any of the objects. Otherwise the
same memory would be re-used leading to undefined behaviour.
A more clever custom allocator could be written that could keep track
of how much of the buffer has actually been used and make new
allocation in the style of the monotonic allocator, but then the same
problems as with the original free-store fragmentation will apply to
the stack buffer itself.

\subsection{Use \texttt{new} with Stack Buffer}

To be able to address all allocations in an application this last
approach examines using the overloaded \texttt{new} and \texttt{delete} from
section \ref{detect} expanding it with a mechanism for using supplying
a stack-buffer used in (at most) one call to \texttt{new}.
It looks like this:

\footnotesize\begin{lstlisting}[language=C++]
namespace memory
{
  void* ptr{};
  std::size_t n{};
}

void* operator new(std::size_t n)// throw(std::bad_alloc)
{
  if(memory::ptr == nullptr) throw std::bad_alloc();
  if(n > memory::n) throw std::bad_alloc();

  auto ptr = memory::ptr;
  memory::ptr = nullptr; // use only once
  memory::n = 0;

  return ptr;
}

void operator delete(void*) throw() {}
void operator delete(void*, std::size_t) throw() {}
\end{lstlisting}\normalsize

The \texttt{ptr} pointer and accompanying size \texttt{n} is put in a
namespace to avoid name namespace pollution but is otherwise left
``naked''.
The pointer is used once in the \texttt{new} operator and reset
to \texttt{nullptr} to avoid being used again.
The \texttt{delete} operators must do nothing since the memory was
owned elsewhere.

A stack buffer object is then made with the following type:

\footnotesize\begin{lstlisting}[language=C++]
template<std::size_t S, typename T = char>
class StackNew
{
public:
  StackNew()
  {
    memory::ptr = buf;
    memory::n = S * sizeof(T);
  }

private:
  T buf[S];
};
\end{lstlisting}\normalsize

and the intention is now to have a \texttt{StackNew} instance create
immediately before any calls to \texttt{new} would happen with an
appropriate buffer size supplied.
It both owns the memory and registers its pointer and size in the
constructor which will be used in the firstcoming call
to \texttt{new}.
This means that objects can be made with the following syntax:

\footnotesize\begin{lstlisting}[language=C++]
StackNew<10, int> buffer; // stack buffer for 10 integers
std::vector<int> vec{1,2,3,4,5,6,7,8,9,10};
\end{lstlisting}\normalsize

The code for these experiments can be found in
the \texttt{stack\_new.cc} file.

The upside is that it works for all allocating object types, including
\texttt{std::string}, \texttt{std::vector}, \texttt{std::function} and
co-routines because the \texttt{new} operator is overridden at a
global level, ie. for the entire application.
The dangers of the mechanism, however, is that it is up to the developer to
ensure that the stack objects are large enough for the allocation,
which can be tricky, and also make sure that it outlives the objects
that use it.
The compiler cannot help with any of those properties so it must be
done by convention and careful code reviews.

\section{Summing Up}

None of the described methods work for all usecases, and many of them
are overly complicated to use correctly.
Even worse is that using them wrong will create bugs that can be very
hard to find, so they may not be advisable to use.

For the \texttt{std::vector} and \texttt{std::string} the most robust
solution is probably to use the custom memory allocators and try to
make the code in a way so that no re-allocations are needed, by
reserving some known maximum size when creating the object.

For the co-routines, supplying the \texttt{new} operator in
the \texttt{promise\_type} is a relatively elegant solution, since
the \texttt{promise\_type} is also the object carrying the rest of the
state. But again the upper limit needs to be known in advance, which
need careful attention (avoid recursion?).

The best way to use \texttt{std::function} is likely to just try to
keep lambda captures small.

Even though the lastly described method works it should probably never
be used in production code.

So there is no single solution that works well for all constructs, but
at least it is possible to use many of the contemporary features of
the language and STL if one is willing to pay the price of extra
cautiousness - and if this is the price to pay to be able to use
co-routines in microcontrollers perhaps it is worth it?

\printbibliography

\end{document}