From fa35a5035548bac7a7bcf2a45728534ab47f5a3e Mon Sep 17 00:00:00 2001
From: Florian Fischer <florian.fl.fischer@fau.de>
Date: Fri, 15 Feb 2019 15:26:30 +0100
Subject: Add rudimental user documentation

---
 doc/Allocators.md |  26 ++++++++++
 doc/Benchmarks.md | 146 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 172 insertions(+)
 create mode 100644 doc/Allocators.md
 create mode 100644 doc/Benchmarks.md

diff --git a/doc/Allocators.md b/doc/Allocators.md
new file mode 100644
index 0000000..2fce8d8
--- /dev/null
+++ b/doc/Allocators.md
@@ -0,0 +1,26 @@
+# Allocators
+
+allocbench supports three mechanisms to change the used allocator for program
+run with exec. The easiest is using ```LD_PRELOAD``` to overwrite ```malloc/free```
+with the functions of a shared library like libtcmalloc.so. If LD_PRELOAD
+can't be used you can specify a command prefix to somehow load and use your allocator.
+This command prefix is used for different versions of glibc. The command is
+prefixed with the loader of the glibc version to test. *Note that the whole glibc
+is changed maybe tampering with the results*. Additionally binary suffixes are
+supported. This could be used to use with  ```patchelf``` patched binaries to
+use different ```rpath``` or ```linker```.
+
+The used allocators are stored in a global python dictionary associating
+their names with the fields: ```cmd_prefix, binary_suffix, LD_PRELOAD``` and ```color```.
+
+By default this dictionary is build from locally installed allocators found by ```whereis```.
+
+You can overwrite the default allocators with the ```-a | --allocators``` option
+and a python script exporting a global dictionary with the name ```allocators```.
+
+## Building Allocators
+
+To reproducible build allocators and patched version you can use the
+classes ```Allocator{_Sources,_Patched}``` provided in ```src/allocator.py```.
+
+See allocators/no_falsesharing.py or allocators/BA_allocators.py for examples.
diff --git a/doc/Benchmarks.md b/doc/Benchmarks.md
new file mode 100644
index 0000000..7e78947
--- /dev/null
+++ b/doc/Benchmarks.md
@@ -0,0 +1,146 @@
+# Benchmarks
+
+A benchmark in the context of allocbench is a command usable with exec and a
+list of all possible arguments. The command is executed and measured for each
+permutation of the specified arguments and for each allocator to test.
+
+Benchmarks are implemented as python objects that have a function `run(runs, verbose)`.
+Other non mandatory functions are:
+
+* load
+* prepare
+* save
+* summary
+* cleanup
+
+## Included Benchmarks
+
+### loop
+
+A really simple benchmark that allocates and frees one randomly sized block per
+iteration. This benchmark measures mostly the fastpaths of the allocators.
+Allocations are not written or read because this is done by the next benchmark.
+
+### falsesharing
+
+This benchmark consists of two similar programs written by Emery Berger for
+his [Hoard](https://github.com/emeryberger/Hoard/tree/master/benchmarks) allocator.
+They test allocator introduced false sharing.
+
+### larson server benchmark
+
+A benchmark simulating a server application written by Paul Larson at
+Microsoft for its research on memory allocators [[paper]](https://dl.acm.org/citation.cfm?id=286880).
+
+### mysql
+
+Read-only SQL benchmark using mysqld and sysbench to simulate "real" workloads.
+
+### DJ Delorie traces
+
+Rerun of [traces](http://www.delorie.com/malloc/) collected and provided by DJ
+Delorie using the tools from dj/malloc branch of the glibc.
+
+## Add your own Benchmark
+
+1. Make sure your command is deterministic and allocator behavior is a significant
+	part of your measured results
+2. Create a new Python class for your benchmark. You can inherit from the
+	provided class src.Benchmark
+3. Implement your custom functionality
+4. Export a object of your class, import and add it to the list of benchmarks in
+	bench.py
+
+#### loop.py as Example
+
+```python
+import multiprocessing
+
+from src.benchmark import Benchmark
+
+
+class Benchmark_Loop(Benchmark):
+    def __init__(self):
+        self.name = "loop"
+        self.descrition = """This benchmark makes n allocations in t concurrent
+                             threads. Each iteration one block is allocated, """,
+
+        self.cmd = "loop{binary_suffix} {nthreads} 1000000 {maxsize}"
+
+        self.args = {
+                        "maxsize":  [2 ** x for x in range(6, 16)],
+                        "nthreads": range(1, multiprocessing.cpu_count() * 2 + 1)
+                    }
+
+        self.requirements = ["loop"]
+        super().__init__()
+
+    def summary(self):
+        # Speed
+        self.plot_fixed_arg("perm.nthreads / (float({task-clock})/1000)",
+                            ylabel='"MOPS/cpu-second"',
+                            title='"Loop: " + arg + " " + str(arg_value)',
+                            filepostfix="time")
+
+        # Memusage
+        self.plot_fixed_arg("int({VmHWM})",
+                            ylabel='"VmHWM in kB"',
+                            title='"Loop Memusage: " + arg + " " + str(arg_value)',
+                            filepostfix="memusage")
+
+        # L1 cache misses
+        self.plot_fixed_arg("({L1-dcache-load-misses}/{L1-dcache-loads})*100",
+                            ylabel='"L1 misses in %"',
+                            title='"Loop l1 cache misses: " + arg + " " + str(arg_value)',
+                            filepostfix="l1misses")
+
+        # Speed Matrix
+        self.write_best_doublearg_tex_table("perm.nthreads / (float({task-clock})/1000)",
+                                            filepostfix="memusage.matrix")
+
+
+loop = Benchmark_Loop()
+```
+
+## The Benchmark class
+
+The class Benchmark defined in the src/benchmark.py implements lots of
+common operations for a benchmark.
+It provides load and save functions using pythons pickle module,
+helpers generating plots using matplotlib and most importantly a run method using
+the attributes `cmd` and `args` to execute your benchmark. To not enforce some
+result format hooks are available to parse the results of your benchmark yourself.
+
+### run
+
+```
+for number_of_runs
+	for each allocator
+		preallocator_hook
+
+		for each permutation of args
+			build command
+			run command
+
+			process_output
+
+		postallocator_hook
+```
+
+#### run hooks
+
+* ```preallocator_hook((alloc_name, alloc_definition), current_run, verbose)``` is called
+	if available once per allocator before any command is executed. This hook may
+	be useful if you want to prepare stuff for each allocator. The mysql benchmark
+	uses this hook to start the mysql server with the current allocator.
+
+* ```process_output(result_dict, stdout, stderr, allocator_name, permutation, verbose)```
+	is called after each run of your command. Store relevant data in result_dict
+	to use it for your summary.
+
+* ```postallocator_hook((alloc_name, alloc_definition), current_run, verbose)``
+	is called after all permutations are done for the current allocator.
+	The mysql benchmark uses this hook to terminate the in preallocator_hook started
+	mysql server.
+
+### plot helpers
-- 
cgit v1.2.3