Background

The name of the script, Mr. Fusion, is inspired by the device that powered Doc Brown's DeLorean, and particularly by the fact that Mr. Fusion was able to supply power by recycling garbage. The parallel was drawn to expensive HP RAID arrays that are flexible and accept SATA drives but are, along with the hpsa driver, notoriously bad at maintaining stability. That being said, the project was created in order to allow cheap, aftermarket and generally crap 2.5" SATA drives to be just dumped into an HP StorageWorks enclosure and then used for storage. Mr. Fusion is the script that makes recycling the crap drives possible by speeding up the various IO operations and lifting load off the RAID controller to prevent very annoying crashes.

One particularly obnoxious crash, typical of combining cheap SATA drives with obsolete HP RAID controllers, is the "task … blocked for more than 120s" kernel dump, which probably occurs because the RAID controller is saturated with IO requests and can no longer cope. The result is typically that processes slide into uninterruptible IO, such that recovery is only possible by rebooting the entire machine.

Given the above, the technique described on this page acts in such situations as a band-aid that avoids the proper fix of replacing the HP RAID controller with a more competent piece of hardware. Nevertheless, the code is general enough to add a RAM-based cache to any block device, so it may prove useful elsewhere.

About

Even though Linux allocates buffers out of RAM for various disk operations, sometimes a user wants to explicitly back a block device with a RAM cache. More generally, it may be desirable to back any block device with a cache without having to redesign the device and the underlying filesystem. One good example is backing extremely slow storage, such as tape, in order to buffer fast reads and writes, many of which may cancel each other out, and thereby reduce the number of commits to the slow storage.

Other circumstances include, for instance, broken RAID array controllers with faulty drivers, such as the older HP RAID hardware, that are often overwhelmed by the number of IO operations and tend to lock up such that only a reboot can restore functionality. In that case, a possible solution is to buffer reads and writes such that only definite and permanent changes trickle down to the broken storage.

The setup on this page is meant to back StorageWorks SmartArrays with RAM via EnhanceIO, where the caching is not performed on an SSD, as EnhanceIO was designed to do, but directly on a RAM block device with optional compression. As such, the methodology is akin to previous examples where logs have been stored in RAM in order to reduce the wear and tear of SSD or NVMe drives.

Design

Conceptually, the following diagram illustrates the benefits of buffering a slow storage device with a fast and comparatively small storage device:

Here A, B and C are readers or writers that access the filesystem, regardless of whether that implies concurrent processes, network accesses or any other form of read and write operation.

The slow storage can be a faulty RAID controller, tape storage or any other cheap yet large storage container, whilst the fast cache is an expensive yet small storage container such as RAM, NVMe or SSDs.

Given an LRU or FIFO cache replacement policy, the most beneficial situation is when all A, B and C operations depend on each other and access more or less the same blocks, such that no backing store fetch is necessary to retrieve updates from the slow storage. The most detrimental situation is when all operations are independent of each other and access a wide variety of blocks, such that the cache is perpetually saturated and backing store fetches are always required.
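The two extremes can be illustrated with a toy simulation. The following is only a minimal sketch of an LRU cache in POSIX shell, not EnhanceIO's actual replacement code; the function name lru_hits and the block numbers are made up for the illustration.

```shell
#!/bin/sh
# lru_hits CAPACITY BLOCK...
# Replays the given block accesses against a tiny LRU cache holding at most
# CAPACITY blocks and prints the number of cache hits.
# (Hypothetical helper for illustration only.)
lru_hits() {
    capacity=$1
    shift
    cache=""    # space-separated block list, least recently used first
    count=0
    hits=0
    for block in "$@"; do
        case " $cache " in
        *" $block "*)
            # Hit: move the block to the most-recently-used end.
            hits=$((hits + 1))
            new=""
            for b in $cache; do
                [ "$b" = "$block" ] || new="$new $b"
            done
            cache="$new $block"
            ;;
        *)
            # Miss: evict the least recently used block if the cache is full.
            if [ "$count" -ge "$capacity" ]; then
                rest=""
                first=1
                for b in $cache; do
                    if [ "$first" -eq 1 ]; then
                        first=0
                        continue
                    fi
                    rest="$rest $b"
                done
                cache="$rest"
                count=$((count - 1))
            fi
            cache="$cache $block"
            count=$((count + 1))
            ;;
        esac
    done
    echo "$hits"
}

lru_hits 2 1 2 1 2 1 2   # accesses keep touching the same blocks: prints 4
lru_hits 2 1 2 3 1 2 3   # working set larger than the cache: prints 0
```

In the first call the readers and writers share two blocks that fit in the cache, so every access after the initial fill is a hit. In the second call three blocks are cycled through a two-block cache, so every access has to go to the backing store.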

Practical Scenarios

Practical scenarios could include:

Setup

The setup uses EnhanceIO but, instead of an SSD as the "fast device", a customized and optionally compressed RAM device is used. The first step is to make sure that the zram module is loaded. This can be done by editing /etc/modules and appending zram to the file such that the zram module is loaded on boot. For now, the zram module can be loaded with modprobe zram in order to check that the caching works as designed.
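Appending to /etc/modules by hand is easy to do twice by accident. As a small sketch, the helper below (ensure_line is a hypothetical name, not part of the project) appends a line to a file only if it is not already there.

```shell
#!/bin/sh
# ensure_line FILE LINE
# Appends LINE to FILE unless an identical line already exists.
# (Hypothetical helper for illustration only.)
ensure_line() {
    file=$1
    line=$2
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Run as root, it would be used as ensure_line /etc/modules zram, followed by modprobe zram to load the module immediately.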

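Copy the contents of the SystemD service file from the code section below to a file placed at /etc/systemd/system/eio_mrfusion.service. Next, edit that file in order to configure the various parameters. Near the top of the file, a section is marked with a large header CONFIGURATION, and all the parameters following it may be configured by the user. The parameters include the following:

  * DEVICE - the path to the block device to cache,
  * SIZE - the size of the cache to create, where b, K, M, G and T are accepted unit suffixes,
  * NAME - a short identifier for the cache,
  * ZIP - the compression to use for the in-RAM cache,
  * MODE - the EIO cache mode, writeback (wb) or writethrough (wt),
  * POLICY - the cache replacement policy (lru, fifo or random).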

The rest of the file after the INTERNALS header is a compressed, in-line script that is packed and unpacked dynamically whenever the service starts or stops.
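Since the script listed in the code section rejects unknown modes, policies and compressors, it can be handy to check the chosen values before editing them into the service file. The following is a sketch that simply mirrors the values accepted by the script's own case statements; the function name valid_settings is hypothetical.

```shell
#!/bin/sh
# valid_settings MODE POLICY ZIP
# Succeeds only if all three values are among those accepted by the script.
# (Hypothetical helper for illustration only.)
valid_settings() {
    case "$1" in
    wb | wt) ;;
    *)
        echo "ERROR: unknown cache mode $1" >&2
        return 1
        ;;
    esac
    case "$2" in
    lru | fifo | random) ;;
    *)
        echo "ERROR: unknown cache replacement policy $2" >&2
        return 1
        ;;
    esac
    case "$3" in
    lzo | lz4 | lz4hc | deflate | 842) ;;
    *)
        echo "ERROR: unknown cache algorithm $3" >&2
        return 1
        ;;
    esac
}
```

For instance, valid_settings wt lru lzo succeeds for the defaults shipped in the service file, whereas valid_settings rw lru lzo fails with an error message.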

Performance Enhancements

The state and occupancy of the cache can be observed by periodically polling a file created by EnhanceIO at /proc/enhanceio/CACHE_NAME/stats, where CACHE_NAME is the descriptive name of the cache configured within the SystemD service file or the shell script. A suitable command would be the following:

watch -n .1 -c 'cat /proc/enhanceio/CACHE_NAME/stats'

and could be used for further optimizations or monitoring.
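For monitoring from a script, a single counter can be pulled out of the stats file instead of watching the whole thing. The sketch below assumes, as the stop action of the script does, that the file consists of lines with a counter name followed by whitespace and its value; the function name eio_stat is hypothetical.

```shell
#!/bin/sh
# eio_stat FILE KEY
# Prints the value of the counter named exactly KEY from a stats file laid
# out as "name value" lines (layout assumed from the script's own parsing).
# (Hypothetical helper for illustration only.)
eio_stat() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}
```

With the cache name from the service file, eio_stat /proc/enhanceio/slow/stats nr_dirty would print the number of dirty blocks still waiting to be committed. Matching the name exactly avoids picking up other counters that merely contain the same substring.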

The following observations, based on the settings, can be made:

Code

As a developer note, a similar Wizardry and Steamworks design has been used before for the bash JSON parser, where a script was reduced, compressed and Base64 encoded in order to be inlined inside a different script. The inlined payload gets dynamically unpacked at runtime, evaluated into a program and then executed with extra optional parameters.

One of the reasons behind doing so is to prevent littering the filesystem with various files and other artifacts that might be forgotten whenever a system update occurs and end up being a liability. Aside from the overwhelmingly cool aspect of it, and the fact that all the above has been realized within a SystemD service file that is below 100 lines, the payload is just a gzip-compressed and Base64 encoded version of the script responsible for setting up the cache.

To check the payload, the large Base64 string can be copied to a file, the backslashes acting as SystemD-centric multiline separators stripped and then unpacked using the following command:

cat payload | base64 -d | gzip -d | less
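Going in the other direction, a modified script can be packed again in the same way. The following is a minimal sketch of both directions, assuming GNU gzip and base64 as used by the service file; the function names pack and unpack are hypothetical, and the result of pack still has to be wrapped with trailing backslashes by hand before being pasted into the service file.

```shell
#!/bin/sh
# pack: gzip-compress standard input and print it as one Base64 line.
# (Hypothetical helper for illustration only.)
pack() {
    gzip -c | base64 | tr -d '\n'
}

# unpack: remove whitespace and the backslashes used as line continuations,
# then Base64-decode and decompress.
# (Hypothetical helper for illustration only.)
unpack() {
    tr -d '\\[:space:]' | base64 -d | gzip -d
}
```

For example, pack < script.sh prints a payload for a file named script.sh, and unpack < payload | less shows the script contained in a copied payload without having to strip the backslashes first.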

Service File

[Unit]
Description=Back RAID array with compressed RAM
Requires=local-fs.target remote-fs.target
After=local-fs.target remote-fs.target

[Install]
WantedBy=multi-user.target

[Service]
##############################################################################
#                             CONFIGURATION                                  #
##############################################################################

# The path to the block device.
Environment=DEVICE=/dev/md1

# Size of the cache to create; b, K, M, G, T are accepted suffixes of units.
Environment=SIZE=10G

# A short identifier for the cache.
Environment=NAME=slow

# Compression to use for in-RAM cache.
Environment=ZIP=lzo

# The EIO cache mode, writeback (wb) or writethrough (wt)
Environment=MODE=wt

# Cache replacement policy (lru,fifo or random)
Environment=POLICY=lru

##############################################################################
#                               INTERNALS                                    #
##############################################################################
Environment='SCRIPT=\
H4sIAF+gD18AA91ZbXPbuBH+fPwVG8ZTSzeWZDuZuYx8vlaWZUfjF7mynLaXpBmIhCSOSYIHgGbs\
y/337gIkRUqWnV7SL+UHhwKxi90Hu88ukJcvOqmSnWkQd3h8B2rhvPx+D+oC6IvkXgbzhYZGvwn/\
CB6Y9OU9sNiHa81ZlAl5q2B/d38XWnAeeDxWvAunlzdwenV+9wrMk+u6CjlTHBTHGQutk26nk2VZ\
ex6nbSHnndBKq848CdsLHYUwExJCPmch+FyzIFQ7pS5jkwIxgxkLJKSKzfkO6AUHP1BeyIKIS2Nl\
xqRksb4HT8R+oAMRqzZAxa7viBcM4gWLPT4cwYVsw0mqcDnERS8CBVqIEK1R4EnONPchiEFIH83U\
AqbMu0VP0hANJcNQ17g3PCajtRRhyKWCLNALHIgSyZVC+XHvghyWfHoPiRR3gR/Ec2CoRmmIeCRk\
qWuazma4ULZADDWwMBQZzSW4VpYhYxR+DgmwKAo0DfgyuOMEG+n6Xg/pmqABaPpcsggiE2ULdsdB\
kFu4p1yBChBPQDMQwTmPuQw8NP8eA0IF8xhRQPNyHwlCFmNsymmgJUPvp6HAMZ/fYWRZ+Agzigpf\
oO5YaEi4xMiKSQ0h8T/wccx/SwPJIx5r1f1GXQA/VmLslsuYhxAJPw05NLj9EIiWfxuppvETMZQq\
Yeh+qoMQw99sY6mrpyBKvRwWZeJBBQ+c0oreJZrOFcWqx7z8W3uTXb+SkkfW24GESR14acgkbtwD\
7rWnQ2jQhFYYxOnnZruu6/s8Jr4wcAPWrWTjDhxNJidok8gwlky++DwUMmCxjZA5hg9yyZqu78gT\
aNgxN8kOdwxXnuLu+RxJSzLLT06vPxmOLg9d1+n3+m8Hn66Hvw7o10vIpjuQyUBzCvcdFDN6upDp\
fFwvpEjni1zwYnQ8OMym+a/L3sVgqbR3fjoaDydvLw7DB5GPXY3Oh/1/HYYydRy1ENlbHiaNJvzu\
EAoe0/Dzz4PRBL6A5ty5IdLtwtYuvG8xUBq3+YvSIvmIv5UJFnrza2n40Vmi+r5FXDRH8PUioqkJ\
bgtWgXt6j/KYw+hGNTWpOP8Us4jDR6f4dmQIoJ70mNkrjIlRgLteJpHjjBIDOtIG75bL1PzB3+YH\
0jXQT1rdhk51raWs9b00uJpV1nAkG1sEoIHbeRYc7cAF/TnFP+12c6mpjp3RVB8QVuNSogpbvnZh\
Lg1Vgoe8qUVMqaOyKz8YBTmCxn1KcVOYq6pXESg28odiEnFJEiIvEA3mH6GBYbYzC2aCTMEi7Yuo\
4nts97eKohnIuclqreBgAHWcwWcWJSFXXRsXfQv4KnAd/LcT+bs7NukZ7CH2Jj6sXuJOKjQkmcc+\
Dbs+08zNVWPcF2FCe763+9Orn17vvdl/TftWLEB+kJAVuRTZDqhqDJFl5dyqiSuLoEipqZfqhZCF\
h5t6soaYzVDP3+YywFaAU4vVRHhGE+cPx6FOgGNBRb3YRbGu6vrdqBt3dTfpwuiK2Of6AKtk7jtu\
ubuVD7uYVGaYLTer4Cua0xufuuWHXNR+LyXpcQ1wLlKJS965TTg4KFMZfmxWMh6AewsB7mA8Ho27\
kMa3schiYB4lrluf+Blbhb3aEKotvyrmOSvjarlSlWvXPMmn+6vTj85H/bNPx4N3w/4TYpXADmbw\
Hl5A6wHK2fDxgAIhrtldIfB1tRVsl/Nq+BqMs6kBONMr8D4C8UaYK3xSXWtdeB36CgDlrOoWzIJV\
nOJVeG3J2gSrXp2+rGkbY3FlYj0osRIaxMKH18W/C8+8YaUNkWDM+5vX+38iXnNmKaj1WwM3WfU9\
r93POG5nrXgtU+MXcbF5sWTsQrNqxFc7m4l4I+k/6fVjbr7/68flckVL4mxQkMsYPb6IObVaPc/0\
38jxxK5tx+Sfyb7JxdXxcFzPPjt22NFR4mB4muQ+GZ5TDNpPnTZ22J8iOTPtZPuVz/beuKQ0uvXx\
IOpulSIu7P/ylz34xdB7nIZhZRk8nySw/TvICFpyVpM6gD+24Wx4fg5/vxlOYDIYX8Dgn/g2vJzA\
25srh4fK1qQa8H2Rhr450rDcYVNMZsjzFvMcKnTKWWdlQ8YW55fQX3CU1Ats91i11VrgEWHKeQwq\
TZIw4H7bCCwBXabsOqXVjI1FVa89vHLffWxbc46wqhGH68n14VaD9sALAzQdu4cvMMdAq6/fXJr2\
omqc1fCMecv+goXYVvj3ZJDS6ikDa7Ct9RtWvl0zaloaVa0hz5hW0/qIuImAmUjjjWjm1p5ya2ac\
RlM8bGNXxe5YEJrDSP/qRlnUr24I7VB5SVrA/G8c/NBQH5o4wLJbaJ3AdnebgjnBA5WGrX0K4C9A\
DXeLw7bqfFCdzny7jC7b+dYcMe3WEj40twoYnSyLCrvVKA6QrdYsQLFWS2nUGClCE811aYS67QJc\
quc0WBLvIzWgubYxlTWf2RGvTLy8pycDc8ee2YMcilorS1DYnjJvES0G48H1zfmkEvn5YthqPh5F\
LbXqRSta6RZayVpVaHkraUwU1vzWDPxv88+rcJkO7pYYYWGhHn0VXLpkicQdr3xslx/LeJErgDyR\
zFhHqCn9PyXEP0+H5bWZSc+n6fCY061hfsdYbIv1byZFVNFKY+uZ7lr1g+Ho0/DyZLTBR6Keda4p\
qanOTHtu1zX0ZME4Gd1cHh/uWrPxCHo+vBzQjcFWsWZ5AqLnjCwwsGzRRHfTGqV6et59ndB+Tchs\
krt15sIhuOYAe4kIPX5KyOe+M3OfjLfisV7XW81KI16w05oh19fHxzbvnjLEaEce4b/B3kYTatuM\
tm+yZXWTKp9Nd1euayL66yn7JjZlTgswJcRG4WPE8lhcT+gyHxtRoqgpLy6B6CwvEm5v8DZQNvdR\
2xrDIi1LUeHZl3CUasgYzmXxfcbuy9sW7C/1PSRszpX5PwOeX9Hrgngobq8nPSQEt4Mc4nXKK+FO\
ZckOlpeil7HoUQNayq5jZ1VuNUzCxfKTtaMmsmwFYHNc20sHipJcqhVz2KXlKmlGjwo5rrQM0XKz\
K+QS8qIqbEDbt1OeqGjL7vBr+PNrQ8qvmPYcQ5qZq0Xr6YKFpSk/fz17KVI7LVUMQBXmhOT8B0W8\
7k4/HAAA\
'
Type=oneshot
KillMode=none
User=root
TimeoutSec=300
RemainAfterExit=yes
ExecStart=/usr/bin/env sh -c \
    "echo $SCRIPT | \
    sed -e 's/\s//g' | \
    base64 -d | \
    gzip -d | \
    xargs -0 -i sh -c 'eval {}' sh -a start -s $SIZE -t $ZIP -m $MODE -p $POLICY -d $DEVICE -n $NAME"
ExecStop=/usr/bin/env sh -c \
    "echo $SCRIPT | \
    sed -e 's/\s//g' | \
    base64 -d | \
    gzip -d | \
    xargs -0 -i sh -c 'eval {}' sh -a stop -n $NAME"

Script

The unpacked and unzipped Base64 string reveals the following script:

#!/usr/bin/env sh
###########################################################################
##  Copyright (C) Wizardry and Steamworks 2020 - License: GNU GPLv3      ##
##  Please see: http://www.gnu.org/licenses/gpl.html for legal details,  ##
##  rights of fair usage, the disclaimer and warranty conditions.        ##
###########################################################################
# EnhanceIO Mr. Fusion - this tool was created in order to back faulty    #
# RAID controllers with compressed RAM thereby providing a fast memory    #
# buffer whilst allowing the RAID controller to slowly commit to drives.  #
#                                                                         #
# The program might have other uses since it is generically designed to   #
# back any arbitrary block device with RAM and does not pertain to RAID.  #
#                                                                         #
# Requirements:                                                           #
#   * EnhanceIO kernel module (enhanceio-dkms) and userspace utilities.   #
#   * As much RAM as the size of the requested cache size.                #
#   * ZRAM userspace utilities, particularly zramctl (util-linux).        #
#                                                                         #
# Trivia: Mr. Fusion, BTTF, powering a DeLorean with garbage              #
###########################################################################
 
# Default variable declarations.
ACTION=""
CACHE_SIZE=""
# wb (writeback) or wt (writethrough); the default set below is wb
CACHE_MODE=wb
CACHE_NAME=""
CACHE_ALGORITHM=lzo
CACHE_POLICY=lru
 
showHelp() {
    cat <<EOT | tee
Usage: $0 [-a start|stop] [-s size] [-d block device]
          [-t algorithm] [-p policy] [-m cache mode]
          [-n cache name ]
 
       Back a block device in compressed RAM using EnhanceIO
 
Options are:
        -a start|stop   start or stop caching a block device
        -s size         the size of cache to create (b, KiB, MiB, GiB,..)
        -d block device the block device to cache
        -m cache mode   the caching mode, writeback or writethrough
        -t algorithm    the compressor to use for the caching device
        -p policy       the cache replacement policy (lru,fifo or random)
        -n name         the name of the cache device to create
 
Examples:
 
    Cache the block device /dev/md0, with a 1GiB RAM cache and call the
    cache "data":
 
    $0 -a start -s 1073741824 -d /dev/md0 -n data
 
    Now, stop caching the /dev/md0 block device:
 
    $0 -a stop -n data
 
Authors:
 
    Wizardry and Steamworks (office@grimore.org)
 
EOT
}
 
while getopts a:s:d:m:n:t:p: OPTIONS; do
    case "$OPTIONS" in
    a)
        ACTION="$OPTARG"
        case "$ACTION" in
        "start" | "stop") ;;
 
        *)
            echo "ERROR: unknown action"
            exit 1
            ;;
        esac
        ;;
    s)
        CACHE_SIZE="$OPTARG"
        ;;
    d)
        CACHE_BLOCK_DEVICE="$OPTARG"
        ;;
    m)
        if [ ! -z "$OPTARG" ]; then
            CACHE_MODE="$OPTARG"
            case "$CACHE_MODE" in
            "wb" | "wt") ;;
 
            *)
                echo "ERROR: unknown cache mode $CACHE_MODE"
                exit 1
                ;;
            esac
        fi
        ;;
    n)
        CACHE_NAME="$OPTARG"
        ;;
    t)
        CACHE_ALGORITHM="$OPTARG"
        case "$CACHE_ALGORITHM" in
        "lzo" | "lz4" | "lz4hc" | "deflate" | "842") ;;
 
        *)
            echo "ERROR: unknown cache algorithm"
            exit 1
            ;;
        esac
        ;;
    p)
        CACHE_POLICY="$OPTARG"
        case "$CACHE_POLICY" in
        "lru" | "fifo" | "random" ) ;;
 
        *)
            echo "ERROR: unknown cache replacement policy"
            exit 1
            ;;
        esac
        ;;
    [?])
        showHelp
        exit 1
        ;;
    esac
done
 
# Acquire a lock.
if [ -z "$TMPDIR" ]; then
    TMPDIR=/tmp
fi
LOCK_FILE="$TMPDIR/.eio_mrfusion.3da18"
if mkdir "$LOCK_FILE" >/dev/null 2>&1; then
    # KILL cannot be trapped, so it is not listed here.
    trap '{ rm -rf "$LOCK_FILE"; }' QUIT TERM EXIT INT HUP
else
    echo "ERROR: Could not acquire lock file"
    exit 1
fi
 
case "$ACTION" in
start)
    # Check that a cache name has been supplied.
    if [ -z "$CACHE_NAME" ]; then
        echo "ERROR: no cache name provided"
        exit 1
    fi
    CACHE_EXISTS=$(eio_cli info | grep "$CACHE_NAME")
    if [ ! -z "$CACHE_EXISTS" ]; then
        echo "ERROR: the cache already exists"
        exit 1
    fi
    # Check that the block device exists.
    if [ ! -b "$CACHE_BLOCK_DEVICE" ]; then
        echo "ERROR: block device $CACHE_BLOCK_DEVICE not found"
        exit 1
    fi
 
    # Get the number of available CPUs
    CPUS=$(lscpu | grep '^CPU(s):' | awk -F ':' '{ print $2 }' | sed -e 's/\s//g')
    # Create block device and check that it exists.
    ZRAM_DEVICE=$(zramctl --find --streams "$CPUS" --size "$CACHE_SIZE" --algorithm "$CACHE_ALGORITHM")
    if [ ! -b "$ZRAM_DEVICE" ]; then
        echo "ERROR: could not create zram device"
        exit 1
    fi
 
    # Create cache device and start caching.
    RESULT=$(eio_cli create -d "$CACHE_BLOCK_DEVICE" -s "$ZRAM_DEVICE" -m "$CACHE_MODE" -p "$CACHE_POLICY" -c "$CACHE_NAME" 2>&1)
    CACHE_EXISTS=$(eio_cli info | grep "$CACHE_NAME")
    if [ -z "$CACHE_EXISTS" ]; then
        echo "ERROR: could not activate cache on RAM device"
        # Remove RAM device.
        zramctl -r "$ZRAM_DEVICE"
        exit 1
    fi
    ;;
stop)
    # Check that a cache name has been supplied.
    if [ -z "$CACHE_NAME" ]; then
        echo "ERROR: no cache name provided"
        exit 1
    fi
    CACHE_EXISTS=$(eio_cli info | grep "$CACHE_NAME")
    if [ -z "$CACHE_EXISTS" ]; then
        echo "ERROR: the cache does not exist"
        exit 1
    fi
    # Derive the RAM device name from the cache name.
    ZRAM_DEVICE=""
    EIO_INFO=$(eio_cli info | grep ':' | sed -e 's/\s//g' | awk -F':' '{ print $1":"$2 }')
    FOUND=0
    for LINE in $EIO_INFO; do
        K=$(echo "$LINE" | awk -F':' '{ print $1 }')
        V=$(echo "$LINE" | awk -F':' '{ print $2 }')
        if [ "$K" = "CacheName" ]; then
            if [ "$V" = "$CACHE_NAME" ]; then
                FOUND=1
            fi
        fi
 
        if [ "$K" = "SSDDevice" ]; then
            if [ "$FOUND" -eq 1 ]; then
                ZRAM_DEVICE="$V"
            fi
            FOUND=0
        fi
    done
    if [ -z "$ZRAM_DEVICE" ]; then
        echo "ERROR: Unable to find cache RAM device"
        exit 1
    fi
    # This should be a blocking operation.
    RESULT=$(eio_cli edit -c "$CACHE_NAME" -m ro 2>&1)
    # But wait anyway for the dirty pages to be committed.
    EIO_STATS="/proc/enhanceio/$CACHE_NAME/stats"
    if [ -f "$EIO_STATS" ]; then
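        STATS=$(grep nr_dirty "$EIO_STATS" | awk -F ' ' '{ print $2 }')
        while [ "$STATS" -ne 0 ]; do
            sleep 1
            # Re-read the counter, otherwise the loop can never terminate.
            STATS=$(grep nr_dirty "$EIO_STATS" | awk -F ' ' '{ print $2 }')
        done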
    fi
    # Delete cache.
    RESULT=$(eio_cli delete -c "$CACHE_NAME" 2>&1)
    if [ -n "$(eio_cli info | grep "$CACHE_NAME")" ]; then
        echo "ERROR: Unable to delete cache"
        exit 1
    fi
    # Delete RAM device.
    zramctl -r "$ZRAM_DEVICE"
    ;;
*)
    echo "ERROR: unknown action"
    showHelp
    exit 1
    ;;
esac
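
The stop action above finds the RAM device by looping over the output of eio_cli info line by line. The same lookup can be sketched as a single awk pass. The "Cache Name" and "SSD Device" keys are taken from the script's own parsing, while the function name ssd_device_for is hypothetical and the exact layout of the eio_cli output is an assumption.

```shell
#!/bin/sh
# ssd_device_for NAME
# Reads "eio_cli info"-style text on standard input and prints the value of
# the "SSD Device" line belonging to the cache called NAME.
# (Hypothetical helper; the input layout is assumed from the script above.)
ssd_device_for() {
    awk -F':' -v want="$1" '
        {
            key = $1
            val = $2
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
        }
        key == "CacheName" { found = (val == want) }
        key == "SSDDevice" && found { print val; exit }
    '
}
```

It would be called as eio_cli info | ssd_device_for slow. Unlike the grep-based existence checks in the script, the cache name is compared exactly, so a cache named "slow" is not confused with one named "slower".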