About

One of the recurring issues is transferring files or directories between machines conveniently when physically moving the data would be impractical. The scenario is typical: some data, either a large file or a directory, has to be moved from one system to another with a minimal amount of time invested, or with both the sending and the receiving done remotely. Tools such as rsync or unison exist, but there is no guarantee that they are available on a given system, in particular on embedded systems, and most of the time there is no graphical interface either, such that everything has to be done at the command line while using fancy programs sparingly. The following guide aims to lay out the problem and find a generic solution with interchangeable parts that can then be applied to whatever type of constraints are placed upon the systems involved in the data transfer.

Diagram (Transfer)

There are situations where the topology is more of a star network with multiple nodes, where one file has to be replicated to many. However, to start off, the network consists of just two nodes, one being the sender and the other the receiver. The following is a representation of the two systems with the two layers involved, the filesystem and the network layer:

With that said, the two layers that affect the transfer between the two systems, labeled send and recv, are the filesystem and the network layer. The filesystem layer is not shared between the two systems, meaning that it is a free and unbound variable that can have local properties on each system. This is somewhat trivial, considering that the data to be transferred does not necessarily reside on the same type, brand or even technology of storage medium on both systems, such that applying local optimizations on either end, which would not be observable by the other end of the transfer, might be an avenue to explore. The same applies to the CPU and RAM layer, which is also not shared and might differ between machines.

Given the above, the transfer can be summarized using a block diagram of irreducible abstract operations that are involved in the transfer regardless of any local mismatch between the two systems. The sequence of operations matters: the encapsulation is read from left to right, where the left-most operation is the farthest from the network layer and the last operation is the closest to it.

On the other end of the network, the same operations have to be undone in the reverse order, mirroring the sending side.

Breaking the symmetric order will not work and will more than likely result in early termination or garbage being written to the filesystem.
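
The effect of the ordering can be tried locally. The following is a minimal sketch, assuming `gzip` and `openssl` are installed; a temporary file stands in for the network, and the file names and the password are placeholders chosen for illustration.

```shell
# Local stand-in for the transfer: a temporary file plays the role of the network.
work=$(mktemp -d)
printf 'hello from send\n' > "$work/input.txt"

# Sender: compress first, encrypt last (encryption sits closest to the network).
gzip -c < "$work/input.txt" \
    | openssl enc -aes128 -k "password" 2>/dev/null > "$work/wire.bin"

# Receiver, mirrored order: decrypt first, decompress last.
openssl enc -d -aes128 -k "password" < "$work/wire.bin" 2>/dev/null \
    | gzip -d > "$work/output.txt"
symmetric_status=$?

# Receiver, broken order: decompressing the still-encrypted stream fails,
# because the data on the wire is not in gzip format.
gzip -d < "$work/wire.bin" > "$work/broken.txt" 2>/dev/null
broken_status=$?

echo "mirrored order exit status: $symmetric_status"
echo "broken order exit status:   $broken_status"
```

With the mirrored order, `output.txt` is identical to `input.txt`; with the broken order, the decompressor stops with a non-zero exit status.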

Read Files

A further distinction is that handling a single file is very different from handling a directory, even in the abstract, considering that for a directory the files can either be collected into an archive first or be collected individually on the fly before being passed to the rest of the pipeline involved in the transfer. There are various ramifications here, depending on storage limitations, RAM limits and even CPU power, mostly concerning embedded systems where creating a large archive before handing it to the rest of the pipeline is either not feasible or impractical.
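
Collecting files on the fly can be sketched as follows, assuming a `tar` that accepts `-C`; the directories and file names are throwaway placeholders. The archive only ever exists as a stream inside the pipe, so no extra storage is needed for it.

```shell
# Stream a directory through a pipe without ever writing an archive to disk.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/sub"
printf 'alpha\n' > "$src/a.txt"
printf 'beta\n'  > "$src/sub/b.txt"

# "tar -cf -" writes the archive to stdout; "tar -xf -" reads it from stdin.
# In a real transfer the pipe between the two would be the network.
tar -cf - -C "$src" . | tar -xf - -C "$dst"

ls -R "$dst"
```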

Compression

Compression, like cryptography, is another CPU-bound operation whose cost varies both with the contents of the files to be transferred and with the algorithm applied. Unlike in older days, modern file formats usually have built-in compression, such that most of the time the complexity of this layer in the transfer pipeline should be minimized, or else there is a risk of wasting time on redundant operations. For example, compressing a movie file or a song, say an MP3, is mostly pointless because the file format already has built-in compression.

Perhaps one of the best options is a fast compressor such as LZO, or LZMA at a low preset; that is, a compressor that handles runs of redundant zeroes within files or matches patterns without overloading the CPU too much.
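
The difference between redundant and already-compressed data can be observed with a small experiment. This is only a sketch: random bytes from `/dev/urandom` are used as a stand-in for already-compressed media, and `gzip -1` is an arbitrary choice of a cheap compressor.

```shell
# 1 MiB of zeroes (highly redundant) versus 1 MiB of random bytes
# (a stand-in for data that is already compressed, such as media files).
zero_size=$(dd if=/dev/zero bs=1024 count=1024 2>/dev/null | gzip -1 | wc -c)
random_size=$(dd if=/dev/urandom bs=1024 count=1024 2>/dev/null | gzip -1 | wc -c)

echo "zeroes after gzip -1: $zero_size bytes"
echo "random after gzip -1: $random_size bytes"
```

The zeroes shrink to a small fraction of the input, while the random data does not shrink at all, so the CPU time spent on it is wasted.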

Cryptography

The cryptography block should normally not be omitted, although it can be dropped altogether if the network is considered secure. Cryptography is used because, once the data is handed over to the network layer, it travels unencrypted between the connected endpoints, which makes it entirely observable by the gateways in between. Moreover, unless some form of traffic segregation, such as tagging, is implemented across all the devices in between, the packets may end up visible to other machines on the network, meaning that any adversary machine could reconstruct the data just by assembling the raw packets.

In practice, the command chains built by users typically do not include encryption, either because the data is not considered sensitive or because the network itself is considered secure. Nevertheless, there are distinctions to be made depending on what each individual system supports: if encryption is used, the same algorithm must be used on both ends, so a common denominator must be chosen that works best across the systems involved.

One way to check performance is to run cryptsetup benchmark, from the cryptsetup package, in order to determine which cipher scores highest. Normally, it should be known in advance what the system supports or will be good at, depending on the hardware available. For example, with AES-NI the AES algorithm itself is implemented as a set of CPU instructions on Intel and AMD machines; the extension dates from 2008 and shipped in processors mostly a few years later, such that a machine with AES-NI support performs AES much faster than one without it. Bear in mind that second-hand commodity hardware, and even much of the enterprise hardware currently being phased out (this includes even powerful Xeons), might not have particularly good support for encryption, such that it is a matter of balancing the choice with the rest of the pipeline.

Note that there is no time for PKI: if setting up the infrastructure takes longer than performing the actual transfer, that is a waste of time, such that symmetric encryption should be used instead. Typically, the cryptographic layer is applied with the openssl command-line tool (equivalents exist for embedded systems, such as wolfCLU for wolfSSL), encrypting on one end and decrypting on the other using a shared password. The type of cipher to use is important, along with the chosen key size, and a list can be obtained by issuing:

openssl enc -list

that will dump out a list of ciphers that can be used. The result can then be cross-referenced with cryptsetup, namely cryptsetup benchmark in order to determine which cipher performs best across both machines.
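
Checking the list by hand on both machines can be tedious, so the lookup can be scripted. The following sketch assumes an `openssl` build that supports `enc -list`; `pick_cipher` is a hypothetical helper, and the preference order passed to it is an illustrative assumption, not a recommendation.

```shell
# pick_cipher (hypothetical helper): print the first cipher from the argument
# list that the local "openssl enc -list" reports as supported.
pick_cipher() {
    supported=$(openssl enc -list 2>/dev/null | tr -s '[:space:]' '\n')
    for candidate in "$@"; do
        # The listing prefixes every cipher name with a dash.
        if printf '%s\n' "$supported" | grep -qix -e "-$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Fall back to the cipher alias used elsewhere in this guide if nothing matched.
cipher=$(pick_cipher aes-128-cbc aes-256-cbc) || cipher=aes128
echo "using cipher: $cipher"

# Round trip with the chosen cipher and a shared password (placeholder value).
roundtrip=$(printf 'secret payload\n' \
    | openssl enc "-$cipher" -k "password" 2>/dev/null \
    | openssl enc -d "-$cipher" -k "password" 2>/dev/null)
echo "$roundtrip"
```

Running the same helper with the same preference list on both machines, and keeping only a cipher that both report, gives the common denominator mentioned above.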

Connect or Bind

One of the main remarks here is that for session-less protocols of the TCP/IP suite, such as UDP, the concept of "connecting" or "being connected to" is entirely irrelevant. A UDP exchange is set up by opening a socket on both ends and then sending the data; who the sender or the receiver is can only be told from the operations performed on either end, and the protocol does not regulate the distinction in any way. Having said that, whether the current system connects to the remote system and sends the files, or the remote system connects back to the current system which then sends the files, makes no difference to the file transfer itself. This flexibility of either connecting or listening can help to work around security measures such as firewalls that might be only partially controllable on the systems involved in the file transfer.
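
The two arrangements can be sketched with netcat as follows. The function names are hypothetical, the flags are those of the traditional netcat (other implementations may spell them differently), and nothing is executed until one of the functions is called with real values for the host, port and file.

```shell
# Variant 1: the receiver listens and the sender connects to it.
recv_by_listening()  { nc -l -p "$1" > "$2"; }          # PORT FILE
send_by_connecting() { nc -q 1 "$1" "$2" < "$3"; }      # HOST PORT FILE

# Variant 2: the sender listens and the receiver connects back to it,
# useful when only the sending machine accepts inbound connections.
send_by_listening()  { nc -q 1 -l -p "$1" < "$2"; }     # PORT FILE
recv_by_connecting() { nc "$1" "$2" > "$3"; }           # HOST PORT FILE
```

In both variants the data flows in the same direction; only the side that opens the listening port changes, so the choice can follow whichever firewall is easier to deal with.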

Flow Chart

To summarize the commands, the following table splits actual software usage into the four abstract categories discussed in the previous sections. This is where the layer abstraction becomes useful: each abstract layer can be filled by various Linux commands and variants that are interchangeable, provided that the symmetry is observed (i.e. using tape archives on one end implies using tape archives on the other end).

The following assumptions are made for the table:

DIRECTORY is a placeholder for a directory path like /mnt/data,
HOST is a host name,
PORT is a port number,
FILE is a path to a file or a file name

The table is read from left to right up to the first "Networking" column; this left half represents the machine that is sending the files. The rest of the table is then read from the second "Networking" column all the way to the right; this right half represents the machine that is receiving the files. In order to construct commands for both the left-hand-side and the right-hand-side systems, simply pick one snippet from each of the categories marked by the table headers, i.e.:

pick a snippet that reads a directory,
pick one compression snippet,
pick one encryption snippet,
pick one networking snippet

and construct the sending command by concatenating the snippets with the pipe operator |. Then, continue reading the table:

pick one networking snippet,
pick one decryption snippet,
pick one decompression snippet,
pick one file-writing snippet

and concatenate them with the pipe operator | in order to obtain the command that will be run on the receiving machine. The last column should be omitted because it only contains notes or a reference to the description of the command tuple.

One way to read this table is to see each column as a tumbler on a combination lock, with one "selection wheel" per column, but with the extra requirement that the right-hand side must match the left-hand side symmetrically. Additionally, some commands fold multiple operations into one; for example, the tape archiver tar can apply gzip or bzip2 compression through flags, used symmetrically on both ends, such that the "compression", respectively "decompression", column can be skipped as it is already performed by the first "reading", respectively the last "writing directory", column.

| Reading (Current) Directory | Compression | Encryption | Networking | Networking | Decryption | Decompression | Write to (Current) Directory | Notes |
|---|---|---|---|---|---|---|---|---|
| `tar -cvf - -C DIRECTORY .` | `lzma -z` | `openssl enc -aes128 -k "password"` | `nc -q 1 HOST PORT` | `nc -l -p PORT` | `openssl enc -d -aes128 -k "password"` | `lzma -d` | `tar -xvf -` | * |
| `tar -zcvf - -C DIRECTORY .` | | | | | | | `tar -zxvf -` | |
| `tar -jcvf - -C DIRECTORY .` | | | | | | | `tar -jxvf -` | `-j` adds bzip2 compression (stronger but more time-consuming than `-z` gzip) |
| `find . -type f \| cpio -o` | `gzip -1` | | | | | `gunzip` | `cpio -v -i -d` | * |
| `find . -type f \| cpio -o` | `pigz -1 -c` | | | | | `pigz -d` | | `pigz` speeds up compression using parallelism |
| `dd bs=8M if=FILE` | | | | | | | `dd bs=8M of=FILE` | `bs=8M` can be tuned according to the network |
| | | | `>/dev/tcp/HOST/PORT` | | | | | not symmetric, a bashism in case netcat is not available |
| | | | `ncat --ssl HOST PORT` | `ncat --ssl -l -p PORT` | | | | netcat re-implementation from Nmap with SSL, which also takes the place of the encryption and decryption columns |
| | | | `ssh -C HOST` | | | | | not symmetric, assumes `sshd` is running on `HOST`; `-C` compresses and SSH encrypts, so the compression and encryption columns are skipped |
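
Assembling a command from the first row of the table can be sketched as follows. `join_stages` is a hypothetical helper that only builds and prints the command strings, with DIRECTORY, HOST and PORT left as the placeholders defined above; nothing is executed.

```shell
# join_stages (hypothetical helper): join the chosen snippets with " | ",
# skipping any column that was left empty.
join_stages() {
    result=""
    for stage in "$@"; do
        [ -n "$stage" ] || continue
        if [ -z "$result" ]; then
            result=$stage
        else
            result="$result | $stage"
        fi
    done
    printf '%s\n' "$result"
}

# Left half of the table: the sending machine.
send_cmd=$(join_stages \
    'tar -cvf - -C DIRECTORY .' \
    'lzma -z' \
    'openssl enc -aes128 -k "password"' \
    'nc -q 1 HOST PORT')

# Right half of the table: the receiving machine, in mirrored order.
recv_cmd=$(join_stages \
    'nc -l -p PORT' \
    'openssl enc -d -aes128 -k "password"' \
    'lzma -d' \
    'tar -xvf -')

printf 'send: %s\nrecv: %s\n' "$send_cmd" "$recv_cmd"
```

Swapping any snippet for another one from the same column, and making the matching change on the other side, yields the remaining combinations.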

Notes

raw TCP/IP has no terminator for a session, which is why protocols such as MQTT build a thin layer on top of TCP/IP in order to at least have a notion of when a message has been sent and it is time to terminate; for that reason the -q 1 parameter is passed to the sending netcat endpoint, making netcat close the connection one second after reaching the end of its input; otherwise the connection just hangs and there is no discernible termination of the transfer,
there are many implementations of netcat, with ncat from Nmap being the most advanced and having built-in SSL; however, all the examples should work with the traditional netcat.
