Depending on your application, you may need to let anonymous users interact with your web services in ways that require files to be read from or written to the filesystem. This is a textbook case in which user input takes part in deciding which file is accessed.
One typical scenario is a web-server where an anonymous user performs a GET request such as:
http://server.tld/path/to/file.html
and your web-server will typically take a base-path, as the path to a document root, say:
C:\Web\docRoot
and combine the document root with the user-supplied path, in order to obtain:
C:\Web\docRoot\path\to\file.html
and be able to read the file.
Combining the local path to the document root and the user-supplied path is typically achieved with some sort of "path join" function depending on your application's API.
Nevertheless, it is entirely permitted for the anonymous user to supply a path such as:
http://server.tld/../../../../../../../../shadow-
such that given any document root path, for instance:
/var/www
the combined path would result in:
/var/www/../../../../../../../../shadow-
which is a semantically valid path for most common filesystems.
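The naive combination described above can be sketched in a few lines of Python. This is only an illustration: the function name naive_join is hypothetical, and posixpath is used so that the result is the same on any operating system. posixpath.normpath collapses the parent-directory references purely lexically, which is enough to show where the combined path ends up.

```python
import posixpath


def naive_join(document_root, requested_path):
    """Combine the document root with a user-supplied path, with no checks."""
    # Strip the leading "/" so that join() does not discard the document root.
    return posixpath.join(document_root, requested_path.lstrip("/"))


combined = naive_join("/var/www", "/../../../../../../../../shadow-")
print(combined)

# Collapsing the ".." segments shows that the path escapes /var/www entirely.
print(posixpath.normpath(combined))  # /shadow-
```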
The result is that the path traversal allows any anonymous user to read any file that is accessible to the user your web-server is running as. Furthermore, if your web-server is configured to provide (or implements) directory listings, then anonymous users could potentially probe your filesystem by listing the contents of directories that your web-server user has access to.
There are "worse" scenarios: for instance, when a path is combined with a username in order to obtain the path to a log file. In that case, users could write to arbitrary files.
Obviously, given a document root, your application only needs to read from and write to files that are under the document root, so access to any file outside the document root does not make sense.
As a basic example, consider the document root to be:
/var/www/website.tld
and that GET requests of the form:
http://website.tld/css/style.css
http://website.tld/html/errors/404.html
can be performed by anonymous users, and that the document root path will have to be combined with the requested paths in order to obtain the filesystem paths:
/var/www/website.tld/css/style.css
/var/www/website.tld/html/errors/404.html
such that files can be read or written.
To prevent path injections of the form:
/var/www/website.tld/../../../../../etc/shadow-
the following standard mitigation steps have to be performed before attempting to read from the resulting combined path. First, combine the document root with the requested path to obtain:
/var/www/website.tld/../../../../../etc/shadow-
and then use a real path resolver to determine the resolved path. The keyword to look for in the API is usually realpath, which is typically a function or method that takes as a parameter a path such as /var/www/website.tld/../../../../../etc/shadow- and resolves it, in this case, to /etc/shadow- by climbing up the filesystem tree and following parent directories (..).
You will now have the "real path" to the requested file:
/etc/shadow-
and you will also know your document root:
/var/www/website.tld
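As an illustration of the resolving step, here is a minimal sketch using Python's os.path.realpath, which also resolves symbolic links. The helper name resolve_request and the requested paths are assumptions made for this example, and a temporary directory stands in for the document root so that the sketch does not depend on /var/www existing.

```python
import os
import tempfile


def resolve_request(document_root, requested_path):
    """Join the document root with the requested path and resolve the result."""
    combined = os.path.join(document_root, requested_path.lstrip("/"))
    # realpath() collapses ".." segments and resolves symbolic links.
    return os.path.realpath(combined)


# A throwaway directory plays the role of the document root.
root = os.path.realpath(tempfile.mkdtemp())

inside = resolve_request(root, "/css/../img/icon.png")
outside = resolve_request(root, "/../outside.txt")

print(inside)   # a path below the document root
print(outside)  # a path next to the document root, that is, outside it

os.rmdir(root)
```

Note that the document root itself is passed through realpath as well, so that both sides of the later comparison are in the same canonical form.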
The next step in determining whether the requested file /etc/shadow- is a child of your document root would be to split both paths into arrays of path parts.
First, split the document root path using the path separator (on Unix, the / character - although it is much better to search the API for a path separator compile-time constant that is operating-system agnostic); in this case, the array would contain the following elements: var, www, website.tld.
Then, split the "real path" to the requested file in the same way; in this case, the array would contain the following elements: etc, shadow-.
Finally, you will have to loop sequentially over the array of document root path parts and check that each element is equal to the corresponding element in the "real path" to the requested file. This loop has to be performed until all the elements in the document root path-parts array have been exhausted. If at any point during the loop a path part from the document root array is not equal to the path part of the "real path" at the same index, then the file is not a child of the document root and you should deny access. The same holds if the "real path" array runs out of elements before the document root array does.
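The splitting and the sequential comparison described above could look roughly as follows in Python. This is a sketch under stated assumptions, not a definitive implementation: the names split_parts and is_within_root are hypothetical, both arguments are expected to be already resolved absolute paths, and posixpath.sep is used as the separator constant for Unix-style paths.

```python
import posixpath


def split_parts(path, sep=posixpath.sep):
    """Split a path into its non-empty parts."""
    return [part for part in path.split(sep) if part]


def is_within_root(document_root, real_path):
    """Return True only if real_path lies under document_root."""
    root_parts = split_parts(document_root)
    real_parts = split_parts(real_path)

    # A path with fewer parts than the document root cannot be below it.
    if len(real_parts) < len(root_parts):
        return False

    # Compare part by part until the document root parts are exhausted.
    for index, root_part in enumerate(root_parts):
        if real_parts[index] != root_part:
            return False
    return True


print(is_within_root("/var/www/website.tld", "/etc/shadow-"))  # False
```

Comparing whole path parts rather than raw string prefixes also avoids accepting a sibling directory such as /var/www/website.tld.backup, which shares a string prefix with the document root but is not inside it.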
Here we list some examples just reasoning on the document root path and the "real" requested path without mentioning code.
Consider the following document root:
/var/www/website.tld
and the following GET request performed by an anonymous user:
http://website.tld/css/../img/icon.png
Splitting the document root path into an array, you will obtain:
[ 'var', 'www', 'website.tld' ]
Combining the document root path with the requested path and retrieving the "real path", you will obtain the requested path:
[ 'var', 'www', 'website.tld', 'img', 'icon.png' ]
You now compare the document root path with the requested path, element-by-element until no more elements remain in the document root path:
var is the same as var
www is the same as www
website.tld is the same as website.tld
Since all the elements of the document root path array have been exhausted, you can conclude that the file lies within your document root.
Consider the following document root:
/var/www/website.tld
and the following GET request performed by an anonymous user:
http://website.tld/css/../img/../../../icon.png
Splitting the document root path into an array, you will obtain:
[ 'var', 'www', 'website.tld' ]
Combining the document root path with the requested path and retrieving the "real path", you will obtain the requested path:
[ 'var', 'icon.png' ]
Comparing the document root path with the "real" requested path:
var is the same as var
www is not the same as icon.png
You can now abort the loop and you know that the requested file lies outside the document root and you can deny access.
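For readers who prefer to see the two examples replayed in code, the following Python sketch runs both requests through the same steps. The function names are hypothetical, and posixpath.normpath is used here as a purely lexical stand-in for a real path resolver so that the example does not touch the filesystem; unlike realpath, it does not resolve symbolic links.

```python
import posixpath

DOCUMENT_ROOT = "/var/www/website.tld"


def path_parts(path):
    """Split a path into its non-empty parts."""
    return [part for part in path.split(posixpath.sep) if part]


def check_request(url_path):
    """Return the resolved path parts and whether access should be allowed."""
    combined = posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))
    resolved = posixpath.normpath(combined)  # lexical stand-in for realpath
    root_parts = path_parts(DOCUMENT_ROOT)
    real_parts = path_parts(resolved)
    # Equivalent to the element-by-element loop: the leading parts of the
    # resolved path must match the document root parts, in order.
    allowed = real_parts[:len(root_parts)] == root_parts
    return real_parts, allowed


print(check_request("/css/../img/icon.png"))
# (['var', 'www', 'website.tld', 'img', 'icon.png'], True)
print(check_request("/css/../img/../../../icon.png"))
# (['var', 'icon.png'], False)
```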
A weaker check would be to perform a "set equals" operation (strictly speaking, a subset test) on the two arrays by checking that all the elements of the document root path array:
[ 'var', 'www', 'website.tld' ]
are contained within the "real" resolved path array:
[ 'var', 'website.tld', 'www', 'icon.png' ]
Although sets are by definition unordered, in a filesystem hierarchy the order of the elements matters. Notice that in this example all the elements of the document root path are contained within the "real" resolved path, yet the "real" resolved path is still outside the document root path.
Do not perform a set-based check: you need an element-by-element, in-order comparison between the document root path and the "real" resolved requested path!
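The difference between the two checks can be made concrete with the arrays from this example. The variable names below are chosen for illustration only.

```python
root_parts = ["var", "www", "website.tld"]
real_parts = ["var", "website.tld", "www", "icon.png"]

# Set-based check: ignores order, so it wrongly accepts the path.
set_check = set(root_parts).issubset(real_parts)

# Ordered check: the leading parts must match the document root exactly.
ordered_check = real_parts[:len(root_parts)] == root_parts

print(set_check)      # True  (unsafe: the path is outside the document root)
print(ordered_check)  # False (correct: access should be denied)
```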