HTTP URL/URI Rewrite Rules in OpenLink Virtuoso

Tim Haynes
OpenLink Virtuoso Weblog
Dec 1, 2017

Why Rewrite URLs/URIs?

There are several reasons why you might want a web server to rewrite URLs.

  • Shorter, tidier URLs are more elegant and more memorable.
  • Security concerns may require masking invocation of web-applications.
  • Certain kinds of requests may require different handling (e.g., images may be served directly from a filesystem, while other requests may be handled by a script interpreter).
  • Old URL forms may need special handling when the underlying implementation changes (e.g., query variable names and/or acceptable values may change, but have clear mapping from old to new).
  • Search engines sometimes penalize pages with long query string URLs.

What are Rewrite Rules?

Rewrite Rules are a system for mapping, transforming, and redirecting requests received by a web server engine.

Virtuoso offers a web-based interface for configuration of rewrite rules, which are handled on a per vhost basis. (A vhost is a virtual host, which is a mechanism by which a single server at a single IP address can invisibly provide web services for multiple host or domain names.)

Each rewrite rule consists of three parts:

  1. a pattern against which to match incoming requests, with some criteria determining which aspects of the request should be matched and the order in which rules should apply
  2. the type of redirection to use
  3. a destination pattern (which may reference parts of the incoming request)

How do Rewrite Rules work in Virtuoso?

In the Virtuoso Conductor, found at http://{virtuoso-host}:{port}/conductor/ (for instance, http://localhost:8890/conductor/), go to Web Application Server → Virtual Domains & Directories, and drill down to your chosen vhost. Within each vhost, any virtual directory (vdir) definition may or may not contain rewrite rules; a vdir with rewrite rules is designated by a star beside the URL-rewrite action.

Example: Sponger Content Negotiation

The Virtuoso Sponger VAD package provides a script, description.vsp, that displays a view of a graph of data sponged from an upstream Web resource. Most URLs used to invoke this display have a local part that starts with the pattern /about/id/entity/, followed by a slightly modified IRI of the entity being displayed.

When generating these /id/ URLs, for ease of encoding, the Sponger strips the characters that separate the IRI scheme from the rest of the original IRI of the resource being consumed. (For HTTP/S IRIs, the rest begins with the authority, i.e., host and port; other schemes have other components.) For most URL schemes, the scheme separator is just a single colon character, :; for HTTP- and HTTPS-scheme URLs, it’s three characters, colon-slash-slash, ://.
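As a rough illustration of the transformation described above, the following Python sketch builds an /about/id/entity/ path from an IRI. The function name entity_path is hypothetical, and the choice to put a single slash where the scheme separator used to be is an assumption inferred from the rewrite rule discussed below, not the Sponger's actual code.

```python
def entity_path(iri: str, prefix: str = "/about/id/entity/") -> str:
    """Build an /about/id/entity/ local path from an IRI (illustrative only)."""
    scheme, colon, rest = iri.partition(":")
    if not colon:
        raise ValueError(f"not an absolute IRI: {iri!r}")
    # Assumption: when the separator is "://" rather than ":", the two
    # slashes are dropped along with the colon.
    if rest.startswith("//"):
        rest = rest[2:]
    # Assumption: scheme and remainder are joined with a single slash,
    # which is the shape the /about rewrite rule expects.
    return f"{prefix}{scheme}/{rest}"


print(entity_path("http://example.com/page"))
print(entity_path("mailto:alice@example.com"))
```

With these assumptions, an HTTP IRI and a mailto IRI both end up in the same scheme/remainder shape, which is what lets a single regex handle every listed scheme.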

The screenshot below shows a rule that intercepts user agent requests for a view of the resource in a primarily machine-friendly though human-readable data format (RDF — as distinct from an HTML page for human consumption), and redirects them, with an HTTP 303 See Other code, to a URI that directly returns the desired data in that specific format and serialization.

Virtuoso rewrite rules for /about (part of Virtuoso Sponger)

Things of note:

  • The Pattern Type is set to REGEX, meaning a regular expression will be applied to the incoming URL.
  • The Request Path pattern regex /about/id/entity/(http|https|acct|mailto|webcal|feed|nodeID|ftp|di)/(.*) matches a literal that starts with /about/id/entity; followed by one of the listed URL schemes (http, https, acct, etc.); followed by a literal / character; and concludes with any number of characters to the end. The segments of the requested URL that match regex segments wrapped in parentheses ( ) are stored for later use by the rewrite system.
  • The Accept Header Request pattern regex checks the Accept header of the request against the listed MIME-types of RDF serializations — RDF/XML, RDF-N3, RDF-Turtle, etc.
  • The Destination Path format specifies that the destination path be constructed from the literal /about/data/entity/ (notice that the middle part changes the request’s “id” to “data”); concatenated with the first variable match from the request regex pattern (i.e., the URL scheme; designated here by $s1); followed by a literal /; and concluded with the second variable match from the request regex pattern (i.e., the rest of the requested URL, typically a host, optional port, and local-part, including query, if any; designated here by $s2).
  • In this example, we are dealing with URLs in the /about/id/entity space, which have been generated by the Sponger’s description.vsp.
  • The destination path’s references to the matches from the regex, $s1 and $s2, use the s modifier, which means that HTTP URL encoding is not to be performed between the request and the destination; i.e., the matched strings will be passed through unmodified, exactly as they were in the request. In other rules (for instance, if feeding an entire request URL to another processing service within the destination URL), it might be useful to stipulate that the matched string be HTTP URL encoded (meaning, any reserved characters in the string will be replaced by %-codes, such as %40 replacing each at-sign, @, or %20 replacing each space character) for inclusion in the destination URL. This is done by using the U modifier instead of the s, i.e., $U1 and $U2 instead of $s1 and $s2. Note: these modifiers are case-sensitive (lowercase s, uppercase U), and if the modifier is left out (e.g., $1), U is applied.
  • The HTTP Response Code is set to 303 See Other.
  • We do not need to include any additional HTTP Response Headers, so this field is intentionally left blank.
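The path-matching and destination-building steps above can be approximated in a few lines of Python. This is only a sketch of the behavior as described in this post, not Virtuoso's implementation: the helper names expand and rewrite are hypothetical, Python's re module stands in for Virtuoso's regex engine, urllib.parse.quote stands in for its URL encoding, and the Accept header check is left out because its pattern is not reproduced here.

```python
import re
from urllib.parse import quote

# Request Path pattern and Destination Path format, as quoted in the text.
REQUEST_PATTERN = re.compile(
    r"/about/id/entity/(http|https|acct|mailto|webcal|feed|nodeID|ftp|di)/(.*)"
)
DESTINATION_FORMAT = "/about/data/entity/$s1/$s2"


def expand(fmt: str, groups: tuple) -> str:
    """Substitute $sN / $UN / $N placeholders with captured groups."""
    def substitute(placeholder: re.Match) -> str:
        modifier = placeholder.group(1)
        value = groups[int(placeholder.group(2)) - 1]
        if modifier == "s":
            return value  # s: pass the matched string through unmodified
        # U, or no modifier: percent-encode (approximated with quote()).
        return quote(value, safe="")
    return re.sub(r"\$([sU]?)(\d+)", substitute, fmt)


def rewrite(path: str):
    """Return the redirect target for a request path, or None if no match."""
    # Assumption: the pattern is anchored at the start of the path.
    matched = REQUEST_PATTERN.match(path)
    if matched is None:
        return None
    return expand(DESTINATION_FORMAT, matched.groups())


print(rewrite("/about/id/entity/http/example.com/page"))
print(expand("/service?u=$U1", ("a@b c",)))
```

The first call shows the s modifier leaving the captured scheme and remainder untouched; the second shows what the U modifier would do to a string containing an at-sign and a space.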

Rule Matching Order

Rules are matched in the order in which they are listed in the Admin UI (which may be controlled with the Up and Down buttons), as modified by the Rule matching strategy menu selection. The menu defaults to last matching, meaning it will test all rules for a match, but only act on the last-listed match.

For example, you could define several rules that all match the pattern /test/(.*), and set the destination URL to include the matching-order strategy expected to result in that destination, as shown below:

Testing rewrite rule order strategies

With all set as shown, a request for /test/something would be redirected with HTTP 302 Found to /test2/firstmatching1/.

If one were to disable the first-matching rules, then the normal order would take over.

If one were to further disable the normal order strategy rules, the request would be redirected to /test2/lastmatching1/.
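The difference between acting on the first-listed and the last-listed match can be sketched in Python as follows. This models only a single list of rules with one strategy applied to the whole list; the rule destinations are made up for illustration, the select_destination helper is hypothetical, and the reading of "first matching" as "act on the first-listed match" is an assumption by analogy with the last-matching description above. It does not model the normal order strategy or how Virtuoso combines strategies.

```python
import re

# Hypothetical rules as (request pattern, destination) pairs, in listed order.
RULES = [
    (r"/test/(.*)", "/test2/rule1/"),
    (r"/test/(.*)", "/test2/rule2/"),
    (r"/other/(.*)", "/test2/rule3/"),
]


def select_destination(path: str, rules, strategy: str = "last"):
    """Pick a destination from all matching rules, per the chosen strategy."""
    if strategy not in ("first", "last"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Test every rule, keeping destinations in listed order.
    matches = [dest for pattern, dest in rules if re.match(pattern, path)]
    if not matches:
        return None
    return matches[0] if strategy == "first" else matches[-1]


print(select_destination("/test/something", RULES, "last"))
print(select_destination("/test/something", RULES, "first"))
```

Reordering the entries in RULES plays the same role as the Up and Down buttons in the Admin UI: it changes which matching rule counts as first or last.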
