URL rewrite

web2py can rewrite the URL path of incoming requests before calling the controller action (URL mapping) and, conversely, can rewrite the URL path generated by the URL function (reverse URL mapping). One reason to do this is to handle legacy URLs; another is to simplify paths and make them shorter.

web2py includes two distinct URL rewrite systems: an easy-to-use parameter-based system for most use cases, and a flexible pattern-based system for more complex cases. To specify the URL rewrite rules, create a new file in the “web2py” folder called routes.py (the contents of routes.py will depend on which of the two rewrite systems you choose, as described in the next two sections). The two systems cannot be mixed.

Notice that if you edit routes.py, you must reload it. This can be done in two ways: by restarting the web server or by clicking on the routes reload button in admin. If routes.py contains an error, the routes will not be reloaded.

Parameter-based system

The parameter-based (parametric) router provides easy access to several “canned” URL-rewrite methods. Its capabilities include:

  • Omitting default application, controller and function names from externally-visible URLs (those created by the URL() function)
  • Mapping domains (and/or ports) to applications or controllers
  • Embedding a language selector in the URL
  • Removing a fixed prefix from incoming URLs and adding it back to outgoing URLs
  • Mapping root files such as /robots.txt to an application’s static directory

The parametric router also provides somewhat more flexible validation of incoming URLs.
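As a hedged illustration of the last two capabilities in the list above, the following routes.py sketch uses the path_prefix and root_static keys described in “routes.parametric.example.py”. The application name myapp and the prefix intranet are assumptions chosen only for the example.

```python
# routes.py (sketch): 'myapp' and 'intranet' are hypothetical values
routers = dict(
    BASE=dict(
        default_application='myapp',
        # stripped from incoming URLs and added back to URLs built by URL()
        path_prefix='intranet',
        # files requested at the root, mapped to the default application's static folder
        root_static=['favicon.ico', 'robots.txt'],
    ),
)
```

With a configuration along these lines, a request for /intranet/robots.txt would be expected to resolve to the robots.txt file in the static folder of myapp.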

Suppose you’ve written an application called myapp and wish to make it the default, so that the application name is no longer part of the URL as seen by the user. Your default controller is still default, and you want to remove its name from user-visible URLs as well. Here’s what you put in routes.py:

    routers = dict(
        BASE=dict(default_application='myapp'),
    )

That’s it. The parametric router is smart enough to know how to do the right thing with URLs such as:

    http://domain.com/myapp/default/myapp

or

    http://domain.com/myapp/myapp/index

where normal shortening would be ambiguous. If you have two applications, myapp and myapp2, you’ll get the same effect, and additionally myapp2’s default controller will be stripped from the URL whenever it’s safe (which is nearly always).

Here is another case: suppose you want to support URL-based languages, where your URLs look like this:

    http://myapp/en/some/path

or (rewritten)

    http://en/some/path

Here’s how:

    routers = dict(
        BASE=dict(default_application='myapp'),
        myapp=dict(languages=['en', 'it', 'jp'], default_language='en'),
    )

Now an incoming URL like this:

    http://domain.com/it/some/path

will be routed to:

    /myapp/some/path

and request.uri_language will be set to ‘it’, so you can force the translation. You can also have language-specific static files.

    http://domain.com/it/static/filename

will be mapped to:

    applications/myapp/static/it/filename

if that file exists. If it doesn’t, then URLs like:

    http://domain.com/it/static/base.css

will still map to:

    applications/myapp/static/base.css

(because there is no static/it/base.css).
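Returning to request.uri_language: one way to force the translation is to call T.force with the language taken from the URL, typically from a model file. The sketch below wraps that call in a hypothetical helper, apply_uri_language, and uses stand-in objects for request and T so that it can run outside web2py.

```python
from types import SimpleNamespace


def apply_uri_language(request, T):
    """Force the translator to the language found in the URL, if any."""
    lang = getattr(request, 'uri_language', None)
    if lang:
        T.force(lang)
    return lang


class FakeT:
    """Stand-in for web2py's T object, used only to run this sketch standalone."""

    def __init__(self):
        self.forced = None

    def force(self, lang):
        self.forced = lang


# Stand-in for the request produced by an incoming /it/some/path URL
request = SimpleNamespace(uri_language='it')
T = FakeT()
apply_uri_language(request, T)
```

Inside an application, the body of the helper reduces to checking request.uri_language and calling T.force on it.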

So you can now have language-specific static files, including images, if you need to. Domain mapping is supported as well:

    routers = dict(
        BASE=dict(
            domains={
                'domain1.com': 'app1',
                'domain2.com': 'app2',
            }
        ),
    )

does what you’d expect.

    routers = dict(
        BASE=dict(
            domains={
                'domain.com:80': 'app/insecure',
                'domain.com:443': 'app/secure',
            }
        ),
    )

maps http://domain.com accesses to the controller named insecure, while HTTPS accesses go to the secure controller. Alternatively, you can map different ports to different apps, in the obvious way.
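For instance, mapping ports to applications might look like the following sketch, which reuses the domains key shown above. The port 8080 and the application names app1 and app2 are assumptions made for the example.

```python
# routes.py (sketch): the port numbers and application names are hypothetical
routers = dict(
    BASE=dict(
        domains={
            'domain.com:80': 'app1',    # requests on port 80 go to app1
            'domain.com:8080': 'app2',  # requests on port 8080 go to app2
        },
    ),
)
```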

For further information, please consult the file “routes.parametric.example.py” provided in the “examples” folder of the standard web2py distribution.

Note: The parameter-based system first appeared in web2py version 1.92.1.

Pattern-based system

Although the parameter-based system just described should be sufficient for most use cases, the alternative pattern-based system provides some additional flexibility for more complex cases. To use the pattern-based system, instead of defining routers as dictionaries of routing parameters, you define two lists (or tuples) of 2-tuples, routes_in and routes_out. Each tuple contains two elements: the pattern to be replaced and the string that replaces it. For example:

    routes_in = (
        ('/testme', '/examples/default/index'),
    )
    routes_out = (
        ('/examples/default/index', '/testme'),
    )

With these routes, the URL:

    http://127.0.0.1:8000/testme

is mapped into:

    http://127.0.0.1:8000/examples/default/index

To the visitor, all links to the page look like /testme.

The patterns have the same syntax as Python regular expressions. For example:

    (r'.*\.php', '/init/default/index'),

maps all URLs ending in “.php” to the index page.

The second term of a rule can also be a redirection to another page:

    (r'.*\.php', '303->http://example.com/newpage'),

Here 303 is the HTTP code for the redirect response.

Sometimes you want to get rid of the application prefix from the URLs because you plan to expose only one application. This can be achieved with:

    # raw strings keep \g<any> from being read as a string escape
    routes_in = (
        (r'/(?P<any>.*)', r'/init/\g<any>'),
    )
    routes_out = (
        (r'/init/(?P<any>.*)', r'/\g<any>'),
    )

There is also an alternative syntax that can be mixed with the regular expression notation above. It consists of using $name instead of (?P<name>\w+) or \g<name>. For example:

    routes_in = (
        ('/$c/$f', '/init/$c/$f'),
    )
    routes_out = (
        ('/init/$c/$f', '/$c/$f'),
    )

would also eliminate the “/init” application prefix in all URLs.

Using the $name notation, you can automatically map routes_in to routes_out, provided you don’t use any regular expressions. For example:

    routes_in = (
        ('/$c/$f', '/init/$c/$f'),
    )
    routes_out = [(x, y) for (y, x) in routes_in]

If there are multiple routes, the first to match the URL is executed. If no pattern matches, the path is left unchanged.
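The first-match behavior can be sketched with Python's re module. This is an illustration only, not web2py's own implementation: the rewrite helper is hypothetical, and anchoring each pattern with ^ and $ is an assumption of the sketch.

```python
import re


def rewrite(path, routes):
    """Return path rewritten by the first matching (pattern, replacement) rule."""
    for pattern, replacement in routes:
        # Assumption of this sketch: a pattern must match the whole path
        regex = re.compile('^' + pattern + '$')
        if regex.match(path):
            return regex.sub(replacement, path)
    return path  # no rule matched: the path is left unchanged


routes_in = (
    (r'/testme', r'/examples/default/index'),
    (r'/(?P<any>.*)\.php', r'/init/default/index'),
    (r'/(?P<any>.*)', r'/init/\g<any>'),
)

print(rewrite('/testme', routes_in))
print(rewrite('/old/page.php', routes_in))
print(rewrite('/default/user', routes_in))
```

Because the rules are tried in order, the catch-all rule in the last position only applies to paths that the more specific rules above it did not match.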

You can use $anything to match anything (.*) until the end of the line.

Here is a minimal “routes.py” for handling favicon and robots requests:

    routes_in = (
        (r'/favicon\.ico', '/examples/static/favicon.ico'),
        (r'/robots\.txt', '/examples/static/robots.txt'),
    )
    routes_out = ()

Here is a more complex example that exposes a single app “myapp” without unnecessary prefixes but also exposes admin, appadmin and static:

    routes_in = (
        ('/admin/$anything', '/admin/$anything'),
        ('/static/$anything', '/myapp/static/$anything'),
        ('/appadmin/$anything', '/myapp/appadmin/$anything'),
        ('/favicon.ico', '/myapp/static/favicon.ico'),
        ('/robots.txt', '/myapp/static/robots.txt'),
    )
    # the last two rules (favicon and robots) are not reversed
    routes_out = [(x, y) for (y, x) in routes_in[:-2]]

The general syntax for routes is more complex than the simple examples we have seen so far. Here is a more general and representative example:

    routes_in = (
        (r'140\.191\.\d+\.\d+:https?://www\.web2py\.com:post /(?P<any>.*)\.php',
         r'/test/default/index?vars=\g<any>'),
    )

It maps http or https POST requests (note lower case “post”) to host www.web2py.com from a remote IP matching the regular expression

    r'140\.191\.\d+\.\d+'

requesting a page matching the regular expression

    r'/(?P<any>.*)\.php'

into

    r'/test/default/index?vars=\g<any>'

where \g<any> is replaced by the text matched by the named group any.

The general syntax is

    '[remote address]:[protocol]://[host]:[method] [path]'

If the first section of the pattern (all but [path]) is missing, web2py provides a default:

    '.*?:https?://[^:/]+:[a-z]+'

The entire expression is matched as a regular expression, so “.” must be escaped and any matching subexpression can be captured using (?P<...>...) using Python regex syntax. The request method (typically GET or POST) must be lower case. The URL being matched has had any %xx escapes unquoted.

This makes it possible to reroute requests based on the client IP address or domain, the type of the request, the method, and the path. It also allows web2py to map different virtual hosts into different applications. Any matched subexpression can be used to build the target URL and, if desired, passed as a GET variable.
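To make the general syntax concrete, the sketch below composes a string of the form [remote address]:[protocol]://[host]:[method] [path] and matches it against the rule from the example above. The match_key helper and the sample request values are assumptions made for illustration; web2py's internal matching code may differ.

```python
import re


def match_key(remote_addr, scheme, host, method, path):
    """Compose the string that a rule pattern is matched against."""
    # the request method is lower-cased, as the rule syntax requires
    return '%s:%s://%s:%s %s' % (remote_addr, scheme, host, method.lower(), path)


pattern = re.compile(
    r'^140\.191\.\d+\.\d+:https?://www\.web2py\.com:post /(?P<any>.*)\.php$')
replacement = r'/test/default/index?vars=\g<any>'

# hypothetical request: an HTTPS POST for /hello.php from 140.191.10.20
key = match_key('140.191.10.20', 'https', 'www.web2py.com', 'POST', '/hello.php')
m = pattern.match(key)
target = m.expand(replacement) if m else None
print(target)
```

A GET request, or a request from an address outside 140.191.x.x, would produce a key that the pattern does not match, so the rule would not apply.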

All major web servers, such as Apache and lighttpd, also have the ability to rewrite URLs. In a production environment, that may be an alternative to routes.py. Whatever you decide to do, we strongly suggest that you do not hardcode internal URLs in your app and instead use the URL function to generate them. This will make your application more portable in case routes should change.

Application-Specific URL rewrite

When using the pattern-based system, an application can set its own routes in an application-specific routes.py file located in the application’s base folder. This is enabled by configuring routes_app in the base routes.py to determine from an incoming URL the name of the application to be selected. When this happens, the application-specific routes.py is used in place of the base routes.py.

The format of routes_app is identical to routes_in, except that the replacement pattern is simply the application name. If applying routes_app to the incoming URL does not result in an application name, or the resulting application-specific routes.py is not found, the base routes.py is used as usual.
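A minimal routes_app sketch follows. The application names shop and blog are hypothetical, and the select_app helper only imitates the selection step with Python's re module so the rule can be tried outside web2py.

```python
import re

# base routes.py (sketch): 'shop' and 'blog' are hypothetical application names
routes_app = (
    (r'/(?P<app>shop|blog)\b.*', r'\g<app>'),
)


def select_app(path, rules=routes_app):
    """Return the application name chosen by the first matching rule, or None."""
    for pattern, replacement in rules:
        m = re.match('^' + pattern + '$', path)
        if m:
            return m.expand(replacement)
    return None  # no application selected: the base routes.py applies


print(select_app('/shop/default/index'))
print(select_app('/other/default/index'))
```

In this sketch a request under /shop would select the routes.py inside the shop application, while a path that matches no rule falls back to the base routes.py.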

Note: routes_app first appeared in web2py version 1.83.

Default application, controller, and function

When using the pattern-based system, the name of the default application, controller, and function can be changed from init, default, and index respectively to another name by setting the appropriate value in routes.py:

    default_application = "myapp"
    default_controller = "admin"
    default_function = "start"

Note: These items first appeared in web2py version 1.83.

Routes on error

You can also use routes.py to re-route requests to special actions in case there is an error on the server. You can specify this mapping globally, for each app, for each error code, or for each app and error code. Here is an example:

    routes_onerror = [
        ('init/400', '/init/default/login'),
        ('init/*', '/init/static/fail.html'),
        ('*/404', '/init/static/cantfind.html'),
        ('*/*', '/init/error/index')
    ]

For each tuple, the first string is matched against “[app name]/[error code]”. If a match is found, the failed request is re-routed to the URL in the second string of the matching tuple. If the error handling URL is not a static file, the following GET variables will be passed to the error action:

  • code: the HTTP status code (e.g., 404, 500)
  • ticket: in the form of “[app name]/[ticket number]” (or “None” if no ticket)
  • requested_uri: equivalent to request.env.request_uri
  • request_url: equivalent to request.url

These variables will be accessible to the error handling action via request.vars and can be used in generating the error response. In particular, it is a good idea for the error action to return the original HTTP error code instead of the default 200 (OK) status code. This can be done by setting response.status = request.vars.code. It is also possible to have the error action send (or queue) an email to an administrator, including a link to the ticket in admin.
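An error action along these lines might be sketched as follows. The helper name prepare_error_response is hypothetical, the conversion of the code to an integer is an assumption of the sketch, and the stand-in request and response objects exist only so that the example runs outside web2py.

```python
from types import SimpleNamespace


def prepare_error_response(request, response):
    """Copy the original error code into the response and collect error details."""
    code = getattr(request.vars, 'code', None)
    if code:
        # return the original HTTP error code instead of the default 200
        response.status = int(code)
    return dict(
        code=code,
        ticket=getattr(request.vars, 'ticket', None),
        requested_uri=getattr(request.vars, 'requested_uri', None),
    )


# Stand-ins for web2py's request and response objects (sample values are made up)
request = SimpleNamespace(vars=SimpleNamespace(
    code='404', ticket='None', requested_uri='/init/default/missing'))
response = SimpleNamespace(status=200)
context = prepare_error_response(request, response)
```

Inside a controller, the action would call such a helper with the real request and response objects and return the resulting dictionary to its view.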

Unmatched errors display a default error page. This default error page can also be customized here (see “routes.parametric.example.py” and “routes.patterns.example.py” in the “examples” folder):

    error_message = '<html><body><h1>%s</h1></body></html>'
    error_message_ticket = '''<html><body><h1>Internal error</h1>
    Ticket issued: <a href="/admin/default/ticket/%(ticket)s"
    target="_blank">%(ticket)s</a></body></html>'''

The first variable contains the error message when an invalid application or function is requested. The second variable contains the error message when a ticket is issued.

routes_onerror works with both routing mechanisms.

In “routes.py” you can also specify an action in charge of error handling:

    error_handler = dict(application='error',
                         controller='default',
                         function='index')

If the error_handler is specified the action is called without user redirection and the handler action will be in charge of dealing with the error. In the event that the error-handling page itself returns an error, web2py will fall back to its old static responses.

Static asset management

Since version 2.1.0, web2py has the ability to manage static assets.

When an application is in development, static files can change often, so web2py sends static files with no cache headers. This has the side effect of “forcing” the browser to request static files at every request, which slows down page loading.

In a “production” site, you may want to serve static files with cache headers to prevent unnecessary downloads, since static files do not change.

Cache headers allow the browser to fetch each file only once, thus saving bandwidth and reducing loading time.

Yet there is a problem: What should the cache headers declare? When should the files expire? When the files are first served, the server cannot forecast when they will be changed.

A manual approach consists of creating subfolders for different versions of static files. For example an early version of “layout.css” can be made available at the URL “/myapp/static/css/1.2.3/layout.css”. When you change the file, you create a new subfolder and you link it as “/myapp/static/css/1.2.4/layout.css”.

This procedure works but it is tedious: every time you update the CSS file, you must remember to move it to another folder, change the URL of the file in your layout.html, and deploy.

Static asset management solves the problem by allowing the developer to declare a version for a group of static files and they will be requested again only when the version number changes. The asset version number is made part of the file url as in the previous example. The difference from the previous approach is that the version number only appears in the URL, not in the file system.

If you want to serve “/myapp/static/layout.css” with the cache headers, you just need to include the file with a modified URL that includes a version number:

    /myapp/static/_1.2.3/layout.css

(notice that the version number is defined only in the URL; it does not appear anywhere else).

Notice that the URL starts with “/myapp/static/”, followed by a version number composed of an underscore and three integers separated by periods (as described in SemVer), then followed by the filename. Also notice that you do not have to create a “_1.2.3/” folder.

Every time the static file is requested with a version in the url, it will be served with “far in the future” cache headers, specifically:

    Cache-Control: max-age=315360000
    Expires: Thu, 31 Dec 2037 23:59:59 GMT

This means that the browser will fetch those files only once, and they will be saved “forever” in the browser’s cache.

Every time “_1.2.3/filename” is requested, web2py removes the version part from the path and serves your file with far-in-the-future headers, so it will be cached forever. If you change the version number in the URL, this tricks the browser into thinking it is requesting a different file, and the file is fetched again.

You can use “_1.2.3”, “_0.0.0”, “_999.888.888”, as long as the version starts with an underscore followed by three numbers separated by periods.
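The removal of the version part can be sketched with a regular expression. This is an illustration of the rule just described, not web2py's own code; the function name strip_static_version is hypothetical.

```python
import re

# /<app>/static/_<int>.<int>.<int>/<rest of the path>
VERSIONED_STATIC = re.compile(r'^(/[^/]+/static/)_\d+\.\d+\.\d+/(.*)$')


def strip_static_version(path):
    """Return (path without the version part, whether a version was present)."""
    m = VERSIONED_STATIC.match(path)
    if m:
        return m.group(1) + m.group(2), True
    return path, False


print(strip_static_version('/myapp/static/_1.2.3/layout.css'))
print(strip_static_version('/myapp/static/layout.css'))
```

The boolean in the result indicates whether the request carried a version, which is the case in which the far-in-the-future cache headers would apply.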

When in development, you can use response.files.append(...) to link the URLs of static files. In this case you can include the “_1.2.3/” part manually, or you can take advantage of a parameter of the response object: response.static_version. Just include the files the way you used to, for example

    {{response.files.append(URL('static', 'layout.css'))}}

and in models set

    response.static_version = '1.2.3'

This will rewrite automatically every “/myapp/static/layout.css” url as “/myapp/static/_1.2.3/layout.css”, for every file included in response.files.

In models, you can also set

    response.static_version_urls = True

to rewrite any link to the static directory, aside from just those included in response.files.
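The effect of response.static_version on a static URL can be pictured with the following sketch. The helper versioned_static_url is hypothetical and only mimics the rewriting described above; it is not part of web2py.

```python
def versioned_static_url(url, static_version):
    """Insert '_<version>/' right after '/static/' in a static file URL."""
    marker = '/static/'
    head, sep, tail = url.partition(marker)
    if not sep or not static_version:
        return url  # not a static URL, or no version configured
    return '%s%s_%s/%s' % (head, marker, static_version, tail)


print(versioned_static_url('/myapp/static/layout.css', '1.2.3'))
```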

Often in production you let the web server (Apache, Nginx, etc.) serve the static files. In that case you need to adjust its configuration so that it “skips” the “_1.2.3/” part.

For example, in Apache, change this:

    AliasMatch ^/([^/]+)/static/(.*) /home/www-data/web2py/applications/$1/static/$2

into this:

    AliasMatch ^/([^/]+)/static/(?:_\d+\.\d+\.\d+/)?(.*) /home/www-data/web2py/applications/$1/static/$2

Similarly, in Nginx change this:

    location ~* /(\w+)/static/ {
        root /home/www-data/web2py/applications/;
        expires max;
    }

into this:

    location ~* /(\w+)/static(?:/_\d+\.\d+\.\d+)?/(.*)$ {
        alias /home/www-data/web2py/applications/$1/static/$2;
        expires max;
    }