Tutorial:Beautiful MediaWiki URLs
This page is about configuring clean, readable URLs for MediaWiki, that is, URLs without any visible "index.php". For example:
- //wiki.seb35.fr/Learning:Beautiful_MediaWiki_URLs?action=edit
instead of
- //wiki.seb35.fr/index.php?title=Learning:Beautiful_MediaWiki_URLs&action=edit
The default form is even worse in languages with non-Latin scripts:
- //wiki.seb35.fr/维基百科?action=history
instead of
- //wiki.seb35.fr/index.php?title=%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91&action=history
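To see where the %-encoded form comes from, here is a small Python sketch (Python is used here only for illustration; it is not part of the MediaWiki or nginx setup). It percent-encodes the UTF-8 bytes of the page title used in the example above.

```python
from urllib.parse import quote, unquote

title = "维基百科"

# Each UTF-8 byte of the title is written as %XX
encoded = quote(title)
print(encoded)

# Decoding gives the readable title back
print(unquote(encoded) == title)
```

The encoded string is the same sequence that appears after "title=" in the long URL above.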
Rationale and introduction
The reasoning behind this is simple: "index.php" is of no interest from the user’s point of view, so it shouldn’t be there. In addition, users should see (and want to see) understandable page names instead of %-encoded nonsense.
You can see these rules in action on this wiki. Technically, these ugly URLs date from the good old days of CGI scripts (ten years ago). URL rewriting on the server has been possible for some time now, and all recent browsers display real characters in the path of a URL instead of %-encoded sequences, although still not in the parameters after the "?". So it’s time to remove this "index.php".
To reconcile MediaWiki URLs with recent browsers, we take advantage of the "path info" capability offered by servers (here nginx) and we configure MediaWiki so that it generates ugly URLs as rarely as possible.
Prerequisites
- MediaWiki, installed in the directory, say, '/var/www/mediawiki'
- nginx, preferably with the Lua module (package nginx-extras in Debian/Ubuntu)
MediaWiki configuration
It is assumed that the wiki is served directly from the server root location instead of a subdirectory (e.g. "https://example.org/" instead of "https://example.org/tools/mywiki/"). In other words, the "root" directive of the nginx server block points to the directory where MediaWiki is installed (the one containing index.php).
The following configuration variables must be added at the end of the file LocalSettings.php, in the MediaWiki directory.
1. Remove the "script path" to remove all web-visible subdirectories. This must be done in step with the nginx configuration, or it could break the installed website.
$wgScriptPath = '';
2. Set the article path to its simplest form. With this, links to the 'view' action of pages will be beautiful instead of using "index.php". The server must support the "path info" capability, which nginx does.
$wgArticlePath = '/$1';
3. Now the real improvements begin. When you view an article the URL is beautiful, but it becomes ugly again when you edit an article or request its printable version. So let’s remove the "index.php".
$wgScript = '';
4. The links for the actions now have the form "/Main_Page?title=Main_Page&action=edit" for the edit action. There is still the "title=" parameter which should be removed since it is already displayed in the main part of the URL. To achieve that, we have to add a small hook in LocalSettings.php to remove the "title=" parameter.
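$wgHooks['GetLocalURL::Internal'][] = 'phpAgnosticURL';

// Rebuild local URLs as "/Title?query", dropping any "title=" parameter.
function phpAgnosticURL( $title, &$url, $query ) {
    global $wgArticlePath;
    // Put the page name in the path, following $wgArticlePath
    $url = str_replace( '$1', wfUrlencode( $title->getPrefixedDBkey() ), $wgArticlePath );
    // Remove every "title=..." parameter from the query string
    while ( preg_match( '/^(.*&|)title=([^&]*)(&.*|)$/', $query, $matches ) ) {
        $query = $matches[1] . $matches[3];
    }
    if ( $query != '' ) {
        $url = wfAppendQuery( $url, $query );
    }
    // Returning true lets other handlers of this hook run (older MediaWiki versions expect a return value)
    return true;
}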
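The effect of steps 2 to 4 can be modelled outside MediaWiki. The Python sketch below is only an illustration, not the hook itself: `ARTICLE_PATH`, `strip_title_param` and `local_url` are hypothetical names, `quote` is a rough stand-in for MediaWiki's wfUrlencode, and the final clean-up of leftover "&" separators is an addition that the hook above does not perform.

```python
import re
from urllib.parse import quote

ARTICLE_PATH = "/$1"  # mirrors $wgArticlePath from step 2

# Same pattern as the preg_match() call in the hook
TITLE_PARAM = re.compile(r"^(.*&|)title=([^&]*)(&.*|)$")


def strip_title_param(query: str) -> str:
    """Remove every title=... parameter from a query string."""
    match = TITLE_PARAM.match(query)
    while match:
        query = match.group(1) + match.group(3)
        match = TITLE_PARAM.match(query)
    # Assumption: tidy leftover separators (not done by the original hook)
    return re.sub(r"&{2,}", "&", query).strip("&")


def local_url(prefixed_db_key: str, query: str = "") -> str:
    """Build "/Title" or "/Title?query" from a page name and a query string."""
    # quote() only approximates wfUrlencode; ":" and "/" are kept readable
    url = ARTICLE_PATH.replace("$1", quote(prefixed_db_key, safe=":/"))
    query = strip_title_param(query)
    return f"{url}?{query}" if query else url


print(local_url("Main_Page", "title=Main_Page&action=edit"))
print(local_url("Main_Page"))
```

With these inputs the first call should produce the edit link without "title=", and the second the plain view link.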
Nginx configuration
The basic skeleton is just below. We will then add 'location' directives.
server {
    listen 80;
    listen 443 ssl;
    server_name my-wiki.example;
    root /var/www/mediawiki;

    # 'location' directives to be added here
}
First, let’s add the simple things: the backend PHP scripts. I don’t try to hide them: they are backend entry points, so human users do not see them.
location ~ ^/(api|load|opensearch_desc)\.php$ {
    include php5-fpm.conf;
}
Next, the most important part: the call to index.php, which catches all actions and passes them to MediaWiki.
location / {
    if ($arg_title != '') {
        set_by_lua $mwredirect '
            local args = ngx.var.args
            local arg_title = ngx.var.arg_title
            arg_title = ngx.re.gsub(arg_title, "%3A", ":")
            arg_title = ngx.re.gsub(arg_title, "%20", "_")
            arg_title = ngx.re.gsub(arg_title, "[+]", "_")
            local url = "/" .. arg_title
            local newargs, n, err = ngx.re.gsub(args, [[\btitle=[^&]*&?]], "", "jo")
            if string.len(newargs) > 0 then
                url = url .. "?" .. newargs
            end
            if string.sub(url, -1) == "&" then
                url = string.sub(url, 1, -2)
            end
            return url
        ';
        return 301 $mwredirect;
    }
    include php5-fpm.conf;
    fastcgi_split_path_info ^(/)(.*)$;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_param PATH_INFO $fastcgi_path_info;
}
The last part of location / is the normal case: fastcgi_split_path_info splits the URL into its first part (only the slash) and the title of the article, which is passed to PHP/MediaWiki in the FastCGI parameter PATH_INFO. The first part of location / handles the "title=" parameters that MediaWiki still emits in some forms (Special:Recentchanges, Special:Log, etc.) by redirecting to the clean URL. As far as I know, there is no way to fix this directly in MediaWiki without modifying the core, which is best avoided because it complicates updates.
Now you should have a fairly functional wiki, apart from some missing images. Let’s add the UI images from the /skins subdirectory and the /extensions subdirectory at the same time (some extensions have JavaScript files or image files in their directories). Only the following are allowed: JS files, images, CSS and LESS files, and .htc files (used by IE).
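The two halves of location / can be followed step by step in a small Python model. This is only a sketch of the logic described above, not something nginx runs: `redirect_target` and `split_path_info` are hypothetical names, and the way the title argument is extracted is a simplification of nginx's $arg_title variable.

```python
import re


def redirect_target(args: str):
    """Return the clean URL for a query string carrying title=, else None."""
    found = re.search(r"(?:^|&)title=([^&]*)", args)
    if not found or found.group(1) == "":
        return None  # no redirect, the request goes on to index.php
    # Same three substitutions as in the Lua snippet
    title = found.group(1).replace("%3A", ":").replace("%20", "_").replace("+", "_")
    url = "/" + title
    # Drop the title= parameter from the remaining arguments
    new_args = re.sub(r"\btitle=[^&]*&?", "", args)
    if new_args:
        url += "?" + new_args
    if url.endswith("&"):
        url = url[:-1]
    return url


def split_path_info(uri: str):
    """Model of fastcgi_split_path_info ^(/)(.*)$: (first part, PATH_INFO)."""
    found = re.match(r"^(/)(.*)$", uri)
    return (found.group(1), found.group(2)) if found else None


print(redirect_target("title=Special%3ALog&type=block"))
print(split_path_info("/Main_Page"))
```

A request such as "/?title=Special%3ALog&type=block" is therefore answered with a 301 redirect to the clean form, while "/Main_Page" is split so that only the page name ends up in PATH_INFO.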
location ~ ^/(extensions|skins)/ {
    location ~ \.(js|css|htc|less|png|svg|gif|jpg|xcf)$ {
        allow all;
        try_files $uri =403;
    }
    deny all;
}
Now we can allow access to the uploaded files in the /images subdirectory. Below are two variants, depending on whether your wiki is public or private.
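The whitelist above can be checked against a few sample paths with a short Python model. It is a sketch under the assumption that only the two regular expressions matter; `static_access` is a hypothetical helper, and the sample paths are made up for illustration.

```python
import re

# Same patterns as the outer and inner location blocks
OUTER = re.compile(r"^/(extensions|skins)/")
INNER = re.compile(r"\.(js|css|htc|less|png|svg|gif|jpg|xcf)$")


def static_access(path: str):
    """Return 'allowed', 'denied', or None if the location does not apply."""
    if not OUTER.search(path):
        return None  # handled by another location
    if INNER.search(path):
        return "allowed"  # served if the file exists, otherwise 403
    return "denied"  # falls through to "deny all"


for sample in ("/skins/Example/logo.png", "/extensions/Example/Example.php"):
    print(sample, static_access(sample))
```

Note that PHP files inside /skins and /extensions are denied, so they can only be executed through MediaWiki itself.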
location ~ ^/images/ {
    # Publicly accessible files
    location ~ ^/images/(\.htaccess|README|lockdir.*)$ { return 404; }
    location ~ /$ { deny all; }
    try_files $uri =404;

    # Private files for a private wiki
    #include php5-fpm.conf;
    #fastcgi_split_path_info ^(/images/)(.*)$;
    #fastcgi_param SCRIPT_FILENAME $document_root/img_auth.php;
    #fastcgi_param PATH_INFO $fastcgi_path_info;
}
To finish, we can add /robots.txt and /favicon.ico if they are not managed by MediaWiki extensions.
location ~ ^/(robots\.txt|favicon\.ico)$ {
    try_files $uri =404;
}
Conclusion
- Now the URLs are "/Main_Page", "/Main_Page?action=history", "/Main_Page?printable=yes", etc.
- Your articles can have a wide range of titles and are not restricted by the names of system files, e.g. you can write an article about "/Robots.txt", "/Images", "/Skins", or "Index.php" (note the initial capital).
- Apart from whitelisted system files and subdirectories, nobody can access the underlying system files: "/includes/Title.php" is treated as an article, as is "/LocalSettings.php". An attacker therefore has no direct access to the system files and has to go through MediaWiki before reaching any PHP file.
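These claims can be checked with a simplified Python model of how nginx picks a location in this configuration. It is a sketch, not a replacement for testing the real server: `LOCATIONS` and `route` are hypothetical names, the patterns are those of the location blocks above (with the dots in the file names escaped), and the model assumes that regex locations are tried in order and that location / is the fallback.

```python
import re

# (pattern, handler) pairs in the order the tutorial adds them
LOCATIONS = [
    (r"^/(api|load|opensearch_desc)\.php$", "backend PHP script"),
    (r"^/(extensions|skins)/", "static asset whitelist"),
    (r"^/images/", "uploaded files"),
    (r"^/(robots\.txt|favicon\.ico)$", "plain static file"),
]


def route(path: str) -> str:
    """Return the handler that would serve the given request path."""
    for pattern, handler in LOCATIONS:
        if re.search(pattern, path):  # case-sensitive, like "location ~"
            return handler
    return "wiki page via index.php"  # fallback: location /


for sample in ("/Main_Page", "/includes/Title.php", "/LocalSettings.php",
               "/Robots.txt", "/robots.txt", "/api.php"):
    print(sample, "->", route(sample))
```

In this model "/includes/Title.php", "/LocalSettings.php" and "/Robots.txt" all fall through to MediaWiki as page titles, while only the lowercase "/robots.txt" is served as a file.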