Tutorial:Beautiful MediaWiki URLs

From Wiki Seb35
Revision as of 26 August 2021, 23:39, by Seb35

Language: en
Status: stable

This page is about configuring beautiful, clean URLs for MediaWiki, that is, URLs without any visible "index.php". For example:

http://wiki.seb35.fr/Tutorial:Beautiful_MediaWiki_URLs?action=edit

instead of

http://wiki.seb35.fr/index.php?title=Tutorial:Beautiful_MediaWiki_URLs&action=edit

and the improvement is even greater in languages with non-Latin alphabets:

http://wiki.seb35.fr/维基百科?action=history

instead of

http://wiki.seb35.fr/index.php?title=%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91&action=history


Rationale and introduction

The reasoning is this: "index.php" means nothing to the user, so it should not be there; in addition, the user should see (and will want to see) understandable page names instead of %-encoded nonsense.

You can see these rules in action on this wiki. Technically, the ugly URLs are a legacy of the good old days of CGI scripts (ten years ago). For quite some time now it has been possible to rewrite URLs on the server, and all recent browsers display readable characters instead of %-encoded sequences in the path of a URL, though still not in the parameters after the "?". So it’s time to remove this "index.php".

To reconcile MediaWiki URLs with recent browsers, we take advantage of the "path info" capability offered by web servers (here nginx), and we configure MediaWiki so that it generates as few ugly URLs as possible.
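To make the "%-encoded" form concrete, here is a minimal Python sketch (Python is used only for illustration and is not part of the wiki setup). It shows how the Chinese title from the example above maps to and from its percent-encoded form, using the standard urllib.parse module.

```python
from urllib.parse import quote, unquote

title = "维基百科"

# Percent-encoding turns each UTF-8 byte of the title into a %XX triplet.
encoded = quote(title)
print(encoded)  # %E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91

# Decoding restores the readable title that browsers display in the path.
decoded = unquote(encoded)
print(decoded)  # 维基百科
```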

Prerequisites

  • MediaWiki, installed in a directory, say '/var/www/mediawiki'
  • nginx, preferably with the Lua module (package nginx-extras in Debian/Ubuntu)
  • The wiki is assumed to be served at "http://wiki.example.org/"
  • The server must pass the PATH_INFO variable to PHP; this is the case for most "nginx + PHP gateway" setups, but you should really check that it works before continuing, see Manual:$wgUsePathInfo on MediaWiki.org.

MediaWiki configuration

It is assumed that the wiki is served directly from the server root rather than from a subdirectory (e.g. "https://example.org/" instead of "https://example.org/tools/mywiki/"). In other words, the "root" directive in the nginx server block points to the directory where MediaWiki is installed (where index.php is).

The following configuration variables must be added at the end of the file LocalSettings.php, in the MediaWiki directory.


1. Empty the "script path" to remove all web-visible subdirectories. This must be done together with the nginx configuration below, otherwise it could break the installed website.

$wgScriptPath = '';


2. Set the article path to its simplest form. With this, links leading to the 'view' action of pages will be beautiful instead of using "index.php". The server must support the "path info" capability, which is the case with nginx.

$wgArticlePath = '/$1';


3. Now the real improvements begin. When you view an article the URL is beautiful, but it becomes ugly again when you edit the article or request its printable version. So let’s remove the "index.php".

$wgScript = '';


4. The links for actions now have the form "/Main_Page?title=Main_Page&action=edit" (here for the edit action). The "title=" parameter is still there and should be removed, since the title already appears in the path of the URL. To achieve that, we add a small hook in LocalSettings.php that removes the "title=" parameter.

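$wgHooks['GetLocalURL::Internal'][] = 'phpAgnosticURL';

// Rebuild every local URL from $wgArticlePath and drop the "title=" parameter.
function phpAgnosticURL( $title, &$url, $query ) {
    global $wgArticlePath;

    // Start from the beautiful path, e.g. "/Main_Page".
    $url = str_replace( '$1', wfUrlencode( $title->getPrefixedDBkey() ), $wgArticlePath );

    // Remove every "title=..." parameter: the title is already in the path.
    while ( preg_match( '/^(.*&|)title=([^&]*)(&.*|)$/', $query, $matches ) ) {
        // Drop the separator that followed the removed parameter.
        $query = $matches[1] . ltrim( $matches[3], '&' );
    }
    // Avoid a dangling "&" when "title=" was the last parameter.
    $query = rtrim( $query, '&' );

    // Append the remaining parameters (action, section, etc.), if any.
    if ( $query != '' ) {
        $url = wfAppendQuery( $url, $query );
    }
}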
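The intended result of this hook can be sketched outside MediaWiki. The following Python model is only an illustration under stated assumptions: the function name local_url is made up for this sketch, quote() with the chosen safe characters only approximates what wfUrlencode() does, and the query is filtered by splitting on "&" rather than with the PHP regular expression.

```python
from urllib.parse import quote

ARTICLE_PATH = "/$1"  # mirrors $wgArticlePath from step 2


def local_url(dbkey: str, query: str = "") -> str:
    """Model of the hook's goal: pretty path plus the query without title=."""
    # Assumption: quote() with these safe characters approximates wfUrlencode().
    url = ARTICLE_PATH.replace("$1", quote(dbkey, safe="/:"))
    # Keep every parameter except "title", which is already in the path.
    kept = [p for p in query.split("&") if p and not p.startswith("title=")]
    if kept:
        url += "?" + "&".join(kept)
    return url


print(local_url("Main_Page"))                                 # /Main_Page
print(local_url("Main_Page", "action=edit"))                  # /Main_Page?action=edit
print(local_url("Main_Page", "title=Main_Page&action=edit"))  # /Main_Page?action=edit
```

The model makes the target shape of the URLs explicit: the title lives only in the path, and whatever remains of the query string follows the "?".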

At this point MediaWiki generates beautiful URLs, but nginx does not yet understand URLs without a "title=" parameter. In the next part we restore a working wiki, completely hide the PHP system files (which improves security), and along the way allow pages to be created with almost any title (e.g. you will be able to create the page "Index.php" on the wiki; that is probably useless^^, but you may be interested in titles such as "Skins/…").

Nginx configuration

5. The basic skeleton is just below. 'location' directives will be added in the following steps.

server {
    
    listen 80;
    listen 443 ssl;
    server_name wiki.example.org;
    
    root /var/www/mediawiki;
    
    # 'location' directives to be added thereafter
    
}


6. First, let’s add the simple things: the backend PHP scripts. We do not try to hide them: they are backend scripts, not seen by human users, but machines must have access to them.

location ~ ^/(api|load|opensearch_desc)\.php$ {
    include php5-fpm.conf;
}


7. Next, the most important part: the call to the hidden "index.php", which catches all actions and passes them to MediaWiki. This uses the PATH_INFO parameter of CGI scripts. Additionally, we add an exception for requests that still carry a "title=" parameter; this happens with MediaWiki forms (Special:Recentchanges, Special:Log, etc.), and the exception makes the whole system both backward-compatible and beautiful, through a permanent redirect from the old URLs to the new ones. The Lua rewriting part is based on a message from agentzh (thanks!).

location / {
    if ($arg_title != '') {
        set_by_lua $mwredirect '
            local args = ngx.var.args
            local arg_title = ngx.var.arg_title
            arg_title = ngx.re.gsub(arg_title, "%3A", ":")
            arg_title = ngx.re.gsub(arg_title, "%20", "_")
            arg_title = ngx.re.gsub(arg_title, "[+]", "_")
            local url = "/" .. arg_title
            local newargs, n, err = ngx.re.gsub(args, [[\btitle=[^&]*&?]], "", "jo")
            if string.len(newargs) > 0 then
                url = url .. "?" .. newargs
            end
            if string.sub(url, -1) == "&" then
                url = string.sub(url, 1, -2)
            end
            return url
        ';
        return 301 $mwredirect;
    }
    include php5-fpm.conf;
    fastcgi_split_path_info ^(/)(.*)$;
    fastcgi_param  SCRIPT_FILENAME  $document_root/index.php;
    fastcgi_param  PATH_INFO        $fastcgi_path_info;
}
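To see what the Lua snippet computes, here is a small Python model of the same steps. It is a sketch for illustration only: the function name redirect_target is made up, and the extraction of the title value stands in for nginx’s $arg_title variable.

```python
import re


def redirect_target(args: str) -> str:
    """Return the new-style URL for an old-style query string with title=."""
    # Stand-in for nginx's $arg_title: the raw (still encoded) title value.
    match = re.search(r"(?:^|&)title=([^&]*)", args)
    title = match.group(1) if match else ""
    # Same substitutions as the Lua code: "%3A" -> ":", "%20" and "+" -> "_".
    title = title.replace("%3A", ":").replace("%20", "_").replace("+", "_")
    url = "/" + title
    # Drop the title= parameter (and the "&" that follows it) from the query.
    newargs = re.sub(r"\btitle=[^&]*&?", "", args)
    if newargs:
        url += "?" + newargs
    # title= may have been the last parameter, leaving a dangling "&".
    if url.endswith("&"):
        url = url[:-1]
    return url


print(redirect_target("title=Main_Page&action=edit"))
# /Main_Page?action=edit
print(redirect_target("action=history&title=Help%3AContents"))
# /Help:Contents?action=history
```

With the configuration above, nginx answers such requests with a 301 redirect to the computed URL, so old bookmarks end up on the new form.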


8. You should now have a reasonably functional wiki, apart from some missing images. Let’s add the UI images from the /skins subdirectory and the /extensions subdirectory at the same time (some extensions have JavaScript or image files in their directories). Only the following are allowed: images, JS files, CSS and LESS files, and .htc files (used by IE).

location ~ ^/(extensions|skins)/ {
    location ~ \.(js|css|htc|less|png|svg|gif|jpg|xcf)$ {
        allow all;
        try_files $uri =403;
    }
    deny all;
}


9. Now we can allow the uploaded files of the /images subdirectory. Below are two variants, depending on whether your wiki is public or private; the private variant is commented out.

location ~ ^/images/ {
    # Publicly accessible files
    location ~ ^/images/(\.htaccess|README|lockdir.*)$ {
        return 404;
    }
    location ~ /$ {
        deny all;
    }
    try_files $uri =404;
    
    # Private files for a private wiki
    #include php5-fpm.conf;
    #fastcgi_split_path_info ^(/images/)(.*)$;
    #fastcgi_param  SCRIPT_FILENAME  $document_root/img_auth.php;
    #fastcgi_param  PATH_INFO        $fastcgi_path_info;
}


10. To finish, we can add /robots.txt and /favicon.ico.

location ~ ^/(robots\.txt|favicon\.ico)$ {
    try_files $uri =404;
}
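As a recap of points 6 to 10, the following Python sketch models which 'location' handles a given request path. It is a simplification under stated assumptions: the handler labels are invented for this sketch, the nested rules inside /images/ are left out, and only the path (without the query string) is considered.

```python
import re

# Regex locations from points 6 to 10, in the order they appear above.
# Assumption: the labels on the right are names invented for this sketch.
LOCATIONS = [
    (r"^/(api|load|opensearch_desc)\.php$", "php-backend"),
    (r"^/(extensions|skins)/", "static-whitelist"),
    (r"^/images/", "uploads"),
    # Dots are escaped here so that they match literally.
    (r"^/(robots\.txt|favicon\.ico)$", "static-file"),
]
STATIC_EXT = r"\.(js|css|htc|less|png|svg|gif|jpg|xcf)$"


def route(path: str) -> str:
    """Return the label of the location that would handle this path."""
    for pattern, handler in LOCATIONS:
        if re.search(pattern, path):
            # Inside /skins and /extensions only whitelisted file types pass.
            if handler == "static-whitelist" and not re.search(STATIC_EXT, path):
                return "denied"
            return handler
    # Everything else falls through to "location /", i.e. the hidden index.php.
    return "index.php"


for p in ["/Main_Page", "/api.php", "/skins/Vector/screen.css",
          "/skins/Vector/SkinVector.php", "/LocalSettings.php"]:
    print(p, "->", route(p))
```

Note how "/LocalSettings.php" is routed to index.php: it is treated as a page title and never served as a file.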

Complements

a. If you cannot install the Lua module in nginx, old URLs entered directly (coming from external places such as emails, other websites, user bookmarks, etc.) will still work but will be displayed as old URLs ("/index.php?title=Main_Page&action=edit"). Once the user clicks on a link on the wiki, he or she will see the new URLs.


b. To prevent MediaWiki from creating URLs that still contain a "title=" parameter, MediaWiki core would have to be patched in many places: in every form that contains an <input type="hidden" name="title" … /> HTML tag, and possibly elsewhere. A bug report should perhaps be filed, at least to ask for an option to generate beautiful URLs natively.


c. Maintenance across MediaWiki updates: nothing is needed on the MediaWiki configuration side, unless you modified MediaWiki core (which is not recommended, precisely to keep updates easy). The changed configuration parameters will probably stay backward-compatible for a long time, since most were introduced in MediaWiki versions before 1.1.0. Future MediaWiki releases may require adding some PHP scripts to point 6; see the entry points on your Special:Version page or Manual:Code#Access points on MediaWiki.org. You may also want to change the list of files in point 9 (this list is better from a security point of view, but it requires maintenance over time).


d. The nginx configuration above was written with strict security in mind, in order to remove all entry points to system files, but this requires maintenance over time. You may want to relax these security checks:

  1. add more PHP scripts in point 6, see Manual:Code#Access points on MediaWiki.org
  2. add more media file extensions in point 8, or even allow access to all files in the "extensions" and "skins" subdirectories, or on the contrary blacklist some file extensions (mainly .php files)
  3. remove the list of files in point 9; this is probably harmless from a security point of view


e. The list in point 10 can be changed depending on specific needs. For example, you could change the location of the favicon (see $wgFavicon), or add a file /sitemap.xml, etc. Additionally, some MediaWiki extensions or core features offer to manage "metadata system files" such as /robots.txt or /sitemap.xml; if you want to use these features, you may have to create other "location" blocks in the nginx configuration with a wrapper to a PHP file, similar to the wrapper for "/images/" in the case of a private wiki.


f. As of MediaWiki 1.23, the entry points on the page Special:Version show "[ ]" for "index.php". This can be considered a bug: the case where "index.php" is completely removed is not expected :)


g. An easier way to remove most occurrences of index.php is to use $wgActionPaths. With this configuration setting, the 'edit' action on this page would be http://wiki.seb35.fr/edit/Tutorial:Beautiful_MediaWiki_URLs instead of http://wiki.seb35.fr/Tutorial:Beautiful_MediaWiki_URLs?action=edit. I have no opinion on which is better. Keep in mind that some browsers might hide the part of the address after the "?"; in those browsers the action would be displayed with $wgActionPaths but not with the "?action=" form. Open question: how does MediaWiki behave with $wgActionPaths when there are (e.g.) both "action=edit" and "section=0": is it "/edit/Title?section=0" or "/index.php?title=Title&action=edit&section=0"?

Conclusion

  1. Now the URLs are "/Main_Page", "/Main_Page?action=history", "/Main_Page?printable=yes", etc.
  2. Non-Latin alphabets benefit from the browser’s "real characters" rendering in the URL
  3. The system is backward-compatible with existing URLs, and old URLs are rewritten if you have installed the Lua part in the nginx configuration.
  4. Your articles can have a wide range of titles and are not limited by system files, e.g. you can write articles about "Robots.txt", "Images", "Skins", or "Index.php" (NB: note the initial capital).
  5. Apart from whitelisted system files and subdirectories, nobody can access the underlying system files: e.g. "includes/Title.php" is treated as an article, as is "LocalSettings.php"; hence an attacker has no direct access to the system files and will have to go through MediaWiki before attacking PHP files.