Tutorial:Beautiful MediaWiki URLs

From Wiki Seb35

Version of 27 November 2014 at 10:21

This page explains how to configure clean, readable URLs for MediaWiki, that is, URLs without any visible "index.php". For example:

https://wiki.seb35.fr/Learning:Beautiful_MediaWiki_URLs?action=edit

instead of

https://wiki.seb35.fr/index.php?title=Learning:Beautiful_MediaWiki_URLs&action=edit

The difference matters even more for titles in non-Latin alphabets:

https://wiki.seb35.fr/维基百科?action=history

instead of

https://wiki.seb35.fr/index.php?title=%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91&action=history
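
The %-encoded form above is simply the UTF-8 bytes of the title, escaped one by one. As a minimal sketch (Python is used here only for illustration and is not part of the wiki setup), the standard library reproduces it:

```python
from urllib.parse import quote, unquote

title = "维基百科"

# Percent-encode the UTF-8 bytes of the title, as seen in the "title=" parameter.
encoded = quote(title)
print(encoded)

# Decoding gives the readable title back.
print(unquote(encoded))
```

The first line printed should match the "title=" value of the long URL above, and the second one the readable title.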


Rationale and introduction

The reasoning is simple: the "index.php" part of the URL is of no interest to the user, so it should not be there. In addition, users should see (and will want to see) understandable page names rather than %-encoded nonsense.

You can see these rules in action on this wiki. Technically, the ugly URLs date back to the good old days of CGI scripts (ten years ago). For some time now it has been possible to rewrite URLs on the server, and all recent browsers display real characters in the path of a URL instead of %-encoded sequences, although still not in the parameters after the "?". So it is time to remove this "index.php".

To reconcile MediaWiki URLs with recent browsers, we take advantage of the "path info" capability offered by web servers (here nginx), and we configure MediaWiki so that it generates as few ugly URLs as possible.

Prerequisites

  • MediaWiki, installed in the directory, say, '/var/www/mediawiki'
  • nginx, preferably with the module Lua (package nginx-extras in Debian/Ubuntu)

MediaWiki configuration

It is assumed that the wiki is served directly from the server root rather than from a subdirectory (e.g. "https://example.org/" rather than "https://example.org/tools/mywiki/"). In other words, the "root" directive of the nginx server block points to the directory where MediaWiki is installed (the one containing index.php).

The following configuration variables must be added at the end of the file LocalSettings.php, in the MediaWiki directory.


1. Empty the "script path" to remove all web-visible subdirectories. This must be kept consistent with the nginx configuration, otherwise it could break the installed website.

$wgScriptPath = '';


2. Set the article path to its simplest form. With this, links to the 'view' action of pages will be beautiful instead of using "index.php". The server must support the "path info" capability, which is the case for nginx.

$wgArticlePath = '/$1';


3. Now the real improvements begin. When you view an article the URL is beautiful, but it becomes ugly again when you edit the article or request its printable version. So let’s remove the "index.php".

$wgScript = '';


4. Links for actions now have the form "/Main_Page?title=Main_Page&action=edit" (here for the edit action). The "title=" parameter is still there and should be removed, since the title is already shown in the main part of the URL. To achieve that, we add a small hook in LocalSettings.php that removes the "title=" parameter.

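$wgHooks['GetLocalURL::Internal'][] = 'phpAgnosticURL';
function phpAgnosticURL( $title, &$url, $query ) {
    global $wgArticlePath;
    
    // Rebuild the URL from the article path instead of "index.php?title=...".
    $url = str_replace( '$1', wfUrlencode( $title->getPrefixedDBkey() ), $wgArticlePath );
    
    // Drop any "title=" parameter from the query string, trimming the "&" it leaves behind.
    while ( preg_match( '/^(.*&|)title=([^&]*)(&.*|)$/', $query, $matches ) ) {
        $query = trim( $matches[1] . ltrim( $matches[3], '&' ), '&' );
    }
    
    // Append the remaining parameters (action=edit, printable=yes, etc.).
    if ( $query != '' ) {
        $url = wfAppendQuery( $url, $query );
    }
    
    // Let other handlers of this hook run as well.
    return true;
}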
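
To check what the hook is expected to produce without touching a live wiki, its logic can be mirrored in a few lines of Python. This is only an illustrative sketch: `local_url` is a hypothetical name, `quote()` is a rough stand-in for MediaWiki's `wfUrlencode()`, and the sketch also trims any stray "&" left after the removal.

```python
import re
from urllib.parse import quote

ARTICLE_PATH = "/$1"  # same value as $wgArticlePath above

# Same pattern as in the PHP hook: a whole "title=..." parameter,
# wherever it sits in the query string.
TITLE_PARAM = re.compile(r"^(.*&|)title=([^&]*)(&.*|)$")


def local_url(dbkey: str, query: str = "") -> str:
    """Hypothetical stand-in for the hook: build a local URL without 'title='."""
    # Rough approximation of wfUrlencode(): keep ':' and '/' readable.
    url = ARTICLE_PATH.replace("$1", quote(dbkey, safe="/:"))

    match = TITLE_PARAM.match(query)
    while match:
        # Join what precedes and follows the parameter, dropping stray '&'.
        query = (match.group(1) + match.group(3).lstrip("&")).strip("&")
        match = TITLE_PARAM.match(query)

    if query:
        url += "?" + query
    return url


print(local_url("Main_Page"))
print(local_url("Main_Page", "title=Main_Page&action=edit"))
```

With these inputs the sketch should print "/Main_Page" and "/Main_Page?action=edit", the two forms the tutorial is aiming for.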

Nginx configuration

The basic skeleton is given below; the 'location' directives will be added to it afterwards.

server {
    
    listen 80;
    listen 443 ssl;
    server_name my-wiki.example;
    
    root /var/www/mediawiki;
    
    # 'location' directives to be added here
    
}

First, let’s add the simple things: the backend PHP scripts. I do not try to hide them: they are backend endpoints, so human users do not see them.

location ~ ^/(api|load|opensearch_desc)\.php$ {
    include php5-fpm.conf;
}

Next, the most important part: the call to index.php, which catches all actions and passes them to MediaWiki.

location / {
    # Redirect leftover "?title=..." URLs to their beautiful form.
    if ($arg_title != '') {
        set_by_lua $mwredirect '
            local args = ngx.var.args
            local arg_title = ngx.var.arg_title
            -- Make the title readable: ":" for namespaces, "_" for spaces.
            arg_title = ngx.re.gsub(arg_title, "%3A", ":")
            arg_title = ngx.re.gsub(arg_title, "%20", "_")
            arg_title = ngx.re.gsub(arg_title, "[+]", "_")
            local url = "/" .. arg_title
            -- Remove the "title=" parameter from the query string.
            local newargs, n, err = ngx.re.gsub(args, [[\btitle=[^&]*&?]], "", "jo")
            if string.len(newargs) > 0 then
                url = url .. "?" .. newargs
            end
            -- Strip a trailing "&" left when "title=" was the last parameter.
            if string.sub(url, -1) == "&" then
                url = string.sub(url, 1, -2)
            end
            return url
        ';
        return 301 $mwredirect;
    }
    # Normal case: pass everything after the first "/" to index.php as PATH_INFO.
    include php5-fpm.conf;
    fastcgi_split_path_info ^(/)(.*)$;
    fastcgi_param  SCRIPT_FILENAME  $document_root/index.php;
    fastcgi_param  PATH_INFO        $fastcgi_path_info;
}

In the normal case, handled at the end of location /, the URL is split into its first part (just the slash) and the title of the article, which is passed to PHP/MediaWiki through the FastCGI parameter PATH_INFO. The first part of location / rewrites the remaining "title=" parameters that MediaWiki still produces in forms (Special:Recentchanges, Special:Log, etc.). As far as I know, there is no way to fix this directly in MediaWiki without modifying the core, which is a bad idea because it adds work at every update.
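
The two behaviours of location / can be reproduced offline to see which URLs they yield. The Python sketch below is only an approximation of the nginx/Lua logic under simple assumptions (raw, still-encoded query string; the function names are hypothetical), not a replacement for testing the real server.

```python
import re
from typing import Optional, Tuple


def split_path_info(uri: str) -> Tuple[str, str]:
    """Mirror of fastcgi_split_path_info ^(/)(.*)$: (script part, PATH_INFO)."""
    match = re.match(r"^(/)(.*)$", uri)
    return match.group(1), match.group(2)


def redirect_target(args: str) -> Optional[str]:
    """Mirror of the Lua block: turn '...title=X...' into '/X?...'."""
    match = re.search(r"(?:^|&)title=([^&]*)", args)
    if not match or not match.group(1):
        return None  # no "title=" argument: no redirect, normal case applies

    # Same three substitutions as in the Lua code.
    title = match.group(1).replace("%3A", ":").replace("%20", "_").replace("+", "_")
    url = "/" + title

    # Remove the "title=" parameter from the query string.
    newargs = re.sub(r"\btitle=[^&]*&?", "", args)
    if newargs:
        url += "?" + newargs
    # Strip a trailing "&" left when "title=" was the last parameter.
    if url.endswith("&"):
        url = url[:-1]
    return url


print(split_path_info("/Main_Page"))
print(redirect_target("title=Special%3ARecentChanges&limit=50"))
print(redirect_target("limit=50&title=Special:Log"))
```

The first call shows the split between the slash and the article title; the other two show that the position of "title=" in the query string does not matter for the redirect target.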

You should now have a mostly functional wiki, apart from some missing images. Let’s add the UI images from the /skins subdirectory and, at the same time, the /extensions subdirectory (some extensions ship JavaScript or image files in their directories). Only the following are allowed: JS files, images, CSS and LESS files, and .htc files (used by IE).

location ~ ^/(extensions|skins)/ {
    location ~ \.(js|css|htc|less|png|svg|gif|jpg|xcf)$ {
        allow all;
        try_files $uri =403;
    }
    deny all;
}
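
The effect of this whitelist can be sketched in Python, assuming the usual nginx behaviour ("deny all" answers 403, and "try_files $uri =403" serves the file only if it exists). The function name and the example paths are hypothetical.

```python
import re
from typing import Optional

AREA = re.compile(r"^/(extensions|skins)/")
STATIC = re.compile(r"\.(js|css|htc|less|png|svg|gif|jpg|xcf)$")


def static_status(uri: str, file_exists: bool) -> Optional[int]:
    """Expected HTTP status for a request under /extensions or /skins."""
    if not AREA.search(uri):
        return None  # handled by another location
    if STATIC.search(uri) and file_exists:
        return 200   # whitelisted extension and the file is on disk
    return 403       # missing file (try_files ... =403) or "deny all"


print(static_status("/skins/example/logo.png", True))
print(static_status("/extensions/Example/Example.php", True))
print(static_status("/Main_Page", True))
```

In particular, PHP files inside /extensions and /skins are never served directly, even though they exist on disk.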

Now we can allow the files stored in the /images subdirectory. Below are two variants, depending on whether your wiki is public or private.

location ~ ^/images/ {
    # Publicly accessible files
    location ~ ^/images/(\.htaccess|README|lockdir.*)$ {
        return 404;
    }
    location ~ /$ {
        deny all;
    }
    try_files $uri =404;
    
    # Private files for a private wiki
    #include php5-fpm.conf;
    #fastcgi_split_path_info ^(/images/)(.*)$;
    #fastcgi_param  SCRIPT_FILENAME  $document_root/img_auth.php;
    #fastcgi_param  PATH_INFO        $fastcgi_path_info;
}

To finish, we can add /robots.txt and /favicon.ico if they are not managed by MediaWiki extensions.

location ~ ^/(robots\.txt|favicon\.ico)$ {
    try_files $uri =404;
}

Conclusion

  1. Now the URLs are "/Main_Page", "/Main_Page?action=history", "/Main_Page?printable=yes", etc.
  2. Your articles can have a wide range of titles and are not limited by system files, e.g. you can write an article about "/Robots.txt", "/Images", "/Skins", or "/Index.php" (note the initial capital).
  3. Apart from the whitelisted system files and subdirectories, nobody can access the underlying system files: "/includes/Title.php", for example, is treated as an article, as is "/LocalSettings.php". An attacker therefore has no direct access to the system files and has to get through MediaWiki before attacking PHP files.
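
The routing claimed in these three points can be checked with a small Python sketch of how nginx picks a location for this configuration: regular-expression locations are tried first, and "location /" catches everything else. The labels are purely descriptive, and the last pattern is written here with escaped dots.

```python
import re

# Regex locations from the configuration above, with a descriptive label each.
REGEX_LOCATIONS = [
    (re.compile(r"^/(api|load|opensearch_desc)\.php$"), "backend PHP script"),
    (re.compile(r"^/(extensions|skins)/"), "whitelisted static files"),
    (re.compile(r"^/images/"), "files of /images"),
    (re.compile(r"^/(robots\.txt|favicon\.ico)$"), "plain file"),
]


def handler_for(uri: str) -> str:
    """Return which part of the configuration would handle this URI."""
    for pattern, label in REGEX_LOCATIONS:
        if pattern.search(uri):
            return label
    # Fallback: "location /", i.e. index.php with the title in PATH_INFO.
    return "MediaWiki article"


for uri in ("/Main_Page", "/includes/Title.php", "/LocalSettings.php",
            "/Robots.txt", "/robots.txt", "/api.php"):
    print(uri, "->", handler_for(uri))
```

Only the exact, case-sensitive system paths leave MediaWiki; everything else, including paths that look like PHP source files, is treated as an article title.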