Tutorial:Beautiful MediaWiki URLs

From Wiki Seb35

Current revision as of 26 August 2021, 22:39

Language: en
Status: stable

This page explains how to configure beautiful, clean URLs for MediaWiki, essentially without any visible "index.php". For example:

http://wiki.seb35.fr/Tutorial:Beautiful_MediaWiki_URLs?action=edit

instead of

http://wiki.seb35.fr/index.php?title=Tutorial:Beautiful_MediaWiki_URLs&action=edit

The improvement is even greater for languages with non-Latin alphabets:

http://wiki.seb35.fr/维基百科?action=history

instead of

http://wiki.seb35.fr/index.php?title=%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91&action=history


Rationale and introduction

The reasoning is as follows: "index.php" is of no interest to the user, so it should not appear in the URL; in addition, the user should see (and will prefer) readable page names instead of %-encoded gibberish.

You can see these rules in action on this wiki. Technically, the ugly URLs are a legacy of the good old days of CGI scripts (ten years ago), but for some time now it has been possible to rewrite URLs on the server. In addition, all recent browsers display the actual characters instead of %-encoded sequences in the path of the URL, although still not in the parameters after the "?". So it is time to remove this "index.php".

To reconcile MediaWiki URLs with recent browsers, we take advantage of the "path info" capability offered by web servers (here nginx), and we configure MediaWiki so that it generates ugly URLs as rarely as possible.

Prerequisites

  • MediaWiki, installed in the directory, say, '/var/www/mediawiki'
  • nginx, preferably with the module Lua (package nginx-extras in Debian/Ubuntu)
  • The wiki is assumed to be installed on "http://wiki.example.org/"
  • The server must support the PATH_INFO variable; this is the case for most "nginx + PHP gateway" setups, but you should really check that it works before continuing, see Manual:$wgUsePathInfo on MediaWiki.org.
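
One way to perform the PATH_INFO check mentioned above is a throwaway PHP script. The following is only a minimal sketch: the file name "pathinfo-test.php" and the function name describePathInfo() are hypothetical, and it assumes that your current nginx PHP location forwards URLs such as "/pathinfo-test.php/Some/Page" to PHP. Place it temporarily in the web root, request that URL, and delete the file afterwards.

```php
<?php
// pathinfo-test.php: hypothetical throwaway file, delete it after the check.

// Build a one-line report from a $_SERVER-like array.
function describePathInfo( array $server ): string {
    $pathInfo = $server['PATH_INFO'] ?? '';
    return $pathInfo === ''
        ? 'PATH_INFO is empty or not set'
        : 'PATH_INFO = ' . $pathInfo;
}

// Print the report as plain text for the current request.
header( 'Content-Type: text/plain; charset=utf-8' );
echo describePathInfo( $_SERVER ), "\n";
```

If the response shows the part of the URL after the script name, the gateway passes PATH_INFO; if it reports an empty value, fix the FastCGI configuration before continuing.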

MediaWiki configuration

It is assumed that the wiki is served directly from the root of the site rather than from a subdirectory (e.g. "https://example.org/" instead of "https://example.org/tools/mywiki/"). In other words, the "root" directive of the nginx server block points to the directory where MediaWiki is installed (the one containing index.php).

The following configuration variables must be added at the end of the file LocalSettings.php, in the MediaWiki directory.


1. Empty the "script path" to remove all web-visible subdirectories. This must be done in step with the nginx configuration, otherwise it could break the installed website.

$wgScriptPath = '';


2. Set the article path to its simplest form. With this, links leading to the 'view' action of pages will be beautiful instead of using "index.php". The server must support the "path info" capability, which is the case for nginx.

$wgArticlePath = '/$1';


3. Now the real improvements begin. When you view an article the URL is beautiful, but it becomes ugly again when you edit an article or request its printable version. So let’s remove the "index.php".

$wgScript = '';


4. The links for actions now have the form "/Main_Page?title=Main_Page&action=edit" (here for the edit action). The "title=" parameter is still there and should be removed, since the title already appears in the path of the URL. To achieve that, we add a small hook in LocalSettings.php.

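// Rebuild every local URL as "/Title?query", without "index.php" or "title="
$wgHooks['GetLocalURL::Internal'][] = 'phpAgnosticURL';
function phpAgnosticURL( $title, &$url, $query ) {
    global $wgArticlePath;
    
    // Start from the article path, e.g. "/Main_Page"
    $url = str_replace( '$1', wfUrlencode( $title->getPrefixedDBkey() ), $wgArticlePath );
    
    // Drop any "title=" parameter left in the query string
    while ( preg_match( '/^(.*&|)title=([^&]*)(&.*|)$/', $query, $matches ) ) {
        $query = $matches[1] . $matches[3];
    }
    
    // Append the remaining parameters, e.g. "?action=edit"
    if ( $query != '' ) {
        $url = wfAppendQuery( $url, $query );
    }
}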
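
To see what the hook produces without a running wiki, its logic can be mimicked in a standalone script. This is only a sketch under assumptions: buildCleanUrl() is a hypothetical name, its first argument stands in for the result of wfUrlencode( $title->getPrefixedDBkey() ), and the query-appending line is a simplified stand-in for MediaWiki's wfAppendQuery().

```php
<?php
// Standalone sketch of the hook's logic; no MediaWiki code is loaded.
// buildCleanUrl() is a hypothetical helper used only for illustration.
function buildCleanUrl( string $encodedTitle, string $query, string $articlePath = '/$1' ): string {
    // Same substitution as in the hook: "/$1" becomes "/Main_Page"
    $url = str_replace( '$1', $encodedTitle, $articlePath );

    // Same loop as in the hook: drop any "title=" parameter
    while ( preg_match( '/^(.*&|)title=([^&]*)(&.*|)$/', $query, $matches ) ) {
        $query = $matches[1] . $matches[3];
    }

    // Simplified stand-in for wfAppendQuery()
    if ( $query !== '' ) {
        $url .= ( strpos( $url, '?' ) === false ? '?' : '&' ) . $query;
    }
    return $url;
}

echo buildCleanUrl( 'Main_Page', '' ), "\n";            // /Main_Page
echo buildCleanUrl( 'Main_Page', 'action=edit' ), "\n"; // /Main_Page?action=edit
```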

At this point, MediaWiki generates beautiful URLs, but nginx no longer understands URLs that lack a "title=" parameter. In the next part, we will restore normal operation, completely hide the PHP system files (which improves security), and, along the way, allow the creation of pages with almost any title (e.g. you will be able to create the page "Index.php" on the wiki; this is probably useless^^, but you may be interested in titles such as "Skins/…").

Nginx configuration

5. The basic skeleton is shown below. The 'location' directives will be added in the following steps.

server {
    
    listen 80;
    listen 443 ssl;
    server_name wiki.example.org;
    
    root /var/www/mediawiki;
    
    # 'location' directives are added here in the following steps
    
}


6. First, let’s add the simple things: the backend PHP scripts. We do not try to hide them: they are backend scripts, so human users do not see them, but machines must have access to them.

location ~ ^/(api|load|opensearch_desc)\.php$ {
    include php5-fpm.conf;
}


7. Next, the most important thing: the call to the hidden "index.php", which catches all actions and passes them to MediaWiki. This uses the PATH_INFO parameter of CGI scripts. Additionally, we add an exception for the case where a "title=" parameter is still present; this happens with MediaWiki forms (Special:Recentchanges, Special:Log, etc.), and the exception makes the whole system both backward-compatible and beautiful (a permanent redirect from the old URLs to the new ones). The Lua rewriting part is based on a message by agentzh (thanks!).

location / {
    if ($arg_title != '') {
        set_by_lua $mwredirect '
            local args = ngx.var.args
            local arg_title = ngx.var.arg_title
            arg_title = ngx.re.gsub(arg_title, "%3A", ":")
            arg_title = ngx.re.gsub(arg_title, "%20", "_")
            arg_title = ngx.re.gsub(arg_title, "[+]", "_")
            local url = "/" .. arg_title
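            -- Remove the "title=" parameter from the query string
            local newargs = ngx.re.gsub(args, [[\btitle=[^&]*&?]], "", "jo")
            if string.len(newargs) > 0 then
                url = url .. "?" .. newargs
            end
            -- Strip a trailing "&" left when "title=" was the last parameter
            if string.sub(url, -1) == "&" then
                url = string.sub(url, 1, -2)
            end
            return url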
        ';
        return 301 $mwredirect;
    }
    include php5-fpm.conf;
    fastcgi_split_path_info ^(/)(.*)$;
    fastcgi_param  SCRIPT_FILENAME  $document_root/index.php;
    fastcgi_param  PATH_INFO        $fastcgi_path_info;
}
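
To experiment with the redirect rule outside nginx, the Lua logic above can be transliterated into a few lines of PHP. This is a sketch and not part of the configuration: legacyRedirectTarget() is a hypothetical name, its first argument plays the role of nginx's raw $arg_title and its second the role of $args.

```php
<?php
// Offline transliteration of the Lua redirect logic, for experimenting only.
// legacyRedirectTarget() is a hypothetical helper, not used by nginx.
function legacyRedirectTarget( string $rawTitle, string $args ): string {
    // Same three substitutions as in the Lua code
    $title = str_replace( [ '%3A', '%20', '+' ], [ ':', '_', '_' ], $rawTitle );
    $url = '/' . $title;

    // Remove the "title=" parameter from the query string
    $newArgs = preg_replace( '/\btitle=[^&]*&?/', '', $args );
    if ( $newArgs !== '' ) {
        $url .= '?' . $newArgs;
    }

    // Strip a trailing "&" left when "title=" was the last parameter
    if ( substr( $url, -1 ) === '&' ) {
        $url = substr( $url, 0, -1 );
    }
    return $url;
}

echo legacyRedirectTarget( 'Main_Page', 'title=Main_Page&action=edit' ), "\n";
// /Main_Page?action=edit
echo legacyRedirectTarget( 'Special%3ARecentChanges', 'title=Special%3ARecentChanges&limit=50' ), "\n";
// /Special:RecentChanges?limit=50
```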


8. You should now have a fairly functional wiki, apart from some missing images. Let’s add the UI images from the /skins subdirectory and the /extensions subdirectory at the same time (some extensions have JavaScript files or image files in their directories). Only the following are allowed: images, JS files, CSS and LESS files, and .htc files (used by IE).

location ~ ^/(extensions|skins)/ {
    location ~ \.(js|css|htc|less|png|svg|gif|jpg|xcf)$ {
        allow all;
        try_files $uri =403;
    }
    deny all;
}


9. Now we can allow the files stored in the /images subdirectory. Below are two variants, depending on whether your wiki is public or private.

location ~ ^/images/ {
    # Publicly accessible files
    location ~ ^/images/(\.htaccess|README|lockdir.*)$ {
        return 404;
    }
    location ~ /$ {
        deny all;
    }
    try_files $uri =404;
    
    # Private files for a private wiki
    #include php5-fpm.conf;
    #fastcgi_split_path_info ^(/images/)(.*)$;
    #fastcgi_param  SCRIPT_FILENAME  $document_root/img_auth.php;
    #fastcgi_param  PATH_INFO        $fastcgi_path_info;
}


10. Finally, we can add /robots.txt and /favicon.ico.

location ~ ^/(robots\.txt|favicon\.ico)$ {
    try_files $uri =404;
}

Complements

a. If you cannot install the Lua module in nginx, old URLs entered directly (coming from external sources such as emails, other websites, user bookmarks, etc.) will still work but will be displayed as old URLs ("/index.php?title=Main_Page&action=edit"). Once the user clicks on a link on the wiki, s/he will see the new URLs.


b. To prevent MediaWiki from creating URLs with leftover "title=" parameters, MediaWiki core would have to be patched in many places: in all forms containing an <input type="hidden" name="title" … /> HTML tag, and possibly elsewhere. A bug report should probably be filed, at least to request an option to natively create beautiful URLs.


c. MediaWiki update maintenance: nothing is needed on the MediaWiki configuration side, unless you modify MediaWiki core (this is not recommended, in order to keep updates easy). The changed configuration parameters will probably stay backward-compatible for a long time, since most of them were introduced in pre-1.1.0 MediaWiki versions. Some PHP scripts may have to be added in point 6 for future MediaWiki releases; see the entry points on your Special:Version page or Manual:Code#Access points on MediaWiki.org. You may also want to change the list of files in point 9 (the list is better from a security point of view, but it requires maintenance over time).


d. The nginx configuration above was written with strict security considerations, in order to remove all entry points to system files, but this requires maintenance over time. You may want to relax these security checks:

  1. add more PHP scripts in point 6, see Manual:Code#Access points on MediaWiki.org
  2. add more media file extensions in point 8, or even allow access to all files in the "extensions" and "skins" subdirectories, or on the contrary blacklist some file extensions (mainly .php files)
  3. remove the list of files in point 9; this is probably harmless from a security point of view


e. The list in point 10 can be changed depending on specific needs. For example, you could change the location of the favicon (see $wgFavicon), or add a file /sitemap.xml, etc. Additionally, some MediaWiki extensions or core features offer to manage "metadata system files" like /robots.txt or /sitemap.xml; if you want to use these features, you may have to create other "location" blocks in the nginx configuration with a wrapper to a PHP file, similar to the wrapper for "/images/" in the case of a private wiki.


f. As of MediaWiki 1.23, the entry points listed on the page Special:Version show "[ ]" for "index.php". This can be considered a bug: the case where "index.php" is completely removed is not expected :)


g. An easier way to remove most occurrences of "index.php" is to use $wgActionPaths. With this configuration setting, the 'edit' action on this page would be http://wiki.seb35.fr/edit/Tutorial:Beautiful_MediaWiki_URLs instead of http://wiki.seb35.fr/Tutorial:Beautiful_MediaWiki_URLs?action=edit. I have no opinion on which is better. Keep in mind that some browsers might hide the part of the address after the "?"; in these browsers, the action would be displayed in the first form but not in the second. Note: how does MediaWiki behave with $wgActionPaths when there are (e.g.) both "action=edit" and "section=0": is it "/edit/Title?section=0" or "/index.php?title=Title&action=edit&section=0"?
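
For reference, a $wgActionPaths setup matching the '/edit/…' example in point g could look like the fragment below. It is only a sketch: the two actions are chosen for illustration, and whether the nginx configuration of this tutorial needs changes for such paths has not been checked here.

```php
// LocalSettings.php fragment (sketch): one path pattern per action.
// The choice of actions is illustrative only.
$wgActionPaths['edit'] = '/edit/$1';
$wgActionPaths['history'] = '/history/$1';
```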

Conclusion

  1. Now the URLs are "/Main_Page", "/Main_Page?action=history", "/Main_Page?printable=yes", etc.
  2. Non-Latin alphabets benefit from the browser’s "real characters" rendering in the URL.
  3. The system is backward-compatible with existing URLs, and old URLs are rewritten if you have installed the Lua part in the nginx configuration.
  4. Your articles can have a wide range of titles and are not limited by system files, e.g. you can write articles about "Robots.txt", "Images", "Skins", or "Index.php" (NB: note the initial capital).
  5. Apart from whitelisted system files and subdirectories, nobody can access the underlying system files: e.g. "includes/Title.php" is treated as an article, as is "LocalSettings.php"; hence an attacker has no direct access to the system files and has to go through MediaWiki before attacking PHP files.