Sorting results with friendly collation

From CODECS Dev

...

Evaluation: UCA, etc.

  • Numeric sorting of both numeric and alphanumeric values, e.g. "I walked 20km" should always precede "I walked 100km".
  • Case-insensitive sorting.
  • Sorting of characters with or without diacritics according to ...
  • Language-specific treatment of character sequences. For instance, ll and rh are treated as distinct digraphs in the Welsh alphabet.
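The numeric-sorting requirement can be illustrated with a minimal Python sketch. This is a toy stand-in for what UCA's numeric mode (the "-u-kn" keyword) does, not the real algorithm:

```python
import re

def natural_key(s):
    # Split the string into digit and non-digit runs; digit runs compare
    # numerically, text runs compare case-insensitively. Real UCA handles
    # far more (diacritics, tailorings, multi-level weights).
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r"(\d+)", s)]

items = ["I walked 100km", "I walked 20km"]
print(sorted(items, key=natural_key))
# → ['I walked 20km', 'I walked 100km']
```

A plain lexicographic sort would put "100km" before "20km", because "1" precedes "2" as a character.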

Choose your entity collation algorithm

Update the settings for entity collation

Configure the entity collation in LocalSettings.php. Example:

$smwgEntityCollation = 'uca-default-u-kn';
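Here 'uca-default' selects the root UCA locale and the '-u-kn' suffix enables numeric collation. A language-specific tailoring can be chosen instead of the default. The following is an illustrative sketch only, assuming a Welsh-language wiki; the exact set of supported collation names depends on your MediaWiki version and requires the PHP intl extension:

```php
// LocalSettings.php -- hypothetical example: Welsh UCA tailoring
// with numeric sorting enabled. Verify the collation name against
// your MediaWiki version's documentation before using it.
$smwgEntityCollation = 'uca-cy-u-kn';
```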

Update the settings for category collation

To keep things consistent, apply the same collation to page sorting in MediaWiki categories by setting $wgCategoryCollation to the same value, e.g.

 $wgCategoryCollation = 'uca-default-u-kn';

Run maintenance scripts

To make sure the changes in the settings above take effect, run the corresponding maintenance scripts: updateEntityCollation.php (for $smwgEntityCollation) and MediaWiki core's updateCollation.php (for $wgCategoryCollation).
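A sketch of the invocations from the wiki root directory, assuming a standard MediaWiki layout; paths and available options may differ per installation:

```shell
# SMW: recompute the smw_sort field after changing $smwgEntityCollation
php extensions/SemanticMediaWiki/maintenance/updateEntityCollation.php

# MediaWiki core: recompute category sort keys after changing $wgCategoryCollation
php maintenance/updateCollation.php
```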

Background

SMW stores objects for wiki pages, subobjects and values of type Page in the smw_object_ids database table (in the SQL backend; Elasticsearch is not involved here). This table holds two fields relevant to sorting:

  • smw_sortkey : a literal sort value limited to 255 bytes.
  • smw_sort : the sort value used to support locale-specific (ICU) sorting and collation. It is limited to 255 characters. This is the field that gets updated when updateEntityCollation.php is run. See e.g. TableBuildExaminer::updateSortField.
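The bytes-versus-characters distinction matters for non-ASCII titles, since multi-byte text reaches a byte cap long before a character cap. A quick Python illustration:

```python
# smw_sortkey is capped at 255 *bytes*, smw_sort at 255 *characters*.
# "é" is one character but two bytes in UTF-8, so a 255-character
# string of it would overflow a 255-byte column.
title = "é" * 255
print(len(title))                  # → 255 (characters)
print(len(title.encode("utf-8")))  # → 510 (bytes)
```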

Keep the database in shape

Check .smw.json is intact

Make sure that running Composer does not accidentally remove, or make incorrect changes to, .smw.json since this is where SMW's maintenance script automatically adds the required settings. The possibility of this happening, as yet unconfirmed, was suggested in issue 5609.

Take special care with Elasticsearch

In the past, problems with Elasticsearch have often led to problems with sorting results in SMW. Rebuilding the Elasticsearch data store alone (rebuildElasticIndex.php) is not enough.

A full database rebuild through the rebuildData.php script is required. Be warned that the process is rather resource-intensive, because all wiki pages will be crawled and reparsed to re-create the semantic data in the database. Consult the document elsewhere on this site for guidance on how to run rebuildData.php responsibly without crashing the server.
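As a rough sketch of a throttled run, assuming options available in recent SMW versions (check `--help` for your version before relying on any flag):

```shell
# Hypothetical invocation: -d adds a delay between pages to reduce load,
# -v prints progress. Option names may vary by SMW version.
php extensions/SemanticMediaWiki/maintenance/rebuildData.php -d 50 -v
```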