Implementation notes: MW and types

From CODECS Dev

In the specifications of the Reconciliation API, a type is generically described as something that “represents a category of entities”. While services are not required to work with types, types can be used effectively to improve the discoverability of your data. What then would be the equivalent of a type in the MediaWiki context and how should it be implemented?

Requirements and options

The definition is generic enough to leave room for interpretation and for implementation choices on the part of the providing service. It does not dictate any specific ontological model, although it is evident that some data structures are better suited to LOD than others.

That said, over the course of the specification it also becomes clear that a type may, should or must meet a number of functional requirements:

  • 1a. If provided, a type must be identifiable with the following fields:
    • id (required): a unique identifier
    • name (required): a human-readable name
  • 1b. A type may be identifiable with the following field:
    • broader (optional): an array of types that represent (single-level) 'broader' categories of entities. Represented in a hierarchy tree, those types would be immediate parents of the type at hand. The documentation does not insist on any specific interpretation of broad vs narrow. If skos:Concept was intended as a hint, the meaning remains flexible.
  • 2. Ideally then, types allow for the scope of a query to be broadened.
  • 3. An entity may be identifiable with one or multiple appropriate types. A record for an entity must come with a type field but it is okay for the array it holds to be empty.
  • 4. Ideally, it should be possible for the suggest service to suggest types.
  • 5. Through defaultTypes, it should be possible for the service manifest to suggest a default array of types to start out with. If types are not used at all, a generic name may be used to indicate that “all entities in the database are instances of this type”.
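
To make requirements 1, 3 and 5 more concrete, here is a minimal sketch in Python of what the corresponding JSON structures might look like. The helper names (make_type, make_entity) and all example values are hypothetical and only serve as illustration; the field names id, name, broader, type and defaultTypes are the ones listed above.

```python
import json


def make_type(type_id, name, broader=None):
    """Build a type object: id and name are required, broader is optional."""
    if not type_id or not name:
        raise ValueError("a type needs both an id and a name")
    type_obj = {"id": type_id, "name": name}
    if broader:
        # Immediate parents only (single level), reduced to id and name.
        type_obj["broader"] = [{"id": t["id"], "name": t["name"]} for t in broader]
    return type_obj


def make_entity(entity_id, name, types=None):
    """Build an entity record; the type array is always present but may be empty."""
    return {"id": entity_id, "name": name, "type": list(types or [])}


# Hypothetical example values, not taken from any real wiki.
person = make_type("Category:People", "People")
author = make_type("Category:Authors", "Authors", broader=[person])

entity = make_entity("Jane Doe", "Jane Doe", types=[author])
untyped = make_entity("Some page", "Some page")

# Fragment of a service manifest that offers a default type to start out with.
manifest_fragment = {"defaultTypes": [{"id": person["id"], "name": person["name"]}]}

print(json.dumps(author, indent=2))
```

Leaving broader out of the object when there are no parents keeps the optional field optional, while the type array on an entity is always emitted, even when empty.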

MediaWiki equivalents

What about MediaWiki? In the ecosystem of MediaWiki and its extensions, there is more than one candidate to serve as an equivalent of type:

  1. it can be a MediaWiki Category (here capitalised to distinguish it from the everyday term)
  2. in Semantic MediaWiki, it can be
    1. a Concept (again capitalised, for the same reason)
    2. a value from a property of type Page or Text (or Monolingual Text) provided the property to be used was announced to the system. The property is user-defined and there is no generally accepted name though many have settled on "Class", "Has class" or similar variants.
  3. in Wikibase/Wikidata, it can point to a value of 'instance of' (Property:P31) or perhaps 'subclass of'.

(1) MediaWiki Categories

Categories have been a recognisable and flexible part of MediaWiki’s system of organising wiki pages since very early days.

Wikipedia’s own use of Categories may be described as a mix of different approaches to classification. Sometimes a Category gathers pages about subjects of the same type (e.g. a person, author, etc.); sometimes it simply lumps together pages, e.g. people, events and works of interest, based on their common relationship to the same general subject area, e.g. "Science"; often it does both. None of this diminishes the fact that a Category can also be used to gather pages that represent the same type, e.g. a person, an object, etc.

Pros
  • It is a tried and tested system of MediaWiki core, which does not require any extensions to be installed.
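  • [?] What potentially makes a Category well suited is that Categories can be nested: a Category can itself be placed in a parent Category, and so on up the tree. Note that MediaWiki core does not by itself list the pages of a subcategory as members of the parent Category; whether they count as such depends on how the hierarchy is traversed (Semantic MediaWiki queries, for instance, resolve subcategories up to a configurable depth).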
Cons

There are some potential downsides:

  • What they lack is the granularity that comes with semantic data management.
  • Categories have been known to update slowly in the database, at least in the past.
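
Because a Category page can itself be placed in other Categories, its immediate parents are an obvious candidate for the 'broader' field. The following Python sketch assumes the standard MediaWiki Action API (action=query with prop=categories, using formatversion=2); the function names and the sample response are illustrative and hand-written, not taken from a real wiki, and no request is actually sent.

```python
def parent_category_params(category_title):
    """Query parameters asking api.php which Categories a Category page is in."""
    return {
        "action": "query",
        "prop": "categories",
        "titles": category_title,
        "cllimit": "max",
        "format": "json",
        "formatversion": "2",
    }


def broader_from_response(response, prefix="Category:"):
    """Turn the parent Categories found in an API response into 'broader' entries."""
    broader = []
    for page in response.get("query", {}).get("pages", []):
        for cat in page.get("categories", []):
            title = cat["title"]
            # Use the title without the namespace prefix as the human-readable name.
            name = title[len(prefix):] if title.startswith(prefix) else title
            broader.append({"id": title, "name": name})
    return broader


# Hand-written sample in the shape returned with formatversion=2 (not real data).
sample_response = {
    "query": {
        "pages": [
            {
                "pageid": 1,
                "ns": 14,
                "title": "Category:Authors",
                "categories": [{"ns": 14, "title": "Category:People"}],
            }
        ]
    }
}

params = parent_category_params("Category:Authors")
broader = broader_from_response(sample_response)
print(broader)
```

Only one level of parents is collected, in line with the single-level reading of 'broader' in requirement 1b.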

(2) Semantic MediaWiki: 'classes' with a dedicated property

@todo


For instance, just as Wikidata uses the property "instance of" to classify people as "human", we can use a similar property to associate the subject with a page that represents its class.


Pros


Cons


(3) Semantic MediaWiki: Concepts

A Concept is a collection of pages that is the result of a user-defined semantic query.

Pros

Because a Concept is defined and stored on a wiki page, it could represent another natural answer to type. It allows for query-based granularity, if that’s what’s needed, and supports caching, which should improve the response time of a query.

Cons

Unfortunately, what makes the deployment of Concepts as types challenging is that they are more of an afterthought than an intrinsic part of the data structure. For the service to suggest relevant Concepts, those Concepts would need to be embedded into the network of semantic relationships.

  1. Concepts, unlike Categories, are not usually linked to broader Concepts. In order to support 'broader', Concepts should use a property, e.g. "Has broader concept", and the name of this property should be added to the configuration.
  2. Ideally, Concepts should use a property, e.g. "Belongs/applies to class", to associate them with the general 'class' to which their results belong.
  3. Concepts are not linked to the entities that are the results of their queries. This means that it is not possible

For an entity to be identifiable through a type field (see above), a Concept must be linked somehow. Concepts should use a property, e.g. "Belongs/applies to class", to state the class.

? @todo: set up setting variable

Table

                             | Category | Concept                               | Class page
associate entity with type   | works    | only with additional effort and rules | works

A neutral 'type'?

defaultTypes

The service manifest

hierarchies

'types' are potentially organised in a hierarchy.

  1. Supported by MediaWiki categories

To consider

Understandably, the API does not distinguish between the various mechanisms that a particular service may choose to use when working with types. But the service needs to be able to recognise which mechanism is used in a given context.

  • Naming:
    • If the id starts with Category:... (canonical name), we can be sure it's a Category.
    • If the id starts with Concept:... (canonical), it's a Concept.
  • Profile
    • In most other cases, the profile itself should be our go-to.
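
The naming rule and the fallback to the profile could be sketched as follows in Python. This is only an illustration under assumptions: the function name detect_mechanism, the profile key "typeMechanism" and its values are hypothetical, and only the canonical (English) namespace names are checked, so localised namespace names would need extra handling.

```python
def detect_mechanism(type_id, profile):
    """Guess which mechanism a type id refers to.

    Canonical namespace prefixes are checked first; anything else falls
    back to whatever the (hypothetical) profile declares.
    """
    if type_id.startswith("Category:"):
        return "category"
    if type_id.startswith("Concept:"):
        return "concept"
    # No recognisable prefix: let the profile decide.
    return profile.get("typeMechanism", "unknown")


# Hypothetical profile stating that plain ids are values of a 'class' property.
profile = {"typeMechanism": "class-property"}

for type_id in ("Category:Authors", "Concept:Medieval authors", "Author"):
    print(type_id, "->", detect_mechanism(type_id, profile))
```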

See also

Notes on '7.2 Data Extension Property Proposals (optional)'