Modeling on a semantic web
This page contains a brief outline of its final contents.
- The collaborative potential of semantic modeling is fully realized only when models are run over a network: the larger the network, the more options there are for contextualizing each concept to produce scientific observations. By intelligently ranking the possible alternatives, a large network lets users build models that offer the best available representation of the system of interest. Ranking also allows the resulting observations to be tailored to the desired spatial, temporal, and conceptual resolution, with automatic up- or down-scaling to reflect the amount and quality of data and model components available on the network.
- The IM semantic network is formed by an arbitrary number of IM servers (using the k.Server software deliverable maintained by the partnership). Servers are enabled by a server certificate and connect to others explicitly, by exchanging secure certificates, so that the content of the network – crucial for the quality of the resulting observations – can be controlled and monitored. Each server can provide:
  - Knowledge projects, with semantics, data annotation and models to resolve user queries. All servers in a network adopt the latest version of the same worldview, and all of them can serve it.
  - k.LAB components, i.e., knowledge and binary assets that implement or wrap complex, independently developed models to be used transparently in modeling sessions.
  - Facilities to connect data assets from files or other non-semantic servers to semantically annotated projects. This is implemented through connectors: software plug-ins that let k.Server proxy an external data source through a uniform identifier (URN), which can be referenced in k.IM code to provide semantics for that resource. Currently available connectors implement file-based access to a variety of resources and the main Open Geospatial Consortium (OGC) spatial data protocols, namely the Web Coverage Service (WCS) and the Web Feature Service (WFS). Assets hosted by servers using such protocols can be hidden behind a URN, so that access to their resources, including access control if needed, is handled through the IM semantic network. Another existing connector provides access to NOAA weather station data (GHCN) and adds simple, spreadsheet-based weather data submission from users. The partnership will develop more connectors to handle OPeNDAP and other important data access protocols, present or future. Interfaces are being designed to connect to distributed file systems and processing frameworks (Apache Spark), so that large (big data) applications can be complemented with the power of semantic modeling.
  - User authentication and authorization services. Each server can choose between direct authentication (using a locally maintained user directory, e.g. LDAP) and indirect authentication (redirecting authentication requests to a directly authenticating server). The partnership maintains the central LDAP directory of all registered users of the IM network, authenticated through the primary IM server and therefore available to all others.
  - Statistical services to record important information about the usage of each knowledge asset, accessible to the users who provided the asset and to the modeling engine. This can enable future crowd-sourcing of features in the ranking algorithms.
  - Other features (such as crawling, see below) can be provided through additional software plug-ins and enabled or disabled through a web-based administration dashboard.
- The modeling engines (k.Modeler) are also implemented as server software: in a modeling engine, models are run and visualized through network calls, submitting each observation query to the semantic network. The engine also accesses local knowledge content, and is normally used in private mode on modelers’ machines for development and research. At the same time, institutions can install an engine on a public web server to provide external users with modeling facilities accessible through the World Wide Web. In such cases, each institution can freely define local user directories and access privileges for the specific models and modeling services exposed by their networked engine.
- The web interface exposed by k.Modeler is called k.Explorer: a full-featured, non-technical, drag-and-drop interface that implements the semantic modeling paradigm for users of all experience levels. The k.Explorer interface hides most details of semantic modeling from its user: a model is run by simply choosing (1) the context of interest, selected interactively, and (2) the desired observable, picked from a simple, auto-completing search form. Results and provenance information are then supplied to the user. The k.Explorer prototype supports user-editable, self-documenting “tool palettes” that offer specific types of models to specific classes of users. Users can access different palettes, which can easily be built and customized (also by users without knowledge of semantics) and then localized to regions, so that individual research or decision-making contexts can be easily supported.
- All servers connected in a network adopt the same worldview, which is tied to the server certificate, and can access the distributed user directory and the latest version of the shared worldview. When an engine connects to the network, all the servers it is authorized to access will serve content to the models run on it. Any server on the network can be an access point for the engine; the view of the network resulting from the connection depends on the user's roles, which are tied to the user's secure certificate. Queries for definitions of a concept (models and data) made during a modeling session are dispatched to all the accessible servers before the candidates are ranked and the best one is chosen.
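The dispatch-then-rank behavior described in the last point can be illustrated with a minimal Python sketch. It is not the k.LAB implementation: the `Server` and `Candidate` classes, the role sets, and the numeric `score` are all hypothetical stand-ins, and the real ranking criteria are not specified here. The sketch only shows the general shape: filter servers by the user's roles, collect candidates from every accessible server, then pick the best one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Candidate:
    """A model or dataset able to resolve a concept (hypothetical structure)."""
    server: str
    name: str
    score: float  # stand-in for whatever the actual ranking computes


@dataclass
class Server:
    """A network node with role-based access (hypothetical structure)."""
    name: str
    allowed_roles: set
    catalog: dict = field(default_factory=dict)  # concept -> [(name, score), ...]

    def resolve(self, concept):
        """Return this server's candidates for a concept (possibly none)."""
        return [Candidate(self.name, n, s) for n, s in self.catalog.get(concept, [])]


def accessible_servers(servers, user_roles):
    """Keep only the servers that share at least one role with the user."""
    return [s for s in servers if s.allowed_roles & set(user_roles)]


def best_candidate(servers, user_roles, concept):
    """Query every accessible server, pool the results, return the top-ranked one."""
    pooled = []
    for server in accessible_servers(servers, user_roles):
        pooled.extend(server.resolve(concept))
    # Highest score wins; None if no accessible server resolves the concept.
    return max(pooled, key=lambda c: c.score, default=None)


servers = [
    Server("alpha", {"public"}, {"elevation": [("dem_90m", 0.6)]}),
    Server("beta", {"partner"}, {"elevation": [("dem_30m", 0.9)]}),
]
print(best_candidate(servers, {"public"}, "elevation"))
print(best_candidate(servers, {"public", "partner"}, "elevation"))
```

In this toy setup, a user holding only the `public` role sees a smaller network and gets the coarser candidate, while a user who also holds `partner` gets the higher-scoring one, which mirrors how the view of the network depends on user roles.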
An optional, configurable feature of k.Server is a web crawler that allows willing partners (i.e., researchers, research groups or data repositories) to have their data content automatically discovered from the Web. If an institution makes a data file available through the network and provides, in the same location as the data, a file with the same name and extension .kim containing the corresponding semantic annotation in k.IM, the crawler will automatically validate the semantics and the accessibility of the data, and produce the resulting models that will be automatically served to authorized users. By indicating a set of seed URLs for crawling, institutions can ensure maximum visibility of their data with no further action than writing a few lines of k.IM and storing them along with the data.
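The file-pairing convention the crawler relies on can be sketched in Python as follows. This is an illustration only, not the k.Server crawler: the function names are hypothetical, and it assumes that "same name and extension .kim" means the data file's extension is replaced by `.kim` (so `dem.tif` pairs with `dem.kim`). Validation of the semantics and of data accessibility is left out.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, urlunsplit


def annotation_url(data_url: str) -> str:
    """Return the URL where the .kim annotation for a data file is expected.

    Assumption: the annotation sits in the same location and differs from the
    data file only in its extension. Expects a URL whose path ends in a file name.
    """
    parts = urlsplit(data_url)
    sidecar = PurePosixPath(parts.path).with_suffix(".kim")
    return urlunsplit(parts._replace(path=str(sidecar)))


def pair_data_with_annotations(urls):
    """Map each data URL to its annotation URL when both were discovered."""
    available = set(urls)
    return {
        url: annotation_url(url)
        for url in urls
        if not url.endswith(".kim") and annotation_url(url) in available
    }


discovered = [
    "https://example.org/data/dem.tif",
    "https://example.org/data/dem.kim",
    "https://example.org/data/rainfall.csv",  # no annotation: ignored
]
print(pair_data_with_annotations(discovered))
```

Only `dem.tif` is paired here, because `rainfall.csv` has no matching `.kim` file next to it; under this convention, unannotated data would simply not be picked up.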