The problem

As our world has grown more interconnected, society’s challenges have also become more complex – food, water, and energy security, human migration, natural disaster prediction and response, climate change, and beyond. These challenges demand scientific solutions that empower decision makers with models that draw on decades of scientific information, integrating diverse data in near real-time with high predictive power.

Such approaches could integrate the diverse measurements taken daily across our world – data from space-borne satellites, sensor networks spread across our cities and oceans, and social media and other crowdsourced data provided by people across the world.

Unstructured big-data approaches have been used with great success in fields ranging from internet search to advertising to shipping and health care. Yet the complexity of scientific data makes such approaches inappropriate for rigorous, predictive, multidisciplinary scientific models.

This leads to today’s status quo: low reusability of scientific knowledge, with knowledge growing too slowly to address urgent societal challenges.

Scientific models are frequently developed across diverse fields, but many are accessible only to their developers, and lack the transparency to be well understood by the public and decision makers.

Technical solutions exist for the problem of integrating scientific knowledge, but have often been developed and applied in piecemeal fashion. Innovative approaches, including open data repositories, collaborative modeling, and ontologies to address data interoperability challenges, are largely confined to narrow disciplines. Scientists have thus succeeded in improving knowledge reuse on a limited scale, but not in ways that will support predictive modeling of complex societal challenges.

What if our data and models could talk to one another, and decision makers could use scientific information to more quickly and reliably answer questions about today’s most urgent problems?

The approach

Integrated modeling is an approach for the reuse and connection of scientific information that promises open knowledge for better decisions.

Over the last ten years, our work has steadily advanced toward a full implementation of the semantic web, with great promise for both science and society. Our approach builds on recent advances in open data and data integration, described in brief below and in detail in the rest of this site.

We view integrated modeling as an implementation of the semantic web, a subset of the World Wide Web where data and models exist as first-class research objects that can be found online and read and understood by both humans and computers.

We begin by conceptualizing integrated modeling worldviews. A worldview contains shared semantics that can address large cross-domain problem areas – for instance, linked socio-environmental systems.

The semantics that accompany a worldview are then used to annotate integrated modeling resources, i.e., data and models. Semantics are by design highly modular, parsimonious, and logically consistent, and allow linking to established authorities – wrapping vocabularies and thesauri that are already well-accepted for a particular scientific field or domain. As the number of individuals annotating data and models grows, more and more knowledge becomes available, which the system can use to simulate new phenomena with greater accuracy.

Using a web browser or integrated development environment, the user completes a simple action of observing a concept (e.g., elevation, streamflow, or human migration) within a specified spatiotemporal context. k.LAB, the underlying software stack, assembles the data and models needed to observe the concept, choosing from the best available data and models to match the context of interest.

Data and models that have been made interoperable through semantic annotation are assembled using artificial intelligence, i.e., machine reasoning over the semantics and intelligent ranking of the available options. User-encoded decision rules guide the selection of the best data and models for the context.
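The selection step described above can be illustrated with a minimal Python sketch. It is not k.LAB's actual API or algorithm: the `Context`, `Resource`, `covers`, and `resolve` names, the catalog entries, and the decision rule (prefer the finest resolution among resources that cover the context) are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical spatiotemporal context: a bounding box and a year."""
    bbox: tuple[float, float, float, float]  # (west, south, east, north)
    year: int


@dataclass(frozen=True)
class Resource:
    """Hypothetical annotated dataset or model."""
    name: str
    concept: str                             # semantic annotation
    bbox: tuple[float, float, float, float]  # (west, south, east, north)
    years: tuple[int, int]                   # inclusive range of validity
    resolution_m: float                      # grid cell size in meters


def covers(resource: Resource, context: Context) -> bool:
    """True if the resource fully contains the context in space and time."""
    rw, rs, re_, rn = resource.bbox
    cw, cs, ce, cn = context.bbox
    in_space = rw <= cw and rs <= cs and re_ >= ce and rn >= cn
    in_time = resource.years[0] <= context.year <= resource.years[1]
    return in_space and in_time


def resolve(concept: str, context: Context,
            catalog: list[Resource]) -> Resource | None:
    """Pick one resource for the concept, or None if nothing matches."""
    candidates = [r for r in catalog
                  if r.concept == concept and covers(r, context)]
    if not candidates:
        return None
    # Assumed decision rule: finest resolution wins; name breaks ties.
    return min(candidates, key=lambda r: (r.resolution_m, r.name))


# Invented catalog entries, for illustration only.
catalog = [
    Resource("global_dem", "elevation",
             (-180.0, -90.0, 180.0, 90.0), (2000, 2030), 90.0),
    Resource("regional_dem", "elevation",
             (-10.0, 35.0, 5.0, 45.0), (2010, 2030), 30.0),
    Resource("city_dem", "elevation",
             (-4.0, 40.0, -3.0, 41.0), (2015, 2030), 5.0),
]
context = Context(bbox=(-6.0, 38.0, -2.0, 42.0), year=2020)
best = resolve("elevation", context, catalog)
print(best.name if best else "no match")  # prints "regional_dem"
```

In this sketch the finest dataset, `city_dem`, is skipped because it does not cover the whole context, so the ranking falls back to the next-best option. A real system would presumably weigh more criteria than resolution alone.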

The results, including model inputs and outputs, are then delivered to the user in well-known file formats, along with a printable report documenting the data and models selected and run. Results are secure, with appropriate data access provided through a user certificate system, and traceable, with full provenance information provided to the user.

The solution

Our approach to integrated modeling combines the k.LAB open-source software platform and the k.IM semantic annotation and modeling language.

k.LAB software underlies both the integrated development environment and the integrated modeling web interface. The k.IM modeling language approximates, as closely as possible, plain-English descriptions of scientific data and models. This yields a shared, but extensible, formalization of a scientific language for integrated modeling. Advanced users can code additional model functionality in k.IM using the Groovy and Java programming languages.

Both k.LAB and k.IM are designed to be simple to use yet powerful and flexible, to support novice and advanced users alike, and to adhere to open data and open-source software principles. They thus support a spectrum of user types, as described below.

General, non-technical users run model workflows in an online interface by specifying their observable and the spatiotemporal context of interest. Non-technical users thus primarily reuse existing data and models. The provenance information supplied with their results maintains transparency, while built-in decision rules for data and model selection reduce the likelihood that inappropriate data and models are accidentally selected.

Intermediate-skill modelers can both reuse others’ data and models and contribute their own as they test new ways to link disciplinary data and models. k.LAB and k.IM support construction of small, independent models that can be reassembled to simulate more complex, linked phenomena. This allows modelers of various skill levels and interests to contribute to a growing base of data, models, and knowledge.
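As a rough illustration of how small, independent models might be reassembled into a linked one, the Python sketch below chains models by matching the concept each one observes to the concepts the others require. The `Model` class, the `observe` function, and the toy water-balance numbers are hypothetical and are not taken from k.LAB or k.IM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Model:
    """Hypothetical small model: observes one concept from others."""
    observes: str
    requires: tuple[str, ...]
    compute: Callable[..., float]


def observe(concept: str, models: dict[str, Model],
            cache: dict[str, float] | None = None) -> float:
    """Resolve a concept by recursively observing its dependencies.

    Raises KeyError if no model observes a required concept.
    This sketch does not detect circular dependencies.
    """
    cache = {} if cache is None else cache
    if concept not in cache:
        model = models[concept]
        inputs = [observe(dep, models, cache) for dep in model.requires]
        cache[concept] = model.compute(*inputs)
    return cache[concept]


# Three independent toy models with invented values (mm per year).
MODELS = {
    m.observes: m
    for m in (
        Model("precipitation", (), lambda: 1000.0),
        Model("evapotranspiration", (), lambda: 600.0),
        Model("runoff", ("precipitation", "evapotranspiration"),
              lambda p, et: p - et),
    )
}

print(observe("runoff", MODELS))  # prints 400.0
```

Each model here can be written and tested without knowledge of the others. Only the shared concept names link them, which is the role semantic annotation plays in the approach described above.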

Developers and other highly technical users can contribute more advanced scientific knowledge, extending and improving the knowledge base of both semantics and models on the integrated modeling cloud.

Finally, data providers can host existing scientific data in a way that maximizes its utility and reusability. By making hosted data able to interface with the k.LAB/k.IM integrated modeling platform, scientific data repositories allow it to be more quickly ingested and reused in complex scientific workflows.

In sum, our approach provides a collaborative, Wikipedia-like environment for scientific simulation powered by key components of artificial intelligence – semantics, machine reasoning, and machine learning. It also offers a path forward in modeling complex systems – one where disciplinary experts have the capacity to model key phenomena in their area of expertise, while a machine connects the building blocks of more complex models. The burden of complexity in integrated modeling is thus transferred from humans to AI.

The Integrated Modelling Partnership provides a vehicle for partner organizations to steer the direction of new growth and expanded functionality of this integrated modeling system – speeding the development of advanced integrated modeling features and capacity for users at all levels.