Showing content from https://www.w3.org/DesignIssues/ConnectingScience.html below:

Connecting Sciences - Design Issues

Tim Berners-Lee
Date: 2004-01-04, last change: $Date: 2004/01/13 20:17:21 $
Status: personal view only. Editing status: first draft.

Connecting the Sciences with the Semantic Web

Summary

It is interesting to use the Semantic Web for connecting the sciences because, increasingly, major problems can only be solved by using many fields at once; and because scientific information naturally tends to be "data", i.e. relational, logical and/or numeric in form, and so Semantic Web technology is easy to apply.

The need

No scientific discipline is an island. The fields of study to which we give names have fuzzy edges, and overlap one another. They are in fact connected in a loose web which evolves with time, as new fields arise, and we change our perceptions of existing ones. Consider physics, physical chemistry, organic chemistry, cell biology, proteomics, genetics, epidemiology, medicine, pharmacology: whereas one might be an expert in one without being an expert in all of them, one typically has to have a knowledge of neighboring fields.

Of the challenges which confront science, many interesting ones, particularly in the study of human biology, seem to require the tracing of pathways through many fields. In searches for cures for AIDS, for cancer, or for new viruses such as SARS, the amount of information to be brought to bear is huge, and it spans many disciplines.

Now, naturally, different fields have come up with different ways of modelling their data, different standards for recording it. This makes it very difficult to try out new ideas which cross fields: one has to negotiate for the conversion and transfer of data in each case. This is normal. It takes great time and effort to bring more than one group together to use common data formats and common vocabularies.

The solution

Semantic Web technology is designed specifically to overcome this problem in a decentralized fashion. That is, it is designed to allow conceptual connectivity between neighboring fields to be set up retrospectively and incrementally. Retrospectively, in that often the modelling has already been done in each field and the data already exists. The overlap of concepts is only partial, but adding the metadata which expresses that overlap where it does exist is valuable. Incrementally, in that one does not redesign the data models all at once, but instead works at the interfaces, progressively building links between related concepts.

The Semantic Web languages rise above the level of XML, at which document structure is defined, to the level at which the classes of real things in the field in question are defined, along with the relationships between them and their properties.
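The idea of linking existing data retrospectively can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `chem:` and `bio:` prefixes, every term name, and the plain `sameAs` predicate are hypothetical stand-ins, not real ontologies. Two fields keep their own data untouched; a small set of bridging statements, added later at the interface, lets a query draw on both.

```python
# Hypothetical vocabularies: the "chem:" and "bio:" prefixes and all term
# names are invented for illustration. Statements are (subject, predicate,
# object) tuples.

# Data that already exists in each field, modelled independently.
chemistry_data = {
    ("chem:compound/aspirin", "chem:molarMass", 180.16),
}
biology_data = {
    ("bio:drug/acetylsalicylic-acid", "bio:inhibits", "bio:enzyme/COX-1"),
}

# Added afterwards, at the interface: two names denote the same thing.
bridge = {
    ("chem:compound/aspirin", "sameAs", "bio:drug/acetylsalicylic-acid"),
}


def aliases(node, links):
    """Return every name reachable from `node` through sameAs statements."""
    names = {node}
    changed = True
    while changed:
        changed = False
        for s, p, o in links:
            if p != "sameAs":
                continue
            if s in names and o not in names:
                names.add(o)
                changed = True
            elif o in names and s not in names:
                names.add(s)
                changed = True
    return names


def describe(node, *datasets, links):
    """Collect (predicate, object) pairs about `node` across all datasets."""
    names = aliases(node, links)
    return {(p, o) for data in datasets for s, p, o in data if s in names}


facts = describe("chem:compound/aspirin", chemistry_data, biology_data,
                 links=bridge)
```

Neither dataset was redesigned: removing the bridge leaves each field's data exactly as it was, and adding further bridging statements extends the overlap one link at a time.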

Openness

During the early years of the WWW, one source of reluctance was the hesitation of companies to make information such as their catalogs or parts lists available to the general public. This hesitation evaporated when it became clear that only those companies about whose products information was freely available on the web were likely to be involved in any commerce at all. Currently, funders of science have been known to bemoan the disappearance of the original data upon which reports and papers were based. We discovered with the web of human-readable information that much of the benefit was serendipitous: information was used to advantage in ways that its publisher could never have imagined, and the enquirer who started off surfing for a particular solution often found solutions quite different from the one envisaged, not to mention solutions to quite different, but equally pressing, problems.

The history of science is peppered with discoveries made serendipitously - from the proverbial bath of Archimedes through the discovery of penicillin, to the discovery of the effect of Viagra. If we are to make new discoveries using information on a huge scale, we will need to emulate the openness of the minds of these researchers by making scientific data available in a Semantic Web so that crazy hypotheses can be tested in a few moments, harnessing data from many diverse fields.

Indeed, science itself is not an island, as, for example, an epidemiological survey often yields results when joined with geographical and economic data. The search for a disease outbreak could take one into weather patterns, corporate financial statements, or flight timetables. It is important that the scientific Semantic Web is seen as one interoperable part of the larger Semantic Web.

One particular aspect of openness is the lab notebook. The notebook is by tradition a write-only medium in which the scientist writes what he or she did, the environmental conditions at the time, and the results observed. Often such information fades but occasionally it becomes important after the fact. Semantic Web standards, and the use of Semantic Web-aware instrumentation, may make the recording of these incidental things easier. By analogy with the lab notebook, a research group may keep a lot of metadata which it may not wish to publish at the time, but which may be useful to posterity. For this information, we need to find a suitable policy which works for everyone involved.

In the longer term, the Semantic Web will by its existence highlight issues such as privacy, the anonymizing of clinical trial information, the protection of possibly security-sensitive infrastructure information, the meaning of copyright especially of compilations, and so on.

Early Steps

Although there is much work yet to do in developing Semantic Web technology, basic standards exist. The Web Ontology Language (OWL) allows ontologies to be written so they can be read and processed by machine; the Resource Description Framework (RDF) allows data to be published using OWL ontologies, so that the data itself can be published and re-used by others.
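As a rough illustration of what publishing data as RDF can look like, the sketch below writes two statements in N-Triples, a line-oriented RDF serialization. The `http://example.org/science#` namespace and the `Sample` and `temperatureKelvin` terms are hypothetical; only the `rdf:type` and `xsd:double` identifiers are standard. The helper functions are a minimal sketch that does not escape special characters in literals, so they are not a substitute for a real RDF library.

```python
# Hypothetical namespace and terms, used only for illustration.
EX = "http://example.org/science#"

# Standard identifiers from the RDF and XML Schema vocabularies.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def iri(value):
    """Wrap an identifier in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype):
    """Format a typed literal. No escaping is done (simplifying assumption)."""
    return f'"{value}"^^<{datatype}>'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) strings, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = [
    # "sample42 is a Sample", using a class from the hypothetical ontology.
    (iri(EX + "sample42"), iri(RDF_TYPE), iri(EX + "Sample")),
    # "sample42 was measured at 293.15 K", using a hypothetical property.
    (iri(EX + "sample42"), iri(EX + "temperatureKelvin"),
     literal(293.15, XSD_DOUBLE)),
]

document = to_ntriples(triples)
print(document)
```

Because every term is a web identifier, anyone who retrieves the data can look up the ontology that defines `Sample` or `temperatureKelvin` and combine the statements with their own.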

The building of the Semantic Web is a distributed, decentralized task. It behooves those of us who have information which may be useful to others to model it carefully, to discuss ontologies with our neighbors, and to expose the information on the web. (It would not be unreasonable to make such publication a condition of funding.) What can be done to encourage this in the early days, to get the snowball rolling?

Firstly, it would be useful to create some simple ontologies for common basic concepts of science. Weights and measures, the periodic table, physical constants, and simple molecules cry out for a standard description. This sort of data would be valuable as a basis for much more complex scientific data, but also would be a great resource for schools. This basic ontology and dataset would also be a service for other fields: one could see chemical data being used as a basis for hazard information, for food and drug information, and for the chemical supply industry, for example.
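A small sketch of how such a basic dataset might serve as a foundation for other work: a few periodic-table facts are held as statements, and a separate function reuses them to compute the molar mass of a simple molecule. The `elem:` prefix and term names are hypothetical, and the atomic masses are standard values rounded to three decimal places.

```python
# Hypothetical "elem:" vocabulary. Atomic masses are rounded standard values
# in unified atomic mass units.
element_data = {
    ("elem:H", "elem:atomicMass", 1.008),
    ("elem:C", "elem:atomicMass", 12.011),
    ("elem:O", "elem:atomicMass", 15.999),
}


def atomic_mass(symbol, data=element_data):
    """Look up the atomic mass recorded for an element symbol."""
    for s, p, o in data:
        if s == f"elem:{symbol}" and p == "elem:atomicMass":
            return o
    raise KeyError(symbol)


def molar_mass(composition, data=element_data):
    """Sum atomic masses for a mapping of element symbol -> atom count."""
    return sum(atomic_mass(sym, data) * n for sym, n in composition.items())


# A consumer in another field describes water and reuses the shared data.
water = molar_mass({"H": 2, "O": 1})
print(round(water, 3))
```

The consumer never copies the atomic masses; it only refers to the shared element identifiers, so a correction to the base dataset would reach every dependent calculation.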

Secondly, a few example datasets of great general value would be demonstrators of how things should be done, and would probably give rise to new tools and experience to be passed on. Geophysical, meteorological, pharmaceutical incompatibility, and genome data come to mind, but there must be many candidates for early adoption.

Initiatives to bring scientific data to the Semantic Web could originate with individual researchers, funding groups, journals, or scientific associations or academies. If the growth can be compared with that of the early WWW, it will occur wherever an individual person understands the potential long-term global benefit, and so finds a way to put in the short-term effort to make it happen locally.

Tim BL

