private thoughts about life behind data processing

Monthly Archives: September 2016


Some words about INSPIRE

The INSPIRE conference is under way, and as I’m giving two presentations there, it’s a great time to write down some thoughts about the directive. I’ve been somewhat active around the directive for four years now. My activities have focused on the practical side: the Finnish Meteorological Institute (FMI) has very extensive INSPIRE services, since the same services are used for both open data and INSPIRE.

INSPIRE is designed to ease data exchange between European countries. It is a hugely complicated EU directive that is neither widely known nor widely utilised, although by now all European agencies should provide INSPIRE services. In short: it requires all public spatial data providers to provide their data via OGC services in a certain form and to publish ISO 19115 metadata about their data. The required data formats vary a bit depending on the domain, but all formats are based on the Geography Markup Language (GML) and the Observations and Measurements (O&M) data model. The directive does not require open data.
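
To make "data via OGC services" a bit more concrete, here is a minimal Python sketch of what a request to such a service can look like. It only builds a WFS 2.0 GetFeature URL and makes no network call. The endpoint, the stored query id and the `place` and `parameters` arguments are assumptions chosen for illustration, not something the directive prescribes, and `build_getfeature_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint and stored query id, used here only for illustration.
WFS_ENDPOINT = "https://opendata.fmi.fi/wfs"
STORED_QUERY_ID = "fmi::observations::weather::timevaluepair"


def build_getfeature_url(endpoint, stored_query_id, **query_params):
    """Build a WFS 2.0 GetFeature URL that runs a stored query."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "storedquery_id": stored_query_id,
    }
    # Extra key-value pairs are passed through as stored query parameters.
    params.update(query_params)
    return endpoint + "?" + urlencode(params)


url = build_getfeature_url(
    WFS_ENDPOINT,
    STORED_QUERY_ID,
    place="Helsinki",          # hypothetical query parameter
    parameters="temperature",  # hypothetical query parameter
)
print(url)
```

The first three parameters are the same for any WFS 2.0 service, so only the endpoint and the stored query would change from one provider to the next.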

In general, there’s no single type of API to rule them all. While INSPIRE represents the OGC world with the O&M data model, WFS, WMS and other three-letter standards, many modern edge developers would like to see much simpler REST APIs with JSON-based data models. There’s also always an old guard who prefer FTP and more traditional data models and file formats. BUFR and GRIB, for example, are popular in the meteorological domain.

All types of technology have their upsides and downsides. The old guard’s file-based FTP systems have worked for decades. It’s a stable technology and we have ready tools to implement and use it. But generating and moving files for every transaction quickly falls short in a modern service-based architecture. It’s better to move applications near the data than data near the applications.

Modern Edge

Modern edge developers prefer everything quick and easy. They don’t want to bother with FTP, complicated command-line tools or C libraries to process data for their specific use case. Several Python and JavaScript libraries, together with Node.js and other modern web development tools, have enabled rapid prototyping and application development. A simple RESTful JSON interface combined with these powerful, easy-to-use development frameworks is indeed an excellent combination for creating small applications and web pages.

But most JSON-based data formats and REST interfaces quickly fall short in serious geospatial business, partly because of their technical features and mostly because they oversimplify things. Some of these aspects are listed below:

  1. There are very few standards in the REST-JSON world. For example, there’s no standard way to select an area of interest (a bounding box, for example), a time range or an output format. Every single client has to be developed bespoke, and every data set has to be mapped together case by case. It’s fast to develop a single client, but it doesn’t take you far.
  2. For the standards that DO exist, there isn’t enough expressive power. For example, the GeoJSON community has gone so far in simplification that only the WGS84 CRS is allowed, although it can only support an accuracy of a couple of hundred metres. I hope the future autopilots in aeroplanes are not based on GeoJSON.
  3. Typical JSON interfaces don’t specify a standard way to indicate missing data (as opposed to a broken pipeline) or changes in data content. Environmental observations may be missing for countless reasons. For example, snow depth may be missing because the observation station is down, the station has no sensor for snow depth, or it’s summer. A JSON response typically just says ”not a number”.
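
The missing-data point in item 3 can be illustrated with a small Python sketch. It is not taken from any standard: the `Observation` class, the `MissingReason` codes and the field names are all hypothetical, and the only purpose is to show what a response could carry if "no value" came with an explicit reason.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MissingReason(Enum):
    """Hypothetical reason codes, mirroring the snow depth example."""
    STATION_DOWN = "station_down"
    NO_SENSOR = "no_sensor"
    OUT_OF_SEASON = "out_of_season"


@dataclass
class Observation:
    station: str
    parameter: str
    value: Optional[float]
    missing_reason: Optional[MissingReason] = None

    def __post_init__(self):
        # Exactly one of value and missing_reason must be given.
        if (self.value is None) == (self.missing_reason is None):
            raise ValueError("give either a value or a missing_reason")


def to_json(obs):
    """Serialise an observation so that a gap is explained, not just empty."""
    reason = obs.missing_reason.value if obs.missing_reason else None
    return json.dumps({
        "station": obs.station,
        "parameter": obs.parameter,
        "value": obs.value,
        "missing_reason": reason,
    })


measured = Observation("Station A", "snow_depth", 12.0)
gap = Observation("Station A", "snow_depth", None, MissingReason.OUT_OF_SEASON)
print(to_json(measured))
print(to_json(gap))
```

With something like this, a client can tell "it is summer" apart from "the station is broken" without a human reading the documentation first.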
Some interfaces and data types mapped against real life interoperability vs. rich interaction

In addition, it’s hard to create intelligent programs on top of overly simple data services, since there’s no way to know what the content really is: a human is always needed to interpret it. Unless you happen to be as wise as Google.

INSPIRE

INSPIRE, allied with OGC, tries to tackle the challenges of both the old guard and the modern edge developers. It comes from the top as an order, and it’s not very popular. I can see why. When I first saw INSPIRE-compliant data, I hated it (largely because of the overhead). The data format is too complicated, and one needs a hell of a lot of time to get familiar with it. It doesn’t help that it’s also very poorly documented from the user’s point of view.

Why is INSPIRE actually a pretty good thing?

First, INSPIRE provides one framework for all European geospatial data providers to follow. Second, it’s based on standards. After wading through all the theoretical jargon and actually managing to develop an INSPIRE-compliant client (sure, this takes more time than a simple REST-JSON client), one should be able to use (almost) the same client for ALL European geospatial services.

At least if INSPIRE is really followed in all countries. Some data specification experts seem to have adopted a do-as-little-as-possible strategy. They have tried to advance only their own nation’s goals, not the common European goal. Thus, no complete harmonisation has really been achieved for large data sets such as images or meteorological data.
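
The "one client for all services" idea can be sketched in a few lines of Python. The sketch assumes that a service returns its time series as WaterML 2.0 `MeasurementTVP` elements (one O&M-based encoding). The XML fragment is hand-written for illustration and the helper name `read_time_value_pairs` is hypothetical. A real response wraps these elements in many more layers, which is why the parser searches the whole tree instead of relying on a fixed path.

```python
import xml.etree.ElementTree as ET

WML2 = "http://www.opengis.net/waterml/2.0"
NS = {"wml2": WML2}

# Hand-written fragment for illustration, not a real service response.
SAMPLE = f"""
<wml2:MeasurementTimeseries xmlns:wml2="{WML2}">
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2016-09-27T06:00:00Z</wml2:time>
      <wml2:value>9.4</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
  <wml2:point>
    <wml2:MeasurementTVP>
      <wml2:time>2016-09-27T07:00:00Z</wml2:time>
      <wml2:value>9.9</wml2:value>
    </wml2:MeasurementTVP>
  </wml2:point>
</wml2:MeasurementTimeseries>
"""


def read_time_value_pairs(xml_text):
    """Return (time, value) tuples for every MeasurementTVP in the document."""
    root = ET.fromstring(xml_text)
    pairs = []
    # iter() walks the whole tree, so the nesting depth does not matter.
    for tvp in root.iter(f"{{{WML2}}}MeasurementTVP"):
        time = tvp.findtext("wml2:time", namespaces=NS)
        value = tvp.findtext("wml2:value", namespaces=NS)
        pairs.append((time, float(value)))
    return pairs


for time, value in read_time_value_pairs(SAMPLE):
    print(time, value)
```

As long as providers stick to the same element names, the same function works no matter which country’s service produced the document. That is exactly the part that breaks when the specifications are followed only loosely.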

Another challenge with INSPIRE is that the data formats do not provide very good performance, mostly because of the metadata they carry. (OK, the overhead compresses very well, but still: all the compression and decompression require a lot of CPU resources, and the uncompressed data is still slow to handle.) But INSPIRE is meant for data exchange, not to serve as the back end of a client. It’s meant to let a single client work with several different types of data. That kind of interoperability can’t be achieved with a very simple design.
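
The overhead argument can be tried out with a rough Python sketch. The values are made up and the element names are simplified placeholders, not a real INSPIRE schema. The point is only to compare the same twelve observations as compact JSON and as verbose markup, before and after gzip.

```python
import gzip
import json

# Made-up observations: one value per hour for twelve hours.
observations = [
    {"time": f"2016-09-27T{hour:02d}:00:00Z", "value": round(10.0 + 0.1 * hour, 1)}
    for hour in range(12)
]

# Compact encoding: a plain JSON array.
raw_json = json.dumps(observations).encode("utf-8")

# Verbose encoding: every observation wrapped in repeated markup
# (placeholder element names, not a real schema).
raw_xml = "".join(
    "<point><MeasurementTVP>"
    f"<time>{o['time']}</time><value>{o['value']}</value>"
    "</MeasurementTVP></point>"
    for o in observations
).encode("utf-8")

packed_json = gzip.compress(raw_json)
packed_xml = gzip.compress(raw_xml)

print("json:", len(raw_json), "bytes raw,", len(packed_json), "bytes gzipped")
print("xml: ", len(raw_xml), "bytes raw,", len(packed_xml), "bytes gzipped")
```

The repeated tags shrink a lot under gzip, which is the "compresses very well" part. The client still has to decompress and parse the full verbose document, so the CPU cost does not go away.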

Relative INSPIRE data model file sizes. Demonstration done with 138 weather stations, 11 parameters and 12 hours of data.

From the user’s point of view, there’s an even more serious challenge: there are very few INSPIRE-compliant clients. INSPIRE has come as a top-down order and it does not (at least not yet) have very wide community support. This makes utilising already somewhat complicated services even harder. Time may help us, but I wouldn’t hold my breath.

Conclusion

So, every type of interface has its pros and cons. Simple data formats and interfaces enable quick and simple development. But when nothing is given, nothing is gained. INSPIRE provides a good standards-based solution, but it has a lot of disadvantages. Most importantly, while OGC services with GML output have many excellent features, they have proven to be too complicated for many use cases. JSON and REST should not supplant standards; standards should (and will) start supporting JSON and REST.

Will INSPIRE succeed? That depends on how widely it’s accepted among data providers. For wider acceptance, standards and data models should adopt JSON and REST, and much clearer documentation with examples is required. After that we could expect to get some general OSS INSPIRE-compliant clients.

