» 2017 » helmikuu

Retrospective to FMI Open Data

8.2.2017 by r@roopetervo.com

Three years have passed since Finnish Meteorological Institute released it’s open data portal and now it’s good time for a retrospective. When the Open Data Project started, I was just finished my M.Sc thesis (although I had worked full time at FMI several years already) and I was ready to take a serious role in something cool. And serious role I took indeed. In the beginning, I worked in the project as an OGC² and INSPIRE³ specialist but soon I found myself as an overall architect and responsible for the Download Service.

It’s worth noting that this is my personal blog and this text tries to reflect on only my part and my opinions about the Open Data.

FMI’s open data initiative covers a significant amount of often updating data. The service contains weather, marine, air quality and radiation observations, weather and marine forecasts and different kinds of climate data. To browse the whole content of the data one can either access Data Catalog⁴ or Download Service’s stored query listing⁵. Many of the data sets are large (size up to ~30 gigabytes) and almost all of them have an update interval from 1 minute to 6 hours. This combination of large size and high update interval makes the weather data quite unique. FMI Open Data Portal is practically fully INSPIRE compliant with correct services, harmonised data formats and proper catalog.

INSPIRE directive require every EU state agency to share it’s geospatial data in certain format via certain services but it does not require open data. The simplest way to meet INSPIRE directive is to create a simple RSS-feed with link to downloadable items. More sophisticated services, relying heavily on OGC standards, has to be ready not until 21/10/2020⁶. After deadline, the metadata need to be shared via CSW-interface⁷ in ISO 19115 -format.⁸ The data has to be accessible via View Service, meaning WMS 1.3.0 interface⁹. Download Service has to be established to download the data itself. WFS 2.0¹⁰ is one obvious choice while some others can be also used. The data has to be provided in INSPIRE specific data models (derived from OGC O&M model¹¹) directly from download service or via specific Transformation Service.

At 2012, we were sitting at one large meeting room. There was maybe 20 of us. We were starting a project which was established to meet INSPIRE requirements. The data group had created several different scenarios about the data release but in the beginning we had no decisions.

One notable peculiarity in INSPIRE is that it provides these two different way to provide the services. While some people supported more RSS-feeds as they are simpler I recommended going to WFS straight from the beginning. Double work makes sense very rarely. Luckily I got some support and we decided to implement WFS interface to our data server SmartMet Server¹². There was also a clear consensus in the project team that we are hitting the INSPIRE requirements and possible open data with same solution. Double work makes sense very rarely.

Few months went on designing data formats, fulfilling the catalog and speculating about WMS view services. In august the Ministry of Transport and Communication decided to support FMI Open Data with 5,8 million euros¹³ (part of the budget increase was needed to compensate decreased data selling shares and part to the data releasing project). And things got going.

We had chosen our way. But how did we do? Was following INSPIRE specifications a good choice? Of course we did have other ways. We could had chosen RSS-feeds for INSPIRE and some other way for open data. Or maybe, just maybe, we could had chosen to work as a backend for our open data users as HSL does¹⁴.

Steven Adler, IBM Chief Data Strategist, specified requirements for well published open data in OGC meeting in January 2016:

It has to be linked to other data.
It has to be linked spatially (and has to be referable internationally).
The data has to be machine readable.
The data and the publishing process has to well governed.
The project has to be public; people has to be able to find the data.
The data has to be openly usable.

Note that this list does not contain easily usable, although it’s of course a good thing. Things should never be any more complicated than they need to be — but neither any simpler. Let’s see how we are answering to these requirements.

Machine Readability

Choosing to follow INSPIRE directives has gained some critics but also some praise. Critics has mainly came from private developers and praise from larger companies. In my previous post¹⁵ I speculated about good and bad sides of INSPIRE directive. It’s based on standards and includes a very good semantics structure but it’s unduly complicated. And probably because of complexity, it’s not very popular. The INSPIRE is also very poorly documented. There sure is a lots of documentation but very few practical examples.

But was INSPIRE a good choice? Simple data formats enables quick and simple development but comes soon short in features. Serious business requires reliable interfaces and using well defined generic standard is the best way to ensure reliable design. When fully implemented by all European countries, INSPIRE will provide a maximal amount of technical reusability. And it sure provides good semantics. Well designed open data has to be linked to other data, linked spatially (and georeferenced) and referable. One might think that these aspects are natural in weather data, but it’s not a case. Meta data used in FMI’s internal systems wouldn’t tell much to external user. After all, INSPIRE is designed 5 stars¹⁶ in mind.

Is INSPIRE commonly used for open data? No. Open Knowledge Foundation recommends some file formats¹⁷. Anyway, even though they march for openness, they don’t support OGC standards very much. That’s an serious issue, but another story for another rainy day. As a result though, open data portals are not following any standards which cause interoperability problems and huge amount of extra work to users.

Well Governed

Data governance is a serious issue. Unlike in many other business, weather industry has always lived from the data. There are long traditions to produce, manage, archive and consume data. But still, open data project improved our data governance. First of all, the project helped us to get all data in one place — behind one service. The project also improved our meta data practices. It forced us to describe our data sets very precisely. Our users don’t know our data as we do.

One specially good aspect¹⁸ in FMI Open Data Portal is, that open data is published by the same manner than FMI use it itself. This helps keeping data correct and up to date.

Maybe the most challenging aspect in opening the data is a culture change. Suddenly, People have to work in a showcase. The results of their work is public, their processes should be public. They get comments and critics. One example of this is the opening phase itself. In practice not all data can’t be opened at once. Selecting the data sets which are opened should be done in a transparent way¹⁹. The process should be clear and open for comments.

I think that change management can’t be emphasised enough!

Openly Usable

Even when talking about open data, ”openly usable” is not a trivial question. Open data ain’t free. Someone pays it and the funder typically want some kind of control and tracking. But all kind of control and monitoring decrease openness.

Also FMI requires its users to register. After registration, user gets an apikey which has to be added to every WMS and WFS requests. The requirement comes from need to report usage information. 5.8 millions is not a small amount of money — anyone investing this much, sure want’s to know what gives. The apikey requirement was also put in place to ensure that all can have similar access to the data. Unduly heavy usage can easily be tracked and shut down with the apikey.

Apikey requirement is quite typical. For example Facebook and Google use same kind of authentication. But registration has still gained some critique. Reasonable or not, the registration cause one severe implication that resources can’t be linked directly²⁰. One can’t give URIs to the data which makes it less referable and thus less usable. In practice personal apikey makes also documentation and presentations much more complicated when examples can’t be working and easily tested.

Another issue which can reduce usability is a licence of the data. FMI and Finnish Government in general²¹ is using CC4²² as a licence. It’s an open and free licence which requires only credits for the original author. There are only few more open common licences available but even still this licence has gained some critics.

Ideology behind open data movement is strong and well reasoned. Still, it’s good to remember that open data isn’t free and some compromises has to be made also from user side. Specially in the beginning when organisations are moving to open data they may need better control and more detailed information about their data usage. When ecosystem is ready and stable, things can be loosened more.

Public

Open data doesn’t help if nobody knows about that. FMI has a fortunate position that it’s very well known organisation. We didn’t need to advertise our data. But we sure had to work hard describing how to use the data portal. We kept workshops, press conference and speeches²³. We also participated in Apps4Finland²⁴ coding competition. We told about things in Facebook²⁵ and in Twitter²⁶. We also created some example programs^27> and JavaScript Libraries²⁸ to lower barrier in using the portal.

And of course we did quite a bit of documentation²⁹. I think this all went relatively well. People do know about our data and I haven’t met anyone who haven’t been able to use the portal. Our documentation is complete and thorough. But it still would need some practical step-by-step quick start guides. When documenting systems like this, it’s always good idea to find someone who don’t know the system beforehand to write the user’s guide.

Impacts

So, after getting things going full speed the whole FMI was working very hard to get everything ready. In March 2013 we opened a beta version of our portal and in June the portal got official status. We got all systems up and running and most data sets open. I got over hundred hours extra on working time account.

Did we do well? Measuring the impact is hard. Danish government did an impact analysis of they data before and after publishing the data as an open data³⁰. This is a clever approach but measuring the impact is still hard. Even getting the numbers is a serious task.

generic_graph

Our numbers are pretty good. At the moment we have over 10 000 registered users and over 5 data downloads every second. In total, over 300 million transactions have occurred since 2013. What’s also notable, the growth have been steady. The growth have never been very rapid though. Systems using the data are relatively complex and it takes time to implement them. Anyone publishing open data should be patient. Even three years is a short time to expect any significant results.

Another notable issue in numbers is that they are still very modest compared to FMI ’s Customer Data Service (providing the very same data for fmi.fi³¹, mobile applications³² and clients). While 300 million requests have occurred since 2013, at the same time over 32 billion requests have been served from Customer Data Service. Premium service is a vital aspect at the side of open data portal while serving the society.

Numbers are also hiding one distressing issue. Only 40-50 % of registered users are actually downloading something. One can only guess who are this majority who have registered but never downloaded anything. Some of them may have expected to get a nice user interface to fetch the data to excel. Some of them may have planned to do something at their free time but never managed to finish their project. Who knows what else.

Open data by definition means that it’s hard to keep touch with users. Apikey helps to get some granularity to the numbers but that’s pretty much all. That’s also one reason why it’s extremely important to be out there and talk with people. There’s no better way to know what’s relevant than talking to customers.

Personally, I find FMI Open Data as a success, both personally and from organisation point of view. But the World goes forward. One should not stop in publishing open data. Even large organisations often need partnerships³³ to get maximal affect with their data. Also open source code is required³⁴. To my delight, FMI is on very good track. But that’s another story.

¹https://en.ilmatieteenlaitos.fi/open-data
² http://www.opengeospatial.org
³ http://inspire.ec.europa.eu
⁴ http://catalog.fmi.fi/geonetwork/srv/en/main.home
⁵ http://en.ilmatieteenlaitos.fi/open-data-manual-fmi-wfs-services
⁶ http://inspire.ec.europa.eu/inspire-roadmap/61
⁷ http://portal.opengeospatial.org/files/?artifact_id=5929&version=2
⁸ http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798
⁹ http://www.opengeospatial.org/standards/wms
¹⁰ http://www.opengeospatial.org/standards/wfs
¹¹ http://www.opengeospatial.org/standards/om
¹² https://github.com/fmidev/smartmet-server
¹³ https://www.lvm.fi/lvm-mahti-portlet/download?did=89517
¹⁴ http://dev.hsl.fi/
¹⁵ http://roopetervo.com/some-words-about-inspire/
¹⁶ http://5stardata.info/en/
¹⁷ http://opendatahandbook.org/guide/en/appendices/file-formats/
¹⁸ http://opendatatoolkit.worldbank.org/en/technology.html
¹⁹ http://sunlightfoundation.com/opendataguidelines/#prioritization
²⁰ https://www.w3.org/TR/gov-data/#concepts.link
²¹ http://www.jhs-suositukset.fi/suomi/jhs189
²² https://creativecommons.org/licenses/by/4.0/deed.fi
²³ http://www.slideshare.net/tervo
²⁴ https://trello.com/b/Fg4ESbPs/apps4finland-2014
²⁵ https://www.facebook.com/fmibeta/
²⁶ https://twitter.com/meteorologit
^27>https://github.com/fmidev/metoclient-ui
²⁸ https://github.com/fmidev/metolib
²⁹ http://en.ilmatieteenlaitos.fi/open-data-manual
³⁰ http://inspire.ec.europa.eu/events/conferences/inspire_2014/pdfs/19.06_4_09.00_Tina_Svan_Colding.pdf & http://opendatahandbook.org/value-stories/en/danish-address-registry/
³¹ http://ilmatieteenlaitos.fi/
³² http://ilmatieteenlaitos.fi/palvelunumerot-ja-mobiilisaa
³³ http://sunlightfoundation.com/opendataguidelines/#partnerships
³⁴ http://sunlightfoundation.com/opendataguidelines/#open-code

Monthly Archives: helmikuu 2017

Retrospective to FMI Open Data

Machine Readability

Well Governed

Openly Usable

Public

Impacts

Viimeisimmät artikkelit

Arkistot

Kategoriat