private thoughts about life behind data processing

Retrospective to FMI Open Data

Three years have passed since Finnish Meteorological Institute released it’s open data portal and now it’s good time for a retrospective. When the Open Data Project started, I was just finished my M.Sc thesis (although I had worked full time at FMI several years already) and I was ready to take a serious role in something cool. And serious role I took indeed. In the beginning, I worked in the project as an OGC2 and INSPIRE3 specialist but soon I found myself as an overall architect and responsible for the Download Service.

It’s worth noting that this is my personal blog and this text tries to reflect on only my part and my opinions about the Open Data. 

FMI’s open data initiative covers a significant amount of often updating data. The service contains weather, marine, air quality and radiation observations, weather and marine forecasts and different kinds of climate data. To browse the whole content of the data one can either access Data Catalog4 or Download Service’s stored query listing5. Many of the data sets are large (size up to ~30 gigabytes) and almost all of them have an update interval from 1 minute to 6 hours. This combination of large size and high update interval makes the weather data quite unique. FMI Open Data Portal is practically fully INSPIRE compliant with correct services, harmonised data formats and proper catalog.

INSPIRE directive require every EU state agency to share it’s geospatial data in certain format via certain services but it does not require open data. The simplest way to meet INSPIRE directive is to create a simple RSS-feed with link to downloadable items. More sophisticated services, relying heavily on OGC standards, has to be ready not until 21/10/20206. After deadline, the metadata need to be shared via CSW-interface7 in ISO 19115 -format.8 The data has to be accessible via View Service, meaning WMS 1.3.0 interface9. Download Service has to be established to download the data itself. WFS 2.010 is one obvious choice while some others can be also used. The data has to be provided in INSPIRE specific data models (derived from OGC O&M model11) directly from download service or via specific Transformation Service.

inspire

At 2012, we were sitting at one large meeting room. There was maybe 20 of us. We were starting a project which was established to meet INSPIRE requirements. The data group had created several different scenarios about the data release but in the beginning we had no decisions.

One notable peculiarity in INSPIRE is that it provides these two different way to provide the services. While some people supported more RSS-feeds as they are simpler I recommended going to WFS straight from the beginning. Double work makes sense very rarely. Luckily I got some support and we decided to implement WFS interface to our data server SmartMet Server12. There was also a clear consensus in the project team that we are hitting the INSPIRE requirements and possible open data with same solution. Double work makes sense very rarely.

Few months went on designing data formats, fulfilling the catalog and speculating about WMS view services. In august the Ministry of Transport and Communication decided to support FMI Open Data with 5,8 million euros13 (part of the budget increase was needed to compensate decreased data selling shares and part to the data releasing project). And things got going.

We had chosen our way. But how did we do? Was following INSPIRE specifications a good choice? Of course we did have other ways. We could had chosen RSS-feeds for INSPIRE and some other way for open data. Or maybe, just maybe, we could had chosen to work as a backend for our open data users as HSL does14.

Steven Adler, IBM Chief Data Strategist, specified requirements for well published open data in OGC meeting in January 2016:

  • It has to be linked to other data.
  • It has to be linked spatially (and has to be referable internationally).
  • The data has to be machine readable.
  • The data and the publishing process has to well governed.
  • The project has to be public; people has to be able to find the data.
  • The data has to be openly usable.

Note that this list does not contain easily usable, although it’s of course a good thing. Things should never be any more complicated than they need to be — but neither any simpler. Let’s see how we are answering to these requirements.

Machine Readability

Choosing to follow INSPIRE directives has gained some critics but also some praise. Critics has mainly came from private developers and praise from larger companies. In my previous post15 I speculated about good and bad sides of INSPIRE directive. It’s based on standards and includes a very good semantics structure but it’s unduly complicated. And probably because of complexity, it’s not very popular.  The INSPIRE is also very poorly documented. There sure is a lots of documentation but very few practical examples.

But was INSPIRE a good choice? Simple data formats enables quick and simple development but comes soon short in features. Serious business requires reliable interfaces and using well defined generic standard is the best way to ensure reliable design. When fully implemented by all European countries, INSPIRE will provide a maximal amount of technical reusability. And it sure provides good semantics. Well designed open data has to be linked to other data, linked spatially (and georeferenced) and referable.  One might think that these aspects are natural in weather data, but it’s not a case. Meta data used in FMI’s internal systems wouldn’t tell much to external user. After all, INSPIRE is designed 5 stars16 in mind.

5stars-data

Is INSPIRE commonly used for open data? No. Open Knowledge Foundation recommends some file formats17. Anyway, even though they march for openness, they don’t support OGC standards very much. That’s an serious issue, but another story for another rainy day. As a result though, open data portals are not following any standards which cause interoperability problems and huge amount of extra work to users.

Well Governed

Data governance is a serious issue. Unlike in many other business, weather industry has always lived from the data. There are long traditions to produce, manage, archive and consume data. But still, open data project improved our data governance. First of all, the project helped us to get all data in one place — behind one service. The project also improved our meta data practices. It forced us to describe our data sets very precisely. Our users don’t know our data as we do.

One specially good aspect18 in FMI Open Data Portal is, that open data is published by the same manner than FMI use it itself. This helps keeping data correct and up to date.

Maybe the most challenging aspect in opening the data is a culture change. Suddenly, People have to work in a showcase. The results of their work is public, their processes should be public. They get comments and critics. One example of this is the opening phase itself. In practice not all data can’t be opened at once. Selecting the data sets which are opened should be done in a transparent way19. The process should be clear and open for comments.

I think that change management can’t be emphasised enough!

Openly Usable

Even when talking about open data, ”openly usable” is not a trivial question. Open data ain’t free. Someone pays it and the funder typically want some kind of control and tracking. But all kind of control and monitoring decrease openness.

Also FMI requires its users to register. After registration, user gets an apikey which has to be added to every WMS and WFS requests. The requirement comes from need to report usage information. 5.8 millions is not a small amount of money — anyone investing this much, sure want’s to know what gives. The apikey requirement was also put in place to ensure that all can have similar access to the data. Unduly heavy usage can easily be tracked and shut down with the apikey.

Apikey requirement is quite typical. For example Facebook and Google use same kind of authentication. But registration has still gained some critique. Reasonable or not, the registration cause one severe implication that resources can’t be linked directly20. One can’t give URIs to the data which makes it less referable and thus less usable. In practice personal apikey makes also documentation and presentations much more complicated when examples can’t be working and easily tested.

Another issue which can reduce usability is a licence of the data. FMI and Finnish Government in general21 is using CC422 as a licence. It’s an open and free licence which requires only credits for the original author. There are only few more open common licences available but even still this licence has gained some critics.

cc-license

Ideology behind open data movement is strong and well reasoned. Still, it’s good to remember that open data isn’t free and some compromises has to be made also from user side. Specially in the beginning when organisations are moving to open data they may need better control and more detailed information about their data usage. When ecosystem is ready and stable, things can be loosened more.

Public

Open data doesn’t help if nobody knows about that. FMI has a fortunate position that it’s very well known organisation. We didn’t need to advertise our data. But we sure had to work hard describing how to use the data portal.  We kept workshops, press conference and speeches23. We also participated in Apps4Finland24 coding competition. We told about things in Facebook25 and in Twitter26. We also created some example programs27> and JavaScript Libraries28 to lower barrier in using the portal.

And of course we did quite a bit of documentation29. I think this all went relatively well. People do know about our data and I haven’t met anyone who haven’t been able to use the portal. Our documentation is complete and thorough. But it still would need some practical step-by-step quick start guides. When documenting systems like this, it’s always good idea to find someone who don’t know the system beforehand to write the user’s guide.

Impacts

So, after getting things going full speed the whole FMI was working very hard to get everything ready. In March 2013 we opened a beta version of our portal and in June the portal got official status. We got all systems up and running and most data sets open. I got over hundred hours extra on working time account.

Did we do well? Measuring the impact is hard. Danish government did an impact analysis of they data before and after publishing the data as an open data30. This is a clever approach but measuring the impact is still hard. Even getting the numbers is a serious task.

generic_graph

Our numbers are pretty good. At the moment we have over 10 000 registered users and over 5 data downloads every second. In total, over 300 million transactions have occurred since 2013. What’s also notable, the growth have been steady. The growth have never been very rapid though. Systems using the data are relatively complex and it takes time to implement them. Anyone publishing open data should be patient. Even three years is a short time to expect any significant results.

Another notable issue in numbers is that they are still very modest compared to FMI ’s Customer Data Service (providing the very same data for fmi.fi31, mobile applications32 and clients). While 300 million requests have occurred since 2013, at the same time over 32 billion requests have been served from Customer Data Service. Premium service is a vital aspect at the side of open data portal while serving the society.

Numbers are also hiding one distressing issue. Only 40-50 % of registered users are actually downloading something. One can only guess who are this majority who have registered but never downloaded anything. Some of them may have expected to get a nice user interface to fetch the data to excel. Some of them may have planned to do something at their free time but never managed to finish their project. Who knows what else.

Open data by definition means that it’s hard to keep touch with users. Apikey helps to get some granularity to the numbers but that’s pretty much all. That’s also one reason why it’s extremely important to be out there and talk with people. There’s no better way to know what’s relevant than talking to customers.

Personally, I find FMI Open Data as a success, both personally and from organisation point of view. But the World goes forward. One should not stop in publishing open data. Even large organisations often need partnerships33 to get maximal affect with their data. Also open source code is required34. To my delight, FMI is on very good track. But that’s another story.


1https://en.ilmatieteenlaitos.fi/open-data
2
http://www.opengeospatial.org
3
http://inspire.ec.europa.eu
4
http://catalog.fmi.fi/geonetwork/srv/en/main.home
5
http://en.ilmatieteenlaitos.fi/open-data-manual-fmi-wfs-services
6
http://inspire.ec.europa.eu/inspire-roadmap/61
7
http://portal.opengeospatial.org/files/?artifact_id=5929&version=2
8
http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=53798
9
http://www.opengeospatial.org/standards/wms
10
http://www.opengeospatial.org/standards/wfs
11
http://www.opengeospatial.org/standards/om
12
https://github.com/fmidev/smartmet-server
13
https://www.lvm.fi/lvm-mahti-portlet/download?did=89517
14
http://dev.hsl.fi/
15
http://roopetervo.com/some-words-about-inspire/
16
http://5stardata.info/en/
17
http://opendatahandbook.org/guide/en/appendices/file-formats/
18
http://opendatatoolkit.worldbank.org/en/technology.html
19
http://sunlightfoundation.com/opendataguidelines/#prioritization
20
https://www.w3.org/TR/gov-data/#concepts.link
21
http://www.jhs-suositukset.fi/suomi/jhs189
22
https://creativecommons.org/licenses/by/4.0/deed.fi
23
http://www.slideshare.net/tervo
24
https://trello.com/b/Fg4ESbPs/apps4finland-2014
25
https://www.facebook.com/fmibeta/
26
https://twitter.com/meteorologit
27>
https://github.com/fmidev/metoclient-ui
28
https://github.com/fmidev/metolib
29
http://en.ilmatieteenlaitos.fi/open-data-manual
30
http://inspire.ec.europa.eu/events/conferences/inspire_2014/pdfs/19.06_4_09.00_Tina_Svan_Colding.pdf & http://opendatahandbook.org/value-stories/en/danish-address-registry/
31
http://ilmatieteenlaitos.fi/
32
http://ilmatieteenlaitos.fi/palvelunumerot-ja-mobiilisaa
33
http://sunlightfoundation.com/opendataguidelines/#partnerships
34
http://sunlightfoundation.com/opendataguidelines/#open-code