Welcome to meeting No. 20 of the Standing Committee on Government Operations and Estimates. We are continuing our study on government's open data practices. We have several witnesses with us today, starting with Ms. Lyne Da Sylva, Associate Professor, School of Library and Information Science, Université de Montréal.
We also have with us via videoconference from Oxford, Mr. Richard Stirling, International Director, Open Data Institute, in the United Kingdom. From Paris, France, we have Ms. Barbara-Chiara Ubaldi, E-Government Project Manager, Organization for Economic Cooperation and Development, and via videoconference, from Sheffield, Ms. Joanne Bates, Lecturer in Information Studies and Society, at the University of Sheffield in the United Kingdom.
As is our custom, I will remind the witnesses that they may make opening remarks for a maximum of 10 minutes. Following that, committee members will ask questions of the witnesses.
With no further delay, I would like to welcome Ms. Da Sylva, who is with us in the room today. We are ready to hear your opening remarks as they relate to our study on the government's open data practices.
Thank you for being with us this morning.
Ms. Lyne Da Sylva:
Thank you for this invitation.
I was told that it may be a good idea for me to introduce myself first in order to assist you in your questions, which I will be happy to answer afterwards in either English or French.
I am a bit of a strange beast. My training has been in several areas. I completed a bachelor's degree in mathematics and computer science, then a master's and a doctorate in linguistics, with a focus on artificial intelligence. This led to my work in what is called natural language processing, that is, the use of computers to understand texts written in French, English, Italian, and so on, for purposes such as translating them or automatically correcting or processing them.
Among other things, I worked in the private sector as a natural language processing, or NLP, software developer. I am currently a professor at the School of Library and Information Science. I was hired under its digital information management envelope. Digital information is really our main theme.
My current expertise is in two areas. First, I work on natural language processing as it applies to document management. Second, I am focusing more and more on digital libraries for document collections, whether library documents, archives, museum documents, or other kinds of documents, and on their access functions. Certain websites and databases also fall under digital libraries, and collections and data sets are examples of them. I am particularly interested in these issues from that perspective.
I have based my opening remarks this morning on the five questions I received. I just wanted to give you an introduction first.
We talk about open data, linked data, linked open data, and RDF data. They don't all mean the same thing, and data can be more or less open. Publishing data is not enough for it to be an excellent example of open data. The best example, and the best format, is RDF, which is machine-readable and machine-operable.
Several jurisdictions publish data, but that data is not necessarily in an easily usable format. There are degrees of usability in what is provided.
Another term in use is big data, which is something different again. That term refers to research based on massive data sets. Even though it is different, one can expect that the advent of enormous quantities of data will significantly change people's attitudes towards knowledge and the uses that can be made of it. That will change everything.
The first question was how the Government of Canada compares to other jurisdictions, in Canada and abroad. I compiled some data in a table that is in the notes that I gave to the committee. It includes data on the availability of data from governments in Canada and abroad.
The results are quite variable both in terms of the number of data sets and degree of real openness. Some governments publish their documents as zipped PDF images, which is not necessarily the most desirable format for open data.
I am not going to go over the table in detail. I would say quite briefly though that the United Kingdom is known internationally for its extensive publication of data, including a large quantity of truly open RDF data. The number of data sets is approximately 17,000.
Canada has over 190,000 data sets, which is far more. On the other hand, Canada's data is less open: more of it is published as zipped files or geographical maps. There is currently exactly one data set in RDF, which is a little sad. The table describes much of this data, and it would take too long to go over it now.
I have also pointed out a website, Linking Open Government Data, which has ranked a number of countries. It puts Canada in second place for publishing data sets.
Clearly that ranking is based on the number of available data sets, but not necessarily on the ease with which those data can be accessed.
I am now going to answer the second question, that is, how does this compare with what the private sector is collecting and making available.
Obviously, public administrations do not publish the same kind of data as the private sector. They publish information on the activities of the public administration, public services management, natural resources, and so on. The private sector is much more reluctant to share its data, for quite obvious reasons: businesses are afraid of losing their competitiveness. Many incentives are offered to the private sector to meet certain consumer expectations, because consumers want companies to be more transparent and environmentally responsible, among other things. The public sector acknowledges that this can lead to some risk sharing. For example, insurance companies and pharmaceutical companies can benefit from other businesses' data in order to improve their competitiveness.
The third question is how can proper use of public data stimulate job creation and economic added value? The availability of open data clearly encourages the development of various applications. However, one should not only think of the money that can be made. Rather, one should consider public data as a new public service, just like libraries. That's the parallel that should be made, rather than considering this as an economic added value for the purpose of immediately making money.
The fourth question is how we can make sure that there is accountability and transparency while being prudent on privacy issues. A distinction must be made, as others also make it, between collective data, which can be open data when it is anonymous; private or personal data, which should be available to the individuals concerned but not to the public; and transformed data, which can be anonymized before being published. It's important to define a set of confidentiality principles to manage this.
The last question is how we can make sure that public data serves the needs of the population of Canada? I have identified four potential ways of doing that. We can have new public officers, for example a chief data officer or something similar. Obviously there has to be a public and transparent official policy along with new structures, such as citizens' advocacy groups. Furthermore, we need to include the documentation sectors, that is, library scientists and archivists, who are used to managing data and taking into account user needs in order to improve their services.
Thank you.
I want to start, in the same way as the other witness, by giving a little bit of extra context about me. I was instrumental in the U.K.'s rollout of open data, working in the Cabinet Office to write the initial policy and also doing the first 12 months of delivery and release of data.
To my mind, a political opportunity in open data has been created by the G-8's work on the G-8 Open Data Charter, which all G-8 countries signed last year. This means that the biggest economies in the world will start releasing more and more data, and releasing it in a way that is useful. They're releasing data around core information assets, such as locations, times, and environmental information, in a way that can be combined with other data sets and across borders.
The first question this committee asked was what the value of this is. It's a huge opportunity. The McKinsey Global Institute published a report that put the value of this market at $3 trillion globally. Other reports cover smaller geographic regions and are of similar orders of magnitude. So the opportunity is enormous here.
The Open Data Institute, which I'm from, is a not-for-profit initially funded by the U.K. government. We were created to accelerate the benefits in the U.K. economy. We're here to bring economic, societal, and environmental benefits from open data, to answer the “so what?” question. We're here to make sure that there is some impact.
The way we do that is through training people, building capacity. We foster start-ups in our space. We have 10 open data start-ups as part of our program, employing 50 people—they were employing about 20 when they joined the program—and we convene academic, private sector, and public sector communities around particular problems and challenges and sectors.
Because we've only been going for 18 months, and it's still a very new sector, we have only a few examples so far of ways in which that $3 trillion figure stands up. One of our observations was that there were enormous macro benefits with big numbers attached, and lots of tiny companies, but very little in the middle. So in the last 18 months we've worked with others to identify £200 million in cash savings in the gross budget of our National Health Service. We've mapped out the corporate structures of the investment banks in the U.S.A., drawing together information from three different regulators to provide, in two months, insight that none of those regulators had themselves. We've worked with the Bank of England, the major financial regulator in the U.K., to prove that a data-rich, regulation-like approach can be taken to a market, in this case the new peer-to-peer lending market, which is now worth $1 billion a year.
Many of these examples come from taking open data, or data that was previously closed, and combining them. Many of the really interesting things happen at the intersection of open data and closed data, open data and big data, or open data and personal data.
That leads into some of the questions you were asking. How are businesses approaching this? Are governments ahead of business? Well, they are, at the moment. This is one of the few sectors in which the government is slightly ahead of industry. Through our work of convening industry and through our corporate membership program, we talk to an awful lot of businesses about how they're approaching the open data challenge and how they view open data as an opportunity.
It feels as though the conversations we're having with them are very similar to the conversations people were having inside governments about five years ago. We're starting to see the first big businesses releasing open data as part of their business as usual.
There are some great examples from the U.K., often brought on by adversity. Tesco, one of the major retailers, is committed to publishing open data about every bit of own-brand food they create. They're doing that to show the consumers what they're eating so they can rebuild trust in their products.
One of our members, Telefonica, is looking to release some of the population data it derives from the way mobile phones move around London during the day. We actually used that in one of our policy analyses to show the types of population in London and how they affected the allocation of public service resources, such as fire stations.
The next question you asked was around anonymization and how you can protect people's privacy in a landscape where open data is becoming ever more prevalent.
One of the organizations we're a member of is the UK Anonymisation Network. They do fantastic work checking people's work, asking all the questions about whether the right steps have been taken to protect people's privacy before any large data set is released. The £200 million in savings that I mentioned earlier is drawn from a data set that contains every prescription written in England and Wales. That could potentially disclose personal information, but the NHS Information Centre has already taken steps to check that its anonymization was done well, and that it can then be verified through this peer-review process at the UK Anonymisation Network, where statisticians check that all the right things have been done.
There is something called the open data barometer, which isn't quite large enough to be seen. You were asking how Canada compares to the rest of the world. Well, this is a nice visual representation of how Canada compares to the rest of the world on the release of data, particularly in terms of the data sets that are being requested and signed up to in the G-8. You can see that Canada is currently eighth in the world in the release of data. It has particular strengths for some of the core data that's being released, but it still has a little way to go on getting some of the social and economic benefits from the release of data.
I'd be very happy to send a link to this site to the committee so that everybody can see it.
In terms of how Canada could move up in the rankings and what my ideal ask would be, I think there are a couple of core data sets that could be usefully examined as to whether or not they could be released. We've done some work to try to make it easy for people to build services on the back of open data. An awful lot of work has gone into the technical standards around data release, and the previous witness talked about that.
We've put some work into the social side of data release. If you believe that open data is a raw material for the digital age, then as is the case with any raw material, you care about certainty of supply, you care about how often you're going to get a release, and you care about how much time and effort people will put into customer engagement, talking to you about how you use the data and what things are important to you. That's something we've tried to codify with open data certificates. We've given that away to the world.
The final thing I would leave you with is that this is a global market. It would be great if we could start tackling some of these challenges globally.
Thank you very much.
I would like to start by giving you a very brief overview of what we do at the OECD and what we've been doing on open data with the 34 member countries of the OECD and, increasingly, with non-member countries. I would like to clarify that we work with governments on our open data project, which concerns the release of data in open formats by governments. So we don't work with the private sector.
Our project started about two years ago, and I think it's important to underline that we started the project at the request of the governments. We have a group of CIOs who represent the governments of the 34 member countries of the OECD, including Canada, who asked us to look a little more in-depth at the strategies, implementation efforts, and the impact of creation efforts that they were putting in place. We produced a working paper highlighting key issues, and we conducted a data collection in 2013 across the countries to be able to see in more detail what governments were doing in terms of being strategic, developing quotas, but also in trying to achieve the value they expect to get out of their open data strategies and initiatives, and to measure these impacts.
I think it's very important to underline that what we found was that, within the community of practitioners both inside and outside government, there was, and still is, some confusion about definitions. For instance, there is much overlap between the activities of the freedom of information movement and those of the open data movement, and in the discussion of access to information and open data and how they complement each other. There is still some confusion between open data in the broader sense and open data applied within governments. There is still a little confusion between open data and big data, and some governments still tend to confuse data analytics and data mining with open data. We thought it was extremely important, and still think so, to work with governments to clarify the definitions they refer to as they progress in implementing open data strategies and initiatives.
Briefly, I would like to share with you some of the outcomes of the 2013 data collection we ran that highlights some of the key challenges that governments still deal with. These challenges are of different natures. There are policy challenges when it comes to the strategy, for instance—what kind of strategy and how to make sure that the strategy for open data aligns or is better integrated with social and economic development strategies, open government strategies, public sector reform strategies, and digital agendas for governments, for instance. There are technical challenges—how to, for instance, enable interoperability and integration that didn't exist, how is it possible to foster the linkage of data sets to be released in open formats, and all the related technical issues that governments are still dealing with in many instances.
But there are also organizational challenges that, according to our survey, remain some of the most important. For instance, administrations, unfortunately, still function very much in silos, meaning that different public institutions have a strong sense of ownership over the data sets they are responsible for producing, collecting, and distributing. This represented a big challenge in some countries when they started thinking about developing open data initiatives, because they encountered a certain level of resistance within public agencies.
Last but not least, there are challenges of a legal nature. The other witnesses mentioned, for instance, the relevance of privacy and security and how we deal with these issues. But these are not the only reasons it is important to look at the legal constraints that exist in some legislation, and I will provide two additional examples. First, access to information laws, or freedom of information acts, were adopted by many OECD countries decades ago. They are now being revised, for instance, to make sure that they also accommodate the need for open data, not just for access to information. Second, there are legal restrictions on the sharing of data within the public sector. Linked data sets can support data analytics, which can help identify trends to improve policy-making and service delivery, but some legal restrictions still do not allow different parts of the administration to access the various data sets.
Now when it comes to value, we saw that there are three main sets of value that governments are trying to achieve. As an organization we do not advocate for any approach or for any value sets, but I think it's important to underline that there is economic value that can be achieved through open data in the wider economy.
The other witnesses mentioned, for instance, the ease with which business start-ups are created. I would also add the emergence of new types of private sector businesses, for instance the so-called infomediaries, which make open data relevant to a wider group of citizens who, in many instances, would not know how to get the most value out of the raw data sets being made available.
There is economic efficiency that can be gained within the public sector: improved service delivery, improved performance, and improved efficiency in internal dynamics. There is also social value, for instance in empowering citizens to make more informed decisions about their own lives. It has to do with a different type of engagement and participation in policy-making and service delivery.
Last but not least, there is a third set of values, what we call good governance value or political value. In other words, the push for greater transparency, greater accountability, and greater responsibility of governments.
We at the OECD are now looking at the next steps we would like to accomplish in collaboration with other international organizations, with institutions like the ODI, and within internationally collaborative contexts like the OGP, the G-8, and the G-20. Our big focus right now is on supporting the further strengthening of strategic approaches and implementation, but also on assessing value creation and impact. We believe this matters because governments keep investing, and let's not forget that open data is not free: there is a financial cost for governments.
It's important to keep an eye on the value being created and on how that value is measured. We are part of the OGP's working group on open data, so we collaborate not only with other international organizations but also with governments and institutions to make sure this effort moves ahead internationally, rather than working only with individual governments.
So now I come to the questions that you asked. How does Canada stand in relation to other jurisdictions? Certainly we saw Canada being grouped among the countries of the OECD that we defined as quick followers, meaning there have been a group of countries that have been the pioneers, the U.K., the U.S. They have been excellent in being ambitious in this context right from the beginning.
Then we have other countries that have taken other approaches. We also have countries that have been, like I said, the quick followers. I can mention for instance France, Mexico, and Canada, which have caught up quite quickly, even if at different levels than the other countries, in following up what have been the good examples set by, for instance, the U.K. and the U.S.
In that sense, I think Canada has added a great deal of value by linking open data with open government, linking its digital government strategy with its open data strategy, adopting an approach that nurtures internal collaboration, and creating a committee that gathers representatives from the various jurisdictions.
A big focus has also been on improving the portal, from its first version to the new version released in June 2013, which increases not only the accessibility of the data sets but also the use of social media features aimed at increasing citizen engagement.
When it comes to value creation, which I think is also one of your questions, that is, how we make open data valuable for the Canadian community, one key area where we see a need for OECD member countries, and perhaps Canada, to strengthen their efforts is in understanding the demand for data.
If you consider the three sets of value mentioned, there are different data users in the community of users, which may have different needs. So knowing the demand is important. Nurturing the demand is important. Nurturing the engagement in the use of the data is essential to produce the value.
This matters going forward. For instance, in the data collection we conducted last year, Canada ranked among the governments with the highest number of data sets available. But as one of the other witnesses also mentioned, I think it's very important now to move ahead on the level of openness and the visibility of these data sets, which have an important impact on value creation.
Last but not least, I would like to address the point on privacy that you asked about. In addition to what the other witnesses mentioned, I think that to protect privacy it is extremely important to have clear guidelines for public servants. Remember that public servants are key actors in the ecosystem; therefore, keeping the focus on training civil servants and raising their awareness of the privacy breaches that may result from actions they take in relation to open data is essential.
This is increasingly essential as social media efforts are combined with open data efforts and with mobile government efforts, that is, the increasing use of mobile technologies within government, because all of a sudden we start merging the domains that are relevant to producing value from open data. It's very important to remember that civil servants need to be aware of the security and privacy risks that emerge from linking these three domains.
Finally, yes, I agree with the previous witnesses that governments are, in a sense, ahead of businesses in these respects. But I would not want to be unfair by comparing government with the private sector in terms of how much they are opening up, because there are important privacy and security concerns about data sets owned by governments, which are very different from those about data sets owned by some private sector entities. Comparing the two is important, but I think it's even more important to keep comparing governments across the world to make sure that best practices are shared and replicated.
Thank you.
Ms. Joanne Bates:
Thank you very much for inviting me and hello from Sheffield.
I'm a lecturer in information politics and policy. I've been researching the politics of open government in the U.K. for the last few years now. What I've decided to concentrate on in my opening presentation today are the two themes that I saw emerging in the questions that were presented to the panel by the committee.
First of all, I'm going to talk a little bit about how Canada compares to other jurisdictions; and secondly, I'm going to talk about this issue of generating value from open government data.
The first question, then, is how does Canada compare to other jurisdictions? There are a number of different methods we could use to compare countries' open data initiatives. A very simple approach is the one taken by the open data index, a project supported by the Open Knowledge Foundation. It basically just compares a number of different data sets that different countries have opened in different categories. Under this method, Canada comes out 10th overall out of 70 countries, so it's doing pretty well there.
A more complex approach is the one Richard mentioned, the open data barometer, a project supported by the Open Data Institute and the Web Foundation and published last year. Its more complex methodology looks at open government data readiness, implementation, and impact across different countries. Under this methodology, Canada ranked eighth out of 77 countries, so it's doing a little bit better in this sense.
Now, the researchers behind the open data barometer used a number of different methods to collect the data. One was an expert survey across all the countries, and they used quite a robust methodology to gather and analyze this data. I think this is the best comparative data we have at the moment. What it suggests is that Canada has a very well-resourced open data initiative, but in terms of government support for incentivizing reuse, for example through competitions and grants, Canada is perhaps a little lower than some other countries. Also in terms of the training that's available for potential reusers in Canada...[Technical difficulty--Editor]...from the experts that Canada is a little bit lower there as well.
That's how Canada fares on implementation in the open data barometer. In terms of impact, as Richard also said, Canada seems to be doing comparatively well on the political impact, and even the economic impact, of open data. Although Canada scores only 3 out of 10 here, that actually compares quite well and brings Canada to joint eighth overall. But in terms of social impact, which includes things such as environmental sustainability and the inclusion of marginalized populations in policy-making through the use of open government data, Canada scores relatively low: 0 out of 10 for environmental impact and 2 out of 10 for social inclusion. Relatively speaking, that means Canada is doing quite poorly on environmental impact but is about average on impact for socially excluded populations. There's been very little impact from open government data on social exclusion issues.
What this study also highlights is that this is quite a similar pattern to what we're seeing in the U.K. In the graph that Richard showed earlier, the U.K.'s pattern is very similar as well. The social impact of open government data in both the U.K. and Canada is a lot lower relative to the observed economic and political impacts. This suggests that perhaps not enough is being done in both Canada and the U.K. to enhance that social impact from open government data.
This pattern is not the same in every country. For example, in the U.S.A., Sweden, and New Zealand, those countries are scoring much better relatively on the social impact in relation to the political and economic impacts, which suggests that there might be interesting best-practice cases and similar things that you could use from those countries if you're interested in increasing the social impact of open government data.
Now what I would also point out is that both of these studies, the open data index and the open data barometer, are very quantitative studies that are interested in ranking countries against each other. My research is interested in the political drivers behind open government data.
I'd say there's a real need for further comparative political research into the drivers behind open government data across different countries. I think we really need to be asking who is benefiting from specific decisions in different jurisdictions. Who is being empowered and disempowered as a result of where the boundary is drawn between open and closed data in different countries? Who is being empowered and disempowered as a result of where the investment is being made and where the reuse of open government data is being incentivized? And what do the regulatory contexts in different countries allow or prohibit in terms of open government data reuse?
That takes me on to thinking about the potential value to be generated from open government data. I just want to state quite explicitly there's no simple linear trajectory from opening up data to generating positive societal impact. A lot of other things go on within that space as well.
In terms of economic value, lots of claims have been made based on economic modelling. Richard referred to the McKinsey report. There has been other research done as well, such as Rufus Pollock's work in the U.K., but there are still a lot of uncertainties in terms of the conclusions this research comes to.
In terms of the headline figures that research like this promotes, such as x trillion pounds being added to the global economy or £6 billion to the U.K. economy, I think we need to remember that not all economic growth is good growth. Open government data can lead to the production of all sorts of exciting, innovative, socially beneficial products and services. Equally, it can be used to develop products and services that have negative social implications even though they generate substantial profits and might contribute a lot to GDP.
One example I'm thinking of here is the weather derivatives market, which is heavily dependent upon open weather data but has a very questionable relationship with climate change mitigation.
So that's the economic value.
In terms of generating social value, which is an area where the open data barometer suggests Canada is relatively weak, I think what we really need to see is investment in an infrastructure that brings together organized civil society, local communities, researchers, and other domain experts with open data, both to source data sets from public bodies and to advise those bodies on collecting data that would be useful, and to develop methods of data analysis and create tools and resources that can engage with and critically inform common concerns.
We're starting to see a little bit of this in some of the work that the Open Data Institute does, but I think that could go further and be more widespread as well.
In conclusion, I just want to reiterate really that we need to avoid the assumption that there is this simple linear trajectory from opening data to generating positive societal impact. When making policy decisions, I think it's important to think about what specifically you're aiming to achieve with open data, and then think about the wider policy ecology that needs to be thought about in order to make that happen.
Thank you.
Ms. Barbara-Chiara Ubaldi:
Thank you for your question.
We try to make governments do it in a way that is right, not just because it's the flavour of the moment. So I think your question is extremely important. We think there is value and a positive impact, although that still needs to be fully demonstrated. In some areas, like the social area that some of your colleagues mentioned before, it cannot possibly be quantified. Still, it's an extremely important set of values to target.
For instance, in terms of social value, there are certainly more and more examples showing how open data has increased the participation and engagement of parts of society that otherwise would not be brought into discussion and dialogue with governments on service delivery and policy-making. However, that requires governments to focus not only on the usual interlocutors in this area, for instance the private sector, but also on other actors in the ecosystem, such as journalists, civil society organizations, citizens' associations, librarians, and so on, non-typical groups of actors who need to be reached out to.
From the perspective of the OECD, we are focusing so much on this not because many governments have pushed it up the agenda, but because it can change the way governments conceive a number of actions, ranging from policy-making to service delivery. The challenge is big, so I cannot tell you that the value has been demonstrated. There are important estimates that my colleagues mentioned. There's no clear data yet that demonstrates the value, but there are a number of examples from all levels of jurisdiction showing changes in the way governments interact with society in creating economic and social value.
Last but not least, in terms of transparency and increased trust, there is a tendency showing that the higher transparency and openness of governments in releasing key data with information on the operation—