:
We will begin our 22nd meeting.
Today, we are hearing from two witnesses by videoconference. We have a connection with Michael Chui, partner at the McKinsey Global Institute, joining us live from Miami, the United States. Afterwards, we will establish a connection with the other witness, Mr. Baker, the Chief Executive Officer of the Chicago Open Data Institute.
As usual, we will let our witnesses make their presentation for our study on open data. Afterwards, the committee members will have an opportunity to put questions to the individual of their choice.
Mr. Chui, thank you for joining us.
[English]
You will have 10 minutes for your presentation.
[Translation]
Go ahead.
:
Thank you for the honour of being able to spend some time with you. Even though I'm in Miami now and I live in San Francisco, I actually was privileged enough to have grown up in Burlington, Ontario. It is truly an honour to be able to interact with this committee. I did prepare a few brief remarks, which I'm happy to share with you, but I'm actually looking forward to the conversation.
As introduced, my name is Michael Chui. I'm a partner at the McKinsey Global Institute, which is McKinsey and Company's research arm. I lead some of our firm's research on the impact of long-term technology trends. Basically I'd like to share with you a few of the findings from some of the research we conducted.
We published a report in October entitled “Open data: Unlocking innovation and performance with more liquid information”. Clearly, as I think people on the panel are aware, open data has become an increasingly important trend around the world, with over 40 countries having implemented open data portals. While a lot has been written about the importance of open data to unlock transparency as well as accountability in government and public institutions, we really focused on the economic potential that could be unlocked using open data.
Just to explain what we meant when we did our research, we actually viewed open data as being defined or varying across four dimensions.
The first was accessibility, or simply the number of people or the number of entities with access to data. Where more people had access to data, we considered it to be more open.
Second, we also considered machine readability. Of course, almost all data in some form can be machine readable, but some formats are easier to use and to process, such as comma-delimited files. That was another dimension we considered to be important.
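To illustrate why a machine-readable format such as comma-delimited data matters, here is a minimal sketch in Python. The rows and column names are invented for illustration; the point is that a standard library can process the file with no bespoke parsing code.

```python
import csv
import io

# A hypothetical sample of an open-data release in comma-delimited form.
# Real portals publish files shaped like this; the values here are invented.
raw = """city,year,transit_trips
Chicago,2013,545000000
Toronto,2013,514000000
"""

# Because the format is machine readable, the standard csv module can
# turn each row into a dictionary keyed by the header line.
reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

for row in rows:
    print(row["city"], row["transit_trips"])
```

The same data locked in a PDF or a scanned table would require custom extraction before any of this could happen, which is the practical difference the machine-readability dimension captures.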
Third, we also considered cost. When information is made less expensive, or is free, it's more open. Again, sometimes governments and other institutions implement some sort of cost recovery. We didn't want to say that data was completely closed if a modicum of charge was associated with it.
Finally, the fourth dimension we described involved the rights to use that data, whether it could be redistributed, how it could be processed, etc. Data could be completely unencumbered in terms of legal rights to use, or there could be some restrictions on it. We think that varies along the continuum. We really think that data can be more closed or more open or more liquid, as we described it, rather than just open data and then everything else.
That being said, what did we find when we looked at the potential economic impact of open data? We looked across seven different sectors of the economy. The sectors include education, transportation, consumer products, electricity, oil and gas, health care, and then various aspects of consumer finance. When we looked across all of those different sectors of the economy and we looked globally, we found that an additional $3 trillion to $5 trillion in impact could be created using open data. These benefits include increasing efficiency, developing new products and services, and even consumer surplus, which is the type of benefit that individual citizens can obtain when they have access to more open data or to applications that use open data.
There are a few other findings. Open data also enhances the impact that big data can produce, which has been another area of study for us. Oftentimes, when you combine data from multiple sources, you can actually derive more value. Some of the ways in which you derive value include increasing transparency, exposing variability, enabling the ability to conduct experiments in the real world, segmenting populations to tailor actions, augmenting or automating human decision-making, and then defining new products and services. Really when we looked across the board, if you think about exposing variability and enabling experimentation, about one-third of all the impact we found came from the ability to benchmark, to compare yourself against others.
We also found that individual citizens stand to gain the most from open data. Over half of the impact we found—again, that's not separate from benchmarking, because you can do individual benchmarking as well—in terms of potential benefits would actually accrue to individual citizens or consumers. We found in fact a very closely related concept to open data, which we described as “my data”. That's where an individual citizen or person has access to data that a government or a company has about them. That was one of the sources of benefits that individuals could have, for instance, my ability to compare my health care outcomes with people who are similar to me.
Open data can also help businesses raise their productivity and create new products and services. Companies clearly benefit from the ability to benchmark both internally as well as externally. Open data can also be used to create more tailored products by providing more consumer insights. Of course, open data also creates new risks around reputation and potential loss of control over confidential information, whether it be personal information or corporate or organizational information.
We also think that governments have a truly central role to play as a source of open data, which clearly a number of governments have been leading in that, as a catalyst for the use of open data, as a user itself of open data, and also as a policy-maker. Clearly, government has a tremendous amount of data that it could make available, and increasingly does.
The other interesting thing is if you go back to the point that I just made, which is that a lot of the benefits actually can accrue to a diffuse set of consumers or individual citizens, if you believe that's true, then in fact government is one of the entities that has the potential to actually speak for that diffuse set of groups rather than any special interest group and thereby implement policies that make the benefits of open data more likely to be captured.
The last point I'd make is this. While making data more liquid, making it more open, is often a necessary action in order to capture some of this value, it's often not sufficient. Other things have to happen: you need to create a vibrant system or ecosystem of developers who actually use the data to create applications, because most people won't look at the raw data itself; they'll use applications that take advantage of the data. Open data, as a result, often has to be combined with other sources of data. You need thoughtful policies around intellectual property, privacy, and confidentiality. You'll need to invest in technology along with investing in skills. This is clearly one of those areas where we found a tremendous gap between the need for these skills and the actual supply of them.
Standards also have to be developed in order to make data comparable from multiple sources. Then actually releasing metadata, data about data, can make open data more usable.
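A minimal sketch of what machine-readable metadata for a data set might look like, loosely in the spirit of common open-data catalogue schemas; every field name and value below is invented for illustration.

```python
import json

# A hypothetical metadata record describing a published data set.
# Fields like these let consumers discover the data, understand its
# licence and update cadence, and interpret its columns programmatically.
metadata = {
    "title": "Building Permits",
    "publisher": "City of Example",
    "format": "CSV",
    "license": "Open Government Licence",
    "update_frequency": "daily",
    "fields": [
        {"name": "permit_id", "type": "string"},
        {"name": "issue_date", "type": "date"},
    ],
}

# Serializing the record as JSON makes the metadata itself open data.
print(json.dumps(metadata, indent=2))
```

Publishing a record like this alongside each data set is one concrete way the "data about data" point plays out: it is what allows portals and applications to compare and combine data sets from multiple sources.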
In closing, the potential benefits of open data truly can be transformative—as we said, it's in the order of trillions of dollars annually on a global basis—but they can often be self-reinforcing. When open data is made available and applications that are useful are actually developed based on the open data, that often encourages more open data to be released and then that cycle continues.
Let me conclude with that. Hopefully that was a helpful tour of some of the research that we've conducted on open data.
:
I was sent a few questions, or things you'd like to know about. I have a bunch of notes about those. I don't know if you're going to ask questions related to what you sent.
As for my background, I've been active in the open data movement in Chicago for about six or seven years. Chicago has gained a reputation as the open data capital of the United States, and even when I travel internationally, people seem to know about Chicago's open data efforts. Government is responsible for some of that. There are a lot of independent designers and developers who've been pushing for open data, lawyers pushing for open data, organizations like Common Cause.
In the United States, the open data movement was initially about people looking for political transparency, wanting to know who was making political contributions. Much of the initial impetus came from the Nixon period, when the Watergate burglaries were financed by secret corporate donations. The abuses of the Nixon campaign led to the formation of Common Cause, a bipartisan group of Republicans and Democrats in favour of transparency in political donations.
From that, government started collecting data. Computers got a lot more powerful. Data was released. The issue of open data now is not only political transparency; it's also efficiency. It's the idea of government as a platform. Much government data can be used to create businesses, the way mapping data powers Google Maps and weather data powers forecasts. Some companies are aiding farmers trying to figure out when they should plant a crop, when it's going to rain and when it's not, whether they should irrigate, and these kinds of things. A lot of the efficiency and economic benefits of open data have come to the fore in the last three or four years as a lot more local and national data has been released.
A couple of weeks ago, the federal government released a bunch of Medicaid paid claim data. There are several other sets of data they're going to release, and several that have already been released. This data, along with other types such as electronic medical record data, is going to be available to doctors, hospitals, and clinics treating Medicaid patients. Much of it is already available. Under the Affordable Care Act, or Obamacare, patients and doctors treating Medicaid patients are required to be able to receive electronic medical records. That's going to change the whole way people are treated.
Genomic data now is being combined with electronic medical record data to do medical studies without actually having to devise an experiment and do blind tests with control groups and that type of thing. You can just look in the data and look for patterns in that data. So, maybe in women treated for breast cancer, some live and some die. You look at the ones who have lived and you look back through how they were treated. You look at their genomic structure, and you look for particular medicines that can treat different types of diseases based on genetic traits and particular drug regimens.
That's a very broad view of what's going on in open data at the federal level. We could maybe get into the city level and the state level a little bit later.
[English]
Thank you, guests, for being here this morning.
Mr. Chui, I want to ask you some questions about the McKinsey report published in October 2013. You mentioned in your remarks that you focused on seven sectors. In the report, you identified an economic value of $3.2 trillion to $5.4 trillion annually worldwide just in those seven sectors.
How is it that McKinsey chose to focus on those seven sectors as opposed to others? Obviously, there are some other big sectors in a country like Canada: agriculture, fisheries, mining, and even tourism. I suppose the overall economic value could be much greater when you start considering those other sectors.
Also, if you look at only the United States, in the report you identified a value of $1.1 trillion. Given that the size of the American economy vis-à-vis the Canadian economy is about 11:1, would it be a reasonable assumption to say that there's an economic value of about $100 billion in those seven sectors in Canada?
:
There are a couple of things.
The first question was how we picked those sectors. Those sectors weren't picked because we thought those were the sectors where the most value would be created, but we wanted a number of sectors that varied across a number of dimensions. You notice some of them are B to C sectors, consumer focused. Some of them are B to B, and they're more business focused. Some of them are products. Some of them are services. Some of them are more public services, such as education, and some of them are very commercial.
Really, what we wanted was to have a variety, and that's how we chose them. It was meant to give us a flavour for how open data could work in a number of different types of sectors.
Clearly, there are lots of sectors we weren't able to do as part of the research, and as you said, that suggests there will be even more value potential there.
In terms of trying to size the approximate potential for Canada, it's probably not unreasonable to use the sort of metric that if Canada's economy is this much smaller than the U.S. economy, then potentially Canada would be in that level of magnitude. Of course I wouldn't put any precision around it, but I think that's reasonable.
The other thing to keep in mind is that these are not GDP statistics, because they include consumer surplus, which is not captured in GDP. It's important to make sure, if you're trying to compare, that you wouldn't say this has this much GDP impact, because, as I said, over half of that impact would be a measure that is not captured by GDP. I think that's a flaw in the GDP statistic as it turns out.
:
Thanks for that clarification.
Some witnesses we've had before this committee talked about the sources of value. There's something of a focus on development of new applications, but others have said that really the true source of value, when it comes to open data, is the sense of removing friction in interactions between different stakeholders in the economy.
You could go with the example of the old days, even within the same institution or within the same government, for example. In the old days of making a request for data, you had to wait several days for that data request to be processed, then receive the data, and then translate that data. There's lots of inefficiency built into the older models, and with open data, things are able to move that much more quickly.
Is that a way to capture the primary source of the value in the McKinsey report that was done last fall?
:
Well, to get back to the previous point, I was recently at a meeting in the City of Chicago with people who work on the open data portal there. I'll give you an example. The department of housing has data.... I mean, there's housing data in seven different departments. Often the right hand doesn't know what the left hand is doing. One of the biggest consumers of Chicago data is actually other departments within the City of Chicago, which is important.
Washington, D.C. was the first city to release substantial data from almost every department, in 2006. That was the first major effort. They studied the actual users of the data. Between 60% and 70% of the people who came to the website and downloaded the data were actually members of city departments. A lot of city departments were afraid to release data initially, but then they ended up finding out it was so much more efficient to be able to get data that they didn't know was related to what they wanted to do from other departments. That's definitely one of the most immediate benefits.
People within government who are kind of reluctant to release data and make their data available, once they see that it has a lot of benefits for them also, it reduces the resistance tremendously. So within government, it has a very immediate value.
In terms of efficiency, I'll give you one example. We were working with the housing department. They have section 8 housing vouchers where basically low-income people can get money to rent apartments, and landlords or people who build apartments can get deferred taxes for eight to ten years in return for renting to low-income people.
It was a paper-based system, so when someone would leave section 8 housing, it might take six months for a landlord to rent the apartment. That's not a very good incentive, if you know you're frequently going to have vacancies and it takes a long time to rent the apartment. As a result, you would have people looking for section 8 housing. It would take a long time for them to find it. At the same time, you would have landlords who would have vacant section 8 housing and couldn't find people to occupy that housing, largely because the different departments dealing with housing data and section 8 housing didn't talk to each other much. It was paper-based.
Now that they have it all machine readable, they can reduce that time to two weeks or a month. It really improves the situation for both the landlord and the person or family who wants housing.
I want to thank our guests for joining us.
The McKinsey Global Institute's mission is to help leaders in the commercial, public and social sectors make decisions on important management and policy issues.
Mr. Chui, earlier, you talked about the loss of confidentiality and privacy. We know that the public uses the Internet to obtain information. For instance, people may want to obtain all the information about breast cancer, find out whether land is available or for sale, or obtain information about a specific home. They love getting that type of information.
However, they are not as pleased when that data is passed on to random companies. We had a case recently where the personal information of nearly one million individuals was disclosed. A few companies were involved in that.
Has your organization developed any measures to protect citizens in such situations?
:
There are a couple of specific laws in the U.S. related to privacy. One is HIPAA, which is on medical privacy. It was a major issue before we had the Affordable Care Act, because if insurance companies found out you had a particular disease, they would bar you from having insurance, or employers wouldn't hire you because you would be more expensive, that type of thing. It's less of an issue now.
There are also very strict laws around student data privacy. With elementary and high school students, you really can't reveal anything about them. That doesn't mean that data should not be shared among people who are authorized to use and see the data. For instance, there's one initiative we're working on now. One of the best indicators that there's a problem at school is attendance: if a second-grader doesn't show up at school and isn't sick, chances are, especially in low-income communities, the family has a problem of one sort or another. What will happen is the child won't show up. Someone from a social service agency will visit, maybe interview the family for 15 or 20 minutes, and make a decision on whether the child should stay in the family, that type of thing. They have almost no data about the kid. They don't have any attendance data or grades data. They don't know whether the mother is in a drug or alcohol treatment program. They don't know whether the father is there, or, if he is, whether he has post-traumatic stress disorder from serving in Iraq or Afghanistan, or something like that.
Compare that with a situation where you're driving your car with a broken tail light and you get pulled over by a motorcycle policeman. The policeman takes your wallet, walks back to his motorcycle, and checks you against 64 different databases. Why can't we serve children in the same way and share data privately among people dealing with low-income kids to try to improve their lives?
There are situations where you obviously don't want to share that data with the public, but you do want to share it with agencies, with teachers, with principals so they can help families develop.
:
The numbers vary. In the City of Chicago everyone always said there were 800 and some data sets, and I just talked to a guy who is in charge of the data sets yesterday and he said there are something like 550 data sets. The Obama administration originally, when they announced data.gov, said there were 190,000 data sets or something like that. Now they've reduced that to 54,000 data sets.
Even though the federal government in the U.S. has released lots and lots of data sets, and a lot of data has been released for a very long time—weather data, satellite data, road-related data, geographic data—going back 20, 30, or 40 years, a recent study says that even with all the open data efforts in the U.S., less than 10% of federal agencies actually release significant amounts of data. Much of that is because they make a fair amount of money by selling data, either to other agencies or within their own agency. It's kind of a net wash: one agency is paying another agency for data, and the other agency is receiving income, which doesn't make much sense when you're within the same organization, but it's part of the accounting rules that are creating inefficiency there.
We would like to see lots more data. There is a lot of data all over the world. We're members of the international Open Data Institute. We're the Chicago node. The U.K. has made tremendous strides in releasing data. France is releasing quite a bit more data. We have a fellow in our office right now who's in charge of the open data portals for the Dominican Republic, which didn't have a particularly democratic government for a very long time. Even there, there's an alliance of progressives and conservatives who want to release data for budget accountability, political accountability, so you can have an alliance among both sides of the political spectrum. We've been talking with them and learning a lot about what's going on there.
Things are happening. There's an effort in the Philippines, and even Russia is taking steps towards releasing more data. There are about 17 nodes in the Open Data Institute worldwide now. We were the first node just last October, and between then and now there are 16 more nodes. Many of them are in countries that were not democratic 10, 15, or 20 years ago.
:
The Open Data Institute was founded by Tim Berners-Lee, who invented the web. The slogan he came up with was "knowledge for everybody". The idea is that it's not just government leaders and corporate leaders who should know intimately what's going on; ordinary people should have access to similar levels of information to make democratic decisions, to be informed when they vote or participate in politics. From ODI's point of view, it's both the democratizing aspects and the economic benefit.
The Open Data Institute aggressively supports businesses using open data to create businesses partly for social good. There are some major problems, things like climate change. We're working on a project now to link Arctic researchers so they can share information more quickly, and we can make more progress on researching climate change, sea ice floes. Sea ice is melting very quickly, glaciers are, but if we can accelerate the research process by sharing data, space on the icebreakers.... It costs $50,000 a day to rent an icebreaker. If you can have three or four different teams renting space for a few days at the same time, you can cut down the cost, that type of thing.
There are many areas. Obviously, we've taken a lot of steps backward with the Citizens United decision in the U.S., where a lot of secret money is now going into politics, and it's a serious problem, but there are many public issues. For instance, there's a lot of controversy around charter schools in the U.S. A lot of groups, like the League of Women Voters and the PTA, are trying to discover data to try to figure out whether charter schools are doing good or bad. The way a lot of the laws are written, charter schools don't have to release as much data as the regular public schools do, so you end up comparing apples and oranges. On one side, the charter school side, you have a big PR effort going on, sponsored by millionaires again, who want to take a certain portion of the public education budget. On the other side, you have teachers and parents who like the public schools and want to keep them. We don't have knowledge for everybody here. The charter school operators know what they're doing, and they don't reveal it. We're also involved with Common Cause in another effort to try to uncover some of that data.
I would say both are very important. It's hard to know which is more important.
:
As an advocate of using data, I think it would potentially be a mistake for me to throw out ideas without actually doing any analysis. I think the thing you would want to do is to take the type of analysis that we did here globally and apply it to Canada and try to understand what types of data could potentially create the most value in Canada.
As we looked at it we did find, in transportation for instance, tremendous potential. In education we found tremendous potential. What you would need to do is to try to understand, from a Canadian perspective, where the most bang for the buck would be. I think it would be a mistake for me to speculate without having actually done some analysis, given that we actually believe in data as the way to make those decisions.
That being said, I do think the other insight you had was incredibly important, which is that a "field of dreams" approach doesn't work by itself. You can't just make the data available and expect benefits to occur. You actually need to create a vibrant ecosystem of users of data. What that means is engaging with people who will develop programs and apps that actually use the data and make it useful to companies, individual citizens, etc.
As I said before, that's almost a marketing game. What we've seen here in the United States is that the government has done things like create events. In Canada, for instance, I was in Toronto for the CODE hackathon. It was one of those events that made it more noticeable to people who write computer programs that in fact there's a vast source of data they can use to create new applications. Events and contests, even advertising, quite frankly, are some of the things that have to happen in order to create an ecosystem of, let's call it, loyalty. The customers of the data, who are the developers of applications, have to first be aware and second be incented to create applications using open data.
:
Yes. I like to look at the really big issues, which in my opinion are climate change and medicine right now, at least two of the scientifically oriented areas. There's tremendous activity there. In the climate change area, the U.S. federal government has mandated that anyone who gets federal money for scientific research has to release their data within a year. That's fairly recent. Informally it has been going on for a couple of years now, but people haven't adhered to that policy particularly well. From now on, I think they're going to adhere to it, and that is going to accelerate the pace of scientific research, including on climate change, which, at least for people who believe climate change is happening, is extremely important.
We see a tremendous amount of energy also in medicine. Silicon Valley is now “disrupting”—an overused word—medicine in many ways. The federal government and even some drug companies are sharing data about their drug studies. A couple of companies have committed to releasing all of their data related to the drug studies they've done, which could help treatments and could also help better figure out what medicines actually work and what medicines don't work.
There's an effort in the U.K. where half a million people have agreed to have their genes analyzed so they can combine the genomic data with their electronic medical record data. Kaiser Permanente in the bay area has a million of their patients agreeing to have their genes analyzed and combined with their EMR data. They're opening that data up to qualified researchers who follow privacy procedures. The promise in that area is simply tremendous.
There's also crowd-sourced medical data which is really interesting. For instance, some researchers and doctors in California have created a mobile phone app where, if you have what you think might be a cancerous mole, you can take a picture, and that picture goes into a database and then when the mole is removed and analyzed, you come back and the application tells you whether it was cancerous or not. They've accumulated enough data now that by simply taking a picture of a mole, you can get a pretty good probability as to whether you should have it analyzed or not. They're using artificial intelligence to analyze the colour, the pattern on the outside, the size, all that kind of stuff. They're building up crowd-sourced ways of doing this.
A lot of people have lots of moles, but if you have a mole you're worried about and think you should get it tested, you're much more likely to get it tested. Every time you test, it's another $1,000.
:
It comes from the City of Chicago. The City of Chicago actually had been collecting lobbying data for quite some time. Under Mayor Daley they just didn't release it, and then when Mayor Emanuel came in, he released it within the first couple of months of his term.
There were initially 14 different data sets, while now there are 17 or 18 different data sets. There was a handful of us who had been advocating for open data for quite a few years. During the Daley administration, we hadn't really been able to get a significant amount, and when Emanuel started releasing stuff, we realized that, having complained for so long, once he actually gave us some data it was incumbent on us to actually do something with it.
I started looking for data sets we could do something with. I saw these 14 lobbying data sets, but if you look at any one data set on its own, no one can learn anything from it, so we basically got volunteers. We got a couple of guys from Groupon, three people from Webitects, and a couple of other volunteers. One woman was riding her bicycle across the country, going to work for Code for America in San Francisco. She stopped in the city, and Google sponsored a hackathon. We got this team together. We worked from seven o'clock in the morning of the hackathon, and by seven o'clock in the evening we pretty much had it done. Some were really high-quality designers and developers, and we all worked together, but the interesting part—
:
The ethics commission of the city asked us to testify in front of them on how the lobbying data could be improved.
One of the big problems with it is that it's only released annually, so by the time you actually get the data, most of the decisions have already been made. We recommended that it be released in a much more timely fashion: daily, weekly, or at the latest, monthly. That would make big improvements.
Also, we're only told which committees they appear in front of, or which people they talk to; we're not told on what. There's not enough detail on what issue they're talking about. The lobbyists are basically allowed to describe what they're doing with almost no details.
We've also talked with the City of San Francisco about the same issue. We're even thinking about putting together basically a national software application to collect and show the data, one that has more granularity, faster reporting, and all that kind of thing.
:
Actually, we talked directly at a meeting in London just a few months ago, and we talked on the phone maybe two months ago.
One of the purposes of the Open Data Institute is to create this network where everyone can learn from each other. I don't know a heck of a lot about Canada, but I know more than I did before. They are just getting started. Also, at an Open Data Institute event about a year ago, there was a representative from the government whom I talked with for quite a while. I don't remember his name now, but he was in charge of thinking about open data policy. We had conversations there.
In terms of advice for ODI Canada, it has taken a long time in Chicago and in the U.S. to really get the open data movement going. It has been driven by enlightened government leaders, but also by activists, designers, developers, and political transparency people. It's a multipronged process that takes quite a while. I'm glad you're having these hearings, because these hearings.... It's very hard to have an open data movement without any open data, so you have to get some open data from states, cities, and the federal government, and then it has to be non-trivial. It has to be stuff that you can really work with.
Up there, let's say in Toronto, I'm guessing there's probably transportation data available, whereby people can create bus trackers and train trackers and that type of thing, to try to better schedule their commute to and from work. If there isn't, then.... You have to figure out what is the low-hanging fruit, what you start with, and how you work with government and private developers and designers to get things together and get things going. You need a moderately sized town to really have enough data, enough political support, and enough developers and designers to work on it.
You have to work slowly and target your efforts. Canada is a big place with a relatively small number of people. My mother was actually born in Canada and I've spent a fair amount of time there. Now you have a better-off middle class than we do, so maybe you'll have a lot of demand for the benefits that open data can bring.