Pursuant to Standing Order 108(3), we are continuing our study of the 2016 census language data and the overestimation of the growth of English in Quebec.
Today we are pleased to welcome two Statistics Canada representatives: Jean-Pierre Corbeil, assistant director of the social and aboriginal statistics division, and Marc Hamel, director general of the census program.
I imagine you know how our committee works. As usual, we will give you about 10 minutes to make your presentation. Then we will move on to a period of questions and comments from committee members.
I believe Mr. Hamel will be making the presentation.
Please go ahead, Mr. Hamel.
First, I want to thank the committee for giving Statistics Canada this opportunity to present the facts concerning an error detected in the 2016 population census language data that it released on August 2.
I believe you have received copies of the presentation we prepared to explain to you what happened. I am going to review that presentation and talk about the various points addressed in it.
As we now know, an error occurred in the 2016 population census findings, and it mainly concerns a few communities in Quebec. The error caused an overestimation of the growth of English as a mother tongue and the language spoken most often at home, mainly in the province of Quebec and in some of its municipalities, and an overestimation of the decline of French. It also resulted in a slight overestimation of the rate of English-French bilingualism in Quebec and the rest of Canada.
The source of that error was a programming problem in an auxiliary data collection procedure. The error occurred during a follow-up step conducted with respondents to fill in incomplete information. The error occurred in the transfer of responses for a subset of French questionnaires. It affected the content of the short form only and concerned approximately 61,000 people.
Responses were miscoded by the system for three language questions: questions 8 a) and 8 b), which concern the language spoken at home, and question 9, which concerns mother tongue. The “French” and “English” response categories were reversed.
In the presentation, you will find a sample paper questionnaire in which those questions appear. As you can see, the response selections are reversed between the English- and French-language versions. In short, the program read the French version of the questionnaire as though it were in English and interpreted the first response, which is "French", as being "English".
A comprehensive review of the entire collection and processing operation produced a clear diagnosis of the impact of that error. As I mentioned, approximately 61,000 individuals had their responses incorrectly classified for these three questions. We confirmed that this error affected only the response categories that are in a different order in the English and French questionnaires. As a result, for a subset of questionnaires, the “French” responses were coded as “English” responses. As the problem originally concerned the French version of the questionnaire, the error mainly affected findings in the province of Quebec.
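To make the mechanism described above concrete, here is a minimal sketch of how reading a French form with the English option order swaps the two categories. It assumes responses are captured as the position of the ticked box; the names OPTION_ORDER, decode and decode_buggy, and the presence of an "Other" option in third position, are hypothetical illustrations and not Statistics Canada's actual system.

```python
# Hypothetical option orders. Per the testimony, the first response on the
# French questionnaire is "French"; the English questionnaire reverses the pair.
# The third "Other" option is an assumption made for illustration.
OPTION_ORDER = {
    "fr": ["French", "English", "Other"],
    "en": ["English", "French", "Other"],
}


def decode(position, form_language):
    """Map a ticked box position (0-based) to a category, using the form's own order."""
    return OPTION_ORDER[form_language][position]


def decode_buggy(position, form_language):
    """Reproduce the described fault: every form is read as if it were English."""
    return OPTION_ORDER["en"][position]


first_box = 0  # respondent ticked the first box on a French form
correct = decode(first_box, "fr")
miscoded = decode_buggy(first_box, "fr")
print(correct, miscoded)
```

In this sketch only the categories whose position differs between the two versions are affected; an option in the same position on both forms decodes identically either way, which matches the statement that the error was limited to categories listed in a different order.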
Statistics Canada takes the quality of its data and their importance for users very seriously. Once informed that some results appeared to be hard to explain for certain Quebec communities, we immediately proceeded with a new review of our data production processes. Our presentation provides a timeline of events from the moment we were informed of a potential problem, to the moment we identified the source of the error, and the moment we corrected it.
On August 9, the chief statistician was notified in writing by a data user about inconsistencies in the 2016 census findings for the English language in select communities in the province of Quebec. Statistics Canada then conducted an exhaustive review of the data collection and processing of the 2016 census. We looked for the origin of the problem.
On August 11, we confirmed that there was an error in a computer program and released a statistical announcement to that effect. We immediately informed data users that there was a problem with the data.
From August 12 to 15, Statistics Canada re-ran all data processing and analysis for the language variables.
On August 16, an expert panel assembled by Statistics Canada reviewed the new language data.
On August 17, we released new data and a technical note explaining the nature of the problem and exactly what had been done.
All language data products were thus released as of August 17. All data products initially made available on August 2 were corrected and are now available on the Statistics Canada website.
In the work we did to correct this error, we took a number of steps, including verifications throughout the data processing, with particular attention to records affected by the error. We verified and validated that the error was limited to the language variables only and did not apply to other parts of the questionnaire. We conducted an analysis of the impact of the error at every processing stage and at several geographic levels, and we cross-checked with other data sources to ensure the new findings were valid. Lastly, we conducted a review assisted by an expert panel, as I mentioned earlier.
In view of this error, we have since implemented rigorous mechanisms to determine the sources of variations in numbers and percentages between the 2016 and previous censuses. Data validation methods have been changed to enable us to identify factors that explain the variations down to the level of every municipality in Canada. Our verification process is now vastly more robust as a result. No other production error has been detected for any other data released to date.
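As an illustration only, a check of the kind described, comparing counts between two censuses for each municipality and flagging large variations for review, might be sketched as follows. The function names, the 50% threshold, and the sample counts are invented for the example and do not describe the agency's actual validation rules.

```python
def percent_change(previous, current):
    """Percentage change from the previous count to the current count."""
    return (current - previous) / previous * 100.0


def flag_variations(previous_counts, current_counts, threshold_pct=50.0):
    """Return municipalities whose count changed by at least threshold_pct.

    The threshold is an arbitrary illustrative value, not an official rule.
    """
    flagged = {}
    for municipality, previous in previous_counts.items():
        current = current_counts.get(municipality)
        if current is None or previous == 0:
            continue  # cannot compute a percentage change
        change = percent_change(previous, current)
        if abs(change) >= threshold_pct:
            flagged[municipality] = round(change, 1)
    return flagged


# Made-up counts for hypothetical municipalities, for illustration only.
counts_2011 = {"Town A": 200, "Town B": 1000, "Town C": 400}
counts_2016 = {"Town A": 520, "Town B": 1040, "Town C": 380}

flagged = flag_variations(counts_2011, counts_2016)
print(flagged)
```

A flag of this kind does not say whether a variation is wrong, only that someone should be able to explain it before the figures are released.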
That, broadly speaking, covers the events surrounding our release of the language data on August 2, 2017, and the measures Statistics Canada took to uncover the causes of that error, to make the appropriate corrections, and to re-release the data so we could certify for our users that the data could be used without restrictions.
We are now prepared to answer your questions.
Thank you for being here today, Mr. Hamel and Mr. Corbeil.
As you know, gentlemen, when we parliamentarians are required to make decisions, we rely on what are called facts, factual elements. The data we are given enable us to make decisions for Canadians. Consequently, Statistics Canada stakes its credibility on all the data it provides to parliamentarians, institutions, companies, and its entire clientele in the broadest sense.
What happened in August undermined Statistics Canada's credibility to a certain degree, and it was important for us to meet with you today to take stock of the situation. You are here today to defend your institution's credibility, and I am pleased that media people are here too so they can report the matter to Canadians. We will probably be doing the same in an upcoming report.
I do not think we have any grounds to doubt Statistics Canada's credibility. What is certain is that Statistics Canada has been around for quite a long time, and decisions that Canadian parliamentarians from all parties have previously made have been based on facts, information, and data that you have provided. It is fundamentally important and even essential that the information we receive and on which we rely in making decisions be absolutely perfect, and that is particularly true with regard to official languages.
How can this kind of error occur given the number of employees you have, the credibility you enjoy, and the history of your institution? How can this kind of error still occur in 2017? That is the main question in my mind. Furthermore, I would like to know whether this has happened before. Whatever the case may be, do you think that this error, which occurred in 2017, was human or technological in origin? Can the two be separated?
I know what happened, but I still do not know how the error escaped us.
When we create a system, it is systematically designed and individually tested. We test the outputs of that system. We verify where in fact the information subsequently goes, which system takes over, and so on. All that is done systematically when we prepare for and conduct the census.
For the moment, I cannot tell you why we did not detect the error when we tested all those systems. However, we take measures and use matrices to test all these processes. Once the data are produced, they are validated. At the validation stage, we saw that changes had occurred, but we did not understand that the verification should have been done before releasing and correcting the data.
This type of error is highly unlikely but not impossible.
All right. I had understood 31,000 when you made your presentation. I probably misunderstood. I just wanted to verify that it was indeed 61,000.
You will understand my concern after the following comments.
In our proceedings, the committee has often discussed the importance of accurately enumerating anglophone and francophone rights holders under paragraphs 23(1)(a) and (b) and subsection 23(2) of the Canadian Charter of Rights and Freedoms.
In your last appearance before the committee, Mr. Corbeil, you explained that the process involved in asking the right questions and ensuring you cover the right things was a long one. In fact, you did not seem sure that all francophone rights holders in the rest of Canada could be enumerated. I assume you must have had to conduct some tests to make sure you asked the right questions.
Allow me to respond briefly.
You should know that, pursuant to a standard issued by the federal government, in all documents, the French must precede the English in the French version, and the English must precede the French in the English version. This is why the language questions are the only census questions in which the order of the responses is reversed.
This standard has been in force since the early 2000s. Consequently, this is not the first census for which we have proceeded in this way. The 2016 census was the first and only one in which we encountered this kind of problem.
We also used different methods—
I would like to get this straight.
According to the figures you obtained, the anglophone population increased by 164% in Rimouski, 115% in Saguenay, and 110% in Drummondville, not to mention Sudbury and Ottawa. Those are not normal figures, but you nevertheless decided to publish them. Is that correct?
You said that, when you saw those figures, you thought they made no sense and that something abnormal had occurred. Why then were they published? If those figures were abnormal, they should not have been released.
What I understand from this matter is that, when the figures were published, Canadian citizens, including Mr. Normand, holder of the research chair in Canadian francophonie and public policies, and Ms. Mainville, of the University of Ottawa, realized that something was not right. They then called you, and that is when you changed those figures. Is that not correct?
I understand. I mention this because I know that QCGN and the FCFA do not have access to those closed meetings, and I wonder whether that should be reviewed.
There is another point. As we have seen, the system you use to validate the figures before releasing them failed in this specific case. I am not speaking generally but rather in this specific case. What steps would you take to prevent this kind of error from reoccurring in the data validation process?
After determining that an error had occurred, you had a good system that worked well. Several steps were followed, including verification, validation, analysis, cross-checking, and expert panel review.
Are the same steps taken in normal circumstances?
What happened in this specific case? Ultimately, why did your validation process not work in this specific case?
Thank you for your presentation, gentlemen.
Errors are never a simple matter. There can be no doubt about that. What concerns me, however, is that there were a number of errors. The main error was a misinterpretation of the language, as you said. However, other errors occurred during the process right up until the information was published. That is what is troubling. The fact that the initial error occurred internally is one thing, but the fact that it went through four or five stages without being noticed before the data were made public is quite another. The data analysis method should be reviewed.
We can also see how quickly this kind of error can cause problems. If my memory serves me, the Bloc member Mr. Beaulieu declared, after reading the data indicating a major increase, that English was taking control of Quebec, or something like that. That is always disturbing.
I read what a certain Mr. Éric Boucher wrote, that it was somewhat odd that the people who work full time on an issue are unable to detect these kinds of anomalies. How do you respond to that comment?
Before commenting on what Mr. Boucher wrote, I can give you an answer based on my viewpoint.
I am responsible for the census program at Statistics Canada, and this is a dramatic incident for all the people who work on my project. No one is proud of this. We take this very seriously. We are very proud of the work we do, and we completely understand the importance of this information for all data users and the implications the data have for decision-making everywhere. We did not take this lightly. We really worked very hard to correct it, and we will continue to work to prevent it from reoccurring.
To err is human. It can happen, but we do not take it lightly. I can assure you we are doing what is necessary to ensure the integrity of census findings.
Generally speaking, all our statistics programs are extraordinary. We have learned a great deal from this error, and we will make sure we improve our processes—even though they were very robust before this incident—so that it does not reoccur.
With your permission, I am going to draw a brief comparison with the Acadians and minority francophones across Canada.
In this case, bad data led to results that raised a lot of questions. The data did not represent the actual situation.
And yet, for 35 years, there have been no findings at all, accurate or incorrect, concerning Acadians and minority francophones outside Quebec, because the census does not include questions that would assist in enumerating rights holders as defined under paragraph 23(1)(b) and subsection 23(2) of the charter.
It took one week for these incorrect data to cause panic, whereas there have been no data to help increase the francophone population outside Quebec in the past 35 years. I see that as a problem.
What are your comments on that?
Did you establish a certain discipline? I am asking you the question in good faith. Since your institution has 5,000 employees, I imagine some form of discipline is applied in accordance with a pyramid model.
We now live in a society in which the people in positions of responsibility are virtually never held to account. This creates problems in our culture and does not set a good example for young people. We are truly living in a society of non-accountability.
Will you try to determine whether a division, or indeed a particular employee, failed to do the proper review work?
You are not the person concerned, Mr. Hamel. Since you are the director general, we can assume you are not the one who conducted the review. However, I imagine you or the chief statistician potentially have the authority to dismiss people.
Do you intend to apply some sort of discipline in a specific manner? As regards the error that was discovered, if it turns out that experts did not do their job, will they be reprimanded?
Perhaps it would be a good idea to send your 5,000 employees a letter asking them in a diplomatic and positive way to be more vigilant because this must not happen again.
We are judging no one, and we are targeting no one. However, I am a former member of the armed forces, and they do not fool around there. Discipline is very quickly established, and, when you wage war, it works. When it does not work, it is because the government has not provided sufficient resources.
I imagine the census has always been conducted using computer systems. You mentioned the Dominion Bureau of Statistics, which existed before Statistics Canada. For issues as important as language issues, which may directly improve or undermine the welfare of any anglophone or francophone community in Canada, would it not be better to do the work by hand?
I know what I just said is extreme. However, I am a Conservative and I hate machines.
Voices: Oh, oh!
Some of the systems are not new, but some have to be rebuilt for every census because our questionnaires change. Then we have to make the necessary corrections to those systems to reflect the fact that the questionnaires have been updated.
A lot of data is handled and transferred to ensure that we ultimately get high-quality data. There are several stages: compilation and findings for Canadians, certification, and so on. Most of those stages are automated. Where that is impossible, they are performed manually. Consequently, parts of the work are indeed done manually.
In a case such as this, since Canada has 5,000 municipalities and dozens of different variables must be cross-referenced, we will look for automated ways to do that cross-referencing. If it were done by hand, it could take us years.
We also try to see whether we can detect anomalies in the data before releasing them. Here again, once the lesson has been learned, we will make those systems more rigorous and smarter to prevent the situation from reoccurring.
You have to understand that the order in which we place the questions in the context in which they are asked may have an impact on the answers given. In 2006, for example, the questions on language followed all of those on ethnocultural diversity, that is to say on immigrant status, citizenship, and so on. In 2011, the language questions migrated to the short form, as a result of which no questions on ethnocultural diversity preceded the language questions. That may have led people to respond differently.
Previously, when the mother tongue question was asked in isolation on the short form, we underestimated the unofficial languages in Canada by approximately 20%. However, when we put the mother tongue question at the very end of the questionnaire, people understood what we wanted to know, which was the first language learned at home as a child.