Accident Details



Covid-19 test results silently deleted by Excel
Domain: Medical
Year: 2020
Data Categories: Dynamic
Properties Lost: Integrity, Completeness, Timeliness, Availability, Fidelity / Representation

Summary:

Importing Covid-19 test results into an Excel file silently truncated the data after 65536 records.

Details:

In early October 2020 the British Government announced that 15841 positive Covid-19 test results had been omitted from the reported totals for England between 25 September and 2 October. As a result, the contacts of those people were neither traced nor asked to self-isolate, so the virus may have spread further than it otherwise would have, possibly costing additional lives.

The results were silently discarded as they were imported into the Public Health England database. Results were delivered as a “Comma Separated Values” (“.csv”) file, which was imported into an Excel spreadsheet “template” in the “.xls” format, which was in turn imported into the national database. The “.xls” format has a limit of 65536 (2^16) rows, and rows beyond this limit in the “.csv” file were silently discarded. (Data Category: Dynamic, Properties lost: Integrity, Completeness, Timeliness, Availability, Fidelity / Representation)

The newer “.xlsx” file format would have raised the row limit to 1048576 (2^20) rows, but would have suffered the same problem beyond that point. The “.csv” file format itself has no limit on the number of rows.
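The row limits described above suggest a simple guard: count the records in the source file before importing it into a row-limited format, and fail loudly if they will not fit. The following is a minimal sketch of that idea in Python, not a description of the system actually used. The names `check_fits`, `count_csv_rows` and `RowLimitExceeded` are hypothetical, and the sketch assumes the source is a well-formed “.csv” text stream.

```python
import csv
import io

XLS_MAX_ROWS = 2 ** 16    # 65536, row limit of the legacy .xls format
XLSX_MAX_ROWS = 2 ** 20   # 1048576, row limit of the .xlsx format


class RowLimitExceeded(Exception):
    """Hypothetical error: the source has more rows than the target can hold."""


def count_csv_rows(text_stream):
    """Count the records (including any header row) in an open CSV text stream."""
    return sum(1 for _ in csv.reader(text_stream))


def check_fits(text_stream, max_rows=XLS_MAX_ROWS):
    """Return the row count, or raise rather than let rows be dropped silently."""
    rows = count_csv_rows(text_stream)
    if rows > max_rows:
        raise RowLimitExceeded(
            f"source has {rows} rows but the target holds at most {max_rows}; "
            f"{rows - max_rows} rows would be lost"
        )
    return rows


if __name__ == "__main__":
    # Illustrative data only: one row more than the .xls format can hold.
    oversized = io.StringIO("1,positive\n" * (XLS_MAX_ROWS + 1))
    try:
        check_fits(oversized)
    except RowLimitExceeded as err:
        print(f"Import refused: {err}")
```

The design choice is to stop the import and report the shortfall instead of continuing with a partial data set, so that the loss is visible to the operator at the time it would occur.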

This demonstrates the danger of using COTS (commercial off-the-shelf) software for safety-related functions without fully analysing its limitations. A decision had already been made to replace this system, but it had not been acted upon. It is also an example of Dark Data, falling under the “Data we don’t know are missing” (“unknown unknowns”) and “Missing What Matters” categories.

Finally, it is an example of an error that is known to the system but not (adequately) reported to the user (Data Category: Dynamic, Properties lost: Timeliness, Availability).

Links: