This is the third post in a series about the X-Road logs. The first part was about different log types (technical logs, business logs, audit logs) and the X-Road logs in general. The second part concentrated on the X-Road business log which contains all the messages processed by a Security Server. The third part is about how to provide access to the logs of who has accessed my data and when.
Our data out there
Our personal information is processed by numerous different information systems on a daily basis. Some of the processes are fully automated and other include human actions. Wouldn’t it be nice to know for which purposes your personal information is used, when and by whom? Getting access to this information is already possible, at least in theory, but it requires requesting the information from each registry owner separately. Once you get a response, its format (printed document, email, structured data etc.), level of detail and delivery method (post, email, API etc.) depend on the registry owner, because there’s no unified way of providing this information. In practice, it is currently impossible for a citizen to get a good overall picture regarding the usage of his/hers personal information. There must be a better way.
As you all know by now, the X-Road message log contains all the business events processed by a Security Server. Non-repudiation of the message log is guaranteed using time-stamping and digital signatures. Could the message log be used for providing access to the information regarding who has accessed my data and when? Let’s find out.
X-Road message log to rescue
The message log contains all the required information in a machine-readable format so it might provide a solution to our problem. However, message log is Security Server specific so when a service can be accessed through multiple Security Servers, which is very common in a highly available setup, message log entries are distributed between multiple Security Servers. This means that information must first be collected from multiple sources and then combined. Collecting and combining the information requires shell access to Security Server as there’s no API for querying the logs based on the message content. In practice, collecting and combining the information must be done manually by a Security Server administrator.
Another thing that must be considered is the archiving of the Security Server message log. By default, messages are archived from message log database to disk after 30 days and therefore there must be an additional method for searching data from the archived message log files. This can be done using standard Linux command line tools, but it’s not very efficient when the amount of archived log data is big. In addition, it’s not recommended to keep archived logs on the Security Server so access to a separate long term storage is required and the archived logs might even be encrypted.
Too good to be true?
It seems that all the required information is there in the message log and with some new functionalities the information could be made accessible in a way that manual work is no longer required. Sounds like a good solution if personal information which usage needs to be logged is accessed through the X-Road only. In real life, this is rarely the case. Usually personal information is accessed through multiple channels (information system’s own UI, mobile apps, p2p integrations using native APIs etc.) and the X-Road is just one of them. The message log contains information about the messages processed by a Security Server, but all the other channels are excluded. From a citizen’s point of view this kind of partial solution is not sufficient and usage logs must contain the access information from all the different channels. To be able to log all the events processed through different channels, an alternative approach is needed.
The technical challenges mentioned above could be solved using a centralized system for storing usage logs. In general, it is a common practice for an organization to have a centralized log management system that contains logs from organization’s all the information systems. A log management system could be designed so that it provides a separate index/table/container for information related to access to personal information. In addition, a common access log format, that all information systems would use, should be defined and implemented. However, unlike Security Server’s message log, information systems’ logs are not usually signed and time-stamped so non-repudiation is not guaranteed.
Accessing usage logs
Implementing a centralized logging system on organizational level would solve half of the problem - collecting and storing the access logs. The other half, providing citizens access to the usage logs is yet to be resolved. Each organization could build their own solution for viewing the data, but for a citizen getting an overall picture regarding different services would require accessing many different online services and websites. A better solution would be to provide access to all the usage logs from one centralized service or portal, e.g. state portal.
Centralized access to all usage logs could be implemented in a distributed or centralized way. The distributed way means that each organization stores its own usage logs and the data is fetched on request only, e.g. when a citizen logs into the state portal and wants to see the access logs of a specific data source. The centralized way means that usage logs are regularly harvested from organization specific storages and stored in one central usage log storage. Both alternatives have their pros and cons, but their further analysis is out of scope of this blog post. For a citizen both alternatives would provide the same result – access to all the usage logs from one place.
Access to the logs through state portal could be implemented using the X-Road. Each organization would implement a common interface for accessing and/or harvesting logs. In this way, even the access to the access log would be logged and transparent to citizens.
Of course, in addition to architectural and technical questions there are many other questions regarding the content and format of access logs that would have to be commonly agreed, e.g. how fine grained the logging should be, how detailed the descriptions should be etc. Failing in this area could make the result very confusing and even misleading for citizens. Badly implemented, the result might even do more harm than good. Therefore, instead of talking about technical details it would be better to concentrate on the targeted outcome from citizen’s point of view first.
Back to the topic
Back to the earlier question – could the message log be used for providing access to the information regarding who has accessed my data and when? Basically yes, but a partial solution would not bring very much value to citizens. In addition, development of new features would be required to remove the manual work regarding handling of the logs. In practice, the message log alone is not a sufficient solution, but it can definitely be used as a part of a wider solution discussed before. Therefore, the message log alone cannot be used for providing access to the information regarding who has accessed my data and when.
That’s it about the X-Road logs for now. There is more to come later...