When you start working with Sign Protocol, one decision comes up almost immediately: where does the attestation live? On-chain means the entire record is stored on the blockchain. Off-chain means the data lives somewhere else, and only a reference to it is placed on-chain. Both are valid, and neither is better in every case. The right choice depends on what you are building and what the data actually is.
Start with on-chain. When an attestation is fully on-chain, every piece of data in the record is written to the blockchain itself. It is permanent and publicly readable. Anyone with the transaction hash or the attestation ID can look it up directly, without fetching anything from a remote server. Users do not depend on a third party staying online, they cannot lose the record if a storage provider goes down, and they have no reason to doubt that the data has been altered. What was written on-chain is what is there, and it will stay there.
This works well for certain categories of data: a record proving that a wallet took part in an event, a credential showing that a user passed a verification check, a marker that a particular vote was cast in a governance round. These are facts that are meant to be public, they are small enough to store cheaply, and long-term availability is exactly what you need. On-chain storage is the right choice here.
Not every attestation is like that. Medical records contain sensitive personal information that has no place on a public ledger. Financial details, private messages, and business agreements with confidential terms all involve information that must be provable but must not be visible to anyone who knows how to query a blockchain. None of these belong in on-chain storage, and in many jurisdictions putting them there would conflict directly with privacy regulations such as GDPR and HIPAA, which restrict who may access personal data and, in GDPR's case, give individuals the right to have it erased.
Off-chain storage solves this. The actual data lives outside the blockchain: on IPFS, on a local server, in an encrypted database, or on the user's own device. What goes on-chain is a cryptographic hash of that data. A hash is a fixed-length string computed mathematically from the content; change a single character of the original data and the hash changes completely. The on-chain hash therefore acts as a tamper-evident fingerprint. Anyone who wants to verify the attestation retrieves the off-chain data, runs it through the same hash function, and checks whether the result matches the value stored on-chain. If it matches, the data is authentic. If it does not, something has been changed.
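A minimal Python sketch of the fingerprint-and-compare check described above. It assumes SHA-256 over a canonical JSON encoding of the record; the hash function and encoding Sign Protocol actually uses may differ (EVM contracts commonly use keccak256, which Python's standard library does not provide), and `fingerprint` and `verify` are hypothetical helper names, not part of any Sign Protocol SDK.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a record into a fixed-length hex string (hypothetical helper).

    Keys are sorted and whitespace is removed so that the same content
    always produces the same bytes, and therefore the same hash.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record: dict, anchored_hash: str) -> bool:
    """Recompute the hash of the off-chain record and compare it with the on-chain value."""
    return fingerprint(record) == anchored_hash


# Issuer side: only this 64-character hex string would be anchored on-chain.
report = {"patient_id": "p-123", "eligible": True}
anchored = fingerprint(report)

# Verifier side: the holder supplies the record, the verifier recomputes the hash.
print(verify(report, anchored))                          # True
print(verify({**report, "eligible": False}, anchored))   # False
```

Canonicalizing before hashing matters: two systems that serialize the same fields in a different key order would otherwise compute different hashes for identical content.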
The verification step adds a small overhead compared with pure on-chain storage, but it solves the privacy problem cleanly. The sensitive data never touches the blockchain; only the proof that it has not been altered, whether by accident or by a malicious party, lives on-chain. A healthcare provider can issue an attestation about a patient's eligibility for a treatment without writing any health details to a public registry. The patient holds the data, the verifier checks the hash, and nothing sensitive is stored on-chain.
Cost is another factor that gets little attention at first but matters a great deal at scale. Blockchain storage is not free: every on-chain write costs gas. For a small attestation with a few text fields, that cost is negligible. For a large document, a detailed compliance report, or any attestation with rich media, storing the full content on-chain would be prohibitively expensive. Off-chain storage lets you anchor any amount of data for roughly the same cost as anchoring a short string, because only the fixed-length hash goes on-chain, no matter how large the underlying document is.
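To make the size argument concrete, the sketch below counts how many 32-byte EVM storage words a document would occupy if written in full, versus anchoring only its SHA-256 hash. Word count is used here as a rough proxy for cost and is an assumption of this sketch; real gas costs depend on the chain, on how Sign Protocol encodes attestation data, and on whether data is kept in contract storage or only in transaction calldata, none of which is modeled. `slots_needed` and `footprint` are hypothetical names.

```python
import hashlib
import math

SLOT_BYTES = 32  # EVM storage slots are 32 bytes wide


def slots_needed(num_bytes: int) -> int:
    """Number of 32-byte words needed to hold num_bytes of data."""
    return math.ceil(num_bytes / SLOT_BYTES)


def footprint(document: bytes) -> tuple[int, int]:
    """Return (words to store the full document, words to store only its hash)."""
    digest = hashlib.sha256(document).digest()  # always 32 bytes, whatever the input size
    return slots_needed(len(document)), slots_needed(len(digest))


for size in (100, 10_000, 1_000_000):
    full, anchored = footprint(b"x" * size)
    print(f"{size:>9} bytes -> full: {full:>6} words, hash only: {anchored} word")
```

For the three sizes in the loop, the full document needs 4, 313, and 31,250 words respectively, while the anchored hash always needs exactly one.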
Some cases do not fit cleanly into either model, and for those there are hybrid approaches. Some parts of an attestation may need to be publicly visible and immediately readable, such as the credential type, the issuer's identity, or a timestamp. Other parts of the same attestation may need to stay confidential, such as the details that prove eligibility or the personal identifiers behind the claim. A hybrid approach stores the public fields on-chain and keeps the private fields off-chain, with an on-chain hash of the off-chain portion linking the two. Verifiers get the information they need, and the attestation reveals nothing it should not.
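A minimal sketch of the hybrid split, under these assumptions: the `PUBLIC_FIELDS` set and all field names are a hypothetical schema, the private portion is committed to with SHA-256 over canonical JSON, and a random salt is added to the off-chain portion. The salt is not something the text above specifies for Sign Protocol; it is a common hardening step, because the hash of a short, predictable value such as a birth date could otherwise be brute-forced by anyone who reads the chain.

```python
import hashlib
import json
import secrets

# Hypothetical schema: which fields are safe to publish on-chain.
PUBLIC_FIELDS = {"credential_type", "issuer", "issued_at"}


def _digest(payload: dict) -> str:
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_attestation(fields: dict) -> tuple[dict, dict]:
    """Split fields into an on-chain part and an off-chain part linked by a hash."""
    public = {k: v for k, v in fields.items() if k in PUBLIC_FIELDS}
    private = {k: v for k, v in fields.items() if k not in PUBLIC_FIELDS}
    # A random salt keeps low-entropy private values (dates, yes/no flags)
    # from being recovered by hashing guesses and comparing to the on-chain hash.
    offchain = {"salt": secrets.token_hex(16), "data": private}
    onchain = {**public, "private_hash": _digest(offchain)}
    return onchain, offchain


def verify_private(onchain: dict, offchain: dict) -> bool:
    """Check that the off-chain portion is the one the on-chain hash commits to."""
    return _digest(offchain) == onchain["private_hash"]


onchain, offchain = split_attestation({
    "credential_type": "kyc-passed",
    "issuer": "0xIssuer",
    "issued_at": 1700000000,
    "national_id": "AB123456",
    "date_of_birth": "1990-01-01",
})
print(sorted(onchain))                    # ['credential_type', 'issued_at', 'issuer', 'private_hash']
print(verify_private(onchain, offchain))  # True
```

The holder keeps `offchain` (including the salt) and presents it to a verifier, who reads the public fields directly from the chain and checks the private portion against `private_hash`.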
Sign Protocol supports all three models. The choice is made when the attestation is created; it requires no architectural changes and no separate system to run. The developer chooses based on what the data is and how it will be used, and the protocol handles the rest.
Two questions help with the decision. First, does the data need to be permanent and immediately readable, with no outside dependency? If yes, use on-chain storage. Second, does it contain sensitive data, large data, or data subject to privacy laws? If yes, store it off-chain and anchor it with an on-chain hash. If the answer to both is partly yes, use a hybrid. Once you consider what the data actually is and who should be able to see it, most real-world attestation use cases fall neatly into one of these three buckets.
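The two questions can be encoded as a small decision helper. This is a hypothetical illustration, not part of Sign Protocol, and it simplifies: each field gets a single flag that is True if the field is sensitive, large, or privacy-regulated, and False if it is meant to be public and permanently readable. A mix of both kinds of field is treated as the "partly yes" case.

```python
def choose_model(fields: dict[str, bool]) -> str:
    """Pick a storage model from per-field answers (hypothetical helper).

    Each value is True if the field is sensitive, large, or privacy-regulated
    (question two), and False if it should be public and permanently readable
    on-chain (question one).
    """
    if not fields:
        raise ValueError("an attestation needs at least one field")
    private = [name for name, sensitive in fields.items() if sensitive]
    public = [name for name, sensitive in fields.items() if not sensitive]
    if private and public:
        return "hybrid"
    if private:
        return "off-chain, anchored by an on-chain hash"
    return "on-chain"


print(choose_model({"event": False, "wallet": False}))
print(choose_model({"diagnosis": True, "treatment_plan": True}))
print(choose_model({"issuer": False, "national_id": True}))
```

The three calls correspond to an event-participation record, a medical attestation, and a credential with public metadata but private identifiers; they print "on-chain", "off-chain, anchored by an on-chain hash", and "hybrid" respectively.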
This flexibility is deliberate. A protocol that supported only on-chain storage would be unusable in regulated industries. A protocol that supported only off-chain storage would fall short wherever permanent public verifiability matters. By supporting both and letting developers choose based on their actual requirements, Sign Protocol becomes relevant to a much wider range of industries and deployment scenarios than crypto-native applications alone.
