As the mania surrounding artificial intelligence subsides (along with the pub chat interspersed with someone asking ChatGPT stupid questions), the question of where we’re going with artificial intelligence remains.
Surprisingly enough, I can’t actually predict the future (although that might make me an anomaly as a hedge fund manager!). But what I can say with a reasonable degree of certainty is that AI is built on data: data is the food of AI, its energy source, the building blocks of the intelligent part of its artificiality. AI’s capabilities are only as good as the data you train it on and the data it’s able to act upon. If there were no digital data, there would be no AI. So if you want to control AI, it seems logical to me that we must control the data.
What do I mean by ‘control’? I think we can broadly split it into three subsections: control over creation and integrity, control over storage and ownership, and control over access. I would argue that blockchain is the only technology that can cover all three of these areas of control over data.
Creation & Integrity
I have had many experiences with large language models (LLMs) where the answer to my question has been totally wrong. It’s difficult to tell whether that’s because the data the AI is using is incorrect, or because it has drawn the wrong inferences from correct data. Either way, as Elon Musk discovered when he was going through the process of purchasing Twitter, it has become increasingly common for data created online to come from bots rather than trusted sources.
To be able to trust data, there are three key components: who created it, when it was created and how it has been stored (which I’ll come to in the next section). If you don’t have trustworthy information about those three things, then you can’t distinguish between fact and fiction. The ONLY technology that is able to provide those things is blockchain; data is stored in blocks that contain a timestamp and the digital identity of the entity that created the data. Over time, if the network comes to believe that data associated with a particular digital identity lacks integrity, then that data can be treated as untrusted. The converse argument is more powerful, and would have helped Elon Musk a LOT; if data is associated with a digital identity that is known to be a trusted source (in Elon’s case, a human), then you can discount all other sources of data (you would never have to complete a Captcha again, for example). Incidentally this is also true for many other sources of non-human data, such as the weather, and within the blockchain world we have oracles (such as API3 and Chainlink) whose sole purpose is to assess the integrity of digital data using cryptography and supply it to blockchains for use in smart contracts.
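To make those three trust components concrete, here is a minimal Python sketch of a signed data record carrying a creator identity and a timestamp. It is illustrative only: the field names are invented, and an HMAC stands in for the real asymmetric digital signature a blockchain would use, purely to keep the example self-contained.

```python
import hashlib
import hmac
import json
import time

def sign(creator_key: bytes, payload: bytes) -> str:
    # Stand-in for a real digital signature (e.g. ECDSA on a blockchain);
    # HMAC keeps this sketch standard-library-only.
    return hmac.new(creator_key, payload, hashlib.sha256).hexdigest()

def make_record(creator_id: str, creator_key: bytes, data: str) -> dict:
    # The record carries the three trust components: who, when, and a
    # tamper-evident signature over both.
    body = {"creator": creator_id, "timestamp": time.time(), "data": data}
    payload = json.dumps(body, sort_keys=True).encode()
    return {**body, "signature": sign(creator_key, payload)}

def verify_record(record: dict, creator_key: bytes) -> bool:
    body = {k: record[k] for k in ("creator", "timestamp", "data")}
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.compare_digest(record["signature"], sign(creator_key, payload))

key = b"alice-secret"
rec = make_record("did:example:alice", key, "It rained 12mm in London today")
assert verify_record(rec, key)      # untouched record checks out
rec["data"] = "It was sunny"        # tamper with the data...
assert not verify_record(rec, key)  # ...and verification now fails
```

The point of the sketch is that provenance travels with the data itself: anyone holding the record can check who created it and when, without trusting the channel it arrived through.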
Storage & Ownership
Whilst one component of data integrity is to do with how it’s created and by whom, there is a second component, which is how it’s stored - so that we can trust it hasn’t been changed. Within a blockchain, the storage of data is cryptographically secured over a decentralised network, which makes it practically impossible to alter retrospectively. The decentralised component is critical: when Facebook tells advertisers that their adverts are reaching a specific target audience, how do they know that’s true? They have no idea; they take it at face value. I’ve no idea whether the data that Google owns about me is true or not. I’ve no idea if they’ve changed it. And neither would AI.
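The tamper-evidence claim above comes down to hash chaining: each block’s hash covers the previous block’s hash, so a retroactive edit anywhere breaks every link after it. A toy Python sketch (this is a simplified single-machine illustration; a real blockchain adds consensus across the decentralised network on top of it):

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # The hash covers the payload AND the previous block's hash,
    # chaining the whole history together.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: str) -> None:
    prev = chain[-1]["hash"] if chain else "genesis"
    block = {"data": data, "prev_hash": prev}
    block["hash"] = block_hash({"data": data, "prev_hash": prev})
    chain.append(block)

def chain_is_valid(chain: list) -> bool:
    prev = "genesis"
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash({"data": block["data"], "prev_hash": block["prev_hash"]}):
            return False
        prev = block["hash"]
    return True

chain = []
for entry in ["ad shown to 1,000 users", "ad shown to 2,500 users"]:
    append_block(chain, entry)
assert chain_is_valid(chain)
chain[0]["data"] = "ad shown to 1,000,000 users"  # a retroactive edit...
assert not chain_is_valid(chain)                  # ...is immediately detectable
```

This is why decentralisation matters for the advertising example: with the ledger replicated across many parties, no single entity (Facebook, Google or anyone else) can quietly rewrite the record after the fact.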
This leads on to the second part of storage, which is ownership. Unless you want to run your own data centre and servers, it’s only possible for individuals to own their data by holding a digital wallet and storing that data on a blockchain. The alternative is that a centralised entity provides you with tools that create data, which it then stores for you - and in most cases, that means you hand over ownership too (at least while they’re storing it for you). OpenAI and Microsoft are now facing their second class action lawsuit, in which they’re accused of training their AI on ‘stolen personal information from hundreds of millions of internet users’.
Access
Once we’ve created the data and stored it, the final piece is who can access it. Now I’m sure this is a huge oversimplification and I’m more than happy to be told I’m wrong - but if we don’t want AI to bring on the next nuclear apocalypse, then don’t give it access to the data on how to build a nuclear bomb? Public and private key encryption, combined with blockchain-based digital identities built on biometric data, should in theory allow us to be very specific about who or what has access to which forms of digital data. At the moment, all data is accessible to anyone with an internet connection. That might be a mistake. Certainly for parents, a digital identity linked to biometric data, which can then restrict use of various websites, seems to me a much better solution than what we have presently. Once we’re able to prove that we’re human online (see my articles about Digital Identity and World Coin), and to share various characteristics of our humanity, then we should be able to restrict the data accessed by AI. This will also mean that, for example, the people who have written the thousands of articles, reviews, blogs, papers etc. that LLMs are scraping for data could actually be paid for their data, rather than it just being accessed/taken for free.
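The access idea above can be sketched as a simple policy check: each data set declares which verified identity attributes a requester must hold, and a request is served only if the requester’s credentials include all of them. Everything here is hypothetical - the data set names, attribute names like "verified_human", and the policy table are invented for illustration, and a real system would verify the credentials cryptographically rather than trusting a plain set.

```python
# Hypothetical policy: each data set lists the identity attributes required
# to access it. An empty set means it is open to anyone, human or AI.
ACCESS_POLICY = {
    "public_weather_feed": set(),
    "age_restricted_site": {"verified_adult"},
    "sensitive_research": {"verified_human", "licensed_researcher"},
}

def can_access(dataset: str, credentials: set) -> bool:
    required = ACCESS_POLICY.get(dataset)
    if required is None:
        return False            # unknown data set: deny by default
    return required <= credentials  # all required attributes must be present

ai_scraper = set()              # an AI agent holds no human credentials
child = {"verified_human"}
researcher = {"verified_human", "verified_adult", "licensed_researcher"}

assert can_access("public_weather_feed", ai_scraper)
assert not can_access("age_restricted_site", child)
assert can_access("sensitive_research", researcher)
```

The same check also gives the payment angle a mechanism: if an LLM scraper has to present credentials to read an article, the access event is attributable, and attributable access can be metered and paid for.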
The pace at which AI has developed recently is both impressive and frightening, and there are times when it feels like we may have let the proverbial cat out of the bag without really understanding the consequences; I haven’t touched on the consequences for jobs, control or autonomy, and one area I’m both interested in and scared by is AI Autonomous Agents. But amid all the hype and scaremongering, it does seem to me that we need to take a step back and think very carefully about the feedstock we’re providing to AI, which is the data it uses. We must have control over data creation and integrity, control over storage and ownership, and control over access. Through the combination of blockchain’s distributed ledger technology, public and private key cryptography, oracles and biometric-based digital identities, we can take back control of data. And that should, in theory, enable us to control and harness the undeniable power and potential of artificial intelligence as a force for improving the world.