Errant being.
292 stories
·
0 followers

NFTs and Web Archiving

1 Share
One of the earliest observations of the behavior of the Web at scale was "link rot". There were a lot of 404s, broken links. Research showed that the half-life of Web pages was alarmingly short. Even in 1996 this problem was obvious enough for Brewster Kahle to found the Internet Archive to address it. From the Wikipedia entry for Link Rot:
A 2003 study found that on the Web, about one link out of every 200 broke each week,[1] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[2]
One might have thought that academic journals were a relatively stable part of the Web, but research showed that their references decayed too, just somewhat less rapidly. A 2013 study found a half-life of 9.3 years. See my 2015 post The Evanescent Web.

I expect you have noticed the latest outbreak of blockchain-enabled insanity, Non-Fungible Tokens (NFTs). Someone "paying $69M for a JPEG" or $560K for a New York Times column attracted a lot of attention. Follow me below the fold for the connection between NFTs, "link rot" and Web archiving.

Kahle's idea for addressing "link rot", which became the Wayback Machine, was to make a copy of the content at some URL, say:
http://www.example.com/page.html
keep the copy for posterity, and re-publish it at a URL like:
https://web.archive.org/web/19960615083712/http://www.example.com/page.html
What is the difference between the two URLs? The original is controlled by Example.Com, Inc.; they can change or delete it on a whim. The copy is controlled by the Internet Archive, whose mission is to preserve it unchanged "for ever". The original is subject to "link rot", the second is, one hopes, not subject to "link rot". The Wayback Machine's URLs have three components:
  • https://web.archive.org/web/ locates the archival copy at the Internet Archive.
  • 19960615083712 indicates that the copy was made on 15th June, 1996 at 8:37:12.
  • http://www.example.com/page.html is the URL from which the copy was made.
The fact that the archival copy is at a different URL from the original causes a set of problems that have bedevilled Web archiving. One is that, if the original goes away, all the links that pointed to it break, even though there may be an archival copy to which they could point to fulfill the intent of the link creator. Another is that, if the content at the original URL changes, the link will continue to resolve but the content it returns may no longer reflect the intent of the link creator, although there may be an archival copy that does. Even in the early days of the Web it was evident that Web pages changed and vanished at an alarming rate.

The point is that the meaning of a generic Web URL is "whatever content, or lack of content, you find at this location". That is why URL stands for Universal Resource Locator. Note the difference with URI, which stands for Universal Resource Identifier. Anyone can create a URL or URI linking to whatever content they choose, but doing so provides no rights in or control over the linked-to content.

In People's Expensive NFTs Keep Vanishing. This Is Why, Ben Munster reports that:
over the past few months, numerous individuals have complained about their NFTs going “missing,” “disappearing,” or becoming otherwise unavailable on social media. This despite the oft-repeated NFT sales pitch: that NFT artworks are logged immutably, and irreversibly, onto the Ethereum blockchain.
So NTFs have the same problem that Web pages do. Isn't the blockchain supposed to make things immortal and immutable?

Kyle Orland's Ars Technica’s non-fungible guide to NFTs provides an over-simplified explanation:
When NFT’s are used to represent digital files (like GIFs or videos), however, those files usually aren’t stored directly “on-chain” in the token itself. Doing so for any decently sized file could get prohibitively expensive, given the cost of replicating those files across every user on the chain. Instead, most NFTs store the actual content as a simple URI string in their metadata, pointing to an Internet address where the digital thing actually resides.
NFTs are just links to the content they represent, not the content itself. The Bitcoin blockchain actually does contain some images, such as this ASCII portrait of Len Sassaman and some pornographic images. But the blocks of the Bitcoin blockchain were originally limited to 1MB and are now effectively limited to around 2MB, enough space for small image files. What’s the Maximum Ethereum Block Size? explains:
Instead of a fixed limit, Ethereum block size is bound by how many units of gas can be spent per block. This limit is known as the block gas limit ... At the time of writing this, miners are currently accepting blocks with an average block gas limit of around 10,000,000 gas. Currently, the average Ethereum block size is anywhere between 20 to 30 kb in size.
That's a little out-of-date. Currently the block gas limit is around 12.5M gas per block and the average block is about 45KB. Nowhere near enough space for a $69M JPEG. The NFT for an artwork can only be a link. Most NFTs are ERC-721 tokens, providing the optional Metadata extension:
/// @title ERC-721 Non-Fungible Token Standard, optional metadata extension
/// @dev See https://eips.ethereum.org/EIPS/eip-721
/// Note: the ERC-165 identifier for this interface is 0x5b5e139f.
interface ERC721Metadata /* is ERC721 */ {
/// @notice A descriptive name for a collection of NFTs in this contract
function name() external view returns (string _name);

/// @notice An abbreviated name for NFTs in this contract
function symbol() external view returns (string _symbol);

/// @notice A distinct Uniform Resource Identifier (URI) for a given asset.
/// @dev Throws if `_tokenId` is not a valid NFT. URIs are defined in RFC
/// 3986. The URI may point to a JSON file that conforms to the "ERC721
/// Metadata JSON Schema".
function tokenURI(uint256 _tokenId) external view returns (string);
}
The Metadata JSON Schema specifies an object with three string properties:
  • name: "Identifies the asset to which this NFT represents"
  • description: "Describes the asset to which this NFT represents"
  • image: "A URI pointing to a resource with mime type image/* representing the asset to which this NFT represents. Consider making any images at a width between 320 and 1080 pixels and aspect ratio between 1.91:1 and 4:5 inclusive."
Note that the JSON metadata is not in the Ethereum blockchain, it is only pointed to by the token on the chain. If the art-work is the "image", it is two links away from the blockchain. So, given the evanescent nature of Web links, the standard provides no guarantee that the metadata exists, or is unchanged from when the token was created. Even if it is, the standard provides no guarantee that the art-work exists or is unchanged from when the token is created.

Caveat emptor — Absent unspecified actions, the purchaser of an NFT is buying a supposedly immutable, non-fungible object that points to a URI pointing to another URI. In practice both are typically URLs. The token provides no assurance that either of these links resolves to content, or that the content they resolve to at any later time is what the purchaser believed at the time of purchase. There is no guarantee that the creator of the NFT had any copyright in, or other rights to, the content to which either of the links resolves at any particular time.

There are thus two issues to be resolved about the content of each of the NFT's links:
  • Does it exist? I.e. does it resolve to any content?
  • Is it valid? I.e. is the content to which it resolves unchanged from the time of purchase?
These are the same questions posed by the Holy Grail of Web archiving, persistent URLs.

Assuming existence for now, how can validity be assured? There have been a number of systems that address this problem by switching from naming files by their location, as URLs do, to naming files by their content by using the hash of the content as its name. The idea was the basis for Bram Cohen's highly successful BitTorrent — it doesn't matter where the data comes from provided its integrity is assured because the hash in the name matches the hash of the content.

The content-addressable file system most used for NFTs is the Interplanetary File System (IPFS). From its Wikipedia page:
As opposed to a centrally located server, IPFS is built around a decentralized system[5] of user-operators who hold a portion of the overall data, creating a resilient system of file storage and sharing. Any user in the network can serve a file by its content address, and other peers in the network can find and request that content from any node who has it using a distributed hash table (DHT). In contrast to BitTorrent, IPFS aims to create a single global network. This means that if Alice and Bob publish a block of data with the same hash, the peers downloading the content from Alice will exchange data with the ones downloading it from Bob.[6] IPFS aims to replace protocols used for static webpage delivery by using gateways which are accessible with HTTP.[7] Users may choose not to install an IPFS client on their device and instead use a public gateway.
If the purchaser gets both the NFT's metadata and the content to which it refers via IPFS URIs, they can be assured that the data is valid. What do these IPFS URIs look like? The (excellent) IPFS documentation explains:
https://ipfs.io/ipfs/<CID>
# e.g
https://ipfs.io/ipfs/Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu
Browsers that support IPFS can redirect these requests to your local IPFS node, while those that don't can fetch the resource from the ipfs.io gateway.

You can swap out ipfs.io for your own http-to-ipfs gateway, but you are then obliged to keep that gateway running forever. If your gateway goes down, users with IPFS aware tools will still be able to fetch the content from the IPFS network as long as any node still hosts it, but for those without, the link will be broken. Don't do that.
Note the assumption here that the ipfs.io gateway will be running forever. Note also that only some browsers are capable of accessing IPFS content without using a gateway. Thus the ipfs.io gateway is a single point of failure, although the failure is not complete. In practice NFTs using IPFS URIs are dependent upon the continued existence of Protocol Labs, the organization behind IPFS. The ipfs.io URIs in the NFT metadata are actually URLs; they don't point to IPFS, but to a Web server that accesses IPFS.

Pointing to the NFT's metadata and content using IPFS URIs assures their validity but does it assure their existence? The IPFS documentation's section Persistence, permanence, and pinning explains:
Nodes on the IPFS network can automatically cache resources they download, and keep those resources available for other nodes. This system depends on nodes being willing and able to cache and share resources with the network. Storage is finite, so nodes need to clear out some of their previously cached resources to make room for new resources. This process is called garbage collection.

To ensure that data persists on IPFS, and is not deleted during garbage collection, data can be pinned to one or more IPFS nodes. Pinning gives you control over disk space and data retention. As such, you should use that control to pin any content you wish to keep on IPFS indefinitely.
To assure the existence of the NFT's metadata and content they must both be not just written to IPFS but also pinned to at least one IPFS node.
To ensure that your important data is retained, you may want to use a pinning service. These services run lots of IPFS nodes and allow users to pin data on those nodes for a fee. Some services offer free storage-allowance for new users. Pinning services are handy when:
  • You don't have a lot of disk space, but you want to ensure your data sticks around.
  • Your computer is a laptop, phone, or tablet that will have intermittent connectivity to the network. Still, you want to be able to access your data on IPFS from anywhere at any time, even when the device you added it from is offline.
  • You want a backup that ensures your data is always available from another computer on the network if you accidentally delete or garbage-collect your data on your own computer.
Thus to assure the existence of the NFT's metadata and content pinning must be rented from a pinning service, another single point of failure.

In summary, it is possible to take enough precautions and pay enough ongoing fees to be reasonably assured that your $69M NFT and its metadata and the JPEG it refers to will remain accessible. Whether in practice these precautions are taken is definitely not always the case. David Gerard reports:
But functionally, IPFS works the same way as BitTorrent with magnet links — if nobody bothers seeding your file, there’s no file there. Nifty Gateway turn out not to bother to seed literally the files they sold, a few weeks later. [Twitter; Twitter]
Anil Dash claims to have invented, with Kevin McCoy, the concept of NFTs referencing Web URLs in 2014. He writes in his must-read NFTs Weren’t Supposed to End Like This:
Seven years later, all of today’s popular NFT platforms still use the same shortcut. This means that when someone buys an NFT, they’re not buying the actual digital artwork; they’re buying a link to it. And worse, they’re buying a link that, in many cases, lives on the website of a new start-up that’s likely to fail within a few years. Decades from now, how will anyone verify whether the linked artwork is the original?

All common NFT platforms today share some of these weaknesses. They still depend on one company staying in business to verify your art. They still depend on the old-fashioned pre-blockchain internet, where an artwork would suddenly vanish if someone forgot to renew a domain name. “Right now NFTs are built on an absolute house of cards constructed by the people selling them,” the software engineer Jonty Wareing recently wrote on Twitter.
My only disagreement with Dash is that, as someone who worked on archiving the "old-fashioned pre-blockchain internet" for two decades, I don't believe that there is a new-fangled post-blockchain Internet that makes the problems go away. And neither does David Gerard:
The pictures for NFTs are often stored on the Interplanetary File System, or IPFS. Blockchain promoters talk like IPFS is some sort of bulletproof cloud storage that works by magic and unicorns.
Read the whole story
jchalifour
10 days ago
reply
Montréal
Share this story
Delete

In facing death, Concordia neuroscientist Nadia Chaudhri has built a lasting legacy and inspired thousands

1 Share
Nadia Chaudhri

From a palliative care bed, Concordia neuroscientist Nadia Chaudhri has shared pockets of joy and wisdom that have resonated around the world, and raised funds for future scientists.

Read the whole story
jchalifour
15 days ago
reply
Montréal
Share this story
Delete

Unique literary festival focuses solely on Haudenosaunee storytelling, publishing

1 Share
Janet Rogers

What looks to be Canada’s first literary festival to focus on Haudenosaunee writing and storytelling will be held on Aug. 25 and 26.

Read the whole story
jchalifour
34 days ago
reply
Montréal
Share this story
Delete

Montreal's Mile End bids farewell to beloved art gallery and curiosity shop

1 Share
Monastiraki

After 23 years of selling local art, fine prints and vintage treasures, Monastiraki’s owner is shutting down his curiosity shop and art gallery on St-Laurent Boulevard in Montreal’s Mile End neighbourhood.

Read the whole story
jchalifour
35 days ago
reply
Montréal
Share this story
Delete

Five key takeaways from the UN climate report

1 Share
Read the whole story
jchalifour
47 days ago
reply
Montréal
Share this story
Delete

Quebec's relationship with forestry industry under scrutiny as pressure mounts to protect woodlands

1 Share
Forestry industry in Quebec

In the eyes of experts, environmental activists and Indigenous communities, Quebec’s Forest Ministry appears more interested in safeguarding the profits of industry than the forests themselves

Read the whole story
jchalifour
54 days ago
reply
Montréal
Share this story
Delete
Next Page of Stories