Model Tokenization
There are several ways to mark and enforce the rights to a model in different scenarios. Considering effectiveness and recognizability, embedding a digital watermark into the model is the most feasible and effective method. Watermarks that are difficult to remove make it possible to track illegal distribution of a model, which helps protect intellectual property and prevent model leakage.
Watermarks can reveal information about the leaker and other copyright-related details. However, due to the black-box nature of deep learning models, traditional methods cannot embed a watermark into a model without affecting its original function. Current watermarking methods therefore integrate embedding into the training phase: a specially designed loss function lets the model absorb the watermark during training without disrupting its original function. Inspired by traditional parameter-featured methods, we hide the watermark within the model's parameters, from which it can be recovered through a decoding process.
For generative models, training a VAE or GAN module ensures that the functionality of the model is not compromised. The advantage of this method is that the watermark is more stable, hides information inside the model itself, and resists attacks.
The crucial parts of building such a platform are model tokenization, model verification, model transaction, and transaction log recording. The entire process is divided into the following steps:
Watermark embedding and extraction
In non-blockchain architectures, feature watermark generation requires users to manually select specific features. Because of a lack of coordination or overly simple selection strategies, watermarks chosen by different users may end up at close Hamming distances from one another, which causes conflicts and makes them difficult to distinguish. Therefore, using smart contracts on a blockchain to negotiate the watermark is a better choice.
Users obtain a decoding matrix X and a watermark matrix K through the negotiation contract. Negotiation is based on the keys allocated by smart contracts. Embedding is achieved by adding a regularization term to the objective function and optimizing the resulting loss. In this design, we embed the watermark K into the elements of XW, where W denotes the model parameters: training adjusts the parameters until they fit the watermark. The blockchain records the mapping between the model and its unique watermark. With the support of IPFS or other decentralized storage technologies, the model can be stored and published on-chain. Anyone can invoke the query function of the on-chain contract to learn the true owner of the model.
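The embedding step above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: it assumes the regularization term is a binary cross-entropy between sigmoid(XW) and K (a common choice in parameter-based watermarking), treats W as a flat vector, and draws X and K at random instead of obtaining them from the negotiation contract. The function names, the weight `lam`, and the `task_grad` callback are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def project(X, w):
    """Compute X @ w: one real value per watermark bit."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]


def watermark_loss(X, w, K):
    """Assumed regularizer: binary cross-entropy between sigmoid(X @ w) and K."""
    eps = 1e-12
    total = 0.0
    for z, k in zip(project(X, w), K):
        p = sigmoid(z)
        total -= k * math.log(p + eps) + (1 - k) * math.log(1 - p + eps)
    return total


def watermark_grad(X, w, K):
    """Gradient of watermark_loss with respect to w: X^T (sigmoid(X w) - K)."""
    residual = [sigmoid(z) - k for z, k in zip(project(X, w), K)]
    return [sum(residual[i] * X[i][j] for i in range(len(X)))
            for j in range(len(w))]


def embed(X, w, K, lam=0.1, lr=0.2, steps=1000, task_grad=None):
    """Gradient descent on task_loss + lam * watermark_loss.

    task_grad is a hypothetical callback returning the gradient of the
    model's original objective; if omitted, only the regularizer is used.
    """
    w = list(w)
    for _ in range(steps):
        g_task = task_grad(w) if task_grad else [0.0] * len(w)
        g_wm = watermark_grad(X, w, K)
        w = [wj - lr * (gt + lam * gw) for wj, gt, gw in zip(w, g_task, g_wm)]
    return w


# Toy setup: 8 watermark bits embedded into 32 parameters.
rng = random.Random(0)
n_bits, n_params = 8, 32
X = [[rng.gauss(0, 1) for _ in range(n_params)] for _ in range(n_bits)]
K = [rng.randint(0, 1) for _ in range(n_bits)]
w0 = [rng.gauss(0, 0.1) for _ in range(n_params)]
w1 = embed(X, w0, K)
```

In a real training run the task gradient would come from the model's own loss, and `lam` would balance watermark strength against model accuracy; both are left as placeholders here.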
When a user claims ownership of a model, we need to confirm the claim through watermark extraction. Smart contracts search for the model token owned by the user and verify, through decentralized validation mechanisms, whether the watermark can be detected in the model. To extract the watermark, we multiply the decoding matrix X by the model parameters W and apply a step function S to the result to obtain the embedded model information. Comparing this information with the watermark K shows whether the two models are the same.
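A minimal sketch of the extraction step, assuming W is flattened into a vector and S thresholds each element of XW at zero (the threshold value is an assumption; the text only says that a step function is applied). The matrices below are made-up toy values.

```python
def step(z, threshold=0.0):
    """Step function S: map a real value to a single bit."""
    return 1 if z > threshold else 0


def extract_watermark(X, w):
    """Compute S(X @ w), one bit per row of the decoding matrix X."""
    return [step(sum(x * wj for x, wj in zip(row, w))) for row in X]


# Toy decoding matrix (3 bits x 3 parameters) and parameter vector.
X = [[1.0, -1.0, 0.5],
     [-0.5, 2.0, 1.0],
     [0.0, 1.0, -2.0]]
w = [0.8, -0.3, 0.4]
K = [1, 0, 0]  # watermark recorded for this model

extracted = extract_watermark(X, w)
claim_confirmed = extracted == K
```

This version demands an exact match between the extracted bits and K; a validator that tolerates a bounded number of bit errors would compare Hamming distance against a threshold instead.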
On-chain Processing
On the blockchain, smart contracts tokenize models by storing the mapping between user addresses and watermarks in purpose-built data structures. The core structure of the contract comprises three components: the watermark negotiation module, metadata for the model training process, and the record of model ownership rights holders.
Once the blockchain completes the watermark negotiation process with a user, the mapping between the user and the watermark is established. When a user claims ownership of a model, the smart contract searches this data structure using the user's address and the mapping of X and K to retrieve the model token owned by the user, then verifies through decentralized validation mechanisms whether the watermark can be detected in the model. If a transaction takes place, the contract data structure is updated with the latest holder information and the transaction timestamp, which completes the on-chain data update.
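The bookkeeping described above can be modeled off-chain as follows. This is an in-memory Python stand-in for the contract's storage, not contract code; the class names, field names, addresses, and storage URI are all hypothetical and only illustrate the address-to-token mapping and the timestamped holder record.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ModelToken:
    token_id: int
    owner: str
    watermark: tuple   # negotiated watermark K, stored as a tuple of bits
    decoder_ref: str   # reference to the decoding matrix X (hypothetical field)
    storage_uri: str   # off-chain location of the model, e.g. an IPFS address
    history: list = field(default_factory=list)  # (holder, timestamp) pairs


class ModelRegistry:
    """In-memory stand-in for the contract's data structures."""

    def __init__(self):
        self._tokens = {}    # token_id -> ModelToken
        self._by_owner = {}  # owner address -> set of token ids
        self._next_id = 1

    def register(self, owner, watermark, decoder_ref, storage_uri, now=None):
        now = time.time() if now is None else now
        token = ModelToken(self._next_id, owner, tuple(watermark),
                           decoder_ref, storage_uri, [(owner, now)])
        self._tokens[token.token_id] = token
        self._by_owner.setdefault(owner, set()).add(token.token_id)
        self._next_id += 1
        return token.token_id

    def tokens_of(self, owner):
        """Look up every model token held by an address."""
        return [self._tokens[t] for t in sorted(self._by_owner.get(owner, ()))]

    def owner_of(self, token_id):
        """Query function: return the current holder of a model token."""
        return self._tokens[token_id].owner

    def transfer(self, token_id, sender, recipient, now=None):
        token = self._tokens[token_id]
        if token.owner != sender:
            raise PermissionError("only the current holder can transfer a token")
        now = time.time() if now is None else now
        self._by_owner[sender].discard(token_id)
        self._by_owner.setdefault(recipient, set()).add(token_id)
        token.owner = recipient
        token.history.append((recipient, now))  # latest holder and timestamp


registry = ModelRegistry()
tid = registry.register("0xAlice", [1, 0, 1, 1], "decoder-X-1",
                        "ipfs://example-cid", now=1000)
registry.transfer(tid, "0xAlice", "0xBob", now=2000)
```

Watermark verification is deliberately left out of this sketch; in the design described here it would run through the decentralized validation mechanism before a claim is accepted.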
Additionally, it's essential to note that on-chain watermark validation requires a certain fault tolerance, because extraction cannot be 100% accurate. Suppose we want to insert an n-bit watermark and the tolerated error rate is r. Then, if the number of differing bits between the original watermark and the extracted watermark is less than rn, we still confirm the copyright. The Hamming distance, a notion used in designing error-correcting codes, is the number of differing bits between two bit strings. In this scenario, the Hamming distance between the original watermarks of any two users must be at least 2rn. Otherwise, an extracted watermark could fall within the fault tolerance range of two different original watermarks, which would confuse the validation process and lead to ownership conflicts.
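Both rules can be written down directly. The sketch below uses hypothetical function names and toy 16-bit watermarks with r = 0.125, so rn = 2 and 2rn = 4. The minimum-distance rule follows from the triangle inequality: if two registered watermarks are at least 2rn bits apart, no extracted bit string can be fewer than rn bits away from both.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit sequences differ."""
    if len(a) != len(b):
        raise ValueError("watermarks must have the same length")
    return sum(x != y for x, y in zip(a, b))


def confirms_ownership(original, extracted, r):
    """Accept the claim if fewer than r * n bits differ."""
    return hamming_distance(original, extracted) < r * len(original)


def is_far_enough(candidate, registered, r):
    """A new watermark must be at least 2 * r * n bits from every existing one."""
    min_distance = 2 * r * len(candidate)
    return all(hamming_distance(candidate, k) >= min_distance
               for k in registered)


r = 0.125
alice = [0] * 16                  # registered watermark, n = 16
noisy = [1] + [0] * 15            # extraction with one flipped bit
too_close = [1, 1, 1] + [0] * 13  # 3 bits from alice, below 2rn = 4
acceptable = [1, 1, 1, 1] + [0] * 12  # exactly 4 bits from alice

one_bit_error_accepted = confirms_ownership(alice, noisy, r)
close_candidate_allowed = is_far_enough(too_close, [alice], r)
far_candidate_allowed = is_far_enough(acceptable, [alice], r)
```

A negotiation contract could apply a check like `is_far_enough` before issuing a watermark, rejecting candidates that sit too close to one already on record.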