Robust Distortion-free Watermarks for Language Models

635 views

18 days ago

A Google TechTalk, presented by John Thickstun, 2024-07-02
A Google Algorithms Seminar. ABSTRACT: We describe a protocol for planting a watermark into text generated by an autoregressive language model (LM) that is robust to edits and does not change the distribution of generated text. We generate watermarked text by controlling the source of randomness--using a secret watermark key--that the LM decoder uses to convert probabilities of text into samples. To detect the watermark, any party who knows the key can test for statistical correlations between a snippet of text and the watermark key; meanwhile, our watermark is provably undetectable by anyone who does not know the watermark key. We instantiate our watermarking protocol with two alternative decoders: inverse transform sampling and Gumbel argmax sampling. We apply these watermarks to the OPT-1.3B, LLaMA 7B, and instruction-tuned Alpaca 7B LMs to experimentally validate their statistical power and robustness to various paraphrasing attacks. ArXiv: arxiv.org/abs/2307.15593
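As a rough illustration of the two decoders named in the abstract, the sketch below draws tokens from a fixed toy distribution using only randomness derived from a secret key. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gumbel_argmax`, `inverse_transform`, `sample_tokens`), the toy three-token distribution, and the use of Python's `random.Random` as the keyed generator are all hypothetical choices made for the example. A real LM would supply a different `probs` vector at every step.

```python
import math
import random
from collections import Counter


def gumbel_argmax(probs, uniforms):
    """Return argmax_i log(u_i) / p_i.

    For independent uniform u_i, -log(u_i) / p_i is exponential with rate p_i,
    so the winning index is distributed exactly according to `probs`.
    """
    best_token, best_score = -1, -math.inf
    for token, (p, u) in enumerate(zip(probs, uniforms)):
        if p <= 0.0:
            continue  # zero-probability tokens can never be chosen
        score = math.log(max(u, 1e-300)) / p  # guard against u == 0.0
        if score > best_score:
            best_token, best_score = token, score
    return best_token


def inverse_transform(probs, u, permutation):
    """Walk the CDF in a key-chosen token order; return the first token whose
    cumulative probability exceeds u."""
    chosen = permutation[-1]
    cumulative = 0.0
    for token in permutation:
        if probs[token] <= 0.0:
            continue
        cumulative += probs[token]
        chosen = token
        if u < cumulative:
            break
    return chosen


def sample_tokens(key, probs, length, decoder="gumbel"):
    """Generate `length` tokens; the secret key is the only source of randomness."""
    rng = random.Random(key)
    vocab = len(probs)
    tokens = []
    for _ in range(length):
        if decoder == "gumbel":
            uniforms = [rng.random() for _ in range(vocab)]
            tokens.append(gumbel_argmax(probs, uniforms))
        else:
            permutation = list(range(vocab))
            rng.shuffle(permutation)
            tokens.append(inverse_transform(probs, rng.random(), permutation))
    return tokens


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.2]  # hypothetical next-token distribution
    for name in ("gumbel", "inverse"):
        counts = Counter(sample_tokens(1234, toy_probs, 20000, name))
        print(name, [round(counts[t] / 20000, 3) for t in range(len(toy_probs))])
```

Both decoders are deterministic functions of the key, yet over many draws the empirical token frequencies should track `toy_probs` closely, which is the sense in which the watermark leaves the text distribution unchanged.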
ABOUT THE SPEAKER: John Thickstun is a postdoctoral researcher at Stanford University, working with Percy Liang. Previously, he completed a PhD at the University of Washington, advised by Sham M. Kakade and Zaid Harchaoui. His current research focus is to improve the capabilities and controllability of generative models. His work has been featured in media outlets including TechCrunch and the Times of London, recognized by outstanding-paper awards at NeurIPS and ACL, and supported by an NSF Graduate Fellowship and a Qualcomm Innovation Fellowship.

Comments: 2

@wolpumba4099 16 days ago

*Summary*

*Problem:*
- Detecting AI-generated text is challenging, especially as language model outputs become more realistic. (0:14)
- Traditional watermarking techniques introduce distortion, which degrades text quality. (2:40)

*Proposed Solution:* A distortion-free statistical watermarking technique for language models. (2:54)

*Key Features:*
- *Leverages the LLM generation process:* The watermark is embedded during text generation by controlling the random number generator (RNG) used by the decoder. (3:57)
- *Secret Key:* The watermark is detectable only by those who hold the secret key used to seed the RNG. (9:31)
- *Statistical Correlations:* Detection relies on testing for statistical correlations between the generated text and the watermark key. (9:31)
- *Distortion-Free:* The watermarking process does not alter the distribution of generated text, so output quality is preserved. (12:23)
- *Robustness:* The watermark remains detectable even after moderate text editing or paraphrasing. (15:06)

*How it Works:*
1. *RNG Sequence Generation:* A long sequence of random numbers is pre-generated using the secret key. (13:07)
2. *Random Subsequence Selection:* A random subsequence of these numbers drives the text generation process. (13:07)
3. *Watermark Detection:* (15:32)
   - Search over all possible alignments of the test text with the pre-generated RNG sequence. (15:43)
   - Calculate the optimal alignment using Levenshtein distance, accounting for potential edits. (16:04)
   - Test for statistical correlations between the aligned text and the random numbers. (22:30)

*Advantages:*
- No trade-off between robustness and distortion. (41:00)
- Provably undetectable without the secret key. (9:31)

*Limitations:*
- Detection complexity scales with the RNG sequence length, which needs to be large. (41:00)
- Less effective on low-entropy text with limited creative choices. (39:43)

*Future Work:*
- Resolve the tension between robustness, distortion-freeness, and detection complexity. (52:03)
- Design more powerful hypothesis tests and explore alternative decoder functions for stronger watermark correlations. (52:45)

I used Gemini 1.5 Pro to summarize the transcript.
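The generate-then-detect steps in the summary above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the method from the talk: the "language model" is a uniform distribution over a toy vocabulary, the alignment search tries only exact cyclic offsets (it omits the Levenshtein-style edit-tolerant alignment), and the correlation score `-log(1 - xi)` and the helper names (`key_stream`, `generate`, `alignment_score`, `p_value`) are hypothetical choices for the example. The p-value comes from comparing the observed score against scores computed under freshly drawn random keys.

```python
import math
import random

VOCAB = 16    # toy vocabulary size (assumption)
KEY_LEN = 64  # length of the pre-generated key sequence (assumption)


def key_stream(key, length=KEY_LEN, vocab=VOCAB):
    """Pre-generate `length` rows of uniform numbers from the secret key."""
    rng = random.Random(key)
    return [[rng.random() for _ in range(vocab)] for _ in range(length)]


def generate(key, num_tokens, offset_seed=0):
    """Toy LM with a uniform next-token distribution, decoded by Gumbel argmax
    starting at a random offset into the key sequence."""
    xi = key_stream(key)
    offset = random.Random(offset_seed).randrange(KEY_LEN)
    p = 1.0 / VOCAB
    tokens = []
    for t in range(num_tokens):
        row = xi[(offset + t) % KEY_LEN]
        tokens.append(
            max(range(VOCAB), key=lambda i: math.log(max(row[i], 1e-300)) / p)
        )
    return tokens


def alignment_score(tokens, xi):
    """Best correlation score over all cyclic offsets into the key sequence.
    Larger means the tokens line up better with the key."""
    best = -math.inf
    for offset in range(len(xi)):
        score = sum(
            -math.log(1.0 - xi[(offset + t) % len(xi)][tok])
            for t, tok in enumerate(tokens)
        )
        best = max(best, score)
    return best


def p_value(tokens, key, num_null_keys=100, null_seed=12345):
    """Fraction of random keys that score at least as well as the real key."""
    observed = alignment_score(tokens, key_stream(key))
    null_rng = random.Random(null_seed)
    exceed = 0
    for _ in range(num_null_keys):
        fake_key = null_rng.getrandbits(64)
        if alignment_score(tokens, key_stream(fake_key)) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + num_null_keys)


if __name__ == "__main__":
    secret = 2024
    marked = generate(secret, 30)
    plain_rng = random.Random(7)
    plain = [plain_rng.randrange(VOCAB) for _ in range(30)]
    print("watermarked p-value:", p_value(marked, secret))
    print("unwatermarked p-value:", p_value(plain, secret))
```

Because every offset into the key sequence is scored, the cost of detection grows with `KEY_LEN`, which matches the limitation noted in the summary. The uniform toy distribution is also the best case for detection; a low-entropy distribution would leave far less signal in each token.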