
Commit 77d9948

knchadha authored and facebook-github-bot committed
Probabilistic Memorization Tutorial (#62)
Summary: Pull Request resolved: #62 Tutorial for logits_attack and logprobs_attack. Reviewed By: mgrange1998 Differential Revision: D82862302 fbshipit-source-id: f8247c23fdda8b199a1f3d496fe6e206eab8b96d
1 parent 62ba41f commit 77d9948

File tree

1 file changed (+320, -0 lines)

#!/usr/bin/env -S grimaldi --kernel privacy_guard_local
# fmt: off
# flake8: noqa
# FILE_UID: ab346f8b-b17c-4361-9be6-c8ac7f687f34
# NOTEBOOK_NUMBER: N8016278 (1681205635878446)

""":md
8+
# Probabilistic memorization Analysis with PrivacyGuard
9+
10+
## Introduction
11+
12+
We showcase a probabalistic memorization analysis using logits and logrprobs attack using PrivacyGuard.
13+
Probabilistic memorization assessment measures the probability that a given LLM places on some target text content given a prompt, which
14+
can be used as a proxy to quantify memorization of that text.
15+
16+
This tutorial will walk through the process of
17+
- Using PrivacyGuard's generation tooling to conduct extraction evals on small LLMs
18+
- Running LogprobsAttack and ProbabilisticMemorizationAnalysis to measure extraction rates of the ENRON email dataset.
19+
- Running LogitsAttack and ProbabilisticMemorizationFromLogitsAnalysis to measure extraction rates of the ENRON email dataset.
20+
21+
"""
22+
23+
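""":md
Before running anything, it helps to see the core quantity these attacks produce. The probability a model places on a target continuation is typically computed by summing the per-token log-probabilities of the target tokens and exponentiating. A minimal sketch with made-up numbers (the log-probabilities and threshold below are purely illustrative, not PrivacyGuard output):
```
import math

# Hypothetical per-token log-probabilities the model assigns to the target tokens
token_logprobs = [-0.2, -1.3, -0.05, -2.1]

sequence_logprob = sum(token_logprobs)             # log p(target | prompt)
sequence_probability = math.exp(sequence_logprob)  # p(target | prompt)

probability_threshold = 0.1  # illustrative cutoff for flagging memorization
print(f"p(target | prompt) = {sequence_probability:.4f}")
print(f"above threshold: {sequence_probability > probability_threshold}")
```
A high probability on a long, specific target is evidence that the model has memorized that text rather than merely being able to produce plausible continuations.
"""
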
""":py '2095833040826027'"""
24+
# %env CUDA_VISIBLE_DEVICES=1 # pragma: uncomment
25+
26+
""":py '24531844829757640'"""
27+
import os
28+
29+
working_directory = "~/privacy_guard_working_directory"
30+
31+
working_directory_path = os.path.expanduser(working_directory)
32+
if not os.path.isdir(working_directory_path):
33+
os.mkdir(working_directory_path)
34+
else:
35+
print(f"Working directory already exists: {working_directory_path}")
36+
37+
""":md
38+
# Preparing the Enron Email dataset
39+
40+
In each experiment, we
41+
measure extraction rates with respect to 10,000 examples drawn from the Enron dataset,
42+
which is contained in the Pile (Gao et al., 2020)—the training
43+
dataset for both Pythia and GPT-Neo 1.3B
44+
45+
To begin, download the May 7, 2015 version of the Enron dataset from https://www.cs.cmu.edu/~enron/
46+
47+
Move the compressed file to ~/privacy_guard_working_directory, and decompress with the following command.
48+
(NOTE that the dataset is large, so decompressing will create a large nexted directory)
49+
```
50+
cd ~/privacy_guard_working_directory
51+
ls # Verify that enron_mail_20150507.tar.gz is located in the working directory
52+
tar -xvzf enron_mail_20150507.tar.gz
53+
```
54+
55+
In unix, then decompress the file with 'tar -xvzf enron_mail_20150507.tar.gz'
56+
57+
Once complete, check the directory structure
58+
```
59+
ls maildir
60+
```
61+
62+
63+
"""
64+
65+
""":md
66+
Next, we'll load samples from the decompressed dataset to use in extraction testing.
67+
68+
maildir/allen-p/_sent_mail/ is a directory, containing ~600 emails
69+
"""
70+
71+
""":py '764101239743963'"""
72+
from typing import Dict, List
73+
74+
import pandas as pd
75+
76+
# Defining variables for setting up extraction samples
77+
max_num_samples = 10
78+
prompt_length_characters = 200
79+
target_length_characters = 200
80+
sample_length = prompt_length_characters + target_length_characters
81+
82+
83+
# Pointing to samples to test extraction
84+
example_content_dir = working_directory_path + "/maildir/allen-p/_sent_mail/"
85+
extraction_targets: List[Dict[str, str]] = []
86+
87+
88+
num_targets = 0
89+
for filename in sorted(os.listdir(example_content_dir)):
90+
file_path = os.path.join(example_content_dir, filename)
91+
92+
if os.path.isfile(file_path) and os.path.getsize(file_path) >= sample_length:
93+
with open(file_path, "r") as file:
94+
file_content = file.read()
95+
print(len(file_content[0:prompt_length_characters]))
96+
extraction_targets.append(
97+
{
98+
"prompt": file_content[0:prompt_length_characters],
99+
"target": file_content[
100+
prompt_length_characters : prompt_length_characters
101+
+ target_length_characters
102+
],
103+
"filename": filename,
104+
}
105+
)
106+
num_targets += 1
107+
if num_targets >= max_num_samples:
108+
break
109+
110+
111+
print(f"Prepared extraction target with length: {len(extraction_targets)}")
112+
113+
extraction_targets_df = pd.DataFrame(extraction_targets)
114+
115+
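""":md
Before saving, it can help to sanity-check the prepared targets. A quick, optional inspection using plain pandas (nothing PrivacyGuard-specific):
```
# Peek at the first few rows and confirm the prompt/target slice lengths
print(extraction_targets_df[["filename"]].head())
print(extraction_targets_df["prompt"].str.len().describe())
print(extraction_targets_df["target"].str.len().describe())
```
"""
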
""":py '1168152261823607'"""
116+
# Save the dataframe to a .jsonl file
117+
from privacy_guard.attacks.extraction.utils.data_utils import save_results
118+
119+
extraction_targets_path = working_directory_path + "/extraction_targets.jsonl"
120+
121+
if not os.path.isfile(extraction_targets_path):
122+
save_results(
123+
extraction_targets_df,
124+
extraction_targets_path,
125+
format="jsonl",
126+
)
127+
128+
print(f"Saved extraction targets to jsonl file {extraction_targets_path}")
129+
else:
130+
print(f"Extraction target file already exists as {extraction_targets_path}")
131+
132+
""":md
133+
# Define the Predictor
134+
135+
Extraction targets df is now prepared to run extraction attacks for memorization assessments, where we calculate the probability the model places on particular targts given the prompts. To start with, we define a Predictor object which loads the model and its corresponding tokenizer
136+
137+
This next step will use PrivacyGuard to load the Pythia model.
138+
(Note: this step will take some time)
139+
140+
141+
142+
"""
143+
144+
""":py '2500688590297636'"""
145+
from bento import fwdproxy
146+
from privacy_guard.attacks.extraction.predictors.huggingface_predictor import (
147+
HuggingFacePredictor,
148+
)
149+
150+
# 1) Create a HuggingFace predictor instance using the defined class
151+
model_name = "EleutherAI/pythia-12b"
152+
153+
print(f"Loading model '{model_name}' using HuggingFacePredictor...")
154+
with fwdproxy():
155+
huggingface_predictor = HuggingFacePredictor(
156+
model_name=model_name,
157+
device="cuda",
158+
model_kwargs={"torch_dtype": "auto"}, # Use appropriate dtype
159+
tokenizer_kwargs={},
160+
)
161+
162+
print(f"Loaded model '{huggingface_predictor.model_name}' from HuggingFace")
163+
164+
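""":md
For intuition about what the predictor encapsulates: loading this checkpoint directly with the `transformers` library would look roughly like the sketch below. This is an illustrative assumption about the predictor's role (bundling a causal LM and its tokenizer), not a description of `HuggingFacePredictor`'s actual implementation.
```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the same checkpoint and tokenizer directly with transformers
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-12b")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-12b", torch_dtype="auto"
).to("cuda")
```
The attack classes below only interact with the predictor, so experimenting with a different model is a one-line change to `model_name`.
"""
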
""":md
165+
# Prepare and Execute LogprobsAttack
166+
167+
1. Prepare the LogprobsAttack
168+
2. Execute the LogprobsAttack using "run_attack"
169+
170+
After executing this tutorial, feel free to clone and experiment with other models and datasets.
171+
"""
172+
173+
""":py '2260845434338032'"""
174+
from privacy_guard.attacks.extraction.logprobs_attack import LogprobsAttack
175+
176+
logprobs_attack = LogprobsAttack(
177+
input_file=extraction_targets_path, # The dataset to perform logprobs attack on
178+
output_file=None, # When specified, saves logprobs to file.
179+
predictor=huggingface_predictor, # Pass the predictor instead of model/tokenizer
180+
prompt_column="prompt", # Column used as prompt for each logprob extraction
181+
target_column="target", # Column containing target text for logprob calculation
182+
output_column="prediction_logprobs",
183+
batch_size=4,
184+
temperature=1.1,
185+
)
186+
187+
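""":md
A note on the `temperature` argument above: temperature rescales the logits before probabilities are computed, so values above 1 flatten the token distribution and values below 1 sharpen it. A small, self-contained illustration of standard temperature-scaled softmax (generic math, not PrivacyGuard internals; the logits are made up):
```
import math

def temperature_scaled_probs(logits, temperature):
    # Standard softmax over one token position, with logits divided by temperature
    scaled = [x / temperature for x in logits]
    normalizer = sum(math.exp(x) for x in scaled)
    return [math.exp(x) / normalizer for x in scaled]

logits = [4.0, 2.0, 0.5]  # made-up logits for three candidate tokens
print(temperature_scaled_probs(logits, temperature=1.0))
print(temperature_scaled_probs(logits, temperature=1.1))  # slightly flatter distribution
```
"""
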
""":md
188+
# Running LogprobsAttack
189+
190+
Now that LogprobsAttack has been configured and initialized, the we can perform the logproibs attack which calculates the log probabilities using "run_attack"
191+
"""
192+
193+
""":py '1539854943700067'"""
194+
attack_result = logprobs_attack.run_attack()
195+
196+
""":md
197+
# Analysis
198+
199+
Now that the log probability calculation through logprobs_attack is complete, we can perform Privacy Analysis to compute do a memorization assessment of the dataset.
200+
"""
201+
202+
""":py '1526275335236244'"""
203+
from typing import Any, Dict, List
204+
205+
import pandas as pd
206+
207+
from IPython.display import display, Markdown
208+
209+
from privacy_guard.analysis.extraction.probabilistic_memorization_analysis_node import (
210+
ProbabilisticMemorizationAnalysisNode,
211+
)
212+
213+
# Remove this line as it's not needed for LogprobsAttack result
214+
# attack_result.lcs_bound_config = None
215+
216+
analysis_node = ProbabilisticMemorizationAnalysisNode(analysis_input=attack_result)
217+
218+
results = analysis_node.run_analysis()
219+
220+
# Update to use the new outputs from ProbabilisticMemorizationAnalysisNode
221+
displays = []
222+
223+
def display_result(displays: List[Dict[str, Any]], augmented_row):
224+
displays.append(
225+
{
226+
"model_probability": augmented_row["model_probability"],
227+
"above_threshold": augmented_row["above_probability_threshold"],
228+
"n_probabilities": augmented_row.get("n_probabilities", "N/A"),
229+
"target": augmented_row["target"],
230+
"logprobs": augmented_row["prediction_logprobs"],
231+
}
232+
)
233+
234+
for augmented_row in results.augmented_output_dataset.T.to_dict().values():
235+
display_result(displays=displays, augmented_row=augmented_row)
236+
237+
display(pd.DataFrame(displays))
238+
239+
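""":md
The table above is per-sample. To summarize the run as a single rate, i.e. the fraction of targets whose model probability clears the threshold, an optional aggregation over the `displays` list built above (assuming `above_threshold` holds booleans):
```
summary_df = pd.DataFrame(displays)
memorization_rate = summary_df["above_threshold"].mean()
print(f"Fraction of targets above the probability threshold: {memorization_rate:.2%}")
```
"""
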
""":md
240+
# Preparing and Executing LogitsAttack
241+
242+
1. Prepare the LogitsAttack
243+
2. Execute the LogitsAttack using "run_attack"
244+
"""
245+
246+
""":py '25016343094628450'"""
247+
from privacy_guard.attacks.extraction.logits_attack import LogitsAttack
248+
249+
# 2) Prepare the LogprobsAttack
250+
logits_attack = LogitsAttack(
251+
input_file=extraction_targets_path, # The dataset to perform logprobs attack on
252+
output_file=None, # When specified, saves logprobs to file.
253+
predictor=huggingface_predictor, # Pass the predictor instead of model/tokenizer
254+
prompt_column="prompt", # Column used as prompt for each logprob extraction
255+
target_column="target", # Column containing target text for logprob calculation
256+
output_column="prediction_logits",
257+
batch_size=4,
258+
temperature=1.1,
259+
)
260+
261+
""":md
262+
# Running LogitsAttack
263+
264+
Now that LogitsAttack has been configured and initialized, the we can perform the generation attack using "run_attack"
265+
"""
266+
267+
""":py '1128349329448800'"""
268+
attack_result = logits_attack.run_attack()
269+
270+
""":md
271+
# Analysis
272+
273+
Now that the generation attack is complete, we can perform Privacy Analysis to compute the extraction rate of the dataset.
274+
275+
We'll look at the longest common substring score for each sample in the dataset, alonside the % of the target extracted.
276+
"""
277+
278+
""":py '2797153583813228'"""
279+
from typing import Any, Dict, List
280+
281+
import pandas as pd
282+
from IPython.display import display, Markdown
283+
284+
from privacy_guard.analysis.extraction.probabilistic_memorization_analysis_from_logits_node import (
285+
ProbabilisticMemorizationAnalysisFromLogitsNode,
286+
)
287+
288+
# Remove this line as it's not needed for LogprobsAttack result
289+
# attack_result.lcs_bound_config = None
290+
291+
analysis_node = ProbabilisticMemorizationAnalysisFromLogitsNode(analysis_input=attack_result)
292+
293+
results = analysis_node.run_analysis()
294+
295+
print("Analysis run completed.")
296+
# Update to use the new outputs from ProbabilisticMemorizationAnalysisNode
297+
displays = []
298+
299+
def display_result(displays: List[Dict[str, Any]], augmented_row):
300+
displays.append(
301+
{
302+
"model_probability": augmented_row["model_probability"],
303+
"above_threshold": augmented_row["above_probability_threshold"],
304+
"n_probabilities": augmented_row.get("n_probabilities", "N/A"),
305+
"target": augmented_row["target"],
306+
}
307+
)
308+
309+
for augmented_row in results.augmented_output_dataset.T.to_dict().values():
310+
display_result(displays=displays, augmented_row=augmented_row)
311+
312+
display(pd.DataFrame(displays))
313+
314+
""":md
315+
316+
"""
317+
318+
""":md
319+
320+
"""
