
Dissecting Recall of Factual Associations in Auto-Regressive Language Models

This is the official code repository for the paper "Dissecting Recall of Factual Associations in Auto-Regressive Language Models" by Mor Geva, Jasmijn Bastings, Katja Filippova, and Amir Globerson (2023).

Setup

The Jupyter notebook includes the code for running the paper's experiments on GPT2-xl. The file can be opened in Google Colaboratory.

All the experiments in our work were conducted in a Python 3.7 environment, with a single GPU (V100 for GPT2-xl and A100 for GPT-J). Required Python packages with specific versions are listed in requirements.txt.

Experiments

The notebook has five sections. The first section sets up the environment (i.e., importing the relevant packages, loading the evaluation data and the model), and the second section implements various hooks on different modules in the network, which are needed for extracting intermediate information and for interventions.
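The hook mechanism described above can be sketched with PyTorch forward hooks. This is a minimal, self-contained illustration using a toy model (all module names here are ours, not the notebook's); in the notebook, the same pattern is applied to the real GPT2-xl sublayers.

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer block; the notebook hooks the real
# GPT2-xl modules instead. All names here are illustrative.
class ToyBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.attn = nn.Linear(d, d)
        self.mlp = nn.Linear(d, d)

    def forward(self, x):
        return x + self.attn(x) + self.mlp(x)

model = nn.Sequential(*[ToyBlock(8) for _ in range(3)])

# Cache each sublayer's output so it can be inspected (or patched) later.
cache = {}

def make_hook(name):
    def hook(module, inputs, output):
        cache[name] = output.detach()
    return hook

handles = [m.register_forward_hook(make_hook(n))
           for n, m in model.named_modules()
           if isinstance(m, nn.Linear)]

model(torch.randn(1, 8))
print(sorted(cache))  # six sublayer outputs cached, keyed by module name

for h in handles:     # always remove hooks when done
    h.remove()
```

Interventions (e.g., knockouts or patching) follow the same pattern, except the hook returns a modified tensor instead of only recording the output.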

The next three sections cover our main experiments, according to the three main sections in the paper:

  • Information Flow Analysis, which includes our attention knockout method for studying information flow during inference.

  • Attribute extraction, including projection of hidden representations and sublayer outputs to the vocabulary and patching of hidden states.

  • Attributes rate evaluation (subject enrichment). This includes our evaluation process against paragraphs, and sublayer knockout experiments.
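The vocabulary-projection idea behind the attribute extraction experiments can be sketched as follows. This is a "logit lens"-style toy with random weights (names `ln_f` and `unembed` are ours); in the notebook, GPT2-xl's final layer norm and its unembedding matrix play these roles.

```python
import torch

torch.manual_seed(0)
d_model, vocab = 16, 50
ln_f = torch.nn.LayerNorm(d_model)                     # stand-in for the final layer norm
unembed = torch.nn.Linear(d_model, vocab, bias=False)  # stand-in for the unembedding matrix

hidden = torch.randn(d_model)          # an intermediate hidden representation
logits = unembed(ln_f(hidden))         # project it to the vocabulary space
top = torch.topk(logits, k=5).indices  # top-scoring tokens under this state
```

Inspecting which tokens rank highest at each layer and position is what reveals where the attribute becomes extractable.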

For guidance on applying this code to other models, please see the "Adjustments for Other Models" section below.

Parameters inspection in the vocabulary space

Attention heads: To interpret the parameters of attention heads in the network, we used the method introduced by Dar et al. (ACL 2023), as implemented in the authors' official code repository: https://github.com/guyd1995/embedding-space.

MLP sub-updates: To interpret the parameters of the MLP sublayers, we used the code of LM-Debugger, an open-source tool introduced by Geva et al. (EMNLP Demo 2022).

Adjustments for Other Models

Our code can be applied as-is to other sizes of GPT2, and can be easily adjusted to other transformer-based models available on Huggingface. To adapt the experiments to other models, the following changes are needed:

(1) Code modifications. Different models on Huggingface use different module naming schemes (e.g., the first MLP matrix is called c_fc in GPT2 and fc_in in GPT-J). The hooks used to extract intermediate information and to intervene on the network's computation are attached to modules by name. Therefore, the only part of the code that needs to be adjusted when switching models is the hooks, where the modification should merely update the names of the hooked modules. One way to inspect the names is to look at the model's source code on Huggingface (e.g., for GPT2 and GPT-J).
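Another way to inspect the names is to enumerate them programmatically with `named_modules()`. Below is a small hypothetical helper (our own, not part of the repository) demonstrated on a toy module mimicking GPT2's layout, so the sketch stays self-contained; with a real Huggingface model you would pass e.g. `AutoModelForCausalLM.from_pretrained("gpt2")` and search for "c_fc".

```python
import torch.nn as nn

# Hypothetical helper: list module names containing a substring, so the
# hook targets for a new model can be located quickly.
def find_modules(model, pattern):
    return [name for name, _ in model.named_modules() if pattern in name]

# Toy module mimicking GPT2's naming layout (h.<layer>.mlp.c_fc / c_proj).
toy = nn.ModuleDict({
    "h": nn.ModuleList([
        nn.ModuleDict({"mlp": nn.ModuleDict({"c_fc": nn.Linear(4, 8),
                                             "c_proj": nn.Linear(8, 4)})})
        for _ in range(2)
    ])
})

print(find_modules(toy, "c_fc"))  # ['h.0.mlp.c_fc', 'h.1.mlp.c_fc']
```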

(2) Data generation. Generate a set of input queries for which the model's next-token prediction matches the correct attribute. This is in order to make sure that the analysis focuses on predictions that involve attribute extraction. To do this, one can follow the simple procedure described in Meng et al. (NeurIPS 2022).
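The filtering step above can be sketched as follows. This is a hedged toy in which `next_token_logits` stands in for a real forward pass through a language model; all names are ours, not the paper's.

```python
import torch

def next_token_logits(prompt):
    # Toy "model": deterministic logits derived from the prompt text.
    torch.manual_seed(len(prompt))
    return torch.randn(100)  # pretend vocabulary of 100 tokens

def extracts_attribute(prompt, attribute_id):
    # Keep a query only if the greedy next-token prediction equals the
    # attribute's (first) token id.
    return int(torch.argmax(next_token_logits(prompt))) == attribute_id

queries = [("The capital of France is", 42), ("Rome is located in", 7)]
filtered = [(p, a) for p, a in queries if extracts_attribute(p, a)]
# `filtered` now contains only queries suitable for the analysis.
```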


This is not an officially supported Google product.
