# Contrack
In human-human conversations, Context Tracking deals with identifying important entities and keeping track of their properties and relationships. This is a challenging problem involving several subtasks such as entity recognition, attribute classification, coreference resolution and resolving plural mentions. The Contrack tool approaches this problem as an end-to-end modeling task where the conversational context is represented by an entity repository containing the entities mentioned so far, their properties and relationships between them. The repository is updated incrementally turn-by-turn, thus making it computationally efficient and capable of handling long conversations.
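To make the idea of an incrementally updated entity repository concrete, here is a minimal Python sketch. Note that this is an illustration only, not Contrack's actual API or data model: the names `Entity`, `EntityRepository`, and `update` are hypothetical, and the real system operates on learned representations rather than plain dictionaries.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an entity repository updated turn by turn.
# These names do not come from the Contrack codebase; they only
# illustrate the incremental context-tracking idea described above.

@dataclass
class Entity:
    name: str
    properties: dict = field(default_factory=dict)

class EntityRepository:
    """Holds all entities mentioned so far in the conversation."""

    def __init__(self):
        self.entities = {}

    def update(self, mentions):
        """Merge the mentions found in one turn into the repository.

        `mentions` maps entity names to property dicts. Existing entities
        are updated in place, so the per-turn cost depends on the turn,
        not on the length of the whole conversation.
        """
        for name, props in mentions.items():
            entity = self.entities.setdefault(name, Entity(name))
            entity.properties.update(props)

repo = EntityRepository()
repo.update({"Alice": {"gender": "female"}})               # turn 1
repo.update({"Alice": {"location": "Paris"}, "Bob": {}})   # turn 2
# repo now holds both entities, with Alice's properties merged across turns.
```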
Contributions to the codebase are welcome, and we would love to hear from you if you find this codebase useful. Finally, if you use Contrack for a research publication, please consider citing:
- Towards a Unified Approach to Entity-Centric Context Tracking in Conversations, Ulrich Rückert, Srinivas Sunkara, Abhinav Rastogi, Sushant Prakash, Pranav Khaitan
## Installation
The following instructions are for installing on Ubuntu 18.04.
-   Make sure you have `python3` and `bazel` installed. Follow the instructions here to install bazel.
-   Download the contrack subdirectory:

    ```
    svn export https://github.com/google-research/google-research/trunk/contrack
    # Or
    git clone https://github.com/google-research/google-research.git
    ```
-   Create and enter a virtual environment (optional but preferred):

    ```
    virtualenv -p python3 contrack_env
    source ./contrack_env/bin/activate
    ```
-   Install the dependencies:

    ```
    cd contrack
    python3 configure.py
    ```

    If you want to use an existing installation of tensorflow and gensim, run the configuration tool with the `--no-deps` flag to skip dependency installation:

    ```
    python3 configure.py --no-deps
    ```
-   Compile the source code:

    ```
    bazel build //:preprocess //:train //:predict
    ```
## Usage
Here is an example of how to preprocess a small example data file and train a model on it.
-   Download the word2vec data used during preprocessing:

    ```
    mkdir /tmp/contrack_data
    export DATA_DIR=/tmp/contrack_data
    wget -c -P $DATA_DIR "https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz"
    gunzip $DATA_DIR/GoogleNews-vectors-negative300.bin.gz
    ```
-   Run the preprocess tool to convert text conversations to TFRecord format:

    ```
    mkdir /tmp/contrack_example
    export BASE_DIR=/tmp/contrack_example
    ./bazel-bin/preprocess --input_file=data/example_conversations.txt \
      --output_dir=$BASE_DIR \
      --tokenizer_handle="https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3" \
      --bert_handle="https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/3" \
      --wordvec_path=$DATA_DIR/GoogleNews-vectors-negative300.bin \
      --logtostderr
    ```
-   Train a model on the TFRecord data. (A GPU is not necessary, but is recommended for faster training.)

    ```
    cp data/example_config.json $BASE_DIR/config.json
    ./bazel-bin/train --train_data_glob $BASE_DIR/example_conversations.tfrecord \
      --config_path $BASE_DIR/config.json --model_path $BASE_DIR/model \
      --mode=two_steps --logtostderr
    ```
-   Apply the model to a dataset. Accuracy measures on the dataset are written to the log file:

    ```
    ./bazel-bin/predict --input_data_glob $BASE_DIR/example_conversations.tfrecord \
      --model_path $BASE_DIR/model --logtostderr
    ```