Installing prerequisites for working with BERT on Korenvliet

We need to make sure our development environment on Korenvliet can successfully run the code/experiments we want to submit to the cluster.

Log in to Korenvliet and check whether you have a (non-empty) ~/.profile and ~/.bashrc in your home folder. On Korenvliet:

  $ ls -la ~

If you don't see these files in your home folder (or they are empty), copy them from /etc/skel/.profile and /etc/skel/.bashrc:

  $ cd
  $ cp /etc/skel/.profile .
  $ cp /etc/skel/.bashrc .
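
The check-and-copy step above can also be scripted. The following is a minimal sketch, assuming the skeleton files live in /etc/skel as described; the function name ensure_dotfiles is hypothetical, and it only copies a file when the one in the home folder is missing or empty.

```python
import shutil
from pathlib import Path


def ensure_dotfiles(home, skel="/etc/skel", names=(".profile", ".bashrc")):
    """Copy each dotfile from skel into home if it is missing or empty.

    Returns the list of file names that were copied.
    """
    home, skel = Path(home), Path(skel)
    copied = []
    for name in names:
        target = home / name
        source = skel / name
        # Treat a zero-byte file the same as a missing one.
        missing_or_empty = not target.exists() or target.stat().st_size == 0
        if missing_or_empty and source.is_file():
            shutil.copy(source, target)
            copied.append(name)
    return copied
```

Calling ensure_dotfiles(Path.home()) on Korenvliet would return the names of the files it had to copy, or an empty list if both files were already in place.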

We will use these files to activate certain properties of the development environment, such as specific versions of Python and CUDA.

Cluster nodes do not always have the latest release available, so check which version is the most recent one installed on the cluster.

At the time of writing, the latest official release of Python is 3.11. The latest version available on Korenvliet, however, is 3.10, so we will work with that.

Add this line to your .bashrc file (/home/your_ad_username/.bashrc) on Korenvliet:

  module load python/3.10.7

Watch out: when you run the Python interpreter, always use python3 or python3.10. Plain python may in some cases fall back to an older Python (3.8.5 was the default version at the time of writing), which leads to confusing errors.

We are going to need CUDA support (support for running PyTorch/Transformers on a GPU instead of the CPU). Check the software list of the HPC cluster to find the latest version of CUDA that is supported by the cluster. Add the following line to your .bashrc file (/home/your_ad_username/.bashrc):

  module load nvidia/cuda-11.2
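
If you prefer not to edit .bashrc by hand, the two module load lines can be appended programmatically. This is only a sketch: the helper name add_lines_once is hypothetical, and it simply skips any line that is already present so that running it twice does not duplicate entries.

```python
from pathlib import Path


def add_lines_once(bashrc, lines):
    """Append each line to bashrc unless an identical line is already there.

    Returns the list of lines that were actually added.
    """
    bashrc = Path(bashrc)
    text = bashrc.read_text() if bashrc.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    added = [line for line in lines if line.strip() not in present]
    if added:
        # Make sure the new lines start on a fresh line.
        if text and not text.endswith("\n"):
            text += "\n"
        bashrc.write_text(text + "\n".join(added) + "\n")
    return added
```

For this guide the call would be add_lines_once(Path.home() / ".bashrc", ["module load python/3.10.7", "module load nvidia/cuda-11.2"]), with the version numbers replaced by whatever the cluster currently offers.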

Log out of Korenvliet and log in again so that the changes to your .bashrc take effect. You can check whether it worked by running $ python3 -V after login. It should show Python version 3.10.7.

Before installing or running anything, double-check that you are running the right version of Python and Pip:

  $ python3 -V
  (should print "Python 3.10.7")
  $ pip -V
  (should print "pip 19.x.x from /deepstore/software/python/3.10.7/lib/python3.10/site-packages/pip (python 3.10)")
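
The same check can be made from inside Python, which is handy at the top of an experiment script. A minimal sketch follows; the function names are hypothetical and the expected version (3, 10) is taken from this guide.

```python
import sys


def interpreter_matches(expected=(3, 10)):
    """Return True if the running interpreter has the expected major.minor."""
    return tuple(sys.version_info[:2]) == tuple(expected)


def describe_interpreter():
    """Return a one-line summary with the version and the executable path."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} at {sys.executable}"
```

A script could print describe_interpreter() and stop early when interpreter_matches() returns False, instead of failing later with a less obvious error.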

To work with BERT we need the Python library transformers (by Hugging Face); see the Transformers documentation. Transformers uses either PyTorch or TensorFlow as its back-end. In this guide we use PyTorch.

If you want to use CUDA and TensorFlow, check out this compatibility overview first.

To install Transformers, follow the regular installation instructions.

TL;DR:

  create a venv with Python version 3.10:
      $ python3.10 -m venv venv
      $ source venv/bin/activate
  install PyTorch according to the PyTorch installation docs (select the CUDA version that you added to .bashrc in an earlier step)
      Your command will look something like this (with different torch versions; the +cu101 suffix selects the build for CUDA 10.1):
      $ pip install torch==1.7.0+cu101 torchvision==0.8.1+cu101 torchaudio==0.7.0 -f https://download.pytorch.org/whl/torch_stable.html
  install transformers with pip:
      $ pip install transformers
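
The CUDA suffix in the PyTorch wheel name (such as cu101) is the CUDA version with the dot removed. The sketch below derives that suffix from a module name of the form used in this guide (nvidia/cuda-11.2); the function name cuda_wheel_tag is hypothetical. It only builds the tag: PyTorch does not publish a build for every CUDA version, so you still need to check the PyTorch installation page to see whether that build exists.

```python
import re


def cuda_wheel_tag(module_name):
    """Turn a module name like 'nvidia/cuda-11.2' into a tag like 'cu112'."""
    match = re.search(r"cuda-(\d+)\.(\d+)", module_name)
    if match is None:
        raise ValueError(f"no CUDA version found in {module_name!r}")
    major, minor = match.groups()
    return f"cu{major}{minor}"


print(cuda_wheel_tag("nvidia/cuda-11.2"))  # prints cu112
```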

Now we have PyTorch and Transformers with CUDA support.

Testing Transformers on Korenvliet

Make sure the venv is active: $ source venv/bin/activate

Then run the following one-liner to check that Transformers works:

  $ python3 -c "from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))"

Output:

/home/your_ad_username/git/affective_BERT/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 8000). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0
Downloading: 100%|██████████████████████████████████████████████████████| 629/629 [00:00<00:00, 303kB/s]
Downloading: 100%|███████████████████████████████████████████████████| 268M/268M [00:04<00:00, 56.5MB/s]
Downloading: 100%|████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 618kB/s]
Downloading: 100%|██████████████████████████████████████████████████████| 230/230 [00:00<00:00, 200kB/s]
[{'label': 'POSITIVE', 'score': 0.9998704791069031}]

This takes 3 seconds to run (slow!), but it works. PyTorch warns that the NVIDIA driver is too old: it found driver version 8000 (CUDA 8), while the installed torch build expects CUDA 10.1. That may be because we ran this Python script interactively on Korenvliet (just once, for testing) and not as a job on the cluster nodes (which is what we would normally do). You can see that Transformers automatically downloads the default models for sentiment analysis (currently BERT models) to our hard drive. Finally, it gives us a sentiment rating for "we love you": a POSITIVE label with a score of 0.99987(…).
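
To see whether the CUDA warning matters on a given machine, it can help to ask PyTorch directly what it detects. The following is a minimal sketch with hypothetical function names (cuda_report, pipeline_device); it uses only torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count(), and it degrades gracefully when torch is not installed.

```python
def cuda_report():
    """Summarise what PyTorch can see; works even if torch is not installed."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),
    }


def pipeline_device(report):
    """Device index for transformers.pipeline: 0 = first GPU, -1 = CPU."""
    return 0 if report.get("cuda_available") else -1
```

The value returned by pipeline_device(cuda_report()) can be passed as the device argument of pipeline('sentiment-analysis', device=...), so the same script runs on the CPU on the login machine and on the GPU inside a cluster job.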