Accessing your Python data science stack on a remote host with Jupyter notebooks

How to install a decent Python data science stack on a remote host and access a Jupyter notebook from your local browser

At my place of work, we have access to a computing cluster where I am allowed to log in and run interactive shell sessions. One of my first use cases was to log on and take advantage of the 12-core processor on the node to do some not-so-heavy parallel processing using a Python script I had written. I quickly ran into a couple of issues, though: the default Python installation was 2.7, and I had already written everything in Python 3.5. Not the biggest deal, but I also needed to install some Python packages for the whole thing to work. Furthermore, I found that I wanted to quickly assess the output of my script with a couple of graphs without having to download all the data to my computer. I didn’t want the data on my computer anyway, because I only have 128 GB of storage, which is almost always running out on me.

Since Anaconda wasn’t installed by default, I thought it was appropriate to install miniconda locally to take care of my package management. I also like conda’s virtual environment functions, which are very easy to use, and I could use an environment file exported from a virtual environment on my local host to automate the process.

In this short tutorial, I will walk you through installing miniconda in your home directory on a remote Linux host, then setting up a virtual environment in which you can run an interactive jupyter notebook session, which you can access from the browser on your own computer. This is a fairly simple but multi-step task, which I break down into the following three steps:

  1. install miniconda
  2. set up a virtual environment
  3. start the notebook server and login from your computer

1. install miniconda

Installing miniconda is as easy as it gets. First, you’ll need to be logged into your home directory on the remote host. Then, use the following commands to download and run the installer:

remote_host$: wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
remote_host$: bash Miniconda3-latest-Linux-x86_64.sh
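If you are scripting the setup, the installer can also run non-interactively. This is just a sketch, assuming the installer file from above sits in the current directory; the -b flag accepts the license in batch mode and -p sets the install prefix:

```shell
# Non-interactive install (sketch): -b accepts the license in batch mode,
# -p sets the install prefix. Skip gracefully if the installer isn't present.
if [ -f Miniconda3-latest-Linux-x86_64.sh ]; then
    bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"
else
    echo "installer not downloaded yet"
fi
```

This skips the interactive prompt described below entirely, which is handy if you set up environments on more than one host.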


After running these commands, you will be taken to the interactive legalese terms-and-conditions prompt. You’ll press Enter to page through it, but do it slowly: you’ll need to type yes to accept the install at the last step, and if you press Enter one time too many, you will have the joy of going through the terms and conditions one more time.

You will also be asked where to locate the installation. Here you will want to answer with your home directory. On the remote host it should be the following path: /home/<username>/miniconda3

After the installation, you’ll want to prepend the miniconda path to your PATH inside your .bashrc by adding the following line: export PATH="/home/<username>/miniconda3/bin:$PATH"

Do this with nano or your favorite text editing program:

# in your home directory
remote_host$: nano .bashrc


Now, you’ll need to restart your session by logging out and back in again so the change takes effect.
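If you prefer not to edit the file by hand, the same change can be made non-interactively. This sketch uses $HOME instead of the hard-coded /home/<username> path and reloads .bashrc in the current session instead of logging out:

```shell
# Append the PATH line to .bashrc only if it isn't there yet,
# so re-running this snippet is safe
LINE='export PATH="$HOME/miniconda3/bin:$PATH"'
grep -qxF "$LINE" "$HOME/.bashrc" 2>/dev/null || echo "$LINE" >> "$HOME/.bashrc"
# Reload .bashrc in the current session
. "$HOME/.bashrc"
```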


2. set up a virtual environment

Now it’s time to install your favorite Python scientific stack. I recently used the following environment file to perform this same task and can highly recommend it. I adapted it from the one I downloaded here: https://drivendata.github.io/pydata-setup/

So, here’s how the environment file looks:

name: dsstack
dependencies:
- python=3
- jupyter_console
- jupyter_core
- libpng
- markupsafe
- matplotlib
- mistune
- nbconvert
- nbformat
- notebook
- numpy
- openssl
- pandas
- path.py
- pickleshare
- pip
- ptyprocess
- pygments
- pyparsing
- pyqt
- python
- python-dateutil
- pytz
- pyzmq
- qt
- qtconsole
- readline
- scikit-learn
- scipy
- seaborn
- setuptools
- simplegeneric
- singledispatch
- sip
- six
- sqlite
- ssl_match_hostname
- terminado
- tk
- tornado
- traitlets
- wheel
- xlrd
- zeromq
- zlib
- pip:
  - backports-abc
  - backports.ssl-match-hostname
  - folium
  - ipython-genutils
  - jupyter-client
  - jupyter-console
  - jupyter-core
  - pexpect


If you would like to customize the name of the environment, change the string next to the name key at the top of the file. Otherwise, the environment will be called dsstack, short for data-science-stack.
Copy the contents of this file into a file called environment.yml and copy it to your home directory via scp:

local$: scp environment.yml <username>@remote_server:/home/<username>


After copying the environment file, go back to your home directory on the remote host and type the following command in the directory where the environment file is located:

remote_host$: conda env create -f environment.yml


After the environment successfully installs, you will need to activate it with the following command:

remote_host$: source activate dsstack

3. start the notebook server and login from your computer

Now, it’s time to get the jupyter server running. This will allow you to work directly with files on the remote host as well as run python on the remote server, where you will have much more power if you need it. Before we start, I suggest that any time you have a project you are working on, you create a folder for it. In this folder, create a README.md file, which lets you and anybody else know what your intentions are with the project. Also, create a folder called notebooks inside the project folder. It is from this folder that you will start your jupyter notebook session.
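The project layout described above can be scaffolded in a few commands ("myproject" is a placeholder name; use your own):

```shell
# Create a project folder with a README and a notebooks subfolder
# ("myproject" is a placeholder; substitute your actual project name)
mkdir -p myproject/notebooks
printf '# myproject\n\nWhat this project is about and what I intend to do with it.\n' > myproject/README.md
ls myproject
```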

I want to thank Nikolaus for posting these instructions in one of his blog posts; I had actually been looking for this exact solution for some time. Now go to your notebooks folder and type the following command:

remote_host$: jupyter notebook --no-browser --port=8889


Here, the --no-browser option tells jupyter that you don’t have a browser on the remote machine (without it, the server may throw an error complaining about the missing browser or about needing JavaScript). The --port option tells the notebook server on which port it should be available. Now, go to your local machine and run the following command:

local$: ssh -N -L localhost:8888:localhost:8889 <username>@remote_host


The first option, -N, tells SSH that no remote commands will be executed, which is exactly what you want for pure port forwarding. The second option, -L, specifies the port forwarding configuration: port 8889 on the remote host is forwarded to port 8888 on your local machine.
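The general shape of the tunnel command can be sketched like this (the ports are the ones used above; <username> and remote_host are placeholders):

```shell
# General form: ssh -N -L <local_port>:localhost:<remote_port> <user>@<host>
# -N : run no remote command, just keep the connection open for forwarding
# -L : bind <local_port> on this machine and forward it to <remote_port> on the host
LOCAL_PORT=8888   # the port your local browser will connect to
REMOTE_PORT=8889  # the port the jupyter server listens on remotely
echo "ssh -N -L ${LOCAL_PORT}:localhost:${REMOTE_PORT} <username>@remote_host"
# → ssh -N -L 8888:localhost:8889 <username>@remote_host
```

If 8888 is already taken on your machine, change LOCAL_PORT to any free port and open that one in the browser instead.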

Open the browser on your local machine to the following address: localhost:8888

If everything went well, you should have an interactive jupyter notebook session running in your browser.


Alright, if you have come this far, you have done it. Congratulations! If you have any questions, please feel free to post a comment, or just tell me what you think.