How and why you should use conda environments

I recently received a distress email from a buddy of mine who is fairly wet behind the ears when it comes to python. He was having trouble getting a package from GitHub to run on his computer, because it was written for python 2.7 and he had installed python 3 with an anaconda installation. So, obviously, my good friend hasn't been introduced to the wonderful world of conda environments. Now, if you are scratching your head because you don't know what I'm talking about, or maybe you have heard about these mystical, unicorn-like environments but still don't know if they are for you, just stick with me.

Basically, creating environments with conda is the perfect way to keep the main python installation on your computer clean, avoiding the conflicts that can arise when different projects require different versions of the same packages. So, any time you start a new project that requires you to install new packages, think about this post. Also, if you are forking a GitHub repo that requires special packages, as was the case with my friend, just say to yourself: conda environments.

conda environments, here we come. The first place to look is the official documentation at http://conda.pydata.org/docs/using/index.html. There you will find everything you need to know, and probably a little bit more.

But in a nutshell, here's what to do. Go to your command line and create a new environment with the following command:

$: conda create --name myenv python=2.7

 

Now, this command will create a basic python 2.7 installation on your computer. It will be saved in the anaconda installation directory, which for me is in the home directory: ~/anaconda/envs/myenv.
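If you ever forget where an environment lives, conda can list them all for you. The output below is only illustrative and will vary with your installation:

```shell
conda env list
# prints one line per environment with its path, something like:
#   base   *  ~/anaconda
#   myenv     ~/anaconda/envs/myenv
```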

In order to use it, follow the directions printed at the end of the installation, which tell you how to activate and deactivate your environment (on newer conda versions, conda activate and conda deactivate work the same way):

# to activate
$: source activate myenv
# the prompt changes to show your env name in parentheses
(myenv) $:

 

In order to deactivate your environment, you would do the following:

(myenv) $: source deactivate
$:

 

So, once your environment is up and running, just activate it and start installing whatever you need for your new project. Actually, if you already know what you need, you can pass it as an argument when you create the environment:

$: conda create --name myenv python=2.7 important-package
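Installing into an environment after the fact works just as well; with the environment active, conda puts packages into it rather than into your main installation. The package names here are only examples:

```shell
# run inside the activated environment
conda install numpy pandas
# packages conda can't find can usually still be installed with pip
pip install some-pypi-only-package
```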

 

For more information on creating a conda environment which comes with a python data science development stack, please have a look at my other blog post: https://python-wrangler.com/accessing-your-python-data-science-stack-on-the-remote-host-with-jupyter-notebooks/

Okay, that’s enough for today. I think this was enough to answer my friend’s question, and get him back to work. Let me know if this was useful, or tell me how you use conda environments for your particular use-case in the comments section below.

 

 

Accessing your python data science stack on the remote host with jupyter notebooks

How to install a decent python data science stack on the remote host and access the jupyter notebook from the local browser

At my place of work, we have access to a computing cluster, on which I am allowed to log in and run interactive shell sessions. One of my first use-cases was to take advantage of the 12-core processor on the node computer to do some not-so-heavy parallel processing with a python script I had written. I quickly ran into a couple of issues, though: the default python installation was 2.7 and I had already written everything in python 3.5. Not the biggest deal, but I also needed to install some python packages for the whole thing to work. Furthermore, I wanted to quickly assess the output of my script with a couple of graphs without having to download all the data to my computer. Actually, I didn't want the data on my computer anyway, because I only have 128 GB of storage, which is almost always running out on me.

Since anaconda wasn't installed by default, I thought it was appropriate to install miniconda locally to take care of my package management. I also like conda's virtual environment functions, which are very easy to use. I could even use an environment file exported from a virtual environment on my local machine to automate the process.

In this short tutorial, I will walk you through installing miniconda in your home directory on a remote linux host, then setting up a virtual environment in which you can run an interactive jupyter notebook session, which you can access from the browser on your own computer. This is a fairly simple, but multi-step, task, which I break down into the following three steps:

  1. install miniconda
  2. set up a virtual environment
  3. start the notebook server and login from your computer

1. install miniconda

Installing miniconda is as easy as it gets. First, you'll need to be logged into your home directory on the remote host. Then, use the following commands to download and run the installer:

remote_host$: wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
remote_host$: bash Miniconda3-latest-Linux-x86_64.sh

 

After running these commands, you will be led through the interactive legalese terms-and-conditions prompt. You'll press enter to page through it, but do it slowly: you'll need to type yes to accept at the last step, and if you press enter one too many times, you will have the joy of going through the terms and conditions all over again.

You will also be asked where to place the installation. Answer with your home directory; on the remote host it should be the following path: /home/<username>/miniconda3

After the installation, you'll want to prepend the miniconda bin directory to your PATH by adding the following line to your .bashrc: export PATH="/home/<username>/miniconda3/bin:$PATH"

Do this with nano or your favorite text editing program:

# in your home directory
remote_host$: nano .bashrc
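If you prefer a one-liner to opening an editor, the same line can be appended directly; this sketch uses $HOME instead of spelling out /home/<username>:

```shell
# add miniconda's bin directory to the front of PATH for future shells,
# then reload .bashrc so it takes effect in the current one
echo 'export PATH="$HOME/miniconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```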

 

Now, you’ll need to restart your session by logging out and back in again.


2. set up virtual environment

Now it's time to install your favorite python scientific stack. I recently used the following .yml environment file to perform this same task, and can highly recommend it. I adapted it from the one available here: https://drivendata.github.io/pydata-setup/

So, here’s how the environment file looks:

name: dsstack
dependencies:
- python=3
- jupyter_console
- jupyter_core
- libpng
- markupsafe
- matplotlib
- mistune
- nbconvert
- nbformat
- notebook
- numpy
- openssl
- pandas
- path.py
- pickleshare
- pip
- ptyprocess
- pygments
- pyparsing
- pyqt
- python
- python-dateutil
- pytz
- pyzmq
- qt
- qtconsole
- readline
- scikit-learn
- scipy
- seaborn
- setuptools
- simplegeneric
- singledispatch
- sip
- six
- sqlite
- ssl_match_hostname
- terminado
- tk
- tornado
- traitlets
- wheel
- xlrd
- zeromq
- zlib
- pip:
  - backports-abc
  - backports.ssl-match-hostname
  - folium
  - ipython-genutils
  - jupyter-client
  - jupyter-console
  - jupyter-core
  - pexpect

 

If you would like to customize the name of the environment, change the string next to the name key at the top of the file; as written, this environment will be called dsstack, for data-science-stack.
Copy the contents of this .yml into a file called environment.yml and copy it to your home directory on the remote host via scp:

local$: scp environment.yml <username>@remote_server:/home/<username>
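If you decide to rename the environment later, the name line can also be swapped non-interactively with sed. Here "mystack" is just an illustrative replacement, and the printf recreates a minimal environment.yml so the example is self-contained:

```shell
# write a minimal environment.yml, then change the env name with sed
printf 'name: dsstack\ndependencies:\n- python=3\n' > environment.yml
sed -i 's/^name: dsstack$/name: mystack/' environment.yml
head -n 1 environment.yml   # -> name: mystack
```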

 

After copying the environment file, go back to your home directory on the remote server and run the following command in the directory where the environment file is located:

remote_host$: conda env create -f environment.yml

 

After the environment successfully installs, you will need to activate it. Use the following command:

remote_host$: source activate dsstack

3. start the notebook server and login from your computer

Now, it's time to get the jupyter server running. This will let you work directly with files on the remote host as well as run python on the remote server, where you will have much more power if you need it. Before we start, I suggest that any time you work on a project, you create a folder for it. In this folder, create a README.md file, which lets you and anybody else know what your intentions are with the project. Also, create a folder called notebooks inside the project folder. It is from this folder that you will start your jupyter notebook session.
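The project layout described above takes only a few lines to create; "myproject" is just an example name:

```shell
# create a project folder with a README and a notebooks/ subfolder
mkdir -p myproject/notebooks
printf '# myproject\n\nWhat this project is for and how to run it.\n' > myproject/README.md
ls myproject
```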

I want to thank Nikolaus for posting these instructions in one of his blog posts; I had actually been looking for this exact solution for some time. Now go to your notebooks folder and type the following command:

remote_host$: jupyter notebook --no-browser --port=8889

 

Here, the --no-browser flag tells jupyter not to try to open a browser on the remote machine (it may complain about needing javascript if you leave it out). The --port option tells the notebook server which port to listen on. Now, go to your local machine and run the following command:

local$: ssh -N -L localhost:8888:localhost:8889 <username>@remote_host

 

The -N option tells SSH that no remote commands will be executed, which is useful for pure port forwarding. The -L option specifies the port forwarding configuration: local port 8888 is forwarded to port 8889 on the remote host.

Open the browser on your local machine to the following address: localhost:8888. If the notebook asks for a token, copy it from the URL that the jupyter command printed on the remote host.

If everything went well, you should have an interactive jupyter notebook session running in your browser.


Alright, if you have come this far, you have done it. Congratulations! If you have any questions, please feel free to post a comment, or just tell me what you think.