The problem? I have many Jupyter Notebooks, and I need an easy way to search through them all. I am always remembering that I have some snippet of code somewhere in these notebooks, so I need an easy way to find it.

Enter Elasticsearch

Elasticsearch "is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents." That means you can put lots of documents full of text into it, and it will index all of them and make them easy to search.

The Elasticsearch folks have also built Kibana, which is "proprietary data visualization dashboard software for Elasticsearch". That means that Kibana is a handy GUI tool you can use to quickly search your data, similar to using something like a Google search box.

In this notebook, I will demonstrate how to:

  1. Set up Docker containers for Elasticsearch and Kibana on a shared Docker network
  2. Use nbconvert to convert your .ipynb files to a list of strings
  3. Upload that list of strings into an Elasticsearch database
  4. Search through the notebooks using either the Python Elasticsearch library or Kibana

Prerequisites:
You know how to create a Jupyter Notebook and save it somewhere, and you have Docker installed.

Caveat:
This assumes your use case is that you want to be able to easily search through your own notebooks as part of your workflow, so I am going to ignore some Elasticsearch security options (which is why I have an ignore-warnings filter in this notebook). Don't use this approach if you are planning something that is not dev.

This will all take about 5 minutes.

Set up and Hello World

Let's start with some setup. Open a Terminal on your machine (bash on Linux, PowerShell on Windows, whatever). Then run the following commands.

  1. docker network create elastic
    This will tell Docker to create a network called 'elastic'
  2. docker pull docker.elastic.co/elasticsearch/elasticsearch:7.13.1
    This will pull down an image of the latest version of Elasticsearch (7.13.1 at the time of writing)
  3. docker run --name es01-test --net elastic -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.13.1
    This will create a Docker container on the 'elastic' network and expose some ports so you can access it. If you go to localhost:9200 you will see a welcome message

So that's it for installing and getting Elasticsearch up and running. Now let's do the same for Kibana. Open a new shell and run the following commands:

  1. docker pull docker.elastic.co/kibana/kibana:7.13.1
    This will pull down an image of Kibana
  2. docker run --name kib01-test --net elastic -p 5601:5601 -e "ELASTICSEARCH_HOSTS=http://es01-test:9200" docker.elastic.co/kibana/kibana:7.13.1
    This will create a container for Kibana. It will see the Elasticsearch instance and connect to it. You can go to localhost:5601 and see the Kibana homepage

So it looks like it is working. Let's make sure our notebook can see it too:
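A minimal connectivity check might look like this. This is a sketch using only the standard library rather than a third-party HTTP client, and the default URL is an assumption (the choice of host is explained just below):

```python
import json
import urllib.request


def ping_elasticsearch(url="http://host.docker.internal:9200", timeout=2.0):
    """Return the cluster's welcome message as a dict, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except OSError:  # covers URLError, connection refused, timeouts
        return None
```

If everything is up, this returns the same welcome JSON you saw in the browser; otherwise it returns None.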

So this notebook can connect to the Elasticsearch instance. Note that I am using host.docker.internal in the URL in that GET request. This is because I have set up my Jupyter in Docker as well (details at: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/selecting.html). If you have installed an Anaconda instance or something, this URL would be localhost rather than host.docker.internal.

Now, using the Python elasticsearch library, let's create a connection to Elasticsearch:
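A sketch of that connection step. The host URL is an assumption (host.docker.internal when Jupyter itself runs in Docker, plain localhost otherwise), and it requires the `elasticsearch` package:

```python
# The host is an assumption: host.docker.internal when Jupyter itself runs
# in Docker, plain localhost otherwise.
ES_URL = "http://host.docker.internal:9200"


def connect(url=ES_URL):
    # needs the `elasticsearch` package (pip install elasticsearch)
    from elasticsearch import Elasticsearch
    return Elasticsearch([url])
```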

Let's create an index (think of this as a db) and put some data into it:
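Something like the following, as a sketch. The document fields, index name, and id are placeholders, and `es` is a client object from the connection step:

```python
# Placeholder document; the index name "testing-index" is just for testing.
doc = {"notebook": "example.ipynb", "text": "print('hello world')"}


def put_doc(es, doc, index="testing-index", doc_id=1):
    """Index one document; Elasticsearch creates the index on first write."""
    return es.index(index=index, id=doc_id, body=doc)
```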

We can go to http://localhost:9200/testing-index/_search?pretty=true&q=*:* and see that the data now exists in Elasticsearch. Or we could just retrieve it using the Python elasticsearch library.
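The retrieval might look like this, as a sketch: a match_all query that returns every document's `_source` (again assuming an `es` client and the placeholder index name):

```python
MATCH_EVERYTHING = {"query": {"match_all": {}}}


def fetch_all(es, index="testing-index"):
    """Return the _source of every document in the index."""
    res = es.search(index=index, body=MATCH_EVERYTHING)
    return [hit["_source"] for hit in res["hits"]["hits"]]
```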

So that works. Let's delete it now:
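Cleanup can be sketched like this; `es` is the client from the connection step and the index name is the placeholder used above:

```python
def drop_index(es, index="testing-index"):
    """Delete an index; ignore a 404 if it is already gone."""
    es.indices.delete(index=index, ignore=404)
```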

Extracting Text from Jupyter Notebooks

Now let's do something a little more substantial. I have a folder full of Jupyter Notebooks, and I always need code from one or another. So let's create a function to extract all the text from the notebooks. First, I need a list of the Jupyter Notebook filenames from the directory in which they are located:
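A sketch of that listing step; the directory path is up to you:

```python
from pathlib import Path


def list_notebooks(folder="."):
    """Return the .ipynb filenames found in a directory."""
    return sorted(p.name for p in Path(folder).glob("*.ipynb"))
```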

Now, let's create a function to extract the text from the notebooks, and then we will iterate over each notebook to extract the code:
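Here is one way to sketch the extraction. The notebook used nbconvert for this; the version below leans on the fact that .ipynb files are plain JSON, so it only needs the standard library:

```python
import json


def extract_cells(path):
    """Return each cell's source as one string (plain-JSON take on the
    nbconvert step: .ipynb files are just JSON under the hood)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    return ["".join(cell.get("source", [])) for cell in nb.get("cells", [])]
```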

We get back a list of lists for all the notebooks. Let's check we are on the right track by looking at, say, the fifth item in the first notebook:

Now let's iterate through all those notebooks converted to text, and push them into Elasticsearch. Elasticsearch wants something JSON-like, so that is what we will give it. I notice the call seems to return an empty object but still appears to work:
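The upload loop can be sketched like this. The index name "notebooks" and the document fields are assumptions, and `es` is the client object from the connection step:

```python
def upload_notebooks(es, texts, index="notebooks"):
    """texts maps a notebook filename to its list of cell strings.
    Each notebook becomes one JSON document in the index."""
    for i, (name, cells) in enumerate(texts.items()):
        es.index(index=index, id=i, body={"name": name, "text": "\n".join(cells)})
```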

Searching Notebooks

So now all the data is in Elasticsearch. Now we want to search it. There are three options to do this:

  1. You can use the Python elasticsearch library to run queries
    This can be quite handy; I will cover some examples below
  2. Use Kibana to search
    This is fun to use, but I probably won't use it enough to remember Kibana's proprietary query language (I can barely remember SQL these days). But it does give you a GUI search box and filters and all that.
  3. I could pass query params in a URL to search, such as http://localhost:9200/testing-index/_search?pretty=true&q=*:*
    If you are into this kind of thing, like if you love Postman or something, it could be handy I guess. For our purposes I wouldn't do this, and won't cover it

Option 1: Using Python

This is my preferred way of doing it. Here are some handy getting started searches you can do to look through your data that has been put into Elasticsearch:
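As a sketch of what those starter searches can look like (the index name "notebooks" and the "text" field are assumptions about how your data was indexed; running them for real needs a live cluster and the `elasticsearch` package):

```python
# Two starter queries: fetch everything, and a full-text match on one field.
MATCH_ALL = {"query": {"match_all": {}}}
MATCH_PANDAS = {"query": {"match": {"text": "pandas"}}}


def search(es, query, index="notebooks"):
    """Run a query dict against an index and return the raw hits."""
    return es.search(index=index, body=query)["hits"]["hits"]
```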

It supports all kinds of queries: match text, partial match, etc. Here is another example. The use case here is that I know I have a notebook where I have done some work on Baltimore Crime Data, but I can't remember where. So I will put in the prefix "crim" and let Elasticsearch do its thing.

Note that things can get a bit messy, so I would advise you to keep your query in a separate Python dictionary, and then just pass that into the search:
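For example, the "crim" prefix search as a standalone query dict (the "text" field name is an assumption about how the documents were indexed):

```python
# Kept in its own dict, then passed into the search call -- easier to read
# and to reuse.
prefix_query = {
    "query": {
        "prefix": {"text": "crim"}
    }
}
```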

Option 2: Using Kibana

Kibana is cool, but after a while I did get annoyed with the UI. But if you are going to use it:

  1. Go to http://localhost:5601/app/home#/ which should be up and running
  2. From the dropdown on the left, go to the "Stack Management" menu item. This will take you to http://localhost:5601/app/management
  3. Choose the index pattern option. You will be taken to http://localhost:5601/app/management/kibana/indexPatterns
  4. Go to http://localhost:5601/app/management/kibana/indexPatterns/create
  5. Choose your index/database that is listed, and follow the prompts to set it up
  6. Then go back to http://localhost:5601/app/home#/ and choose "Discover" from the left-hand menu

From there you will have a search box and some filters, and all kinds of cool things you can check.

Enjoy!