
In Out Code

Your comprehensive guide to optimized code


How to Setup Neo4j on AWS EC2


June 24, 2020 by Daniel Andrews

Neo4j is one of the world’s leading graph database management systems, with support for the AWS, Azure, and Google Cloud platforms.

In this article, we’ll walk you through how to get set up in AWS. This includes:

  • Hosting the Neo4j Community edition on EC2
  • Using the Neo4j Python driver to execute transactions against the database
  • Creating a Lambda function to access the database

Create the EC2 Instance

  1. Sign in to your AWS account
  2. Open the EC2 service
  3. Click on Instances, then Launch Instance
  4. Select the AWS Marketplace menu item, then search for Neo4j
  5. Select Neo4j Graph Database – Community Edition
  6. Select the EC2 Instance Type. Neo4j currently recommends m4.large or higher
  7. Click Next: Configure Instance Details
  8. Select a VPC and subnet to launch the EC2 instance into
  9. Click Next: Add Storage
  10. Select your Root volume size and attach any additional EBS volumes you need. (Neo4j stores its data on local volumes, so ensure you have enough space to store the required data)
  11. Select Review and Launch

Connect to the EC2 Instance

  1. Look up the public IP of the EC2 instance, then go to https://MY_PUBLIC_IP:7473
  2. Enter neo4j as the username, and neo4j as the password
  3. After connecting for the first time, you will be prompted to set a new password. Once this is provided, click Change Password
  4. You should now be connected to the Neo4j database
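
If the browser cannot reach the instance, it can help to confirm that the ports are reachable before troubleshooting Neo4j itself. The sketch below is a minimal check using only the Python standard library. It assumes the instance's security group allows inbound traffic from your IP on port 7473 (HTTPS, as used above) and port 7687 (Neo4j's default Bolt port, which the Python driver connects to). port_is_open is an illustrative helper name, not part of any Neo4j or AWS tooling.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False
```

Calling port_is_open("MY_PUBLIC_IP", 7473) and port_is_open("MY_PUBLIC_IP", 7687) from your own machine gives a quick view of whether the security group, rather than Neo4j, is what is blocking access.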

Extract Data Using AWS Lambda

So, you’ve got a Neo4j database hosted on an EC2 machine, and can access the GUI via your web browser. But what if you want to run queries against the database programmatically?

One way to achieve this is using Lambda, Python, and the Neo4j Python driver.

Creating the SSM Parameters

When starting a new Neo4j session using the Python driver, you need to specify the uri, username, and password.

The values for each of these items can be stored in the AWS SSM Parameter Store. Just remember to attach the necessary IAM policy to your Lambda function role (the AmazonSSMReadOnlyAccess managed policy will work, as we only need the ssm:GetParameter action).
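
AmazonSSMReadOnlyAccess grants read access to every parameter in the account. If you prefer a narrower grant, a policy limited to ssm:GetParameter on the Neo4j parameters could look something like the sketch below. The region, account ID, and /Prod/Neo4j/ prefix are placeholders (the prefix matches the parameter names used in the Lambda code in this article), and build_ssm_read_policy is an illustrative helper, not an AWS API.

```python
import json


def build_ssm_read_policy(region, account_id, prefix="/Prod/Neo4j/"):
    """Build an IAM policy document allowing ssm:GetParameter under prefix."""
    # For a parameter named /Prod/Neo4j/uri the ARN ends in
    # parameter/Prod/Neo4j/uri, so strip the slashes and re-join.
    resource = (
        f"arn:aws:ssm:{region}:{account_id}:parameter/"
        f"{prefix.strip('/')}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "ssm:GetParameter",
                "Resource": resource,
            }
        ],
    }


# Placeholder region and account ID, for illustration only
policy = build_ssm_read_policy("eu-west-1", "123456789012")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into an inline policy on the Lambda role. If the password parameter is encrypted with a customer managed KMS key rather than the default aws/ssm key, the role would also need kms:Decrypt on that key.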

To create our SSM parameters:

  1. Open AWS Systems Manager, then click on Parameter Store
  2. Create your 3 parameters. uri and username can be String, with password being SecureString
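
The same parameters can also be created from code rather than the console. The following is a minimal sketch using the put_parameter call of the boto3 SSM client. The /Prod/Neo4j prefix mirrors the names read by the Lambda function in this article; create_neo4j_parameters is a hypothetical helper, and Overwrite=True is an assumption that you are happy to replace existing values.

```python
def create_neo4j_parameters(ssm, uri, username, password, prefix="/Prod/Neo4j"):
    """Store the three connection values; only the password is encrypted."""
    values = [
        ("uri", uri, "String"),
        ("username", username, "String"),
        ("password", password, "SecureString"),
    ]
    names = []
    for key, value, param_type in values:
        name = f"{prefix}/{key}"
        # ssm is expected to be a boto3 SSM client, e.g. boto3.client("ssm")
        ssm.put_parameter(Name=name, Value=value, Type=param_type, Overwrite=True)
        names.append(name)
    return names
```

A call might look like create_neo4j_parameters(boto3.client("ssm"), "bolt://MY_PUBLIC_IP:7687", "neo4j", "MY_PASSWORD"), where the URI and password are placeholders for your own values.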

We can now retrieve the values from SSM using our Lambda function, instead of having to hard code them.

Creating the Lambda Function

Our Lambda function will be written in Python, but Neo4j also has drivers for a wide range of other programming languages. The full list can be found in the Neo4j documentation.

import logging
import traceback
import boto3
from neo4j import GraphDatabase

ssm = boto3.client('ssm')

logging.getLogger().setLevel(logging.INFO)

def lambda_handler(event, context):
    logging.info("Running handler")

    # Fetch the connection details from SSM Parameter Store
    db_uri = ssm.get_parameter(Name='/Prod/Neo4j/uri')
    username = ssm.get_parameter(Name='/Prod/Neo4j/username')
    password = ssm.get_parameter(Name='/Prod/Neo4j/password', WithDecryption=True)

    uri = db_uri['Parameter']['Value']
    username = username['Parameter']['Value']
    password = password['Parameter']['Value']

    # Connect to the Neo4j database and open a new session
    session = connect_db(uri, username, password)

    try:
        # Read data from the database
        treatment_data = read_from_db(session)
    finally:
        # Close our database session, even if the read fails
        disconnect_db(session)

    return treatment_data

def connect_db(uri, user, password):
    try:
        driver = GraphDatabase.driver(uri, auth=(user, password))
        session = driver.session()
    except Exception as error:
        msg = "".join(traceback.format_tb(error.__traceback__))
        logging.error(
            "Error connecting to Neo4j database. %s:%s\n%s",
            type(error),
            error,
            msg,
        )
        # Re-raise so the handler never continues with an undefined session
        raise
    logging.info("Successfully connected to Neo4j database")

    return session

def disconnect_db(session):
    logging.info("Closing Neo4j session")
    session.close()

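def read_from_db(session):
    # read_transaction is deprecated from driver version 5.0 in favour of execute_read
    result = session.read_transaction(data_to_read)

    return result

def write_to_db(session):
    # write_transaction is deprecated from driver version 5.0 in favour of execute_write
    result = session.write_transaction(data_to_write)

    return result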

def data_to_read(tx):
    cypher_query = '''
    CYPHER_QUERY
    '''
    
    result = tx.run(cypher_query)
    
    result_list = [record["field_name"] for record in result]
    
    return result_list

def data_to_write(tx):     
    cypher_query = '''
    CYPHER_QUERY
    '''
    
    result = tx.run(cypher_query)
    
    result_list = [record["field_name"] for record in result]
    
    return result_list
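
One thing to note about the code above is that connect_db returns only the session, so the driver that created it is never explicitly closed. A possible alternative, offered here as a sketch rather than as part of the original function, is a small context manager that closes both. open_session is an illustrative name; it expects an object returned by GraphDatabase.driver, or anything else exposing session() and close().

```python
from contextlib import contextmanager


@contextmanager
def open_session(driver):
    """Yield a session, then close the session and the driver afterwards."""
    try:
        session = driver.session()
        try:
            yield session
        finally:
            # Runs whether or not the caller's block raised
            session.close()
    finally:
        # Runs even if opening the session itself failed
        driver.close()
```

Inside lambda_handler this could be used as: with open_session(GraphDatabase.driver(uri, auth=(username, password))) as session: treatment_data = read_from_db(session).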

You can also pass parameter values into your Cypher query.

Example:
If we wanted to pass in status and name as variables, we would use $status and $name in our Cypher query, then pass in the values using result = tx.run(cypher_query, {'status':'ACTIVE', 'name':'Untreated'}).
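
As a fuller sketch of the same idea, the transaction function below takes status and name as arguments and forwards them to tx.run. The Treatment label and the status and name properties are hypothetical and only there to make the query concrete; substitute the labels and properties from your own graph.

```python
def read_by_status_and_name(tx, status, name):
    """Run a parameterised query and return one field from each record."""
    # Hypothetical label and property names, for illustration only
    cypher_query = '''
    MATCH (t:Treatment)
    WHERE t.status = $status AND t.name = $name
    RETURN t.name AS field_name
    '''

    # The dictionary keys match the $status and $name placeholders
    result = tx.run(cypher_query, {'status': status, 'name': name})

    return [record["field_name"] for record in result]
```

Extra arguments given to read_transaction are passed through to the transaction function, so this could be called as session.read_transaction(read_by_status_and_name, 'ACTIVE', 'Untreated'). Passing values as parameters, rather than formatting them into the query string, also avoids Cypher injection.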

Creating a Lambda Layer for Neo4j

When you’re writing your Python code in the inline code editor of Lambda, importing anything from the neo4j package (e.g. from neo4j import GraphDatabase) will fail, because the package is not included in the Lambda Python runtime.

To resolve this, you’ll need to upload the Neo4j package files to a new Layer in Lambda.

Creating the Neo4j Lambda Layer:

  1. Open a local Terminal window (Mac) or CMD (Windows)
  2. Create a new folder called Neo4j
  3. Navigate to that folder, then run: pip install neo4j -t python
  4. Now compress the python folder itself into a zip file e.g. Neo4j.zip. For Python runtimes, Lambda looks for layer packages under a top-level python directory, so the zip must contain that folder rather than the bare package files
  5. Open the AWS Lambda service via the AWS Console
  6. Select Layers, then Create Layer
  7. Upload your Neo4j.zip file
  8. Select the compatible runtimes. In our case, this is all of the available Python versions
  9. Name your Layer e.g. Neo4j_v4_0_0, then click Create
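
Lambda extracts a layer into /opt and, for Python runtimes, looks for packages under a top-level python directory in the archive. The sketch below builds such an archive with the standard library instead of a desktop zip tool. It assumes source_dir is whichever folder pip installed the packages into; build_layer_zip is an illustrative helper, and zip_path should sit outside source_dir so the archive does not include itself.

```python
import os
import zipfile


def build_layer_zip(source_dir, zip_path, prefix="python"):
    """Zip every file in source_dir under a top-level prefix directory."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for filename in files:
                full_path = os.path.join(root, filename)
                relative = os.path.relpath(full_path, source_dir)
                # Store each file as prefix/<relative path> inside the zip
                archive.write(full_path, os.path.join(prefix, relative))
    return zip_path
```

Passing prefix="" places the files at the root of the archive instead, which is the layout a function deployment package (as opposed to a layer) expects.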

With the Neo4j Lambda Layer created, we can now create our Neo4j Lambda function.

Creating the Neo4j Lambda Function:

  1. Open the AWS Lambda Service
  2. Select Functions, then Create Function
  3. Choose to Author from scratch and provide a name for your function
  4. Change the Runtime to one of the 3.x Python versions, then Create function
  5. From the Configuration tab, select Layers, then Add a layer
  6. From the Name dropdown, you should be able to select your Neo4j Layer. Version will be 1, if this is a new layer
  7. Click Add
  8. You should now be able to run from neo4j import GraphDatabase in your Lambda function without error

If you’re still seeing the error, another option is to package all your dependencies up in a zip file alongside your lambda function code, then upload this zip to Lambda.

Uploading Neo4j as a Lambda package:

  1. Open a local Terminal window (Mac) or CMD (Windows)
  2. Create a new folder called Neo4j
  3. Navigate to the new Neo4j folder, then run: pip install neo4j -t .
  4. Add your lambda_function.py file inside the same folder
  5. Now compress the contents of the Neo4j folder (not the directory itself)
  6. Open the AWS Lambda service via the AWS Console
  7. Select Functions, then Create Function
  8. Name the function and select the Runtime. In our case, this is one of the 3.x Python versions
  9. At the Function Code window, click the Actions dropdown and select Upload a .zip file
  10. Upload your zip file
  11. Save the function, then create and run a test event
  12. This should now be successful

Useful Resources

  • AWS Lambda Environment Variables
  • Using AWS SSM Parameter Store with Python
  • Build a Python Lambda Deployment Package


About Daniel Andrews

Passionate about all things data and cloud. Specializing in Python, AWS and DevOps, with a Masters degree in Data Science from City University, London, and a BSc in Computer Science.
