Exploring Neo4j Using Python - The Graph DBMS
It is an immense pleasure to have you here. As a computer science enthusiast working in microservices development, cloud computing & data engineering, I have always looked forward to learning, sharing & gaining knowledge with the overwhelming support of brilliant minds like you. With my deepest regards, I hope this article delivers what you came here to read. Thank you!
— Kshitiz
Kudos on taking the first step in learning Neo4j! In this article we will cover a beginner's lesson on a graph DBMS, namely Neo4j, working through the following agenda:
- Understanding Graph DBMS
- A brief about Neo4j
- Prerequisites & Installations
- Comparing RDBMS, NoSQL DBMS & Graph DBMS
- Creating Project, Graph DBMS & Database
- Creating nodes and relationships
- Specifying relationship while creating nodes
- Creating a new Neo4j user
- Connecting Neo4j using Python
- Designing API to query multiple Neo4j databases
Understanding Graph DBMS
A graph DBMS, also called a graph-oriented DBMS or graph database, represents data in graph structures as nodes and edges, where the edges are the relationships between nodes. It allows easy processing of data in that form, and simple calculation of specific properties of the graph, such as the number of steps needed to get from one node to another.
Graph DBMSs usually don’t provide indexes on all nodes; in those cases, direct access to nodes based on attribute values is not possible. The most popular graph DBMSs are:
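To make the "number of steps" idea concrete, here is a minimal Python sketch that stores a small graph as a dictionary and counts the fewest edges between two nodes with a breadth-first search. The node names and the `steps_between` helper are made up for illustration; a graph DBMS does this kind of traversal for you.

```python
from collections import deque

# Hypothetical toy graph: each key is a node, each value lists the nodes it points to.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}


def steps_between(graph, start, goal):
    """Return the fewest edges needed to reach goal from start, or None if unreachable."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return steps
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


print(steps_between(graph, "A", "E"))  # 3 (A -> B -> D -> E)
```

Because the edges are directed, going from "E" back to "A" is not possible in this toy graph, so the helper returns None for that pair.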
- Neo4j
- Microsoft Azure Cosmos DB
- ArangoDB
- Virtuoso
- OrientDB
A brief about Neo4j
Neo4j is a scalable, ACID-compliant graph database designed with a high-performance distributed cluster architecture, available in self-hosted and cloud offerings. It delivers graph technology that has been battle tested for performance and scale ground-up. The high-performance distributed cluster architecture of Neo4j enables customers to run the most challenging OLTP and data science workloads, while preserving ACID compliance and data integrity. With Neo4j, customers get the freedom of choice to deploy in a self-hosted, hybrid, or multi-cloud platform. Some of the application sectors of Neo4j are:
- Real-Time Recommendations
- Master Data Management
- Identity and Access Management
- Network and IT Operations
- Fraud Detection
- Anti Money Laundering / Tax Evasion
- Graph-Based Search
- Knowledge Graphs
- Graph Analytics and Algorithms
- Graph-powered Artificial Intelligence
- Smart Homes
- Internet of Things / Internet of Connected Things
Additionally, Neo4j can be used from most of the commonly used programming languages:
- .NET
- Clojure
- Elixir
- Go
- Groovy
- Haskell
- Java
- JavaScript
- Perl
- PHP
- Python
- Ruby
- Scala
Prerequisites & Installations
Although Neo4j is structured in a way that is friendly to learn, it’s always good to have an understanding of database systems & graph theory.
Having a good knowledge of Java & Spring Framework would be the cherry on the cake!
To start with Neo4j, we need a working setup in our local environment; in this case we will be using Neo4j Desktop, which can be easily downloaded from here.
Since I am using Windows, I downloaded the latest desktop version available for my OS. After downloading, installing & launching it, you should see the following screen with all checks passed.
Comparing RDBMS, NoSQL DBMS & Graph DBMS
It is always good to compare and relate the concepts we have learned so far with the new ones coming up. It also helps us visualize the overall picture and understand how the technology has evolved over the years.
Here is a tabular comparison of the terminology that differs in each DBMS and the purpose of their functionality implemented:
Creating Project, Graph DBMS & Database
To begin with Neo4j, we need to create a project before we implement anything else. In our problem statement, we have created a project named ‘Netflix’.
Inside a project, there can be multiple Graph DBMSs, but only one can be active at a time. Here we have created two DBMSs under Netflix, named Movies & Web Series.
For each DBMS created, a default ‘neo4j’ database is created. Additionally, we can create multiple databases under a DBMS. Here, we will create two databases under Movies, named bollywood & hollywood.
Make sure that the database names are all lowercase and contain no special characters. Here is a snapshot of Neo4j after the setup explained above has been completed:
Creating nodes and relationships
Now that our database is ready, we can open it in Neo4j Browser, which opens automatically when we open a DBMS.
The query language used in Neo4j is Cypher.
To create our first node in the hollywood database, here is the Cypher script we will use:
CREATE (
: animation {
movie_title: 'Kung Fu Panda',
release_year: 2008
}
)
In the above script, ‘animation’ is the node label and the map in curly braces holds the properties of that node (it resembles JSON, but the keys are unquoted). Similarly, to create our second node, the following script is used:
CREATE (
: animation {
movie_title: 'Kung Fu Panda 2',
release_year: 2011
}
)
Once executed, the above scripts create two nodes, each with its own property data, under a single node label, i.e. animation. Here is the browser snapshot for the same:
We are now ready to define a relationship between these nodes. Since this is an example about movies, I will create a relationship named ‘sequel’ between them. Here is the Cypher script to create a relationship:
MATCH
(x: animation),
(y: animation)
WHERE x.movie_title = 'Kung Fu Panda 2'
AND y.movie_title = 'Kung Fu Panda'
CREATE (x)-[: sequel]->(y)
This command matches the two nodes and creates the relationship between them as defined:
Specifying relationship while creating nodes
It’s not necessary to create a relationship only after the nodes have been created. We can create both in one shot!
Here is an example of creating a node along with the relationship with an already created node:
MATCH
(x: animation)
WHERE x.movie_title = 'Kung Fu Panda 2'
CREATE (
: animation {
movie_title: 'Kung Fu Panda 3',
release_year: 2016
}
)-[: sequel]->(x)
Similarly, it is also possible to create both nodes in a single query, along with the relationship between them:
CREATE (
: animation {
movie_title: 'The Angry Birds Movie 2',
release_year: 2019
}
)
-[: sequel]->(
: animation {
movie_title: 'The Angry Birds Movie',
release_year: 2016
}
)
The above Cypher query will create two nodes and then mark ‘The Angry Birds Movie 2’ as the ‘sequel’ of ‘The Angry Birds Movie’.
We can also create nodes with different node labels in a single query, along with the relationship between them:
CREATE (
: thriller {
movie_title: 'Inception',
release_year: 2010
}
)
-[: same_lead_actor]->(
: romance {
movie_title: 'Titanic',
release_year: 1997
}
)
Creating a new Neo4j user
The Neo4j DBMS provides a default user named ‘neo4j’; with Neo4j Desktop, its password is the one chosen when the DBMS was created (‘password’ in this setup).
However, to keep things secure and isolate access to the database, it is always recommended to create separate credentials for code connectivity. A user can be created in Neo4j Browser itself; here I have created a new user with the username ‘test’ and assigned it the ‘admin’ & ‘public’ roles.
There is also a Force Password Change option, but I have skipped it for simplicity. Once the user is created, it appears in the user list:
Connecting Neo4j using Python
Now that we are familiar with the fundamental concepts of a graph DBMS, we can proceed to our actual problem statement, i.e. connecting to and interacting with Neo4j using Python.
Python Package Required:
pip3 install neo4j
This is the only package required to connect Python to a Neo4j database. Connecting to Neo4j from Python takes only a couple of lines:
from neo4j import GraphDatabase
neo4j_session = GraphDatabase.driver("neo4j://localhost:7687", auth=("test", "password"))
The Neo4j DBMS listens for connections on ‘localhost:7687’ by default. All we need to specify are the correct user credentials, which we created in the previous step. Despite its name, ‘neo4j_session’ is a driver object that points to the complete DBMS. To work with a particular database inside it, we need to open a session on that database:
db = neo4j_session.session(database="hollywood")
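As a quick check that the session works, the nodes created earlier can be read back. The following is a minimal sketch, not part of the original walkthrough: `fetch_titles` and `main` are hypothetical helper names, and the query assumes the ‘animation’ nodes with `movie_title` and `release_year` properties from the previous sections exist in the hollywood database.

```python
def fetch_titles(session, label):
    """Return the movie titles of all nodes with the given label, oldest release first."""
    # A label cannot be sent as an ordinary query parameter, so it is
    # backtick-quoted into the query text; only pass labels you trust.
    query = (
        "MATCH (m:`{label}`) "
        "RETURN m.movie_title AS title "
        "ORDER BY m.release_year"
    ).format(label=label)
    return [record["title"] for record in session.run(query)]


def main():
    # Imported here so that fetch_titles can be tried out without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("test", "password"))
    try:
        # The with block closes the session when it ends.
        with driver.session(database="hollywood") as db:
            print(fetch_titles(db, "animation"))
    finally:
        driver.close()
```

Calling `main()` against a running DBMS should print the titles stored under the ‘animation’ label. Opening the session in a `with` block is a design choice that avoids leaving sessions open if a query fails.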
Now that the database reference is ready, let’s develop a microservice that will run queries on our database!
Designing API to query multiple Neo4j databases
To start with, we will create a Flask server template and a microservice structure so that the API can be driven through Postman:
Python Packages:
Flask==1.1.2
Flask-API==2.0
Flask-Cors==3.0.9
Werkzeug==1.0.1
Basic codebase to set up the Flask server:
from flask import Flask, request, jsonify
from flask_cors import CORS

app = Flask(__name__)
CORS(app)

# add neo4j connection

@app.route("/add-movie", methods=["POST"])
def add_movie():
    data = request.get_json()
    # add movie details in Neo4j database
    return jsonify({'status': 'Success'}), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)
Now is the time to embed our Neo4j connection codebase in the above script. The API request will have four parameters:
- Movie Title
- Movie Genre
- Movie Type
- Release Year
Based on the provided data, the API should create the node in the correct database. A CREATE query will be used to insert the data.
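Before touching the database, it can help to check that a request carries all four parameters. The sketch below is one possible way to do that. The field names follow the ones used in the API code of this article, while `validate_movie_payload`, `REQUIRED_FIELDS` and `KNOWN_DATABASES` are hypothetical names, and rejecting unknown movie types is an assumption of this sketch rather than a rule from the article.

```python
REQUIRED_FIELDS = ("movie_title", "movie_genre", "movie_type", "release_year")
KNOWN_DATABASES = ("hollywood", "bollywood")


def validate_movie_payload(data):
    """Return a list of problems in the request body; an empty list means it looks fine."""
    if not isinstance(data, dict):
        return ["request body must be a JSON object"]
    problems = ["missing field: " + field for field in REQUIRED_FIELDS if field not in data]
    if problems:
        return problems
    # Assumption of this sketch: only the two databases created earlier are accepted.
    if data["movie_type"] not in KNOWN_DATABASES:
        problems.append("movie_type must be one of: " + ", ".join(KNOWN_DATABASES))
    # bool is a subclass of int in Python, so it is excluded explicitly.
    year = data["release_year"]
    if isinstance(year, bool) or not isinstance(year, int):
        problems.append("release_year must be an integer")
    if not str(data["movie_title"]).strip():
        problems.append("movie_title must not be empty")
    return problems


example = {
    "movie_title": "Inception",
    "movie_genre": "thriller",
    "movie_type": "hollywood",
    "release_year": 2010,
}
print(validate_movie_payload(example))  # []
```

In the Flask handler, a non-empty list could be returned to the caller with a 400 status instead of running the query.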
Here is the complete API code once the above requirement is implemented:
from flask import Flask, request, jsonify
from flask_cors import CORS
from neo4j import GraphDatabase

app = Flask(__name__)
CORS(app)

# One driver for the whole DBMS; a session is opened per request
neo4j_session = GraphDatabase.driver("neo4j://localhost:7687", auth=("test", "password"))

@app.route("/add-movie", methods=["POST"])
def add_movie():
    data = request.get_json()
    # Choose the target database from the movie type
    database = "hollywood" if data['movie_type'] == "hollywood" else "bollywood"
    # Property values are sent as a query parameter so that quotes in a title cannot break the query
    properties = {'movie_title': data['movie_title'], 'release_year': int(data['release_year'])}
    # A label cannot be sent as a parameter, so the genre is formatted into the query text
    query = "CREATE (:{movie_genre} $properties)"
    query = query.format(movie_genre=data['movie_genre'])
    # The session is closed automatically at the end of the with block
    with neo4j_session.session(database=database) as db:
        db.run(query, properties=properties)
    return jsonify({'status': 'Success'}), 200

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)
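Because the movie genre ends up in the query text itself as the node label, it is worth checking it before building the query. Here is a minimal sketch under two assumptions: that a label should be a simple identifier (a letter followed by letters, digits or underscores), and that the property values are passed separately as a `$properties` query parameter. `build_create_query` and `LABEL_PATTERN` are hypothetical names.

```python
import re

# Assumed rule for this sketch: a letter followed by letters, digits or underscores.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def build_create_query(movie_genre):
    """Return a CREATE statement for the given label, or raise ValueError for unsafe input."""
    if not isinstance(movie_genre, str) or not LABEL_PATTERN.fullmatch(movie_genre):
        raise ValueError("invalid movie_genre: {!r}".format(movie_genre))
    # Property values travel separately as the $properties parameter.
    return "CREATE (:{label} $properties)".format(label=movie_genre)


print(build_create_query("thriller"))  # CREATE (:thriller $properties)
```

The returned text could then be run with something like `db.run(build_create_query(data['movie_genre']), properties={...})`, and a ValueError could be turned into a 400 response.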
Testing the implementation through Postman:
Neo4j Browser shows the created nodes in the correct databases:
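The same request can also be sent from Python instead of Postman. The following is a rough sketch using only the standard library; it assumes the Flask server from this article is running locally on port 8080, and `build_add_movie_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request


def build_add_movie_request(payload, base_url="http://localhost:8080"):
    """Build a POST request for the /add-movie endpoint with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/add-movie",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response (needs the server running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_add_movie_request({
    "movie_title": "Inception",
    "movie_genre": "thriller",
    "movie_type": "hollywood",
    "release_year": 2010,
})
print(request.get_method(), request.full_url)  # POST http://localhost:8080/add-movie
```

With the server running, `send(request)` performs the actual call and returns the JSON body of the response.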
Thanks again for spending a precious couple of minutes reading the article. For any query or suggestion, please feel free to reach out: