• Skip to primary navigation
  • Skip to main content
  • Skip to footer

In Out Code

Your comprehensive guide to optimized code

  • Data Structures
  • Python
    • Control Flow
    • HackerRank
    • Input and Output
    • Modules
  • AWS

Graphs Data Structure Guide

You are here: Home / Data Structures / Graphs Data Structure Guide

August 24, 2019 by Daniel Andrews

What is a Graph?

A graph is an ordered set of paired vertices and a set of edges, used to create an interconnected network.

Edges can be directed or undirected. Weight can be applied to an edge to represent relative cost of moving between vertices (nodes).

This guide walks you through everything you need to know about the graph data structure, with clear, commented code samples.

  • Pros and cons of graphs
  • Types of graphs
  • Graph data structures
  • Big O space and time complexity
  • Graph search: BFS and DFS
  • How to code a Graph in Python
  • Coding Breadth First Search in Python
  • Coding Depth First Search in Python

Pros and Cons of Graphs

Pros:

  • Great for representing links and relationships. e.g. Facebook friendships or roads connecting cities.

Cons:

  • Most graph algorithms scale poorly. Big O time complexity of O(n log n).

Types of Graphs

  • Directed: One-way arrow connecting the nodes
  • Undirected: Two-way arrow connecting the nodes, or no arrows
  • Connected: Path exists between every pair of vertices
  • Cyclic: Graph contains a cycle
  • Acyclic: Graph without cycles
  • Weighted: Each edge has a weight
  • Unweighted: No weights have been applied to the edges

Graph Data Structures

There are three main graph data structures:

  1. Edge Lists
  2. Adjacency Matrices
  3. Adjacency Lists

Edge Lists

  • Simple to implement, but slow to search for an edge. O(E) time.
  • Take every edge in a graph and represent in array
  • e.g. [0,1] for the edge from vertices 0 to vertices 1

Sample edge list in Python:

 
graph = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [4, 6], [5, 1], [5, 2]]

Adjacency Matrices

  • An NxN boolean matrix, where N is number of nodes
  • Edges are represented by ‘true’ value
  • If graph is undirected, matrix will be symmetric
  • Fast O(1) lookup, but not space efficient
  • Only really useful when almost all nodes connected to all nodes
  • Represented in a matrices

Sample adjacency matrix (directed) in Python:

 
  graph = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]
]

Sample adjacency matrix (undirected) in Python:

 
  graph = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0]
]

Adjacency List

An adjacency list is the most common way to represent a graph. A hash table data structure is used, whereby each key is a vertex, and the value is the vertices it’s connected to (usually in the form of a list).

Properties of an adjacency list:

  • Fast lookup, easy to find neighbours
  • Each vertex (node) stores a list of adjacent vertices
  • Can also be represented using two classes: Node and Graph
  • e.g. Vertex 0 has edges pointing to 1 and 4, so, 0 -> [1, 4]

Sample adjacency list in Python:

 
  graph = {
    1: [2, 5],
    2: [1, 5],
    3: [2, 4],
    4: [3, 5, 6],
    5: [1, 2, 4, 6],
    6: [4, 5],
}


Big O Space and Time Complexity

In the majority of cases, the best option will be to use an adjacency list. The main advantage being the efficiency of the O(V) lookup for a vertex. They’re also incredibly space efficient.

Graph StructureStorageAdd VertexAdd EdgeRemove VertexRemove EdgeQuery
Adjacency ListO(V + E)O(1)O(1)O(V + E)O(E)O(V)
Edge ListO(V + E)O(1)O(1)O(E)O(E)O(E)
Adjacency MatrixO(V²)O(V²)O(1)O(V²)O(1)O(1)

* V = No. of vertices. E = No. of edges.


Graph Search: BFS and DFS

Two most common ways to search a graph are breadth-first search (BFS) and depth-first search (DFS).

This section will cover how to implement the following in Python:

  1. A graph
  2. Breadth First Search
  3. Depth First Search

Which is better, DFS or BFS?

Depth First:

  • Simple to implement than BFS when using recursion
  • Better when the item searched for is faraway
  • e.g. Simulating chess to end of game combinations

Breadth First:

  • BFS always finds the shortest path, given an undirected and unweighted graph
  • Often requires more memory than a DFS, due to storing all neighbours
  • Better when the item searched for is near the top
  • e.g. Social networks

Depth-first Search

From a start node (often the root), visit the first child (node one step away). Now visit the first child of this child (node two steps away from start node). Continue until you reach the end of this path. Go back to the start node, and repeat the process for each of its other child nodes until all nodes are visited.

Steps to implement in code:

  1. Use a stack (LIFO)
  2. For the first node, add it to stack, then immediately pop it off
  3. Add first node’s direct neighbours to the stack, in any order
  4. Mark them as ‘visited’ in the graph
  5. Pop the first of these neighbours off the stack
  6. Add its unvisited neighbours to the stack, and mark them as ‘visited’
  7. Pop one of these neighbours off the stack, and mark it as visited
  8. Repeat until you reach a node with no unvisted nodes connected to current node
  9. Pop remaining nodes off the stack, adding and popping their neighbours as you go

Breadth-first Search

From a start node (often the root), visit all the immediate children (nodes one step away). Now visit the children of the children (nodes two steps away from start node). Continue until you reach the end of the graph (visited all nodes).

The time complexity of BFS is O(V + E), where V is the number of nodes and E is the number of edges.

Steps to implement in code:

  1. Use a queue (FIFO)
  2. For the first node, add it to queue, then immediately pop it off
  3. Add first node’s direct neighbours to the queue, in any order
  4. Mark them as ‘visited’ in the graph
  5. Pop the first of these neighbours off the queue
  6. Add its unvisited neighbours to the queue, and mark them as ‘visited’
  7. When hit a node in the queue that has all its neighbours marked as ‘visited’, pop the next one off the queue
  8. Repeat until all nodes popped off the queue and marked as ‘visited’

Useful resources:

  • MIT Algorithms and Data Structure Course

Bidirectional Search

Bidirectional search can be used to find the shortest path between a source node and destination node.

This process requires two simultaneous breadth-first searches, one using the source node as its starting point, and the other using the destination node. When their searches collide, this is the path.


Coding a Graph in Python

Before we can traverse a graph, we first need to create it. The following code allows you to create a graph that follows the adjacency list structure.

Basic operations are:

  • add_vertex: Creates a new vertex.
  • add_edge: Creates an edge between two vertices.
  • show_vertices: Displays a list of all vertices in the graph.
  • show_edges: Displays a list of tuples representing edges between vertices.
  • find_path: Finds a path between a start vertex and end vertex.

 
class Graph:

    def __init__(self, graph_dict = {}):
        """Initializes a graph object.
        
        If no graph exists, a new, empty dictionary will be used.
        
        Attributes:
            graph_dict: A Python dictionary (hash table) to hold new items.
        """
        self.graph_dict = graph_dict

    def add_vertex(self, vertex):
        """Adds a new vertex to the graph, if it doesn't already exist.
        
        A new vertex is added to the graph, with no connected edges.
        If the vertex already exists, raises error.
        
        Args:
            vertex: The new vertex being added to the graph.
            
        """
        if vertex not in self.graph_dict:
            self.graph_dict[vertex] = []
        else:
            raise ValueError("Vertex already exists in graph.")
               
    def add_edge(self, start, end): 
        """Adds a new edge to the graph, if it doesn't already exist.
        
        If the start node doesn't already exist, it will be created.
        If the end node doesn't already exist, it will be created.
        If the edge doesn't exist, it will be created.
        
        Args:
            start: The start vertex for the edge.
            end: The end vertex for the edge.
            
        """
        if end not in self.graph_dict:
            self.add_vertex(end)
            
        if start not in self.graph_dict:
            self.graph_dict[start] = [end]
        else:
            self.graph_dict[start].append(end)

    def show_vertices(self):
        """Get a list of vertices in the graph.
        
        Returns:
            Python list of all vertices in the graph.
        
        """
        return list(self.graph_dict.keys())                                

    def show_edges(self):
        """Display each graph edge as a tuple. (node, neighbour)."""
        for node in self.graph_dict:
            for neighbour in self.graph_dict[node]:
                print("(",node,", ",neighbour,")")

    def find_path(self, start, end, path = []):
        """Search for a path between two vertices in the graph.
        
        Args:
            start: Starting vertex.
            end: End vertex.
            path: Python list to store the sequence of vertices in a path
                  between the start vertex and end vertex.
                
        Returns:
            A list containing the sequence of vertices from start vertex to
            end vertex, if a path exists.
            
        """
        path = path + [start]   
        if start == end:
            return path

        for node in self.graph_dict[start]:
            if node not in path:
                newPath = self.find_path(node, end, path)

            if newPath:
                return newPath

            return None


Coding Breadth First Search in Python

For the following implementation of BFS, we will be using an adjacency list to represent the graph.

 
# Create an adjacency list as our graph structure
graph = {1: [2, 3],
         2: [4, 5],
         3: [5],
         4: [6],
         5: [6],
         6: [7],
         7: []}

def bfs(graph, start):
    """Performs a Breadth First Search (BFS).
    
    Adds starting vertex to queue, then immediately pops it off.

    Loops through each neighbour of each vertex, adding them to the
    set of visited vertices, before moving on to the next vertex.

    Next vertex to check is always taken from the front of the queue.

    Args:
        graph: The graph.
        start: The starting vertex for the search.

    Returns:
        Python list of vertices in the order they were traversed.
    
    """
    visited = set()
    queue = [start]   

    visited.add(start)

    while queue:
        vertex = queue.pop(0)
        
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

    return visited

bfs(graph, 1) # {1, 2, 3, 4, 5, 6, 7}


Coding Depth First Search in Python

For the iterative implementation of DFS found below, we will be using an adjacency list to represent the graph.

 
# Create an adjacency list as our graph structure
graph = {1: [2, 3],
         2: [4, 5],
         3: [5],
         4: [6],
         5: [6],
         6: [7],
         7: []}

def dfs(graph, start):
    """Performs a Depth First Search (DFS).

    Adds starting vertex to stack, then immediately pops it off.

    Loops through each neighbour of a vertex, adding them to the
    stack. Once all neighbours are on the stack, they are popped off
    and added to the path. Repeats for each vertex.

    Next vertex to check is always taken from the top of the stack.

    Args:
        graph: The graph.
        start: The starting vertex for the search.

    Returns:
        Python list of vertices in the order they were traversed.
    
    """
    path = []
    stack = [start]

    while stack:
        vertex = stack.pop()
        
        if vertex in path:
            continue

        path.append(vertex)

        for neighbor in graph[vertex]:
            stack.append(neighbor) 

    return path

dfs(graph, 1) # [1, 3, 5, 6, 7, 2, 4]

Category iconData Structures Tag iconData Structure Guide,  Graphs

About Daniel Andrews

Passionate about all things data and cloud. Specializing in Python, AWS and DevOps, with a Masters degree in Data Science from City University, London, and a BSc in Computer Science.

Footer

Recent Posts

  • How to Setup Neo4j on AWS ECS (EC2)
  • How to Setup Neo4j on AWS EC2
  • How to List AWS S3 Bucket Names and Prefixes
  • Amazon Redshift Tutorial (AWS)
  • Big O: How to Calculate Time and Space Complexity

.