Union Find in Python

alogrithms python

Introduction

According to Wikipedia, dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a tree. Once the the dynamic connectivity structure is built, it should adapt itself such that it can give quick answers to queries of the form “is there a path between x and y?”. The Union-Find algorithm is one approach build this connectivity structure. There are multiple approaches the Union-Find algorithm, the most optimal approach is the Weighted Quick Union algorithm. In this algorithm, we maintain an array id[] to determine whether two indexes are connected. The connections between elements (or nodes) can be considered to be a tree. Now let’s define how we use the array id[] to solve the Quick-Union problem:

For the element i, id[i] gives the immediate parent of i in the tree.

For the element i, follow the parent of i until id[x] == x. Here, x is considered the root of i.

A path exists between x and y if root of x and root of y is the same.

In Quick-Union algorithm, we have two methods:

  • Union(p, q): This method will establish a connection between elements p and q in the tree, i.e., it will set q as the parent of p in the array id[]
  • Find(p): This method will find the root of the element p

In order to implement this algorithm, let’s consider a real problem where you want to merge email accounts that might belong to the same person. Two accounts definitely belong to the same person if there is some common email to both accounts. Note that even if two accounts have the same name, they may belong to different people as people could have the same name. A person can have any number of accounts initially, but all of their accounts definitely have the same name. The input to this problem will look this:

[
	['Alice', 'alice@example.com', 'alice_wonderland@example.com'],
	['Bob', 'bob@example.com', 'bob_marley@example.com'],
	['Alice', 'alice@mail.com', 'alice@example.com'],
	['Alice', 'alice@mail.com'],
]

The solution expected is to merge the email accounts that belong to the same person, which means, the output for the above example is:

[
	['Alice', 'alice@example.com', 'alice@mail.com', 'alice_wonderland@example.com'],
	['Bob', 'bob@example.com', 'bob_marley@example.com'],
]

We can use the weighted Quick-Union algorithm here to solve the problem. We can establish a dynamic connectivity structure between the email accounts and figure out what accounts belong to the same person. The Union and Find method both have a run time complexity of $logn$ where n is the number of elements or nodes in the structure.

In order to optimize the Quick-Union algorithm, when using Union method, always put the smaller tree (with lesser size or weight) lower than the larger tree. This ensures that the tree doesn’t get too tall, thereby reducing the depth of the tree. Note that size of the tree here indicates the number of nodes in the tree.

A second optimization is in Find method, where we flatten the tree using path compression, every node we visit can be made to point to it’s grandparent.

Here is the complete code, with comments, that implements Weighted Quick Union algorithm to solve merging of email accounts problem:

# quick_union/main.py

class MergeAccounts:
    def __init__(self, accounts):
        self.id = {} # the id dict that holds the dynamic connectivity structure
        self.owner = {}  # tells us the owner (name) of an email account
        self.size = {} # stores the size or weight of the tree
        self.accounts = accounts

		# We initialize the id, owner and the size dicts here
        for account in accounts:
            name = account[0]
            for email_id in account[1:]:
                self.owner[email_id] = name
                self.id[email_id] = email_id
                self.size[email_id] = 1

    def merge_accounts(self):
        for account in self.accounts:
            p = account[1]
            for q in account[2:]:
                self.Union(p, q)

        temp = {}
        for k in self.id:
            v = self.Find(k)
            temp.setdefault(v, []).append(k)

        merged = []
        for k, v in temp.items():
            name = self.owner[k]
            merged.append([name] + sorted(v))

        return merged

    def Find(self, i):
        while i != self.id[i]:
            # optimization: path compression, i.e. point node to it's grandparent.
            # keeps the tree completely flat.
            self.id[i] = self.id[self.id[i]]
            i = self.id[i]
        return i

    def Union(self, p, q):
        i = self.Find(p)
        j = self.Find(q)
        if i == j:
            return
        # optimization: always put the smaller tree lower, i.e. root of smaller tree is bigger tree.
        if self.size[i] < self.size[j]:
            self.id[i] = j
            self.size[j] += self.size[i]
        else:
            self.id[j] = i
            self.size[i] += self.size[j]

if __name__ == '__main__':
    accounts = [
        ["Alice", "alice@example.com", "alice_wonderland@example.com"],
        ["Bob", "bob@example.com", "bob_marley@example.com"],
        ["Alice", "alice@mail.com", "alice@example.com"],
        ["Alice", "alice@mail.com"],
    ]
    m = MergeAccounts(accounts)
    print(m.merge_accounts())

The output from the above code is:

$ python3 quick_union/main.py
[['Alice', 'alice@example.com', 'alice@mail.com', 'alice_wonderland@example.com'], ['Bob', 'bob@example.com', 'bob_marley@example.com']]

The full source code is available in the Github repository https://github.com/abvarun226/blog-source-code/tree/master/union-find-in-python

I hope this was helpful and if you find any errors or issues in this post, please do let me know in the comments section below.

In case you would like to get notified about more articles like this, please subscribe to my substack.

Get new posts by email

Comments

comments powered by Disqus