Counting Sort in Python

Comparison Model

In computer science, sorting a list of items is a very basic as well as an important problem, that is discussed in some great detail. We basically start by using some of the simplest and intuitive techinques to sort items, like Bubble sort or Insertion sort, which are not very efficient. By efficiency, I mean their runtime is very large for very large list of items. These basic algorithms have a time complexity of O(n^2). Then we come across techinques like Mergesort, Heapsort and Quicksort that are slightly complex to understand, but are very efficient. They can sort very large list of items in shorter time compared to the previously mentioned algorithms. They have a time complexity of O(nlogn).

All these techniques have one thing in common, they all use the comparison model for sorting items in a list. These techniques compare items in a list with one another to decide their respective positions in the sorted list. It has been proved that the comparison model has a Ω(nlogn) lower bound for sorting and we cannot do any better using this model. So to get a better efficiency in sorting, we have to look at other techniques like counting sort that does not use the comparison model.

Counting Sort

Counting sort is a very efficient algorithm to sort a sequence of items, but all the items in the list must be non-negative integers. The time complexity of counting sort is O(n+k) where n is the number of items to be sorted and k is the largest element in the list.

Some of the properties of counting sort are:

  1. Items have to be non-negative integers, with the range of input, k, not significantly greater than the number of items to be sorted, n

  2. It is a stable sorting algorithm, i.e., if two items with same values appear in the same order in sorted output as they appear in the input list

  3. It has a O(n+k) time complexity, which is better than quick sort or merge sort

  4. It has a O(k+n) space complexity

Implementation

Let arr be the input list and k be the largest non-negative integer in the list. Counting sort requires that each of the n input items in the list range from 0 to k.

For each item in the list, counting sort determines the number of items less than it and then uses this information to place each element directly into the correct position in the output list. We will need a temporary array to store this information.

Step 1

The first step in the implementation is to determine how many items of a particular value the input list has. We will initialize a temporary list called position and initialize each element to 0. We will then iterate over the input list and increment the corresponding index in the position list for each item.

""" 
arr      => input array
position => temporary array
result   => sorted array
"""

n = len(arr)
k = max(arr) + 1

# initialize the position list
position = [0] * k

# increment index v by 1
for v in arr:
    position[v] += 1

Step 2

In the second step, for each item v in arr, we determine how many items in the list are greater than v and update the corresponding index that represents v in position list.

    s = 0
    for i in range(0, k):
        temp = position[i]
        position[i] = s
        s += temp

Step 3

In the final step, we place the items directly in it’s final sorted position in the result list using the information in the position list. Initialize the result list by None

    result = [None] * n
    for v in a:
        result[position[v]] = v
        position[v] += 1

    return result

Complete Code

def counting_sort(arr):
    """ 
    arr      => input array
    position => temporary array
    result   => sorted array
    """

    n = len(arr)
    k = max(arr) + 1

    # initialize the position list
    position = [0] * k

    # increment index v by 1
    for v in arr:
        position[v] += 1

    s = 0
    for i in range(0, k):
        temp = position[i]
        position[i] = s
        s += temp
    
    result = [None] * n
    for v in arr:
        result[position[v]] = v
        position[v] += 1

    return result

samples = [
    [4, 1, 3, 2, 3],
    [8, 0, 1, 3, 4, 10],
    [1, 1, 1],
    [10]
]

for sample in samples:
    res = counting_sort(sample)
    assert res == sorted(sample), f"incorrect result {res}"

In case you would like to get notified about more articles like this, please subscribe to my substack.



Comments

comments powered by Disqus