Faster Lists In Python

See what list actions are the fastest in Python to speed up your programs.


Python lists are beautifully dynamic data structures. They are the data structure of choice for so many tasks, so it is important to be conscious of how fast each operation is. The cost of an operation is commonly described by its Big O, or time complexity.

List Creation

There are many ways to create lists in Python, so let's take a look at which ones take the least time. Here is the setup we used:

import random
from time import time
import pandas as pd

# Initialize random with a time-based seed
random.seed(time())

#### List Tests ####
# How many test passes to run
num_tests = 25
# Create a dict in which to store the timings
data = {'length': [], 'listComprehension': [], 'append': [], 'preAllocate': []}
for _ in range(num_tests):
    # Randomize list length
    end = random.randint(100_000, 10_000_000)
    data['length'].append(end)
    # Time a list comprehension
    startTime = time()
    listComprehension = [i for i in range(0, end)]
    data['listComprehension'].append(time() - startTime)
    # Time appending in a loop
    startTime = time()
    append_list = []
    for i in range(0, end):
        append_list.append(i)
    data['append'].append(time() - startTime)
    # Time pre-allocating and assigning by index
    startTime = time()
    preAllocate = [0] * end
    for i in range(0, end):
        preAllocate[i] = i
    data['preAllocate'].append(time() - startTime)
# Summarize the results; the 50% row is the median
df = pd.DataFrame(data)
print(df.describe())

List comprehensions are the first method for creating lists. They are simple and easy to read, which makes them just as easy to write.

listComp = [i for i in range(0, end)]

Append is the next method. I have a tendency to jump to this solution, as it is one of the easier ones to use and is similar to how you add to lists in most other languages.

append_list = []
for i in range(0, end):
    append_list.append(i)

Pre-allocating is the final method we will test. It only works for lists with a predetermined length: you create all of the list slots up front and then overwrite their values by index.

preAllocate = [0] * end
for i in range(0, end):
    preAllocate[i] = i

Results: While list comprehensions don't always make the most sense, here they are the clear winner. I am not recommending you try to jam every for loop into a comprehension; where they make sense is in simple for loops and when creating especially large lists.

Method:            Median Time:
listComprehension  0.5556 ms
preAllocate        1.1466 ms
append             1.3842 ms

Comparing Lists

The test for this was similar to the one above. We created two lists of 50,000 random numbers between 0 and 50,000. Then we compared the two lists to check whether each element of the first list appears in the second.
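Here is a minimal sketch of that setup, assuming the names list_a and list_b used in the snippets below; the exact harness may have differed:

import random

size = 50_000
list_a = [random.randint(0, 50_000) for _ in range(size)]
list_b = [random.randint(0, 50_000) for _ in range(size)]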

Double for loop: This should have a time complexity of O(n²). For each element of the first list, we may have to iterate over part or all of the second list.

for i in list_a:
    for j in list_b:
        if i == j:
            break

In search: The time complexity of 'in' is less obvious. Knowing that the double for loop executes in O(n²) time, it would make sense for the 'in' check to behave similarly, and asymptotically it does: 'in' on a list is a linear scan, so the loop as a whole is still O(n²). The reason it runs so much faster in practice is that CPython implements the scan in C, so the constant factor is far smaller than that of a Python-level inner loop.

for i in list_a:
    if i in list_b:
        continue

Set search time complexity is a little different. The implementation of set in Python is essentially a hash table, so membership checks are O(1) on average. Because we go through the first list once and each check against the set is O(1), the set search should operate in O(n) time (building the set is itself O(n)).

unique = set(list_b)
for i in list_a:
    if i in unique:
        continue

Results: Comparisons with sets are great. For anyone wondering, the double for loop was one I never expected to do well. What did surprise me was just how much less time using the 'in' operator took than the double for loop.

Method:       Median Time:   Big O:
Set Syntax:   0.0046 ms      O(n)
In Syntax:    3.4710 ms      O(n²), small constant
Double For:   42.3418 ms     O(n²)

Sorting Lists

For our last section, we will talk about sorting lists in Python. We will compare several common sorting algorithms; for a fuller list of sorting algorithms, check out StackAbuse.

When talking about sorting, we need to remember that there are stable and unstable sort algorithms. An algorithm is stable if it preserves the original order of equal items. Here is an example of why stability may be important:

Stability Example

Say we have a bug in our code and we want to see what went wrong. We might go to the log file and sort the requests by user. Here is our sample log file:

user10: { 'request_number': 2 }
user11: { 'request_number': 1 }
user10: { 'request_number': 1 }

In an unstable sort we might end up with something like the first result below: the requests are sorted by user, but we have lost the original sequence of events. We might never find our bug, because we would not be able to see that user10's request 2 reached the server before request 1.

# Unstable Sort Result
user10: { 'request_number': 1 }
user10: { 'request_number': 2 }
user11: { 'request_number': 1 }

# Stable Sort Result
user10: { 'request_number': 2 }
user10: { 'request_number': 1 }
user11: { 'request_number': 1 }
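You can see this behavior with Python's built-in sort, which is stable. This is just an illustration, using tuples to stand in for the hypothetical log entries above:

# Illustrative stand-in for the log entries above
log = [('user10', 2), ('user11', 1), ('user10', 1)]
# sorted() is stable: entries with equal users keep their original order
by_user = sorted(log, key=lambda entry: entry[0])
print(by_user)  # [('user10', 2), ('user10', 1), ('user11', 1)]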

Heapsort has a typical time complexity of O(n log n). It is not a stable sorting algorithm, however, so there is no guarantee that equal items will keep their original order. The algorithm works by placing items in a binary max-heap, where the root node holds the maximum value, and then deconstructing that heap to form the sorted array.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/
def heapify(nums, heap_size, root_index):
    # Assume the index of the largest element is the root index
    largest = root_index
    left_child = (2 * root_index) + 1
    right_child = (2 * root_index) + 2
    # If the left child of the root is a valid index, and the element is greater
    # than the current largest element, then update the largest element
    if left_child < heap_size and nums[left_child] > nums[largest]:
        largest = left_child
    # Do the same for the right child of the root
    if right_child < heap_size and nums[right_child] > nums[largest]:
        largest = right_child
    # If the largest element is no longer the root element, swap them
    if largest != root_index:
        nums[root_index], nums[largest] = nums[largest], nums[root_index]
        # Heapify the new root element to ensure it's the largest
        heapify(nums, heap_size, largest)

def heap_sort(nums):
    n = len(nums)
    # Create a max heap from the list
    # The 2nd argument of range means we stop at the element before -1, i.e.
    # the first element of the list.
    # The 3rd argument of range means we iterate backwards, reducing the count
    # of i by 1
    for i in range(n, -1, -1):
        heapify(nums, n, i)
    # Move the root of the max heap to the end of the list
    for i in range(n - 1, 0, -1):
        nums[i], nums[0] = nums[0], nums[i]
        heapify(nums, i, 0)
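A quick usage check; note that heap_sort sorts the list in place:

nums = [35, 12, 43, 8, 51]
heap_sort(nums)
print(nums)  # [8, 12, 35, 43, 51]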

Mergesort is a divide-and-conquer algorithm with a typical time complexity of O(n log n). It halves the list, and those halves again, until only individual elements remain; the elements are then paired up, sorted, and merged repeatedly until the list is whole again. This algorithm is stable, so it is great where stability is necessary.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/
def merge(left_list, right_list):
    sorted_list = []
    left_list_index = right_list_index = 0
    # We use the list lengths often, so it's handy to make variables
    left_list_length, right_list_length = len(left_list), len(right_list)
    for _ in range(left_list_length + right_list_length):
        if left_list_index < left_list_length and right_list_index < right_list_length:
            # We check which value from the start of each list is smaller
            # If the item at the beginning of the left list is smaller, add it
            # to the sorted list
            if left_list[left_list_index] <= right_list[right_list_index]:
                sorted_list.append(left_list[left_list_index])
                left_list_index += 1
            # If the item at the beginning of the right list is smaller, add it
            # to the sorted list
            else:
                sorted_list.append(right_list[right_list_index])
                right_list_index += 1
        # If we've reached the end of the left list, add the elements
        # from the right list
        elif left_list_index == left_list_length:
            sorted_list.append(right_list[right_list_index])
            right_list_index += 1
        # If we've reached the end of the right list, add the elements
        # from the left list
        elif right_list_index == right_list_length:
            sorted_list.append(left_list[left_list_index])
            left_list_index += 1
    return sorted_list

def merge_sort(nums):
    # If the list is a single element, return it
    if len(nums) <= 1:
        return nums
    # Use floor division to get the midpoint; indices must be integers
    mid = len(nums) // 2
    # Sort and merge each half
    left_list = merge_sort(nums[:mid])
    right_list = merge_sort(nums[mid:])
    # Merge the sorted halves into a new list
    return merge(left_list, right_list)
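Unlike heap_sort, merge_sort returns a new sorted list rather than sorting in place:

nums = [120, 45, 68, 250, 176]
print(merge_sort(nums))  # [45, 68, 120, 176, 250]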

Quicksort is a very commonly used algorithm, as it is easy to implement and has an average time complexity of O(n log n). The catch is that everything hinges on picking a good pivot to partition around; a poor pivot choice can degrade the running time to O(n²). That, tied with the fact that it is not stable, leads me not to recommend it as a default.

# Courtesy of https://stackabuse.com/sorting-algorithms-in-python/
def partition(nums, low, high):
    # We select the middle element to be the pivot. Some implementations select
    # the first element or the last element. Sometimes the median value becomes
    # the pivot, or a random one. There are many more strategies that can be
    # chosen or created.
    pivot = nums[(low + high) // 2]
    i = low - 1
    j = high + 1
    while True:
        i += 1
        while nums[i] < pivot:
            i += 1
        j -= 1
        while nums[j] > pivot:
            j -= 1
        if i >= j:
            return j
        # If an element at i (on the left of the pivot) is larger than the
        # element at j (on the right of the pivot), then swap them
        nums[i], nums[j] = nums[j], nums[i]

def quick_sort(nums):
    # Create a helper function that will be called recursively
    def _quick_sort(items, low, high):
        if low < high:
            # This is the index after the pivot, where our lists are split
            split_index = partition(items, low, high)
            _quick_sort(items, low, split_index)
            _quick_sort(items, split_index + 1, high)
    _quick_sort(nums, 0, len(nums) - 1)
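Like heap_sort, quick_sort works in place:

nums = [22, 5, 1, 18, 99]
quick_sort(nums)
print(nums)  # [1, 5, 18, 22, 99]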

Timsort is a sort that you may not have heard of. This sort, named for its inventor Tim Peters, is a hybrid of binary insertion sort and merge sort. Its average time complexity is O(n log n), it hits that bound with great consistency, and it has a best-case complexity of O(n) on already-ordered data. It is also a stable algorithm, which makes it great to use almost anywhere.

The best part is that using Timsort in Python only takes one line. That's right, it is the built-in sort function.

array.sort()
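For the record, list.sort() sorts in place while the built-in sorted() returns a new list, and both accept key and reverse arguments:

nums = [5, 2, 9, 1]
nums.sort()                              # in place: [1, 2, 5, 9]
descending = sorted(nums, reverse=True)  # new list: [9, 5, 2, 1]
words = ['pear', 'fig', 'banana']
words.sort(key=len)                      # by length: ['fig', 'pear', 'banana']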

Results: Timsort outperforms all the others, which is why it is the core sorting algorithm for Python. Mergesort and Quicksort come next, and Heapsort pulls up the rear… well, the rear of this list at least. There are plenty of worse sorting algorithms, such as bogosort (random sort).

Algorithm:   Time:
timsort      0.001594 ms
mergesort    0.033845 ms
quicksort    0.097097 ms
heapsort     0.365555 ms
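For reference, here is a minimal sketch of how such a comparison could be timed; the original harness may have differed, and time_sort is a hypothetical helper name:

import random
import statistics
from time import time

def time_sort(sort_fn, passes=25, size=10_000):
    timings = []
    for _ in range(passes):
        nums = [random.randint(0, 50_000) for _ in range(size)]
        start = time()
        sort_fn(nums)  # merge_sort returns a new list; the others sort in place
        timings.append(time() - start)
    return statistics.median(timings)

print(time_sort(lambda nums: nums.sort()))  # timsort
print(time_sort(heap_sort))                 # compare against heap_sort above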

Here is a table showing the Big O of these sorting algorithms, along with their space complexity, which is how much extra space is needed to complete the algorithm:

Algorithm:   Best:        Average:     Worst:       Space:
timsort      O(n)         O(n log n)   O(n log n)   O(n)
mergesort    O(n log n)   O(n log n)   O(n log n)   O(n)
quicksort    O(n log n)   O(n log n)   O(n²)        O(log n)
heapsort     O(n log n)   O(n log n)   O(n log n)   O(1)

That is all for this list. Now go forth, use these in your day-to-day Python, and follow along for more tips and tricks.