Intuitively Choose the Right Data Structure

Do what is right, not what is easy.

Many times you might have observed this. You struggle a lot to solve a problem, likely a hard one, while your friend is done submitting the solution and getting it accepted as well. When you ask them about how they solved it, they just say they had the intuition about it. Where does this intuition come from? It comes from solving a variety of problems, struggling to solve the hard problems but not giving up, constantly thinking of better ways to solve the problem, and, of course, looking at the other solutions to learn more. In this post, we will see some common scenarios that are generally a part of your solution for the original problem and what data structure would you use in that scenario. When you want to fasten a screw in your house, even without thinking you get a screwdriver and apply it. In the same way, it should be intuitive to you which data structure to use.

As you would have seen in my other posts, each data structure has its own strength and weaknesses. An important thing to note is that in an interview when you are solving a problem, you almost always choose a data structure for its strength. In other words, you don’t usually choose a data structure just because it has fewer weaknesses and has a good average-case performance. For example, if you want to find a maximum number frequently, priority queue/heap is most likely the best choice. If you want to store some value associated with a key, a hashmap is likely the best choice. Let’s see the strengths of each data structures with examples –

  1. An array is best for its simplicity when it comes to storing some elements. The access based on an index is O(1), so if you just want to store the elements and iterate over them, then an array is a great choice.
    1. Good:
      1. Simply store some elements – O(n) for `n` numbers.
      2. Iterate over all elements – O(n).
      3. Modify elements in place – O(1) per element.
    2. Bad:
      1. Check if some element exists in the list. It would take O(n) time.
      2. Add, remove elements – O(n) per addition/removal.
      3. Find max, min – O(n).

  2. Now say you want to store elements and need to frequently check if some element exists, the set (HashSet) can be the best choice. 
    1. Good
      1. Check if a certain number exists in the set and that is a frequent operation. O(1) per check on average.
      2. The set will naturally remove duplicates from the data.
      3. Add, remove elements at any time. O(1) per addition/removal.
    2. Bad
      1. If there are duplicates, set would keep only one copy and hence you won’t be able to tell if some number appeared multiple times in the data.
      2. Finding max, min is inefficient – O(n).

  3. What if you care about how many times an element is present but also want to frequently check if a certain element exists, map (Hashmap) might be the best choice. More importantly, when you want to store key-value pairs i.e. some data associated with elements, a map is a natural choice.
    1. Good
      1. Check if certain element exists – O(1).
      2. Know how many times each element exists – O(1).
      3. Add new elements, remove existing elements – O(1) per addition/removal.
      4. Store key-value pairs – O(1) per pair.
    2. Bad
      1. (Nothing really! And hence this data structure is the most common data structure for solving coding interview questions. However, it uses extra memory and a bad hash function can lead to worse time complexities than O(1)).
      2. If you want to frequently find max, min; map would be a wrong choice in most cases – O(n).

  4. Array, set, and map, all three are not good for frequently finding max/min in data. Here comes the priority queue to the rescue.
    1. Good
      1. From given data, the priority queue can easily find max element – O(1).
      2. Adding new elements is not too bad – O(logn).
      3. Can remove the max element efficiently – O(logn).
    2. Bad
      1. Inserting takes O(logn), so to set up a priority queue of size n, it would generally take O(nlogn) time unless you have all the data with you and converting it into a priority queue structure.
      2. Deleting any element is not supported. Only the max element can be removed.
      3. Checking if a certain element exists is not possible.
      4. Modifying some element is also not possible.
      5. Can not read elements in priority queue without removing each one.

  5. Wherever you want Last In First Out structure, a stack is a go-to choice e.g. pairing up the parenthesis, function callstack.
    1. Good
      1. O(1) insert operation, O(1) remove operation.
      2. Checking the element at the top of the stack is O(1) too.
    2. Bad
      1. Checking if a certain element exists is not possible.
      2. Can not access any element other than the top at any time. So reading the entire stack is not possible.

  6. Whenever you want to have a First In First Out structure, a queue is the best choice e.g. maintaining order requests, processing tasks.
    1. Good
      1. O(1) insert operation, O(1) remove operation.
      2. Checking the element at the front is also O(1).
    2. Bad
      1. Checking if a certain element exists is not possible.
      2. It cannot access any element other than the front at any time. So reading the entire queue is not possible.

  7. In all of the above examples, we were looking at storing the data/elements individually own their own, but what if there are connections between the elements, what if one element depends on the other? A graph is the best way to store whenever data has a relationship within itself e.g. a set of people and some of them are friends of each other, a set of cities and some of them are connected by road, a set of people & topics of interests and some people like some topics. In each of these examples, there were elements (people, cities, people & topics) and they had a connection among themselves (friendships, roads, interests, etc. ). A graph can be represented in primarily two ways –
    1. Adjacency Matrix
      1. Good
        1. Checking if a direct connection exists between two elements is O(1).
      2. Bad
        1. Checking if there is a path from one node/vertex/element the other is O(V^2) where V is the number of vertices/elements.
        2. Depth-first traversal, breadth-first traversal, the topological sort is O(V^2) too.
    2. Adjacency List
      1. Good
        1. Checking if there is a path from one node the other is O(V+E) where V is the number of vertices and E is the number of edges.
        2. DFS, BFS, topological sort are O(V+E) too.
      2. Bad
        1. Checking if there is a direct connection between two nodes is O(V+E).

  8. Similar to a graph, if you have relationships but in a hierarchical structure, a tree data structure is a natural choice. The most commonly used variation of a tree is a binary tree where each node has up to 2 children. E.g. if you want to represent an organization chart and employee-manager relationship, a tree is a great way to represent. The major advantage includes finding all nodes under a certain node, finding how nodes are connected and relationships between them (parent-child, ancestor, sibling, etc.). The most useful version of the tree is a balanced binary search tree. 
    1. Good
      1. The worst-case time complexity of the search, insert, update, delete operations on a balanced binary search tree is O(logn).
      2. Traversing the entire tree is O(n) operation.
      3. For these reasons, tree-set & tree-map are implemented as a balanced binary search trees.
    2. Bad
      1. If the tree is not balanced, it can have the worst-case time complexity of O(n) for all search, insert, update, delete operations.
      2. If the tree is not a binary search tree, search & other operations are O(n).

  9. Let’s say you want to store a lot of strings together and searching whether a string exists or not is one of the common operation, one of the best data structure to use is a hash-set. However, what if you want to find all strings that start with some prefix? If you use a hash set, you have to iterate over it which will take O(n) time and check individually if that string starts with the prefix. What if you can first filter based on the prefix instead and then list down every string that is not filtered out already! This is exactly how a Trie data structure can help. Trie data structure can store a lot of strings in a compact structure with efficient operations- search, insert, update, delete, etc. One can use a trie to implement a map where keys are strings.
    1. Good
      1. A lot of strings can be stored in a small space.
      2. Search, insert, update, delete operations are all O(1) if you assume string length to be a small constant, or O(L) where L is the length of the string.
      3. It can find all strings that start with a certain prefix efficiently compared to other data structures. If there are K strings that start with the prefix, the time complexity of the operation is O(K).
      4. Auto-completion and spell checker are two major applications of a trie.
    2. Bad
      1. If the number of strings is small, it can use a lot more energy than using other data structures.
      2. Limited use cases. 

Summary

We discussed how various data structures are good at various applications and you must have understood how important it is to know each data structure. Without knowing the best data structure for the task, you won’t be able to write the most optimal solution. Before your coding interview, certainly learn each data structure well. If you have never implemented your own stack, queue, or a graph – I strongly recommend getting that experience. When you get your hands dirty, you understand the thing well, clearly, and in a way that you will never forget.

I hope that this post will be useful to you to make a mental map of what data structure to use for what application. Let me know your thoughts and if I missed anything in the comments below. 

As always, don’t forget to like & share the post, and subscribe to the blog.

Happy Coding!


Related Posts


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s