# Data Structures

Data structure – a data structure is a way to structure data. There’s more to data than just simple numeric types and strings, as those are limited in their usefulness. In college, there are often computer science classes that teach data structures and algorithms, because an algorithm is a set of steps to solve a problem, and solving a problem might require the use of more advanced data structures as opposed to only using primitive things. So it’s good not only to understand what the individual data structures are, but you also need to know how and why you’d use them, and the pros and cons of using one over another.

Tree – a tree is a data structure where there are a bunch of nodes, with different levels, with nodes that are either parents or children (depending on how you are looking at it). A binary tree is a very commonly used type of tree data structure, where each parent node can only have two child nodes. This is often associated with a searching algorithm called binary search, which has logarithmic time complexity because each subsequent level reduces the search space by half. In plain English, that means a binary tree is very fast for searching.

Here is an example of a basic binary search: I am thinking of a number between 1 and 128. If you guess incorrectly, I will tell you that your answer is either too high or too low. What is the fastest way to find the answer?

You start by guessing 64. Then, if it’s 64, you’re done. But if it’s too high, you guess 32. Alternatively, if it’s too low, you guess 64+32, or 96.

So let’s say in this case it was less than 64, so your next guess is 32. Then 32 is too low, so you go in between 32 and 64, or 32+16=48. Then 48 is too high, so you guess 40. 40 is too low, so you guess 44. That’s too low, so you guess 46. However, that’s too high, so the only possible guess is 45, which is right.

64 (too high)

32 (too low)

48 (too high)

40 (too low)

44 (too low)

46 (too high)

45 (correct)

Notice how guesses are in between one another. That’s not a coincidence.

There were 128 possibilities, and you could guess it with a maximum of only 7 guesses. Much less than if you guessed every single number. There is a special relationship between the number of guesses and the number of possible numbers to guess. 27 = 128. That’s exponentiation. But the opposite of exponentiation is a logarithm. log2(128) = 7. That means it will take a maximum of 7 steps to find it, but you could potentially find it on the first try. That’s why there are separate categories of big O notation for best case, worst case, and expected case.

Binary search trees can make searches much faster. The time complexity of a binary search is much better than just going through every possible thing. Searching in a dumb way where you guess and check every single thing would have a big O notation of O(n), which is linear time, meaning it increases when you have a larger search space. By contrast, binary search is O(log(n)), which means you can increase the search space by lot and still not impact performance by much.

Not all trees are binary trees. And not all binary trees are evenly or fully fleshed out. An incomplete binary tree might have nodes that only have one child rather than two, for example. There might also be nodes with no child nodes, which are called leaf nodes.

This algorithm’s effectiveness is contingent on the proper data structure. It would not be possible to do this same thing with simple iteration or an array, or something like that. This algorithm is firmly wedded to the binary tree data structure.

The below figure shows an incomplete, unbalanced binary tree.

A heap is another kind of tree structure. Heaps are trees that can useful for things like finding the minimum or maximum value, using things called min heaps or max heaps. They can also store things quickly in a not 100% accurate way. They can be sorted approximately. This makes it easy to put stuff onto a heap, but very difficult to search for things. There are some situations where you’d rather have something be roughly correct and fast as opposed to 100% accurate and slow.

You can represent a heap using an array. Below is an example of an array-based implementation of a max heap:

In a binary max heap, the greatest value is the root node, which is the topmost node. Every parent node is greater than its child nodes. A max heap also has to be complete, meaning that, for each level except for the last one, each parent node needs to have two child nodes. Using an array to represent a max heap structure, the zeroth index is not used, at least in some cases. The way to calculate a parent node’s child nodes is the take the node’s index, double it, and for the second one, and add one. In other words, 2n and 2n+1. So look at the array and look at index 4 as an example. To see element 4 (value of 26)’s child nodes, take (2*4). That gives you 8. Index 8 is 4, and if you look at index 9, it’s 6. You can look at the diagram to verify that it’s true.

Some array-based heaps will put the root node in 0, rather than ignoring it like in this example. In that case, the way to find the child nodes of a parent node is to take 2n+1 and 2n+2 instead.

It’s okay if you don’t understand all of this right now. Data structures are sometimes taught in the 2nd or 3rd year of a computer science undergraduate degree. It’s just good to be aware that data structures exist. If all you code is just basic variables and control flow, without knowing useful stuff like classes or data structures, then your software will be very limited.

Map – a map is a collection of keys and values. Maps are sometimes called associative arrays. Maps are kind of like dictionaries. In Python, you can only use dictionaries, not maps, but they are similar. In Java, you are more likely to use something called a hash map. A hash map uses a hashing function to make searches faster. If you’re familiar with hashing algorithms like md5 or shasum, which are often used for file checksum verification to ensure the integrity of a file you’ve downloaded, you will notice that you can have a huge file, multiple gigabytes, and the hash is still just a small string. There are performance benefits to using hashing instead of a non-hashed map. But maps, hash maps, and dictionaries serve basically the same purpose. You might want to refer to something by its name, or key, rather than its value. If you want to use a hash map in Java, you can do so with the HashMap class.

Hash tables are similar to hash maps, though there are a couple of key differences.

Graph – a data structure with nodes and relationships between them. Dijkstra’s algorithm is a well-known algorithm that utilizes graphs. You could think of his algorithm as like trying to figure out the fastest way to get from your home to a destination, and there can be many streets and intersections that will take you there, but each has a different amount of time that will take to go down that particular street. There is a branch of mathematics called graph theory which is concerned with graphs.

The following image is an example graph data structure. Think of the lines as streets, and the numbers as the time in minutes it takes to drive down that section. Can you find out which route is the fastest from A to J? You can do it manually, but Dijkstra’s algorithm is a more concrete realization of a lot of the things we do in our heads subconsciously, albeit less error-prone or time-consuming.

The dots are called nodes or vertices (vertex for singular), and the lines between nodes are called edges. In the above example, I didn’t make the edges directional, so they’re bidirectional. But in other kinds of graph data structures, edges might be one-way only, so they will have an arrow to indicate the direction, such as from A to B, but not from B to A. Think of traffic on a road: maybe there is a ton of traffic one way and very little traffic the other way, meaning the time it takes to travel on that section of road depends on which direction you’re going. And in some cases, roads are one-way only.