Notes for the last midterm in data structures.

read more
Notes for Exam 4

For BST, maximum depth and maximum height is N-1 where N is the number of nodes. 

Minumum depth/height is the floor of log(n). 

Average insert, remove, find etc, is O(logN). 

Worst case for BST, insert, find, etc.
is O(n), and best case is O(log n). 

AVL trees: 

Just a balanced BST. The height is Log(N). 


Each rotation takes constant time. Which means it’s pretty much negligible in the O(n) notation. So O(height) is the insert time, so O(logN) is the insertion time. 

Find can be done in O(logN) time, same with insert and remove (lazy delete). 

Traversals, or “sorting the contents of the tree”, can be done in O(n*logn). 

Don’t really understand traversals. 


Now the keys are the “values” essentially. So you store keys in the table, and they each correspond to a bucket. And there’s some function that relates keys to buckets. 

Collisions are when you map different key to the same bucket. This can be resolved by using chaining, which is to let the first guy in the hash point to the next guy (closed addressing). Or using probing, which is where you specify a step size and you start with the bucket that you found is full and then you keep looking through the table for the first available bucket, each time stepping up by the step size (mod table size, so that you’ll loop around when you get to the end of the table) and once you find a bucket you put it in there. This is called open addressing. 

Widely used hash function is key mod T. where T is some integer. But choosing the right T is important. Good idea is have T be prime. 

You could convert ASCII characters to integers, and then hash by just doing key Mod T. 

O(1) for insert, and O(N) for search and remove. <— Worst case. Because every element would be mapped to one hash and you’d have to go through that entire list. 

Load factor is lambda and is number of elements divided by table size. Obviously if Lambda is greater than 1, we should rehash, which is just choosing a larger prime number. Make sure that the prime number is always bigger than the desired table size. 


For probing, you can do linear probing: which is just probing a constant amount. So looking at the current +2, then +4, then +6, etc. Or quadratic, which would be +1, +4, +9, etc. 

Have to use lazy delete for probing. 

Secondary clustering is having a bunch of keys attached to the same bin, and primary clustering is just when you have a large hash table, but all your keys are mapped to bins that are near each other. 

Usually like to have lambda be <= 1/2. So hash table be twice number of expected number of keys. 

Clusters increase the expected huber of probes needed to insert, find, remove. 

Double hashing is just hashing again WITHIN A SINGLE BUCKET. 

So Cuckoo hashing is like chaining hash tables, and because they have independent functions, you should be able to fit anything anywhere. So say you try to shove something in table 1 and there’s something already in there, then put it in the right spot in table 1 and move the existing element into table 2 which should be a different location. 

Sink vertex is only in directed graphs, and it’s where all of the vertices are connected to it, but it’s not connected to any. 

Topological sort: only defined where there are no cycles and the graph is directed (DAG, or directed aclyclic graph). For each (i,j) in the graph, you make it so that i comes before j in the list. So you could have (i,j) and (j,k) which would result in i,j,k. done. 

Indegree is the incoming number of edges. 

An algorithm for top sort is just start with the node that doesn’t depend on anything, and then do it, and then decrement the in degrees of all the nodes that it connects to, and then do it again until everything’s done.