How to Build a Versatile Traverse Function from Scratch
This article will break down the process of building a versatile traverse function from scratch. But first, we need to clarify what is it exactly.
What is the traverse function?
Here is one of the plainest explanations I could find;
A data structure contains elements, which contain data
Traversing a data structure means: "visiting" or "touching" the elements of the structure, and doing something with the data.
The keywords in this definition are visiting elements and data structure.
First, there is no "visiting" if there is nothing to visit (elements).
Second, "data structure" is a very broad term. In the most general terms, a data structure is a collection of data stringed together somehow, arranged in some logical order, where one data point is connected to another, and so on.
A data structure could be a dictionary, a linked list, an array of items, a tree of nodes, and the list goes on. These are common concepts in computer science, and they have been implemented in almost every programming language albeit they might go by different names.
Data structures are almost geometrical in nature. If you were to draw a representation on a whiteboard, they would look like strips, squares, triangles, circles, and trees too.
The simpler the shape, the easier it is to traverse it, and many programming languages have built-in functions for it.
The easiest shape would be "strips", like arrays. These simple data structures are trivial to traverse and the term iteration is typically used instead, which marks a sort of distinction in simplicity.
The kind of traversing we are interested in discussing in this article is the less trivial kind, that one used to visit nodes on a complex data structure.
Iteration
Now that we know what traversing means, we can start sketching what the function would look like.
const traverse = (target: unknown): void => {
/** Do something here **/
}
traverse({ ... })
There is no point in traversing unless we want to do something. What we want to do on each data point is not important, only that we are able to do so. What is needed then is a callback function to which we can delegate whatever task we want to run on each data point.
const traverse = (target: unknown, callback: Function): void => {
/**
* Do something here, eventually invoke callback().
**/
}
const callback = (data: unknown): void => {
/** Do something here with data point **/
}
traverse({ ... }, callback)
We will repeat the operation for each data point, which we call iteration.
Support Any Data Structure
The first problem with creating a versatile traverse function is that it’ll need to know how to navigate to each data point in a data structure.
We are using TypeScript for the code examples. So in JavaScript, the obvious place to start is being able to navigate to each item or key-value pair in an array or object.
But wait, we are missing out on other data structures? What about Map, WeakMap, and Set support? We can add code to iterate for those too.
Except that the traverse function may also run into class instances created by anyone. Sometimes data points don’t necessarily map to all properties on a class instance. But we don’t know what we don’t know.
To overcome this problem, we need to "inform" the traverse function of other data structures and give it a set of instructions to get references to each data point.
const registeredIterableClasses: IterableClassEntry[] = []
const registerIterableClass = (entry: IterableClassEntry): void => {
registeredIterableClasses.push(entry)
}
const traverse = (target: unknown, callback: Function): void => {
/**
* 1. If target is a non-iterable type, exit early.
* 2. If the target is iterable, access registeredIterableClasses
* and find the correct strategy to read the iterable’s references.
* 3. Invoke callback() on each data point traversed.
**/
}
const callback = (data: unknown): void => {
/** Do something here with data point **/
}
traverse({ ... }, callback)
Knowing how to access references to a data structure is a start. Remember, we want to make traverse() as versatile as possible, perhaps that callback should be able to change values or remove values, what if we want to clone the data?
We now realize, to give the necessary instructions to traverse() deal with a data structure we need to give it more.
interface IterableClassEntry {
classRef: any // Reference to class itself as id
instantiate: Function // use to create a new instance of the class
getKeys: Function // use to get a list of references to each data point
read: Function // use to read a value
write: Function // used to update a value
remove: Function // used to remove a reference to a value
}
Support Nesting
We now have addressed the how to traverse data structures. The next issue to tackle is that data structures may contain other data structures.
We can use recursion for that, so we call the same traverse function on each data point being traversed so that it in turn does the same for any data points in them as well and so on until it encounters no more iterable data structures.
const registeredIterableClasses: IterableClassEntry[] = []
const registerIterableClass = (entry: IterableClassEntry): void => {
registeredIterableClasses.push(entry)
}
const traverse = (target: unknown, callback: Function): void => {
/**
* 1. Immediately invoke callback(target).
* 2. If the target is iterable, access registeredIterableClasses
* and find the correct strategy to read the iterable’s references.
* 3. Invoke traverse() on each data point traversed.
**/
}
const callback = (data: unknown): void => {
/** Do something here with data point **/
}
traverse({ ... }, callback)
Dealing with Circular References
Last but not least, there are data structures that at some point reference themselves. It is a plausible scenario, you may find this on a "circular" - looking type structure like a graph.
If you were to encounter that and the algorithm does not know how to detect it, it will most likely cause an infinite loop or livelock and cause your program to run out of memory and freeze or terminate.
This topic could be another article on its own, and so I will write about it later.
Optimizations
A traverse function might be quite expensive. Sometimes you may only want to run this against a slice or "depth range" and or have an exit early strategy to avoid iterating unnecessarily on other data points once you have the outcome you need from the callback.
Final Result
I hope this quick guide has helped you understand the key details to implement your own traverse function.
If you can't be bothered creating your own, you are in luck. I’ve already created this production-ready traverse() function for anyone to use.
https://github.com/enio-ireland/enio/tree/develop/packages/data-ferret