Comonads: Because it wasn’t confusing enough already

On this blog, I’ve featured a four part series on monads. I figure I can give at least one post to comonads. To start off, I will review CS monads at a high level (as opposed to categorical monads), and then begin to introduce the defining features of a CS comonad. Finally, I will give an example.

We are using Haskell syntax today, because it’s very lightweight and convenient for this.

Monads are types that encapsulate a value, and provide two functions called bind and return. Return is the function that takes a value, puts it into the monad, and then returns the monad. If your monad is m, and your type is a, then the type of return is a -> m a. Bind is the function that allows you to work on the data in the monad; it runs a computation on the encapsulated value and returns a monadic value of the result. Its type is m a -> (a -> m b) -> m b. Not essential for a monad mathematically, but included in a CS monad, is the notion of fmap, with type signature (a -> b) -> f a -> f b, where f is a functor. The flattening behavior lives in a related function, join, with type m (m a) -> m a; bind is just fmap followed by join, which is why Scala calls its bind flatMap. In the original monad series, I gave the example in Scala of Option<Option<Int>> being equivalent to Option<Int>.
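To make return and bind concrete, here's a minimal sketch using Haskell's Maybe monad (safeDiv is a made-up helper; Just plays the role of return, and >>= is bind):

```haskell
-- safeDiv is a hypothetical helper: division that fails gracefully.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- return wraps a value; >>= (bind) unwraps it and feeds it onward.
example :: Maybe Int
example = return 100 >>= \x -> safeDiv x 5 >>= \y -> safeDiv y 2

main :: IO ()
main = print example  -- Just 10
```

If any step returns Nothing, the whole chain short-circuits to Nothing; that context handling is exactly what bind buys you.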

I have gone over the applications of this before, and you can see for yourself that there are many. Beyond this, the obvious (1) question becomes “So how do we do it backwards?”.

You may have guessed that this is what a comonad is. So if a monad is all about encapsulating context-sensitive values, then what does that mean for comonads? Take a moment to think about that. If you chose not to, it’d be a real shame if I were to include a long filler sentence to make you wait anyway. Tension is exciting. Comonads are a way to take new data out of a source irrespective of context (2).

A comonad has three defining functions: extract, extend, and duplicate. Extract is dual to return. If we have a comonad, which we usually denote as w because it’s an upside-down m, and that comonad holds a type a, then extract has type w a -> a. The symmetry is obvious. This is the easy function to understand; the other two are more involved.

Extend has type signature (w a -> b) -> w a -> w b. For those of you who read Haskell type signatures well, the behavior of this function is easily determined. Extend takes a function that extracts b‘s out of w a‘s and a w a, and then returns a w b. As comonads usually store a collection of values, this is a map-like operation. This is dual to bind.

I’ve drawn a diagram. The first diagram is for a monad. It represents bind. Given an m a from somewhere unspecified (I drew the arrow, as this m a is usually a function call), and a function that maps from a to m b, you map your m a to your m b.

In the second diagram, we have extend. I am using M for a comonad in this diagram because the symbols are arbitrary for the demonstration, and Paint is really bad for editing text. Extend takes a function that takes a b out of our comonadic a and a function that takes a c out of our comonadic b, and returns a function for taking a c out of a comonadic a. This parallels bind’s ability to chain together computations: you can plug one extraction into another just like you can plug one bind into another.


The last function of note is duplicate. This is dual (I keep saying analogous, though the mathematical term is dual) to join, the function of type m (m a) -> m a that flattens nested monads. Duplicate does the reverse: it nests contexts, and has type signature w a -> w (w a). The use of this is that in a functional program, a container has a focus. This focus is one element of the container; it’s what we call a context. This looks like it complicates things, and it does, but hear me out. What duplicate returns is a container of containers, each with a different focus. This is what lets extend work on all of the elements at once.
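Collecting the three functions, the interface looks like the typeclass below (a sketch modeled on the comonad package’s Control.Comonad; the Identity instance is just the smallest possible example):

```haskell
class Functor w => Comonad w where
  extract   :: w a -> a

  -- Duplicate and extend are interdefinable, so each gets a default.
  duplicate :: w a -> w (w a)
  duplicate = extend id

  extend    :: (w a -> b) -> w a -> w b
  extend f = fmap f . duplicate

-- The smallest instance: a one-element container whose focus is its only element.
newtype Identity a = Identity a

instance Functor Identity where
  fmap f (Identity a) = Identity (f a)

instance Comonad Identity where
  extract (Identity a) = a
  duplicate i = Identity i

main :: IO ()
main = print (extract (extend (\w -> extract w + 1) (Identity (41 :: Int))))  -- 42
```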

Now, I can’t just dump this on you without giving an example. Believe it or not, streams are comonads. Intuitively, this makes sense, as streams operate on codata. Before I go further, let’s define a stream. In Haskell, data Stream a = Cons a (Stream a). This is pretty much a list without a base case. Extract returns the current element; duplicate, on a Cons of s and stream, returns Cons (Cons s stream) (duplicate stream); and extending a function f over a stream is fmapping f over the duplicated stream.
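Here’s that stream comonad written out in full as a self-contained sketch (nats and takeS are helpers I’ve added just for demonstration):

```haskell
data Stream a = Cons a (Stream a)

instance Functor Stream where
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)

-- extract returns the current element (the focus).
extract :: Stream a -> a
extract (Cons x _) = x

-- duplicate returns the stream of all suffixes: each inner stream
-- is the original with its focus shifted one element further along.
duplicate :: Stream a -> Stream (Stream a)
duplicate s@(Cons _ xs) = Cons s (duplicate xs)

-- extend maps a whole-stream observation over every focus.
extend :: (Stream a -> b) -> Stream a -> Stream b
extend f = fmap f . duplicate

-- Helpers for demonstration only (not part of the comonad).
nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))

takeS :: Int -> Stream a -> [a]
takeS 0 _           = []
takeS n (Cons x xs) = x : takeS (n - 1) xs

main :: IO ()
main = print (takeS 5 (extend (\(Cons x (Cons y _)) -> x + y) nats))  -- [1,3,5,7,9]
```

The extend call computes a sliding sum of each element with its successor — exactly the kind of neighborhood-aware computation comonads are for.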

(1) This had never occurred to me, even though I had been doing it for a while. Streams are comonads, as I will introduce later.

(2) “Irrespective of context” is a confusing statement, so I’ll give some examples. Representing time sensitive data asynchronously is a use of comonads. Representing read only memory in a mutable environment is a use of comonads.

Posted in CS

Don’t Go To Sleep With Broken Code

I have a rule that’s served me well all these years. Actually, I have many, but there’s one in particular we’re going to discuss today. You already know it because it’s the title of this post. Actually, I hate when people do a reveal of a topic at the start of a video but the topic was in the title the whole time. Oh well; I think it’s less egregious in text. Sticking with it. Anyway, we’re discussing software development today.

A fair amount (1) of the time I spend writing code is at night, in my bed. (2) As such, there is an even greater temptation than usual to just put everything down and go to sleep when a problem pops up. It’s nice in the moment; all of the problems go away and I get a good night’s rest. The issue is that later in the day, I don’t want to go back to writing the code.

Working on a large program is like making an elaborate cake. If you just leave it out while you’re working on it without moving it to a freezer or putting foil over it, it’ll rot. You have to take the time to fix the instabilities and put everything back before you leave. This is actually a great metaphor; I’m amazed I haven’t heard this yet. (3)

I’ve definitely lost sleep over this rule, but I’ve finished many more things because of it. It seems like something that requires a lot of discipline to follow, but unlike many other “don’t do this” kind of rules, failing to follow it only reinforces its need. If I’m getting tired and my code makes no sense, I just think about all of the things I never finished, and I’m kept awake by a crushing sense of guilt and inadequacy. Just kidding. Sort of.

(1) By ‘fair amount’, it is meant ‘very large portion’

(2) Coupled with the fact that I’m not into high end gaming, this is why I don’t own a desktop computer.

(3) Debugging a large program is like pulling ants out of a large cake; it takes a long time and your hand is covered in junk food residue by the end.

Posted in CS

A Spreadsheet Argument

What if I told you that you’ve been writing functional programs and didn’t even know it? It sounds crazy, but it’s true. In this post, I will present an argument that spreadsheets are functional programs.

When I say spreadsheet, I mean Google Sheets without the scripting interface. Though useful, the scripting interface allows for impure code.

First off, what makes a program functional? The metrics I will use here are that all values are immutable, all functions return a value, and functions are first class citizens.

In a spreadsheet, cells can only be written to when you turn them into either a function or a value. This is exactly analogous to defining constants and functions when writing the code of a program. Check that one off the list.

The notion of returning a value in this context is very simple and intuitive. The returned value is just the value of the cell. This is actually a really interesting feature for a functional program, as it demonstrates the exact similarity of function definition and value in real time.

Lastly, functions are first class citizens. This one is a slight bit of a stretch, but using ranges along with indexing and matching functions, cells (and therefore functions) are first class citizens in the sense that they can be used in the same way values can.

Now, if that doesn’t convince you, here’s a spreadsheet I made/program I wrote that calculates a factorial recursively:

Normally, spreadsheets don’t allow for recursive function calls. You get a circular reference error; this is because the way the call stack works in a spreadsheet is less complex than how it works in a full language runtime. This doesn’t mean that we should demand that GHC be included in every spreadsheet; all it means is that we lose a bit of expressiveness and need to do recursion the pain-in-the-ass way.

The normal way to do recursion is to define a function that calls itself. What this does is create a stack of function calls like so: (Factorial 10 as an example)

10 * factorial(9)

10 * 9 * factorial(8)

10 * 9 * 8 * factorial(7)

The first function call, factorial 10, is evaluated last when the above code finishes, and the last function call, factorial 1, is evaluated first. This is first in last out, the same as a stack.

When recursive function calls aren’t available, you have to define the stack yourself and push and pop function calls. This is what the spreadsheet does.
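The same idea can be sketched outside of a spreadsheet. Here’s factorial in Haskell with an explicit stack instead of a chain of recursive calls: push all the pending arguments, then pop and multiply (the names are my own; this is just an illustration of the technique):

```haskell
-- Factorial with an explicit stack, assuming n >= 1.
factorial :: Integer -> Integer
factorial n = unwind (push n [])
  where
    -- Push phase: record every pending "call" without evaluating it.
    push 1 stack = 1 : stack
    push k stack = push (k - 1) (k : stack)
    -- Pop phase: consume the stack, multiplying as we go; the first
    -- argument pushed (n) is the last one consumed, just like a call stack.
    unwind = foldl (*) 1

main :: IO ()
main = print (factorial 10)  -- 3628800
```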

Now that a means of traversing a data structure is defined, as well as function/value behaviour, it is quite evident that spreadsheets are functional programs.

Why is this useful or good to know? I have no clue.


Posted in CS

Life Update, For All Of Us.

Now that I have a cold and am spending several hours lying in bed and doing nothing, I have time to make a blog post. Unfortunately, there hasn’t been much time to be thinking and writing about theoretical computer science since I moved in to University (One would hope for the opposite. Eventually…). Nevertheless, interesting things have happened, for example, I’ve joined several clubs and organizations (Including a juggling club!). The dorm is actually pretty good; the bed is comfortable, and I have a good roommate. I’ve stopped worrying about how dog-accessible my food is, and my clothes have never been cleaner. The problem with Maddie not being here is that now I can’t lie to myself about how much of the hair everywhere is mine.

Oh, so you thought you could get away with reading a blog post that didn’t have anything technical in it? Never again; we are going super low level today. Like, hardware level. (1)

Something of interest that I’ve learned about recently is hardware acceleration of artificial intelligence, specifically neural networks. I’ve done some work with neural nets as of late, specifically neural style transfer. Every image takes me an hour to run because, being the massive scrub I am, I’m running it off CPU. Listen, when you have Linux running on a virtual machine, getting the GPU to work is non-trivial. Anyway, with the release of the iPhone X and its “NPU” hardware, I thought I’d write for a bit about the different technologies used in this field.

To start off, I should define what hardware acceleration is. Most programs, if not all, on your computer are processed on your CPU. This makes sense, as it is a central processing unit. A CPU can do anything (2); however, it trades some performance for this flexibility. With the advent of cool video games with super awesome graphics, the demand was created for higher performance in doing graphics computation. This kind of work was more suited to high precision parallel computation (meaning with a ton of cores) than what a CPU could offer. To remedy this, a new piece of hardware was brought into the mainstream: the graphics card. A graphics card has a shitload of cores that do high precision arithmetic (3). While the CPU processes your video game and comes across a graphics computation, it sends that computation to the graphics card and continues its work elsewhere. This works extremely well. In fact, it works so well, a new technology for computation developed out of it.

Someone once got the bright idea of, “Hey, what if we took a graphics card and made it do things that are not graphics?” While it sounds stupid on the surface, if you remember what a GPU does (parallel high precision arithmetic), this is a great idea. The concept of General Purpose GPU, or GPGPU, lets you write “compute shaders”, which are regular computation programs put into graphics form for a GPU to work on. GPGPU is, in many cases, much faster than CPU for a task. Cryptocurrency mining, like Bitcoin or Ethereum, is an application of this. In fact, that’s why GPU prices get hiked; people run out and buy them en masse when a new mining opportunity arises. As I hinted at in an earlier paragraph, GPGPU can speed up neural nets. If you know what a neural net is (mostly big matrix multiplications), it makes sense how parallel arithmetic can help.

“So what’s the next step? Can you get better than GPGPU for AI?” Why yes you can! Thanks for asking conveniently placed hypothetical reader! It turns out, you don’t really need super high precision arithmetic for AI. You’re kind of wasting time beyond a certain number of decimal places. You can use something called a Field-Programmable Gate Array, or FPGA, to get a card that can do parallel low precision arithmetic for testing purposes, but for an actual enterprise-class (4) device, you’ll want an Application Specific Integrated Circuit, or ASIC. This is sort of an umbrella term for a hardware acceleration device, but we really don’t have a good name for this kind of hardware yet (5). This is what’s in the iPhone; a specially made piece of hardware tuned exactly to the kind of computation required for neural nets. I can’t really tell you anything about it outside of parallel low precision arithmetic because of Apple’s business model with the proprietary everything, but trust me, this is extremely cool.

Looking to the future, if ASIC AI acceleration becomes mainstream, as I suspect it will due to the new iPhone, AI will take a much bigger role in our lives. We’ve seen throughout history how technologies catch on when they get cheap. The price with AI is not just money, but also time, and hardware acceleration will solve that. The discussion is getting too philosophical, so I’ll end it here. Perhaps it’ll continue in a future post.

  1. Ten bucks says everyone on Facebook stops reading here.
  2. Within mathematical and physical limitations. New project: halting accelerator card; HPU.
  3. This is a gross oversimplification of the different processes that GPUs do, but it’s good enough for our discussion here. Perhaps one day I’ll write a post on Anatomy of a GPU.
  4. yes
  5. Unless Apple’s name, the NPU (Neural Processing Unit), catches on.
Posted in CS

What’s your type?

I talk a whole lot about type systems and their importance to writing clean and effective code. In this post, I will talk about what exactly a type is, and why these definitions can be useful.

Let’s start off like anything programming related and google it. The first definition is for the noun: “a category of people or things having common characteristics.” Aside from the use of the word category, which means something far different in the context of this blog, this definition is rather intuitive and nice. The problem is that it isn’t very formalized, and is therefore susceptible to ambiguity.

We could go for the extreme mathematics approach, and state the equivalence of types and propositions. This means that a type A is the proposition stating “there exists a term that has type A”, and the proposition A is the type (class, in other and less self-referential words) of all proofs of A. This is incredibly formal and precise; however, one would have to be absolutely mad* to use this to define an integer.

The definition that I like, as a nice compromise between the too abstract and too simplistic, is the definition of a type as a space of values. This proves to be incredibly useful in performing algebra on types, and is relatively useful in programming compared to other definitions. With this definition, a type T is the space of all possible values that a term of type T can be bound to. The Boolean type is the space of True and False. The type of 16-bit integers is the space of all integers between -32,768 and 32,767. The type of void is the empty set, and the unit type is a space with one point.

This can be extended to algebraic type operators. A tuple of type A and B has a space with size A * B, and an either type of A and B has a space with a size of A + B. Functions get even more interesting, representing exponentiation, but it’s slightly more involved.

As promised, this is convenient for algebraic type theory. What is the behavior of the type that is (int, Either<Bool,Unit>)? Taking int to be 16 bits, it is the type with a space that contains 65,536 * (2 + 1) = 196,608 distinct points, of course.
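These counts can be checked mechanically. A small Haskell sketch (treating int as 16 bits, with the sizes written out by hand):

```haskell
-- Sizes of the basic spaces.
boolSize, unitSize, int16Size :: Integer
boolSize  = 2
unitSize  = 1
int16Size = 2 ^ (16 :: Integer)  -- 65,536 values in a 16-bit int

-- Tuples multiply and Either adds: (Int16, Either Bool Unit).
pairSize :: Integer
pairSize = int16Size * (boolSize + unitSize)

main :: IO ()
main = print pairSize  -- 196608
```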

Here’s a well written article that goes through an example of the uses of this formalization for understanding recursive types:

I can highly recommend giving it a look.

*Looking at you Russell and Whitehead

Posted in CS

An Argument or Two

This post is about partial applications of multivariate functions. I’m going to begin with some vocabulary, talk about the definitions at hand and their uses, and then clear up a big misconception.

Arity is the number of arguments that a function takes. A unary function takes one argument, a binary function takes two, a ternary function takes three, and so on. We can say that a function that takes n arguments is n-ary. Other terms for unary and binary are monadic and dyadic, but I do not like these terms because a monad is something much different and more complex.

This is the technically correct term for these functions, but when I say multivariate, people know what I mean, and that is what matters.

The next definition isn’t anything new, but we need to talk about what goes on when I say a = b in any language.

Binding is when you give something (a value or an expression) a name. The behavior of this varies from language to language. In C++, if I set a = 3, b = a, and then a = 4, b will still equal 3. I assigned b to the value of a, not the abstract concept of what a stands for. This is pass by value. If b were instead a pointer to a, then b would represent what a represents, not what a represented at that moment. This is pass by reference. In a functional program, which is what we care about, this is not a problem because variables cannot be assigned twice. Assume that all binding is pass by reference for this post.


Partial application is when I have a multivariate function, say f(w,x,y,z). To apply this function, I call it with values in the variable names, f(1,2,3,4) for example. To partially apply this function, I call f without all of its arguments, like f(1,2,y,z). What this example does is, instead of returning a value, return a 2-ary function. If this function is g, I can call g(someValue,someOtherValue), and get the result of f(1,2,someValue,someOtherValue). I like the Scala implementation of this the best, where you put a _ in for the arguments. As another example, you could create the exponential function from the power function by calling pow(e,_).
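In Haskell, partial application is the default: apply a function to fewer arguments than it takes and you simply get a function back. A sketch mirroring the pow(e, _) example (power and add4 are made-up names):

```haskell
power :: Double -> Double -> Double
power b x = b ** x

-- Partially apply the base to get the exponential function.
expFn :: Double -> Double
expFn = power (exp 1)

add4 :: Int -> Int -> Int -> Int -> Int
add4 w x y z = w + x + y + z

-- Like f(1,2,y,z): fixing two arguments leaves a 2-ary function.
g :: Int -> Int -> Int
g = add4 1 2

main :: IO ()
main = print (g 3 4)  -- 10
```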

A related subject that people often confuse with partial application is something called currying. Currying is when you break a multivariate function into a sequence of unary functions. Haskell’s type signatures help to understand this. Haskell has, in my opinion, the gold standard of type systems. Let’s say I have a function that takes three arguments: a list of type a, a function that takes a value of type a and returns a boolean (this is called a predicate), and an integer, and then it returns a string. Wonder what this function does all you want; it doesn’t actually matter for our example. The type signature of this function would be [a] -> (a -> Bool) -> Int -> String. A way to look at this is that this is a function that takes a list of type a and returns a function that takes a predicate and an int to return a string. That function has type signature (a -> Bool) -> Int -> String, which is really just a function that takes a predicate and returns a function with type signature Int -> String. You get the idea.
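Here’s that exact signature in code; the body is made up, since only the type matters for the point. Each definition below just peels one argument off the front:

```haskell
-- [a] -> (a -> Bool) -> Int -> String, exactly as in the text.
describe :: [a] -> (a -> Bool) -> Int -> String
describe xs p n = show (length (filter p xs) + n)

-- Apply the list: a predicate-and-int-to-string function remains.
step1 :: (Int -> Bool) -> Int -> String
step1 = describe [1, 2, 3, 4 :: Int]

-- Apply the predicate: an Int -> String function remains.
step2 :: Int -> String
step2 = step1 even

main :: IO ()
main = putStrLn (step2 10)  -- prints 12
```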

Fun fact, currying is named after the same guy that Haskell is named after, the American mathematician Haskell Curry. His middle name, Brooks, is also used in the programming language Brook.

Posted in CS

I Am Not An Engineer

It’s come time to admit that I am not an engineer. This seems obvious to some; they may say, “Well of course, you’re going into computer science, not computer engineering.”, but I think it goes deeper than that. All throughout high school, I used to love discussing and making jokes about engineering with my friends on the robotics team. Constant jests about zip ties and duct tape, and memes about technical systems we’d built before (“Hey guys, how about we put dual drive on it?”), flew around frequently. It made sense; what we were doing was engineering. My friends are becoming engineers. The two friends who did the same thing that I did on the team, the programmers, are both becoming engineers: one a chemical engineer, and the other a robotics engineer. I’m doing something fundamentally different.

Engineering, to me, is finding a solution to a problem. Pretty cool, right? Damn straight. I love solving problems. Pretty much every post on this blog is about solving problems. Well, that’s not entirely true, and that’s what this post is about.

I’ve realized that I don’t give one single fuck as to what exactly the solution to a problem is.

This is where the English language really fails. The word “solution” encompasses too wide a concept for my purposes. The engineering solution to a problem just has to work. It makes sense; what more does a solution need to do than work? Isn’t good enough just that by definition? The solutions that I like are elegant and perfect in every way; entirely analytic. I thrive on abstraction and generalization, because these do more than allow one to solve a problem: they give insight into how that problem is solved, and what the perfect solution is. I have no answers to the questions I posed five sentences ago, and I don’t care to find any. I don’t feel like I need a reason for this; an elegant solution to me has its own value.

This is far from saying that I don’t care about engineering. I love engineering and engineers; I love the culture of curiosity and the determination to overcome obstacles. I love the sound of the word and everything I associate it with, from the tiny clicks and whirrs of precision machinery in a busy workshop to the loud, unconstrained roar of a giant engine let loose on open asphalt. All I’m saying is that no matter how cool I find it, it’s not what describes me.

On March 14th in middle and high school, we celebrated pi day. We would learn cool facts about pi, and have a contest to see who could memorize and recite the most digits of pi. I never won any of those contests, nor did I care to; what use does memorizing digits of pi have? What insight does it give you? You only need 39 digits of pi to calculate the circumference of the observable universe to within a hydrogen atom, so even for engineering solutions, how does that help? The digital representation is entirely arbitrary due to its dependence on base 10. This is all we did on pi day: wasting time and energy on something entirely useless, and never something like an infinite series or continued fraction representation that gives insight into the nature of this fundamental constant. This infuriated me, and continues to do so.

What I look for is what’s true. You can get all philosophical and conclude “Nothing is true! Looks like we’re done here.”, but I think that that’s no fun at all and tells you literally nothing about anything. I’m a scientist by nature.

My mother asked me to go grocery shopping with her recently. She grabbed a cart and started pushing it around. I noticed that when I bumped into her, I got shocked. My immediate reaction was, “Huh, I wonder if that will happen again?” and proceeded to tap her arm, and got shocked. I got the idea that the wheel may be collecting charge due to a faulty bearing, and explained this to her. I tested the hypothesis in various situations throughout the shopping trip, such as different cart speeds, and different parts of the supermarket, and it held up. The engineering solution to that situation, that my mother definitely would have preferred early on, is to realize after the second shock that it’s probably due to touching her and I should just follow at a safe distance and not get shocked. I couldn’t accept that, because then I wouldn’t know why it was happening. After I loaded the groceries into the car, I had to put the cart back. On my walk there, a woman asked if I was done with the cart, and if she could have it. I gave it to her and said, “One of the wheels is slightly broken and is gathering charge; you’ll shock people you touch when pushing it.” She got this grin on her face upon hearing that, and it made my goddamn day.

There isn’t always a payoff to insight, or a detriment to a lack thereof, like in the pi day and grocery shopping stories, but the times there are make it worth it to always look for it.

If I had just decided to stop shocking myself by not bumping into my mom, it would have worked. My guess from two data points would have been accurate enough to solve the problem of “I am getting shocked”. My entire point with this post is that I don’t care, and wouldn’t do that, no matter how cool it is that it’s possible to be accurate like that. Of course this is only one example, and a rather mundane one at that, but I don’t have a personal anecdote that better shows my own desire to solve a problem in a way that reveals why the problem exists and what it implies.

This is why I love the highly abstract. This is why I talk about monads and recursion and functional programming, even if it’s slower than conventional methods. I don’t care about the speed; speed tells me nothing about the problem. I’m perfectly okay with my two-line Java program that computes the Fibonacci sequence and is sixteen times as slow as a different method, because it lets me learn new things about what can be done with stream processing.

These things are what describe me. Horribly inefficient programs with weird algorithms. Shocking yourself repeatedly on purpose in the middle of a Wegman’s. Going up to the front of the class on Pi Day during a digit memorizing competition, saying 3, and walking back.

I am a scientist.

A Neat Trick for Stream Processing

I had seen an example of streams in Scala with a two line program that prints out the Fibonacci sequence out to some bound. I wondered if I could translate this to Java while still using two lines, so I set out to do that. In doing so, I discovered something very interesting and useful. Here is the Scala program; look at it and see any glaring problems that would get in the way of a Java translation:

lazy val fibs : Stream[BigInt] = BigInt(0) #:: BigInt(1) #:: => n._1 + n._2)
fibs take 100 foreach println


The first problem is that seeding the stream, starting off with 0 and 1, is not as easy in Java. I hesitate to say impossible, because it might be possible, but I prefer my solution to trying to figure out how to do it. The next problem is that Java doesn’t like recursion in an object initialization, so I can’t define fibs in terms of fibs. The last problem is that the Java Stream interface doesn’t have nice methods like zip and tail.

I needed an iterative method to generate Fibonacci numbers. The problem here is that the Fibonacci sequence relies on past elements, which seems inconducive to stream processing, or as I have said before, “not very streamy”. Before you say “Ackchyually, Josh, you can solve the recurrence relation and get an explicit formula for the nth Fibonacci number, so this is trivial”, I counter with the fact that that is no fun for my self-imposed golf challenge, and that I wouldn’t have discovered a cool thing if I had done that.

I have now referenced this supposed “cool thing” twice without any hint of what it is beyond being used in an iterative Fibonacci generator, so I’ll get to it. Here’s my code:

Stream<BigInteger[]> fibs = Stream.iterate(new BigInteger[]{BigInteger.ONE,BigInteger.ONE}, i -> new BigInteger[]{i[1], i[0].add(i[1])}).limit(100);
fibs.forEach(i -> System.out.println(i[0]));

I used an array. When you process a stream of integers, you care about two things: the current value, and the successor function. For a normal integer stream, the current value is just a number, and the successor function is generally just x + 1. What I did was to make a stream of arrays of integers (Yes, I did use BigIntegers, but that’s a trivial difference for my result) of size two. The current value was the last two Fibonacci numbers (I started it off with one and one), and the successor function was to move the values down one (to rotate the array, in technical terms) and then put my actual, number-theoretical successor function in the last index of the array. It’s a bit hard to see with arrays of size two, but let’s say I have an array of size n.

for(int i = 0; i < array.length - 1; i++) {
    array[i] = array[i+1];
}
array[array.length - 1] = someSuccessorFunction();

By processing a stream like this, I can handle n elements at a time instead of just 1! This opens up many new possibilities for stream based algorithms, as well as for other problem solving approaches that can usually only handle one element at a time, such as recursive functions and list iterators.
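The same trick reads nicely in Haskell, where the stream element can be a pair (a window of size two) and the successor function rotates it:

```haskell
-- Each element of the stream is the window of the last two values;
-- the successor rotates the window and appends the next number.
fibPairs :: [(Integer, Integer)]
fibPairs = iterate (\(a, b) -> (b, a + b)) (1, 1)

-- Project the first slot of each window to recover the sequence.
fibs :: [Integer]
fibs = map fst fibPairs

main :: IO ()
main = print (take 10 fibs)  -- [1,1,2,3,5,8,13,21,34,55]
```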

Now, of course I had to do a speed test between this and the explicit formula. Here’s the code I used to do so:

static final double PHI = (1.0 + Math.sqrt(5.0))/2.0;

public static void main(String[] args) {

    long[] time1 = new long[10];
    long[] time2 = new long[10];

    for(int t = 0; t < 10; t++) {

        long time = System.nanoTime();
        Stream<BigInteger[]> fibs1 = Stream.iterate(new BigInteger[]{BigInteger.ONE, BigInteger.ONE}, i -> new BigInteger[]{i[1], i[0].add(i[1])}).limit(10);
        fibs1.forEach(i -> System.out.println(i[0])); //Used to force value computation
        time1[t] = System.nanoTime() - time;

        time = System.nanoTime();
        Stream<Integer> fibs2 = Stream.iterate(1, i -> i + 1).map(n -> (int) (Math.round((Math.pow(PHI, n) - Math.pow(-PHI, -n)) / Math.sqrt(5)))).limit(10);
        fibs2.forEach(i -> System.out.println(i)); //Used to force value computation
        time2[t] = System.nanoTime() - time;
    }

    long timeAvg1 = 0;
    long timeAvg2 = 0;
    for(int i = 0; i < 10; i++) {
        System.out.println("Stream 1: " + time1[i]);
        timeAvg1 += time1[i];
        System.out.println("Stream 2: " + time2[i]);
        timeAvg2 += time2[i];
    }
    System.out.println("Stream 1 Average: " + timeAvg1/10);
    System.out.println("Stream 2 Average: " + timeAvg2/10);
}

Limits of ten were set as the explicit formula had overflow errors and couldn’t use BigIntegers.  I got a result of:

Stream 1 Average: 4176485
Stream 2 Average: 262163

This isn’t very surprising. My method is about 16 times slower than the explicit formula, but it can handle arbitrarily large input size, and is much cooler with far reaching implications for problem solving.

Posted in CS

You Don’t Even Compare

Ω(n log n)

This statement represents the run time of any comparison based sort. A comparison sort is an algorithm that sorts a list based on some operation that compares two elements and tells what order they go in. Examples of this are > and < on numbers. The big omega is like the big O, except that it represents a lower bound. Comparison based sorts can be worse than n log n (Known as linearithmic time, and also loglinear time), but that’s the best they can do. Popular algorithms like quicksort and mergesort, possibly the focus of a future post, run in linearithmic time. This is sort of depressing to think about; you can’t do any better. Or can you?

I implore the reader to pause here dramatically. I can’t really make you pause, I’m just words. The best I can do is make you read this paragraph. Get rekt sucker.

Yes, you can do better! It requires some specific circumstances though; there is a tradeoff between speed and generality. I’m going to talk about my favorite non-comparison based sort, and mention a few others.

Counting Sort is my favorite one. It is best described by General Chuck Yeager’s description of a good airplane engine, “Simple, few parts, easy to maintain, very strong”. To describe the algorithm, I will give an example.

Here is an array: [1,5,2,3,5,3,2,5,6,3,2,1,4,6,7,3,2,2,4,7,3,2]. It is not sorted. Well, maybe it is, but that is one weird definition of sorted you have. Let’s use counting sort.

The basic conditions for counting sort are as follows: (1) you are sorting integers, (2) you know the minimum value and the maximum value, and (3), though this one is more of a recommendation, there are more numbers to be sorted than the range they cover (max – min).

Step one, make an array of size (max – min + 1), so there is one slot for each possible value, and initialize every index to zero.

Step two, iterate through all the things that need to be sorted. For every number, subtract min from it and increment that index in the array. In our example, every occurrence of a 1 would increment index 0, and every occurrence of a 3 would increment index 2. Our array would look like [2,6,5,2,3,2,2]

Step three, iterate through your array and for each value, loop that many times and output min + the index. We would output [1,1,2,2,2,2,2,2,3,3,3,3,3,4,4,5,5,5,6,6,7,7]

Done. Linear time sort.
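The three steps above can be sketched in Java. This is a minimal illustration, not code from this post; the method name and signature are my own.

```java
import java.util.Arrays;

public class CountingSort {
    // Counting sort: assumes every value lies in [min, max].
    static int[] countingSort(int[] input, int min, int max) {
        int[] counts = new int[max - min + 1]; // step one: one slot per possible value
        for (int x : input) {
            counts[x - min]++;                 // step two: tally each occurrence
        }
        int[] sorted = new int[input.length];
        int out = 0;
        for (int i = 0; i < counts.length; i++) {
            for (int c = 0; c < counts[i]; c++) {
                sorted[out++] = min + i;       // step three: emit each value count times
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int[] a = {1,5,2,3,5,3,2,5,6,3,2,1,4,6,7,3,2,2,4,7,3,2};
        // prints [1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7]
        System.out.println(Arrays.toString(countingSort(a, 1, 7)));
    }
}
```

Note that we never compare two elements of the input to each other, which is exactly how this sidesteps the Ω(n log n) bound.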

There are two others of note; however, they both involve a step “Sort this collection”, so they’re not exactly standalone* algorithms. They are bucket sort and radix sort. Bucket sort involves defining any number of subsets, or buckets, covering the range of the collection to be sorted, putting the elements into those buckets, sorting the buckets (usually more efficient), and then looping over the buckets and outputting them. Fun fact: with two buckets you essentially have quicksort.
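Here is one way bucket sort could look in Java. The bucket count, the range handling, and the use of Collections.sort inside each bucket are my own choices for illustration, not from this post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BucketSort {
    // Bucket sort sketch: spread values in [min, max] across k buckets,
    // sort each bucket, then concatenate them in order.
    static List<Integer> bucketSort(List<Integer> input, int min, int max, int k) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < k; i++) buckets.add(new ArrayList<>());
        double width = (double) (max - min + 1) / k; // range covered by each bucket
        for (int x : input) {
            buckets.get((int) ((x - min) / width)).add(x); // pick a bucket by value range
        }
        List<Integer> out = new ArrayList<>();
        for (List<Integer> b : buckets) {
            Collections.sort(b); // sort within each bucket
            out.addAll(b);       // buckets are already ordered relative to each other
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [3, 7, 11, 29, 35, 42]
        System.out.println(bucketSort(List.of(29, 3, 11, 42, 7, 35), 0, 49, 5));
    }
}
```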

Radix sort is for sorting sequences, like strings. It employs counting sort very well as a subroutine. There are two versions, most significant digit radix sort (MSD) and least significant digit radix sort (LSD). These two variants work either backwards (LSD) or forwards (MSD) along the sequence, dropping elements into buckets of known order (alphabetical, numerical) one position at a time. If we have [170, 45, 75, 90, 802, 2, 24, 66], and then we do LSD, no not like that, we first end up with [[170, 90], [802, 2], [24], [45, 75], [66]], where the numbers are sorted by the smallest digit. This nested context is flattened, but I showed the grouping to help you understand. Next we sort by the next least significant digit, this time the tens place, and end up with [[802, 2], [24], [45], [66], [170, 75], [90]]. Once again, groupings are flattened. Finally, we sort by the next least significant digit, the hundreds place, and end up with [[2,24,45,66,75,90],[170],[802]]. This is flattened, and we’re sorted! Mathematically, this still works out to linearithmic time (the number of digits grows like the logarithm of the values), but in practice, at non-asymptotic input (things aren’t infinite), it is faster.
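The LSD walkthrough above can be sketched in Java. This is my own illustration: each pass is a stable counting sort on one decimal digit, which is exactly the bucket-then-flatten step described, and the names and structure are not from this post.

```java
import java.util.Arrays;

public class RadixSort {
    // LSD radix sort for non-negative ints: repeatedly bucket by one
    // decimal digit (ones, tens, hundreds, ...) with a stable counting sort.
    static int[] radixSort(int[] input) {
        int max = 0;
        for (int x : input) max = Math.max(max, x);
        int[] a = Arrays.copyOf(input, input.length);
        for (int exp = 1; max / exp > 0; exp *= 10) {
            a = countingSortByDigit(a, exp); // one "bucket and flatten" pass
        }
        return a;
    }

    static int[] countingSortByDigit(int[] a, int exp) {
        int[] counts = new int[10];                     // ten digit buckets
        for (int x : a) counts[(x / exp) % 10]++;
        // prefix sums turn counts into each bucket's end position
        for (int i = 1; i < 10; i++) counts[i] += counts[i - 1];
        int[] out = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--) {       // walk backwards for stability
            out[--counts[(a[i] / exp) % 10]] = a[i];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 2, 24, 66};
        // prints [2, 24, 45, 66, 75, 90, 170, 802]
        System.out.println(Arrays.toString(radixSort(a)));
    }
}
```

Stability is the whole trick: the ones-place ordering survives the tens-place pass for numbers with equal tens digits, and so on.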

*Technically you can bucket sort the buckets recursively until you bottom out at one item, but that is slow. Insertion sort is a popular choice for this.

Lists and Arrays

Lists and arrays are two of the most common and useful data structures. Today I’ll talk briefly about their definitions and properties. Let’s get right to it.

We’ll start with arrays because the definition is easier to understand. This won’t be mathematical, because at a high level of abstraction lists and arrays don’t really differ that much: they both just store elements in a sequence. An array encapsulates a type, maybe integers, maybe strings, anything. Let’s call this type A. The array is a direct sequence in memory of elements of type A next to each other. The number of these elements is the size of the array.

Let’s say we have an array of integers of size 6 containing [1,43,5,12,62,13]. It might look like this* in memory:

[array.png: the six integers laid out side by side in contiguous memory]

*a massive simplification. I might do a post on what memory actually looks like in the future

Notice how there is no random crap between the numbers. Each successive index, meaning place, in the array refers to a successive element in memory. The index is how to refer to something in the array. Indices, the plural, start at zero (meaning the first element has index zero) because they represent offset in memory from the starting position. In our array, 1 has zero offset from the start, 43 is one memory space away from the first element and therefore has index one, 5 has index two, and so on.

A list is similar in the sense that it encapsulates a type and stores elements. Lists are of mathematical interest actually, but today we’re dealing with the machine level. If our list contains the same data as our array, [1,43,5,12,62,13], it could look like this:

[list.png: the same six integers scattered around memory, each with a pointer to the next]

There are three things of note. The first is my excellent skill at scribbling. I’ve had years of practice, so don’t feel bad if your scribbling isn’t this good. The second thing is that the data are scattered around, and the third thing is that there are extra numbers next to them, as well as that black thing. I know this looks horrible, but we’ll get to the advantages and disadvantages of this structure later.

The numbers next to the actual data are called pointers. They refer to a location in memory. I’ve simplified them here, so that going left to right and top to bottom, memory starts at 0 and ends at 47. As always, we know the location of the first element, but not necessarily the nth element. No matter! If we need to find, say, the fifth element, we simply follow the pointer from each element to the next until we’ve done this four times. The pointer on the last element is the null pointer. It points to nothing, and is how you know you’ve reached the end (or you could just count). Because each element points to the next, this structure is also called a linked list. If you include a pointer to the previous element with each piece of data, then you have a doubly linked list that allows you to go backwards along it as well as forwards.
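That pointer-following can be sketched in Java, with object references standing in for the raw memory addresses in the drawing. The Node class and get method are my own names for illustration.

```java
public class LinkedListDemo {
    // A minimal singly linked list node: one value plus a pointer
    // (reference) to the next node; null marks the end of the list.
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Follow the trail of pointers `index` times to reach an element.
    static int get(Node head, int index) {
        Node cur = head;
        for (int i = 0; i < index; i++) cur = cur.next;
        return cur.value;
    }

    public static void main(String[] args) {
        // Build [1, 43, 5, 12, 62, 13] by prepending back to front.
        Node head = null;
        int[] data = {13, 62, 12, 5, 43, 1};
        for (int x : data) head = new Node(x, head);
        System.out.println(get(head, 4)); // fifth element: prints 62
    }
}
```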

Now that these two data structures are defined, let’s talk about their differences through two fundamental operations:

-Reading a value at a specific index

-Writing a value at a specific index

To read a value at a specific index in an array is easy. For example, if your whole array starts at memory location n, and you want the element at index 17, then just read whatever value is at n+17. The amount of time it takes to read any element does not depend on the size of the array, so we say that it takes constant time, or O(1), pronounced “Big O of one”. This property of arrays is called random access.

To read a value at a specific index in a list is a bit harder, as you don’t know the location of the value immediately. For example, to find the 36th element, you need to follow the trail of pointers 35 times first. In a list with n elements, to find any element will take on average n/2 operations. As n goes to infinity, the function n/2 acts just like the function n, so we say this operation takes O(n), a.k.a. linear time. Lists do not have random access. Wow, lists seem like they really suck, right? Wrong.

Writing a value in an array at any index that isn’t the end is a hassle. If my array is of size 5 and is [1,3,4,5,_] where _ is a blank, and I want to put a 2 in between 1 and 3, I have to go through every element from the end and copy it to the next index up first. On average, for an array of size n, this will take n/2 operations, and is therefore O(n). I had to specify a blank, because major problems are caused if the array is already full. Unlike a list, an array is of fixed size, so it cannot* be extended. When you accidentally write past the end anyway, you might have a good language that sees it happening and raises an error, or you might not, and just overwrite whatever was sitting next to your array in memory, be it cat photos, your essay due tomorrow morning, sensitive personal information, you name it. This is called array overflow (or buffer overflow). I’ve also heard the fun colloquialism falling off your array.
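The copy-everything-up-one dance might look like this in Java. It's an illustrative sketch only, with names of my own choosing; Java itself would throw an exception rather than let you fall off the array.

```java
import java.util.Arrays;

public class ArrayInsert {
    // Insert value at index, shifting everything after it one slot right.
    // Assumes the last slot of the array is the unused blank; if the array
    // were already full, the final element would be lost (or, in an
    // unchecked language, overwrite adjacent memory).
    static void insertAt(int[] a, int index, int value) {
        for (int i = a.length - 1; i > index; i--) {
            a[i] = a[i - 1]; // copy each element to the next index up
        }
        a[index] = value;
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 4, 5, 0}; // the last slot is the blank
        insertAt(a, 1, 2);
        System.out.println(Arrays.toString(a)); // prints [1, 2, 3, 4, 5]
    }
}
```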

Writing a value at any index in a list is easy. Plop it down somewhere in memory, add a pointer to the next element, and change the pointer of the previous element to point to it. This doesn’t have anything to do with the rest of the list, so it is a constant time operation (O(1)), assuming you already have a reference to the element before the insertion point.
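In Java terms, the plop-and-rewire step is just this. Again a sketch with my own names, and it assumes you already hold a reference to the previous node.

```java
public class ListInsert {
    // Same minimal node as before: a value and a pointer to the next node.
    static class Node {
        int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // With a reference to the previous node in hand, insertion is O(1):
    // plop the new node down, point it at prev's old successor, and
    // rewire prev to point at it. The rest of the list is untouched.
    static void insertAfter(Node prev, int value) {
        prev.next = new Node(value, prev.next);
    }

    public static void main(String[] args) {
        Node head = new Node(1, new Node(3, null)); // the list [1, 3]
        insertAfter(head, 2);                       // now [1, 2, 3]
        for (Node cur = head; cur != null; cur = cur.next) {
            System.out.print(cur.value + " ");      // prints 1 2 3
        }
    }
}
```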

To summarize, arrays are easier to read from, and lists are easier to write to. Knowing which one to pick is an important and useful step in solving many problems.

*In modern languages, arrays can be automatically “extended” by making a new array of larger size, copying the data over, and then pointing the variable at the new array while the old one isn’t in use. This is sort of like murdering someone and replacing them with a clone that’s better at something when no one is looking. The point of that analogy is that dynamic arrays are a great idea and I love using them to solve problems.