Batch process optimization 101: interactive edition

Note: if you're reading this on your cellphone and the right half of the animations are clipped off, set your browser to desktop mode. Sorry about that!

This blog post is a written version of the talk I gave at CPAIOR last year, which in turn was an animated, pop-sci version of the original paper published with J.C. Beck (Springer Link, PDF).

To make a long story short, we were working on the following problem: imagine, for the moment, that you wanted to construct some marvelously complex machine, like an airplane, and that your airplane consisted of, say, a hundred parts (ha!). Suppose now that you are on a tight schedule, and so you need all 100 parts ready at specific points in time. It just so happens that 50 of these parts need to be processed by one particular machine in your factory, and it becomes clear that this machine will be the bottleneck of your production line. What to do?

Now, just so we're on the same page, this is a real problem. Many airplane parts, for example, are made of lightweight composite materials. Composites consist of a woven fabric (e.g. of carbon fiber) for strength, embedded in a so-called matrix of glue-like stuff (e.g. resin) that holds it all together. For the matrix to harden, it must go into a curing oven for a certain minimum amount of time. Here is a video of such a machine:

When we were considering this problem, a French doctoral student named Arnaud Malapert had just published (PDF) the results of his dissertation on our airplane factory problem. His approach was successful and sophisticated. Maybe too sophisticated, we thought. Could we come up with a simpler technique?

Modelling the problem

Since all of those 50 parts are needed to make the plane, it doesn't matter if 49 parts are ready on time if it means that the 50th will be delayed by two weeks. From that perspective, the completion date of our airplane really only depends on the very last part to be ready for assembly. Since some of the 50 parts need to be painted or tested after curing, that part isn't necessarily the last one to leave the curing oven; but it is the one that leaves the curing oven with the greatest delay relative to when it should have left.

The video above showed a single elephantine cylinder being rolled into the oven chamber on a pushcart. Of course, many of the parts are small enough to go onto a cart together, so the central question becomes: which parts can we put onto a cart together, and which one of those carts gets to go first?

If you've done integer programming before, or any similar kind of discrete-math algorithmicking, then this will strike you as one of those mind-boggling combinatorial problems where we have to search through all the zillions of possible solutions to find the best one. And indeed, it's one of those. So we will use an appropriate piece of software, CPLEX, to do the searching for us. But the search is performed cleverly: as we construct a potential solution by assigning parts onto carts, we know that the greatest delay will only get worse with every assigned part, never better. So whenever we get to a point in building a solution where the greatest delay is already worse than that of the best known complete solution (i.e., one with all parts assigned), we stop and try something else. This way, we'll never waste time looking through solutions that we know won't be better than what we already have. This common-sense strategy is called branch-and-bound, and a good theoretical introduction can be found here. But don't worry about the details; they're not important here.

Nomenclature

The search software (called a solver) needs a way to know about the relationship between parts, carts, and delays, so that it can distinguish promising solutions from inauspicious ones as it systematically searches through them. Because of the particular technique (called linear programming) that the solver uses to avoid testing infeasible solutions, for instance those with overfull carts, we must formulate these relationships as linear inequalities. As you will see, that is easy.

Before we can start, let's agree on some terminology. As is customary in the AI scheduling community, we will call every part a job and give it an index \(j\). Likewise, we will refer to the carts as batches and number them by the order in which they go in the oven; let's use the index \(k\).

Every job \(j\) has three known properties (also spelled out in the little code sketch right after this list):

  1. a processing time \(p_j\): the minimum time it takes for the part to cure, so our airplane is safe,
  2. a size \(s_j\): the amount of room it takes up on a cart,
  3. a due date \(d_j\): the point at which the part should leave the oven to get painted; we'll use this to calculate the part's delay.
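If you prefer code over prose, those three properties map onto a tiny record. Here's a minimal sketch with made-up numbers (the class and the values are mine, not from the paper):

```python
# One job and its three known properties; the numbers are made up.
from dataclasses import dataclass

@dataclass
class Job:
    p: int   # processing time: minimum curing time
    s: int   # size: room taken up on the cart
    d: int   # due date: when the part should leave the oven

jobs = [Job(p=4, s=5, d=5), Job(p=3, s=4, d=7), Job(p=6, s=6, d=8)]
```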

The following animated figure should give you a rough visual idea of what we're dealing with.

All the jobs.

Speaking of the delay: scheduling people like to use the term lateness. And what we're trying to minimize, specifically, is the maximum lateness, or \(L _ \text{max}\). We could call it a minimax optimization, of sorts.
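In code, the objective is a one-liner. Here's a tiny sketch with made-up completion times and due dates:

```python
# Maximum lateness: the largest gap between a completion time and its due date.
completion = [6, 11, 19]   # hypothetical completion times
due = [5, 14, 16]          # hypothetical due dates
L_max = max(c - d for c, d in zip(completion, due))   # max(1, -3, 3) = 3
```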

A naive MIP model

After thinking about this batching business for a short while, it becomes obvious that two things matter to a batch (in terms of lateness): the job with the longest processing time, and the job with the earliest due date. The former determines how long the batch will stay in the oven; the latter is an unlucky candidate for having the maximum lateness. As for all the other jobs in the batch: all they have to worry about is whether they fit onto the cart at all.

The fundamental rule for this problem is that, to minimize \(L _ \text{max}\), we only need to look for solutions in which the batches are sorted by increasing due date. Sure, other optimal solutions might exist, but they can all be reshuffled to have their batches in due-date order. We'll refer to this rule as the earliest-due-date rule, or EDD.

Our set of inequalities revolves around a matrix of binary variables \(x _ {jk}\). An entry in that matrix is set to 1 if job \(j\) is assigned to batch \(k\), and 0 if not. Once we've agreed on that, everything else falls into place:

\[
\begin{aligned}
\text{Min.}\quad & L _ \text{max} \\
\text{s.t.}\quad & \sum _ {k \in B} x _ {jk} = 1 && \forall j \in J \\
& \sum _ {j \in J} s _ j x _ {jk} \le b && \forall k \in B \\
& P _ k \ge p _ j x _ {jk} && \forall j \in J,\ k \in B \\
& C _ k = C _ {k-1} + P _ k && \forall k \in B \\
& D _ k \ge D _ {k-1} && \forall k \in B \\
& D _ k \le (d _ \text{max} - d _ j)(1 - x _ {jk}) + d _ j && \forall j \in J,\ k \in B \\
& L _ \text{max} \ge C _ k - D _ k && \forall k \in B
\end{aligned}
\]

The first line, as is convention, announces the expression we want to minimize. The first constraint, following the customary "s.t." for "subject to", declares that each job can only be assigned to one batch: for each job \(j\) in the set of jobs \(J\), the sum of \(x _ {jk}\) over all batches \(k\) must be exactly one.

Similarly, the second constraint ensures that the sum of job sizes in a batch doesn't exceed the cart's capacity \(b\). The third constraint establishes a value for \(P _ k\), the processing time of batch \(k\): it's at least as long as the longest of the jobs assigned to it. The fourth constraint postulates the obvious: a batch's completion time \(C _ k\) equals the completion time of the previous batch plus its own processing time. Then we formulate the EDD rule; simple stuff. The next constraint looks complex, but it only makes sure that the aggregate due date of a batch, \(D _ k\), is at most that of its earliest-due job (unless the batch is empty, in which case it is merely bounded by a pointlessly huge number \(d _ \text{max}\)). Finally, we declare \(L _ \text{max}\) to be the greatest of all the delays suffered by the individual batches.

Phew. Not so bad, was it?
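In case you'd like to poke at this formulation yourself, here is a rough sketch of it in Python. I'm using the open-source PuLP modeller with its bundled CBC solver rather than CPLEX, and the five-job instance is made up purely for illustration.

```python
# A sketch of the naive MIP model in PuLP (the real thing ran on CPLEX).
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

p = [4, 3, 6, 2, 5]      # processing times p_j
s = [5, 4, 6, 3, 5]      # sizes s_j
d = [5, 7, 8, 10, 14]    # due dates d_j, already in EDD order
b = 10                   # cart (batch) capacity
n = len(p)
J = K = range(n)         # one potential batch per job
d_max = max(d)

m = LpProblem("naive_batching", LpMinimize)
x = LpVariable.dicts("x", (J, K), cat="Binary")   # x[j][k] = 1 iff job j goes into batch k
P = LpVariable.dicts("P", K, lowBound=0)          # batch processing times
C = LpVariable.dicts("C", K, lowBound=0)          # batch completion times
D = LpVariable.dicts("D", K, lowBound=0)          # batch due dates
Lmax = LpVariable("Lmax")                         # maximum lateness (may be negative)

m += Lmax                                         # objective: minimize L_max
for j in J:
    m += lpSum(x[j][k] for k in K) == 1           # every job in exactly one batch
for k in K:
    m += lpSum(s[j] * x[j][k] for j in J) <= b    # batch capacity
    for j in J:
        m += P[k] >= p[j] * x[j][k]               # a batch is as long as its longest job
        m += D[k] <= (d_max - d[j]) * (1 - x[j][k]) + d[j]   # batch due date
    m += C[k] == (C[k - 1] if k > 0 else 0) + P[k]           # completion times accumulate
    if k > 0:
        m += D[k] >= D[k - 1]                     # EDD order of batches
    m += Lmax >= C[k] - D[k]                      # lateness of batch k

m.solve()
print("L_max =", value(Lmax))
```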

The above looks like a run-of-the-mill MIP (Mixed-Integer Programming) model. The price for this elegant, declarative programming style is that the solver's performance is absolutely unpredictable. And—surprise!—in this case it was nothing short of pathetic. Arnaud's model was able to really shine in comparison.

Remodelling the problem

As so often happens in life, serendipitous mistakes bear more fruit than all honest attempts at systematic work. I had just spent about half a year trying various heuristics, decompositions, and so on, each more disappointing than the last, when suddenly, after mistakenly hitting compile in the middle of writing a constraint, the solver started spitting out (seemingly correct) results at blazing speed. The model made absolutely no sense, and sure enough, some of the results turned out to be incorrect. But it inspired a new way for me to think about the problem at hand.

Instead of just talking about assignments of jobs to batches, we think of an initial schedule (which we know is terrible) and then imagine making improvements to it by moving jobs, one by one, into other batches. Permitting only re-assignments (moves) that improve the maximum lateness provides a mental framework to reason about constraints—even though at the end of the day, there is no initial schedule, and the moves are still just assignments.

For simplicity, our initial schedule will be the one in which each job is assigned to its own batch, namely the batch that matches its index number (\(k = j\)). Oh, I forgot to mention: to do that, we'll sort the jobs by increasing due date. This way, all batches are in order of their due dates (EDD) and they all contain a single job. Let's call it the single-EDD schedule.

Here's an example to demonstrate what happens when you play with the due date. The following figure shows a single-EDD schedule.

Now, going from this single-EDD schedule, we can reach any solution by moving jobs from their own batches into other batches. Indeed, we can look at any final schedule as a set of moves starting from the single-EDD schedule—a set of steps in move-space, if you will, starting from the origin. And we can rebuild our MIP model from that mental starting point.
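Here's a quick sketch of that starting point in code, assuming some hypothetical job data. The EDD sort mentioned above happens right at the top.

```python
# Building the single-EDD starting schedule from made-up job data:
# sort by due date, give every job its own batch, and record the
# (constant) completion times and latenesses.
p = [3, 4, 2, 6, 5]    # processing times, in arbitrary order
d = [7, 5, 10, 8, 14]  # due dates, in arbitrary order

# Sort jobs by increasing due date, so that job j becomes the host of batch k = j.
order = sorted(range(len(p)), key=lambda j: d[j])
p = [p[j] for j in order]
d = [d[j] for j in order]

C_single, L_single, t = [], [], 0
for k in range(len(p)):
    t += p[k]                  # one job per batch, so completion times just accumulate
    C_single.append(t)
    L_single.append(t - d[k])  # lateness of batch k in the single-EDD schedule

print(p, d, L_single)
```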

Before we consider constraints on processing and completion times, let's think about the EDD rule. Starting from a single-EDD schedule, the only way to violate the EDD order of batches is to move a job into a later-scheduled batch. So let's ban such moves. In terms of the assignment variable \(x _ {jk}\): we cannot assign a job \(j\) to a batch whose index \(k\) is greater than \(j\):

\[ x _ {jk} = 0 \quad \forall j, k \,|\, j < k. \]

The animation below shows what that means in practice.

The animation demonstrates an important point: yes, there are ways to move jobs into later batches without violating EDD. For example, if two jobs scheduled in sequence share the same due date, then we can move the first one into the second batch. But of course, we could also move the second one into the first batch. Similarly, when the jobs are very small compared to the capacity \(b\), we could move a whole sequence of jobs into a later batch, which would then assume the due date of the first of those jobs. But again: we could just as well move all of those jobs into the first of the batches in the sequence, complying with our rule and yet achieving the same end result.

The subtext here, and this is worth pointing out, is that batches are set up from the start. They can be empty (which gives them a processing time of zero and a due date of infinity), but they will stay in place. They are the board on which we move the jobs like game pieces.

Host jobs and guest jobs

But here's another constraint we have to introduce: when you put a job into a batch that has just become empty, you end up messing with the EDD order. In other words: a job can never be a guest in a batch that doesn't have its host anymore. Here's an example:


Expressing this as a linear constraint is straightforward:

\[ x _ {kk} \ge x _ {jk} \quad \forall j, k \,|\, j > k \]

In other words: when a job is no longer in its own batch (i.e. \(x _ {kk} = 0\)), then no later job can be assigned to that batch \(k\) either.

There is a reason we named them host jobs and guest jobs: it reinforces the notion that a job owns its batch, because it determines the batch's due date. Since we have agreed to only move jobs into earlier-scheduled batches, a batch's due date will never change. In fact, we can give up the constraint

\[ D _ k = \min _ {j \in k} d _ j \quad \forall k \]

in favour of the much simpler

\[ D _ k = d _ k \quad \forall k. \]

Calculating maximum lateness

This simplification comes in very handy for our pièce de résistance: calculating \(L _ \text{max}\). Not only do all batches have an inherent due date, they also all have an inherent lateness, namely their lateness in the single-EDD schedule: \[ L _ {k,single} = C _ {k,single} - D _ k \quad \forall k.\] To get from the single-EDD lateness to the batch's final lateness, all we have to do is find out how the individual moves we made change the batch's completion time. As we'll see in a second, that is easy.

First, I quickly need to mention that both right-hand terms in the above equation are constants that we know up front: the single-EDD completion time is simply the sum of all the processing times leading up to batch \(k\). And \(D _ k\) is simply \(d _ k\), as we've found out above.

Here's how we'll calculate a batch's lateness \(L _ k\): the table below the animation shows you a row for \(L _ {k,single}\). Below, we'll tally up wherever completion times are changed for the better or the worse. The bottom line (literally) shows you the values for \(L _ k\) as we try out various moves.

| \(k\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| \(p _ k\) | 2 | 17 | 6 | 14 | 11 | 18 | 19 | 8 |
| \(L _ {k,single}\) | 0 | 10 | 8 | 22 | 23 | 36 | 54 | 56 |
| F → C (batch 6 emptied) | | | | | | -18 | -18 | -18 |
| F → C (overhang in batch 3) | | | +12 | +12 | +12 | +12 | +12 | +12 |
| B → A (batch 2 emptied) | | -17 | -17 | -17 | -17 | -17 | -17 | -17 |
| B → A (overhang in batch 1) | +15 | +15 | +15 | +15 | +15 | +15 | +15 | +15 |
| \(L _ k\) | 15 | 8 | 18 | 32 | 33 | 28 | 46 | 48 |

Let's write out what we just did:

\[ L _ k = L _ {k,single} - \sum _ {h \le k} (1 - x _ {hh})\, p _ h + \sum _ {h \le k} P' _ h \quad \forall k \]

The first term is known. The second term represents all the batches \(h\) up to \(k\) that are empty: we can subtract their jobs' processing times from our lateness. On the other hand (and this is the third term), we have to add the overhangs (i.e. extra delays) we introduced when we moved long guest jobs into shorter host batches. The overhang \(P' _ k\) is defined as

\[
\begin{aligned}
P' _ k &\ge x _ {jk} p _ j - p _ k && \forall j \in J,\ k \in B \\
P' _ k &\ge 0 && \forall k \in B
\end{aligned}
\]
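To convince ourselves that this bookkeeping works, here's a small sketch that recomputes the bottom line of the table above from the formula. The processing times, single-EDD latenesses, and the two moves (F → C and B → A) are taken straight from the table; the 0-based indexing and helper names are mine.

```python
# Recompute the lateness tally of the table above from the L_k formula.
p = [2, 17, 6, 14, 11, 18, 19, 8]          # p_k
L_single = [0, 10, 8, 22, 23, 36, 54, 56]  # L_{k,single}
moves = {5: 2, 1: 0}                       # guest job -> host batch: F into C, B into A

n = len(p)
host_stays = [j not in moves for j in range(n)]   # x_{kk}: is the host still in its batch?
overhang = [0] * n                                # P'_k
for j, k in moves.items():
    overhang[k] = max(overhang[k], p[j] - p[k])   # a long guest stretches its host batch

L = []
for k in range(n):
    emptied = sum(p[h] for h in range(k + 1) if not host_stays[h])
    extra = sum(overhang[h] for h in range(k + 1))
    L.append(L_single[k] - emptied + extra)

print(L)         # [15, 8, 18, 32, 33, 28, 46, 48], matching the table
print(max(L))    # the new L_max: 48
```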

Old vs. new model

Here are the two models, side by side, without the foralls. We've gotten rid of two sets of variables: \(D _ k\) and \(C _ k\). But not only that: we've also strengthened our constraints. Here is the old model again, followed by the new one:

\[
\begin{aligned}
\text{Min.}\quad & L _ \text{max} \\
\text{s.t.}\quad & \sum _ {k \in B} x _ {jk} = 1 \\
& \sum _ {j \in J} s _ j x _ {jk} \le b \\
& P _ k \ge p _ j x _ {jk} \\
& C _ k = C _ {k-1} + P _ k \\
& D _ k \ge D _ {k-1} \\
& D _ k \le (d _ \text{max} - d _ j)(1 - x _ {jk}) + d _ j \\
& L _ \text{max} \ge C _ k - D _ k
\end{aligned}
\]

vs.

\[
\begin{aligned}
\text{Min.}\quad & L _ \text{max} \\
\text{s.t.}\quad & \sum _ {k \in B} x _ {jk} = 1 \\
& \sum _ {j \in J} s _ j x _ {jk} \le b \\
& P' _ k \ge p _ j x _ {jk} - p _ k \\
& P' _ k \ge 0 \\
& x _ {kk} \ge x _ {jk} \\
& x _ {jk} = 0 \\
& L _ \text{max} \ge L _ {k,single} - \sum _ {h \le k} (1 - x _ {hh})\, p _ h + \sum _ {h \le k} P' _ h
\end{aligned}
\]

You'll notice that \(C _ k\) and \(D _ k\) have completely disappeared.
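If you want to play with the reformulation yourself, here's what it might look like, in the same PuLP-and-made-up-data sketch style as before. Note how the \(x _ {jk} = 0\) constraint costs nothing: we simply never create variables for \(k > j\).

```python
# A sketch of the reformulated (move-based) model in PuLP with the CBC solver;
# the five-job instance is made up and already sorted by due date.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, value

p = [4, 3, 6, 2, 5]      # processing times p_j
s = [5, 4, 6, 3, 5]      # sizes s_j
d = [5, 7, 8, 10, 14]    # due dates d_j (sorted, so job j hosts batch k = j)
b = 10                   # batch capacity
n = len(p)

# The single-EDD latenesses are plain constants.
C_single = [sum(p[:k + 1]) for k in range(n)]
L_single = [C_single[k] - d[k] for k in range(n)]

m = LpProblem("move_based_batching", LpMinimize)
# Only moves into earlier batches are allowed, so x[j, k] exists only for k <= j.
x = {(j, k): LpVariable(f"x_{j}_{k}", cat="Binary")
     for j in range(n) for k in range(j + 1)}
Pp = LpVariable.dicts("Pp", range(n), lowBound=0)     # overhangs P'_k
Lmax = LpVariable("Lmax")                             # maximum lateness

m += Lmax                                             # objective: minimize L_max
for j in range(n):
    m += lpSum(x[j, k] for k in range(j + 1)) == 1    # every job in exactly one batch
for k in range(n):
    m += lpSum(s[j] * x[j, k] for j in range(k, n)) <= b   # batch capacity
    for j in range(k + 1, n):
        m += x[k, k] >= x[j, k]                       # guests only where the host remains
        m += Pp[k] >= p[j] * x[j, k] - p[k]           # overhang caused by a long guest
    # lateness of batch k, tallied from the single-EDD schedule
    m += Lmax >= (L_single[k]
                  - lpSum((1 - x[h, h]) * p[h] for h in range(k + 1))
                  + lpSum(Pp[h] for h in range(k + 1)))

m.solve()
print("L_max =", value(Lmax))
```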

Benchmarking

We were able to get rid of two sets of variables. We also strengthened the constraints: in the old model, moving jobs into later-scheduled batches was okay, although those solutions are equivalent to those in which jobs are only moved into earlier batches. Because of these improvements, we expected our new model to perform better—that is, we expected the solver to find the optimal solution quicker.

So we tested the model against 120 test instances, and indeed: it was a difference like night and day. While the naive model above struggled to solve problems with 25 jobs within a reasonable amount of time, the new one plowed through many 75-job instances in a matter of minutes.

Not only that, the new model held its own against Arnaud's sophisticated algorithm as well (Arnaud is a good sport and helped us run his code). Here are the results:

Results

The datapoints below the diagonal represent problem instances our model solved faster.

Conclusion

There is a good chance that Arnaud's algorithm, in combination with newer versions of the solver he used (Choco 3 vs Choco 2), will beat our model. But at the end of the day, his algorithm is still relatively sophisticated. MIP models, on the other hand, can be implemented by just about anyone, and we were able to greatly improve on the "obvious" formulation just by applying some strategic thinking.

What is most interesting to me is the way we arrived at our new model: we did it by reasoning about moving a job from one batch to another, and forecasting the effects it would have on the solution. That kind of stateful, procedural thinking feels wrong in the declarative world of integer programming. And yet, because the moves are really just assignments, and assignments are independent of each other, it works.

I wonder whether any of the above is generalizable, and whether there are parallels to the way we can reason about functional (i.e. declarative) programs in an imperative way. If you've read this far ... congratulations, and I'm looking forward to your thoughts!