The Range (or Interval) Union problem comes up fairly often, both in real life and in technical (coding) interviews.
Problem Statement: Given a set of ranges (intervals) of the form [start, end), compute the Union of these intervals such that the result set of intervals contains no overlapping intervals.
Other variations of the problem ask you to compute things like:
- The largest set (cardinality) of intervals that cross each other
- The number of intervals crossing a given range (or point, or set of ranges/points)
All these variations can be answered using the general technique described below.
The aim of this post is to provide a very simple solution to a problem that would otherwise seem to require fancy data structures, while still retaining a decent, workable run-time complexity. The solution presented below takes O(n log n) time if the input intervals are not sorted, and O(n) time if the input intervals are already sorted by their start point.
This is a shuttle post (i.e. the solution was written up in a shuttle, with the duration of the journey being ~40 minutes).
A massive thanks to Vishwas for providing the simplest solution to this problem (which he came up with after I described the problem to him in the shuttle itself!).
The solution: The solution first sorts all intervals by their start point, and then does something really clever to compute the union in a single pass. The key observation is that if we process intervals in order of their start point, we know that the current union interval extends at least as far as the end point of the first interval in it, so we move ahead, absorbing every interval that begins before the current union interval ends. As we absorb intervals, we must keep updating the end point to the right-most end point of any interval we have included in the current union interval. That's pretty much it. Once we encounter an interval that starts after our current union interval ends, we can wrap up the union interval and start a new one, which starts and ends where the new interval starts and ends. We only ever update the end point of the union interval while extending it, because it began at the left-most start point and every later interval starts at or after that point.
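The steps above can be sketched in a few lines of Python. This is only a minimal sketch, not the code from the original write-up: the name `union_intervals` and the tuple representation are my own assumptions, and it assumes every interval is half-open with start < end. It also treats an interval that starts exactly where the current union interval ends as an extension of it, since [1, 3) and [3, 5) together cover [1, 5) without a gap.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open interval [start, end) as a tuple.
Interval = Tuple[int, int]


def union_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the union of the given intervals as a sorted list of
    non-overlapping intervals."""
    result: List[Interval] = []
    # Sorting tuples orders them by start point first: O(n log n).
    for start, end in sorted(intervals):
        if result and start <= result[-1][1]:
            # The interval begins before (or exactly where) the current
            # union interval ends, so it belongs to it. Only the end point
            # can change, and only if this interval reaches further right.
            if end > result[-1][1]:
                result[-1] = (result[-1][0], end)
        else:
            # The interval starts after the current union interval ends:
            # wrap up the old one and start a new union interval.
            result.append((start, end))
    return result


if __name__ == "__main__":
    print(union_intervals([(1, 3), (2, 6), (8, 10), (9, 12), (12, 13)]))
    # prints [(1, 6), (8, 13)]
```

If the input is already sorted by start point, the call to `sorted` can be dropped and the loop alone runs in O(n). To keep touching intervals such as [1, 3) and [3, 5) separate in the output, change `<=` to `<` in the comparison.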
Examples used in the code: (Blue boxes are input intervals and the green boxes are the union intervals).