Problem Definition
The goal of this programming assignment was to learn more about the practical issues that arise when designing a tracking system. We are tasked with tracking moving objects in video sequences, i.e., identifying the same object from frame to frame. Two datasets are provided, the bat dataset and the cell dataset, on which to demonstrate our work.
Method and Implementation
Object Localization

Bat dataset
For the bat dataset, we opted to use the localizations provided in the project resources. 
Cell dataset
The cell dataset requires us to perform segmentation for tracking. The grayscale values of the cells and the dish background are very similar, so we combined several techniques to measure the cell centroids that are fed to the Kalman filter. To differentiate the region of interest (the cells' dish) from the background, we use a hand-crafted mask. This is possible because the dish and camera do not move throughout the video. Any detections outside of this region are not reported. Within the dish, we separate the image into two regions. The cells in the bottom region (bottom ~70% of each frame) are much easier to localize because there is a larger separation between the absolute grayscale values of the dish and the cells. Here, we use absolute thresholding. For the top region, absolute thresholding is not possible without missing many cells or erroneously picking up the background. Here, we use edge detection to detect the cell boundary, dilate, and then use a lower absolute threshold on the pixels close to the detected boundary. We combine detections for the top and bottom regions, and perform connected components analysis to extract centroids of the detections.
Example Localizations of Cell Dataset
Panels: the original frame in the dataset; binary cell detections; cell centroids after connected components analysis
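The last two steps of the localization (masked absolute thresholding followed by connected components analysis) can be sketched as follows. This is a minimal illustration and not the code used in the project: the function names, the 4-connectivity, and the assumption that cells are brighter than the dish are all hypothetical choices made for the example (flip the comparison if the cells are darker).

```python
from collections import deque

import numpy as np


def threshold_in_dish(gray, dish_mask, thresh):
    """Absolute threshold, keeping only pixels inside the dish mask.

    Assumes (for illustration) that cells are brighter than the dish.
    """
    return (np.asarray(gray) > thresh) & np.asarray(dish_mask, dtype=bool)


def centroids_from_mask(binary):
    """Return the (row, col) centroid of every 4-connected foreground blob."""
    binary = np.asarray(binary, dtype=bool)
    seen = np.zeros_like(binary, dtype=bool)
    height, width = binary.shape
    centroids = []
    for r0 in range(height):
        for c0 in range(width):
            if not binary[r0, c0] or seen[r0, c0]:
                continue
            # Breadth-first flood fill collects all pixels of one blob.
            queue = deque([(r0, c0)])
            seen[r0, c0] = True
            pixels = []
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < height and 0 <= cc < width
                    if inside and binary[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
            mean_r, mean_c = np.mean(pixels, axis=0)
            centroids.append((float(mean_r), float(mean_c)))
    return centroids
```

In practice a library routine for connected components would normally replace the hand-written flood fill; it is spelled out here only to keep the sketch self-contained.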
Object Tracking
For tracking, we utilize an Extended Kalman Filter (EKF).
The EKF is an extension of the vanilla Kalman filter that linearizes the process and measurement models around the current state estimate and operates on a discrete approximation of a continuous-time system.
The EKF operates with the following assumptions:
 The prior state is represented by a Gaussian distribution, i.e. $ p(x) \sim N(\mu, \Sigma) $
 The continuous process model is $\dot{\bf x} = f({\bf x},n)$ where $n \sim N(0,Q)$
 The measurement model is $z = h({\bf x},v)$ where $v \sim N(0,R)$
For the purposes of the EKF used in this project, we assume that the internal state representation of an object is given by a 4D vector. The first 2 dimensions represent the estimated position in Cartesian space, while the second 2 represent the estimated velocity, i.e. $$ {\bf x_t} = [x_t, y_t, \dot{x}_t, \dot{y}_t]^\top $$ The EKF allows us to track the internal state estimate as well as the covariance of the estimate. While the covariance is not directly used in our work (for simplicity), it could conceivably be used to more intelligently scale matches during bipartite matching.
Process Update
The process update seeks to estimate the new position of the tracked object after some time has elapsed.
This would be given by $ {\bf x_t} = {\bf x_{t-1}} + f({\bf x_{t-1}},n_t)\delta t $.
By further assuming that the object velocity is constant, barring a measurement indicating otherwise, we can simplify the dynamics to be:
$$ f({\bf x},n) \approx {\bf Ax} + n$$
$$\text{where }{\bf A} = \begin{bmatrix} {\bf 0}_{2 \times 2} & {\bf I}_{2 \times 2} \\ {\bf 0}_{2 \times 2} & {\bf 0}_{2 \times 2} \end{bmatrix}$$
With an assumption of unchanging noise ($\frac{\delta f}{\delta n}$ is constant), we can then represent the state and covariance estimate process update as a discretized one-step Euler integration, where ${\bf F}_t = {\bf I}_{4 \times 4} + {\bf A}\,\delta t$:
$$ {\bf \mu}_t = {\bf F}_t{\bf x}_{t-1} $$
$$ \bar{\Sigma}_t = {\bf F}_t {\bf \Sigma}_{t-1} {\bf F}_t^\top + {\bf Q} $$
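A minimal sketch of this process update, under the constant-velocity model above, might look like the following. It takes ${\bf F} = {\bf I} + {\bf A}\,\delta t$, which follows from substituting $f({\bf x},n) \approx {\bf Ax}$ into the Euler step. The function names and the example values are hypothetical and are not taken from EKFfilter.py.

```python
import numpy as np


def make_F(dt):
    """State transition F = I + A*dt for the state [x, y, x_dot, y_dot]."""
    A = np.zeros((4, 4))
    A[0, 2] = 1.0  # x advances with x_dot
    A[1, 3] = 1.0  # y advances with y_dot
    return np.eye(4) + A * dt


def process_update(x_prev, Sigma_prev, Q, dt):
    """One-step Euler prediction of the state mean and covariance."""
    F = make_F(dt)
    mu = F @ x_prev
    Sigma_bar = F @ Sigma_prev @ F.T + Q
    return mu, Sigma_bar


# Example: an object at (10, 20) moving at (2, -1) pixels per frame.
mu, Sigma_bar = process_update(
    np.array([10.0, 20.0, 2.0, -1.0]), np.eye(4), 0.1 * np.eye(4), dt=1.0
)
```

Note that the predicted covariance grows with every prediction step, which reflects the increasing uncertainty when no measurement arrives.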
Measurement Update
A general measurement model ${\bf z}_t = h({\bf x}_t, v)$ can be linearly approximated as ${\bf z}_t \approx h({\bf \mu}_t,0) + {\bf C}_t({\bf x}_t - {\bf \mu}_t) + v$
Given that for both the bat and cell datasets, we are observing the (x,y) Cartesian coordinates of the centroids of the bats or cells in a fixed reference frame, we can represent the measurement (a.k.a. observation) model as:
$ {\bf z}_t = {\bf C} \tilde{x_t} $
where $\bf z$ is the measurement, in this case the (x,y) Cartesian coordinates of a bat or cell, $\tilde{x_t}$ represents the true state of the object at time $t$, and ${\bf C} = \begin{bmatrix} {\bf I}_{2 \times 2} & {\bf 0}_{2 \times 2} \end{bmatrix}$.
Following a similar derivation as with the standard Kalman filter, we can derive the Kalman gain and state estimate updates for this system as:
$$ {\bf K}_t = \bar{\bf \Sigma}_t {\bf C}^\top \left({\bf C} \bar{\bf \Sigma}_t {\bf C}^\top + R \right)^{-1} $$
$$ {\bf x}_t = {\bf \mu}_t + {\bf K}_t({\bf z}_t - {\bf C \mu}_t) $$
$$ {\bf \Sigma}_t = \bar{\bf \Sigma}_t - {\bf K}_t{\bf C} \bar{\bf \Sigma}_t$$
In the absence of a measurement, the current state estimate is taken as the state estimated by the process model, i.e. $x_t = \mu_t$ and $\Sigma_t = \bar{\Sigma}_t$.
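The measurement update, including the fallback when no measurement is available, can be sketched as below. This is an illustrative sketch rather than the project's implementation: the function name and the convention of passing `z=None` for a missing detection are assumptions made for the example.

```python
import numpy as np

# Observation matrix: we measure position only, C = [I 0].
C = np.hstack([np.eye(2), np.zeros((2, 2))])


def measurement_update(mu, Sigma_bar, R, z=None):
    """Correct the predicted state with a position measurement z.

    If z is None (no detection was assigned), the prediction is
    returned unchanged, i.e. x_t = mu_t and Sigma_t = Sigma_bar_t.
    """
    if z is None:
        return mu.copy(), Sigma_bar.copy()
    S = C @ Sigma_bar @ C.T + R              # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)   # Kalman gain
    x = mu + K @ (z - C @ mu)                # corrected state
    Sigma = Sigma_bar - K @ C @ Sigma_bar    # corrected covariance
    return x, Sigma
```

With equal prediction and measurement uncertainty on position, the corrected position lands halfway between the prediction and the measurement, while the velocity is only corrected through the position-velocity cross terms of the covariance.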
Tracking Pipeline
The full tracking pipeline is implemented (from scratch) in EKFfilter.py. Two classes are defined: EKF2D, which handles an individual 2D EKF, and BatchTracker, which handles a batch of EKF2D objects. Together, they implement the following:
 Initialize EKF filter-tracker(s) at the provided initial position(s)
 For each new frame, perform a process update to estimate the current position(s), $\mu_t$, of the object(s)
 Find $n$ nearest neighbors amongst the detections for frame $t$ for each object
 Greedily assign detections as measurements for objects based on how close the detection is to the predicted object position, given that the distance between the detection and prediction is below a set threshold
 Given a detection assignment, update the internal state estimate of the object via a measurement update
 If an object cannot greedily claim any of its $n$ nearest neighbors, it is determined to not have an associated detection for the frame and the internal state is updated with the results of the process update
 For any detections that have not been assigned to any of the object-trackers, a new tracker is spawned at the detected position
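The data association steps in the list above can be sketched as follows. The exact greedy ordering used in EKFfilter.py is not described, so this sketch assumes one plausible variant: all candidate (track, detection) pairs within the threshold are sorted by distance and claimed closest-first. The function name and the default values of `n_neighbors` and `max_dist` are hypothetical.

```python
import math


def greedy_assign(predictions, detections, n_neighbors=3, max_dist=30.0):
    """Greedily match predicted positions to detections.

    predictions, detections: lists of (x, y) tuples.
    Returns (assignments, unassigned), where assignments maps a track
    index to a detection index and unassigned lists the detection
    indices that no track claimed (candidates for new trackers).
    """
    candidates = []
    for ti, pred in enumerate(predictions):
        # Distances from this prediction to every detection, closest first.
        dists = sorted((math.dist(pred, det), di)
                       for di, det in enumerate(detections))
        for dist, di in dists[:n_neighbors]:
            if dist < max_dist:
                candidates.append((dist, ti, di))

    assignments, used = {}, set()
    for dist, ti, di in sorted(candidates):
        if ti not in assignments and di not in used:
            assignments[ti] = di
            used.add(di)

    unassigned = [di for di in range(len(detections)) if di not in used]
    return assignments, unassigned
```

Tracks missing from `assignments` would keep their process-update estimate for the frame, and each index in `unassigned` would seed a new tracker at the detected position.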
Experiments
Object Tracking
Object tracking was primarily built with the bat dataset as a reference.
It was important to establish a filter that was capable not only of following a specific object but also of ignoring 'distracting' measurements as appropriate.
Consider the following track, which follows a single bat over the full bat sequence:
The video clip shows that the tracker is able to successfully track this bat through the full video sequence. For most of the track there is little ambiguity; however, around the middle of the video, a number of bats overlap and the detections are dropped. Two consecutive frames, 828 and 829, are analyzed to show how the filter handles the case where no good detection is available (828) and where a detection is received (829). These specific frames are analyzed because they illustrate the workings of the EKF well.
EKF Prediction and Updates  
Generally, achieving good tracking required tuning the filter biases. We tuned the noise parameters associated with the internal state representation and the measurements, $Q$ and $R$ respectively, until good tracking was achievable. This was mostly done through trial and error, with the intuition that a lower $Q$ value indicates a higher trust in the internal state estimate, and a lower $R$ value indicates a higher trust in the measured positions. Biasing too much toward the internal state representation would not allow the states to evolve accurately with new measurements, while being too reliant on measurements would cause tracks to be lost or corrupted in cases of overlap or missing detections. As our results show, the values we arrived at allowed for reasonably good tracking.
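The intuition behind the $Q$ and $R$ trade-off can be illustrated with a one-dimensional simplification, in which the state, $Q$ and $R$ are all scalars. This is only an illustrative sketch, not the 4D filter used in the project, and the function name and numeric values are hypothetical.

```python
def scalar_gain(sigma_prev, q, r):
    """Kalman gain for a scalar state that is observed directly.

    sigma_prev: variance of the previous estimate
    q: process noise variance, r: measurement noise variance
    """
    sigma_bar = sigma_prev + q           # predicted variance
    return sigma_bar / (sigma_bar + r)   # gain lies between 0 and 1


# A gain near 0 keeps the prediction; a gain near 1 follows the measurement.
for q, r in [(0.1, 2.0), (1.0, 2.0), (10.0, 2.0), (1.0, 0.1)]:
    print(f"Q={q:<5} R={r:<5} K={scalar_gain(1.0, q, r):.3f}")
```

Raising $Q$ relative to $R$ pushes the gain toward 1, so the estimate follows the detections more closely; lowering it pushes the gain toward 0, so the estimate coasts on the motion model.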
Results
Bat dataset
Generally speaking, the tracking for the bats dataset appears decent, as demonstrated in the following video:
While we believe the tracking results are generally good for the bats dataset, there are a few problem cases where tuning the filters to work well in general resulted in performance issues on individual cases:
Tracking issues on Bats dataset  
Cells dataset
The tracking on the cells dataset is more chaotic because the cells don't move at constant velocity and cells will collide and split. However, the Kalman filter manages to correctly track every cell for a large portion of the video:
Tracking issues on Cells dataset
The cells dataset performs relatively well for cells that split into multiple cells as it creates a new track quickly. However, it may be quick to create erroneous new tracks, especially when cells drastically change velocity, or due to imperfect segmentation. This is shown especially in this frame, where 3 separate cells that were each previously detected as a single cell split into multiple tracks.
This pathology means that a cell is mostly followed correctly but may change tracks in the middle of the video, switching to a track that was incorrectly spawned earlier.
Cell tracking is much more successful in regions near the top of the dish, where cells are more separated. It has the most issues in regions with a lot of cell activity, where many cells are clumped together in a small region moving erratically.
Discussion
As shown in the videos, the tracking results are most successful when the detection of the objects is correct and when the objects are not occluded or densely packed.
Under good conditions, and after appropriate parameter tuning, the Kalman filter can accurately predict the positions of the tracked objects.
To handle spurious detections, a distance threshold is applied during measurement matching, which ensures that a predicted point won't be matched with a distant measurement when there is no nearby match.
In this case, if the object goes too many frames without a measurement update, it is removed from the active tracks.
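The removal of stale tracks can be sketched with a per-track counter of consecutive missed frames. The `Track` class, the `prune_tracks` function, and the `max_missed` limit are hypothetical names and values chosen for illustration; the actual bookkeeping in EKFfilter.py may differ.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    missed: int = 0  # consecutive frames without a measurement update


def prune_tracks(tracks, updated_ids, max_missed=5):
    """Update the miss counters and drop tracks that exceed max_missed.

    updated_ids: set of track ids that received a measurement this frame.
    """
    kept = []
    for track in tracks:
        if track.track_id in updated_ids:
            track.missed = 0       # a matched detection resets the counter
        else:
            track.missed += 1      # the track coasted on the process model
        if track.missed <= max_missed:
            kept.append(track)
    return kept
```

A larger limit lets a track survive longer occlusions at the cost of keeping dead tracks alive for more frames.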
While it is difficult to provide an objective analysis without ground-truth information, visually it would appear that our method allows for reasonable tracking even in situations where objects touch and occlude each other or when new objects are introduced into the frame, given that the dynamics of the objects are reasonable. This is plainly observed in the bats dataset, where the movement of the bats, while sometimes erratic, is mostly regular.
However, we still face issues when dealing with sudden large shifts in object dynamics, as observed in the cells dataset.
With more tuning of the filter gains and biases, it may be possible to achieve better tracking.
Our methodology is sound, however, and is amenable to further tuning efforts, should they be undertaken.
Credits and Bibliography
The following sources were accessed for reference:
 https://en.wikipedia.org/wiki/Extended_Kalman_filter
 Philip Danmes' notes on EKFs from the University of Pennsylvania (2015)