32. Open-source R Package for modelling data with continuous piecewise linear functions


John Alasdair Warwicker

Stochastic Optimization (also called Stochastic Programming) is concerned with optimization problems under uncertainty, where some of the input data are not known with certainty at the time of decision making. In Stochastic Optimization, it is assumed that we have information about the distribution of these uncertain input parameters. This information is then incorporated into the mathematical programming models to yield optimal decisions under uncertainty.

What is the data science project’s research question? 

Recently, a number of mixed-integer linear programs (MILPs) have appeared in the literature to model discrete data or continuous functions with piecewise linear (PWL) functions. PWL functions are composed of affine functions which intersect at breakpoints. Fitting PWL functions allows for the prediction and classification of data through interpolation and extrapolation.
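To make the notion of a continuous PWL function concrete, the following minimal Python sketch evaluates one from a list of breakpoints. It is an illustration only, not part of the planned R package: the function name `pwl_eval`, the breakpoint representation and the sample values are all assumptions made for this example. It also assumes that, outside the range of the breakpoints, the first or last affine piece is simply extended.

```python
from bisect import bisect_right


def pwl_eval(breakpoints, x):
    """Evaluate a continuous piecewise linear function at x.

    breakpoints: list of (x_i, y_i) pairs with strictly increasing x_i
    (at least two pairs). Hypothetical representation for illustration.
    Inside the breakpoint range the value is interpolated; outside it,
    the first or last affine piece is extended (extrapolation).
    """
    xs = [p[0] for p in breakpoints]
    # Locate the segment containing x, clamped so that out-of-range
    # inputs fall back to the first or last segment.
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))
    (x0, y0), (x1, y1) = breakpoints[i], breakpoints[i + 1]
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Illustrative breakpoints: two affine pieces meeting at x = 2.
points = [(0.0, 0.0), (2.0, 4.0), (5.0, 1.0)]
print(pwl_eval(points, 1.0))   # interpolation on the first piece
print(pwl_eval(points, 6.0))   # extrapolation beyond the last breakpoint
```

Because adjacent pieces share a breakpoint, the function is continuous by construction. Fitting such a function to data means choosing the breakpoints, which is the task the MILP models address.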

Currently, the majority of software packages for fitting PWL functions rely on heuristic methods. The goal of this project is to create an R package for using exact MILP models to fit optimal continuous PWL functions to data.

What data will be worked on?  The input data are publicly available and include data sets from published journal articles.

What tasks will this project involve?  The main task is creating an R package, so a good knowledge of R is vital. Competence in version control is also necessary, and a basic knowledge of MILP models is preferable.

What makes this project interesting to work on?  This project offers an interesting opportunity to work on open source software, with the end goal of creating a publicly available R package.

What is the expected outcome?  A contribution to a research paper, a contribution to software development, and a software package (preferably in R) that can be used to fit optimal PWL functions to model discrete data and continuous functions using exact MILPs.

What infrastructure, programs and tools will be used? Can they be used remotely?   All of the work for this project can be done from home. If necessary, we are able to grant remote access to the machines within our offices.

What skills are necessary for this project?  Data analytics / statistics, Scientific computation, Computational models, Computer simulations, Knowledge of R, version control, and a basic understanding of mixed-integer linear programs.

Is the data open source?  Yes

Interested candidates should have the necessary skills (Master's level or higher is preferred).  John Alasdair Warwicker is looking for 1 visiting scientist, who will work on the project together with the team.