An easy-to-use form of lapply that parallelizes execution by submitting individual jobs to a SLURM cluster.

superApply(x, FUN, ..., tasks = 1, workingDir = getwd(), packages = NULL,
  sources = NULL, extraBashLines = NULL, extraScriptLines = "",
  clean = T, partition = NULL, time = NULL, mem = NULL, proc = NULL,
  totalProc = NULL, nodes = NULL, email = NULL)

Arguments

x

vector/list - FUN will be applied to each element of this object

FUN

function - function to be applied to each element of x

...

further arguments passed to FUN

tasks

integer - number of individual parallel jobs to execute

workingDir

string - path to a folder that will contain all the temporary files needed for submission, execution, and compilation of individual jobs

packages

character vector - package names to be loaded in individual tasks

sources

character vector - paths to R scripts to be sourced in individual tasks

extraBashLines

character vector - each element will be added as a line to the individual task execution bash script, before R is executed. For instance, use this to load R if it is not available on your system by default (see the sketch after this argument list)

extraScriptLines

character vector - each element will be added as a line to the individual task execution R script, before lapply starts (see the sketch after this argument list)

clean

logical - if TRUE, all files created in workingDir will be deleted

partition

character - Partition to use. Equivalent to --partition of SLURM sbatch

time

character - Time requested for job execution; one accepted format is "HH:MM:SS". Equivalent to --time of SLURM sbatch

mem

character - Memory requested for job execution; accepted formats include "xG" and "xMB" (e.g. "1G"). Equivalent to --mem of SLURM sbatch

proc

integer - Number of processors requested per task. Equivalent to --cpus-per-task of SLURM sbatch

totalProc

integer - Number of tasks requested for job. Equivalent to --ntasks of SLURM sbatch

nodes

integer - Number of nodes requested for job. Equivalent to --nodes of SLURM sbatch

email

character - email address to receive SLURM notifications for the submitted jobs (passed to sbatch)
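
As an illustration of the two preamble arguments extraBashLines and extraScriptLines (a hypothetical sketch; the partition name and module command depend on your cluster):

sapOut <- superApply(1:100, FUN = function(x) x^2, tasks = 4,
                     workingDir = "~/testSap",
                     extraBashLines = "module load R",   # bash line executed before R starts
                     extraScriptLines = "set.seed(42)",  # R line executed before lapply starts
                     partition = "normal", time = "00:30:00", mem = "1G")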

Value

list - results of FUN applied to each element in x
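
Assuming all jobs succeed, the compiled result is meant to match a plain lapply call (cluster resource arguments are omitted here for brevity; add them as your cluster requires):

# Expected TRUE: superApply follows lapply semantics, including element order
identical(superApply(1:10, sqrt, tasks = 2, workingDir = "~/testSap"),
          lapply(1:10, sqrt))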

Details

Mimics the functionality of lapply, but implemented so that iterations can be submitted as one or more individual jobs to a SLURM cluster. The batch, err, out, and script files of each job are stored in a temporary folder. Once all jobs have been submitted, the function waits for them to finish; when they are done executing, the results of the individual jobs are compiled into a single list.
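
Conceptually, the division of work behaves like the sketch below (an illustration only, not the actual implementation; in the real function each chunk becomes a SLURM job):

# Conceptual sketch: split x into `tasks` chunks, apply FUN within each chunk,
# and compile the per-chunk results back into one flat list
pseudoSuperApply <- function(x, FUN, ..., tasks = 1) {
    chunks <- split(x, cut(seq_along(x), tasks, labels = FALSE))
    perChunk <- lapply(chunks, function(chunk) lapply(chunk, FUN, ...))
    unlist(perChunk, recursive = FALSE, use.names = FALSE)
}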

Examples

# NOT RUN {
#------------------------
# Parallel execution of 100 function calls using 4 parallel tasks
myFun <- function(x) {
    # Sys.sleep(10) # uncomment to simulate a long-running call
    return(rep(x, 3))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G")


#------------------------
# Parallel execution of 100 function calls using 100 parallel tasks
sapOut <- superApply(1:100, FUN = myFun, tasks = 100, workingDir = "~/testSap", time = "60", mem = "1G")


#------------------------
# Parallel execution where a package is required in function calls
myFun <- function(x) {
    return(ggplot(data.frame(x = 1:100, y = (1:100) * x), aes(x = x, y = y)) + geom_point() + ylim(0, 1e4))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", packages = "ggplot2", time = "60", mem = "1G")


#------------------------
# Parallel execution where R has to be loaded in the system (e.g. in bash `module load R`)
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G", extraBashLines = "module load R")


#------------------------
# Parallel execution where a source file is required in function calls
# Content of ./customRep.R
customRep <- function(x) {
    return(paste("customFunction", rep(x, 3)))
}
# superApply execution
myFun <- function(x) {
    return(customRep(x))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, sources = "./customRep.R", workingDir = "~/testSap", time = "60", mem = "1G")
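

#------------------------
# Making per-element failures visible with tryCatch (a general R pattern,
# not specific to superApply; superApply compiles whatever FUN returns)
safeFun <- function(x) {
    tryCatch(myFun(x), error = function(e) paste("element", x, "failed:", conditionMessage(e)))
}

sapOut <- superApply(1:100, FUN = safeFun, tasks = 4, sources = "./customRep.R", workingDir = "~/testSap", time = "60", mem = "1G")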

# }