An easy-to-use form of lapply that parallelizes execution by submitting jobs to a SLURM cluster.
superApply(x, FUN, ..., tasks = 1, workingDir = getwd(), packages = NULL, sources = NULL, extraBashLines = NULL, extraScriptLines = "", clean = T, partition = NULL, time = NULL, mem = NULL, proc = NULL, totalProc = NULL, nodes = NULL, email = NULL)
Argument | Description
---|---
x | vector/list - FUN will be applied to the elements of this
FUN | function - function to be applied to each element of x
... | further arguments passed to FUN
tasks | integer - number of individual parallel jobs to execute
workingDir | string - path to a folder that will contain all the temporary files needed for submission, execution, and compilation of individual jobs
packages | character vector - package names to be loaded in individual tasks
sources | character vector - paths to R code to be sourced in individual tasks
extraBashLines | character vector - each element is added as a line to the individual task's bash execution script before R is executed. For instance, here you may want to load R if it is not in your system by default
extraScriptLines | character vector - each element is added as a line to the individual task's R script before lapply starts
clean | logical - if TRUE, all files created in workingDir are deleted
partition | character - partition to use; equivalent to `sbatch`'s `--partition`
time | character - time requested for job execution, one accepted format is "HH:MM:SS"; equivalent to `sbatch`'s `--time`
mem | character - memory requested for job execution, one accepted format is "xG" or "xMB"; equivalent to `sbatch`'s `--mem`
proc | integer - number of processors requested per task; equivalent to `sbatch`'s `--cpus-per-task`
totalProc | integer - number of tasks requested for the job; equivalent to `sbatch`'s `--ntasks`
nodes | integer - number of nodes requested for the job; equivalent to `sbatch`'s `--nodes`
list - results of FUN applied to each element of x
Mimics the functionality of lapply, but implemented so that iterations can be submitted as one or more individual jobs to a SLURM cluster. Each job's batch, err, out, and script files are stored in a temporary folder. Once all jobs have been submitted, the function waits for them to finish; when they are done executing, the results of the individual jobs are compiled into a single list.
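Because superApply is meant to be a drop-in replacement for lapply, its compiled result should match what plain lapply returns for the same x and FUN. The sketch below (using the same myFun as the first example; runs locally, no SLURM cluster required) shows the reference behavior the compiled list is expected to reproduce:

```r
# Plain-lapply reference for the first example below: superApply(1:100, myFun, ...)
# is intended to return the same list, in the same order, as this local call.
myFun <- function(x) rep(x, 3)
localRes <- lapply(1:100, myFun)
# localRes[[5]] is the integer vector 5 5 5, i.e. identical to rep(5L, 3)
```

A successful superApply call with tasks = 4 splits the 100 iterations across 4 SLURM jobs but still compiles the 100 results, in order, into one list like localRes.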
# NOT RUN {
#------------------------
# Parallel execution of 100 function calls using 4 parallel tasks
myFun <- function(x) {
    #Sys.sleep(10)
    return(rep(x, 3))
}
dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G")

#------------------------
# Parallel execution of 100 function calls using 100 parallel tasks
sapOut <- superApply(1:100, FUN = myFun, tasks = 100, workingDir = "~/testSap", time = "60", mem = "1G")

#------------------------
# Parallel execution where a package is required in function calls
myFun <- function(x) {
    return(ggplot(data.frame(x = 1:100, y = (1:100) * x), aes(x = x, y = y)) + geom_point() + ylim(0, 1e4))
}
dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", packages = "ggplot2", time = "60", mem = "1G")

#------------------------
# Parallel execution where R has to be loaded in the system (e.g. in bash `module load R`)
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G", extraBashLines = "module load R")

#------------------------
# Parallel execution where a source is required in function calls

# Content of ./customRep.R
customRep <- function(x) {
    return(paste("customFunction", rep(x, 3)))
}

# superApply execution
myFun <- function(x) {
    return(customRep(x))
}
dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, sources = "./customRep.R", workingDir = "~/testSap", time = "60", mem = "1G")
# }