Friday Exercise 2.2: Workflow Optimization and Scaling

If you finish the entire workflow and are thirsty for more, try any of the following in whatever order you like:

Bonus 1

Rerun the DAG with four times as many permutations per job (and correspondingly fewer processes, so that the total stays at 100,000 permutations per trait). Which DAG finished faster overall? Why?
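
The arithmetic behind "four times the permutations, fewer processes" can be sketched as follows. Only the 100,000 total comes from this exercise; the starting value of 1,000 permutations per job is an assumed placeholder, so substitute whatever your own submit file uses.

```shell
#!/bin/sh
# Sketch: how many processes are needed to keep the total fixed.
total=100000                         # permutations per trait (from the exercise)
per_job=1000                         # ASSUMED permutations per job in the original run
old_jobs=$((total / per_job))        # processes in the original run
new_per_job=$((per_job * 4))         # four times the permutations per job
new_jobs=$((total / new_per_job))    # processes needed for the rerun
echo "original: $old_jobs processes x $per_job permutations"
echo "rerun:    $new_jobs processes x $new_per_job permutations"
```

With these assumed numbers, the rerun queues a quarter as many processes, each of which runs roughly four times as long.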

Bonus 2

You probably noticed that the job processes from the permutation step create many log, out, and error files. Modify the permutation submit files to organize these files into subdirectories (check out HTCondor's InitialDir feature and/or the DAG's DIR feature). While testing, you may wish to run the DAG with fewer permutations and fewer permutation processes for a quick turnaround.
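
As one possible starting point, here is a minimal sketch of the file-organization lines in a permutation submit file. The directory name, file names, and queue count are hypothetical and not taken from the course materials. It assumes the directory named by `initialdir` already exists before submission (for example, created with `mkdir`), since HTCondor does not create it for you.

```
# Fragment of a permutation submit file (all names here are hypothetical).
# Relative paths for log, output, and error are resolved against initialdir.
initialdir = perm_files
log        = perm_$(Cluster).log
output     = perm_$(Cluster)_$(Process).out
error      = perm_$(Cluster)_$(Process).err

# Small process count for a quick test run.
queue 10
```

The DAG-level alternative is to add a directory to the node definition, in the form `JOB <name> <submit file> DIR <directory>`. Check the DAGMan documentation for how relative paths are resolved in that case before relying on it.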

Bonus 3

Take the workflow to the submit server for the Open Science Grid, and run it there.

What happens?

Bonus 4

This isn't actually a bonus, but here are links to a sample workflow diagram and DAG schematic:

And also how to download and look at a solution workflow:

$ wget
$ tar -xzf WorkflowComplete.tar.gz