Muck

Muck is a build tool for data analysis projects. Given a target (a file to be built), it looks in the current directory tree for a source file with a matching name, determines its static dependencies, recursively builds those, and then builds the target, building any additional dynamic dependencies as they are discovered. For example, if we ask Muck to build some.txt, it will run the source file that has some.txt as a prefix, e.g. some.txt.py, some.txt.sh, or some.txt.md (there must be exactly one source candidate). If some.txt.py opens data.txt, Muck will suspend the execution of the process, update data.txt, and then let the process continue.
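The name-matching rule above can be sketched in a few lines of Python. This is an illustration only, not Muck's actual code: the function names `source_candidates` and `find_source` are hypothetical, and the sketch assumes the source file lives in the same directory as the target.

```python
import os
from typing import List


def source_candidates(target: str, root: str = ".") -> List[str]:
    """List files whose name is the target name plus a further extension.

    Assumption: candidates sit in the same directory as the target.
    """
    target_dir, target_name = os.path.split(target)
    prefix = target_name + "."
    names = sorted(os.listdir(os.path.join(root, target_dir)))
    return [os.path.join(target_dir, n) for n in names if n.startswith(prefix)]


def find_source(target: str, root: str = ".") -> str:
    """Return the single source for a target, or raise if there is not exactly one."""
    candidates = source_candidates(target, root)
    if len(candidates) != 1:
        raise LookupError(f"expected one source for {target!r}, found {candidates}")
    return candidates[0]
```

With only some.txt.py present, `find_source("some.txt")` returns "some.txt.py"; if some.txt.sh also exists, the lookup is ambiguous and the sketch raises an error.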

Unlike Make and other traditional build systems, Muck does not use a "makefile". Instead, Muck determines the dependencies of a given file using static analysis and runtime interposition of the Unix open system call. With Muck, programmers can organize projects into discrete steps with arbitrary file dependencies between them. When the source code for a particular step changes, Muck will rebuild that step and all dependent ("downstream") steps, but will not redo any work that is not affected by the change. This incremental rebuild behavior speeds up the development process and helps prevent errors due to stale product files.
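To make the "downstream" idea concrete, here is a minimal sketch of how the set of stale products could be computed once the dependency graph is known. The file names and the `downstream` function are hypothetical, and Muck's real bookkeeping may differ.

```python
from collections import deque
from typing import Dict, List, Set


def downstream(changed: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every product that directly or indirectly depends on `changed`.

    `deps` maps each product to the files it depends on.
    """
    # Invert the graph: for each file, which products depend on it?
    dependents: Dict[str, Set[str]] = {}
    for product, sources in deps.items():
        for source in sources:
            dependents.setdefault(source, set()).add(product)

    # Breadth-first walk along the inverted edges.
    stale: Set[str] = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for product in dependents.get(node, ()):
            if product not in stale:
                stale.add(product)
                queue.append(product)
    return stale


# Hypothetical project: editing clean.csv.py invalidates clean.csv and
# report.html, but leaves other.txt alone.
deps = {
    "clean.csv": ["clean.csv.py", "raw.csv"],
    "report.html": ["report.html.wu", "clean.csv"],
    "other.txt": ["other.txt.py"],
}
print(sorted(downstream("clean.csv.py", deps)))  # ['clean.csv', 'report.html']
```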

Muck is most useful for projects where the various products can be given descriptive, discrete names. It is less useful for problems that can be framed as processing a continuous stream of inputs; these are better served by an application server.

Getting Started

Muck is a work in progress. I encourage people to try it out, with the caveat that it is not yet entirely stable. If you run into issues, I am more than happy to help you work through them. The project is hosted at https://github.com/gwk/muck, with documentation at https://gwk.github.io/muck. To get started, read the "Installation" section.

License

All of the source code and documentation is dedicated to the public domain under CC0: https://creativecommons.org/publicdomain/zero/1.0/.

Status

Muck is still in development. Currently it runs only on macOS. Linux support has recently fallen behind, but is planned to return soon. Muck has been used for a variety of experimental projects, but more work is needed to make it production-ready; in particular, the test suite and documentation need improvement.

Issues

Please file any bugs, questions, or comments at https://github.com/gwk/muck/issues.

Installation

Python 3.6

Muck requires Python 3.6. (In the future it will support running project scripts in any available version.)

Git

To install the latest revision of Muck:

cd external # Or whatever directory you like to check out code.
git clone git@github.com:gwk/muck.git # Clones the Muck repository.
muck/update-subs.sh # Check out Muck's dependencies as git submodules.
pip3 install -e muck # Install Muck from the repository in editable/development mode. Substitute `pip3.6` or other as appropriate.

Pip

Muck is available via the Python Package Index (PyPI), but due to ongoing development the published version is often out of date. Once Muck enters a more stable phase of development, users should install with pip3 install muck.

Troubleshooting

Once installed, muck should be available on the command line. If it is not, the PATH environment variable in your shell may not include the directory where pip installs console scripts. If that is the case, please open an issue and I will be happy to help you get started.

Usage

muck takes a list of targets as command line arguments. If no targets are provided it defaults to index.html.
Given a target dir/stem.ext, muck tries to produce the target file using the following steps:

When a script is run, it is first analyzed by Muck for any static dependencies, which will be updated before running the script. Dependency analysis is limited to source languages that Muck understands. Additionally, Muck intercepts calls to open at the OS level, using DYLD_INSERT_LIBRARIES on macOS and eventually LD_PRELOAD on Linux. This information is used to calculate a complete dependency graph for arbitrary processes on the fly.
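As a rough illustration of the static half of this analysis, the sketch below scans Python source for calls to open whose first argument is a string literal. It is not Muck's implementation: the function name `static_open_deps` is hypothetical, the sketch ignores the open mode (so files opened for writing are also reported), and it relies on `ast.Constant` string nodes, which the parser produces on Python 3.8 and later.

```python
import ast
from typing import List


def static_open_deps(source: str) -> List[str]:
    """Return the literal paths passed to open() in `source`, sorted."""
    paths = set()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "open"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            paths.add(node.args[0].value)
    return sorted(paths)
```

A call such as `open(name)`, where the path is computed at runtime, is invisible to this kind of scan; those are the cases that the interception of open is there to catch.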

An Example

Suppose we want to produce a document that contains two charts, A and B. Each of these is derived from some data that we scrape from the web. A typical Muck project to achieve this would consist of four files:

This project has a dependency graph whose shape is essentially a diamond, with document.html at the root, and data.csv.py having no dependencies. Note that .wu files indicate a markup format called Writeup, similar to Markdown. Markdown support is in progress.
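The diamond shape can be written down as a small graph and walked depth-first to see why the shared data is built only once. This is a sketch under assumptions: the chart names chartA.svg and chartB.svg are placeholders invented for illustration, and `build_order` is a hypothetical helper rather than part of Muck.

```python
from typing import Dict, List, Optional


def build_order(
    target: str, deps: Dict[str, List[str]], order: Optional[List[str]] = None
) -> List[str]:
    """Return products in the order a recursive build would complete them."""
    if order is None:
        order = []
    if target in order:
        return order  # Already built via another path through the graph.
    for dep in deps.get(target, []):
        build_order(dep, deps, order)
    order.append(target)
    return order


# Diamond: the document needs both charts, and both charts need the data.
# Chart names are placeholders, not taken from the original project.
diamond = {
    "document.html": ["chartA.svg", "chartB.svg"],
    "chartA.svg": ["data.csv"],
    "chartB.svg": ["data.csv"],
    "data.csv": [],
}
print(build_order("document.html", diamond))
# ['data.csv', 'chartA.svg', 'chartB.svg', 'document.html']
```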

The programmer can build the document by invoking muck document.html on the command line. Muck will then take the following steps:

Credits

Muck is developed by George King, and was initially sponsored by the Tow Center for Digital Journalism. Professor Mark Hansen and the Brown Institute have supported this research since it began in 2015. Gabe Stein has collaborated on design and testing.