Muck is a build tool for data analysis projects. Given a target (a file to be built), it looks in the current directory tree for a source file with a matching name, determines its static dependencies, recursively builds those, and then builds the target, possibly building additional dynamic dependencies as necessary. For example, if we ask Muck to build `some.txt`, it will run any source file with `some.txt` as a prefix, e.g. `some.txt.md` (there must be a single source candidate). If
the running process then opens `data.txt`, Muck will suspend the execution of the process and update `data.txt` before letting it continue.
Unlike Make and other traditional build systems, Muck does not use a "makefile". Instead, Muck determines the dependencies of a given file using static analysis and runtime interposition of the Unix `open` system call. With Muck, programmers can organize projects into discrete steps with arbitrary file dependencies between them. When the source code for a particular step changes, Muck will rebuild that step and all dependent ("downstream") steps, but will not redo any work that is not affected by the change. This incremental rebuild behavior speeds up the development process and helps prevent errors due to stale product files.
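To make the "downstream" idea concrete, here is a minimal sketch of how the set of steps to rebuild could be computed from a dependency graph. It is an illustration only, not Muck's actual implementation; the function name `downstream_of` and the example file names are hypothetical.

```python
from collections import defaultdict

def downstream_of(changed, deps):
    """Return every target that directly or transitively depends on `changed`.

    `deps` maps each target to the list of files it reads.
    """
    # Invert the graph: file -> set of targets that read it.
    dependents = defaultdict(set)
    for target, sources in deps.items():
        for source in sources:
            dependents[source].add(target)

    # Walk the inverted graph from the changed file.
    stale = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for dependent in dependents[node]:
            if dependent not in stale:
                stale.add(dependent)
                stack.append(dependent)
    return stale

# Hypothetical project: two products share one cleaning step.
deps = {
    'clean.csv': ['clean.csv.py', 'raw.csv'],
    'summary.txt': ['summary.txt.py', 'clean.csv'],
    'plot.svg': ['plot.svg.py', 'clean.csv'],
    'notes.html': ['notes.html.md'],
}

print(sorted(downstream_of('clean.csv.py', deps)))
# → ['clean.csv', 'plot.svg', 'summary.txt']
```

In this sketch, editing `clean.csv.py` marks three products as stale, while `notes.html` is untouched because nothing connects it to the change.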
Muck is most useful for projects where the various products can be given descriptive, discrete names. It is less useful for problems that can be framed as processing a continuous stream of inputs; these are better served by an application server.
Muck is a work in progress. I encourage people to try it out, with the caveat that it is not yet entirely stable. If you run into issues, I am more than happy to help you work through them. The project is hosted at https://github.com/gwk/muck, with documentation at https://gwk.github.io/muck. To get started, read the "Installation" section.
All of the source code and documentation is dedicated to the public domain under CC0: https://creativecommons.org/publicdomain/zero/1.0/.
Muck is still in development. Currently it only runs on macOS, but Linux support is coming soon. It has been used for a variety of experimental projects, but more work is needed to make it production-ready. In particular, Linux support has recently fallen behind, and the test suite and documentation need improvement.
Please file any bugs, questions, or comments at https://github.com/gwk/muck/issues.
Muck requires Python 3.6. (In the future it will support running project scripts in any available version).
Make sure that the `python3` and `pip3` that you are using are what you think they are, for example by running `pip3 --version`. You can also use `pip3.6` (or a later version) if you do not want to change your default installation.
To install the latest revision of Muck:
cd external # Or whatever directory you like to check out code.
git clone git@github.com:gwk/muck.git # Clones the Muck repository.
muck/update-subs.sh # Check out Muck's dependencies as git submodules.
pip3 install -e muck # Install Muck from the repository in editable/development mode. Substitute `pip3.6` or other as appropriate.
Muck is available via the Python Package Index (PyPI), but due to ongoing development the published version is often out of date. Once Muck enters a more stable phase of development, users should install with `pip3 install muck`.
`muck` should be available on the command line. If it is not, it may be that the `PATH` environment variable in your shell is not configured to point to console scripts installed by pip. Please open an issue and I am happy to help get you started.
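If the command is missing, a quick check from Python can narrow down the cause. The sketch below is a diagnostic aid rather than part of Muck; the helper name `diagnose_command` is hypothetical, and it assumes a regular (non `--user`) pip install, since user installs place scripts in a different directory.

```python
import os
import shutil
import sysconfig

def diagnose_command(name='muck'):
    """Return (path of `name` on PATH or None, this Python's scripts directory)."""
    found = shutil.which(name)
    scripts_dir = sysconfig.get_path('scripts')
    return found, scripts_dir

found, scripts_dir = diagnose_command()
if found:
    print(f'muck found at {found}')
else:
    # Check whether the directory pip installs console scripts into is on PATH.
    path_entries = os.environ.get('PATH', '').split(os.pathsep)
    print(f'muck not found; scripts directory {scripts_dir} on PATH: {scripts_dir in path_entries}')
```

Run it with the same interpreter that `pip3` belongs to, otherwise the reported scripts directory will not be the relevant one.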
muck takes a list of targets as command line arguments. If no targets are provided it defaults to
Given a target `dir/stem.ext`, muck tries to produce the target file using the following steps:
dir/stem.ext.py, then that file is executed and its standard output is written to
dir/stem/ext.py, and the resulting product is then executed as a source file.
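The source lookup described above can be sketched as a prefix match on file names. This is a simplified illustration under stated assumptions, not Muck's real lookup: the function `find_source` is hypothetical, and it assumes that a source for `dir/stem.ext` sits in `dir` and is named `stem.ext.<something>`.

```python
import os
import tempfile

def find_source(target, project_dir='.'):
    """Return the single source file whose name has the target name as a prefix.

    Assumption: a source for `dir/stem.ext` is a file in `dir` named
    `stem.ext.<suffix>`, e.g. `stem.ext.py`.
    """
    directory, name = os.path.split(target)
    search_dir = os.path.join(project_dir, directory)
    candidates = sorted(
        entry for entry in os.listdir(search_dir)
        if entry.startswith(name + '.')
    )
    # There must be exactly one candidate; zero or several is an error.
    if len(candidates) != 1:
        raise LookupError(f'{target}: expected exactly one source candidate, found {candidates}')
    return os.path.join(directory, candidates[0])

# Usage: build a throwaway project containing two source files.
with tempfile.TemporaryDirectory() as project:
    os.makedirs(os.path.join(project, 'dir'))
    for filename in ('stem.ext.py', 'other.txt.py'):
        open(os.path.join(project, 'dir', filename), 'w').close()
    print(find_source('dir/stem.ext', project))
```

Raising an error for ambiguous matches mirrors the requirement that there be a single source candidate for each target.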
When a script is run, it is first analyzed by Muck for any static dependencies, which will be updated before running the script. Dependency analysis is limited to source languages that Muck understands. Additionally, Muck intercepts calls to `open` at the OS level, using `DYLD_INSERT_LIBRARIES` on macOS and eventually `LD_PRELOAD` on Linux. This information is used to calculate a complete dependency graph for arbitrary processes on the fly.
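As a rough illustration of what static analysis can and cannot see, the following sketch scans Python source for `open(...)` calls whose first argument is a string literal. It is an assumption about one plausible approach, not a description of Muck's analyzer; the function `static_open_dependencies` is hypothetical, and the code relies on the `ast.Constant` node used by Python 3.8 and later.

```python
import ast

def static_open_dependencies(source_text):
    """Return paths passed as string literals to `open(...)` in Python source."""
    found = []
    for node in ast.walk(ast.parse(source_text)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == 'open'
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return found

script = '''
import csv
rows = list(csv.reader(open('data.csv')))
name = 'late.txt'
extra = open(name)  # Not a literal: only visible at run time.
'''

print(static_open_dependencies(script))
# → ['data.csv']
```

The second `open` call is invisible to this kind of analysis because its path is computed at run time, which is the case that intercepting `open` at the OS level is meant to cover.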
Suppose we want to produce a document that contains two charts, A and B. Each of these is derived from some data that we scrape from the web. A typical Muck project to achieve this would consist of four files: `document.html.wu`, `chartA.svg.py`, `chartB.svg.py`, and `data.csv.py`.
This project has a dependency graph whose shape is essentially a diamond, with `document.html` at the root, and `data.csv.py` having no dependencies. Note that `.wu` files indicate a markup format called Writeup, similar to Markdown. Markdown support is in progress.
The programmer can build the document by invoking `muck document.html` on the command line. Muck will then take the following steps:
- `document.html`: does not exist; infer source:
  - `document.html.wu`: analyze, discover dependencies:
    - `chartA.svg`: does not exist; infer source:
      - `chartA.svg.py`: analyze, discover dependencies:
        - `data.csv`: does not exist; infer source:
          - `python3 data.csv.py`, writing stdout to `_build/data.csv`.
      - `python3 chartA.svg.py`, which reads `_build/data.csv` and writes stdout to `_build/chartA.svg`.
    - `chartB.svg`: does not exist; infer source:
      - `python3 chartB.svg.py`, which reads `_build/data.csv` and writes stdout to `_build/chartB.svg`.
  - `writeup document.html.wu`, which reads `_build/chartA.svg` and `_build/chartB.svg`, and writes `_build/document.html`.
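The order of these steps follows from a depth-first traversal of the diamond-shaped graph in which each product is built at most once. The sketch below reproduces that order for the example project. It is a simplified model, not Muck's code: the `build` function is hypothetical, the graph is written out by hand instead of being discovered, and no scripts are actually run.

```python
def build(target, deps, built, log):
    """Depth-first build: update dependencies first, and build each target once."""
    if target in built:
        return
    for dependency in deps.get(target, []):
        build(dependency, deps, built, log)
    built.add(target)
    log.append(target)

# The example project's products and what each one reads.
deps = {
    'document.html': ['chartA.svg', 'chartB.svg'],
    'chartA.svg': ['data.csv'],
    'chartB.svg': ['data.csv'],
    'data.csv': [],
}

log = []
build('document.html', deps, set(), log)
print(log)
# → ['data.csv', 'chartA.svg', 'chartB.svg', 'document.html']
```

Note that `data.csv` appears only once in the log: by the time `chartB.svg` asks for it, it has already been built for `chartA.svg`.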
Muck is developed by George King, and was initially sponsored by the Tow Center for Digital Journalism. Professor Mark Hansen and the Brown Institute have supported this research since it began in 2015. Gabe Stein has collaborated on design and testing.