Introducing the FASTEN project
A popular form of software reuse is the use of Open-Source Software (OSS) libraries, hosted on centralized code repositories, such as Maven or NPM. Developers only need to declare dependencies to external libraries, and automated tools make them available to the workspace of the project.
In recent years, we have seen package management fail in spectacular ways:
- In the left-pad incident, a developer broke a significant part of the Internet by just removing a package from NPM.
- Equifax lost $4 billion because they deemed a security update unnecessary.
- A Linux kernel developer engaged in a series of litigation actions against tens of companies, claiming lost revenue due to what he viewed as improper enforcement of the GPL in a transitively derivative project.
- A recent study by Lauinger et al. found that 1 out of 3 top sites uses at least one library with a known vulnerability.
The list goes on. Package management, and its repercussions, is a topic that affects the daily lives of millions of developers and users, but it has only received moderate attention from researchers.
Last spring, I led a group of 7 partners towards the submission of a project proposal to the H2020-ICT-18 call. Then, in August, we learned that the European Commission granted our consortium a significant amount of money to make package management more intelligent!
The core idea behind FASTEN is really simple: instead of analyzing dependencies at the package level, we will analyze them at the call graph level! This will allow us to be super precise when we track dependencies, when we do change impact analysis, when we recommend that clients update packages, and so on. It will also open the door to new sophisticated applications, e.g. licensing compliance, dependency risk profiling and data-driven API evolution.
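To make the difference concrete, here is a minimal sketch in Python. The package names, function names and the toy graph are all invented for illustration; this is not FASTEN data or a FASTEN API. A package-level check flags every project that depends on the vulnerable package, while a call-graph-level check flags only the projects whose code can actually reach the vulnerable function.

```python
from collections import deque

# Hypothetical call graph: caller -> callees, with names written "package:function".
CALL_GRAPH = {
    "app_a:main": ["libx:parse"],
    "app_b:main": ["libx:render"],
    "libx:parse": ["libx:_unsafe_eval"],
    "libx:render": [],
    "libx:_unsafe_eval": [],
}

# Hypothetical package-level dependency declarations.
PACKAGE_DEPS = {"app_a": {"libx"}, "app_b": {"libx"}}

VULNERABLE_FUNCTION = "libx:_unsafe_eval"


def package_of(function):
    """Return the package part of a "package:function" name."""
    return function.split(":", 1)[0]


def affected_at_package_level(deps, vulnerable_function):
    """Flag every package that declares a dependency on the vulnerable package."""
    vulnerable_package = package_of(vulnerable_function)
    return sorted(pkg for pkg, required in deps.items() if vulnerable_package in required)


def reaches(graph, start, target):
    """Breadth-first search: can `start` call `target`, directly or transitively?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for callee in graph.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return False


def affected_at_call_graph_level(graph, entry_points, vulnerable_function):
    """Flag only packages whose entry point can reach the vulnerable function."""
    return sorted(
        package_of(entry)
        for entry in entry_points
        if reaches(graph, entry, vulnerable_function)
    )


if __name__ == "__main__":
    print(affected_at_package_level(PACKAGE_DEPS, VULNERABLE_FUNCTION))
    print(affected_at_call_graph_level(
        CALL_GRAPH, ["app_a:main", "app_b:main"], VULNERABLE_FUNCTION))
```

In this toy example both applications depend on `libx`, but only `app_a` calls into the vulnerable code path, so only `app_a` would need an urgent update.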
As is usual in such cases, while the idea sounds simple and straightforward, its practical implementation, as we learned in our related work on Rust, is anything but. Static call graph generators are unsound; modern features in programming languages (dynamic dispatch, extensible classes) complicate static call graph generation; in many languages, projects need to be built before constructing call graphs; the generated graphs are huge; the queries we will need to run will bring current graph databases to their knees. However, the accuracy benefits of creating ecosystem-level, versioned call graphs outweigh the drawbacks. In our preliminary Rust study, we found that when pinpointing vulnerable packages, accuracy can be improved by 3x, the issues mentioned above notwithstanding.
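As a small illustration of why dynamic dispatch and extensible classes complicate static call graph generation, consider the following Python sketch (all class and function names are invented). Looking only at `render`, an analyzer cannot tell which `area` implementation will run. One conservative strategy, in the spirit of class hierarchy analysis, is to add an edge to every known override; this gives too many edges, and it still misses any subclass defined later in another package.

```python
import math


class Shape:
    def area(self):
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


def render(shape):
    # The call target depends on the runtime type of `shape`,
    # which is not visible at this call site.
    return shape.area()


def possible_targets(base, method):
    """Over-approximate call targets: every class in the hierarchy
    rooted at `base` that defines `method` itself."""
    targets = []
    stack = [base]
    while stack:
        cls = stack.pop()
        if method in vars(cls):
            targets.append(f"{cls.__name__}.{method}")
        stack.extend(cls.__subclasses__())
    return sorted(targets)


if __name__ == "__main__":
    print(render(Square(2)))
    print(possible_targets(Shape, "area"))
```

A single call site in `render` therefore turns into three candidate edges, only one of which is taken in any given execution.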
Our vision for better dependency management goes beyond giant call graphs. Our goal, and also the project's raison d'être, is to bring the benefits of fine-grained dependency tracking into the hands of developers. To do this, we will create a continuously updated service that automatically analyzes all package versions in the Java, Python and C (Debian) ecosystems and maintains the call graphs centrally. On top of this, we will create processes that read data from external sources (e.g. security disclosures, GitHub analytics) and enrich the graph by appending information to the graph nodes (functions). To compensate for inefficiencies in call graph generation, we will allow clients to upload call graphs generated by running a project's test suite; this will allow us to enrich the graph edges (function calls). We will implement custom analyses (e.g. vulnerability tracking) as efficient traversals on our graph. And, importantly, we will create plug-ins for Maven and PyPI, to enable developers and CI environments to query the FASTEN knowledge base in a way that looks like this:
    $ pip list
    docutils (0.10)
    Jinja2 (2.7.2)
    MarkupSafe (0.18)

    $ pip check-security
    Jinja2 (2.7.2) has known vulnerabilities (your project is affected!)
    Update to version >=2.7.3 (will not break your project)

    $ pip test-upgrade Jinja2 --version 2.8
    Upgrading to Jinja2 2.8 will break the following methods:
    myproject.foo()
    myproject.bar()

    $ pip what-breaks --delete myproject.foo
    The following direct dependencies will break if you *delete* function foo()
    * projectA: 15 methods use foo()
    * projectB: 10 methods use foo()
    632 indirect dependencies will fail to work.

    $ pip test --upload-dyngraph
    ............15 Tests run OK!
    Dynamic call graph at: myproject.dot
    Uploading dynamic call graph to FASTEN
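A query like the envisioned `what-breaks` could, in principle, be answered by walking the call graph backwards from the deleted function. The Python sketch below is only an assumption about how such a traversal might look; the edge lists, project names and helper functions are hypothetical and not part of FASTEN. It also shows how edges observed while running a test suite could be merged with statically derived ones.

```python
from collections import defaultdict, deque

# Hypothetical statically derived edges: (caller, callee).
STATIC_EDGES = [
    ("projectA:run", "myproject:foo"),
    ("projectB:load", "myproject:foo"),
    ("projectC:start", "projectA:run"),
]

# Hypothetical edge seen only at run time (e.g. a reflective call
# that static analysis missed), uploaded from a test run.
DYNAMIC_EDGES = [("projectD:plugin", "myproject:foo")]


def build_reverse_graph(*edge_lists):
    """Merge any number of edge lists into a callee -> callers index."""
    callers = defaultdict(set)
    for edges in edge_lists:
        for caller, callee in edges:
            callers[callee].add(caller)
    return callers


def what_breaks(callers, deleted):
    """Return (direct, indirect) callers of `deleted` via reverse BFS."""
    direct = set(callers.get(deleted, ()))
    seen = set(direct)
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in seen and caller != deleted:
                seen.add(caller)
                queue.append(caller)
    return sorted(direct), sorted(seen - direct)


if __name__ == "__main__":
    static_only = build_reverse_graph(STATIC_EDGES)
    merged = build_reverse_graph(STATIC_EDGES, DYNAMIC_EDGES)
    print(what_breaks(static_only, "myproject:foo"))
    print(what_breaks(merged, "myproject:foo"))
```

With only the static edges, `projectD:plugin` goes unnoticed; once the dynamic edges are merged in, it shows up as a direct caller that would break.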
An incredible group of researchers and practitioners with a passion for research that has impact is collaborating in the FASTEN project. I list them below, along with the main contact point per partner (many of them are hiring 😊):
- Software Engineering Research group, TU Delft, Georgios Gousios, PI
- Business Analytics Lab, Athens University of Economics and Business, Diomidis Spinellis
- Laboratory for Web Algorithmics, U Milano, Sebastiano Vigna
- XWiki SAS, makers of the XWiki collaboration platform, Vincent Massol
- Endocode AG, experts in OSS licensing, Mirco Boehm
- Software Improvement Group, experts in software quality, Magiel Bruntink
- OW2 consortium, experts in OSS communities, Cédric Thomas
At TU Delft, we have openings for 2 PhD students and 1 scientific programmer. If you are up for a serious research challenge, with ample opportunity to use your hacking skills to impact everyday software development, we would like to hear from you!