Supporting the Comprehension of Data Analysis Scripts
Florian Sihler, Oliver Gerstl, Lars Pfrenger, Julian Schubert, Matthias Tichy
TLDR
flowR is an IDE extension for R that improves data analysis script comprehension and maintainability through dataflow visualization and static analysis.
Key contributions
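- Introduces flowR, an extension for the Positron and VS Code IDEs that helps analysts comprehend R data analysis scripts.
- Offers script overviews, interactive dataflow graph visualizations, linting, and inline value annotations.
- Incrementally builds a comprehensive dataflow graph by intertwining interprocedural data- and control-flow analyses, taking R's dynamic and explorative features into account.
- Computes the full dataflow graph of real-world projects in 576 ms on average, enabling near real-time feedback, and supports custom analyses (e.g., new linting rules or visualizations) through a plugin system.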
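To illustrate how a linting rule can build on a dataflow graph, here is a minimal TypeScript sketch that flags variables which are assigned but never read. The `Graph`, `LintRule`, and `unusedDefinition` names are hypothetical and do not reflect flowR's actual plugin or linter interfaces; the graph for the four-line R script in the comments is written by hand rather than computed by an analysis.

```typescript
// Hypothetical, heavily simplified shapes; these are NOT flowR's real interfaces.
interface Definition {
  id: number;       // node id of the assignment
  variable: string; // name of the assigned variable
  line: number;     // source line of the assignment
}

interface Graph {
  definitions: Definition[];
  // "reads" edges: the use at node `from` reads the value defined at node `to`
  reads: Array<{ from: number; to: number }>;
}

interface LintResult {
  line: number;
  message: string;
}

interface LintRule {
  name: string;
  check(graph: Graph): LintResult[];
}

// Flags every definition that no "reads" edge points to.
const unusedDefinition: LintRule = {
  name: "unused-definition",
  check(graph: Graph): LintResult[] {
    const readTargets = new Set(graph.reads.map((edge) => edge.to));
    return graph.definitions
      .filter((def) => !readTargets.has(def.id))
      .map((def) => ({
        line: def.line,
        message: `'${def.variable}' is assigned but never read`,
      }));
  },
};

// Hand-built graph for the R script:
//   1: x <- 1
//   2: y <- 2
//   3: z <- x + 3
//   4: print(z)
const lintGraph: Graph = {
  definitions: [
    { id: 1, variable: "x", line: 1 },
    { id: 2, variable: "y", line: 2 },
    { id: 3, variable: "z", line: 3 },
  ],
  reads: [
    { from: 3, to: 1 }, // z <- x + 3 reads x
    { from: 4, to: 3 }, // print(z) reads z
  ],
};

for (const result of unusedDefinition.check(lintGraph)) {
  console.log(`line ${result.line}: ${result.message}`);
}
```

Running the sketch reports only `y` on line 2. A realistic rule would also have to handle R specifics such as function parameters, `<<-` assignments, and `assign()` calls, all of which this sketch ignores.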
Why it matters
Data analysis scripts are often hard to comprehend and maintain, which hinders reproducibility and reuse, and dedicated tool support for them is scarce. flowR brings static analysis and visualization into the IDE, making R scripts easier to understand and maintain for data analysts.
Original Abstract
A lot of research relies on data analysis scripts to process, clean, and visualize data. However, recent studies show that these scripts are often hard to comprehend and maintain, hindering reproducibility and reuse, accompanied by a lack of tool support for handling such scripts. In this work, we focus on the R programming language, addressing this problem by presenting flowR as an extension for the common data analysis IDEs Positron and VS Code. Alongside a previously presented static backward program slicer, flowR provides an overview of data analysis scripts, interactive graph visualizations, linting, and inline value annotations to support data analysts. FlowR incrementally analyzes R projects by intertwining interprocedural data- and control-flow analyses to build a comprehensive dataflow graph, incorporating R's dynamic and explorative features. Additionally, flowR offers a plugin system and interfaces, allowing the integration of further analyses, such as new linting rules or custom visualizations. Requiring an average of 576ms to calculate the full dataflow graph of real-world projects, this enables near real-time feedback. The demonstration video is available at https://youtu.be/hJzr-r-NmMg . For the full source code and extensive documentation, refer to https://github.com/flowr-analysis/flowr . To try the docker image, use `docker run --rm -it eagleoutice/flowr`.
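The abstract mentions a static backward program slicer. In general, a backward slice can be phrased as a reachability question on a dataflow graph: starting from a criterion, collect every statement it transitively depends on. The TypeScript sketch below shows that idea only; the `DataflowGraph` shape and `backwardSlice` function are hypothetical, the graph is hand-built, and unlike flowR the sketch ignores control dependencies, function calls, and R's dynamic features.

```typescript
// Hypothetical, simplified dataflow graph: one node per R statement.
// An entry `a -> [b]` in `readsFrom` means "statement a reads a value defined by b".
type NodeId = number;

interface DataflowGraph {
  code: Map<NodeId, string>;          // statement text, for readability only
  readsFrom: Map<NodeId, NodeId[]>;   // data dependencies per node
}

// Backward slice: all nodes the criterion transitively depends on (worklist traversal).
function backwardSlice(graph: DataflowGraph, criterion: NodeId): NodeId[] {
  const visited = new Set<NodeId>();
  const worklist: NodeId[] = [criterion];
  while (worklist.length > 0) {
    const current = worklist.pop()!;
    if (visited.has(current)) continue;
    visited.add(current);
    for (const dep of graph.readsFrom.get(current) ?? []) {
      if (!visited.has(dep)) worklist.push(dep);
    }
  }
  // Return the node ids in source order.
  return Array.from(visited).sort((a, b) => a - b);
}

// Hand-built graph for the R script:
//   1: x <- 1
//   2: y <- 2
//   3: z <- x + 3
//   4: print(z)
const dfg: DataflowGraph = {
  code: new Map<NodeId, string>([
    [1, "x <- 1"],
    [2, "y <- 2"],
    [3, "z <- x + 3"],
    [4, "print(z)"],
  ]),
  readsFrom: new Map<NodeId, NodeId[]>([
    [3, [1]], // z <- x + 3 reads x
    [4, [3]], // print(z) reads z
  ]),
};

const slice = backwardSlice(dfg, 4);
console.log(slice.map((id) => dfg.code.get(id)).join("\n"));
```

Starting from `print(z)` (node 4), the slice keeps `x <- 1`, `z <- x + 3`, and `print(z)` and drops `y <- 2`, which cannot influence the printed value.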