STATOR:Establishing code-base
Here we list pros and cons of all our approaches towards establishing STATOR's code-base. All pros/cons are listed in the order from the most important to the least one.
- TODO:
- Update information in all pros/cons points in all subsections below such that free tools like PAGAI, FramaC, UFO, CPROVER, CBMC, KLEE, Bugst, etc. are included.
Contents
- 1 Perspective 1: Main implementation language/platform
- 2 Perspective 2: Programs in what languages should STATOR analyse
- 3 Perspective 3: What sort of programs we consider
- 4 Perspective 4: Build STATOR on top on a single freely available tool
- 5 Perspective 5: Build STATOR on top of a code collected from several different sources (tools)
- 6 Open questions
- 7 Results
Perspective 1: Main implementation language/platform
Java
- pros
- We can take a lot of code from tools like CPAchecker and Stanse with minimal effort.
- Huge standard library
- There several very good IDEs for development of Java applications.
- It is stable and portable to many different platforms.
- George already implemented its analysis there and no issues appeared.
- Many students know it.
- cons
- Implementation of some program analyses may be slower then the one in programming languages like OCaml or F#. Nevertheless, it is quite difficult to predict impact of this potentially increased programming effort.
F# (.NET/Mono)
- pros
- May provide faster implementation of some program analyses then in Java, because symbolic manipulations of particular data structures (like syntax trees, expressions, BDD,...)
- We can take a C front end from tools like CPAchecker, Stanse, Bugst, or Predator with minimal effort. It only requires to dump internal CFG representations of the tools to disc and load them into equivalent data structures syntactically expressed in F#.
- When we are in .NET we can freely use any other .NET language (C#, Visual basic, managed C++, ...). Since many students know C# and C# is similar to (or better then) Java, so from this point of view F# is better choice then Java. Moreover, there is also implemented integration of non-managed languages, but portability is then a question.
- We can move to Windows and thus develop STATOR in Visual Studio. (Visual Studio can really speed-up development (e.g. excellent debugging, edit&continue, ...)
- cons
- Development in Mono can be less comfortable. Also support of F# in Mono known to us yet. Similarly, state of standard library is not know.
- Portability mostly depends on qualities of Mono.
- Not many people knows it, but the situation improves.
OCaml
- pros
- May provide faster implementation of some program analyses then in Java, because symbolic manipulations of particular data structures (like syntax trees, expressions, BDD,...)
- We can take a C front end from tools like CPAchecker, Stanse, Bugst, or Predator with minimal effort. It only requires to dump internal CFG representations of the tools to disc and load them into equivalent data structures syntactically expressed in OCaml.
- cons
- OCaml development environments are still immature and primitive. This can slow down whole development.
- Not many people knows it, but the situation improves.
- OCaml is not that portable as Java.
C++
- pros
- There is software like PAGAI, KLEE, Predator, Bugst, Symbiotic from which we can take code. Moreover, if we use F# (i.e. .NET) then calls to native C++ libraries are no problem (they are quite nicely supported in .NET).
- Analyses can be faster by constant factor relative to managed languages (Java, F#, ...).
- Overloading of operators allows to do symbolic manipulations of expressions very brief. So we can possibly get close to F#/OCaml features. Nevertheless, it will require to actually implement those operators first.
- cons
- Standard library is definitely not as large as one of Java or .NET languages. Question is whether it is larger or not in a comparison with OCaml's one. But in any case, there are other C++ libraries available, like Boost, which make the situation not that bad.
- It is not type-safe language, i.e. there is vulnerability to more kinds of programming errors.
- Students generally do not like this language.
Perspective 2: Programs in what languages should STATOR analyse
Using dedicated C front-end
- pros
- Such front-ends are already available in tools like CPAchecker, Stanse, Predator, ...
- In CPAchecker they are implementing front-end to Java. So we can take their code, when it is ready. There might further be implemented front-ends to other languages in some other free tools. But then we could use them as external translators from their internal CFG representation into our automata files.
- cons
- Extension of STATOR to other languages will be complicated, if others (like Dirk's team) terminates development of Java front-end or we fail to find free tool which provides front-end to language we are interested in.
Using LLVM front-end
- pros
- LLVM project intends to compile several languages into LLVM byte code (C,C++, Objective C, Java,...). So, this might be very valuable for future of STATOR. But if development of STATOR will formally end in 2017. If there are no successor projects, then this feature does not matter.
- cons
- I know only one tool, Bugst, which translates LLVM into an internal control-flow graph representation. So, only this tool can be used for F# or OCaml platform for STATOR. But Bugst is not finished, so some LLVM features are not supported. Other LLVM based tools like PAGAI, KLEE do not transform LLVM into an internal CFG representation.
Perspective 3: What sort of programs we consider
There should mostly be considered benchmarks from SV_COMP. We are further considering some (probably smaller) open-source projects and some SW for drones and airplanes.
It would be really nice to try C front-ends from CPAchecker, Stanse, Bugst, etc. to few representative programs from each category.
Perspective 4: Build STATOR on top on a single freely available tool
The tools are listed in no particular order (just as they came into my mind). Did not we forget some interesting open-source tool?
CPAchecker
- pros
- According to README.txt file of CPAchecker: "CPAchecker is able to parse and analyze a large subset of (GNU)C." (But unfortunately, I did not find a list of missing features.)
- All preprocessed C files passed into the tool are linked into one big file. This is stated in README.txt file of the tool. Bit it is still unclear, whether it works as in a linker (e.i. properly are handled definitions/declarations, externa/static variables and functions, what about functions without definitions, ...), or whether they simply concatenate all the files into big one. Only the former case is valuable for us.
- George already implemented its analysis there and no issues appeared.
- There are already implemented several program analyses. So, we can try/compare with/use then with minimal effort.
- cons
- It does not provide the "new-school" control-flow graph representation. All its analyses and all internal algorithms of CPAchecker work on the "old-school" representation.
- Since the tool is written in Java, we cannot use those interesting programming features of OCaml or F#.
- The tool accepts already preprocessed C files. So, we need to handle build systems of analysed C project and the preprocessing phase by ourselves.
PAGAI
- pros
- TODO
- cons
- TODO
FramaC
- pros
- TODO
- cons
- TODO
UFO
- pros
- TODO
- cons
- TODO
Stanse
- pros
- TODO
- cons
- TODO
Bugst
- pros
- TODO
- cons
- TODO
Predator
- pros
- TODO
- cons
- TODO
KLEE
- pros
- TODO
- cons
- TODO
CBMC
- pros
- TODO
- cons
- TODO
CPROVER
- pros
- TODO
- cons
- TODO
Perspective 5: Build STATOR on top of a code collected from several different sources (tools)
Here we consider possibilities to take useful code from different open-source tools. This code may differ in implementation language. The goal here is:
- Make I list of code we would like to have (like front-end to C/LLVM, ...)
- Search open-source tools for their implementations of items in our list (we can extend the list if we find some interesting functionality).
- Prune the result s.t. variant with minimal compatibility issue (i.e. language differences and type incompatibility). For the pruning it is crucial a good estimation of amount of work which has to be done to resolve incompatibility issue (e.g. where we can use dynamic libraries, where inter-language call, what will have to be reimplemented, and for what items in the list we have nothing).
TODO: complete the section!
Open questions
How much time/programming effort can we actually save when we use languages like F# or OCaml relative to other languages like Java or C++?
- ???
How much we can actually benefit (i.e. time/programming effort savings) from the "new-school" control-flow graph representation relative to the use of the "old-school" one?
- ???
How much is important for as that CPAchecker already provides several program analyses like predicate abstraction?
- ???
How much restrictive is CPAchecker's concept of CPAs with respect to kind of program analyses we intend to implement in STATOR? (Of course, there is a possibility to circumvent their concept, but it is against tool's philosophy, i.e. it is bad.)
- ???
What is the ideal code-base (i.e. even not real/available, hypothetical) for STATOR from the point of view from members of the STATOR team?
- David: ???
- George: ???
- Marek: In this moment I cannot say, because I do not have clear view of features of our analyses. I need to read related papers first. Even such essential things like "new-school vs. new-school" control-flow graph representation is a mystery for me. So, my opinion to this question does not matter.
How many PhD students we plan to join the STATOR project till the end of 2017?
- ???