awesome-cxx-parsers
Awesome C++ Parsers
An overview of C and C++ parsers available for Kotlin/JVM and Kotlin/MP
Clang
CLI (`clang`)
- Kotlin/JVM ✔, Kotlin/MP ✔
- Available since at least Clang version 11.
- Rust bindings
Overview
Getting an AST is as easy as:
clang++ -fsyntax-only -Xclang -ast-dump=json file.cc >file.json
The AST can also be dumped in binary form, to be later on consumed and parsed by
(example):
# Will produce file.ast
clang++ -emit-ast file.cc
Example
Here's a JSON output for a sample C++ class.
Field access on line 8 is correctly recognized as a
(reference).
Notably, GCC-style mangled C++ symbol names are also stored in the JSON:
{
  "id": "0x8001988b0",
  "kind": "ParmVarDecl",
  "loc": { "offset": 36, "line": 3, "col": 15, "tokLen": 11 },
  "range": {
    "begin": { "offset": 32, "col": 11, "tokLen": 3 },
    "end": { "offset": 36, "col": 15, "tokLen": 11 }
  },
  "isUsed": true,
  "name": "shapeHeight",
  "mangledName": "_ZZN5ShapeC1EiE11shapeHeight",
  "type": { "qualType": "int" }
}
Here, `_ZZN5ShapeC1EiE11shapeHeight` is a mangled name which corresponds to `Shape::Shape(int)::shapeHeight`:
$ echo '_ZZN5ShapeC1EiE11shapeHeight' | c++filt
Shape::Shape(int)::shapeHeight
specifies a JSON-serialized instance of
class from the C++ API.
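The JSON dump can be consumed without linking against any Clang library. The sketch below is in Python for brevity; the same traversal should port directly to Kotlin with any JSON library. It assumes only what the dump above shows, plus the fact that Clang nests child nodes under an `inner` array. The helper names (`walk`, `find_by_kind`) and the trimmed sample document are illustrative and are not part of any Clang API.

```python
import json

# Trimmed, hand-written sample in the shape of `-ast-dump=json` output.
# A real dump carries many more fields ("id", "loc", "range", ...).
SAMPLE = json.loads("""
{
  "kind": "TranslationUnitDecl",
  "inner": [
    {
      "kind": "CXXRecordDecl",
      "name": "Shape",
      "inner": [
        {
          "kind": "CXXConstructorDecl",
          "name": "Shape",
          "inner": [
            {
              "kind": "ParmVarDecl",
              "name": "shapeHeight",
              "mangledName": "_ZZN5ShapeC1EiE11shapeHeight",
              "type": {"qualType": "int"}
            }
          ]
        }
      ]
    }
  ]
}
""")


def walk(root):
    """Yield the root node and all of its descendants in pre-order."""
    stack = [root]  # explicit stack, so deep ASTs don't hit the recursion limit
    while stack:
        node = stack.pop()
        yield node
        # Children live under "inner"; leaf nodes simply omit the key.
        stack.extend(reversed(node.get("inner", [])))


def find_by_kind(root, kind):
    """Return every node whose "kind" equals the given string."""
    return [n for n in walk(root) if n.get("kind") == kind]


params = find_by_kind(SAMPLE, "ParmVarDecl")
for p in params:
    print(p["name"], p["type"]["qualType"], p["mangledName"])
```

For a real file, replace `SAMPLE` with `json.load(open("file.json"))`, where `file.json` is the output of the `clang++ ... -ast-dump=json` command shown earlier.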
clangd
- Kotlin/JVM ✔, Kotlin/MP ✔
- Supports LSP 3.17 plus extensions
- Can provide `clang::clangd::ParsedAST`, which, in turn, can be used to access `clang::ASTContext`
- Features
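Since `clangd` is an LSP server, any client (including a Kotlin one) has to frame its JSON-RPC messages according to the LSP base protocol: a `Content-Length` header, a blank line, then the UTF-8 JSON body. The following Python sketch shows only that framing step. The function names are hypothetical, and the `initialize` parameters are the bare minimum allowed by the specification rather than what a real client would send.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame a JSON-RPC payload using the LSP base protocol."""
    body = json.dumps(payload).encode("utf-8")
    # Content-Length counts bytes of the body, not characters.
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> dict:
    """Parse a single framed message back into a dict."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8"))


# Minimal `initialize` request; a real client would advertise capabilities
# and pass the project root in "rootUri".
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
}

framed = encode_message(initialize)
print(framed.decode("utf-8"))
```

In practice the framed bytes would be written to the standard input of a running `clangd` process, and responses read back from its standard output using the same framing.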
Overview
- Sample JSON response from `clangd`.
- `clangd` tool window in CLion:
Used by
libclang: C Interface to Clang
- Java bindings to Clang version 15 are available via JavaCPP Presets (Kotlin/JVM ✔, Kotlin/MP ❌).
- API reference
Overview
`libclang` has its limitations and doesn't expose the entire AST. Read this for further details.
`libclang` tokens are not preprocessed.
Examples
- Baby steps with libclang: Walking an abstract syntax tree
- Using libclang to Parse C++ (aka libclang 101)
- https://github.com/bytedeco/javacpp-presets/tree/master/llvm/samples/clang
Used by
LibTooling: C++ Interface to Clang
- No Kotlin/JVM ❌, no Kotlin/MP ❌ (unless wrappers are written manually)
- C++ API
Links
- LibTooling
- Tutorial for building tools using LibTooling and LibASTMatchers
- https://opensource.apple.com/source/clang/clang-425.0.24/src/tools/clang/docs/LibTooling.html
tree-sitter
Overview
- Kotlin/JVM ✔ available via Java or Kotlin bindings
- Kotlin/MP ✔ possible by wrapping a native platform binary
- Kotlin/JS ✔ possible by wrapping tree-sitter.js
- Has built-in support for its own query language
It was initially designed by GitHub for their Atom editor, which has since been sunset in favor of VS Code and GitHub Codespaces. It is designed to parse code incrementally, with a focus on tracking changes in a file.
- Articles:
- Activity of the project: the project consists of two parts, parsers (grammar specifications) and bindings (API). The specifications are hosted under the main GitHub organization, tree-sitter, and appear to be actively updated (checked Java and C++). The bindings, however, look very alpha-quality (checked Java and Kotlin).
According to the
and articles: looks like
is targeted at Web:
/
is part of the main library:
https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web.
Additionally, it is used as a plugin in popular editors, where it is actively supported: Neovim and Emacs.
Limitations
The parser of tree-sitter doesn't always track context, so, using the following C++ source code,
class Shape {
public:
    Shape(int shapeHeight) {
        height = shapeHeight;
    }

    int getHeight() {
        return height;
    }

private:
    int height;
};
it incorrectly recognizes the field access as a local variable rather than as a field (for the code to be parsed correctly, the field access has to be rewritten in the source).
The same problem affects even simpler-to-parse languages,
such as Java and Kotlin.
For example, in the following Kotlin class, field access and local variable access are erroneously recognized as the same kind of node:
class Shape(private val height: Int) {
    fun getHeight(): Int {
        return height
    }
}
Another example is the inability to differentiate between a function call and a macro call in C and C++:
static void f1() {
    // ...
}

static void f2() {
    // ...
}

#define f1() f2()

void g() {
    // Calls `f2()`, not `f1()`.
    f1();
}
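To make the missing context concrete, here is a toy Python pre-pass that flags call-like expressions whose name was previously introduced by a function-like `#define`. This is not how tree-sitter or Clang works; it is only a sketch of the extra bookkeeping a consumer of a context-free parse would need. It deliberately ignores `#undef`, `#include`, conditional compilation, comments and string literals, and the names `DEFINE_RE`, `CALL_RE` and `macro_call_sites` are made up for this example.

```python
import re

# The snippet from above, minus the comment inside g() (this toy scanner
# does not strip comments, so it would match names inside them too).
SOURCE = """\
static void f1() {
    // ...
}

static void f2() {
    // ...
}

#define f1() f2()

void g() {
    f1();
}
"""

# A function-like macro has "(" immediately after its name, with no space.
DEFINE_RE = re.compile(r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_]\w*)\(")
# Anything that looks like `name(`; declarations and definitions match too.
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def macro_call_sites(source):
    """Return (line_number, name) for call-like uses of names that an
    earlier line defined as a function-like macro."""
    defined = set()
    sites = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        define = DEFINE_RE.match(line)
        if define:
            # From this line on, `name(...)` expands as a macro.
            defined.add(define.group(1))
            continue
        for call in CALL_RE.finditer(line):
            if call.group(1) in defined:
                sites.append((lineno, call.group(1)))
    return sites


print(macro_call_sites(SOURCE))  # [(12, 'f1')]
```

Note that the definition of `f1` on line 1 is not flagged, because it precedes the `#define`. Only the use inside `g()` is reported, which is exactly the distinction a purely syntactic parse cannot make.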
PoC
We tried to create a PoC using tree-sitter, running on WSL/Linux.
- Java bindings: here is a branch: https://github.com/saveourtool/save-cloud/compare/master...feature/java-tree-sitter
  It fails with an error in native C code:
  15:02:06.519 [main] INFO c.s.save.demo.cpg.SaveDemoCpgKt - Started SaveDemoCpgKt in 49.612 seconds (JVM running for 63.49)
  15:03:04.447 [boundedElastic-1] INFO c.s.s.d.cpg.controller.CpgController - Created a file with sources: demo.java
  #
  # A fatal error has been detected by the Java Runtime Environment:
  #
  # SIGSEGV (0xb) at pc=0x00007f547c49bd3d, pid=2570, tid=2727
  #
  # JRE version: OpenJDK Runtime Environment (17.0.3+7) (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
  # Java VM: OpenJDK 64-Bit Server VM (17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, emulated-client, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
  # Problematic frame:
  # C [libjava-tree-sitter.so+0x2bd3d] ts_tree_root_node+0xd
  #
  # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
  #
  # An error report file with more information is saved as:
  # /mnt/d/projects/save-cloud/hs_err_pid2570.log
  #
  # If you would like to submit a bug report, please visit:
  # Unknown
  # The crash happened outside the Java Virtual Machine in native code.
  # See problematic frame for where to report the bug.
  #
  Aborted
- Kotlin bindings: here is a branch: https://github.com/saveourtool/save-cloud/compare/master...feature/kotlintree
  It does work on Ubuntu:
Implementations
- C++ Grammar (45 🍴, 156 ⭐)
  - Portable: can be built with at least `g++` 10 and `clang++` 13 (and, probably, earlier versions).
- Kotlin bindings (2 🍴, 28 ⭐)
  - Uses JNA.
  - The native library has to be manually built first (requires `clang++`).
  - The project wraps `libtree-sitter.{so,dylib}` (0.20.1) and `libtree-sitter-cpp.{so,dylib}` (0.19.0) via JNA.
  - Linux and Mac OS X only.
- Java bindings (19 🍴, 61 ⭐)
  - Uses JNI.
  - The native library has to be manually built first.
  - Be sure to clone with `--recurse-submodules`.
  - Can't be built using JDK 17 due to outdated Gradle.
- Playground (C, C++ and other languages)
- Playground (Kotlin grammar, unofficial)
CodeQL
- CodeQL CLI is closed-source ❌
- CodeQL libraries and queries are open-source, MIT-licensed ✔
- For compiled languages like C++, it's necessary to build the project first in order to create a CodeQL database.
- C and C++ status:
  - C++20 support is currently in beta. Supported for GCC on Linux only. Modules are not supported.
  - Clang (and `clang-cl`) extensions (up to Clang 12.0)
    - Support for the `clang-cl` compiler is preliminary.
  - GNU extensions (up to GCC 11.1)
  - Microsoft extensions (up to VS 2019)
  - Arm Compiler 5
    - Support for the Arm Compiler (`armcc`) is preliminary.
- Visual Studio Code integration
Links
- Analyzing your projects
- Exploring the structure of your source code
- Exploring data flow with path queries
- CodeQL for C and C++
- Writing CodeQL queries
- CodeQL CLI manual
Eclipse CDT
- Kotlin/JVM ✔ only, no Kotlin/MP ❌
Used by
foonathan/cppast
- 157 🍴, 1.5k ⭐
- No Kotlin/JVM ❌, no Kotlin/MP ❌ (unless wrappers are written manually)
- C++ API
- Was built in response to `libclang` limitations