Awesome C++ Parsers

An overview of C and C++ parsers available for Kotlin/JVM and Kotlin/MP

Clang

CLI (`clang`)

  • Kotlin/JVM ✔, Kotlin/MP ✔
  • Available since at least Clang version 11.
  • Rust bindings

Overview

Getting the AST is as easy as:

clang++ -fsyntax-only -Xclang -ast-dump=json file.cc >file.json

The AST can also be dumped in binary form, to be later on consumed and parsed by

(example):

# Will produce file.ast
clang++ -emit-ast file.cc

Example

Here's a JSON output for a sample C++ class.

Field access on line 8 is correctly recognized as a

(reference).

Notably, GCC-style mangled C++ symbol names are also stored in the JSON:

{
  "id": "0x8001988b0",
  "kind": "ParmVarDecl",
  "loc": {
    "offset": 36,
    "line": 3,
    "col": 15,
    "tokLen": 11
  },
  "range": {
    "begin": {
      "offset": 32,
      "col": 11,
      "tokLen": 3
    },
    "end": {
      "offset": 36,
      "col": 15,
      "tokLen": 11
    }
  },
  "isUsed": true,
  "name": "shapeHeight",
  "mangledName": "_ZZN5ShapeC1EiE11shapeHeight",
  "type": {
    "qualType": "int"
  }
}

Here, `_ZZN5ShapeC1EiE11shapeHeight` is a mangled name which corresponds to `Shape::Shape(int)::shapeHeight`:

$ echo '_ZZN5ShapeC1EiE11shapeHeight' | c++filt
Shape::Shape(int)::shapeHeight
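
The same lookup can be done programmatically, for example when post-processing `mangledName` values from the JSON dump. Below is a minimal sketch, assuming a GCC- or Clang-based toolchain that ships `<cxxabi.h>` (Itanium C++ ABI). The helper name `demangle` is made up for illustration and is not part of Clang.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI runtimes)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper: returns the demangled form of `mangled`,
// or the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, the result is allocated with malloc()
    // and must be released with free().
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);
    return result;
}
```

With the name from the JSON above, `demangle("_ZZN5ShapeC1EiE11shapeHeight")` should yield `Shape::Shape(int)::shapeHeight`, matching the `c++filt` output.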

The `ParmVarDecl` kind denotes a JSON-serialized instance of the `clang::ParmVarDecl` class from the C++ API.

clangd

Overview

  • Sample JSON response from `clangd`.

  • `clangd` tool window in CLion:

Used by

libclang: C Interface to Clang

Overview

  • `libclang` has its limitations and doesn't expose the entire AST. Read this for further details.
  • `libclang` tokens are not preprocessed.

Examples

Used by

LibTooling: C++ Interface to Clang

  • No Kotlin/JVM ❌, no Kotlin/MP ❌ (unless wrappers are written manually)
  • C++ API

tree-sitter

Overview

  • Kotlin/JVM ✔ available via Java or Kotlin bindings
  • Kotlin/MP ✔ possible by wrapping a native platform binary
  • Kotlin/JS ✔ possible by wrapping `tree-sitter.js`
  • Has built-in support for its own query language

It was initially designed by GitHub for their Atom editor, which has since been sunset in favor of GitHub Codespaces/VS Code. tree-sitter is an incremental parser: it is designed to keep a syntax tree up to date as a file is edited.

However, the JVM bindings look very immature, alpha quality at best (we checked the Java and Kotlin ones).

According to the

and articles, `tree-sitter` appears to be targeted at the Web: the Web Tree-sitter / WASM binding is part of the main library: https://github.com/tree-sitter/tree-sitter/tree/master/lib/binding_web.

Additionally, it is actively supported and is used as a plugin in popular editors such as Neovim and Emacs.

Limitations

The tree-sitter parser doesn't always track context. Given the following C++ source code,

class Shape {
public:
    Shape(int shapeHeight) {
        height = shapeHeight;
    }
    int getHeight() {
        return height;
    }
private:
    int height;
};

it incorrectly recognizes the field access as an `identifier` (i.e. a local variable) and not as a `field_identifier` (for the code to be parsed correctly, one should change `height` to `this->height`). The same problem affects even simpler-to-parse languages, such as Java and Kotlin.

For example, both field access and local variable access are erroneously recognized as `simple_identifier`:

class Shape(private val height: Int) {
    fun getHeight(): Int {
        return height
    }
}
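
Coming back to the C++ class from the beginning of this section, the workaround mentioned there (qualifying member access explicitly, which in C++ is spelled `this->height`) could look as follows. This is only a sketch of the source-level change. Which node types tree-sitter actually emits for it may depend on the grammar version.

```cpp
// Same class as above, but every member access goes through `this->`,
// so it is recognizable as a field access without any semantic analysis.
class Shape {
public:
    Shape(int shapeHeight) {
        this->height = shapeHeight;
    }
    int getHeight() {
        return this->height;
    }
private:
    int height;
};
```

The behavior of the class is unchanged. Only the spelling of the member access differs.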

Another example is the inability to distinguish a function call from a macro invocation in C and C++:

static void f1() {
    // ...
}
static void f2() {
    // ...
}
#define f1() f2()
void g() {
    // Calls `f2()`, not `f1()`.
    f1();
}
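
To make the effect of the macro observable, here is a runnable variant of the snippet above. The return values and the names `beforeMacro` and `afterMacro` are added purely for illustration and are not part of the original example. The two call expressions are textually identical, yet they resolve to different functions after preprocessing, which a parser working on unpreprocessed source cannot see.

```cpp
#include <string>

static std::string f1() { return "f1"; }
static std::string f2() { return "f2"; }

// Defined before the macro: this is a genuine call to the function `f1`.
std::string beforeMacro() {
    return f1();
}

#define f1() f2()

// Defined after the macro: the preprocessor rewrites `f1()` to `f2()`.
std::string afterMacro() {
    return f1();
}
```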

PoC

We tried to create a PoC using tree-sitter, running it on WSL/Linux.

  • Java bindings: here is a branch: https://github.com/saveourtool/save-cloud/compare/master...feature/java-tree-sitter

    It crashes with a `SIGSEGV` in native (C) code:

    15:02:06.519 [main] INFO c.s.save.demo.cpg.SaveDemoCpgKt - Started SaveDemoCpgKt in 49.612 seconds (JVM running for 63.49)
    15:03:04.447 [boundedElastic-1] INFO c.s.s.d.cpg.controller.CpgController - Created a file with sources: demo.java
    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    # SIGSEGV (0xb) at pc=0x00007f547c49bd3d, pid=2570, tid=2727
    #
    # JRE version: OpenJDK Runtime Environment (17.0.3+7) (build 17.0.3+7-Ubuntu-0ubuntu0.22.04.1)
    # Java VM: OpenJDK 64-Bit Server VM (17.0.3+7-Ubuntu-0ubuntu0.22.04.1, mixed mode, emulated-client, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
    # Problematic frame:
    # C [libjava-tree-sitter.so+0x2bd3d] ts_tree_root_node+0xd
    #
    # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /mnt/d/projects/save-cloud/hs_err_pid2570.log
    #
    # If you would like to submit a bug report, please visit:
    # Unknown
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    Aborted
  • Kotlin bindings: Here is a branch: https://github.com/saveourtool/save-cloud/compare/master...feature/kotlintree

    It does work on Ubuntu:

  • Custom grammar

Implementations

  • C++ Grammar (45 🍴, 156 ⭐)
    • Portable: can be built with `g++` 10 and `clang++` 13 (and probably with earlier versions).
  • Kotlin bindings (2 🍴, 28 ⭐)
    • Uses JNA.
    • The native library has to be manually built first (requires `clang++`).
    • The project wraps `libtree-sitter.{so,dylib}` (0.20.1) and `libtree-sitter-cpp.{so,dylib}` (0.19.0) via JNA.
    • Linux and Mac OS X only.
  • Java bindings (19 🍴, 61 ⭐)
    • Uses JNI.
    • The native library has to be manually built first.
    • Be sure to clone with `--recurse-submodules`.
    • Can't be built using JDK 17 due to outdated Gradle.
  • Playground (C, C++ and other languages)
  • Playground (Kotlin grammar, unofficial)

CodeQL

  • CodeQL CLI is closed-source ❌
  • CodeQL libraries and queries are open-source, MIT-licensed ✔
  • For compiled languages like C++, it's necessary to build the project first in order to create a CodeQL database.
  • C and C++ status:
    • C++20 support is currently in beta. Supported for GCC on Linux only. Modules are not supported.
    • Clang (and `clang-cl`) extensions (up to Clang 12.0)
      • Support for the `clang-cl` compiler is preliminary.
    • GNU extensions (up to GCC 11.1)
    • Microsoft extensions (up to VS 2019)
    • Arm Compiler 5
      • Support for the Arm Compiler (`armcc`) is preliminary.
  • Visual Studio Code integration

Eclipse CDT

  • Kotlin/JVM ✔ only, no Kotlin/MP ❌

Used by

foonathan/cppast

  • 157 🍴, 1.5k ⭐
  • No Kotlin/JVM ❌, no Kotlin/MP ❌ (unless wrappers are written manually)
  • C++ API
  • Was built in response to `libclang` limitations
