Clione is a Java implementation of a lexical parser that tokenizes source code written in C17 and other C-like programming languages.

The main facility is a tokenization API that corresponds to the early stages of the C preprocessor. It performs trigraph replacement, line splicing, and tokenization, but it does not perform macro expansion or directive handling.
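
To make the first two features concrete, here is a minimal standalone sketch of trigraph replacement and line splicing in plain Java. It does not use or reproduce Clione's API or implementation; the class name `EarlyPhasesSketch` and its methods are hypothetical and exist only for illustration. It also assumes that lines end with `\n` only.

```java
import java.util.Map;

// Hypothetical illustration only; this is NOT Clione's implementation.
final class EarlyPhasesSketch {
    // The nine trigraphs of the C standard, keyed by the character
    // that follows "??" (for example, "??/" stands for a backslash).
    private static final Map<Character, Character> TRIGRAPHS = Map.of(
        '=', '#', '(', '[', '/', '\\', ')', ']', '\'', '^',
        '<', '{', '!', '|', '>', '}', '-', '~');

    /** Replaces every trigraph sequence with its single-character equivalent. */
    static String replaceTrigraphs(String s) {
        var b = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            if (i + 2 < s.length()
                    && s.charAt(i) == '?' && s.charAt(i + 1) == '?') {
                Character r = TRIGRAPHS.get(s.charAt(i + 2));
                if (r != null) {
                    b.append(r);
                    i += 3;
                    continue;
                }
            }
            b.append(s.charAt(i));
            ++i;
        }
        return b.toString();
    }

    /** Removes each backslash that is immediately followed by a newline. */
    static String spliceLines(String s) {
        // Assumption: only "\n" line endings are handled here.
        return s.replace("\\\n", "");
    }

    public static void main(String[] args) {
        // "??/" becomes a backslash, which then splices "ma" and "in".
        String source = "ma??/\nin\n";
        System.out.print(spliceLines(replaceTrigraphs(source)));
    }
}
```

Unlike this sketch, which discards the original text, the SourceCharDemo output further below shows that Clione keeps the source location of every character involved in a replacement or splice.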

Get started

Clione is available as a Maven package. To add a dependency on Clione in your build, specify the custom repository and dependency as follows:

Gradle

repositories {
    ⋮
    maven {
        url = uri('https://maroontress.github.io/maven')
    }
    ⋮
}
⋮
dependencies {
    ⋮
    implementation 'com.maroontress:clione:1.0'
    ⋮
}

Maven

<project>
  ⋮
  <repositories>
    ⋮
    <repository>
      <id>maroontress</id>
      <name>Maroontress maven repo</name>
      <url>https://maroontress.github.io/maven</url>
    </repository>
    ⋮
  </repositories>
  ⋮
  <dependencies>
    ⋮
    <dependency>
      <groupId>com.maroontress</groupId>
      <artifactId>clione</artifactId>
      <version>1.0</version>
    </dependency>
    ⋮
  </dependencies>
  ⋮
</project>

Samples

TokenDemo

TokenDemo tokenizes the following code (helloworld.c) and prints all tokens:

#include <stdio.h>

int main(void)
{
    printf("hello world\n");
}

The output is as follows:

$ java com.example.TokenDemo helloworld.c
L1:1--19: DIRECTIVE: #
| L1:2--8: DIRECTIVE_NAME: include
| L1:9: DELIMITER: ' '
| L1:10--18: STANDARD_HEADER: <stdio.h>
| L1:19: DIRECTIVE_END: '\n'
L2:1: DELIMITER: '\n'
L3:1--3: RESERVED: int
L3:4: DELIMITER: ' '
L3:5--8: IDENTIFIER: main
L3:9: PUNCTUATOR: (
L3:10--13: RESERVED: void
L3:14: PUNCTUATOR: )
L3:15: DELIMITER: '\n'
L4:1: PUNCTUATOR: {
L4:2--L5:4: DELIMITER: '\n    '
L5:5--10: IDENTIFIER: printf
L5:11: PUNCTUATOR: (
L5:12--26: STRING: "hello world\n"
L5:27: PUNCTUATOR: )
L5:28: PUNCTUATOR: ;
L5:29: DELIMITER: '\n'
L6:1: PUNCTUATOR: }
L6:2: DELIMITER: '\n'

Run TokenDemo

SourceCharDemo

SourceCharDemo tokenizes the following code (main.c) and prints all characters:

ma??/
in

char *cat = u8"🐱";

The output is as follows:

$ java com.example.SourceCharDemo main.c
L1:1--L2:2: IDENTIFIER: main
  L1:1: m
  L1:2: a
  L1:3--L2:1: i
  | L1:3--5: \
  | | L1:3: ?
  | | L1:4: ?
  | | L1:5: /
  | L1:6: '\n'
  | L2:1: i
  L2:2: n
L2:3--L3:1: DELIMITER: '\n\n'
  L2:3: '\n'
  L3:1: '\n'
L4:1--4: RESERVED: char
  L4:1: c
  L4:2: h
  L4:3: a
  L4:4: r
L4:5: DELIMITER: ' '
  L4:5:  
L4:6: OPERATOR: *
  L4:6: *
L4:7--9: IDENTIFIER: cat
  L4:7: c
  L4:8: a
  L4:9: t
L4:10: DELIMITER: ' '
  L4:10:  
L4:11: OPERATOR: =
  L4:11: =
L4:12: DELIMITER: ' '
  L4:12:  
L4:13--17: STRING: u8"🐱"
  L4:13: u
  L4:14: 8
  L4:15: "
  L4:16: H(0xd83d)
  L4:16: L(0xdc31)
  L4:17: "
L4:18: PUNCTUATOR: ;
  L4:18: ;
L4:19: DELIMITER: '\n'
  L4:19: '\n'
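
The two lines `H(0xd83d)` and `L(0xdc31)` in the output above share column 16 because Java represents 🐱 (U+1F431) as a UTF-16 surrogate pair. The following standalone sketch shows where those two values come from. It uses only the standard `Character` and `String` classes; the class name `SurrogatePairSketch` and the `describe` method are hypothetical, and the format only imitates the demo output rather than showing how Clione produces it.

```java
// Hypothetical illustration only; this is NOT part of Clione.
final class SurrogatePairSketch {
    /**
     * Returns one line per UTF-16 code unit of the given string,
     * marking high surrogates with H(...) and low surrogates with L(...).
     */
    static String describe(String s) {
        var b = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (Character.isHighSurrogate(c)) {
                b.append(String.format("H(0x%04x)", (int) c));
            } else if (Character.isLowSurrogate(c)) {
                b.append(String.format("L(0x%04x)", (int) c));
            } else {
                b.append(c);
            }
            b.append('\n');
        }
        return b.toString();
    }

    public static void main(String[] args) {
        // U+1F431 lies outside the Basic Multilingual Plane,
        // so it occupies two char values in a Java String.
        String cat = new String(Character.toChars(0x1F431));
        System.out.print(describe(cat));
        System.out.println("chars: " + cat.length());
        System.out.println("code points: "
            + cat.codePointCount(0, cat.length()));
    }
}
```

The string has two `char` values but a single code point, which is consistent with both surrogates being reported at the same column.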

Run SourceCharDemo

Documents

How to contribute

Please send us pull requests or open issues through the GitHub repository.