Clione is a Java implementation of a lexical parser that tokenizes source code written in C17 and other C-like programming languages.
The main facility is a tokenization API that corresponds to the C preprocessor layer. It performs trigraph replacement, line splicing, and tokenization, but it does not expand macros or process directives.
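To illustrate what trigraph replacement and line splicing mean (C translation phases 1 and 2), here is a minimal standalone sketch that does not use Clione. The class PhaseSketch and its methods are hypothetical. It assumes that newlines are '\n' and that the input is a Java String. Unlike Clione, which keeps the original characters of each token (see the SourceCharDemo output), it simply discards them.

```java
import java.util.Map;

/**
 * Hypothetical sketch of C translation phases 1 and 2; not Clione's API.
 * Assumes '\n' line endings.
 */
final class PhaseSketch {

    // The nine trigraph sequences of C17: "??x" maps to the value for key x.
    private static final Map<Character, Character> TRIGRAPHS = Map.of(
            '=', '#', '(', '[', '/', '\\', ')', ']', '\'', '^',
            '<', '{', '!', '|', '>', '}', '-', '~');

    /** Phase 1: replaces each trigraph sequence, scanning left to right. */
    static String replaceTrigraphs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (i + 2 < s.length()
                    && s.charAt(i) == '?' && s.charAt(i + 1) == '?'
                    && TRIGRAPHS.containsKey(s.charAt(i + 2))) {
                out.append(TRIGRAPHS.get(s.charAt(i + 2)));
                i += 3;
            } else {
                out.append(s.charAt(i));
                i += 1;
            }
        }
        return out.toString();
    }

    /** Phase 2: deletes every backslash immediately followed by a newline. */
    static String spliceLines(String s) {
        // String.replace does not rescan its own output, so a splice never
        // creates a new splice, matching the "last backslash" rule.
        return s.replace("\\\n", "");
    }

    public static void main(String[] args) {
        // The first two lines of main.c from the SourceCharDemo sample.
        String source = "ma??/\nin\n";
        // "??/" becomes '\', then "\\\n" is spliced away, yielding "main\n".
        System.out.print(spliceLines(replaceTrigraphs(source)));
    }
}
```

This is why SourceCharDemo reports a single IDENTIFIER main spanning two physical lines.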
Get started
Clione is available as a Maven package. To add a dependency on Clione in your build, specify the custom repository and dependency as follows:
Gradle
repositories {
⋮
maven {
url = uri('https://maroontress.github.io/maven')
}
⋮
}
⋮
dependencies {
⋮
implementation 'com.maroontress:clione:1.0'
⋮
}
Maven
<project>
⋮
<repositories>
⋮
<repository>
<id>maroontress</id>
<name>Maroontress maven repo</name>
<url>https://maroontress.github.io/maven</url>
</repository>
⋮
</repositories>
⋮
<dependencies>
⋮
<dependency>
<groupId>com.maroontress</groupId>
<artifactId>clione</artifactId>
<version>1.0</version>
</dependency>
⋮
</dependencies>
⋮
</project>
Samples
TokenDemo
TokenDemo tokenizes the following code (helloworld.c) and prints all tokens:
#include <stdio.h>

int main(void)
{
    printf("hello world\n");
}
The output is as follows:
$ java com.example.TokenDemo helloworld.c
L1:1--19: DIRECTIVE: #
| L1:2--8: DIRECTIVE_NAME: include
| L1:9: DELIMITER: ' '
| L1:10--18: STANDARD_HEADER: <stdio.h>
| L1:19: DIRECTIVE_END: '\n'
L2:1: DELIMITER: '\n'
L3:1--3: RESERVED: int
L3:4: DELIMITER: ' '
L3:5--8: IDENTIFIER: main
L3:9: PUNCTUATOR: (
L3:10--13: RESERVED: void
L3:14: PUNCTUATOR: )
L3:15: DELIMITER: '\n'
L4:1: PUNCTUATOR: {
L4:2--L5:4: DELIMITER: '\n    '
L5:5--10: IDENTIFIER: printf
L5:11: PUNCTUATOR: (
L5:12--26: STRING: "hello world\n"
L5:27: PUNCTUATOR: )
L5:28: PUNCTUATOR: ;
L5:29: DELIMITER: '\n'
L6:1: PUNCTUATOR: }
L6:2: DELIMITER: '\n'
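Each line of the output starts with the token's position. Judging from the samples, the notation is "L<line>:<column>" for a single character, "L<line>:<start>--<end>" for a span within one line, and "L<line>:<column>--L<line>:<column>" for a span across lines. The hypothetical helper below reproduces that notation as inferred from the output; it is not part of Clione's API.

```java
/**
 * Hypothetical helper that formats a span the way the sample output
 * appears to; inferred from the samples, not taken from Clione.
 */
final class SpanNotation {

    static String format(int startLine, int startColumn,
                         int endLine, int endColumn) {
        String start = "L" + startLine + ":" + startColumn;
        if (startLine != endLine) {
            // The span crosses lines, so the end repeats the line number.
            return start + "--L" + endLine + ":" + endColumn;
        }
        if (startColumn != endColumn) {
            // Same line: only the end column is appended.
            return start + "--" + endColumn;
        }
        // A single character.
        return start;
    }

    public static void main(String[] args) {
        System.out.println(format(1, 1, 1, 19)); // L1:1--19
        System.out.println(format(3, 9, 3, 9));  // L3:9
        System.out.println(format(4, 2, 5, 4));  // L4:2--L5:4
    }
}
```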
SourceCharDemo
SourceCharDemo tokenizes the following code (main.c) and prints all characters:
ma??/
in

char *cat = u8"🐱";
The output is as follows:
$ java com.example.SourceCharDemo main.c
L1:1--L2:2: IDENTIFIER: main
L1:1: m
L1:2: a
L1:3--L2:1: i
| L1:3--5: \
| | L1:3: ?
| | L1:4: ?
| | L1:5: /
| L1:6: '\n'
| L2:1: i
L2:2: n
L2:3--L3:1: DELIMITER: '\n\n'
L2:3: '\n'
L3:1: '\n'
L4:1--4: RESERVED: char
L4:1: c
L4:2: h
L4:3: a
L4:4: r
L4:5: DELIMITER: ' '
L4:5:
L4:6: OPERATOR: *
L4:6: *
L4:7--9: IDENTIFIER: cat
L4:7: c
L4:8: a
L4:9: t
L4:10: DELIMITER: ' '
L4:10:
L4:11: OPERATOR: =
L4:11: =
L4:12: DELIMITER: ' '
L4:12:
L4:13--17: STRING: u8"🐱"
L4:13: u
L4:14: 8
L4:15: "
L4:16: H(0xd83d)
L4:16: L(0xdc31)
L4:17: "
L4:18: PUNCTUATOR: ;
L4:18: ;
L4:19: DELIMITER: '\n'
L4:19: '\n'
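The cat emoji appears as two entries, H(0xd83d) and L(0xdc31), both at column 16. This is consistent with Java's String representation: 🐱 (U+1F431) lies outside the Basic Multilingual Plane, so it is stored as a UTF-16 surrogate pair. The sketch below (the class SurrogateDemo and its describe method are hypothetical, using only java.lang.Character and String) prints the same H/L labels for that pair.

```java
/**
 * Hypothetical demo: labels UTF-16 surrogates like the SourceCharDemo
 * output does. Uses only the JDK, not Clione.
 */
final class SurrogateDemo {

    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                out.append(String.format("H(0x%04x)", (int) c));
            } else if (Character.isLowSurrogate(c)) {
                out.append(String.format("L(0x%04x)", (int) c));
            } else {
                out.append(c);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1F431 CAT FACE needs two chars (a surrogate pair) in Java.
        String cat = new String(Character.toChars(0x1F431));
        System.out.print(describe(cat)); // H(0xd83d), then L(0xdc31)
        System.out.println(cat.length() + " chars, "
                + cat.codePointCount(0, cat.length()) + " code point");
        // 2 chars, 1 code point
    }
}
```

Because the pair encodes one code point, it makes sense that both halves share a single column in the output.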
Documents
How to contribute
Please send us pull requests or open issues on the GitHub repository.