# JBrainy

This project is a re-implementation of the paper [Brainy: effective selection of
data structures](https://dl.acm.org/citation.cfm?id=1993509) by Jung et al.

The goal of this tool is to generate data sets for training neural networks,
based on benchmarks of randomly generated applications.

## Requirements

- Gradle 5.0+

## Structure

- `src/jmh`: Source code for running the JMH benchmarks and generating benchmark
  report files.
- `src/main`: Source code for generating applications and for gathering data from
  hardware performance counters (via the PAPI library).
    - `src/main/java/`: Application generation
    - `src/main/kotlin/`: Data gathering with PAPI library.
- `src/test`: Unit tests
- `papi-java`: Submodule with Java bindings to the PAPI library.

## Tasks

The build tool is Gradle. The main tasks are:

- `gradle jmhTest` or `gradle jmhFull`: Generates applications and runs a JMH-based benchmark, producing the file `jmh-results.csv`.
  - `jmhTest` for a small set of benchmarks (for testing)
  - `jmhFull` for a large set of benchmarks (may take several days to run)
- `gradle run`: Generates applications and gathers performance counters
  - `gradle run --args="--help"`: Prints the help message.
- `gradle test`: Runs the unit tests.

## Troubleshooting

If the `papi-java` directory is empty after cloning, initialize the submodule:

    git submodule update --recursive --init


## Python environment

To set up the Python environment needed to run the Python scripts:

    mkdir env
    virtualenv env/
    source env/bin/activate.csh
    pip install -r requirements.txt

## How to train the JBrainy classifier

If you're lazy, you can run the script `jmh-and-train.sh`.

Otherwise, keep reading.

### More about training

To train, you need two things:

- JMH benchmark data
- Feature vectors data

### Getting JMH Data
You need to run JMH to get the times that benchmarks take.

    ./gradlew jmhFull

By default it creates a file called `jmh-result.csv`.
Its contents look like this:

    "Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: applicationSize","Param: baseStructureSize","Param: datastructureName","Param: methodSelectionStrategyId","Param: seed"
    "se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,3.935889,2.044457,"ops/ms",100,100,LinkedList,UNIFORM,1
    "se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,3.548889,0.672085,"ops/ms",100,100,LinkedList,UNIFORM,2
    "se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,3.615321,0.703596,"ops/ms",100,100,LinkedList,UNIFORM,3
    "se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,4.604959,2.965161,"ops/ms",100,100,LinkedList,UNIFORM,4
    "se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,5.095096,3.784726,"ops/ms",100,100,LinkedList,UNIFORM,5
    ...
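
As a rough illustration of how this file can be read, the sketch below finds the data structure with the highest score for each benchmark configuration. It is not taken from the project's scripts: the function name `best_structure_per_config` is hypothetical, it assumes every row uses the `thrpt` mode (so a higher score is better), and the `ArrayList` row in the sample is invented so that there is something to compare against.

```python
import csv
import io

# Header and the two LinkedList rows are copied from the sample above.
# The ArrayList row is invented for illustration; it is not a measured result.
SAMPLE = '''\
"Benchmark","Mode","Threads","Samples","Score","Score Error (99.9%)","Unit","Param: applicationSize","Param: baseStructureSize","Param: datastructureName","Param: methodSelectionStrategyId","Param: seed"
"se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,3.935889,2.044457,"ops/ms",100,100,LinkedList,UNIFORM,1
"se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,3.548889,0.672085,"ops/ms",100,100,LinkedList,UNIFORM,2
"se.lth.cs.jmh.ListApplicationBenchmark.ListApplicationBenchmark","thrpt",1,4,4.500000,0.500000,"ops/ms",100,100,ArrayList,UNIFORM,1
'''


def best_structure_per_config(csv_file):
    """Map each benchmark configuration to its (data structure, score) with the highest score."""
    best = {}
    for row in csv.DictReader(csv_file):
        # Everything except the data structure name identifies the configuration.
        config = (
            row["Benchmark"],
            row["Param: applicationSize"],
            row["Param: baseStructureSize"],
            row["Param: methodSelectionStrategyId"],
            row["Param: seed"],
        )
        candidate = (row["Param: datastructureName"], float(row["Score"]))
        # Assumption: throughput mode, so a larger score means a faster structure.
        if config not in best or candidate[1] > best[config][1]:
            best[config] = candidate
    return best


if __name__ == "__main__":
    results = best_structure_per_config(io.StringIO(SAMPLE))
    for config, (name, score) in results.items():
        print(config, name, score)
```

To run it on real data, pass an open file instead of the `io.StringIO` sample, for example `open("jmh-result.csv", newline="")`.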

### Feature vectors
To get the feature vectors, you may run:

    ./gradlew run --args="-i jmh-result.csv --input-type JMH --normalize-features -o <features-out-file> -ot FEATURE-VECTORS"


The content of the resulting file looks like this:

    benchmark_id,feature,feature_type,value
    Synth:UNIFORM:1:100:List:0:LinkedList,collection,collection,LinkedList
    Synth:UNIFORM:1:100:List:0:LinkedList,addAll(java.util.Collection),software,0.02
    Synth:UNIFORM:1:100:List:0:LinkedList,"add(int,java.lang.Object)",software,0.06
    Synth:UNIFORM:1:100:List:0:LinkedList,size(),software,0.01
    Synth:UNIFORM:1:100:List:0:LinkedList,lastIndexOf(java.lang.Object),software,0.04
    Synth:UNIFORM:1:100:List:0:LinkedList,"addAll(int,java.util.Collection)",software,0.05
    Synth:UNIFORM:1:100:List:0:LinkedList,toArray(java.lang.Object[]),software,0.02
    Synth:UNIFORM:1:100:List:0:LinkedList,retainAll(java.util.Collection),software,0.04
    Synth:UNIFORM:1:100:List:0:LinkedList,equals(java.lang.Object),software,0.02
    Synth:UNIFORM:1:100:List:0:LinkedList,listIterator(),software,0.01
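
The file is in long format, with one row per benchmark and feature. A minimal sketch of regrouping it into one feature dictionary per benchmark is shown below. The helper names `parse_value` and `load_feature_vectors` are hypothetical and not part of the project, and the sketch assumes that any value that parses as a number is a numeric feature while anything else (such as the `collection` name) is kept as text.

```python
import csv
import io
from collections import defaultdict

# First rows of the sample above, copied verbatim.
SAMPLE = '''\
benchmark_id,feature,feature_type,value
Synth:UNIFORM:1:100:List:0:LinkedList,collection,collection,LinkedList
Synth:UNIFORM:1:100:List:0:LinkedList,addAll(java.util.Collection),software,0.02
Synth:UNIFORM:1:100:List:0:LinkedList,"add(int,java.lang.Object)",software,0.06
Synth:UNIFORM:1:100:List:0:LinkedList,size(),software,0.01
'''


def parse_value(text):
    """Return a float when the text is numeric, otherwise the text unchanged."""
    try:
        return float(text)
    except ValueError:
        return text


def load_feature_vectors(csv_file):
    """Group long-format rows into {benchmark_id: {feature: value}}."""
    vectors = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        vectors[row["benchmark_id"]][row["feature"]] = parse_value(row["value"])
    return dict(vectors)


if __name__ == "__main__":
    for benchmark_id, features in load_feature_vectors(io.StringIO(SAMPLE)).items():
        print(benchmark_id, features)
```

The `csv` module handles the quoted feature names that contain commas, such as `"add(int,java.lang.Object)"`, so splitting lines on `,` by hand should be avoided.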

### Training the classifier
Once you have the features, train the classifier using the `train_model.py` script (don't forget to install the requirements first!):

    python train_model.py <jmh-results>  <features-out-file> <software-output-file>