Introduction
Ninka is a license identification tool that identifies the license(s)
under which a given source file is made available.
This tool uses a source file as input and outputs the licenses
identified within that file.
For details on how Ninka works, please see the following paper:
Daniel M. German, Yuki Manabe and Katsuro Inoue. A sentence-matching
method for automatic license identification of source code files. In
25th IEEE/ACM International Conference on Automated Software
Engineering (ASE 2010). You can email me (dmg@uvic.ca) for a copy or
download it from http://turingmachine.org/~dmg/papers/dmg2010ninka.pdf.
If you use Ninka for research purposes, we would appreciate it if you
cited the above paper.
Contributors
Paul Clough for his code to split sentences
Anthony Kohan for writing the Excel and SQLite backends
Armijn Hemel from Tjaldur Software Governance Solutions for multiple bug reports and suggestions
René Scheibe for modularizing the code
License
Ninka is licensed under the GPLv2+:
Copyright (C) 2009-2014 Yuki Manabe and Daniel M. German
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Ninka::SentenceExtractor is a derivative work of the rule-based sentence
splitter script by Paul Clough.
The comments program is based on a program by Jon Newman that removes comments.
Contact information
Any feedback is appreciated. You can email Daniel M. German (dmg@uvic.ca) and Yuki Manabe (y-manabe@ist.osaka-u.ac.jp).
Requirements
How to install
Usage
Available options:
Example:
It will create five files:
These files are not required for Ninka to work, but they can help debug license detection issues.
Ninka model
Ninka uses a pipeline model. Each stage of the pipeline performs one very specific task:
Comment extractor
Module: Ninka::CommentExtractor
Purpose: Extracts the comments at the top of the source file.
Output: .comments
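To make this stage concrete, here is a minimal Python sketch of top-comment extraction. It is not Ninka's implementation: it assumes C-style /* ... */ and // comments only, and the function name extract_top_comments is hypothetical. It collects the comments that appear before the first line of code and strips the leading * decoration of block comments.

```python
def extract_top_comments(source: str) -> str:
    """Return the comments before the first line of code (hypothetical sketch).

    Only C-style /* ... */ and // comments are handled.
    """
    lines = []
    i, n = 0, len(source)
    while i < n:
        # Skip blank space between consecutive comments.
        while i < n and source[i].isspace():
            i += 1
        if source.startswith("/*", i):
            end = source.find("*/", i + 2)
            end = n if end == -1 else end
            body = source[i + 2:end]
            i = end + 2
        elif source.startswith("//", i):
            end = source.find("\n", i)
            end = n if end == -1 else end
            body = source[i + 2:end]
            i = end + 1
        else:
            break  # first non-comment text: the top comment block is over
        for line in body.splitlines():
            # Drop the leading "*" decoration used in block comments.
            line = line.strip().lstrip("*").strip()
            if line:
                lines.append(line)
    return "\n".join(lines)
```

Comments that appear after the first line of code are ignored, which matches the idea of extracting only the top comments.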
Split sentences in comments
Module: Ninka::SentenceExtractor
Purpose: Ninka works by matching sentences of licenses, so the comments must first be split into sentences.
Output: .sentences
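A rule-based splitter can be sketched as follows. This is a simplified illustration, not the rules of Paul Clough's script from which Ninka::SentenceExtractor derives: it assumes a sentence ends at ".", "?" or "!" followed by a capitalized word, and both the short abbreviation list and the name split_sentences are hypothetical.

```python
# Hypothetical, deliberately short abbreviation list.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Inc.", "Ltd.", "Co."}


def split_sentences(text: str) -> list:
    """Split comment text into sentences (simplified rule-based sketch)."""
    words = text.split()  # also joins sentences that span comment lines
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        next_word = words[i + 1] if i + 1 < len(words) else None
        ends_sentence = (
            word.endswith((".", "?", "!"))
            and word not in ABBREVIATIONS
            and (next_word is None or next_word[0].isupper())
        )
        if ends_sentence:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

Splitting on whitespace first matters here because license sentences usually span several comment lines.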
Filter “good” sentences
Module: Ninka::SentenceFilter
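Purpose: Some sentences are related to a license, some are not. This stage separates the license-related (“good”) sentences from the rest.
Output: .goodsent (license-related sentences) and .badsent (all other sentences)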
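One simple way to separate the two groups, assumed here purely for illustration, is a keyword check: a sentence is kept as "good" if it contains a license-related word stem. The keyword list and the function name filter_sentences are hypothetical; Ninka's own filter may use different criteria.

```python
# Hypothetical word stems that suggest a sentence is about licensing.
LICENSE_KEYWORDS = ("licen", "copyright", "warrant", "redistribut",
                    "permission", "gpl", "public domain")


def filter_sentences(sentences):
    """Split sentences into (good, bad) lists using a keyword check."""
    good, bad = [], []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(keyword in lowered for keyword in LICENSE_KEYWORDS):
            good.append(sentence)
        else:
            bad.append(sentence)
    return good, bad
```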
Tokenize sentences
Module: Ninka::SentenceTokenizer
Purpose: Creates a file listing the sentence tokens recognized in the filtered sentences.
Output: .senttok
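The idea of turning sentences into tokens can be sketched like this: each sentence is normalized and looked up in a dictionary of known license sentences, and anything not found becomes UNKNOWN, the token name that also appears in the example outputs in this document. The two-entry dictionary, the assumption that these two BSD clauses map to the BSDcondSource and BSDcondBinary tokens, the exact-match lookup, and the function names are all illustrative; a real dictionary is much larger and may match more flexibly.

```python
import re

# Hypothetical two-entry excerpt: token name -> license sentence.
_KNOWN_SENTENCES = {
    "BSDcondSource": "Redistributions of source code must retain the above copyright "
                     "notice, this list of conditions and the following disclaimer.",
    "BSDcondBinary": "Redistributions in binary form must reproduce the above copyright "
                     "notice, this list of conditions and the following disclaimer in the "
                     "documentation and/or other materials provided with the distribution.",
}


def normalize(sentence: str) -> str:
    """Lowercase and collapse runs of punctuation/whitespace into single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", sentence.lower()).split())


# Index the dictionary by normalized text for exact lookups.
_LOOKUP = {normalize(text): token for token, text in _KNOWN_SENTENCES.items()}


def tokenize(sentences):
    """Map each sentence to its token name, or UNKNOWN if it is not recognized."""
    return [_LOOKUP.get(normalize(s), "UNKNOWN") for s in sentences]
```

Normalizing first means line breaks and punctuation differences inside a comment do not prevent a match.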
Match sentences to licenses
Module: Ninka::LicenseMatcher
Purpose: It looks at the sentence tokens and outputs the licenses found.
Output: .license
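As a rough sketch of this stage, suppose each license is described by a fixed sequence of sentence tokens (an assumption; Ninka's actual rules are richer). The hypothetical matcher below scans the token list for each rule, records the license when the sequence is found, and replaces the matched tokens with -1, mirroring how the example outputs in this document mark matched positions. The rule name ExampleBSD and its token sequence are made up for illustration.

```python
def match_licenses(tokens, rules):
    """Return (licenses found, tokens with matched positions set to -1)."""
    remaining = list(tokens)
    found = []
    for license_name, pattern in rules.items():
        pattern = list(pattern)
        n = len(pattern)
        if n == 0:
            continue  # an empty rule would match everywhere
        i = 0
        while i + n <= len(remaining):
            if remaining[i:i + n] == pattern:
                found.append(license_name)
                remaining[i:i + n] = [-1] * n  # mark matched positions
                i += n
            else:
                i += 1
    return found, remaining


# Hypothetical rule: a license made of three consecutive sentence tokens.
RULES = {"ExampleBSD": ("BSDpre", "BSDcondSource", "BSDcondBinary")}
```

Because rules are tried one after another, a real matcher also needs a policy for overlapping rules; this sketch ignores that.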
The ninka script runs all of these steps, optionally creates the intermediate files, and writes the licenses found to stdout.
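The pipeline model itself can be sketched as a chain of functions in which each stage's output becomes the next stage's input. In this hypothetical Python sketch, every stage is paired with the extension of its intermediate file, and intermediate files are written only when an output prefix is given. Naming the files as prefix plus extension (for example eq.c.comments) is an assumption, and a stage with two outputs, such as the sentence filter, would need a richer return type than a single string.

```python
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# A stage is a pair: (intermediate file extension, function from text to text).
Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(text: str, stages: List[Stage], out_prefix: Optional[str] = None) -> str:
    """Feed `text` through each stage in order (hypothetical sketch)."""
    for extension, stage in stages:
        text = stage(text)
        if out_prefix is not None:
            # Optionally keep the intermediate result, e.g. "eq.c" + ".comments".
            Path(out_prefix + extension).write_text(text)
    return text
```

Keeping every stage as a plain text-to-text function makes it easy to rerun or inspect a single stage, which is what the intermediate files are for.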
How to read the output:
Assume, for example, this output:
Ninka detects all the sentences, including the MIT variant, and it finds the GPL-or-BSD dual-license intention. But the license is not really BSD: the disclaimers are not what you would expect. In all fairness, though, this may be another license.
Let me translate the output for you:
file: eq.c;
License(s) found: MITX11noNotice
;1;2;2;6;0;
Found 1 license
Composed of 2 lines (tokens)
2 tokens were ignored
6 tokens were not matched: Copyright,-1,-1,DualLicenseIntention,GPLorOpenBSDTypeVer2,BSDpre,BSDcondSource,BSDcondBinary (-1 indicates where a match happened)
0 tokens were unknown
Another example:
nsAccessibilityUtils.cpp;MPLv1_1;1;1;3;7;2;UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd
License matched: MPLv1_1;
One license: 1;
Composed of one token: 1;
3 tokens were ignored: 3;
7 tokens were matched but not recognized as a license: UNKNOWN,MPL1_1_GPL2_LGPL2_1intentionVer0,1,-1,-1,MPLsee,Copyright,-1,Altern,UNKNOWN,MPLoptionNOTGPLVer0,MPLoptionIfNotDelete3licsVer0,licenseBlockEnd
2 of those tokens were unknown
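If you want to process Ninka's output in a script, the semicolon-separated line can be split into the fields described above. This is a hypothetical Python helper, and the field layout (file name, license(s), number of licenses, number of tokens, ignored, unmatched, unknown, then the comma-separated token list) is inferred from the two examples in this section rather than taken from a specification. The license field is kept as a raw string because the format for several licenses is not shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NinkaLine:
    filename: str
    license: str        # raw string; multi-license format is not assumed
    num_licenses: int
    num_tokens: int
    ignored: int
    unmatched: int
    unknown: int
    tokens: List[str]   # "-1" marks a position where a match happened


def parse_ninka_line(line: str) -> NinkaLine:
    """Split one line of Ninka output (layout inferred from the examples)."""
    fields = line.strip().split(";")
    counts = [int(value) for value in fields[2:7]]
    tokens = fields[7].split(",") if len(fields) > 7 and fields[7] else []
    return NinkaLine(fields[0], fields[1], *counts, tokens)
```

Applied to the nsAccessibilityUtils.cpp line above, this gives ignored == 3, unmatched == 7 and unknown == 2, matching the translation.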