
This project recommends GitHub software to Stack Overflow users.

To run this project, first install JDK 1.7 or later.

In the rec folder, run "mvn assembly:assembly" to compile the project.

Use index_main.sh, filter_projects.sh, and main.sh to run each subprogram.

The project mainly consists of six parts:

  1. sourceDao (com.ow2.rec.sourceDao): operations on the source database, which include:
     1. ProjectDao: operations on GitHub projects
     2. SOFFilterSourceDao: operations for filtering Stack Overflow users
     3. TagDao: operations on Stack Overflow tags
     4. UserTagDao: operations on the intermediate result table, which is named user_tags
  2. targetDao (com.ow2.rec.targetDao): operations on the target database, which include:
     1. MatchResultDao: operations for processing the match results between Stack Overflow users and GitHub projects
     2. SOFFilterTargetDao: operations for storing Stack Overflow users after filtering
  3. lucene (com.ow2.rec.lucene): all operations related to Lucene, which include:
     1. LuceneIndex: creates the index
     2. LuceneSearch: searches projects using the Lucene index
  4. main (com.ow2.rec.main): all entry points ("main" functions), which include:
     1. FilterUsers: entry point that filters Stack Overflow users (screening out those who are not very active on Stack Overflow)
     2. IndexMain: entry point for creating the Lucene index
     3. Main: entry point for creating queries and calculating relation scores between Stack Overflow users and GitHub projects
     4. Match: auxiliary class for the Main class
     5. GetGreatPrj: gets highly active projects once the match results are available
  5. model (com.ow2.rec.model): all models used in the whole process:
     1. MatchItem: the model corresponding to a match result
     2. Project: the model corresponding to GitHub projects
     3. Tag: the model corresponding to the Stack Overflow Tags table
     4. User: the model corresponding to the Stack Overflow Users table
     5. UserTag: the model corresponding to the intermediate result table user_tags, which stores the relation between Stack Overflow users and their related tags
  6. util (com.ow2.rec.util): auxiliary classes:
     1. Normalizer: normalizes the data used in the process, including tags, strings, numbers, etc.
     2. SimilarityCounter: calculates the similarity between two texts
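
This README does not say which similarity measure SimilarityCounter uses. As an illustration only, the sketch below computes cosine similarity over term counts, one common way to compare two short texts. The class name TextSimilaritySketch, its method names, and the tokenization rule are all assumptions and are not taken from the project's code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: NOT the project's SimilarityCounter.
class TextSimilaritySketch {

    // Lowercase the text, split on non-alphanumeric characters, and count each term.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            Integer old = counts.get(token);
            counts.put(token, old == null ? 1 : old + 1);
        }
        return counts;
    }

    // Euclidean length of a term-count vector.
    static double norm(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity of the two term-count vectors; 0.0 if either text has no terms.
    static double cosine(String a, String b) {
        Map<String, Integer> ca = termCounts(a);
        Map<String, Integer> cb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            Integer other = cb.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(ca);
        double nb = norm(cb);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    public static void main(String[] args) {
        // Two of the three terms on each side overlap, so the score is about 2/3.
        System.out.println(cosine("java file io", "java io stream"));
    }
}
```

The score ranges from 0 (no shared terms) to 1 (identical term distributions), which makes it easy to compare across user and project pairs; the actual class may use a different measure.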

To run the project, the following database tables are required:

  1. user_tags1~user_tags12: these tables store the intermediate statistics on Stack Overflow users, posts, and tags. Structure: Id, UserId, PostId, Tag, Updatetime
  2. Users: the Stack Overflow users table
  3. user_tags: this table stores each Stack Overflow user id and the user's related tags. Structure: Id, UserId, AllTags (e.g. <file-io,java>)
  4. watchers: the GitHub watchers table
  5. stackoverflow_users: this table stores the Stack Overflow users that remain after filtering. Structure: same as Users
  6. projects: the GitHub projects table
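
The AllTags column in user_tags holds a user's tags in a single string such as <file-io,java>. A minimal sketch of splitting such a value into individual tags follows. It assumes the format is exactly the one shown in the example (angle brackets around a comma-separated list); the class AllTagsSketch and the method parseAllTags are hypothetical names and are not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: assumes AllTags looks like "<file-io,java>".
class AllTagsSketch {

    // Strip the surrounding angle brackets, split on commas, and normalize each tag.
    static List<String> parseAllTags(String allTags) {
        List<String> tags = new ArrayList<String>();
        if (allTags == null) {
            return tags;
        }
        String body = allTags.trim();
        if (body.startsWith("<")) {
            body = body.substring(1);
        }
        if (body.endsWith(">")) {
            body = body.substring(0, body.length() - 1);
        }
        for (String part : body.split(",")) {
            String tag = part.trim().toLowerCase(Locale.ROOT);
            if (!tag.isEmpty()) {
                tags.add(tag); // skip empty entries such as in "<>"
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(parseAllTags("<file-io,java>")); // prints [file-io, java]
    }
}
```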

This project uses two databases: ow2 (source database) and ossean_production (target database). To change these names, modify applicationContext.xml and applicationContext_mybatis.xml in bin/resources.

The demo will be available at http://ossean.trustie.net/softwarerec
