@nilesh-c
Member

I haven't tested this on sample data yet, so it mustn't be merged. @chile12 could you check this?

@nilesh-c
Member Author

Sorry, there are still lots of build errors.

@nilesh-c
Member Author

@chile12 could you test this out with a few languages? Maybe start all the mapping languages with nohup. Here's the command I use:

../run NormalizeDatasets ../extraction.default.properties wikidata wikidata-sameas .ttl.bz2 page-links,infobox-properties -normalized

Specify your base-dir in the config file. wikidata is the language suffix for wikidatawiki, and wikidata-sameas is the name of the dataset. List all the datasets you want to normalize, separated by commas (here page-links,infobox-properties).
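
A minimal sketch of how the comma-separated dataset argument fans out into one input file per dataset. It assumes a dump layout of <base-dir>/<wiki>/<date>/<wiki>-<date>-<dataset><suffix>, which is an assumption and not confirmed for NormalizeDatasets; the helper name list_inputs and the example date are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the framework: print the input files a
# normalization run would read for one wiki, one line per dataset.
# Assumed layout: <base-dir>/<wiki>/<date>/<wiki>-<date>-<dataset><suffix>
list_inputs() {
  base_dir=$1   # base-dir from the config file
  wiki=$2       # wiki directory and file prefix, e.g. enwiki
  date=$3       # dump date, YYYYMMDD
  suffix=$4     # file suffix, e.g. .ttl.bz2
  datasets=$5   # comma-separated list, e.g. page-links,infobox-properties

  # Turn the comma-separated list into one dataset name per line,
  # skipping empty entries (e.g. from a trailing comma).
  printf '%s\n' "$datasets" | tr ',' '\n' | while IFS= read -r ds; do
    if [ -n "$ds" ]; then
      printf '%s/%s/%s/%s-%s-%s%s\n' \
        "$base_dir" "$wiki" "$date" "$wiki" "$date" "$ds" "$suffix"
    fi
  done
}
```

For example, list_inputs /data enwiki 20150205 .ttl.bz2 page-links,infobox-properties prints /data/enwiki/20150205/enwiki-20150205-page-links.ttl.bz2 followed by the matching infobox-properties path. To keep a long run alive after logging out, the command can be detached with nohup ../run NormalizeDatasets ../extraction.default.properties wikidata wikidata-sameas .ttl.bz2 page-links,infobox-properties -normalized > normalize.log 2>&1 & (the log file name is arbitrary).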

Member

Add the wiki code to the filename.

Member

If the script is restarted with another language without deleting the file, it will produce incorrect results.

Member Author

Are you talking about the cache file? wikiFinder is instantiated with the baseDir and the language, so for enwiki it creates the file enwiki-YYYYMMDD-sameas-mappings.obj in the respective directory.
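
A minimal sketch of the per-language cache path, to show why a restart with another language does not reuse a stale cache. It assumes the "respective directory" is <base-dir>/<wiki>/<date>; the function name cache_file and the example date are hypothetical, and the actual path is computed by wikiFinder.

```shell
#!/bin/sh
# Hypothetical sketch of the cache path wikiFinder is described as creating.
# Assumed location: <base-dir>/<wiki>/<date>/<wiki>-<date>-sameas-mappings.obj
cache_file() {
  base_dir=$1  # base-dir from the config file
  wiki=$2      # wiki code, e.g. enwiki
  date=$3      # dump date, YYYYMMDD
  # The wiki code appears in both the directory and the file name.
  printf '%s/%s/%s/%s-%s-sameas-mappings.obj\n' \
    "$base_dir" "$wiki" "$date" "$wiki" "$date"
}
```

Under these assumptions, cache_file /data enwiki 20150205 prints /data/enwiki/20150205/enwiki-20150205-sameas-mappings.obj, while cache_file /data dewiki 20150205 points to a different file under the dewiki folder, so each language keeps its own cache.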

Member

Perfect, you save them in the language folder, not the wikidata one. I got confused.

@jimkont jimkont added this to the 2015-10 milestone Jun 1, 2015
@jimkont jimkont modified the milestones: 2015-10, 2016-04 Jul 25, 2016