Vör : A Knowledge Graph Project
Wikipedia articles on similar topics tend to share a set of similar keywords. Some of these keywords are literally the same, and others mean something very similar. Here I explore the keywords collected from thousands of Wikipedia articles and build a linked network from them.
Involving a few toys
The crawler is deliberately simple: it picks a seed page, collects the whole content as text, and stores it in MongoDB. It then moves on to the articles linked from the seed page. The process repeats until it is interrupted.
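A minimal sketch of that crawl loop, not the project's actual code. It assumes a breadth-first visiting order, a hypothetical `fetch` callable that returns a page's text and its linked titles, a plain dict standing in for the MongoDB collection, and a `max_pages` limit standing in for the manual interruption.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical interface: fetch(title) -> (plain_text, linked_titles).
Fetch = Callable[[str], Tuple[str, Iterable[str]]]


def crawl(seed: str, fetch: Fetch, store: Dict[str, str], max_pages: int) -> List[str]:
    """Visit pages breadth-first from `seed`, saving each page's text in `store`."""
    queue = deque([seed])
    seen = {seed}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        title = queue.popleft()
        text, links = fetch(title)
        store[title] = text  # a MongoDB insert would take the place of this line
        visited.append(title)
        for link in links:
            if link not in seen:  # never queue the same article twice
                seen.add(link)
                queue.append(link)
    return visited


# Tiny in-memory "wiki" so the sketch runs without network access.
FAKE_WIKI = {
    "Graph": ("Graphs have nodes and edges.", ["Node", "Edge"]),
    "Node": ("A node is a vertex.", ["Graph"]),
    "Edge": ("An edge joins two nodes.", ["Node", "Tree"]),
    "Tree": ("A tree is an acyclic graph.", []),
}


def fake_fetch(title: str) -> Tuple[str, Iterable[str]]:
    return FAKE_WIKI[title]


pages: Dict[str, str] = {}
order = crawl("Graph", fake_fetch, pages, max_pages=3)
print(order)
```

Passing `fetch` and `store` in as arguments keeps the loop independent of any particular HTTP client or database driver.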
The keyword extractor reads the downloaded Wikipedia content and picks potential keywords out of it.
I designed the keyword extractor very naïvely. The following are its key features:
The extraction works well enough, but it does not filter out every garbage word that a human reader would recognize as such.
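To make the "garbage words" problem concrete, here is a hypothetical naïve extractor. It is an assumption for illustration only and not the extractor used in this project: it counts lowercase words, drops a small hand-written stopword list (`STOPWORDS`), and keeps the most frequent candidates.

```python
import re
from collections import Counter
from typing import List, Set

# Illustrative stopword list (an assumption); a real one would be much longer.
STOPWORDS: Set[str] = {
    "the", "a", "an", "of", "and", "is", "in", "to", "are", "it", "that", "with",
}


def extract_keywords(text: str, top_n: int = 5, min_length: int = 3) -> List[str]:
    """Return up to `top_n` of the most frequent non-stopword terms in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    candidates = [w for w in words if len(w) >= min_length and w not in STOPWORDS]
    counts = Counter(candidates)
    return [word for word, _ in counts.most_common(top_n)]


sample = (
    "A graph is a set of nodes and edges. Edges connect nodes in the graph, "
    "and the graph can be directed."
)
top_keywords = extract_keywords(sample, top_n=3)
all_candidates = extract_keywords(sample, top_n=10)
print(top_keywords)
print(all_candidates)
```

In the second list a filler word such as "can" survives because it is missing from the stopword list, which is the kind of leftover a frequency-based approach cannot rule out on its own.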
I'm now working on topic clustering based on keywords. We'll see how well it works soon.
WIP : Using Word Vectors (Word2Vec)
Word2Vec is a word-embedding model for text analysis that places words used in similar contexts close together, which makes it possible to identify similar words. In this project Word2Vec steps in to join similar keywords in the knowledge model. The idea is simple: if Wikipedia pages cover closely related topics but refer to similar terms with different keywords, we should be able to identify them.
The goal of this approach is to connect related keywords, so that related topics which do not share the same keywords but are semantically equivalent gain more connections between them.
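The linking step could look roughly like the sketch below, which is an assumption and not the project's implementation. It uses hand-made toy vectors in place of a trained Word2Vec model, and a hypothetical `link_similar` helper that adds an edge between two keywords when their cosine similarity reaches a chosen threshold.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_similar(vectors: Dict[str, Vector], threshold: float) -> List[Tuple[str, str, float]]:
    """Return (keyword_a, keyword_b, score) for every pair scoring at least `threshold`."""
    edges = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges.append((a, b, score))
    return edges


# Toy vectors (made up for illustration); real ones would come from a trained model.
TOY_VECTORS: Dict[str, Vector] = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}

edges = link_similar(TOY_VECTORS, threshold=0.9)
print(edges)
```

Here "car" and "automobile" end up linked while "banana" stays unconnected. The threshold of 0.9 is arbitrary and would need tuning against real keyword vectors.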