Tool to detect addresses via machine learning


Post by Vazquez » Thu Mar 09, 2017 8:59 am

I'm currently developing a tool that aims to detect addresses (or any other pattern, such as a job, a sports team, etc.) in a text.

So what I'm currently doing:

1/ Splitting the text into words
2/ Stemming the words
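The two preprocessing steps above could be sketched like this. The crude suffix-stripping `stem` function is just a placeholder for a real stemmer (e.g. Porter's); the function names are mine, not from any particular library:

```python
import re

def tokenize(text):
    """Step 1: split a text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Step 2: very crude suffix stripping, as a stand-in for a
    real stemming algorithm such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

tokens = [stem(w) for w in tokenize("I live in Brown Street, in London")]
# -> ['i', 'live', 'in', 'brown', 'street', 'in', 'london']
```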

Users can create categories (job, sports team, address...) and manually assign sentences to a category.

Each stemmed word of the sentence is then stored in the DB, with an updated score (+1).

When I browse a new document, I compute a score for each sentence from all the words in it.

Example:

I live in Brown Street, in London

=> (live +1, brown +1, street +1, london +1)

Then next time I see

I live in Orange Street, in London

the score will be 3 (live +1, street +1, london +1), so I can say "this sentence might be an address". If the user validates, I update the words (live +1, orange +1, street +1, london +1); if they say "inaccurate", all the words are downvoted.
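The scoring and feedback loop described above could be sketched as follows (a minimal in-memory version; `train` and `sentence_score` are hypothetical names, and words are already stemmed and stopword-filtered, as in the example):

```python
from collections import defaultdict

# Per-category word scores, e.g. scores["address"]["street"] -> 2
scores = defaultdict(lambda: defaultdict(int))

def train(category, words, delta=1):
    """Upvote (delta=+1) or downvote (delta=-1) every word of a
    sentence the user has labelled."""
    for w in set(words):
        scores[category][w] += delta

def sentence_score(category, words):
    """Sum of the stored scores of the sentence's words."""
    return sum(scores[category][w] for w in set(words))

# User labels the first sentence as an address:
train("address", ["live", "brown", "street", "london"])

# A new sentence shares three of those words:
print(sentence_score("address", ["live", "orange", "street", "london"]))  # -> 3
```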

I think that with more runs I will be able to detect addresses, since "street" and "london" will end up with large scores (same for zip codes, etc.).

My questions are:

First, what do you think about this approach? Secondly, context is simply ignored with this approach. A sentence containing both "Street" and "London" should get a better score: if I detect "Street" and "London" in the same sentence, we can likely say it's an address.
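One simple way to capture that kind of context is to count unordered word pairs instead of (or in addition to) single words, so that "street" appearing together with "london" becomes evidence in itself. A minimal sketch, with hypothetical names:

```python
from collections import Counter
from itertools import combinations

pair_counts = Counter()

def train_pairs(words):
    """Count every unordered word pair that co-occurs in a labelled
    sentence; sorting keeps each pair in a canonical order."""
    for a, b in combinations(sorted(set(words)), 2):
        pair_counts[(a, b)] += 1

train_pairs(["live", "brown", "street", "london"])
train_pairs(["live", "orange", "street", "london"])

# 'street' and 'london' have now co-occurred twice:
print(pair_counts[("london", "street")])  # -> 2
```

Only pairs that actually co-occur in labelled sentences get an entry, so the table stays far smaller than the full vocabulary squared.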

How can I store this information in a database? I'm currently using a relational database (MySQL), but I'm afraid the size will become huge if I store the link between every pair of words.
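A relational table can hold those pair scores with one row per observed pair, using a composite primary key and an upsert. Here is a sketch using SQLite (as a stand-in for MySQL, where `ON DUPLICATE KEY UPDATE` plays the same role; the table and function names are mine, and the `ON CONFLICT` upsert needs SQLite 3.24+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE word_pair (
        word_a   TEXT NOT NULL,
        word_b   TEXT NOT NULL,
        category TEXT NOT NULL,
        score    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (word_a, word_b, category)
    )
""")

def vote_pair(a, b, category, delta=1):
    """Insert-or-update one pair; word_a < word_b keeps pairs canonical."""
    a, b = sorted((a, b))
    conn.execute(
        """INSERT INTO word_pair (word_a, word_b, category, score)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (word_a, word_b, category)
           DO UPDATE SET score = score + ?""",
        (a, b, category, delta, delta),
    )

vote_pair("street", "london", "address")
vote_pair("london", "street", "address")  # same pair, other order

row = conn.execute("SELECT score FROM word_pair").fetchone()
print(row[0])  # -> 2
```

Because rows only exist for pairs that were actually seen, storage grows with the number of distinct co-occurring pairs, not with the square of the vocabulary.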

Is this what is called a neural network? What is the best way to store it?

Do you have any tips for improving my detection algorithm?
