Saturday, December 22, 2007

TamilNews - The search engine for regional news

In this project, we have developed a search engine for Tamil news items collected from popular Tamil news sites. Like a standard web search engine, it works in three stages: crawling, indexing and searching.
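The indexing and searching stages can be illustrated with a minimal in-memory inverted index that answers AND-style keyword queries. This is a sketch under assumptions, not the project's actual code. The `InvertedIndex` class and the `tokenize` helper are hypothetical, and the sample headlines are made up for illustration.

```python
import string
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split on whitespace, lowercase, and strip ASCII punctuation.

    Splitting on whitespace (rather than a regex like \\w+) keeps Tamil words
    intact: Tamil vowel signs are combining marks, which \\w does not match.
    """
    tokens = (t.strip(string.punctuation) for t in text.lower().split())
    return [t for t in tokens if t]


class InvertedIndex:
    """Maps each term to the set of document ids that contain it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.docs: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        # Indexing stage: record every term of a crawled news item.
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        # Searching stage: return the documents containing ALL query terms.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))  # copy, don't mutate
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "சென்னை மழை: பள்ளிகளுக்கு விடுமுறை")
index.add(2, "Chennai rain: schools closed")
index.add(3, "மதுரை மழை அறிவிப்பு")
print(sorted(index.search("மழை")))  # [1, 3]
```

A real engine would persist the postings and rank the matches rather than return an unordered set.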

Built on a centralized architecture, our search engine crawls the news sites every hour. It then indexes the text of the crawled pages, choosing the appropriate scan area of each page and omitting noisy data such as advertisements and unrelated URL redirections (link filtering). The indexer applies standard scoring to the text documents and classifies them under different categories.

Two graphical user interfaces are provided to make it easy to enter queries in English and in Tamil, the well-known South Indian regional language. The searcher performs dictionary-based morphological analysis of the query keywords and supports cross-language reference. UTF-8 encoding is used throughout for multi-language support.
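Choosing a scan area and filtering links could look roughly like the following sketch, which uses only Python's standard `html.parser` and `urllib.parse` modules. It assumes, hypothetically, that the article body sits inside a `<div class="news-body">` element and that only links back to the same news host are worth keeping; each real site would need its own rules. The `ScanAreaExtractor` class, the `filter_links` helper and the sample page are illustrative only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScanAreaExtractor(HTMLParser):
    """Collect text and links found only inside the chosen scan area.

    The scan area is a hypothetical <div class="news-body">; anything outside
    it (advertisements, menus, footers) is ignored.
    """

    def __init__(self, base_url: str, area_tag: str = "div", area_class: str = "news-body"):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.area_tag = area_tag
        self.area_class = area_class
        self.depth = 0  # nesting depth of area_tag while inside the scan area
        self.text_parts: list[str] = []
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == self.area_tag:
                self.depth += 1  # nested element with the same tag
            if tag == "a" and attrs.get("href"):
                self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == self.area_tag and self.area_class in (attrs.get("class") or "").split():
            self.depth = 1  # entering the scan area

    def handle_endtag(self, tag):
        if self.depth and tag == self.area_tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.text_parts.append(data.strip())


def filter_links(links: list[str], site_url: str) -> list[str]:
    """Link filtering: keep unique http(s) links on the news site's own host."""
    host = urlparse(site_url).netloc
    kept: list[str] = []
    for link in links:
        parts = urlparse(link)
        if parts.scheme in ("http", "https") and parts.netloc == host and link not in kept:
            kept.append(link)
    return kept


page = """<html><body>
<div class="ad">Buy now! <a href="http://ads.example.net/x">offer</a></div>
<div class="news-body">
  <p>சென்னையில் கனமழை.</p>
  <div class="related"><a href="/news/2">மேலும் படிக்க</a></div>
  <a href="http://redirect.example.net/?to=elsewhere">sponsored</a>
</div>
</body></html>"""

site = "http://news.example.com/news/1"
parser = ScanAreaExtractor(site)
parser.feed(page)
parser.close()
print(parser.text_parts)  # ['சென்னையில் கனமழை.', 'மேலும் படிக்க', 'sponsored']
print(filter_links(parser.links, site))  # ['http://news.example.com/news/2']
```

The advertisement block is skipped because it lies outside the scan area, while the off-site redirection inside the article is dropped by the link filter.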
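The post does not say which "standard scoring" the indexer uses. TF-IDF is one common choice, so the sketch below assumes it. It computes TF-IDF weights over already tokenized documents, ranks documents for a query by summed weight, and assigns each document to the category whose seed words score highest. The `CATEGORIES` seed lists and the sample documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed words per category (Tamil and English).
CATEGORIES = {
    "sports": {"கிரிக்கெட்", "cricket", "match"},
    "weather": {"மழை", "rain", "புயல்"},
}


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF weight of term t in doc d: (count / doc length) * log(N / df(t))."""
    n = len(docs)
    df: Counter = Counter()
    for terms in docs.values():
        df.update(set(terms))  # count each term once per document
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {
            t: (count / len(terms)) * math.log(n / df[t]) for t, count in tf.items()
        }
    return weights


def rank(query_terms: list[str], weights: dict[str, dict[str, float]]) -> list[str]:
    """Return ids of matching documents, highest summed TF-IDF first."""
    scores = {d: sum(w.get(t, 0.0) for t in query_terms) for d, w in weights.items()}
    return sorted((d for d, s in scores.items() if s > 0), key=lambda d: -scores[d])


def classify(doc_weights: dict[str, float]) -> str:
    """Pick the category whose seed words carry the most weight, else 'general'."""
    scores = {
        cat: sum(doc_weights.get(t, 0.0) for t in seeds)
        for cat, seeds in CATEGORIES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"


docs = {
    "d1": ["சென்னை", "மழை", "மழை", "விடுமுறை"],
    "d2": ["india", "win", "cricket", "match"],
    "d3": ["மதுரை", "புயல்", "எச்சரிக்கை"],
}
w = tfidf(docs)
print(rank(["மழை", "புயல்"], w))  # ['d1', 'd3']
print({d: classify(w[d]) for d in docs})  # {'d1': 'weather', 'd2': 'sports', 'd3': 'weather'}
```

Keyword seeds keep the example small; a trained classifier could replace `classify` without changing the TF-IDF step.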
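Dictionary-based morphological analysis with cross-language reference might work along these lines: strip a known Tamil suffix only when what remains is a root in the dictionary, then add that root's translation so English and Tamil queries reach the same stories. This is a minimal sketch; the `LEXICON` and `SUFFIXES` tables are small hypothetical samples, not the project's dictionary.

```python
# Hypothetical bilingual dictionary of Tamil roots and their English glosses.
LEXICON = {
    "பள்ளி": "school",
    "மழை": "rain",
    "சென்னை": "chennai",
}
ENGLISH_TO_TAMIL = {en: ta for ta, en in LEXICON.items()}

# Hypothetical subset of Tamil plural/case suffixes, tried longest first.
SUFFIXES = sorted(["கள்", "களுக்கு", "களில்", "க்கு", "யில்", "யால்"], key=len, reverse=True)


def analyse(word: str) -> str:
    """Return the dictionary root of a word, or the word itself if none is found."""
    if word in LEXICON:
        return word
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Accept the split only if the remainder is a known root.
        if word.endswith(suffix) and stem in LEXICON:
            return stem
    return word


def expand_query(query: str) -> set[str]:
    """Reduce each keyword to its root and add its translation (cross-language)."""
    terms: set[str] = set()
    for word in query.lower().split():
        root = analyse(word)
        terms.add(root)
        if root in LEXICON:
            terms.add(LEXICON[root])
        elif root in ENGLISH_TO_TAMIL:
            terms.add(ENGLISH_TO_TAMIL[root])
    return terms


raw = "சென்னையில் மழை".encode("utf-8")  # bytes as they might arrive from a page
print(sorted(expand_query(raw.decode("utf-8"))))  # ['chennai', 'rain', 'சென்னை', 'மழை']
print(sorted(expand_query("School")))  # ['school', 'பள்ளி']
```

Because Python strings are sequences of Unicode code points, the suffix match runs on decoded text, so bytes fetched from a site are decoded as UTF-8 before analysis.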