BestBET: Project Proposal

Group Members

Abdul Salama
Neha Gaur
Isamu Mar
Saptarshi Bhattacharya
Yamir Godil

Problem

It is difficult for new investors to interpret the vast amount of data about a single stock. Users are often at a loss about which companies to invest in and how to make wise and profitable decisions.

Goal: Our website will suggest companies to invest in and provide more information about how well those companies are performing financially. It will also give investors our interpretation of the currently available data.

Artifact: A website that facilitates stock searches and lists our top-rated stocks.

Architectural Diagram

Project Development

Database Design: storing gathered data (2 weeks) - Abdul & Yamir

Company Information (company name, ticker, any additional metadata)
Crawled Article (URL, Company, Timestamp)
Training Data, possibly split across multiple tables (Sector, Article, Word, Count, Weights)
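The three kinds of stored data listed above could be laid out roughly as follows. This is only a sketch using Python's built-in sqlite3 module; the table names, column names, and the choice of SQLite are assumptions for illustration, since the proposal lists only the fields.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# derived from the field lists in the proposal.
SCHEMA = """
CREATE TABLE IF NOT EXISTS company (
    ticker TEXT PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS article (
    id        INTEGER PRIMARY KEY,
    url       TEXT NOT NULL UNIQUE,
    ticker    TEXT NOT NULL REFERENCES company(ticker),
    published TEXT NOT NULL            -- ISO 8601 timestamp
);
CREATE TABLE IF NOT EXISTS training_word (
    article_id INTEGER NOT NULL REFERENCES article(id),
    sector     TEXT NOT NULL,
    word       TEXT NOT NULL,
    word_count INTEGER NOT NULL,
    weight     REAL,
    PRIMARY KEY (article_id, word)
);
"""

def create_database(path=":memory:"):
    """Open (or create) the database and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping the URL unique would let the crawler skip articles it has already stored.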

Crawler (4 weeks) - Abdul & Neha

Crawler to pull all articles from reputable websites (possibly CNN, BBC, NY Times, CA Times, etc.)
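A minimal sketch of how such a crawler might walk a news site, using only the Python standard library. The breadth-first strategy, the same-host restriction, and the names `extract_links` and `crawl` are assumptions for illustration. The `fetch` argument is a caller-supplied function that returns the HTML for a URL, so the traversal logic can be tried without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return absolute URLs for all links found in the page."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]

def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl that stays on the seed URL's host.

    fetch(url) must return the page HTML as a string.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(seed_url).netloc
    seen = {seed_url}
    queue = deque([seed_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

In a real run, `fetch` could wrap `urllib.request.urlopen`, and the crawler would also need to respect each site's crawling policy and rate limits.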

Classifiers (2 weeks) - Saptarshi & Isamu

Sector Classifier given an article
Confidence Classifier: classifies a given article into a confidence level
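As one possible shape for the sector classifier, here is a small multinomial naive Bayes sketch over word counts. The proposal does not name an algorithm, so naive Bayes, the `NaiveBayes` class, and the simple tokenizer are assumptions chosen only to illustrate the idea; the same structure could serve the confidence classifier by training on confidence labels instead of sectors.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        """Return the most likely label, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The per-label word counts map naturally onto the (Sector, Article, Word, Count, Weights) training data described in the database design.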

Information Extractors (1 week) - Isamu & Yamir

Extract Timestamp given an article
Extract Company Name given an article
Extract Sector given an article
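A rough sketch of two of these extractors. It assumes that article text contains an ISO-style date (YYYY-MM-DD) and that company names are matched against a known list of companies; both are illustrative assumptions, as are the function names. Real articles use many date formats and name variants, so this is a starting point only. Sector extraction is omitted here.

```python
import re
from datetime import datetime

# Assumed date format: YYYY-MM-DD somewhere in the article text.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_timestamp(text):
    """Return the first valid YYYY-MM-DD date in the text, or None."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None

def extract_companies(text, known_companies):
    """Return sorted tickers of the known companies named in the text.

    known_companies maps a company name to its ticker.
    """
    found = set()
    for name, ticker in known_companies.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            found.add(ticker)
    return sorted(found)
```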

User Interface (1 week) - Isamu & Yamir

Simple UI to assist users in making wise decisions from our rating predictions

Milestones

- Oct 29:

Complete Architecture Design.
Team members ramp up on their respective parts of the project (terminology, technology, etc.)
Completing Database setup
Achieve familiarity with Classifier Algorithms

- Nov 17: Complete Backend (crawler, classifiers, database)

Implemented Classifiers
Working Crawler
Completed Database Design/Setup

- Dec 3:

User Interface Completion.
Integration between front-end and back-end.
Start testing and report writing

- Dec 14: Present report and project!

Machine Learning

We will use machine learning for document classification. Specifically, it will assign a confidence level to each article, where a higher confidence level means the article more strongly suggests that the stock will do well in the near future.

We will start by manually reading and labeling a small set of financial articles to use as our initial training data. To expand the training set, we will use our crawler to fetch older financial articles and feed them into our machine learning model to predict a confidence level for each. Each predicted confidence level will be compared with the historical performance of the corresponding stock relative to the article's timestamp. If the prediction and the performance agree, we add the article to our training set. In this way, we can use old articles to grow the training set until it is large enough to support predictions on current articles.
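The training-set expansion described above can be sketched as a simple loop. For illustration it assumes two confidence levels ("up" and "down") and takes the classifier and the historical price lookup as caller-supplied functions; `grow_training_set`, `predict`, and `price_change` are hypothetical names, and the proposal does not fix how many confidence levels there are.

```python
def grow_training_set(articles, predict, price_change, training_set):
    """Add articles whose predicted label agrees with the stock's past move.

    articles:      iterable of dicts with 'text', 'ticker', and 'date' keys
    predict:       function text -> 'up' or 'down' (the current classifier)
    price_change:  function (ticker, date) -> float, the stock's return
                   after the article's date, taken from historical data
    training_set:  list of (text, label) pairs, extended in place
    Returns the number of articles added.
    """
    added = 0
    for article in articles:
        predicted = predict(article["text"])
        change = price_change(article["ticker"], article["date"])
        actual = "up" if change > 0 else "down"
        # Keep only articles where the prediction matches what happened.
        if predicted == actual:
            training_set.append((article["text"], predicted))
            added += 1
    return added
```

Because only agreeing articles are kept, the classifier would typically be retrained after each pass so that later passes benefit from the larger set.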

Evaluation of Success

Measuring our success by how well we predict actual stock prices is impractical, because stock prediction is a hard problem to which many companies devote huge amounts of resources.

Instead we will measure our success by how accurate our news classifications are.

We can measure this by manually reviewing a random sample of classified articles and checking whether our classifier's label matches our own reading of each article. This gives us a sample precision and recall for our classifications.
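The sampling and scoring step could look like the sketch below. The function names and the label values are illustrative assumptions: `predicted` holds the classifier's labels for the sampled articles and `actual` holds the labels we assign by hand.

```python
import random

def sample_for_review(articles, k, seed=None):
    """Pick k articles at random for manual labeling."""
    return random.Random(seed).sample(list(articles), k)

def precision_recall(predicted, actual, positive):
    """Precision and recall for one label over paired predictions.

    precision = TP / (TP + FP), recall = TP / (TP + FN)
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

With more than two labels, the same function can be called once per label to get per-class precision and recall.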