DATA SCIENCE PROJECT IDEAS
GETTING STARTED PROJECTS
1. CREDIT CARD FRAUD DETECTION
PROBLEM
You have been hired by XYZEE bank as their winter analyst. XYZEE’s image is deteriorating because the local newspaper has reported frauds against customers using XYZEE’s credit cards. Before this problem becomes a global concern, the VP of the bank wants you to build a fraud detection algorithm from the given transaction data. Here’s the catch: most transactions aren’t fraudulent, so less than 1% of the transactions are labelled as fraud, and you must find a way to deal with this imbalanced dataset. Use of external datasets is not allowed. You are requested to send your notebooks for evaluation.
The data can be accessed through: https://www.kaggle.com/mlg-ulb/creditcardfraud
The evaluation metric is: Area Under the Precision-Recall Curve (AUPRC)
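Because fewer than 1% of the transactions are frauds, a model that labels everything as genuine already scores above 99% accuracy, which is why a ranking metric such as AUPRC is used instead. The sketch below computes average precision, one common step-wise estimate of the area under the precision-recall curve. The function name and the toy labels and scores are illustrative assumptions, the code assumes there are no tied scores, and the organisers' exact computation may differ.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true  -- list of 0/1 labels (1 = fraud)
    y_score -- list of model scores, higher = more suspicious
    Assumes scores are distinct (no special handling of ties).
    """
    # Rank transactions from most to least suspicious.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(label for _, label in ranked)
    if total_pos == 0:
        raise ValueError("need at least one positive (fraud) label")

    true_pos = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Precision at this rank, weighted by the recall step 1 / total_pos.
            ap += (true_pos / rank) / total_pos
    return ap


# Toy example (made-up values): two frauds among four transactions.
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # about 0.833
```

A score of 1.0 means every fraud is ranked above every genuine transaction; a model that ranks at random scores close to the fraud rate itself.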
CONSULTING AND ANALYTICS CLUB
2. HOUSE PRICE PREDICTION
PROBLEM
If you’ve learnt Machine Learning from online courses, chances are that the first problem you heard could be solved with Machine Learning was predicting housing prices from a variety of parameters. Try it for yourself and see how mind-boggling it gets when you have to take 70+ parameters into account. Use of external datasets is not allowed.
The data can be accessed through: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Your task is to submit predictions to the live leaderboard on Kaggle. The evaluation metric is: Root Mean Squared Logarithmic Error (RMSLE)
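For reference, RMSLE is conventionally defined as the square root of the mean of (log(1 + predicted) - log(1 + actual))^2. The sketch below implements that conventional definition; the function name and the sample prices are illustrative assumptions, and the competition page should be checked for the exact form it uses.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error (conventional log(1 + x) form).

    y_true -- list of actual sale prices (non-negative)
    y_pred -- list of predicted sale prices (non-negative)
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Squared difference of the log-transformed values, one per house.
    squared = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up prices: a 10% over-estimate costs roughly the same on a cheap
# house as on an expensive one, because the error is measured in log space.
print(rmsle([100000], [110000]))
print(rmsle([200000], [220000]))
```

Since the metric penalises relative rather than absolute errors, a common design choice is to train the model on the log-transformed price and transform the predictions back before submitting.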
DATA ANALYTICS PROJECTS
3. LET’S MAKE INDIA A HAPPY NATION
PROBLEM
The Prime Minister of India is worried about the growing dissatisfaction among the citizens of the country. He is looking for a data analyst consultant to identify the key factors behind a country’s happiness and devise a strategy to boost the country’s rank in the World Happiness Index. You, being an IITian, are his prime choice for the role. All your arguments must hold a strong link to data, as the PM trusts data more than mere suggestions.
As a starting point you might use this data: https://www.kaggle.com/unsdsn/world-happiness
We strongly recommend you use other data sources too and cite them.
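One possible first pass at "identifying the key factors" is to rank candidate factors by the strength of their linear correlation with the happiness score. The sketch below does this with a hand-written Pearson correlation. The factor names and all numbers are made-up placeholders, not values from the dataset, and a strong correlation alone does not show that a factor causes happiness.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_factors(score, factors):
    """Return (factor, correlation) pairs, strongest absolute correlation first."""
    corr = {name: pearson(values, score) for name, values in factors.items()}
    return sorted(corr.items(), key=lambda item: abs(item[1]), reverse=True)


# Placeholder data: one happiness score and three hypothetical factors
# for five imaginary countries.
score = [1, 2, 3, 4, 5]
factors = {
    "factor_a": [2, 4, 6, 8, 10],
    "factor_b": [5, 3, 4, 1, 2],
    "factor_c": [3, 1, 3, 1, 3],
}
for name, r in rank_factors(score, factors):
    print(name, round(r, 2))
```

With the real data, each list would hold one value per country, and the ranking is only a starting point for the strategy arguments the problem asks for.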
4. WAR ON TERRORISM
PROBLEM
You’ve been approached by the UN Secretary-General to counteract the ever-growing problem of terrorism across the globe. The UN wants you to identify which terrorist groups are likely to target which parts of the globe, so that it can build preventive measures and deploy counter-terrorist squads to combat terrorism and eradicate it.
As a starting point you might use this data: https://www.kaggle.com/START-UMD/gtd
We strongly recommend you use other data sources too and cite them.
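A simple baseline for "which group targets which region" is a frequency table of past incidents per group and region. The sketch below builds one from (group, region) pairs. The function name and the placeholder records are assumptions for illustration; with the real data, each pair would come from the group and region fields of an incident record. Historical counts are only a baseline and not a forecast.

```python
from collections import Counter, defaultdict


def top_regions_by_group(incidents, top_n=3):
    """Count incidents per (group, region) and keep each group's top regions.

    incidents -- iterable of (group, region) pairs, one per recorded incident
    top_n     -- how many regions to keep per group
    """
    counts = defaultdict(Counter)
    for group, region in incidents:
        counts[group][region] += 1
    # most_common returns (region, count) pairs, highest count first.
    return {group: regions.most_common(top_n) for group, regions in counts.items()}


# Placeholder records, not taken from the dataset.
incidents = [
    ("Group A", "Region 1"),
    ("Group A", "Region 1"),
    ("Group A", "Region 2"),
    ("Group B", "Region 3"),
]
print(top_regions_by_group(incidents))
```

A fuller analysis would also account for time, since a group's recent activity is likely to be more informative than its activity decades ago.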
5. THE IITIAN WHO SCORES
PROBLEM
Your friend in college has been desperately single throughout his/her life and has finally managed to ask someone out. The partner says yes to a first date but, being a busy IITian, agrees to go into a relationship with your friend only on the basis of a 5-minute date on New Year’s Eve. Give your friend some good advice for the first date based on the data you collect about dating and speed dating. Note that this statement is gender neutral, so choosing the gender of your friend and their partner is up to you.
As a starting point you might use this data: https://www.kaggle.com/annavictoria/speed-dating-experiment
We strongly recommend you use other data sources too and cite them.
INNOVATION PROJECTS
6. PREVENTING RAPES
PROBLEM
An Indian feels disgusted to live in a country where his/her sister, daughter, mother or any girl feels unsafe. The government labels the pro