VINEET GANGWAR


Portfolio


Computer Vision - Scalable Motion Detection Camera system using Raspberry Pi

Python, JavaScript, Raspberry Pi, Scikit-Learn, OpenCV, TensorFlow, BackgroundSubtractorMOG2, Canny Edge Detector, Python-Flask, AWS S3, AWS DynamoDB

Used Python's multiprocessing module and queues to run 6 processes on a Raspberry Pi. The processes used the OpenCV library's BackgroundSubtractorMOG2 and Canny edge detector algorithms to detect motion. Upon motion detection, videos are uploaded to AWS S3 and metadata to AWS DynamoDB. A Python-Flask web server running on the Pi provides live video streaming. TensorFlow Inception-V3 running on AWS EC2 classifies images in the videos. A web-based visualization interface built using JavaScript, HTML and Google Charts provides access to the videos and metadata.
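The worker-and-queue pattern described above can be sketched roughly as follows. This is a minimal illustration only, not the project's code: it swaps in plain frame differencing on nested lists for BackgroundSubtractorMOG2 and Canny, and the names `mean_abs_diff`, `detect_worker`, `run_pipeline` and the threshold value are hypothetical.

```python
import multiprocessing as mp

SENTINEL = None  # marks the end of the frame stream


def mean_abs_diff(prev, curr):
    """Mean absolute pixel difference between two equal-sized grayscale frames."""
    total = sum(abs(a - b)
                for row_prev, row_curr in zip(prev, curr)
                for a, b in zip(row_prev, row_curr))
    return total / (len(curr) * len(curr[0]))


def detect_worker(frames, events, threshold=10.0):
    """Consume (index, frame) pairs and emit the indices where motion is seen.

    Simplified stand-in: compares each frame with the previous one instead of
    using OpenCV's BackgroundSubtractorMOG2 and Canny edge detector.
    """
    prev = None
    while True:
        item = frames.get()
        if item is SENTINEL:
            events.put(SENTINEL)  # tell the consumer we are done
            break
        index, frame = item
        if prev is not None and mean_abs_diff(prev, frame) > threshold:
            events.put(index)
        prev = frame


def run_pipeline(frame_list):
    """Run the detector in a separate process and collect its events."""
    frames, events = mp.Queue(), mp.Queue()
    worker = mp.Process(target=detect_worker, args=(frames, events))
    worker.start()
    for index, frame in enumerate(frame_list):
        frames.put((index, frame))
    frames.put(SENTINEL)
    detected = []
    while True:
        event = events.get()
        if event is SENTINEL:
            break
        detected.append(event)
    worker.join()
    return detected
```

When trying this out, call `run_pipeline(...)` under an `if __name__ == "__main__":` guard so that child processes can import the module safely. In a setup like the one described, the events queue would presumably feed further processes (for example an uploader), which is an assumption here.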

View Project User Interface Backend Code Frontend Code

Information Retrieval - Log Search using Inverted Index

Python, MRJob, AWS EC2, AWS EMR, AWS S3, MongoDB, Shunting-Yard algorithm

Created a log search tool based on an inverted index. It supported full Boolean search queries: each query was parsed using the Shunting-Yard algorithm and evaluated using Python set methods. Two versions were implemented: a local version and a cloud version.
Local version: On a 100 MB log file, searches ran faster than grep. The inverted index was created using a Python script, and the postings list was stored in a local instance of MongoDB. The input log file was split into smaller files, and the postings list contained the filename and offset of each log entry.
Cloud version: The inverted index was created using MRJob and AWS EMR. The postings list was stored in MongoDB on AWS EC2. Each log entry was stored in AWS with a unique key.
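As a rough illustration of how the pieces fit together, the sketch below builds an in-memory inverted index, converts an infix Boolean query to postfix with the Shunting-Yard algorithm, and evaluates it with Python set operations. It is an assumption-laden sketch rather than the project's code: the index lives in a dict instead of MongoDB, postings hold entry ids instead of filenames and offsets, operators must be written in uppercase, and the function names are hypothetical.

```python
import re

# Higher number binds tighter; NOT is unary and right-associative.
PRECEDENCE = {"NOT": 3, "AND": 2, "OR": 1}


def build_index(entries):
    """Map each lowercase token to the set of entry ids that contain it."""
    index = {}
    for entry_id, text in enumerate(entries):
        for token in re.findall(r"\w+", text.lower()):
            index.setdefault(token, set()).add(entry_id)
    return index


def to_postfix(query):
    """Shunting-Yard: convert an infix Boolean query to a postfix token list."""
    output, stack = [], []
    for token in re.findall(r"\(|\)|\w+", query):
        if token in PRECEDENCE:
            # A unary NOT never pops earlier operators when it arrives.
            while (token != "NOT" and stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(token.lower())  # a search term
    while stack:
        output.append(stack.pop())
    return output


def search(query, index, all_ids):
    """Evaluate the postfix query using set intersection, union and difference."""
    stack = []
    for token in to_postfix(query):
        if token == "NOT":
            stack.append(all_ids - stack.pop())
        elif token == "AND":
            right, left = stack.pop(), stack.pop()
            stack.append(left & right)
        elif token == "OR":
            right, left = stack.pop(), stack.pop()
            stack.append(left | right)
        else:
            stack.append(index.get(token, set()))
    return stack.pop()
```

For example, with `index = build_index(logs)` and `all_ids = set(range(len(logs)))`, a call such as `search("(error OR warn) AND NOT node1", index, all_ids)` returns the ids of matching entries. Well-formed queries are assumed; there is no error handling for unbalanced parentheses.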

View Project Code

Randomized Experiment - Discrimination in Rental Housing Markets

Python, R, Test of proportions

In this experiment, housing rental advertisements from 5 cities were scraped from Craigslist using Python and deduplicated using text analytics. The study used a 2x2 factorial design to measure the prevalence of racial and/or social status discrimination. The sender's name was used to indicate race ('white'- and 'black'-sounding names), and the type of job mentioned in the email body was used to indicate social status. Responses were read using Python, and a test of proportions was used to assess statistical significance.
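For readers unfamiliar with a test of proportions, the comparison of response rates between two groups can be sketched as a pooled two-proportion z-test. This is only an illustration under assumptions: the study may have used a different variant (for example one with a continuity correction), and the function name and any counts passed to it are hypothetical, not results from the experiment.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Pooled two-proportion z-test; returns (z statistic, two-sided p-value)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Here `success_a` and `success_b` would be the number of replies received by each group, and `n_a` and `n_b` the number of emails sent.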

View Project

Visualization - Washington Metro ridership

JavaScript, D3, Bootstrap, Tableau

Used Washington Metro rail passenger ridership data for May 2012 to create visualizations with D3/JavaScript and Tableau. Created a chord diagram and small multiples using D3/JavaScript.

View Project Code

Prediction - Forest Cover Type Prediction

Python, Scikit-Learn, Matplotlib

In this Kaggle competition, we used an Extra Trees classifier and 16 engineered features to predict forest cover type from cartographic variables.

View Project Code

Big Data - Music service using the Million Song Dataset

Python, Javascript, Softlayer, HDFS, Spark, PCA, K-Means

Used Softlayer to store the dataset (270 GB) in HDFS, then used Spark for feature reduction and clustering. For each song in a cluster, the 100 most similar songs were computed using Euclidean distance and stored in ElasticSearch. Created a website using Python-Webpy that allowed users to search for songs; once a song was selected, the website played the next similar songs by querying ElasticSearch. The website streamed 30-second song clips from 7Digital using song ids.
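The per-cluster similarity step can be sketched as a nearest-neighbour lookup by Euclidean distance. This is a small single-machine illustration, not the Spark job itself; the function name, the dict-based cluster layout and the feature vectors are hypothetical.

```python
import heapq
import math


def nearest_songs(song_id, cluster, k=100):
    """Return up to k song ids from the same cluster, closest first.

    `cluster` maps song id -> feature vector (e.g. PCA-reduced features).
    """
    target = cluster[song_id]
    distances = (
        (math.dist(target, vector), other_id)
        for other_id, vector in cluster.items()
        if other_id != song_id
    )
    # nsmallest avoids sorting the whole cluster when k is small.
    return [other_id for _, other_id in heapq.nsmallest(k, distances)]
```

Restricting the search to songs in the same cluster keeps the number of distance computations per song far below what a comparison against the full dataset would need.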

View Project Code

Neural Network - Classifying notMNIST dataset

Python, TensorFlow (Udacity - Deep Learning MOOC)

Trained 2 neural networks as part of the assignments for this course:
One with a single hidden layer of 4096 nodes
A second with 2 hidden layers of 1024 nodes each
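To give a sense of the relative size of the two networks, the sketch below counts trainable parameters, assuming both are plain fully connected networks on flattened 28x28 notMNIST images with 10 output classes. The fully connected assumption and the helper name are mine, not stated in the course material.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


INPUT_SIZE = 28 * 28  # notMNIST images are 28x28 grayscale
NUM_CLASSES = 10      # letters A through J

# Assumed architectures: dense layers only, no convolutions.
single_hidden = dense_parameter_count([INPUT_SIZE, 4096, NUM_CLASSES])
two_hidden = dense_parameter_count([INPUT_SIZE, 1024, 1024, NUM_CLASSES])
```

Under these assumptions the wide single-layer network has more parameters than the deeper one, since most of its weights sit in the 784-by-4096 input layer.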

Assignments

Introduction to Machine Learning

Course at UC Berkeley (Master of Information and Data Science)

Linear Regression, Logistic Regression, Nearest Neighbors, Naive Bayes, Decision Trees, Gradient Descent, Clustering, Gaussian Mixture Models, PCA, Graphs, Expectation Maximization. Using Python and Scikit-Learn

Assignments

Machine Learning at Scale

Course at UC Berkeley (Master of Information and Data Science)

Trained models on very large datasets using distributed infrastructure and parallel computation. Tools used - Python, Spark, MRJob, MapReduce, local Hadoop and AWS EMR. Linear and logistic regression using gradient descent. Shortest path, PageRank, Decision Trees, SQL Joins, Market Basket using Apriori

Assignments

Experiments and Causal Inference

Course at UC Berkeley (Master of Information and Data Science)

Randomized experiments and experiment design. Mainly used R.

Assignments