Twitter Account Classification

Built with: Python (pandas, scikit-learn, NumPy)

  • Preprocessed Twitter datasets, using stratified splitting to preserve the account-type distribution, and wrote the cleanTweet function to clean tweet text before it is passed to the model.
  • Classified accounts with a bag-of-words model and logistic regression, tuned the model parameters, and analyzed the keywords that distinguish human from non-human accounts.
  • Check it out on GitHub

Precision-Tolerant Database System (Capstone project)

Built with: MySQL

  • Researching flexible database systems that accommodate imprecise data, challenging the traditional relational model to improve data retention and query accuracy while maintaining integrity.
  • Developing a precision-tolerant system that extends existing DBMSs to retain imprecise data that violates Numerical Conditional Constraints, and exploring the resulting cost savings and analytics benefits.
  • Check it out on GitHub
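
One way to picture the idea, as a rough sketch only: the example below assumes a Numerical Conditional Constraint can be read as a numeric bound that applies when a condition holds, and it uses Python's built-in sqlite3 in place of MySQL so that it runs without a server. The table names, the rule, and the violates_rule flag column are all hypothetical and do not come from the project.

```python
import sqlite3

# Hypothetical rule: IF sensor = 'temp_c' THEN value must be in [-50, 60].
RULE = "(sensor <> 'temp_c') OR (value BETWEEN -50 AND 60)"

conn = sqlite3.connect(":memory:")
# Traditional approach: rows that violate the rule are rejected.
conn.execute(
    f"CREATE TABLE strict_readings (sensor TEXT, value REAL, CHECK ({RULE}))"
)
# Tolerant approach (assumed design): keep every row and flag violations.
conn.execute(
    "CREATE TABLE tolerant_readings "
    "(sensor TEXT, value REAL, violates_rule INTEGER DEFAULT 0)"
)

rows = [("temp_c", 21.5), ("temp_c", 75.0), ("humidity", 75.0)]

for row in rows:
    try:
        conn.execute("INSERT INTO strict_readings VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        pass  # the strict table loses this row entirely
    conn.execute(
        "INSERT INTO tolerant_readings (sensor, value) VALUES (?, ?)", row
    )

# Mark the retained rows that break the rule instead of discarding them.
conn.execute(f"UPDATE tolerant_readings SET violates_rule = NOT ({RULE})")

strict_count = conn.execute("SELECT COUNT(*) FROM strict_readings").fetchone()[0]
tolerant_count = conn.execute("SELECT COUNT(*) FROM tolerant_readings").fetchone()[0]
flagged = conn.execute(
    "SELECT sensor, value FROM tolerant_readings WHERE violates_rule = 1"
).fetchall()
print(strict_count, tolerant_count, flagged)
```

Queries against the tolerant table can then choose to include or exclude flagged rows, which is the retention benefit the bullets describe.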

Ecommerce Data Insights

Built with: dbt, Snowflake, Apache Airflow, SQL

  • Integrated dbt (data build tool) with Snowflake in a hands-on project covering data transformation and deployment, aimed at a more efficient and reliable data pipeline.
  • Set up the environment, configured dbt, built and transformed models, implemented macros and tests, and deployed the models through Airflow.
  • Check it out on GitHub
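
The order of the dbt steps in such a deployment can be sketched as below. This is an illustration under assumptions: it uses plain subprocess calls rather than an Airflow DAG, and the project directory "ecommerce_dbt" and target "dev" are hypothetical names. In an Airflow setup, each command would typically become its own task.

```python
import shlex
import subprocess


def dbt_commands(project_dir: str, target: str) -> list[list[str]]:
    """Return dbt CLI calls in the order a deployment task might run them."""
    common = ["--project-dir", project_dir, "--target", target]
    return [
        ["dbt", "deps", "--project-dir", project_dir],  # install packages
        ["dbt", "run", *common],                        # build the models
        ["dbt", "test", *common],                       # run the tests
    ]


def run_pipeline(project_dir: str, target: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in dbt_commands(project_dir, target):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


# Hypothetical project directory and target, shown as a dry run.
run_pipeline("ecommerce_dbt", "dev")
```

Running the tests after the models are built means a failing test surfaces before downstream consumers read the new tables.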