Data-Driven Insights for Smarter Decisions

Turning Data into Actionable Insights

I'm Dimas Prayoga, a Data Analyst based in Medan. I turn raw data into clear stories, from SQL queries and Python pipelines to BI dashboards. I work with SQL, BigQuery, and Python (Pandas, scikit-learn), and I build analytical dashboards in Tableau, Power BI, and Looker Studio. I'm open to full-time roles and freelance collaborations.

About Me

I hold a Bachelor's degree in Information Systems (GPA 3.69) and specialize in designing end-to-end data solutions — sourcing data with SQL/BigQuery, transforming it with Python, and delivering decision-ready dashboards in Tableau or Power BI.

I’ve shipped projects ranging from social media sentiment analysis to marketing campaign response modeling and large-scale reporting (9M+ impressions / 1.7M+ clicks). My approach blends analytical rigor with practical business context: define the question, validate the data, choose the right method, then present findings with clarity.

If you need someone who can translate business problems into measurable, data-driven actions — let’s connect. I’m keen to help teams and clients turn data into growth.

Skills & Tools

Data Query & Database

SQL
MySQL
PostgreSQL
SQLite
BigQuery

Data Processing

Python
Pandas
NumPy
scikit-learn

Visualization & BI

Tableau
Power BI
Looker Studio

Spreadsheets

Excel
Google Sheets

Soft Skills

Problem-solving
Analytical Thinking
Communication

Featured Projects

Sentiment Analysis on Instagram Comments (Indodax)

Context: Financial brands receive huge volumes of comments that are hard to triage manually.
Data: 10k+ Instagram comments scraped for analysis.
Approach: Text preprocessing (tokenization, stopword removal), TF-IDF features, Naïve Bayes baseline with scikit-learn.
Tools: Python, scikit-learn, Pandas, Matplotlib
Key Findings: Model achieved >80% accuracy with clear separation between positive/negative terms; top tokens revealed pain points and trust drivers.
Impact: Helped prioritize community responses and guided content strategy for higher engagement.

E-Commerce Data Analytics Dashboard

Context: E-commerce performance analytics to track revenue drivers, customer behavior, and campaign impact.
Data: Transactions & customer records (~100k rows).
Approach: Data cleaning in Python (Pandas), SQL modeling, and interactive visualization in Streamlit.
Tools: Python, Pandas, SQL, Streamlit
Key Findings:
• Top 3 product categories generated >40% of total revenue.
• Repeat customers contributed ~65% of total sales.
• Conversion rate improved by 15% after targeted campaigns.
Impact: Helped stakeholders quickly identify profitable segments, optimize retention strategy, and support inventory/campaign decisions.
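
The two revenue findings reduce to simple Pandas aggregations. This sketch assumes a hypothetical order-line table with `customer_id`, `order_id`, `category`, and `revenue` columns (the real schema is not described here) and defines a repeat customer as anyone with more than one distinct order.

```python
import pandas as pd

# Hypothetical order lines; column names are assumptions for illustration.
orders = pd.DataFrame({
    "customer_id": ["c1", "c1", "c2", "c3", "c3", "c3", "c4"],
    "order_id": ["o1", "o2", "o3", "o4", "o5", "o6", "o7"],
    "category": ["electronics", "fashion", "electronics", "home",
                 "electronics", "fashion", "toys"],
    "revenue": [300.0, 80.0, 250.0, 120.0, 200.0, 50.0, 40.0],
})


def top_category_share(df, k=3):
    """Share of total revenue generated by the k highest-revenue categories."""
    by_cat = df.groupby("category")["revenue"].sum().sort_values(ascending=False)
    return by_cat.head(k).sum() / by_cat.sum()


def repeat_customer_share(df):
    """Share of revenue from customers with more than one distinct order."""
    orders_per_customer = df.groupby("customer_id")["order_id"].nunique()
    repeat_ids = orders_per_customer[orders_per_customer > 1].index
    repeat_revenue = df.loc[df["customer_id"].isin(repeat_ids), "revenue"].sum()
    return repeat_revenue / df["revenue"].sum()


print(f"Top-3 category share: {top_category_share(orders):.1%}")
print(f"Repeat-customer share: {repeat_customer_share(orders):.1%}")
```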

E-Commerce Marketing Dashboard

Context: Marketing team needed a live view of funnel performance from impression to purchase across channels.
Data: >9M impressions and 1.7M clicks integrated from ads & analytics platforms.
Approach: Ingest data into BigQuery, model funnel metrics with SQL, and build a KPI dashboard in Power BI/Looker Studio for daily monitoring.
Tools: BigQuery, Power BI / Looker Studio, SQL
Key Findings: Identified underperforming campaigns by device & time window; discovered high-ROI segments.
Impact: Enabled budget reallocation and weekly optimizations, increasing CTR and ROI.
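
Funnel KPIs like these can be prototyped in Pandas before being moved into BigQuery SQL. This is a minimal sketch. The column names and figures are hypothetical, ROI is taken as (revenue − spend) / spend, and "underperforming" is assumed to mean a CTR below the median across campaign/device segments. The project's actual thresholds are not stated.

```python
import pandas as pd

# Hypothetical per-segment totals joined from ads and analytics exports.
df = pd.DataFrame({
    "campaign": ["A", "A", "B", "B", "C", "C"],
    "device": ["mobile", "desktop"] * 3,
    "impressions": [100_000, 50_000, 80_000, 40_000, 120_000, 60_000],
    "clicks": [2_000, 1_500, 800, 600, 3_600, 900],
    "purchases": [60, 75, 8, 12, 144, 18],
    "spend": [1_000.0, 600.0, 900.0, 500.0, 1_200.0, 400.0],
    "revenue": [3_000.0, 3_000.0, 400.0, 600.0, 7_200.0, 900.0],
})

METRICS = ["impressions", "clicks", "purchases", "spend", "revenue"]


def funnel_kpis(frame, by=("campaign", "device")):
    # Sum raw counts first, then derive ratios, so rates are not averaged naively.
    g = frame.groupby(list(by), as_index=False)[METRICS].sum()
    g["ctr"] = g["clicks"] / g["impressions"]  # impression -> click
    g["cvr"] = g["purchases"] / g["clicks"]    # click -> purchase
    g["roi"] = (g["revenue"] - g["spend"]) / g["spend"]
    # Assumed rule: a segment underperforms if its CTR is below the median segment CTR.
    g["underperforming"] = g["ctr"] < g["ctr"].median()
    return g.sort_values("roi", ascending=False)


kpis = funnel_kpis(df)
print(kpis.to_string(index=False))
```

Grouping by an hour or day column instead of `device` (if the export has one) gives the time-window view.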

Bank Marketing Campaign Prediction

Context: A Portuguese bank needed to target customers with a higher probability of subscribing to term deposits.
Data: 41,188 rows, 20 features (demographic + campaign + macro).
Approach: Baseline Logistic Regression with balanced evaluation (ROC-AUC), followed by feature insights on call duration & contact history.
Tools: Python, scikit-learn, Pandas
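Key Findings: Call duration and recent contact history were strong predictors, and even a simple model delivered actionable targeting cues. Because call duration is only known after a call ends, cues for selecting customers in advance come from contact history and the other pre-call features.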
Impact: Improved call efficiency and increased conversion through better customer selection.
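
Here is a sketch of the baseline: a class-weighted logistic regression scored by hold-out ROC-AUC. It runs on a small synthetic frame that borrows a few column names from the public bank marketing dataset (`contact`, `poutcome`, `campaign`, `previous`, `duration`, `y`). The data-generating rule is invented for illustration. `duration` is dropped before training because it is unknown until the call ends, so a model meant for pre-call targeting cannot use it.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

CATEGORICAL = ["contact", "poutcome"]
NUMERIC = ["campaign", "previous"]


def make_synthetic(n=2000, seed=0):
    """Hypothetical stand-in for the bank data (invented generating rule)."""
    rng = np.random.default_rng(seed)
    poutcome = rng.choice(["nonexistent", "failure", "success"], size=n, p=[0.75, 0.15, 0.10])
    previous = rng.integers(0, 4, size=n)
    p = 0.05 + 0.05 * previous + 0.6 * (poutcome == "success")
    return pd.DataFrame({
        "contact": rng.choice(["cellular", "telephone"], size=n),
        "poutcome": poutcome,
        "campaign": rng.integers(1, 6, size=n),
        "previous": previous,
        "duration": rng.integers(0, 600, size=n),  # only known after the call
        "y": (rng.random(n) < p).astype(int),
    })


def fit_and_score(df):
    # Exclude the target and the post-call leakage feature.
    X = df.drop(columns=["y", "duration"])
    y = df["y"]
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    model = Pipeline([
        ("prep", ColumnTransformer([
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
            ("num", StandardScaler(), NUMERIC),
        ])),
        # class_weight="balanced" offsets the minority "subscribed" class.
        ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
    ])
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc


model, auc = fit_and_score(make_synthetic())
print(f"Hold-out ROC-AUC: {auc:.3f}")
```

ROC-AUC is threshold-free, so it stays informative when only a small share of customers subscribe.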

Crime Data Mapping & Insight (Exploratory)

Context: A public-safety analysis project that required identifying spatial patterns across a large volume of geolocated incident records.
Data: >8M geospatial records with timestamps and categories.
Approach: Data cleaning, geohash binning, and heatmaps; trend analysis by hour/day and category.
Tools: Python, Pandas, Geo tools, Visualization
Key Findings: Temporal-spatial hotspots surfaced; distinct night vs day patterns per category.
Impact: Informed resource allocation recommendations for patrol planning.
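
Geohash binning maps each point to a base-32 cell ID by repeatedly bisecting the longitude and latitude ranges and interleaving the resulting bits, so nearby points share a prefix. The sketch below hand-rolls the standard encoding and counts incidents per cell, hour, and category. The column names (`lat`, `lon`, `timestamp`, `category`) and sample rows are hypothetical. The per-row Python loop is for clarity only. At the 8M-record scale, a vectorized or compiled geohash library would be the more practical choice.

```python
import pandas as pd

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=6):
    """Encode a point as a geohash: alternate lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value, use_lon = [], 0, 0, True  # the first bit refines longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value, lon_lo = (value << 1) | 1, mid
            else:
                value, lon_hi = value << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value, lat_lo = (value << 1) | 1, mid
            else:
                value, lat_hi = value << 1, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def hotspots(df, precision=6, top=5):
    """Count incidents per (geohash cell, hour of day, category); return the busiest."""
    binned = df.assign(
        cell=[geohash_encode(a, b, precision) for a, b in zip(df["lat"], df["lon"])],
        hour=pd.to_datetime(df["timestamp"]).dt.hour,
    )
    counts = binned.groupby(["cell", "hour", "category"]).size().rename("incidents")
    return counts.sort_values(ascending=False).head(top).reset_index()


# Hypothetical incidents at arbitrary coordinates.
incidents = pd.DataFrame({
    "lat": [40.7128, 40.7128, 40.7128, 34.0522],
    "lon": [-74.0060, -74.0060, -74.0060, -118.2437],
    "timestamp": ["2023-01-01 22:15", "2023-01-02 22:40",
                  "2023-01-03 09:05", "2023-01-01 22:30"],
    "category": ["theft", "theft", "theft", "assault"],
})
print(hotspots(incidents))
```

Shorter precisions give coarser cells, so the same function can be rerun at several zoom levels before drawing heatmaps.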

Women’s E-Commerce Clothing Reviews — Sentiment & Insights

Context: An online women’s clothing retailer uses customer reviews as key signals of product quality, satisfaction, and market preference.
Data: Dataset includes ~23,000 reviews with star ratings, product IDs, text comments, and customer demographics (age group, etc.).
Approach: Text cleaning & exploratory sentiment analysis; distribution of ratings; demographic profiling; top-product review analysis.
Tools: Python, Pandas, Matplotlib, NLP preprocessing
Key Findings:
• Positive reviews dominate: there are ~12,000 more 5-star reviews than 1-star reviews.
• Only ~800 reviews are 1-star, indicating low dissatisfaction overall.
• Top product: Clothing ID 1078, with ~987 five-star reviews, is one of the 10 most reviewed products.
• Largest age group: 36–45 (31.1%), followed by 26–35, so women aged 26–45 form the core customer base.
Impact:
• Marketing: focus on ages 26–45 using Instagram/Facebook/Pinterest campaigns.
• Product management: maintain stock & variations of bestsellers like Clothing ID 1078.
• Service: address 1–2 star products with quality improvements; encourage positive reviews via loyalty programs.
• Experiment: test bundles/discounts for favorites tailored to the 26–45 segment.
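
The rating and demographic profiling can be sketched in Pandas as follows. Column names are assumed to match the public Kaggle version of this dataset (`Clothing ID`, `Age`, `Rating`). The six sample rows are invented, and the age-band edges are an assumption chosen to reproduce the 26–35 and 36–45 groups cited above.

```python
import pandas as pd

# Hypothetical sample rows; column names assumed from the public Kaggle dataset.
reviews = pd.DataFrame({
    "Clothing ID": [1078, 1078, 862, 1078, 862, 1094],
    "Age": [39, 41, 29, 33, 52, 44],
    "Rating": [5, 5, 1, 5, 4, 2],
})

# Assumed right-closed age bands: (17, 25] -> "18-25", (25, 35] -> "26-35", ...
AGE_BINS = [17, 25, 35, 45, 55, 120]
AGE_LABELS = ["18-25", "26-35", "36-45", "46-55", "56+"]


def summarize_reviews(df):
    # Number of reviews per star rating, ordered 1..5.
    rating_counts = df["Rating"].value_counts().sort_index()
    # How many more 5-star than 1-star reviews there are.
    gap_5_vs_1 = int(rating_counts.get(5, 0) - rating_counts.get(1, 0))
    # Share of reviewers in each age band (empty bands show as 0).
    age_band = pd.cut(df["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    age_share = age_band.value_counts(normalize=True).sort_index()
    # Products ranked by number of 5-star reviews.
    five_star_by_product = (
        df[df["Rating"] == 5].groupby("Clothing ID").size().sort_values(ascending=False)
    )
    return rating_counts, gap_5_vs_1, age_share, five_star_by_product


counts, gap, ages, top = summarize_reviews(reviews)
print(counts, gap, ages, top.head(10), sep="\n\n")
```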

North Sumatra Air Quality — Exploratory Analysis

Context: Focuses on the 2022 Air Quality Index (IKU) across 20+ regencies/cities in North Sumatra, comparing targets vs. achievements (gaps).
Data: Air quality indicators per district/city with yearly targets and realized outcomes; computed gaps (achievement − target).
Approach: Exploratory data analysis on regional performance; gap distribution analysis; identification of top/bottom performers; urban vs. rural comparison.
Tools: Python, Pandas, Matplotlib/Seaborn
Key Findings:
• Provincial average target achieved → overall gap ~0.
• Best performers: Toba (+9.9), Tapanuli Tengah, Dairi.
• Underperformers: Medan (−8.0), Deli Serdang, Serdang Bedagai.
• Trend: large urban areas underperforming → likely due to urbanization, transport, industrial emissions.
• Most other regions hover around target (±2).
Impact:
• Benchmarking: study factors behind Toba/Tapanuli/Dairi’s success and replicate elsewhere.
• Priority: intervene in Medan/Deli Serdang with stronger urban air management.
• Strategy: reduce disparities via continuous monitoring & interactive dashboards.
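
The gap analysis amounts to one derived column plus a few rankings. This is a minimal sketch with hypothetical regions and values. Only the gap definition (achievement − target) and the ±2 band come from the analysis above. The `urban` flag is an assumed input for the urban vs. rural comparison, and the "provincial average" is taken here as an unweighted mean over regions.

```python
import pandas as pd

# Hypothetical regions and IKU values for illustration only.
iku = pd.DataFrame({
    "region": ["Region A", "Region B", "Region C", "Region D", "Region E"],
    "target": [80.0, 80.0, 80.0, 80.0, 80.0],
    "achievement": [86.0, 81.0, 74.0, 78.5, 80.5],
    "urban": [False, False, True, True, False],
})


def gap_report(df, k=2, band=2.0):
    # Gap = achievement - target; positive means the target was exceeded.
    out = df.assign(gap=df["achievement"] - df["target"]).sort_values("gap", ascending=False)
    return {
        "mean_gap": out["gap"].mean(),  # unweighted mean across regions (assumption)
        "best": out["region"].head(k).tolist(),
        "worst": out["region"].tail(k).tolist()[::-1],  # most negative first
        "near_target": out.loc[out["gap"].abs() <= band, "region"].tolist(),
        "mean_gap_urban_vs_rural": out.groupby("urban")["gap"].mean().to_dict(),
    }


report = gap_report(iku)
for key, value in report.items():
    print(f"{key}: {value}")
```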

Telecom Customer Churn Prediction

Context: Data analytics project to understand churn behavior in a telecom company and build predictive insights for customer retention.
Data: Telco Customer Churn dataset (~7,000 records) including demographics, contract types, internet services, payment methods, and churn labels.
Approach: Performed data cleaning and preprocessing in Python (Pandas, NumPy), exploratory data analysis (EDA) with visualization libraries (Matplotlib, Seaborn), feature encoding and scaling, followed by model training using Logistic Regression and Decision Tree for churn classification.
Tools: Python, Pandas, NumPy, Matplotlib, Seaborn, scikit-learn
Key Findings:
• Customers on month-to-month contracts and paying with electronic checks showed the highest churn.
• Longer-tenure customers were less likely to churn (loyalty effect).
• Customers with additional services (TechSupport, OnlineSecurity, DeviceProtection) had lower churn risk.
• Fiber optic subscribers were more likely to churn compared to DSL users.
Impact: The analysis gave business stakeholders a clear retention strategy: it highlighted at-risk customer segments, quantified churn drivers, and supported targeted campaigns that could potentially reduce churn by ~12%.
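
Here is a sketch of the modeling step. It applies one-hot encoding and scaling in a `ColumnTransformer`, then compares Logistic Regression and a Decision Tree by 5-fold cross-validated ROC-AUC. The feature names follow the public Telco Customer Churn dataset (`Contract`, `PaymentMethod`, `InternetService`, `tenure`, `MonthlyCharges`), but the rows are synthetic. The `Churn` label is already 0/1 here (the public file uses "Yes"/"No"), and `max_depth=4` is an assumed setting.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

CAT = ["Contract", "PaymentMethod", "InternetService"]
NUM = ["tenure", "MonthlyCharges"]


def synthetic_telco(n=1500, seed=1):
    """Hypothetical rows: churn driven by month-to-month contracts and short tenure."""
    rng = np.random.default_rng(seed)
    contract = rng.choice(["Month-to-month", "One year", "Two year"], size=n, p=[0.55, 0.25, 0.20])
    tenure = rng.integers(1, 73, size=n)
    p = 0.08 + 0.30 * (contract == "Month-to-month") - 0.002 * tenure
    return pd.DataFrame({
        "Contract": contract,
        "PaymentMethod": rng.choice(
            ["Electronic check", "Mailed check", "Credit card (automatic)"], size=n),
        "InternetService": rng.choice(["DSL", "Fiber optic", "No"], size=n),
        "tenure": tenure,
        "MonthlyCharges": rng.uniform(20, 110, size=n).round(2),
        "Churn": (rng.random(n) < np.clip(p, 0.01, 0.95)).astype(int),
    })


def compare_models(df, cv=5):
    X, y = df[CAT + NUM], df["Churn"]
    prep = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CAT),
        ("num", StandardScaler(), NUM),
    ])
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    }
    # cross_val_score clones each pipeline, so sharing `prep` is safe.
    return {
        name: cross_val_score(make_pipeline(prep, clf), X, y, cv=cv, scoring="roc_auc").mean()
        for name, clf in models.items()
    }


df = synthetic_telco()
churn_by_contract = df.groupby("Contract")["Churn"].mean().sort_values(ascending=False)
scores = compare_models(df)
print(churn_by_contract)
print(scores)
```

The segment churn rates show where risk concentrates. The cross-validated scores indicate whether the simpler, more interpretable logistic model is competitive with the tree.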

Contact

Have a role, project, or question? Send a message. I usually reply within 24 hours.