Weekly Report — Mar 30 – Apr 5, 2026 (friday mode)
Work
Add2Cart
- DONE Eliminate bridge table (bridge_product_retailer) from Redshift pipeline and Holistics AML layer
- Completed all 5 phases of the migration plan (Slack thread):
- Phase 0: Snapshot baseline tables (_migration_baseline, _bridge_backup) created on Redshift.
- Phase 1: Added retailer_id + retailer_current_pricing_skey columns to atc_price_history, backfilled ~27M rows (75.7% coverage). Known gap: 8,713 keys from Chemist Discount Center slug mismatch (documented).
- Phase 2: Deployed 5 updated SPs (sp_refresh_daily_prices, sp_refresh_atc_price_history_enrich, sp_run_all, sp_refresh_price_history_from_s3, sp_validate). Ran sp_run_all(): enriched table validated at 12.1M rows, join path works without bridge.
- Phase 3: 7 commits on feat/eliminate-bridge-aml: removed bridge from all 4 datasets + market_price_rank model, added direct rcp_skey relationships, deleted agg_retailer_product_price model. Pushed to Holistics, dashboards verified.
- Phase 4 (soak): Monitored dashboards end-to-end through bridgeless path, no issues.
- Phase 5 (cleanup): Dropped bridge/agg tables + SPs, deleted bridge AML model, cleaned up migration artifacts. PR #2 merged. Shared completion summary with Anurag in Slack. Bridge kept as _bridge_product_retailer_backup (frozen, 222K rows) for reference. Remaining data quality items (near-miss slugs, missing retailers, history-only) handed off to Anurag.
- Impact: Simplified join path from master_product → retailer_current_pricing → bridge → atc_price_history to master_product → retailer_current_pricing → atc_price_history (one fewer hop), improving query performance.
- WAITING Guidance for countries-level data ingestion (Notion D4 doc)
- Investigated multi-country retailer ingestion from 6 RDS databases (airbyte_schema_*). Found critical price_history_skey collision: 2,774 keys collide across countries (39,710 rows), mainly Watsons across SG/ID/MY.
- Decision: Holistics will not implement the ingestion. Created D4 guidance document with two approaches evaluated: (1) Dynamic Schema (separate Redshift schemas per country, switch via User Attributes), recommended for Add2Cart; (2) Consolidated Schema (UNION ALL + RLP). Added Mermaid diagrams, comparison table, collision detection SQL, SP impact matrix, and AML examples.
- Feedback: initial version was too detailed on implementation. Rewrote to lead with “what needs to be done” and high-level approaches first. Waiting for anh Huy to review.
- Aligned with Anurag on Slack: he ingested retailers for all countries into Redshift.
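The price_history_skey collision noted above can be illustrated with a minimal Python sketch. It assumes rows are available as (country, price_history_skey) pairs; the function name find_cross_country_collisions and the sample values are hypothetical, and this is not the collision detection SQL from the D4 document.

```python
from collections import defaultdict


def find_cross_country_collisions(rows):
    """Return {skey: sorted list of countries} for keys seen in more than one country."""
    countries_by_key = defaultdict(set)
    for country, skey in rows:
        countries_by_key[skey].add(country)
    return {
        skey: sorted(countries)
        for skey, countries in countries_by_key.items()
        if len(countries) > 1
    }


# Hypothetical sample rows: (country, price_history_skey)
sample_rows = [
    ("SG", "k1"), ("ID", "k1"), ("MY", "k1"),  # same key in three countries
    ("SG", "k2"),                              # key used by one country only
    ("MY", "k3"), ("MY", "k3"),                # repeated within one country, not a collision
]

collisions = find_cross_country_collisions(sample_rows)
print(collisions)
```

With separate schemas per country (the Dynamic Schema approach) such keys never share a table; a consolidated table would likely need a country qualifier on the key.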
Presales
- DONE Study Jonas Chorum onboarding call 1 (Notion prep note)
- Hotel PMS company (~1,000 properties), non-multi-tenant (each customer = separate DB on self-hosted SQL Server). Key use cases: natural language querying over reservations, embedded analytics as paid add-on, dynamic data source for per-hotel routing.
- Answered tunnel/bastion server question in Slack — they ran Windows script instead of Linux, connection error escalated to support team (Tien).
- anh Huy led the call; I sat in to assist. Decision from team review: cross-database aggregation is a known limitation and needs further clarification with the customer.
- DONE Refactor Erin (Showbie) dashboard — Chukwudi had a fever, asked for help via Slack to refactor from query models to decoupled AQL metrics. Huddled with anh Dong, then handed over to him for the call.
- DONE Onboarding call 2 with Basata (Taha) (ReadAI transcript, Slack debrief)
- Good sentiment with Taha. Key feedback: creating single dashboard + embed portal flow isn’t seamless, needs to be easier to understand. Question: “any best practices to make the embedded dataset easier for business users to use?” — introduced Custom View feature. Embed Portal vs Single Dashboard distinction remains a point of confusion.
- Embed portal tutorial already delivered: Notion guide, shared in #presales-sa Slack. Taha’s backend colleague to begin portal embedding.
- TODO Onboarding call 1 with Jonas Chorum: vibe-coding a dynamic data source demo. Built POC showing data_source_name in AML routing per-hotel queries via JWT user attributes. Limitation identified: cross-database aggregation (50-60 DBs) requires a data mart. Moved to Apr 6.
- TODO Polish embedding documentation: formalized as P2 backlog item. Will ask anh Tai and anh Huy next week to clarify scope and whether to own this. Friction observed across Showbie, Superbexperience, Basata, and Jonas Chorum.
- NOTE Presales capacity: anh Huy is deliberately stepping back from leading calls, delegating to team (me, anh Dong, Chukwudi, Mario) to build team capacity. The expectation is thorough preparation before every call.
- NOTE Presales load continues to increase: Jonas Chorum, Showbie, Basata active this week. The embedding use case is becoming the dominant pattern — 4 of last 5 customers need embedded analytics.
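The per-hotel routing idea behind the Jonas Chorum POC can be sketched in plain Python. This is only an illustration of the concept: the attribute name hotel_id, the routing table, and the function resolve_data_source are hypothetical, and the sketch does not use or represent any Holistics or AML API.

```python
def resolve_data_source(user_attributes, routing_table):
    """Pick the data source name for one hotel from the embed user attributes."""
    hotel_id = user_attributes.get("hotel_id")
    if hotel_id is None:
        raise KeyError("user attributes carry no hotel_id")
    if hotel_id not in routing_table:
        raise LookupError(f"no data source registered for hotel {hotel_id!r}")
    return routing_table[hotel_id]


# Hypothetical mapping: one database (data source) per hotel
routing_table = {"hotel_a": "ds_hotel_a", "hotel_b": "ds_hotel_b"}

print(resolve_data_source({"hotel_id": "hotel_a"}, routing_table))
```

Each request resolves to exactly one data source, which is consistent with the limitation noted above: aggregating across hotels needs the data brought together first, for example in a data mart.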
Internal
- DONE Thuan’s PR #845 (DE-208 / mart_product__dataset_datamodel_dimensions): MERGED. Fix for PARSE_JSON syntax error. All CI checks passed after review rounds in W13.
- DONE Review DE-206 for anh Hieu: exchange rate document review completed. Notion doc and Slack thread.
- DONE Visualize schedule of all data lineage (Notion: Data Pipelines Schedules)
- Created full Mermaid flowchart of the entire pipeline: Extract & Load (21:30–23:55 VNT) → Domain Transforms (00:20–00:55) → Mart Transforms (00:15–01:35) → Post-Transform (01:35–08:00). Mapped all cron schedules from prefect.yaml, dbt model dependencies via ref() calls, and Airbyte sync frequencies.
- Key finding: mart_event depends on mart_product (specifically mart_product__users and mart_product__dashboard_widgets), but runs weekly on Monday 00:15 before the daily product mart at 01:10. Safe because mart_event doesn’t use the + prefix (reads existing BigQuery data, no upstream rebuild).
- DONE Check Zendesk tenant Inovonics (DAT-571)
- Investigated why tenant US-1099511640417 wasn’t mapped to Zendesk org in analytics. Traced through itg_mappings__zendesk_tenant → HubSpot domain mapping → Zendesk organizations table.
- Root cause: Airbyte Zendesk source connector (source-zendesk-support:0.2.6) has a pagination bug capping ingestion at 10,000 records. The raw organizations table had exactly 10,000 rows; 680 of 911 org IDs in tickets were missing.
- Recommendation: Upgrade Airbyte Zendesk connector to fix pagination.
- DONE Add CI to validate Holistics AML project (PR #67 — MERGED)
- Triggered by a silent semantic conflict: PR #63 (Hieu) added a dashboard depending on domain_hubspot_companies, and PR #64 (Thuan) deleted that model. Git didn’t flag it because they touched different files.
- Implemented GitHub Actions workflow using Holistics Validation API to validate AML syntax and references on every push.
- DONE Attended Product Office Hours P1 and P2 (Notion newsletter)
- Noted: AI theme builder tool at holistics.h-theme-builder.pages.dev/theme-builder, will try with more prospects.
- New dataset exploration UI looks promising.
- Data team should contribute to Holistics skills and internal-skills repos.
- TODO Create agent skill for fixing dbt data pipelines for #data-ops-bot — carry-over from W13, not started.
- TODO Fix excluding internal testing Zoho accounts (DAT-524) — carry-over since W7, not started.
- TODO Calendly data pipeline (DAT-283) — carry-over, not started.
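The silent semantic conflict behind PR #67 can be reproduced in a small Python sketch. It models a project as a set of model names plus a map of dependents to the models they reference; find_dangling_references and every name other than domain_hubspot_companies are hypothetical, and this stands in for the idea of reference validation only, not for the Holistics Validation API.

```python
def find_dangling_references(models, references):
    """Return sorted (dependent, missing_model) pairs whose target model is absent."""
    return sorted(
        (dependent, target)
        for dependent, targets in references.items()
        for target in targets
        if target not in models
    )


# State of the main branch before either PR
base_models = {"domain_hubspot_companies", "other_model"}
base_refs = {"existing_dashboard": {"other_model"}}

# One PR adds a dashboard that depends on domain_hubspot_companies
refs_after_add = {**base_refs, "new_dashboard": {"domain_hubspot_companies"}}
# The other PR deletes that model
models_after_delete = base_models - {"domain_hubspot_companies"}

# Each PR is valid when checked against the base on its own
assert find_dangling_references(base_models, refs_after_add) == []
assert find_dangling_references(models_after_delete, base_refs) == []

# Merged together, the new dashboard points at a model that no longer exists
merged_problems = find_dangling_references(models_after_delete, refs_after_add)
print(merged_problems)
```

Because the two changes live in different files, a textual merge succeeds; only a check over the merged project state, run on every push, surfaces the broken reference.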
Docs
- TODO Polish embedding documentation — increasing embedding leads make this urgent. Same item tracked under Presales.
- WAITING Add demo video for local development docs — waiting for team to trigger.
Logseq
- DONE Add project glossary (PR #13): created canonical project list in pages/Projects.md.
- DONE Improve backlog structure and automation query pre-processing (PR #12).
Personal / Tooling
- DONE Changed to a more professional work avatar.
- DONE Edited the video from the last Hue trip (completed on Saturday).
- DONE Bank statement review (sao kê, personal finance task).
- DONE Settled on mochi.cards as the flashcard app for learning English words.
- Installed Annotate — free on-screen annotation tool for customer demos and screen recordings. Lightweight alternative to paid tools.
- Discovered Pebble Index 01 — smart ring for quick voice notes. Interesting for capturing ideas on the go.
- Found Vietnam Real Estate Dataset on HuggingFace — potential hobby data analysis project.
- Note: Logseq automation is compounding — helps retain historical context across projects, improving performance in sync calls and connecting dots. The second reason for feeling productive this week.
Learning & Notes
- Watched How to Present a MIND-BLOWING Software Demo — key takeaways: recap slides (fact → problems → criteria → new findings), speak customer’s language (airline = passengers, SaaS = users), confirm what they care about before demo, use cases > features, have an assist person, send personalized recap to each audience member. Applied learning to Jonas Chorum call prep.
- Read How to teach technical concepts with cartoons by Julia Evans — guide on visual teaching. Potential application for Kindle Scribe. Follow-up from W13 recommended resource on “implementation challenges.”
- Read Details aren’t the problem. The problem is too many of the wrong details by Wes Kao — levels of detail in communication. Completed from W13 recommended resources backlog.
- LEARNING Single-tenant vs Multi-tenant architecture patterns — relevant context for Jonas Chorum (single-tenant: infra duplicated per tenant) vs most other customers (multi-tenant: shared infra, logical isolation). Understanding this distinction is critical for presales scoping.
- LEARNING Country names follow a people + land pattern: Iceland, Greenland, England, Switzerland, Finland, Kazakhstan, Uzbekistan (stan = land).
- NOTE Good observation from anh Huy: “If I just stay in the call, everyone can’t improve. The only thing that makes sense now is to force the team to prepare very well before the call.” Leadership by stepping back.
- NOTE Increase Calendly pipeline task to top priority next week — noted on Thu.
Next Week
- P1 — Add2Cart: Project retro with anh Dong — review the full bridge elimination + countries guidance work.
- P1 — Presales: Onboarding call 1 with Jonas Chorum — finalize dynamic data source demo, address cross-database aggregation limitation.
- P1 — Internal: Calendly data pipeline (DAT-283) — elevated to P1 per Thu note. Standardize source pipeline and build unified dbt models.
- P2 — Internal: Create agent skill for fixing dbt data pipelines in #data-ops-bot — carry-over from W13.
- P2 — Internal: Fix excluding internal testing Zoho accounts (DAT-524) — lingering since W7. Timebox 2h or explicitly deprioritize.
- P2 — Internal: Follow up on DAT-571 (Zendesk Airbyte connector upgrade) — ensure pagination fix is scheduled.
- P2 — Docs: Polish embedding documentation — ask anh Tai and anh Huy to clarify scope. Write step-by-step embed portal setup guide.
- P2 — Add2Cart: Get anh Huy’s review on the D4 countries guidance document.
- P3 — Presales: Read Modeling Patterns docs.
- P3 — Personal: Self reflection and update CV.
Career & Personal Consulting
Progress Review (Start/Stop/Keep):
- Start: Creating a reusable “embedded analytics scoping checklist” — you’ve now handled 5+ customers with embedding needs (Basata, Superbexperience, Showbie, Innerspace, Jonas Chorum). Each one follows the same pattern: data source setup → modeling → RLP/JWT → white-labeling. Document this as a repeatable template.
- Start: Proactively adding CI/safety nets — the AML validation CI (PR #67) caught a real problem (silent semantic conflict). This kind of infrastructure work prevents future firefighting. Look for similar opportunities.
- Stop: Writing overly detailed technical documents on first pass — the D4 guidance doc got feedback that it was “too detailed unnecessarily.” Lead with what needs to be done, then add implementation detail as appendix. Apply the Wes Kao “levels of detail” framework you just read.
- Keep: The observe-then-lead pattern for presales calls — Jonas Chorum (Huy leads, you assist) → Basata (you co-lead) → eventually you lead solo. This is the right progression.
- Keep: Delivering guidance documents instead of doing the work yourself (D4 for Add2Cart, embed portal tutorial for Basata) — this is staff-level behavior. Scope the problem, identify risks, hand off with a clear plan.
Observations:
- This was a high-output week: bridge elimination completed (Phases 0–5), countries guidance delivered and rewritten, Erin dashboard handed off, Basata call 2 done, data lineage visualized, Zendesk root cause found, AML CI added, PR #845 merged. The shift from “doing tasks” to “closing projects and handing off” is a maturing pattern.
- The presales embedding pattern is becoming your specialty. 4 of the last 5 customer interactions centered on embedded analytics. The Basata feedback (“creating single dashboard + embed portal flow isn’t seamless”) is product-level insight — escalate to product team.
- The Zendesk investigation (DAT-571) uncovered a systemic issue: 680 of 911 org IDs missing due to Airbyte pagination bug. This is the kind of data quality root cause analysis that prevents months of downstream confusion. Good instinct to trace the full lineage.
- The carry-over list is shrinking this week (DE-206 done, data lineage done, Erin done) — but DAT-524 (W7) and agent skill (W13) remain. Calendly is now elevated to P1. Good prioritization awareness.
- Positive signal: learning → application cycle is fast (demo video → Jonas Chorum prep; Wes Kao article → D4 rewrite feedback). This suggests high ROI on continued presales skill investment.
Recommended Resources to Learn
Embedded Analytics (your emerging specialty):
- Article: Embedded Analytics for SaaS: An Express Guide (2026) — Holistics’ own guide. Know your product’s positioning inside-out. Useful for framing conversations with Jonas Chorum and other embedding prospects.
- YouTube: Scaling SaaS Analytics Without Scaling Your Team — embedded BI demo and strategy session. Relevant to the “how do we scale to 1,000 properties” question from Jonas Chorum.
Presales & Demo Skills (continuing from W13):
- Book: The Trusted Advisor by David Maister — carry-over from W13. Still highly relevant as you transition from demo-runner to strategic advisor for embedding customers.
- Article: The 3-2-1 Speaking Trick — concise communication framework. Complements the Wes Kao article you just read on levels of detail.
Data Engineering & Pipeline Observability (your core craft):
- Tool: Elementary — dbt-native data observability. Carry-over from W13. Directly relevant to the agent skill for #data-ops-bot.
- Article: dbt Best Practices — official guide updated for 2026. Covers structuring projects, materialization patterns, and CI workflows. Relevant for the agent skill and mentoring.
- Tool: Dagster — modern orchestration with asset-centric approach. Worth evaluating if you’re rethinking the data lineage visualization problem (alternative to Google Stitch).
Career Growth (sustaining momentum):
- Book: Staff Engineer by Will Larson — carry-over from W13. The bridge elimination project (scoping → executing → documenting → handing off) is textbook staff-level work. This book helps you articulate that trajectory.
- Article: The Engineer/Manager Pendulum by Charity Majors — carry-over from W13. Your role blend (IC engineering + presales + mentoring) is exactly the pendulum she describes.