Chinh (lelouvincx) / Weekly Report - 2026-W15

Created Mon, 25 May 2026 00:00:00 +0000 Modified Mon, 25 May 2026 06:02:25 +0000
2520 Words
  • Weekly Report — Apr 6 – Apr 12, 2026 (friday mode)

  • Work

  • Internal

  • DONE Fix failed dbt tests — snowplow events (DAT-569)

  • DONE Check Zendesk — Tenant Inovonics (DAT-571)

    • Triggered by Vincent’s Slack message: Zendesk Customer Support dashboard showed 0 for Inovonics. Traced the full lineage fct_zendesk_tickets → itg_mappings__zendesk_tenant → stg_zendesk__organizations → _airbyte_raw_organizations and found org 13430326317849 missing entirely.
    • Root cause: Airbyte Zendesk connector v0.2.6 uses offset-based pagination, which the Zendesk API caps at 10K records. The raw table had exactly 10,000 rows, and 680 of the 911 org IDs referenced in tickets were missing, so the entire downstream lineage from staging to mart was broken.
    • Wrote upgrade plan evaluating 3 options: (1) Upgrade Airbyte platform v0.35 → v2.0 (fixes root cause for all connectors, but high DevOps effort + risk to HubSpot syncs during migration), (2) Rebuild in Prefect (data-team-only, orgs-only ≈ 2–3 days), (3) Use Fivetran (managed SaaS, ≈ 2–3 days, requires refactoring all stg_zendesk__*.sql models). Waiting for review.
    • Interim fix applied: manually upgraded connector in Airbyte UI to v2.6.6. Vincent confirmed dashboard now shows data. Long-term Airbyte upgrade tracked as a Linear project.
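
The failure mode in the root-cause note can be reproduced with a small simulation. This is a minimal Python sketch, not Airbyte's or Zendesk's code: OFFSET_CAP, the page size, the index-based cursor, and the org IDs are all hypothetical, and the simulated offset endpoint simply returns an empty page past the cap as a stand-in for however the real connector ends its sync. It shows why a raw row count of exactly 10,000 is a red flag, and why cursor-based pagination does not hit the same ceiling.

```python
from typing import Optional

OFFSET_CAP = 10_000  # hypothetical: deepest record reachable via offset pagination
PER_PAGE = 100


def fetch_offset_page(records: list[dict], page: int) -> list[dict]:
    """Simulated offset endpoint: nothing is returned beyond OFFSET_CAP."""
    start = (page - 1) * PER_PAGE
    end = min(start + PER_PAGE, OFFSET_CAP)
    return records[start:end] if start < OFFSET_CAP else []


def fetch_cursor_page(records: list[dict], cursor: Optional[int]) -> tuple[list[dict], Optional[int]]:
    """Simulated cursor endpoint: the cursor is just an index here (real cursors are opaque)."""
    start = cursor or 0
    next_start = start + PER_PAGE
    return records[start:next_start], (next_start if next_start < len(records) else None)


def sync_offset(records: list[dict]) -> list[dict]:
    rows, page = [], 1
    while batch := fetch_offset_page(records, page):  # an empty page ends the sync
        rows.extend(batch)
        page += 1
    return rows


def sync_cursor(records: list[dict]) -> list[dict]:
    rows, cursor = [], None
    while True:
        batch, cursor = fetch_cursor_page(records, cursor)
        rows.extend(batch)
        if cursor is None:  # no next cursor means the last page was reached
            return rows


def looks_truncated(row_count: int) -> bool:
    """A raw table sitting exactly at the pagination cap deserves a closer look."""
    return row_count == OFFSET_CAP


orgs = [{"id": i} for i in range(10_680)]  # hypothetical: more orgs than the cap
offset_rows = sync_offset(orgs)
cursor_rows = sync_cursor(orgs)

# Hypothetical org IDs referenced by tickets; any not in the raw table break the joins.
ticket_org_ids = {5, 9_999, 10_000, 10_679}
missing = ticket_org_ids - {row["id"] for row in offset_rows}
print(len(offset_rows), len(cursor_rows), sorted(missing))
# → 10000 10680 [10000, 10679]
```

Which orgs go missing in practice depends on the sort order the source paginates by; the ID ordering in this simulation is arbitrary.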
  • WAITING DAT-555 — Fix fct_job_queue_performance BigQuery resource limit (DAT-565)

    • Parent issue: Investigate dbt Mart Product flow failures, with 3 sub-issues: DAT-567 (PARSE_JSON, done), DAT-566 (uniqueness test, under analysis), DAT-565 (resource limit, PR in review).
    • Decision (discussed with anh Triet): keep the model at minute-level granularity for DevOps debug tracing. Originally planned to remove it, but DevOps needs it for job queue overload diagnosis. The in-app Job Monitoring dashboard is not a full replacement since it lacks historical minute-level data.
    • Root cause: the model performs a CROSS JOIN between a minute-level timespine and the jobs table. _dbt_max_partition was 6 months behind (2025-10-08), so incremental runs had an effectively unbounded lookback, consuming 5.7M CPU seconds against a 724K limit.
    • Fix: cap incremental lookback to 7 days using greatest() + coalesce() on _dbt_max_partition. Added _loaded_at column to distinguish current-run data from historical partitions (needed because insert_overwrite leaves old partitions untouched). Added singular test assert_job_queue_performance_lookback_within_7_days.sql.
    • PRs open and awaiting review: dbt #854 (caps the lookback; gemini-code-assist's feedback on NULL handling has been addressed) and internal-aml-project #69 (removes downstream AML dataset/models; approved by gemini). Notion doc: DE-212.
    • Backfill plan: ~26 weekly chunks once PR is merged.
    • Dashboard cleanup: delete [WIP] Job Queue Performace 3.0, keep Job Queue Performance Estimation (v2) for DevOps tracing, review Report Job Queue Performance Monitoring with DevOps for potential redundancy.
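
The lookback cap and the backfill plan above can be sketched in Python. This mirrors the described greatest()/coalesce() logic rather than reproducing the SQL in PR #854: the fallback used when _dbt_max_partition is NULL (run date minus 7 days), the run date, and the half-open weekly windows are assumptions for illustration.

```python
from datetime import date, timedelta
from typing import Optional

LOOKBACK_DAYS = 7


def incremental_start(max_partition: Optional[date], run_date: date) -> date:
    """Earliest date an incremental run reprocesses."""
    floor = run_date - timedelta(days=LOOKBACK_DAYS)
    # coalesce(): a NULL max partition falls back to the floor (assumed fallback)
    candidate = max_partition if max_partition is not None else floor
    # greatest(): never look further back than LOOKBACK_DAYS
    return max(candidate, floor)


def timespine_minutes(start: date, run_date: date) -> int:
    """Minutes in the timespine window; each one is cross-joined with job rows."""
    return (run_date - start).days * 24 * 60


def weekly_chunks(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive windows of at most 7 days for backfilling."""
    chunks, cursor = [], start
    while cursor < end:
        upper = min(cursor + timedelta(days=7), end)
        chunks.append((cursor, upper))
        cursor = upper
    return chunks


run_date = date(2026, 4, 10)  # hypothetical run date
stale = date(2025, 10, 8)     # the stale _dbt_max_partition from the investigation

print(timespine_minutes(stale, run_date))                               # uncapped: 264960
print(timespine_minutes(incremental_start(stale, run_date), run_date))  # capped: 10080
print(len(weekly_chunks(stale, run_date)))                              # 27
```

With the stale partition, the uncapped window holds roughly 26 times as many timespine minutes as the capped one, before the cross join multiplies that by job rows.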
  • WAITING Calendly data pipeline (DAT-283 — “Sync all calls from Calendly”)

    • Linear scope: currently only demo/onboarding calls ingested; need all 50+ event types (training, customer success, case study, etc.) for growth team’s sales rep performance tracking and Lead Funnel enrichment.
    • Major progress this week: defined full source schema from Calendly OpenAPI spec (42 API paths, 49 schemas). Narrowed to 7 tables present in BigQuery src_calendly: users, event_type, event, event_membership, event_invitee, event_guest, question_and_answer. Published ERD on dbdiagram.
    • Schema design decisions: flattened nested API objects (UTM params, location) into columns; broke arrays into separate relational tables; included Fivetran metadata columns for lineage and soft-delete handling; all PKs follow Calendly URI pattern.
    • Set up Fivetran as interim ingestion tool (bypassing broken Zapier → Google Sheets path that only had 400 of 4,438 events). Data now syncing to src_calendly in BigQuery.
    • Wrote full 6-phase dbt modeling plan: (1) source definition + seed mapping for event type categories (Sales/Retention/Solo/Internal/Marketing), (2) staging layer (3 views: events, invitees, hosts), (3) domain layer (2 tables: consolidated events at event-invitee grain, normalized Q&A), (4) mart layer (fct_calendly_events), (5) fix downstream fct_sales_leads join (replace broken event_uuid + src_gsheets with dom_calendly__events on invitee_email_domain), (6) deprecate 5 legacy models.
    • PR open: dbt #853 — review required, awaiting anh Dong.
    • LEARNING: when doing data pipeline ingestion, always find an OpenAPI spec for source schema reference. Critical for tracking API changes.
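
The flatten-and-split schema decision can be illustrated with a minimal Python sketch. Field names are modeled loosely on a Calendly invitee payload (a nested UTM tracking object and a questions-and-answers array) but should be treated as assumptions; in this pipeline Fivetran does the actual normalization, so the sketch only shows the target shape: nested objects become columns, arrays become child rows keyed by the parent URI.

```python
from typing import Any


def flatten_invitee(invitee: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return one parent row plus child Q&A rows for a single invitee payload."""
    tracking = invitee.get("tracking") or {}  # nested object -> flat columns
    row = {
        "uri": invitee["uri"],  # PK follows the Calendly URI pattern
        "email": invitee.get("email"),
        "utm_source": tracking.get("utm_source"),
        "utm_medium": tracking.get("utm_medium"),
        "utm_campaign": tracking.get("utm_campaign"),
    }
    qa_rows = [  # array -> separate relational table
        {
            "invitee_uri": invitee["uri"],  # FK back to the parent row
            "position": qa.get("position"),
            "question": qa.get("question"),
            "answer": qa.get("answer"),
        }
        for qa in invitee.get("questions_and_answers") or []
    ]
    return row, qa_rows


example = {  # hypothetical payload
    "uri": "https://api.calendly.com/scheduled_events/EVT123/invitees/INV456",
    "email": "jane@example.com",
    "tracking": {"utm_source": "linkedin", "utm_medium": "paid", "utm_campaign": None},
    "questions_and_answers": [
        {"position": 0, "question": "Company size?", "answer": "50-200"},
        {"position": 1, "question": "Current BI tool?", "answer": "Looker"},
    ],
}
row, qa_rows = flatten_invitee(example)
```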
  • DONE Review Usage Monitoring dashboard

  • TODO Lead Funnel by Sales Motion (DAT-560)

    • Linear scope: add a sales_motion dimension (call-first, trial-first, trial-only, call-only) to fct_sales_leads so the DTS106 dataset can slice lead metrics by channel. 4 sub-issues: DAT-561 (enrich call sources, PR in review), DAT-562 (add dimension, backlog), DAT-563 (fix rep assignment, backlog), DAT-564 (re-audit with BizOps, backlog).
    • Moved detailed context to dedicated page [[Lead Funnel by Sales Motion]]. Synthesized classification logic, revenue planning baseline ($330K MRR target → 59 raw leads/month), data quality audit findings (Fanserv/Novigi mis-categorized due to same-day UTC ordering; missing call sources via non-Calendly channels; event_uuid NULL since Dec 2024), and override mechanism (HubSpot “DataOps Override” property as fallback).
    • Next step: review and make a plan; blocked on completing Calendly pipeline first (DAT-283 provides the enriched call data needed for DAT-561).
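
The same-day UTC ordering issue from the audit is easiest to see side by side. A minimal Python sketch, assuming a simple rule (the earlier of first call and trial start decides the motion; a missing side gives call-only or trial-only). The real classification logic lives on the [[Lead Funnel by Sales Motion]] page and may differ. The point is comparison granularity: comparing DATE values lets two events on the same UTC day tie, and a tie falls through to whatever the default branch is.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_by_date(call_at: datetime, trial_at: datetime) -> str:
    """Date-granularity comparison: same-day events tie and hit the default branch."""
    return "call-first" if call_at.date() < trial_at.date() else "trial-first"


def classify_sales_motion(call_at: Optional[datetime], trial_at: Optional[datetime]) -> Optional[str]:
    """Timestamp comparison; None means no signal (a candidate for the manual override)."""
    if call_at is None and trial_at is None:
        return None
    if trial_at is None:
        return "call-only"
    if call_at is None:
        return "trial-only"
    return "call-first" if call_at <= trial_at else "trial-first"


# Hypothetical lead: call at 02:00 UTC, trial started at 15:00 UTC on the same day.
call = datetime(2026, 3, 2, 2, 0, tzinfo=timezone.utc)
trial = datetime(2026, 3, 2, 15, 0, tzinfo=timezone.utc)
print(classify_by_date(call, trial))       # trial-first (wrong)
print(classify_sales_motion(call, trial))  # call-first
```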
  • TODO Create agent skill for fixing dbt data pipelines in #data-ops-bot — carry-over from W13, not started.

  • TODO Fix excluding internal testing Zoho accounts (DAT-524) — carry-over since W7, not started.

  • Add2Cart

  • DONE Ingest countries data into Retailers table — completed Apr 5. Bridge elimination (Phases 0–5) fully done in W14.

  • DONE Document Add2Cart Dashboard Implementation Process — completed Apr 11, moved to Backlog/Done.

  • WAITING Guidance for countries-level data ingestion (D4 doc) — waiting for anh Huy’s review.

  • TODO Project retro with anh Dong — carry forward to W16.

  • Presales

  • DONE Onboarding call 1 with Jonas Chorum — completed Apr 10

    • Hotel PMS company (~1,000 properties) with a single-tenant architecture (each customer = separate DB on SQL Server). Participants: Ahmad (PM for SMS host copy), Kevin Lane (tech lead for Sprinter Miller), Mitchell (SWE), plus a product strategy lead. The call was led by anh Huy; I supported with technical prep.
    • Slack thread: originally scheduled Mar 31, moved 1h later (11pm VNT). anh Huy flagged it as “embedding + dynamic data source setup call” and asked me to sit in and support Chukwudi.
    • Vibe-coded a dynamic data source demo at holistics-embed-demo.pages.dev showing per-hotel JWT routing with data_source_name in AML.
    • Impression: they have high expectations of the embedded solution, and this will likely be a long sales cycle. Key limitation: cross-database aggregation (50–60 DBs) requires a data mart.
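
The per-hotel routing idea boils down to signing a short-lived token whose claims tell the BI side which data source to use. A minimal sketch using only the Python standard library to build and verify an HS256 JWT; the data_source_name claim placement, the hotel-to-data-source mapping, and the TTL are hypothetical and do not reproduce Holistics' actual embed payload format. In production, a maintained library such as PyJWT is a safer choice than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical mapping: one data source per hotel (single-tenant, one DB each).
HOTEL_DATA_SOURCES = {"hotel-001": "pms_hotel_001", "hotel-002": "pms_hotel_002"}


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [b64url(json.dumps(p, separators=(",", ":")).encode()) for p in (header, payload)]
    signing_input = ".".join(parts).encode("ascii")
    signature = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return ".".join(parts + [b64url(signature)])


def embed_token_for(hotel_id: str, secret: str, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    issued_at = int(time.time() if now is None else now)
    payload = {
        "data_source_name": HOTEL_DATA_SOURCES[hotel_id],  # KeyError for unknown hotels
        "exp": issued_at + ttl_seconds,  # short expiry: an expired URL must be regenerated
    }
    return sign_hs256(payload, secret)


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), signature_b64):  # constant-time compare
        raise ValueError("invalid signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

Keeping the hotel-to-data-source mapping on the server that signs the token means a viewer cannot switch hotels by editing the URL; any change to the payload invalidates the signature.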
  • DONE Answer Erin and Harsha on embed portal questions (Showbie) — completed Apr 11

    • Erin’s questions: (1) the JWT token in the sandbox embed URL had expired → explained that expiry is a security measure and the URL can simply be regenerated in the sandbox; (2) can the left-side panel be restricted? → clarified which access control options exist; (3) the dashboard title is shown in all embed types; (4) pointed to the Row-Level Permission docs for per-account filtering; (5) confirmed that tabs can use different underlying datasets.
    • Erin’s follow-up: comparing a single dashboard iframe vs the embed portal → confirmed that single dashboard embedding supports export + drill-through but NOT email subscriptions. Chukwudi confirmed the current quote is for the single dashboard approach. I added that they could pivot to an embed portal containing only 1 dashboard (no explore) to get email subscriptions while hiding the explore panel.
    • Harsha asked about Looker-like dynamic schema switching ({{ _user_attributes['schema'] }}). Chukwudi pointed to the Dynamic Data Source docs, and I added the Dynamic Schema docs with an AML code example using the H.current_user.schema variable.
    • Built a Custom Embed tester section in holistics-embed-demo.pages.dev so presales team can quickly demo how embed looks in a real application.
    • LEARNING: the UX gap between embed single dashboard and embed portal is real — prospects need to compare functionalities side-by-side (export yes/no, email subscription yes/no, explore yes/no).
    • LEARNING: In the future, hooli should be a unified toolbox for everyone to try embed portal in their real application. Customers: Basata, Superbexperience, Showbie, Innerspace, Jonas Chorum.
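
Both the Looker-style {{ _user_attributes['schema'] }} pattern and the AML H.current_user.schema approach come down to substituting a per-user schema name into a table reference at query time. A tool-agnostic Python sketch of that idea, with a hypothetical template and attribute name (this is not AML or Looker syntax). The identifier check matters because schema names generally cannot be passed as bound query parameters, so they should be validated before being interpolated into SQL.

```python
import re

# Conservative identifier rule (assumption): letters, digits, underscores; no leading digit.
SAFE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def render_for_user(sql_template: str, user_attributes: dict[str, str]) -> str:
    """Fill {schema} in the template from the user's 'schema' attribute."""
    schema = user_attributes.get("schema")
    if schema is None or not SAFE_IDENTIFIER.fullmatch(schema):
        raise ValueError(f"refusing to render with schema {schema!r}")
    return sql_template.format(schema=schema)


template = "SELECT COUNT(*) FROM {schema}.orders"  # hypothetical query
print(render_for_user(template, {"schema": "tenant_a"}))
# → SELECT COUNT(*) FROM tenant_a.orders
```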
  • TODO Polish embedding documentation — vibe-coded holistics-embed-demo (React + Cloudflare Pages, JWT-based, multi-user simulation, RLS demo). Waiting on anh Tai and anh Huy to clarify scope.

  • Docs

  • TODO Add demo video for local development docs — waiting for team trigger.

  • Personal / Tooling

  • Vibe-coded holistics-embed-demo.pages.dev — full-stack React + Cloudflare Pages app demonstrating secure JWT embedding, multi-user impersonation for RLS testing, embed portal, Ask AI, single dashboard, and a custom embed URL tester. Dev tools panel shows raw JWT payload and generated iframe URL.

  • Bought ATK X1 Ultimate V2 mouse. No Bluetooth (1K dongle only). Has a web-based config panel at hub.atk.pro for Motion Sync and Straight Line Correction.

  • Edited Huế trip videos — published on YouTube.

  • Found the book Staff Engineer — saved for future reading. “At the moment don’t feel like I’m ready enough to be at that level.”

  • Settled on mochi.cards as my flashcard app (completed Apr 6).

  • Learning & Notes

  • LEARNING Asset dependency lineage typically flows: dbt models → AML models (table → query) → datasets → query reports → dashboard widgets → dashboards → schedules/alerts/shareable links/embed links.
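
A lineage chain like this is what makes impact analysis possible: walking it downstream from a dbt model lists every asset that a change (or a silent data gap, as with the Zendesk orgs) can reach. A minimal breadth-first sketch in Python; the asset names and edges are hypothetical, loosely following the Zendesk dashboard case.

```python
from collections import deque

# Hypothetical edges, pointing downstream (dependency -> dependents).
EDGES: dict[str, list[str]] = {
    "dbt:fct_zendesk_tickets": ["aml_table:fct_zendesk_tickets"],
    "aml_table:fct_zendesk_tickets": ["aml_query:tickets_by_tenant"],
    "aml_query:tickets_by_tenant": ["dataset:customer_support"],
    "dataset:customer_support": ["report:tickets_by_tenant"],
    "report:tickets_by_tenant": ["widget:tickets_by_tenant"],
    "widget:tickets_by_tenant": ["dashboard:zendesk_customer_support"],
    "dashboard:zendesk_customer_support": ["schedule:weekly_digest", "embed_link:support_portal"],
}


def downstream(asset: str, edges: dict[str, list[str]]) -> list[str]:
    """All assets reachable from `asset`, nearest first; `seen` guards against cycles."""
    seen, order, queue = {asset}, [], deque([asset])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


for name in downstream("dbt:fct_zendesk_tickets", EDGES):
    print(name)
```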

  • LEARNING When doing data pipeline ingestion, always find an OpenAPI spec (like Calendly’s) for source schema reference. This is important because when the source API changes, you have a reference to update data code accordingly.

  • LEARNING dbt incremental models: capping lookback windows (e.g., 7 days) with greatest() prevents resource exhaustion when _dbt_max_partition falls behind. Add _loaded_at column to distinguish current-run data from historical partitions when using insert_overwrite.
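
Why _loaded_at is needed with insert_overwrite becomes clear in a toy model of the strategy: partitions present in the new batch are replaced wholesale, while every other partition is left exactly as it was, so the table alone cannot tell which rows the latest run wrote. A minimal Python sketch under that simplified model (the real partition replacement happens in SQL that dbt generates, not shown here). The rows, dates, and check function are hypothetical, and the check is only similar in spirit to the assert_job_queue_performance_lookback_within_7_days test, whose SQL is not reproduced.

```python
from datetime import date, datetime, timedelta, timezone

Table = dict[date, list[dict]]  # partition date -> rows


def insert_overwrite(table: Table, batch: list[dict], loaded_at: datetime) -> Table:
    """Replace only the partitions present in `batch`; keep all others untouched."""
    replaced: Table = {}
    for row in batch:
        replaced.setdefault(row["partition_date"], []).append({**row, "_loaded_at": loaded_at})
    return {**table, **replaced}


def rows_from_run(table: Table, loaded_at: datetime) -> list[dict]:
    return [r for rows in table.values() for r in rows if r["_loaded_at"] == loaded_at]


def lookback_violations(table: Table, loaded_at: datetime, run_date: date, days: int = 7) -> list[dict]:
    """Rows written by this run that fall outside the lookback window."""
    floor = run_date - timedelta(days=days)
    return [r for r in rows_from_run(table, loaded_at) if r["partition_date"] < floor]


t0 = datetime(2026, 4, 9, tzinfo=timezone.utc)   # previous run (hypothetical)
t1 = datetime(2026, 4, 10, tzinfo=timezone.utc)  # current run (hypothetical)
table: Table = {
    date(2025, 10, 7): [{"partition_date": date(2025, 10, 7), "jobs": 9, "_loaded_at": t0}],
    date(2026, 4, 9): [{"partition_date": date(2026, 4, 9), "jobs": 1, "_loaded_at": t0}],
}
batch = [
    {"partition_date": date(2026, 4, 9), "jobs": 5},
    {"partition_date": date(2026, 4, 10), "jobs": 3},
]
table = insert_overwrite(table, batch, t1)
print(len(rows_from_run(table, t1)), lookback_violations(table, t1, date(2026, 4, 10)))
# → 2 []
```

Without the _loaded_at filter, the untouched 2025-10-07 partition would look like a lookback violation even though the current run never wrote it.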

  • LEARNING Embed analytics UX: the gap between embed single dashboard and embed portal is a consistent friction point across prospects. Building a live demo app (not just docs) helps presales communicate the difference effectively.

  • NOTE Found a Logseq CLI idea for AI agents.

  • NOTE Started tracking TODO items: self reflection + CV update, find football dataset for Duc Anh’s teaching.

  • Next Week

  • P1 — Internal: Transfer MRR project with anh Hieu — due Apr 15. High urgency.

  • P1 — Internal: Calendly data pipeline (DAT-283) — get PR #853 reviewed by anh Dong. Once merged, run models and validate data. Then fix downstream fct_sales_leads join.

  • P1 — Internal: Write 1-on-1 report (noted in Apr 15 journal).

  • P1 — Internal: Create agent skill for fixing dbt data pipelines in #data-ops-bot — carry-over from W13, elevated to P1.

  • P2 — Internal: DAT-555 — get PR #854 and #69 reviewed. Plan backfill after merge. Clean up unused dashboards.

  • P2 — Internal: DAT-571 — get Zendesk upgrade plan reviewed and decide on option (Fivetran vs Prefect orgs-only vs Airbyte upgrade).

  • P2 — Internal: Fix excluding internal testing Zoho accounts (DAT-524) — lingering since W7. Timebox: max 2h.

  • P2 — Docs: Polish embedding documentation — clarify scope with anh Tai and anh Huy.

  • P2 — Docs: Add demo video for local development docs — target Wed or Fri.

  • P2 — Teaching: Find a football dataset for Duc Anh.

  • P3 — Add2Cart: Inactive — wait for Simon/Anurag to trigger again, then follow up with anh Huy on D4 review. Project retro with anh Dong deferred.

  • P3 — Internal: Lead Funnel by Sales Motion (DAT-560) — review and make a plan after Calendly pipeline stabilizes.

  • P3 — Internal: Contribute to Holistics skills and internal-skills repos.

  • P3 — Presales: Read Modeling Patterns docs.

  • P3 — Personal: Self reflection and update CV.

  • Career & Personal Consulting

    Progress Review (Start/Stop/Keep):

  • Start: Building reusable demo tooling for presales — the holistics-embed-demo app is already serving multiple customers (Showbie, Jonas Chorum). This is high-leverage work. Formalize it as a shared team resource and get buy-in from anh Huy.

  • Start: Proactively documenting decisions on dedicated pages (e.g., [[Lead Funnel by Sales Motion]]) — this consolidation pattern keeps project context accessible instead of buried in daily journals.

  • Stop: Letting carry-over items linger without resolution. DAT-524 has been open since W7 (8 weeks as of W15), and the agent skill since W13 (2 weeks). Either timebox and do them, or formally deprioritize them with a note to your manager. Backlog hygiene matters for credibility.

  • Keep: The “schema-first” approach to data pipelines — finding Calendly’s OpenAPI spec, mapping to ERD, then designing dbt layers. This is thorough and prevents rework.

  • Keep: Vibe-coding demo apps instead of only writing docs — the embed demo app communicates value faster than any document. Several customers are already benefiting.

    Observations:

  • This week’s work pattern shows a healthy mix of closing items (DAT-569 deployed, Zendesk interim fix, Jonas Chorum call done, countries ingestion done) and advancing strategic work (Calendly pipeline schema + modeling plan, embed demo app).

  • The Calendly pipeline work demonstrates strong engineering maturity: traced from broken Zapier pipeline → identified limitations → chose Fivetran as interim → mapped full API schema → designed 6-phase dbt plan. The disciplined approach will pay off.

  • Presales embedding is solidifying as your specialty. The embed demo app is now serving real customer conversations (Showbie embed portal questions answered same-day). Consider proposing this as an official presales tool.

  • The DAT-555 investigation shows good judgment: initial instinct was to remove the model, but after discussing with DevOps (Triet), pivoted to keeping it with a fix. Listening to stakeholder needs > defaulting to deletion.

  • MRR transfer (due Apr 15) is the highest urgency item for next week. Don’t let it get crowded out by carry-over work.

  • Workload is spread across 5 projects this week (Internal, Add2Cart, Presales, Docs, Personal). The Internal project dominates with multiple parallel tracks (DAT-569, DAT-571, DAT-555, DAT-283, DAT-560). Consider flagging to manager if context-switching cost is high.

    Recommended Reading:

  • dbt Incremental Models & BigQuery (directly relevant to DAT-555 lookback cap and Calendly pipeline):

  • Article: How to Use Incremental Models in dbt for Efficient BigQuery Data Processing — covers all 3 strategies (merge, insert_overwrite, append), lookback windows for late-arriving data, incremental predicates for cost optimization, and testing patterns. Directly applicable to the fct_job_queue_performance fix.

  • Discussion: dbt incremental models with insert_overwrite: backfill data causing duplicates — r/dataengineering thread on exactly the partitioning + backfill pattern you’re dealing with. Community solutions for handling _dbt_max_partition gaps.

  • Data Pipeline & API Ingestion (directly relevant to Calendly pipeline and Fivetran setup):

  • Article: How Fivetran, dbt, and genAI Can Supercharge Data Workloads — covers the Fivetran → dbt pipeline pattern you’re using for Calendly. Includes pre-built dbt packages for common sources (HubSpot, Zendesk, Jira) and MCP server integration.

  • Guide: 10 Best Data Ingestion Tools — evaluation criteria for ingestion tools (Fivetran vs Airbyte vs Hevo). Relevant for the Zendesk DAT-571 decision (Option 1 vs 3).

  • Embedded Analytics Architecture (your emerging specialty — Jonas Chorum, Showbie, Basata):

  • Guide: The Complete Guide to Embedded Analytics for SaaS Products (2026) — covers the 4-layer architecture (experience, data, security/governance, action), multi-tenancy patterns (JWT + RLS), and the 8 capabilities every SaaS product needs. Directly maps to the embed demo app you built.

  • Guide: Multi-Tenant Deployment: 2026 Complete Guide — deep dive on tenant isolation models, JWT-driven security contexts, and hybrid datasets. Addresses the exact Jonas Chorum single-tenant vs multi-tenant architecture question.

  • Comparison: Embedded Analytics vs Traditional BI: Complete Comparison (2026) — covers embed portal vs single dashboard tradeoffs (the exact question Erin/Showbie asked), white-labeling, and when to use each approach.

  • Career Growth & Presales (continuing from W14):

  • Book: Staff Engineer by Will Larson — carry-over. The Calendly pipeline project (tracing root cause → designing 6-phase plan → documenting for handoff) is textbook staff-level work.

  • Reading list: The Ultimate Presales Reading List for 2026 — 34 curated books for sales engineers. Highlights: The Trusted Advisor Sales Engineer by John Care (directly relevant as you transition from demo-runner to strategic advisor), The Six Habits of Highly Effective Sales Engineers by Chris White (practical habits for demo prep and discovery).