Organizations

    • https://www.reddit.com/r/dataengineering/comments/1hq9dwl/complexity_of_data_transformations_and_lineage
      • If you’re working with data as a primary focus, part of the job (a big one) is documenting what you’re doing and validating what you touch before shipping it.
      • I’m confident that my systems are and remain correct because I confirm the state of things before starting new work and document what I did, it adds like an hour to a project.
    • https://www.reddit.com/r/dataengineering/comments/10usa5i/looking_for_an_opensource_data_lineage_app_where/
      • Context: company has been documenting all its data objects manually and has a large CSV explicitly showing each data object and its predecessor(s). These aren’t just the standard database/workflow/dashboard objects; they include things like Power Automate scripts. I’m just looking for a good way to show everything in a map, visualize them, and navigate through their connections properly. At this point, I’ll even be happy with a pure visualization engine, for instance if I can repurpose kedro-viz or dbt’s lineage visualizer so that it can take a CSV or JSON of object relationships as input. Even a custom Power BI visualization or Python graph frontend would be fine, but I can’t seem to find one that works. I’d also be happy if any of the lineage tools I mentioned above have this functionality and I just missed it.
    • https://www.reddit.com/r/dataengineering/comments/1ba4g7v/how_to_diagram_sql_queries/
      • Love dbdiagram :) I’m also using dbdocs as a lightweight data catalog instead of plain dbt docs. While I do find dbt docs useful for data lineage, I’ve discovered that I can achieve the same functionality in my dbt Core setup using the dbt Power User VS Code extension. And dbdocs fills in the gaps (ERD, table metadata, easy to deploy, shareable,…), covering almost 90% of my needs.
        • => implying dbdocs isn’t the lineage surface
    • https://www.reddit.com/r/SQL/comments/nxbtxb/tools_to_draw_data_lineage/
      • Good for data models and showing direct relationships between tables, but it doesn’t show data flows. When you visualize data flows, you want to see data from which table ends up where, it’s different than “Column X is an FK to Column Y”
      • Can dbdiagram show how several columns are transformed into one column?
    • https://www.reddit.com/r/dataengineering/comments/1kyi6hx/what_do_you_use_for_lineage_and_why
      • i’ve used a bunch of these. real talk: data lineage is overrated at early stages & often overcomplicated. when ur team is < 10, physical lineage diagrams on a whiteboard + good dbt docs get you 80% there. we started with DBT lineage for our first year which did the job, then built custom lineage in Preswald when we needed more flexibility (needed to include non-dbt systems). the problem with most enterprise lineage tools is they force you into their ecosystem - great for huge teams with dedicated resources, massive overkill for startups. your investment should match your problems - if ur just trying to debug why a dashboard broke, dbt docs are prob fine. if ur trying to comply with SOX, yea get OpenLineage or something heavy duty.
      • Hey, I also want an open-source tool for automated data lineage for my company, one we can integrate into our product (a data security product). I am going through OpenMetadata but finding it difficult. Can you suggest any lightweight, easy-to-use open-source tool that can be used for automated lineage? I went through many tools online, like DataHub, Collate, Informatica, etc. Most sites and GPTs suggested OpenMetadata. What is your recommendation?
    • https://www.reddit.com/r/dataengineering/comments/1ijr4jd/what_data_lineage_tools_do_you_use_and_what_makes/
      • I’ve been working with OpenLineage lately and I like it a lot. It’s an open standard for collecting lineage data. Great community and they are very open to PRs and new features. Not a ton of integrations right now, but they have most of the big ones.
      • I’ve been looking closely at DataHub for a while now, and I think I’ll be using it in upcoming projects. It’s an open-source tool with a managed version by Acryl. It does a bit more than just data lineage, too, so it may be overkill for what you’re after.
    • What are the shortcomings of current data lineage tools?
      • Do the current lineage tools address data audit needs?
      • https://www.reddit.com/r/dataengineering/comments/1gjzsu7/what_are_the_short_comes_of_current_data_lineage/
      • The field is pretty crowded and most of the data platforms are already providing lineage out of the box.
      • We made it our own. That said, 90% of our platform is custom PySpark code running on AWS, Databricks, or Azure. No commercial offering covers that, but they could :). We hooked the backend into our internal LLM bot, so now users can just Slack into it. No commercial tool would let you do that; they would sell it to you as an add-on. Plus, we are a global brand, and we share our code with other sister brands and exchange internal features.
      • Bugs everywhere.
      • We use Collibra for Snowflake lineage. Coverage of SQL syntax is OK, but it’s very buggy and hard to manage. No proper APIs for lineage means manual management. The other issue is that it works by scraping the query logs in Snowflake over a period of time, so it can produce confusing results after code changes.
    • Is data lineage one of the most underrated things in DE?
      • https://www.reddit.com/r/dataengineering/comments/1g8k2h5/is_data_lineage_one_of_the_most_underrated_thing/
      • I worked for multiple companies as a DE and zero of them applied anything related to data lineage. Whenever my team mentions it would be important to do this it gets ignored.
      • If they don’t do documentation, I wouldn’t even expect half of them to know what data lineage is.
      • Data lineage is one of those things no one thinks they need… until they do. Like when you are debugging why a multi-system process or ETL isn’t working. The question of, “where did this data come from” comes up and now you are wasting time trying to find that out. It really sucks if it passes through multiple systems or multiple formats. (ODBC and JDBC are really sneaky like that.) Be the person that documents their stuff and allocate time for it. It will be an uphill battle because documentation is one of the first things thrown overboard when the inevitable money/time crunch shows up.
      • This question keeps me up at night, since I’m in the process of building a POC database engine that has cell-level data lineage, forwards and backwards. I’ve been in data over 20 years. Most in DE supporting analytics. I’ve NEVER been somewhere that had robust data lineage. It drove me nuts enough to spend years dreaming up a robust solution. Why don’t places care? As someone who wants to open source something and launch a business around it, it drives me nuts. Am I crazy for finding data lineage fundamental? I don’t think the current gen of tools are there. I don’t think OpenLineage is good enough. It’s progress. (I guess that’s why I’m building my own.) I haven’t used Dagster but anything that doesn’t preserve transaction logs in a way that syncs up with time travel in a consistent way, to me, just isn’t good enough. The major downside of my approach is you only get lineage inside my engine. That’s probably a non-starter for many places, especially those big enough to be early adopters. IDK.
    • https://www.reddit.com/r/dataengineering/comments/1g3e20y/data_lineage/
      • How do you all like to track dataset lineage? Dependencies between tables, sources/sinks per job, something like Kafka to a Spark-written Iceberg table joined with another table, eventually landing in Snowflake, etc.?
    • https://www.reddit.com/r/dataengineering/comments/1cvmerf/data_lineage_tools/
      • OP is describing the exact use case for OpenLineage, but it’s hard to estimate how complete their lineage graph would be without knowing more about their tooling. OL will give you column lineage for Spark and Airflow jobs. Dbt is supported, as well.
      • There are open-source catalogs, like DataHub, but data lineage in them is extremely limited. So they do exist, but most likely will not suit your needs. Then you have paid products like Informatica’s data catalog, which is out of scope. They support more or less everything.
      • Use SQLMesh, it has lineage, diffs, etc.
      • I was pushing for OpenMetadata at my last job, lineage being one of the selling points. I never got it deployed.
    • https://www.reddit.com/r/dataengineering/comments/1iddujm/data_lineage_and_quality_tool/
      • I’m exploring OpenMetadata for data quality, governance, and lineage. While I’m not necessarily opposed to containerized deployments, I’m prioritizing ease of use, especially when it comes to automated data lineage and quality testing. I’m looking for alternative tools that might be more convenient to work with in these specific areas. Are there any tools that are considered “better” than OpenMetadata in terms of simplifying the process of setting up and managing automated data lineage and quality tests? Any recommendations would be greatly appreciated!
      • SQLMesh is a solid tool for managing transformations, plus you get column level lineage of your models as a part of the open source offering.
    page Created Mon, 25 May 2026 00:00:00 +0000
    • Goal

      • Resolve discrepancies between MRR (Monthly Recurring Revenue) numbers on bi.holistics.io and Zoho Billing, and establish a single source of truth for customer MRR.
      • Stakeholders: Data Team, RevOps (Quinn, Arden, Vincent), Finance (Sriram).
      • Why it matters: MRR is the core metric for revenue planning (see [[Lead Funnel by Sales Motion]]’s $330K target). If the number is wrong, every downstream decision — pricing analysis, churn measurement, forecasting — is wrong.
      • Decision-making questions:
        • Why does MRR on bi.holistics.io differ from Zoho Billing?
        • Which exchange rate methodology should we use for multi-currency subscriptions?
        • How do we handle customers with multiple tenants or multi-region setups?
    • MRR Definition

      • MRR = sum of all active subscriptions’ monthly recurring revenue.
      • Zoho’s base currency: Singapore Dollar (SGD).
      • Per-subscription calculation: subscription.mrr / subscription.exchange_rate
      • Customer = a HubSpot Company (one company can have multiple Holistics tenants and multiple Zoho subscriptions).
      • Primary Tenant: for customers with multi-region or multi-tenant setups, the pipeline must select one primary tenant. This is the core identity resolution challenge.
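      • A minimal Python sketch of the MRR definition above, assuming subscriptions have already been resolved to a HubSpot company. It applies the per-subscription formula exactly as written (mrr / exchange_rate) and sums only active subscriptions per customer. The `Subscription` class, the `company_id` field, and the `"live"` status value are hypothetical stand-ins, not the actual dbt model columns.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass
class Subscription:
    company_id: str       # HubSpot company this subscription resolves to (hypothetical field)
    status: str           # hypothetical status value, e.g. "live"
    mrr: float            # subscription.mrr as stored in Zoho
    exchange_rate: float  # subscription.exchange_rate as stored in Zoho


def customer_mrr(
    subscriptions: Iterable[Subscription],
    active_statuses: FrozenSet[str] = frozenset({"live"}),
) -> Dict[str, float]:
    """Sum mrr / exchange_rate over active subscriptions, grouped by HubSpot company."""
    totals: Dict[str, float] = defaultdict(float)
    for sub in subscriptions:
        if sub.status not in active_statuses:
            continue  # only active subscriptions count toward MRR
        totals[sub.company_id] += sub.mrr / sub.exchange_rate
    return dict(totals)


subs = [
    Subscription("acme", "live", 1250.0, 1.25),
    Subscription("acme", "live", 625.0, 1.25),
    Subscription("beta", "cancelled", 500.0, 1.0),
]
print(customer_mrr(subs))  # {'acme': 1500.0}
```

Grouping by company rather than by tenant is what makes the primary-tenant and identity questions matter: a wrong mapping moves MRR between customers even when the total is right.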
    • Data Architecture

      • flowchart TD
            subgraph Sources["Data Sources"]
                Holistics["Holistics Backend DB\n(Tenants, Users, Trial Submissions)"]
                Zoho["Zoho Billing\n(Subscriptions, Invoices, Exchange Rates)"]
                HubSpot["HubSpot CRM\n(Companies)"]
            end

            subgraph Pipeline["dbt Pipeline"]
                stg_tenants["stg_holistics__tenants"]
                stg_users["stg_holistics__users"]
                stg_trials["stg_holistics__trial_submissions"]
                stg_zoho["stg_zoho__subscriptions"]
                stg_hs["stg_hubspot__companies"]

                domain["Domain Mapping\n(tenant → domain_name)"]
                customer_id["itg_mappings__customer_identities\n(Holistics ↔ Zoho ↔ HubSpot)"]
                dim_customer["dim_customers\n(primary tenant selection)"]
                fct_mrr["fct_mrr\n(MRR calculation)"]
            end

            subgraph Issues["🔴 Known Issues (9 problems)"]
                I1["1.1/1.2: Exchange Rate Errors\n(static rate, multi-step conversion)"]
                I2["2.1-2.4: Wrong Tenant↔Company Mapping\n(region bug, domain mismatch,\nduplicates, multi-domain)"]
                I3["3.1/3.2: Wrong Tenant↔Zoho Mapping\n(manual errors, multi-tenant sub)"]
                I4["4: Active but Unpaid Customers\n(Zoho subscription logic bug)"]
            end

            subgraph Output["Reports"]
                bi_h["bi.holistics.io\nMRR Overview Dashboard"]
                monitor["Customer Identity\nMonitoring Dashboard"]
            end

            Holistics --> stg_tenants & stg_users & stg_trials
            Zoho --> stg_zoho
            HubSpot --> stg_hs

            stg_tenants & stg_users & stg_trials --> domain
            domain --> customer_id
            stg_hs --> customer_id
            stg_zoho --> customer_id
            customer_id --> dim_customer
            dim_customer --> fct_mrr
            fct_mrr --> bi_h
            customer_id --> monitor

            I1 -.->|affects| fct_mrr
            I2 -.->|affects| customer_id
            I3 -.->|affects| customer_id
            I4 -.->|affects| stg_zoho
        
      • Data Sources

        • HubSpot CRM: Companies.
        • Holistics Backend DB: Tenants, Users, Trial Submissions.
        • Zoho Billing: Zoho Customers, Subscriptions, Payments, Invoices, Exchange Rates.
      • dbt Pipeline Flow

        • Staging: stg_holistics__tenants, stg_holistics__users, stg_holistics__trial_submissions, stg_zoho__subscriptions, stg_hubspot__companies
        • Mapping: itg_mappings__customer_identities (Holistics ↔ Zoho ↔ HubSpot)
        • Dimensions: dim_customers (primary tenant selection logic)
        • Facts: fct_mrr (final MRR calculation)
      • Identity Resolution

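        • Currently relies on domain_name (extracted from tenant uname or email). This is fragile: it breaks when a company uses different domains across systems (e.g., Vector AI: vector.ai in HubSpot vs raft.ai in the app).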
        • Proposed long-term fix: a canonical customer_skey shared across HubSpot and Zoho.
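        • A minimal sketch of domain-based matching, to show why it is fragile. `extract_domain` and `match_by_domain` are hypothetical helpers, and treating the tenant uname as either an email or a bare domain is an assumption. With the Vector AI case, the app side yields raft.ai while HubSpot holds vector.ai, so the tenant ends up unmatched; duplicate HubSpot companies on one domain collapse silently.

```python
from typing import Dict, Optional


def extract_domain(value: str) -> str:
    """Normalize an email ('ops@raft.ai') or a bare domain ('www.vector.ai') to a domain."""
    value = value.strip().lower()
    if "@" in value:
        value = value.rsplit("@", 1)[1]  # keep the part after the last '@'
    if value.startswith("www."):
        value = value[len("www."):]
    return value


def match_by_domain(
    tenant_domains: Dict[str, str],   # tenant_id -> uname/email-derived string
    company_domains: Dict[str, str],  # hubspot_company_id -> company domain
) -> Dict[str, Optional[str]]:
    """Exact domain match from tenants to HubSpot companies; None when nothing matches."""
    # If two HubSpot companies share a domain (duplicates), the later one silently wins.
    by_domain = {extract_domain(d): cid for cid, d in company_domains.items()}
    return {t: by_domain.get(extract_domain(d)) for t, d in tenant_domains.items()}
```

Here `match_by_domain({"vector": "ops@raft.ai"}, {"hs_1": "vector.ai"})` returns `{"vector": None}`: the kind of gap a canonical customer_skey is meant to close.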
      • ER Diagram

      • Key Datasets & Dashboards

    • Exchange Rate Decision

      • Base currency: SGD (Zoho supports only one base currency).
      • Reporting currency: USD (agreed upon, >40% of revenue is in USD, USD is more recognizable and comparable).
      • Problem: pipeline previously used static rates; Zoho uses per-transaction rates set at plan creation time.
      • Agreed approach (Option 2): use live rates (e.g., Google Finance / ECB) — specifically the current/today’s rate for a “what is it worth now” view.
      • Caveat: requires careful handling to avoid FX noise in historical growth metrics.
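      • A sketch of the FX-noise caveat, using made-up SGD MRR figures and USD-per-SGD rates. Converting each month at its own rate makes flat SGD revenue look like growth; restating every month at one rate (today’s, per Option 2) keeps the trend about the business. Function names are illustrative only.

```python
from typing import Dict


def convert_each_month(mrr_sgd: Dict[str, float], usd_per_sgd: Dict[str, float]) -> Dict[str, float]:
    """Each month converted at that month's rate: FX movements leak into the trend."""
    return {month: amount * usd_per_sgd[month] for month, amount in mrr_sgd.items()}


def restate_at_today(mrr_sgd: Dict[str, float], today_usd_per_sgd: float) -> Dict[str, float]:
    """Every month converted at one rate (today's): the trend reflects only SGD MRR changes."""
    return {month: amount * today_usd_per_sgd for month, amount in mrr_sgd.items()}


def growth(series: Dict[str, float], start: str, end: str) -> float:
    return series[end] / series[start] - 1


# Made-up figures: SGD MRR is flat, only the rate moves.
mrr_sgd = {"2026-04": 100_000.0, "2026-05": 100_000.0}
rates = {"2026-04": 0.75, "2026-05": 0.78}

print(growth(convert_each_month(mrr_sgd, rates), "2026-04", "2026-05"))  # about 0.04, all from FX
print(growth(restate_at_today(mrr_sgd, rates["2026-05"]), "2026-04", "2026-05"))  # 0.0
```

The trade-off: restating history at today’s rate answers “what is it worth now” but changes past USD figures every time the rate moves, so historical dashboards will not stay frozen.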
    • Identified Problems & Root Causes

      • | Category | Problem | Root Cause |
        | --- | --- | --- |
        | Exchange Rate | 1.1/1.2: Inaccuracies | Pipeline uses static rates; Zoho uses per-transaction rates set at plan creation |
        | HubSpot Mapping | 2.1–2.4: Mapping Bugs | Missing regions, domain mismatches (e.g., raft.ai vs vector.ai), duplicate HubSpot companies |
        | Zoho Mapping | 3.1/3.2: Link Errors | Manual errors and system inability to handle multi-tenant subscriptions |
        | Subscription Logic | 4: Unpaid Actives | Zoho fails to re-activate tenants after past-due payments are settled |
    • Edge Cases

      • Multi-tenant customers: Datacubed — one Zoho account for two tenants.
      • Multi-region conflicts: Kognity — SG database shows expired trial while US database shows active paying status.
      • Domain mismatches: Vector AI — HubSpot uses vector.ai, app uses raft.ai.
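      • One possible primary-tenant rule, sketched for the Kognity-style conflict. The status ranking (paying > trial > expired trial), the status strings, and the user-count tie-breaker are assumptions for illustration; the note does not spell out the rule `dim_customers` actually uses.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ranking (higher wins); unknown statuses rank lowest (0).
STATUS_RANK = {"active_paying": 3, "active_trial": 2, "expired_trial": 1}


@dataclass
class Tenant:
    tenant_id: str
    region: str
    status: str
    user_count: int = 0  # hypothetical tie-breaker


def pick_primary_tenant(tenants: List[Tenant]) -> Tenant:
    """Pick one tenant per company: best status, then most users, then tenant_id (deterministic)."""
    if not tenants:
        raise ValueError("company has no tenants")
    return max(tenants, key=lambda t: (STATUS_RANK.get(t.status, 0), t.user_count, t.tenant_id))


kognity = [
    Tenant("kognity-sg", "SG", "expired_trial", user_count=40),
    Tenant("kognity-us", "US", "active_paying", user_count=12),
]
print(pick_primary_tenant(kognity).tenant_id)  # kognity-us
```

Putting status first means an expired SG trial never outranks a paying US tenant, even when it has more users.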
    • Technical Fixes

      • PR #812: Fixed missing regions in manual mapping.
      • PR #852: Implemented hard-coded patches for edge cases:
        • map_holistics_zoho.csv: maps specific tenants (e.g., Datacubed) to shared Zoho accounts.
        • partner_programs.csv: excludes freemium plan IDs from MRR.
        • dom_holistics__internal_tenants.sql: filters out internal Holistics testing tenants.
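      • A sketch of the pattern behind the PR #852 patches: a manual seed mapping overrides automatic matching, and seed-driven exclusions drop freemium plans and internal tenants before MRR is summed. The inline CSV contents, column names, and tenant IDs are hypothetical; the real seeds may be shaped differently.

```python
import csv
import io
from typing import Dict, List, Optional, Set

# Hypothetical seed contents; the real columns of map_holistics_zoho.csv and
# partner_programs.csv are not given in the note.
MAP_HOLISTICS_ZOHO = """tenant_id,zoho_customer_id
datacubed_a,zc_100
datacubed_b,zc_100
"""
PARTNER_PROGRAMS = """plan_id
freemium_2024
"""


def load_seed(text: str) -> List[Dict[str, str]]:
    """Parse a CSV seed into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def build_overrides(text: str) -> Dict[str, str]:
    """tenant_id -> zoho_customer_id; several tenants may share one Zoho account."""
    return {row["tenant_id"]: row["zoho_customer_id"] for row in load_seed(text)}


def zoho_customer_for(tenant_id: str, auto_match: Optional[str], overrides: Dict[str, str]) -> Optional[str]:
    """A manual override wins; otherwise fall back to the automatic match."""
    return overrides.get(tenant_id, auto_match)


def eligible_for_mrr(
    rows: List[Dict[str, str]], freemium_plan_ids: Set[str], internal_tenant_ids: Set[str]
) -> List[Dict[str, str]]:
    """Drop rows on freemium plans and rows belonging to internal test tenants."""
    return [
        r for r in rows
        if r["plan_id"] not in freemium_plan_ids and r["tenant_id"] not in internal_tenant_ids
    ]


overrides = build_overrides(MAP_HOLISTICS_ZOHO)
freemium = {row["plan_id"] for row in load_seed(PARTNER_PROGRAMS)}
```

Keeping the exceptions in seeds rather than in SQL `CASE` branches keeps each hard-coded patch reviewable as a one-line diff.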
    • Progress

      • DONE Identify root causes of MRR discrepancies
      • DONE Agree on exchange rate methodology (Option 2 — live rates)
      • DONE Fix missing regions in manual mapping (PR #812)
      • DONE Implement hard-coded patches for edge cases (PR #852)
      • DONE Set up Customer Identity Monitoring dashboard
      • TODO Replace static exchange rates with live rate ingestion — DAT-576 (for quick win)
      • TODO Implement canonical customer_skey for long-term identity resolution
      • TODO Handle Zoho re-activation bug for past-due subscriptions
    • References

  • Plan for Next 1-on-1

    Key topics to discuss with manager

    1. MRR project transfer — status update after handoff from anh Hieu (due Apr 15). Any open questions or risks from the transfer.

    2. Presales capacity & portfolio review — currently active on 5+ customer accounts (Showbie, Jonas Chorum, Basata, Innerspace, Superbexperience). Request alignment on which accounts to prioritize vs. hand off. Discuss whether the embed demo app should become an official presales tool.

  • Plan for Next 1-on-1 (covering Apr 13 – May 20, 2026)

    Meeting context: anh Hieu left this period; presales surged to become the dominant project; DAT-576 MRR exchange rate is a 5-week carry-over. Two backlog drops are on the table. Bring 3 prioritization options for DAT-576: this is a prioritization ask, not a scheduling ask.

    • P1 – Must Discuss
      • Internal
        • DAT-576 MRR exchange rate — re-scope or re-assign. 5-week carry-over. Anh Hieu left → context lost → restart. Bring 3 options: (a) re-scope to 1-week-effort presales-compatible version, (b) hand off to teammate whose week isn’t presales-loaded, (c) formally de-prioritize until presales calms post-BuyCo close. Decision needed in this meeting — do not commit a 6th week of “Mon AM blocks.” Linear DAT-576.
        • Formally drop DAT-524 Fix excluding internal Zoho test accounts — 14-week carry-over. Fix scope is trivial — add const_zoho_internal_customers filter to stg_zoho__events (other Zoho staging models already apply it). Arden already confirmed Quinn US test + Holistics SGD test accounts are the missing ones. Owner: nobody since W7. Decision: do it now (~30 min) or drop formally. Linear DAT-524.
      • Presales / Career
        • Career lane check-in: IC engineering vs solutions/presales hybrid vs tech lead. Performance review feedback (Mar) validated the hybrid direction. Anh Thanh (chief engineer) approaching me re: AI-AQL = solutions-engineering signal. BuyCo onboarding #4 solo = trust cashed. Want to confirm the current lane is intentional, not drift.
    • P2 – Should Discuss
      • Presales
        • Showbie loss synthesis: first lost deal. Lost to Omni (HubSpot Closed Lost recorded, Erin’s reply). Erin cited “overall functionality within the tool” before the message was truncated. Action: re-pull the full Slack thread, then chase Harsha for personal feedback (apply The Mom Test: ask about specific past PoC moments, not abstract pain points). Share a 1-page learning note focused on the Looker-migration moat + security risk perception. Owe this for the team’s next deal.
        • BuyCo onboarding #4 follow-ups owed by me (from May 19 debrief): share PBI migration documentation publicly, publish migration skill as customer-facing package, send HTML action button samples to Rodolphe. Decision timeline is tight: Xairo + Luc evaluating next week, GoodData competing.
        • Power BI playbook → “presales-owned reusable artifact” template? BuyCo onboarding #4 used conceptual-differences.md + migration-overview.md. Reusable. Should it become the template for future migration-flavored deals (Tableau, Looker, Sigma)?
        • Presales team shared status board — me + anh Dong + Chukwudi + Mario have no shared customer-status surface. Context scattered across Slack threads + individual notes. Propose Notion DB or Linear project.
      • Internal
        • Heuristics upstream to team docs — dbt deprecation 3-phase rule should land in holistics/dbt contributing guide. Customer-comms recipe pattern should land in presales/CS onboarding doc. Ask: is this welcome, or noise?
        • Backlog drop as standing 1-on-1 agenda item — “what’s the oldest item in my backlog, and should it still be there?” Process suggestion.
    • P3 – Nice to Discuss
      • Personal
        • Health Rules + Principles pages — converted W17 sick week + W20 badminton signal into rules. Sharing the pattern in case useful for team retros.
        • Personal Finance RFC-0002 — used Personal Finance app to make data-driven house-move decision. Side-project ROI signal worth sharing.
      • AI tooling
    • Decision-class items (manager input required)
      • | Item | Question | Default if no decision |
        | --- | --- | --- |
        | DAT-576 MRR | Re-scope, hand off, or de-prioritize? | Continue slipping (bad outcome) |
        | DAT-524 Zoho | Drop or assign? | Drop |
        | Career lane | Confirm hybrid direction? | Continue current pattern |
        | Heuristics upstream | Welcome or noise? | Keep personal |
    • Focus topics if time permits
      • Mon: DAT-576 + DAT-524 (decision-class items first)
      • Career lane check-in (10 min)
      • Showbie synthesis sharing (5 min)
      • Heuristics upstream (5 min)
      • Personal: Health Rules pattern (skip if rushed)
    • Carry-over watch (since last 1-on-1)
      • | Item | Since | Weeks | Action |
        | --- | --- | --- | --- |
        | DAT-576 MRR exchange rate | W16 (Apr 13) | 5 | Escalate as prioritization question |
        | DAT-524 exclude internal Zoho accounts | W7 (Feb 9) | 14 | Drop formally |
        | Showbie loss synthesis | W19 (May 4) | 2 | Owed; close this week |
        | H1 self-reflection + CV refresh | W19 (May 4) | 2 | Lock May 23–24 |
        | Wamly onboarding | W18 (Apr 27) | 3 | Customer-side block; Chukwudi re-engaging |
    • Metrics to bring (period Apr 13 – May 20)
      • Customer interactions: 8+ across BuyCo, Jonas Chorum, Basata, Bicycle Transit, Enhance Fitness, PatientsKnowBest, Showbie, Wamly
      • PRs merged: Calendly dbt #853, internal-aml-project #78, dbt #854, dbt #858, prefect #400, dbt #860, dbt #864, dbt #867, internal-aml-project #77
      • Carry-overs resolved: DAT-560 (Lead Funnel by Sales Motion, 5w), DAT-555 (fct_job_queue_performance, 4w), DAT-283 (Calendly, 8w)
      • Carry-overs lingering: DAT-576 (5w), DAT-524 (14w), demo video for local dev (5w, now unblocked)
      • New artifacts: [[Health Rules]], [[Principles]], Power BI playbook (conceptual-differences.md + migration-overview.md), holistics-embed-demo continuing
      • Onboarding calls led: Jonas Chorum #2, BuyCo #4 (solo)
      • Lost: Showbie → Omni
  • Gears

    • Keyboard: MelGeek O2 or Nuphy Air75
    • Laptop: MacBook
    • Mouse: Attack Shark X3
    • Headphones: Skullcandy Hesh ANC
    • Reading & Note: Kindle Scribe 2022
    • Software

      • Alcove
      • BetterDisplay
      • AlDente
      • CleanShot X
      • YouTube Premium (+ YouTube Music)
      • Gemini
    • Misc

      • Sleep Fold
  • Noticeable calls

    • https://copilot.clari.com/call/11644b50-8a66-4c36-a91b-e0ed635d3211
    • Metric Catalogue

      • Base measures
        • Recommended Retail Price (or RRP; synonym - Manufacturer Suggested Retail Price, MSRP): The price suggested by the manufacturer for retailers to sell a product.
        • Total SKUs (or Total Products): Count of SKUs.
        • Total Features (or Total Ad Items): Count of Features.
        • Retailer Price (or Selling Price): The retailer’s non-promotional shelf price for a product (within that retailer).
        • Promotional Price: The temporary, reduced price offered by a retailer for a product in a promotion.
      • Metrics (are built from measures)
        • Current Price: if a promo applies, current_price = promotional_price; otherwise current_price = retailer_price (the shelf price, not RRP)
          • Current Price must track the Shelf Price because in mature retail markets (like Australia/UK/US), the “Standard Shelf Price” is often lower than RRP. If you default to RRP when no promo exists, you will artificially inflate the “Was Price,” making discounts look deeper than they are. The current_price should always track the Shelf Price, not the RRP, unless they happen to be identical.
        • Was Price: retailer_price
        • Discount Percentage: Logic: 1 - (current_price / NULLIF(was_price, 0))
        • Discount Amount: was_price - current_price
        • Retailer Lowest Price: The lowest single price point detected for a Retailer within the specific Start Date and End Date of the selected Advertisement. Logic: MIN(current_price)
        • Average Discount Percentage: Logic: AVG(discount_pct)
        • Average RRP: AVG(rrp)
        • Average Promotional Price: AVG(promotional_price)
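      • A Python sketch of the catalogue, assuming rows are already filtered to one retailer and one advertisement’s Start/End Date window, and that the list is non-empty. It follows the shelf-price rule from the note (no promo means retailer_price, not RRP) and skips missing discounts the way SQL AVG skips NULLs. `PricePoint` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PricePoint:
    rrp: float                          # Recommended Retail Price
    retailer_price: float               # shelf price; used as "Was Price"
    promotional_price: Optional[float]  # None when no promo applies


def current_price(p: PricePoint) -> float:
    # Without a promo, fall back to the shelf price, not RRP.
    return p.promotional_price if p.promotional_price is not None else p.retailer_price


def discount_pct(p: PricePoint) -> Optional[float]:
    if p.retailer_price == 0:
        return None  # same effect as NULLIF(was_price, 0)
    return 1 - current_price(p) / p.retailer_price


def discount_amount(p: PricePoint) -> float:
    return p.retailer_price - current_price(p)


def summarize(points: List[PricePoint]) -> Dict[str, Optional[float]]:
    """Aggregates for one retailer within one advertisement's date window (pre-filtered)."""
    pcts = [d for d in map(discount_pct, points) if d is not None]  # AVG skips NULLs
    promos = [p.promotional_price for p in points if p.promotional_price is not None]
    return {
        "retailer_lowest_price": min(current_price(p) for p in points),
        "avg_discount_pct": sum(pcts) / len(pcts) if pcts else None,
        "avg_rrp": sum(p.rrp for p in points) / len(points),
        "avg_promotional_price": sum(promos) / len(promos) if promos else None,
    }
```

With the shelf-price fallback, a product with no promo contributes a 0% discount to the average instead of a negative or inflated one.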
  • A. Work

    • Internal

      • TODO Handle multi-currency exchange rates for MRR

      • Current pricing setup

        • Zoho’s base currency = SGD.
        • Plans are created in Zoho Billing with prices in SGD (base currency). Zoho doesn’t allow creating products in EUR/USD directly.
        • A manual exchange rate is set in Zoho (e.g., 1 EUR = 1.4 SGD).
          • Why? The team does not use a live rate in Zoho so that customers’ purchase amounts don’t fluctuate over time.
          • Customers see stable local currency prices on their subscription.
          • This is updated periodically (~yearly) to stay close to market rates.
        • The same plans are mirrored in Holistics App’s Tenant Admin.
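        • A small sketch of why the manual rate keeps customer prices stable, assuming the customer-facing amount is the SGD plan price divided by the rate (SGD per 1 unit of the customer’s currency, as in 1 EUR = 1.4 SGD). The “live” rates are made-up numbers, and the helper names are hypothetical.

```python
from typing import List


def local_price(plan_price_sgd: float, sgd_per_unit: float) -> float:
    """Customer-facing price, given SGD per 1 unit of the customer's currency."""
    return plan_price_sgd / sgd_per_unit


def monthly_local_prices(plan_price_sgd: float, monthly_rates: List[float]) -> List[float]:
    """Local price for each month, rounded to cents."""
    return [round(local_price(plan_price_sgd, r), 2) for r in monthly_rates]


manual = [1.4] * 3          # manual rate held for the year
live = [1.4, 1.45, 1.38]    # made-up market rates
print(monthly_local_prices(1400.0, manual))  # [1000.0, 1000.0, 1000.0]
print(monthly_local_prices(1400.0, live))    # a different amount every month
```

The flip side is the MRR issue above: a rate frozen for a year drifts from the market, which is why the reporting layer converts separately.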
      • Customer payment flow:

  • Done

    • DONE Unsubscribe from Contabo VPS because it was unused

    • DONE Continue Cross-Model Calculation

      • This may not be good timing. It should wait until the docs revamp project is further along (as of #2026-01-03 ).
    • DONE Review AI sharing dashboard for Aurora

      • Timeline:
        • Aurora’s request came in via support. Phuong (AI team) looped in the data team.
        • Hieu investigated and found AI data lives in ClickHouse (logs + user_id only), NOT the production DB. This means user-friendly info (names, emails) and dashboard/dataset-level tracking aren’t directly available.
        • Tien (AI team) confirmed: user_id mapping is possible via prod DB, token usage exists in HOtel, but dashboard/dataset AI activity tracking is hard.
        • Triet advised: only answer what’s currently feasible, treat the rest as feature requests.
        • Hieu drafted a reply to Aurora listing what’s available (conversation ID, user ID, messages, token usage) and limitations (no usernames, no asset tracking, only MCP data).
        • Aurora accepted — said user_id is fine, they’ll map it themselves.
        • Hieu built a dashboard at https://us.holistics.io/dashboards/v4/1099511684352-aurora-ai-conversations and proposed Google Sheets delivery. Aurora also suggested S3 export as an alternative.
        • Nam asked Hieu to wait before sharing — he wanted to discuss commercially first.
        • Nam proposed turning this into a paid upsell package at $10,200/year (“Usage and AI Data Monitoring Transfer”).
        • Chinh (me) reviewed and said the data is feasible to share as a spreadsheet, but noted that Hieu’s dashboard is not ready to consume yet.
        • Chinh and Phuong questioned charging for AI usage data since the AI team is already building an in-app AI usage monitoring dashboard for all customers.
        • Vincent (CEO) clarified the monetization rationale: the in-app dashboard is free for all; what’s being sold is data sync/export to their own systems (S3/ETL). This targets enterprise needs: compliance, security auditing, long retention (7+ years), and custom analytics.
        • Triet and Vincent discussed delivery method — S3 vs Google Sheets vs ETL to their data warehouse. Vincent prefers S3 since Aurora mentioned it and it’s more scalable for a paid add-on.
      • For myself (data team):
        • Data source: AI conversation data is in ClickHouse (not prod DB). It has: conversation timestamps, conversation IDs, user IDs, messages (prompts + replies), token usage. User info (email/name) requires joining with a production DB mapping table.
        • Dashboard exists: Hieu built a dashboard at us.holistics.io/dashboards/v4/1099511684352, but it is not ready to share yet.
        • NOTE I need to review the data in the dashboard, but not now. Let’s wait for Aurora’s response and for the team to initialize a project for this. Not sure who will be the main owner.
        • Prior art: I did similar work for SweetSpot before: https://holistics.slack.com/archives/C09GURCKQV8/p1772181645114809?thread_ts=1770132838.858979&cid=C09GURCKQV8
        • Delivery format is undecided: The current preference is S3 export (Aurora offered to share an S3 bucket). Google Sheets with scheduled delivery was the original plan because we can reuse the Google Sheets Delivery feature. Final decision pending Nam’s email to Aurora.
        • This is now a paid add-on ($10,200/year) — so quality and reliability matter. It’s not ad-hoc anymore.
        • Scope boundaries: Only provide what’s currently available. Don’t try to solve missing data (dashboard/dataset AI activity, feature-specific breakdown beyond MCP). The AI team will build in-app monitoring separately.
        • Stakeholders: Nam (commercial timing), Chinh (me) (data review/quality), and Phuong/Tien/Dat (AI team).
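        • A minimal sketch of the user-mapping step, assuming the ClickHouse conversation export and the prod DB user mapping both arrive as Python rows. It is a left join, so conversations from unmapped users are kept rather than dropped; field names such as `tokens`, `email`, and `name` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def enrich_conversations(conversations: List[dict], users_by_id: Dict[int, dict]) -> List[dict]:
    """Left join: every conversation row is kept; unmapped user_ids get None for email/name."""
    enriched = []
    for row in conversations:
        user = users_by_id.get(row["user_id"])
        enriched.append({
            **row,
            "user_email": user["email"] if user else None,
            "user_name": user["name"] if user else None,
        })
    return enriched


def tokens_by_user(rows: List[dict]) -> Dict[int, int]:
    """Total token usage per user_id."""
    totals: Dict[int, int] = defaultdict(int)
    for row in rows:
        totals[row["user_id"]] += row["tokens"]
    return dict(totals)
```

For a paid add-on, counting the rows that come back with `user_email is None` is a cheap reliability check to run before each delivery.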
    • ((69bcb599-a91e-49ea-bc3d-6a5a651f03df))

    • Author: [[George Polya]]
    • I just bought the book. It’s $10. Not expensive. But the value it brings is huge. Worth it.
    • Key notes

      • Chap 1

        • Problem solving is a muscle. The more we practice solving problems, the better we are.
        • Problem solving is like swimming. We learn by imitating and practicing.
        • The teacher who wishes to develop his students’ ability to do problems must instill some interest into their mind and give them plenty of opportunity for imitation and practice.
        • 4 steps of problem solving:
          1. Understand the problem
            1. What is the unknown?
          2. Devise a plan: connect the data, the condition, and the unknown
            1. Have you solved a similar problem before?
          3. Carry out the plan
            1. Can you prove it is correct?
          4. Look back
            1. What did you do well, and what did you do badly?
            2. Can you derive the result differently, and can you extend the problem?
        • The teacher should put himself in the student’s shoes, to help the student understand the problem and devise the plan better.
        • Teach by example.
        • Could you restate the problem?
    • Quotes

      • “If you cannot solve a problem, there is an easier problem that you cannot solve.”
      • “The main danger is that the student forgets his plan. This may easily happen if the student received this plan from outside, and accepted it on the authority of the teacher, but if he worked for it himself, even with some help, and conceived the final idea with satisfaction, he will not lose this idea easily.”
      • “The teacher may ask: Can you see clearly that the triangle with sides x, y, c is a right triangle? To this question, the student may answer honestly “yes”, but he could be much embarrassed if the teacher be not satisfied with the intuitive conviction, then go on asking: But can you prove that this triangle is a right triangle? Thus, the teacher should suppress this question for now.”
        • Reflecting on myself: sometimes I ask questions that dive very deep into one side of the problem, beyond my initial intention of just providing more knowledge to the student. In most cases, that is not good.
        • Because it kills the interest.
        • Don’t dive too deep; go just deep enough to keep the student’s interest. This is more important.
      • “It is better to solve one problem in five ways, rather than solve five problems in one way.”