Earlier this week, I exported 142 session fragments for SLC Digital’s due diligence team. Not raw video. Not biometrics. Just state vectors—5-layer JSON payloads stripped of identifiers, timestamps, and session hashes. I zipped them, encrypted the file with their public key, and uploaded it to a burner Tresorit link. Then I sat back and realized: this wasn’t a technical request. It was a stress test of my honesty.
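For the curious, the stripping step itself is deliberately boring. Here's a minimal sketch in TypeScript, with hypothetical field names (`RawFragment`, `userId`, `capturedAt`) standing in for the real schema; encryption and upload happen after this, outside the snippet:

```ts
// Hypothetical shape of a raw session fragment; these field names are
// illustrative, not the actual EmoPulse schema.
interface RawFragment {
  userId: string;
  sessionHash: string;
  capturedAt: string;               // ISO timestamp, dropped on export
  layers: Record<string, number>[]; // the 5-layer state vector
}

// The export type admits nothing but the state vector itself.
type AnonFragment = Pick<RawFragment, "layers">;

function anonymize(f: RawFragment): AnonFragment {
  return { layers: f.layers };
}

// Usage: map, serialize, and only then encrypt and upload.
declare const fragments: RawFragment[]; // the 142 sessions, loaded elsewhere
const exportBody = JSON.stringify(fragments.map(anonymize));
```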
What I actually learned isn’t about data formats or anonymization pipelines. It’s that investors don’t trust the demo. They trust the gap between what you could show and what you choose not to. Most founders think diligence is about proving sophistication—showing the model card, the accuracy metrics, the pipeline diagrams. But when you’re a solo engineer building browser-based perception with no cloud ML, what they’re really looking for is restraint. They want to see that you know the difference between capability and overclaim.
I’ve seen other founders dump raw AU intensities, gaze heatmaps, and HRV traces into due diligence folders like they’re scoring points for complexity. We don’t. Our /state endpoint emits 47 signals, yes: ARKit blendshapes, rPPG-derived BPM, FACS units, prosody buckets. But the packet is structured, minimal, deterministic. No model scores. No “engagement” or “trust” logits. Just observables and fixed-weight derived indicators: R_risk, intent_clarity, human_signature. All computed client-side in WebAssembly, using methods from Giannakakis et al. and their 2025 IEEE replications. No training. No gradients.
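To make “fixed-weight, deterministic” concrete, here's a hedged sketch of what one derived indicator could look like. The signal names, weights, and normalization are illustrative placeholders, not the production formula for intent_clarity; the point is the shape: hard-coded weights, pure function, same packet in, same number out.

```ts
// Illustrative slice of the /state packet. The real packet carries 47
// signals; these three and the weights below are placeholders.
interface StatePacket {
  blendshapes: Record<string, number>;           // ARKit-style, 0..1
  bpm: number;                                   // rPPG-derived
  prosodyBucket: "flat" | "varied" | "strained";
}

// Fixed weights, baked in at build time. No model, no gradients.
const W = { calmPulse: 0.4, browInnerUp: 0.35, variedProsody: 0.25 };

// A pure function: the same packet always yields the same score.
function intentClarity(p: StatePacket, restingBpm: number): number {
  const pulseDeviation = Math.min(Math.abs(p.bpm - restingBpm) / 30, 1);
  const brow = p.blendshapes["browInnerUp"] ?? 0;
  const prosody = p.prosodyBucket === "varied" ? 1 : 0;
  return (
    W.calmPulse * (1 - pulseDeviation) +
    W.browInnerUp * brow +
    W.variedProsody * prosody
  );
}
```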
When I packaged those 142 sessions, I didn’t filter for “good data.” I included the shaky webcam feeds, the low-light sessions, the one where the test subject coughed mid-capture and the rPPG spiked. Our liveness scorer already handles all of that. It’s been live since April 8, 2026, running server-side on the Oracle ARM box in Chicago and analyzing the last five ticks of each session. It flags BPM volatility below 12, gaze stability above 95 with zero blinks, and micro-expression bursts that suggest video replay. We tested it: 3 real people, 2 spoof attempts (a phone photo and a screen replay), 100% separation. Threshold at 0.5, margin of 0.2.
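Here's the shape of those checks as a sketch. The thresholds match the ones above (volatility under 12, gaze stability over 95 with zero blinks, decision at 0.5); the tick fields and the penalty weights are illustrative, not the deployed scorer, and the micro-expression check is omitted for brevity.

```ts
interface Tick {
  bpm: number;           // rPPG-derived, per tick
  gazeStability: number; // 0..100, higher is steadier
  blinked: boolean;
}

// Score the last five ticks of a session; higher means "looks live".
// Penalty weights are illustrative; thresholds match the prose above.
function livenessScore(window: Tick[]): number {
  const bpms = window.map(t => t.bpm);
  const mean = bpms.reduce((a, b) => a + b, 0) / bpms.length;
  const volatility = Math.sqrt(
    bpms.reduce((a, b) => a + (b - mean) ** 2, 0) / bpms.length
  );

  // A pulse that never moves is a replay tell, not a calm subject.
  const flatPulse = volatility < 12;
  // Perfectly steady gaze with zero blinks suggests a photo or a screen.
  const frozenGaze =
    window.every(t => t.gazeStability > 95) && !window.some(t => t.blinked);

  let score = 1.0;
  if (flatPulse) score -= 0.35;
  if (frozenGaze) score -= 0.35;
  return score;
}

const isLive = (ticks: Tick[]) => livenessScore(ticks.slice(-5)) >= 0.5;
```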
But I didn’t include the liveness scores in the export. Not because they’re secret—they’re deterministic, built from existing signals—but because the investor didn’t ask for anti-spoof logic. They asked for behavioral data. And if I’d thrown in an extra “liveness” field unprompted, it would’ve looked like I was trying to oversell. Like I needed the data to do more than it does.
That’s the trap so many fall into: inflating the narrative to match the funding stage. We’re pre-seed. Raising €2M at €6M pre. No customers. No revenue. Just code, patents pending, and a medical advisor who wrote an ethical positioning paper—not a clinical validation. So when I anonymize data, I don’t hide the emptiness. I highlight it. No user IDs. No geolocation. No device metadata. Just clean vectors ticking at 25Hz, each one a snapshot of face, voice, and motion—nothing more.
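One way to keep “nothing more” honest is to enforce it at the type level. A sketch, with hypothetical field names: the export row carries a relative tick index instead of a wall-clock timestamp, so ordering survives anonymization but capture time does not.

```ts
// Hypothetical export row; field names are illustrative. At 25 Hz,
// tick n sits n * 40 ms into the session, relative to an unknown start.
interface ExportRow {
  tick: number;     // 0, 1, 2, ... since session start; never wall clock
  face: number[];   // blendshape / FACS observables
  voice: number[];  // prosody observables
  motion: number[]; // head pose and movement observables
}

// The row type has no slot for identifiers, so a stray userId or
// timestamp is a compile error, not a code-review catch.
function toRow(
  tick: number,
  face: number[],
  voice: number[],
  motion: number[]
): ExportRow {
  return { tick, face, voice, motion };
}
```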
This changes how I frame everything. The next request will probably ask for signal distributions, noise floors, failure modes. I’m preparing those now. Not to impress, but to expose. Because what EmoPulse is building isn’t an AI oracle; it’s infrastructure. Like Stripe for biometrics, but in the browser, on-device, sub-50ms. The dashboard (https://www.emopulse.app/dashboard.html) shows it raw: no smoothing, no storytelling.
What this forces is a new discipline: shipping truth instead of potential. No more “our model can detect depression with 95% accuracy”—we don’t claim that. Not now, not ever. Because once you start fudging the boundary between literature and implementation, you’re not building a company. You’re building a story. And stories don’t run on WebAssembly.
What do you actually leave out when you’re trying to be believed?