Technology Apr 18, 2026 · 12 min read

Private AI Inference with Midnight: Proving Model Outputs Without Revealing Inputs


DEV Community
by Tosh

Every time you send a query to a language model, a credit scoring API, or a medical diagnosis system, you're sharing your data with the service provider. For consumer applications, this is often acceptable. For anything involving medical history, financial records, legal documents, or proprietary business data, it's a serious privacy problem with real legal and competitive consequences.

The naive solution — encrypting inputs — doesn't work at the inference layer. The model has to see the data to process it. Homomorphic encryption can theoretically compute on encrypted data, but it's orders of magnitude too slow for any real model. Trusted execution environments (TEEs) move the trust problem to hardware rather than eliminating it.

Zero-knowledge proofs offer a different approach: run inference in the clear (off-chain, where you control the compute), then prove the result was computed correctly without revealing the input. Midnight's Compact language provides the circuit infrastructure to make this verifiable on-chain.

This article covers the architecture, the commit-prove-verify pattern, practical use cases, a Compact circuit sketch, and the honest limitations of current ZK systems for ML inference.

The Privacy Problem in AI

Consider three representative scenarios:

Credit scoring: A fintech app wants to prove to a lender that a user's credit score (computed from bank transaction history) exceeds a threshold. The lender needs a reliable score. The user doesn't want to share their full transaction history. The fintech doesn't want to expose their scoring model.

Medical diagnosis verification: A hospital system uses a trained model to flag high-risk patients for intervention. A patient wants to prove they received a positive diagnosis (for insurance purposes) without revealing the specific input data — their symptoms, medications, age, or other sensitive factors.

Content moderation: A platform runs a classifier on user-generated content. The platform wants to prove to regulators that moderation decisions followed policy (the classifier output met a threshold) without revealing the content of individual posts.

In all three cases, the problem is the same: prove f(x) = y where f is a known function (the model), x is private (the sensitive input), and y is either public or selectively disclosed.

How ZK Proofs Address This

Zero-knowledge proofs let you prove statements about private data. The classic formulation: "I know an x such that f(x) = y" can be proven without revealing x.

For AI inference, this means:

  1. The input x is committed to: commitment = hash(x, secret) is posted on-chain before inference
  2. Inference is run off-chain: y = model(x) computed by the user
  3. A ZK proof is generated: "I know x such that hash(x, secret) = commitment AND model(x) = y"
  4. The proof is verified on Midnight: the chain accepts the proof without learning x

The key requirement: model(x) must be expressed as an arithmetic circuit (or equivalent constraint system). This is possible for many ML models but with significant constraints on model complexity.
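The four steps above can be sketched end-to-end in Python. This is an illustrative stand-in, not real proving code: SHA-256 replaces the Poseidon hash a Compact circuit would use, a fixed dot product replaces the model, and the final assertions play the role of the statement a real ZK proof would certify without revealing `x`.

```python
import hashlib
import secrets

def commit(x: list[int], blind: int) -> str:
    """Commitment = hash(x, secret). SHA-256 stands in for the
    Poseidon hash a real Compact circuit would use."""
    data = b"".join(v.to_bytes(8, "big") for v in x) + blind.to_bytes(32, "big")
    return hashlib.sha256(data).hexdigest()

def model(x: list[int]) -> int:
    """Stand-in model: a fixed dot product (the weights are public)."""
    weights = [3, 1, 4, 1, 5]
    return sum(w * v for w, v in zip(weights, x))

# Step 1: user commits to the input before inference
x = [10, 20, 30, 40, 50]
blind = secrets.randbits(256)
commitment = commit(x, blind)    # posted on-chain

# Step 2: inference runs off-chain
y = model(x)

# Steps 3-4 (stand-in): a real ZK proof would encode exactly this statement,
# verified on-chain without revealing x or blind:
assert commit(x, blind) == commitment
assert model(x) == y
```

The blinding value `blind` is what makes the commitment hiding: without it, a lender could brute-force small input spaces against the public hash.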

What Models Can Be Expressed as Circuits

ZK circuits work over finite fields. The operations available are: field addition, field multiplication, and comparison (via range checks). Most ML operations map onto these:

| ML Operation | Circuit Representation | Cost |
| --- | --- | --- |
| Linear layer (matrix multiply) | Field multiplications + additions | O(m*n) constraints |
| ReLU activation | Range check + conditional | ~100 constraints |
| Sigmoid/Tanh | Polynomial approximation | ~500-1000 constraints |
| Max pooling | Comparison circuit | O(k) constraints |
| Batch normalization | Multiply + add with fixed params | O(n) constraints |
| Softmax | Expensive (requires division and exp) | ~10k constraints per class |

For practical purposes, this means:

  • Feasible today: Small MLPs (2-4 layers, <1000 neurons), linear classifiers, decision trees
  • Feasible with engineering effort: CNNs for small images (MNIST-scale), small transformer attention blocks
  • Not currently feasible: GPT-scale LLMs, large vision transformers, models with floating-point arithmetic
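To get a feel for the sizes involved, here is a rough constraint count for the 50 → 128 → 64 → 1 MLP used later in this article, applying the ballpark per-operation costs from the table above (O(m*n) per linear layer, ~100 per ReLU). These are order-of-magnitude figures, not measurements from a real prover.

```python
# Rough constraint-count estimate for a 50 -> 128 -> 64 -> 1 MLP,
# using ballpark per-operation costs (assumptions, not benchmarks).
RELU_COST = 100  # ~100 constraints per ReLU (range check + conditional)

def linear_cost(m: int, n: int) -> int:
    # O(m*n) constraints for an m-input, n-output linear layer
    return m * n

constraints = (
    linear_cost(50, 128) + 128 * RELU_COST    # layer 1 + activations
    + linear_cost(128, 64) + 64 * RELU_COST   # layer 2 + activations
    + linear_cost(64, 1)                      # output layer, no activation
)
print(constraints)  # on the order of tens of thousands
```

Tens of thousands of constraints is comfortably provable today; the same arithmetic applied to a 117M-parameter model lands in the hundreds of billions, which is why large models are out of reach.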

The key constraint is that ZK circuits use fixed-point arithmetic — floating-point operations must be approximated using integer arithmetic with a scale factor.

Fixed-Point Arithmetic in Compact

ML models typically use float32 or float16. ZK circuits use integers over a prime field. The translation:

// Fixed-point representation: actual_value = field_value / SCALE
const SCALE: Uint<64> = 1000000;  // 6 decimal places of precision
const SCALE_BITS: Uint<8> = 20;   // alternative power-of-two scale: 2^20 ≈ 1M
                                  // (makes renormalization a cheap bit shift)

// Fixed-point multiply: (a/SCALE) * (b/SCALE) = (a*b) / SCALE^2
// After multiply, divide by SCALE to renormalize
circuit fixed_mul(a: Uint<64>, b: Uint<64>) -> Uint<64> {
  let product: Uint<128> = a as Uint<128> * b as Uint<128>;
  let renormalized: Uint<64> = (product / SCALE as Uint<128>) as Uint<64>;
  renormalized
}

// ReLU: max(0, x)
circuit relu(x: Field, is_negative: Bool) -> Field {
  // x is a field element; negative values are encoded in the
  // upper half of the field (two's-complement-style wraparound)
  // is_negative is a witness (hint) the prover provides
  // The circuit verifies the hint is consistent with the encoding
  if is_negative {
    assert x > (Field::MAX / 2);  // x is "negative" (high field value)
    0
  } else {
    assert x <= (Field::MAX / 2);
    x
  }
}

The witness is_negative is an example of a "hint" — additional information the prover provides to help the circuit proceed without expensive comparisons. The circuit verifies the hint is consistent with the field element value.
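A plain-Python reference model of the two circuits above makes the arithmetic easy to check. `P` is a small stand-in modulus chosen for readability; real proving systems use a large pairing-friendly prime.

```python
# Reference model of fixed-point multiply and sign-hint ReLU.
P = 2**61 - 1          # stand-in field modulus (assumption, not Midnight's)
SCALE = 1_000_000      # 6 decimal places of fixed-point precision

def fixed_mul(a: int, b: int) -> int:
    # (a/SCALE) * (b/SCALE) = (a*b)/SCALE^2, so divide once to renormalize
    return (a * b) // SCALE

def relu(x: int, is_negative: bool) -> int:
    # The prover supplies is_negative as a hint; the "circuit" only
    # checks that the hint is consistent with the field encoding
    # (upper half of the field = negative)
    if is_negative:
        assert x > P // 2, "hint inconsistent with encoding"
        return 0
    assert x <= P // 2, "hint inconsistent with encoding"
    return x

half = fixed_mul(3_000_000, 500_000)   # 3.0 * 0.5 = 1.5 in fixed point
neg = (-2_000_000) % P                 # -2.0 encoded as a high field value
```

Note that a dishonest hint simply fails the assertion: the prover cannot claim a positive value is negative, because the consistency check is part of the constraint system.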

The Commit-Prove-Verify Pattern

Here's the full flow for a private credit score verification:

Step 1: Input Commitment (On-Chain)

Before requesting a score, the user commits to their input data. This creates an on-chain record that the input was fixed before inference began (preventing cherry-picking).

circuit commit_input(
  witness input_vector: Vec<Field, 50>,  // 50 features (income, age, payment history, etc.)
  witness secret: Field,
  witness user_pubkey: PublicKey,

  public input_commitment: Field
) {
  // Hash all input features + secret + owner
  let commitment = poseidon_hash_vec(
    input_vector + [secret, pubkey_to_field(user_pubkey)]
  );
  assert commitment == input_commitment;
}

contract PrivateInference {
  fn commit_input(proof: ZKProof, commitment: Field) {
    verify_proof(proof, commit_input_circuit, [commitment]);
    ledger.pending_commitments.insert(commitment);
  }
}

Step 2: Off-Chain Inference

The user runs the model locally (or on a trusted server they control):

# Python: run the model
import numpy as np
features = load_user_features()  # private data
model = load_scoring_model()     # public model weights
score = model.predict(features)  # e.g., 742

# Generate the ZK proof
prover = CompactProver()
proof = prover.prove(
    circuit="credit_score_inference",
    witnesses={
        "input_vector": features_to_field_elements(features),
        "secret": user_secret,
        "model_weights": load_model_weights_as_field_elements(model),
        "score": score
    },
    public_inputs={
        "input_commitment": user_commitment,
        "score_public": score,
        "threshold": 700
    }
)

Step 3: On-Chain Verification

The proof is submitted to Midnight. The contract verifies it without learning the input features.

circuit credit_score_inference(
  // Private: the actual input data
  witness input_vector: Vec<Field, 50>,
  witness secret: Field,
  witness user_privkey: PrivateKey,

  // Model weights are public (the model is known/verifiable)
  // They're embedded in the circuit or provided as public parameters

  // Intermediate computation values (hints for efficiency)
  witness layer1_output: Vec<Field, 128>,
  witness layer2_output: Vec<Field, 64>,
  witness layer3_output: Vec<Field, 1>,
  witness relu_signs: Vec<Bool, 192>,  // sign hints for all ReLU ops

  // Public outputs
  public input_commitment: Field,
  public score_above_threshold: Bool,
  public threshold: Uint<64>
) {
  let user_pubkey = derive_pubkey(user_privkey);

  // Verify input commitment matches the committed data
  let computed_commitment = poseidon_hash_vec(
    input_vector + [secret, pubkey_to_field(user_pubkey)]
  );
  assert computed_commitment == input_commitment;

  // Verify the commitment exists in pending_commitments
  // (prevents proofs against uncommitted inputs)
  assert ledger.pending_commitments.contains(input_commitment);

  // ---- Model Inference ----
  // Layer 1: 50 inputs → 128 neurons
  // W1 is a 128x50 matrix of public model weights (embedded in circuit)
  let l1_pre = linear_layer_50_128(input_vector, W1, b1);
  for i in 0..128 {
    assert relu(l1_pre[i], relu_signs[i]) == layer1_output[i];
  }

  // Layer 2: 128 → 64
  let l2_pre = linear_layer_128_64(layer1_output, W2, b2);
  for i in 0..64 {
    assert relu(l2_pre[i], relu_signs[128 + i]) == layer2_output[i];
  }

  // Layer 3: 64 → 1 (score output, no activation)
  let score_field = linear_layer_64_1(layer2_output, W3, b3)[0];

  // Convert score to threshold comparison
  // score_field is in fixed-point representation
  let score_int = score_field / SCALE;  // divide by scale factor
  assert (score_int >= threshold) == score_above_threshold;
}

contract PrivateInference {
  fn verify_credit_score(
    proof: ZKProof,
    input_commitment: Field,
    score_above_threshold: Bool,
    threshold: Uint<64>
  ) {
    assert ledger.pending_commitments.contains(input_commitment);

    verify_proof(proof, credit_score_inference_circuit, 
      [input_commitment, score_above_threshold, threshold]);

    // Mark commitment as verified
    ledger.verified_scores[input_commitment] = ScoreResult {
      above_threshold: score_above_threshold,
      threshold: threshold,
      verified_at: block_height(),
    };

    ledger.pending_commitments.remove(input_commitment);
  }
}

The lender queries verified_scores[commitment] and sees: the user committed to specific input data, ran the scoring model on it, and proved (without revealing inputs) that the score is above the threshold. The lender doesn't know the exact score, the input features, or the user's identity unless the user chooses to reveal them separately.
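The contract's state machine is small enough to simulate in a few lines of Python. This sketch stubs out proof verification entirely; it only shows the commit → verify transition that moves an entry from `pending_commitments` to `verified_scores` (names mirror the Compact sketch, the class itself is illustrative).

```python
# Minimal simulation of the PrivateInference contract's state machine.
# Real proof verification (verify_proof) is stubbed out.
class PrivateInferenceLedger:
    def __init__(self):
        self.pending_commitments: set[str] = set()
        self.verified_scores: dict[str, dict] = {}

    def commit_input(self, commitment: str) -> None:
        # in the real contract, a ZK proof accompanies the commitment
        self.pending_commitments.add(commitment)

    def verify_credit_score(self, commitment: str,
                            above_threshold: bool, threshold: int) -> None:
        # proofs against uncommitted inputs are rejected
        assert commitment in self.pending_commitments, "no pending commitment"
        # verify_proof(proof, circuit, public_inputs) would run here
        self.verified_scores[commitment] = {
            "above_threshold": above_threshold,
            "threshold": threshold,
        }
        self.pending_commitments.remove(commitment)

ledger = PrivateInferenceLedger()
ledger.commit_input("c1")
ledger.verify_credit_score("c1", above_threshold=True, threshold=700)
```

Removing the commitment after verification is what prevents replay: one committed input yields at most one verified score.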

Use Case: Private Medical Diagnosis Verification

A patient wants to prove to an insurance company that their physician's diagnostic model flagged them as requiring a specific treatment, without revealing the raw medical data.

The model inputs might be: blood pressure, glucose levels, symptom duration, medication history — highly sensitive data the patient (rightly) doesn't want the insurer to have verbatim.

The circuit structure is similar to the credit score example, but:

  • The model is the hospital's diagnostic classifier
  • The "score" output is a binary flag (requires_treatment: Bool)
  • The circuit is parameterized with the hospital's verified model weights
  • The hospital's signature on the model weights is included as a public parameter, preventing the patient from using a different model

circuit medical_diagnosis(
  witness patient_data: Vec<Field, 30>,   // clinical measurements
  witness patient_secret: Field,
  witness physician_attestation: Signature, // physician signed the input commitment

  public input_commitment: Field,
  public diagnosis_flag: Bool,
  public hospital_model_id: Field   // identifies which model version was used
) {
  // Verify physician signed the commitment (they reviewed the data)
  assert verify_signature(physician_attestation, input_commitment, HOSPITAL_PUBKEY);

  // Verify input matches commitment
  let commitment = poseidon_hash_vec(patient_data + [patient_secret]);
  assert commitment == input_commitment;

  // Run diagnostic model
  // (model weights for hospital_model_id are embedded in circuit)
  let diagnosis = run_diagnostic_model(patient_data, hospital_model_id);
  assert diagnosis == diagnosis_flag;
}

The physician's signature proves the data wasn't fabricated by the patient. The circuit proves the model was run correctly on the signed data. The insurer sees: a physician-attested diagnosis from a known hospital model. They learn nothing about the clinical measurements.
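The attestation check can be sketched in Python, with HMAC-SHA256 standing in for the physician's signature (a real deployment would use an actual public-key signature scheme so the verifier needs only the hospital's public key; the shared key here is purely illustrative).

```python
import hashlib
import hmac

# HMAC-SHA256 stands in for the physician's signature over the
# input commitment. HOSPITAL_KEY is a hypothetical placeholder.
HOSPITAL_KEY = b"hospital-signing-key"

def attest(commitment: bytes) -> bytes:
    # physician "signs" the input commitment after reviewing the data
    return hmac.new(HOSPITAL_KEY, commitment, hashlib.sha256).digest()

def verify_attestation(commitment: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(attest(commitment), tag)

patient_data = [120, 95, 14]   # illustrative clinical measurements
commitment = hashlib.sha256(bytes(patient_data)).digest()
tag = attest(commitment)

# A patient who swaps in different data cannot reuse the attestation:
tampered = hashlib.sha256(bytes([130, 95, 14])).digest()
```

Because the signature covers the *commitment* rather than the raw data, the physician attests to the inputs without the attestation itself leaking them.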

Compact Pattern for Verifying Committed Inputs Match a Claimed Output

For simpler cases — where the "model" is a small decision tree or a linear classifier — the verification circuit is more compact:

// Simple linear classifier: score = w · x + b > threshold
circuit linear_classifier(
  witness features: Vec<Field, 10>,
  witness secret: Field,

  // Model weights are public (embedded in circuit as constants)
  // W: [w0, w1, ..., w9]
  // b: bias term

  public input_commitment: Field,
  public classification: Bool,   // true if above threshold
  public threshold: Field
) {
  // Verify input commitment
  let commitment = poseidon_hash_vec(features + [secret]);
  assert commitment == input_commitment;

  // Compute dot product
  let mut score: Field = 0;
  for i in 0..10 {
    score = score + W[i] * features[i];
  }
  score = score + B;  // add bias

  // Check threshold
  assert (score > threshold) == classification;
}

This pattern works for any deterministic computation that can be expressed as polynomial constraints. Linear models, small neural networks, decision trees, and rule-based classifiers are all tractable.
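A Python mirror of the `linear_classifier` circuit shows the fixed-point dot product and threshold check concretely. The weights `W` and bias `B` are illustrative values, not taken from any real scoring model.

```python
# Python mirror of the linear_classifier circuit: fixed-point
# dot product plus bias, compared against a threshold.
SCALE = 1_000_000
W = [250_000, -120_000, 500_000]   # 0.25, -0.12, 0.5 in fixed point
B = 1_000_000                      # bias of 1.0

def linear_classifier(features: list[int], threshold: int) -> bool:
    # one renormalizing division after the dot product, then add bias
    score = sum(w * x for w, x in zip(W, features)) // SCALE + B
    return score > threshold

features = [4_000_000, 1_000_000, 2_000_000]   # 4.0, 1.0, 2.0
# score = 0.25*4 - 0.12*1 + 0.5*2 + 1 = 2.88
above = linear_classifier(features, threshold=2_000_000)
below = linear_classifier(features, threshold=3_000_000)
```

Deferring the renormalizing division until after the sum, as here, is also the cheaper choice in-circuit: one division per dot product instead of one per term.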

Current Limitations and Future Directions

Proof generation time for large models: A 2-layer MLP with 1,000 neurons takes seconds to prove. GPT-2 (117M parameters) would take somewhere between days and years with current proving systems. This is not a Midnight limitation — it's fundamental to current ZK proof systems. Hardware acceleration (GPU/FPGA provers) and recursive proof composition are active research areas.

Fixed-point precision errors: Converting float32 model weights to fixed-point integers introduces rounding errors. For safety-critical applications (medical), the circuit must use sufficient scale factors to maintain meaningful precision. The accumulated error across layers must be characterized and accepted.
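The rounding error is easy to measure directly: quantize float weights to fixed point, compute a dot product both ways, and compare. A sketch (values chosen arbitrarily for illustration):

```python
# Measure fixed-point quantization error on a small dot product.
SCALE = 1_000_000   # 6 decimal places of precision

weights = [0.1234567, -0.7654321, 0.3333333]
inputs = [1.5, 2.25, -0.5]

# exact float computation
exact = sum(w * x for w, x in zip(weights, inputs))

# quantize to fixed point, compute, convert back to a float
qw = [round(w * SCALE) for w in weights]
qx = [round(x * SCALE) for x in inputs]
fixed = sum(w * x for w, x in zip(qw, qx)) / SCALE**2

# each rounded value is off by at most 0.5/SCALE, so the dot-product
# error is bounded by the sum of |x| terms times that half-step
error = abs(exact - fixed)
```

For a single layer the error here stays well below 1e-5; the engineering work is characterizing how such per-layer errors compound across a full network.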

Model update complexity: If the hospital updates their diagnostic model, every circuit that embeds the old weights becomes outdated. New model versions require new circuits with new trusted setup (for SNARK-based systems) or at minimum re-parameterization.

Input encoding for complex data: Text, images, and audio require careful encoding into field elements. A 256x256 grayscale image = 65,536 field elements. Each one is a private witness. The commitment must cover all of them. For large inputs, the commitment circuit itself becomes expensive.
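One standard mitigation, sketched below under the assumption of a ~254-bit field: pack raw input bytes into field elements, 31 bytes per element, so a 65,536-byte image needs ⌈65,536/31⌉ = 2,115 witnesses instead of 65,536.

```python
# Pack raw bytes into field elements: 31 bytes fit safely below a
# ~254-bit field modulus (an assumption about the proving system).
CHUNK = 31  # bytes per field element

def pack(data: bytes) -> list[int]:
    return [int.from_bytes(data[i:i + CHUNK], "big")
            for i in range(0, len(data), CHUNK)]

image = bytes(range(256)) * 256   # 65,536-byte stand-in "image"
elements = pack(image)
```

The commitment circuit then hashes ~2,100 elements rather than ~65,000, at the cost of unpacking (bit decomposition) wherever the model needs individual pixel values.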

Proving infrastructure: Running the Compact prover for a 5-layer neural network requires significant compute. In practice, users would delegate proof generation to a trusted prover service — reintroducing trust, though trust of a different kind than handing the AI service all your data.

The honest trade-off: ZK proofs for AI inference are a promising direction but not currently production-ready for large models. For small models (linear classifiers, small MLPs, decision trees under ~100 nodes), the approach is practical today. For anything larger, expect proving times measured in minutes to hours.

Summary

Private AI inference on Midnight follows a clean three-phase pattern: commit to inputs before inference, run inference off-chain, prove the result on-chain via ZK circuit. The pattern works correctly for small models — linear classifiers, small neural networks, rule-based systems — that can be expressed as polynomial constraints in fixed-point arithmetic. For these cases, Midnight provides a verifiable privacy guarantee: the chain confirms the model was run on specific committed inputs without learning what those inputs were.

The limitations are real and should be stated plainly: large models are not currently feasible to prove, floating-point precision requires careful handling, and proof generation times constrain real-time applications. But for the use cases where the model is small and privacy is critical — credit scoring, eligibility determination, clinical decision support — this architecture offers something no other approach does: a cryptographic guarantee that the claimed computation was performed correctly on private data.

Source

This article was originally published by DEV Community and written by Tosh.