GCP Server Deployment

HeatSafe Server Deployment on GCP

This document is the production runbook for the HeatSafe sync server.

Target shape

  • Cloud Run service for the API and dashboard
  • Cloud Run job for database migrations
  • Cloud SQL for PostgreSQL
  • Secret Manager for DATABASE_URL
  • IAP enabled directly on Cloud Run for staff dashboard access

This keeps the first production version small. It avoids a load balancer unless the project later needs custom domains or multi-region routing.

Prerequisites

  • A Google Cloud project with billing enabled
  • gcloud authenticated to the target project
  • Artifact Registry, Cloud Build, Cloud Run, Cloud SQL Admin, Secret Manager, and IAP APIs enabled (step 1 below does this)
  • One region shared by Cloud Run and Cloud SQL (default: us-west1)

Required environment values

  • PROJECT_ID
  • PROJECT_NUMBER: used for the IAP service agent in step 9
  • REGION
  • REPOSITORY: Artifact Registry repository name, for example heatsafe
  • IMAGE: full Artifact Registry image URL
  • SERVICE_NAME: for example heatsafe-server
  • MIGRATION_JOB: for example heatsafe-server-migrate
  • INSTANCE_NAME: Cloud SQL instance name, for example heatsafe-pg
  • DATABASE_NAME: for example heatsafe
  • DATABASE_USER: for example heatsafe_app
  • SERVER_SURFACE: api or dashboard, set per service in split deployments

Example shell session:

export PROJECT_ID="your-project-id"
export PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_ID" --format='value(projectNumber)')"
export REGION="us-west1"
export REPOSITORY="heatsafe"
export SERVICE_NAME="heatsafe-server"
export MIGRATION_JOB="heatsafe-server-migrate"
export INSTANCE_NAME="heatsafe-pg"
export DATABASE_NAME="heatsafe"
export DATABASE_USER="heatsafe_app"
export IMAGE="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPOSITORY}/${SERVICE_NAME}:latest"
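Later steps fail in confusing ways when one of these values is empty; for example, an empty REGION silently produces a malformed IMAGE URL. A minimal bash sketch that checks the session before continuing; require_env is a hypothetical helper, not part of gcloud:

```shell
# Hypothetical helper: report every named variable that is unset or empty.
# Returns 0 when all are set, 1 otherwise. Requires bash (uses ${!name}).
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} expands to the value of the variable whose name is in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required environment value: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, require_env PROJECT_ID PROJECT_NUMBER REGION REPOSITORY IMAGE SERVICE_NAME MIGRATION_JOB INSTANCE_NAME DATABASE_NAME DATABASE_USER prints each missing name and returns non-zero, so it can gate the steps that follow.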

1. Enable services

gcloud services enable \
  artifactregistry.googleapis.com \
  run.googleapis.com \
  sqladmin.googleapis.com \
  secretmanager.googleapis.com \
  iap.googleapis.com \
  cloudbuild.googleapis.com \
  monitoring.googleapis.com \
  logging.googleapis.com

2. Create Artifact Registry

gcloud artifacts repositories create "$REPOSITORY" \
  --repository-format=docker \
  --location="$REGION" \
  --description="HeatSafe server images"

If the repository already exists, this command can be skipped.

3. Create Cloud SQL for PostgreSQL

gcloud sql instances create "$INSTANCE_NAME" \
  --database-version=POSTGRES_16 \
  --cpu=2 \
  --memory=8GiB \
  --region="$REGION" \
  --storage-size=20GB \
  --availability-type=zonal \
  --edition=enterprise

gcloud sql databases create "$DATABASE_NAME" \
  --instance="$INSTANCE_NAME"

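export DB_PASSWORD="$(openssl rand -hex 24)"

gcloud sql users create "$DATABASE_USER" \
  --instance="$INSTANCE_NAME" \
  --password="$DB_PASSWORD"

The password is generated as hex, so it can go into DATABASE_URL without URL escaping. Store DB_PASSWORD in a password manager before continuing; step 4 needs it to build the connection string.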

4. Create the Secret Manager secret

The service reads a single DATABASE_URL environment variable. With the instance's public IP and Cloud Run's built-in Cloud SQL connection, the server reaches PostgreSQL through a Unix socket under /cloudsql/, so store the complete socket-style connection string as one secret.

Example connection string:

postgres://USER:PASSWORD@/DATABASE?host=/cloudsql/PROJECT:REGION:INSTANCE

Create the secret:

printf '%s' "postgres://${DATABASE_USER}:${DB_PASSWORD:?set DB_PASSWORD to the database user password}@/${DATABASE_NAME}?host=/cloudsql/${PROJECT_ID}:${REGION}:${INSTANCE_NAME}" \
  | gcloud secrets create heatsafe-database-url \
      --replication-policy=automatic \
      --data-file=-
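The connection string embeds the password in the URL's userinfo, so characters such as /, + and = (all possible in base64 output) must be percent-encoded or the URL parses incorrectly. A bash sketch that encodes the credentials and assembles the same socket-style URL as above; urlencode and database_url are hypothetical helper names:

```shell
# Hypothetical helper: percent-encode a string for the userinfo part of a URL.
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) pass through unchanged;
# every other byte becomes %XX. LC_ALL=C makes the loop work byte by byte.
urlencode() {
  local LC_ALL=C input="$1" out="" c i
  for (( i = 0; i < ${#input}; i++ )); do
    c="${input:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf print the character's numeric code.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build the Cloud SQL socket-style DATABASE_URL.
# Arguments: user password database project region instance
database_url() {
  printf 'postgres://%s:%s@/%s?host=/cloudsql/%s:%s:%s' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$5" "$6"
}
```

To store a new value after a password change, the output can be piped into gcloud secrets versions add heatsafe-database-url --data-file=- instead of gcloud secrets create. Cloud Run reads secret environment variables when an instance starts, so deploying a new revision is a straightforward way to make the service pick up the change.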

Step 6 grants the runtime service account access to this secret once the account exists.

5. Build and push the server image

Run this from the repository root:

gcloud builds submit ./server --tag "$IMAGE"
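Because IMAGE ends in :latest, every build overwrites the same tag, which makes it hard to tell which build a revision runs or to roll back by tag. A sketch, assuming a per-commit tag such as the git short SHA is acceptable; image_ref is a hypothetical helper that reproduces the IMAGE layout from the shell session:

```shell
# Hypothetical helper: build an Artifact Registry image reference with an
# explicit tag. Arguments: region project repository image-name tag.
image_ref() {
  if [ "$#" -ne 5 ] || [ -z "$5" ]; then
    echo "usage: image_ref REGION PROJECT REPOSITORY NAME TAG" >&2
    return 2
  fi
  printf '%s-docker.pkg.dev/%s/%s/%s:%s' "$1" "$2" "$3" "$4" "$5"
}
```

For example, export IMAGE="$(image_ref "$REGION" "$PROJECT_ID" "$REPOSITORY" "$SERVICE_NAME" "$(git rev-parse --short HEAD)")" before the build, so the migration job and service deploy that exact tag.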

6. Create the Cloud Run runtime service account

gcloud iam service-accounts create heatsafe-server-sa \
  --display-name="HeatSafe Server Runtime"

Grant required roles:

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:heatsafe-server-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"

gcloud secrets add-iam-policy-binding heatsafe-database-url \
  --member="serviceAccount:heatsafe-server-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
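To confirm both grants took effect, the IAM policies can be filtered down to the runtime service account. A sketch using gcloud's --flatten, --filter, and --format projections; the function names are hypothetical. The expected output is roles/cloudsql.client for the project and roles/secretmanager.secretAccessor for the secret.

```shell
# Hypothetical helper: the runtime service account email from step 6.
runtime_sa() {
  printf 'heatsafe-server-sa@%s.iam.gserviceaccount.com' "$PROJECT_ID"
}

# Hypothetical helper: roles granted on the project (expect roles/cloudsql.client).
project_roles_for_sa() {
  gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --filter="bindings.members:$(runtime_sa)" \
    --format="value(bindings.role)"
}

# Hypothetical helper: roles granted on the secret
# (expect roles/secretmanager.secretAccessor).
secret_roles_for_sa() {
  gcloud secrets get-iam-policy heatsafe-database-url \
    --flatten="bindings[].members" \
    --filter="bindings.members:$(runtime_sa)" \
    --format="value(bindings.role)"
}
```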

7. Deploy the migration job

Use the same container image for migrations. The compiled migration entrypoint is dist/src/db/migrate.js.

gcloud run jobs create "$MIGRATION_JOB" \
  --image="$IMAGE" \
  --region="$REGION" \
  --service-account="heatsafe-server-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --set-secrets="DATABASE_URL=heatsafe-database-url:latest" \
  --set-env-vars="NODE_ENV=production" \
  --add-cloudsql-instances="${PROJECT_ID}:${REGION}:${INSTANCE_NAME}" \
  --command="node" \
  --args="dist/src/db/migrate.js"

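Run it before the first deploy and before each schema rollout. For later releases, first point the job at the new image with gcloud run jobs update "$MIGRATION_JOB" --image="$IMAGE" --region="$REGION", because executing the job does not change the image it runs: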

gcloud run jobs execute "$MIGRATION_JOB" \
  --region="$REGION" \
  --wait

8. Deploy the Cloud Run service

Cloud Run supports enabling IAP directly on a service, without a load balancer. That is the simplest fit for this server because the dashboard already expects the authenticated user's email in the header IAP adds to each request (X-Goog-Authenticated-User-Email).

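For production pilot deployments, prefer two Cloud Run services from the same image:

  • public API service with SERVER_SURFACE=api
  • IAP-protected dashboard service with SERVER_SURFACE=dashboard

The command below deploys the IAP-protected dashboard service. The public API service uses its own service name and the same flags, except --allow-unauthenticated instead of --no-allow-unauthenticated, no --iap, and SERVER_SURFACE=api.

gcloud run deploy "$SERVICE_NAME" \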
  --image="$IMAGE" \
  --region="$REGION" \
  --service-account="heatsafe-server-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --no-allow-unauthenticated \
  --iap \
  --add-cloudsql-instances="${PROJECT_ID}:${REGION}:${INSTANCE_NAME}" \
  --set-secrets="DATABASE_URL=heatsafe-database-url:latest" \
  --set-env-vars="NODE_ENV=production,SERVER_SURFACE=dashboard" \
  --min-instances=0 \
  --max-instances=10 \
  --cpu=1 \
  --memory=512Mi

If this is the first time IAP is enabled in a project without an organization, Google may require the initial enablement to be done once in the Cloud Run console.
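For the split deployment, both services come from the same image and differ only in name, access flags, and SERVER_SURFACE. A bash sketch that deploys either surface with the flags from the dashboard command; API_SERVICE_NAME (for example heatsafe-api) is a hypothetical variable this runbook does not otherwise define, and the dashboard keeps SERVICE_NAME:

```shell
# Hypothetical helper: deploy one surface ("api" or "dashboard") from $IMAGE.
# The API is public; the dashboard stays private behind IAP.
deploy_surface() {
  local surface="$1" name
  local -a access
  case "$surface" in
    api)
      name="${API_SERVICE_NAME:-}"   # hypothetical, e.g. heatsafe-api
      access=(--allow-unauthenticated)
      ;;
    dashboard)
      name="$SERVICE_NAME"
      access=(--no-allow-unauthenticated --iap)
      ;;
    *)
      echo "unknown surface: $surface (expected api or dashboard)" >&2
      return 2
      ;;
  esac
  if [ -z "$name" ]; then
    echo "no service name set for surface: $surface" >&2
    return 2
  fi
  gcloud run deploy "$name" \
    --image="$IMAGE" \
    --region="$REGION" \
    --service-account="heatsafe-server-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
    "${access[@]}" \
    --add-cloudsql-instances="${PROJECT_ID}:${REGION}:${INSTANCE_NAME}" \
    --set-secrets="DATABASE_URL=heatsafe-database-url:latest" \
    --set-env-vars="NODE_ENV=production,SERVER_SURFACE=${surface}" \
    --min-instances=0 \
    --max-instances=10 \
    --cpu=1 \
    --memory=512Mi
}
```

Run deploy_surface api and deploy_surface dashboard for each release. The IAP steps that follow apply only to the dashboard service; the public API service needs neither the IAP service agent binding nor staff grants.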

9. Grant IAP access

Grant the IAP service agent permission to invoke the service:

gcloud run services add-iam-policy-binding "$SERVICE_NAME" \
  --region="$REGION" \
  --member="serviceAccount:service-${PROJECT_NUMBER}@gcp-sa-iap.iam.gserviceaccount.com" \
  --role="roles/run.invoker"

Grant staff access through IAP:

gcloud iap web add-iam-policy-binding \
  --region="$REGION" \
  --resource-type=cloud-run \
  --service="$SERVICE_NAME" \
  --member="user:staff1@example.com" \
  --role="roles/iap.httpsResourceAccessor"

Repeat the command for each user who needs dashboard access, or grant the role once to a Google Group with --member="group:GROUP_EMAIL".
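When several principals need access, the binding can be applied in a loop. A sketch; grant_dashboard_access is a hypothetical helper that deliberately accepts only user: and group: members (the two kinds this runbook mentions) and checks every argument before changing any policy:

```shell
# Hypothetical helper: grant IAP dashboard access to each IAM member given.
grant_dashboard_access() {
  local member
  if [ "$#" -eq 0 ]; then
    echo "usage: grant_dashboard_access user:EMAIL|group:EMAIL ..." >&2
    return 2
  fi
  # Validate every argument first so a typo does not leave a partial rollout.
  for member in "$@"; do
    case "$member" in
      user:?*@?*|group:?*@?*) ;;
      *) echo "expected user:EMAIL or group:EMAIL, got: $member" >&2; return 2 ;;
    esac
  done
  for member in "$@"; do
    gcloud iap web add-iam-policy-binding \
      --region="$REGION" \
      --resource-type=cloud-run \
      --service="$SERVICE_NAME" \
      --member="$member" \
      --role="roles/iap.httpsResourceAccessor" || return 1
  done
}
```

For example, grant_dashboard_access group:heatsafe-staff@example.com user:staff1@example.com (example addresses only).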

10. Verify the deployment

Check the service configuration:

gcloud run services describe "$SERVICE_NAME" \
  --region="$REGION"

Expected checks:

  • Iap Enabled: true
  • Cloud SQL connection name is attached
  • service account is heatsafe-server-sa@PROJECT_ID.iam.gserviceaccount.com

Operational checks:

  • Confirm GET /health returns ok: true
  • Open /dashboard in a browser while signed in as a Google user or group member with roles/iap.httpsResourceAccessor
  • Run a registration request and a sync batch against the service URL from a trusted client
  • Confirm POST /v1/sync/batches returns serverAckedAt and receiptStatus
  • Confirm POST /v1/sync/acks moves the batch to client_acked
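The /health check can be scripted for a service that accepts unauthenticated requests, such as the public API service in a split deployment. A sketch, assuming the body is JSON with an ok field (the check says ok: true but not the exact serialization, so whitespace around the colon is tolerated); service_url and check_health are hypothetical helpers. The IAP-protected dashboard needs IAP credentials, so check it from a signed-in browser instead.

```shell
# Hypothetical helper: print a Cloud Run service's URL.
service_url() {
  gcloud run services describe "$1" --region="$REGION" --format='value(status.url)'
}

# Hypothetical helper: GET <url>/health and require "ok": true in the body.
check_health() {
  local url="$1" body
  # -f: fail on HTTP errors; -sS: no progress output, but still show errors.
  body="$(curl -fsS "${url%/}/health")" || return 1
  printf '%s' "$body" | grep -Eq '"ok"[[:space:]]*:[[:space:]]*true'
}
```

For example, check_health "$(service_url heatsafe-api)" for a split API service named heatsafe-api (a hypothetical name).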

11. Rollout sequence for future releases

  1. Build and push the new image.
  2. If the schema changed, update the migration job to the new image and execute it.
  3. Deploy the service with the new image.
  4. Verify /dashboard, registration, batch upload, and final ACK.
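The first three rollout steps can be wrapped in one function. A sketch under these assumptions: the new image reference (ideally a per-release tag rather than :latest) is the first argument, and SCHEMA_CHANGED=1 is a hypothetical flag for releases with migrations. The job is pointed at the new image with gcloud run jobs update before it runs, so the migration explicitly uses the release's image rather than whatever image reference the job was last deployed with.

```shell
# Hypothetical helper: build, migrate (if needed), and deploy one release.
rollout() {
  local image="$1"
  if [ -z "$image" ]; then
    echo "usage: rollout IMAGE" >&2
    return 2
  fi
  # 1. Build and push the new image (run from the repository root).
  gcloud builds submit ./server --tag "$image" || return 1
  # 2. Point the migration job at the new image, then wait for it to finish.
  if [ "${SCHEMA_CHANGED:-0}" = 1 ]; then
    gcloud run jobs update "$MIGRATION_JOB" --image="$image" --region="$REGION" || return 1
    gcloud run jobs execute "$MIGRATION_JOB" --region="$REGION" --wait || return 1
  fi
  # 3. Deploy the service; settings not passed here keep their current values.
  gcloud run deploy "$SERVICE_NAME" --image="$image" --region="$REGION" || return 1
}
```

Verification (step 4) stays manual. In a split deployment, deploy the public API service with the same image as well.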

12. Monitoring and alerts

Set these alerts in Cloud Monitoring before production use:

  • Cloud Run 5xx rate above baseline for the service
  • Cloud Run request latency spikes for /v1/sync/batches and /v1/sync/acks
  • Cloud SQL instance CPU, memory, and connection saturation
  • Log-based alert on repeated sync ingest failures or database connection failures
  • Daily check for ingest_batches.receipt_status = 'awaiting_client_ack' older than 24 hours
  • Daily check for rising receipt_status = 'expired' counts
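The two daily receipt checks translate to SQL against ingest_batches. A sketch that prints the queries; the schema is only partly known here, so the timestamp column (server_acked_at, a hypothetical name echoing the serverAckedAt response field) is configurable through RECEIPT_TS_COLUMN, also hypothetical:

```shell
# Hypothetical helpers: print SQL for the daily receipt checks.
# RECEIPT_TS_COLUMN names the batch timestamp column; server_acked_at is an
# assumed default, not a confirmed column name.
stale_receipts_sql() {
  local ts="${RECEIPT_TS_COLUMN:-server_acked_at}"
  printf "SELECT count(*) FROM ingest_batches WHERE receipt_status = 'awaiting_client_ack' AND %s < now() - interval '24 hours';\n" "$ts"
}

# Expired receipts per day over the last week, to spot a rising trend.
expired_receipts_sql() {
  local ts="${RECEIPT_TS_COLUMN:-server_acked_at}"
  printf "SELECT date_trunc('day', %s) AS day, count(*) FROM ingest_batches WHERE receipt_status = 'expired' AND %s >= now() - interval '7 days' GROUP BY day ORDER BY day;\n" "$ts" "$ts"
}
```

The queries can be run in a psql session opened with gcloud sql connect "$INSTANCE_NAME" --user="$DATABASE_USER" --database="$DATABASE_NAME", or through the Cloud SQL Auth Proxy from a scheduled job.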

Useful logging filters:

Service errors:

resource.type="cloud_run_revision"
resource.labels.service_name="heatsafe-server"
severity>=ERROR

Migration job errors:

resource.type="cloud_run_job"
resource.labels.job_name="heatsafe-server-migrate"
severity>=ERROR
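The same filters can be queried from the command line. A sketch that joins the three clauses with AND (the Logging query language also treats separate lines as AND) and reads the last day's errors with gcloud logging read; the helper names are hypothetical:

```shell
# Hypothetical helper: build an error filter for a Cloud Run service or job.
# Arguments: resource type, label key, resource name.
cloud_run_error_filter() {
  printf 'resource.type="%s" AND resource.labels.%s="%s" AND severity>=ERROR' "$1" "$2" "$3"
}

# Hypothetical helper: print up to 50 matching entries from the last day.
recent_errors() {
  gcloud logging read "$(cloud_run_error_filter "$1" "$2" "$3")" --limit=50 --freshness=1d
}
```

For example, recent_errors cloud_run_revision service_name "$SERVICE_NAME" for the service and recent_errors cloud_run_job job_name "$MIGRATION_JOB" for the migration job.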

13. Notes and limitations

  • The dashboard trust model assumes Google IAP injects the authenticated user email header.
  • Do not expose dashboard routes on a public unauthenticated Cloud Run service. If the mobile API must be public, deploy the API and dashboard as separate Cloud Run services or add a production route-surface guard so public traffic cannot reach the dashboard.
  • The server keeps unconfirmed batch receipts for seven days.
  • The client should only delete local outbox rows after POST /v1/sync/acks succeeds.
  • If the client retries the same batch before the final ACK, the server returns the stored receipt state instead of duplicating data.

14. Future production hardening

The current sync protection is device-token based:

  • POST /v1/sync/batches, POST /v1/sync/acks, and POST /v1/opt-in/disable require a valid bearer device token.
  • Device tokens are generated by the server and only token hashes are stored in PostgreSQL.
  • Upload payloads are schema-validated, request bodies are capped, and duplicate event IDs or batch retries are handled idempotently.

The current weak point is anonymous registration:

  • POST /v1/opt-in/register intentionally allows a new install to register without prior staff approval.
  • Anyone who discovers the endpoint could register a fake device, receive a valid token, and then submit schema-valid fake data.

Before a broader production launch, add one or more enrollment controls:

  • staff-issued participant or study enrollment codes
  • one-time registration invite tokens
  • mobile app integrity checks such as Play Integrity, App Attest, or Firebase App Check
  • rate limiting on registration and sync endpoints
  • abuse monitoring for unusual registration volume, upload volume, locations, or repeated invalid payloads

Dashboard access should be managed at the IAP layer:

  • Cloud Run IAP should authenticate and authorize staff before traffic reaches dashboard routes.
  • Manage dashboard access through Cloud Run IAP IAM, preferably by granting roles/iap.httpsResourceAccessor to a Google Group.
  • ALLOW_INSECURE_DEV_AUTH must stay disabled in production.
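A pre-release check can confirm the flag never reaches a deployed service. A conservative sketch that exports the service configuration with gcloud run services describe --format=export and fails if ALLOW_INSECURE_DEV_AUTH appears at all, whatever its value, so a person reviews it; assert_no_insecure_dev_auth is a hypothetical helper:

```shell
# Hypothetical helper: fail if a deployed service's exported configuration
# mentions ALLOW_INSECURE_DEV_AUTH. Any mention counts, even a "false" value.
assert_no_insecure_dev_auth() {
  local svc="$1" spec
  spec="$(gcloud run services describe "$svc" --region="$REGION" --format=export)" || return 1
  if printf '%s\n' "$spec" | grep -q 'ALLOW_INSECURE_DEV_AUTH'; then
    echo "$svc: ALLOW_INSECURE_DEV_AUTH is present in the service configuration" >&2
    return 1
  fi
}
```

Run it for the dashboard service and, in split deployments, the public API service before each release is announced.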