Zero-Downtime Storage Migration from S3 to GCS
We moved our services to GCP but left file storage on S3. Costs climbed. The proposed fix required updating millions of database records, so we avoided it entirely.
The Problem
We migrated our frontend and backend services from AWS to GCP. The migration itself went smoothly, but shortly after we noticed that infrastructure costs had increased. The cause was cross-cloud egress: our file storage was still on AWS S3, and every read request from our GCP-hosted services was crossing cloud boundaries and incurring transfer fees.
The platform served around six million users. Files were referenced across multiple MongoDB collections as absolute S3 URLs (strings like https://s3.amazonaws.com/our-bucket/path/to/file). The DevOps team proposed the straightforward approach: copy all files from S3 to GCS manually, update every database record to replace the old URLs, verify the results, and decommission S3. Given the volume of records involved, that plan carried real risk. A partial update could leave the application in an inconsistent state, and the scope of the database migration made rollback difficult.
Investigation
Before committing to the manual approach, I looked into what GCP offered natively for this kind of transfer. GCP has a Storage Transfer Service that can pull data directly from an S3 bucket into a GCS bucket with no intermediary machines or custom scripts. More importantly, it supports incremental sync, meaning it can keep GCS up to date with any new files written to S3 during the migration window. That eliminated the riskiest part of the plan: the gap between when you start copying and when you finish.
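As a rough sketch, a job like this can be created from the CLI. The bucket names and the AWS credentials file below are placeholders, not our actual configuration:

```shell
# Placeholder bucket names and credentials file. Creates a Storage
# Transfer Service job that pulls from S3 into GCS and repeats hourly,
# so new S3 writes keep flowing into GCS during the migration window.
gcloud transfer jobs create \
  s3://our-bucket gs://our-bucket \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=1h
```

The service runs the copy entirely within Google's infrastructure, which is what makes the "no intermediary machines" property possible.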
The second problem, updating millions of database records, was still open. I wanted to avoid that entirely. Our backend was a Next.js API that read file URLs from MongoDB and returned them in responses. That meant there was exactly one place in the system where S3 URLs became visible to the frontend: the API response layer. If I could intercept URLs there, I could translate them to GCS equivalents without touching the database at all.
Root Cause
The cost increase was a direct consequence of cross-cloud network traffic. But the reason the fix felt so risky was a data modelling decision made long before the migration: file references were stored as absolute provider-specific URLs rather than as relative paths or storage-agnostic keys. That tight coupling meant changing storage providers appeared to require updating every record, when in practice the data itself was fine and only the interpretation of it needed to change.
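To make the coupling concrete, here is a hypothetical sketch (field names and the helper are illustrative) contrasting what we had with the storage-agnostic alternative:

```typescript
// What we had: the storage provider baked into every record.
const legacyRecord = {
  avatar: "https://s3.amazonaws.com/our-bucket/path/to/file",
};

// The decoupled alternative: store only the object key...
const decoupledRecord = { avatar: "path/to/file" };

// ...and resolve the provider at read time from one configurable base,
// so switching providers is a one-line config change, not a data migration.
const STORAGE_BASE = "https://storage.googleapis.com/our-bucket/";

function resolveUrl(key: string): string {
  return STORAGE_BASE + key;
}
```

With the second shape, changing providers never touches the data; only `STORAGE_BASE` changes.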
The Fix
We ran the migration in four stages, with the application live throughout.
Stage 1: Bulk transfer. We configured GCP's Storage Transfer Service to copy all existing files from S3 to GCS. The bucket structure was preserved exactly, so every file was reachable at an equivalent GCS path.
Stage 2: Live sync. Before making any application changes, we enabled continuous sync on the transfer job. New files written to S3 by the running application were automatically mirrored to GCS within minutes. This gave us a verified, consistent copy before changing anything in the application code.
Stage 3: URL transformation layer. We added a small utility at the API layer that detected S3 URLs in outgoing responses and rewrote them to their GCS equivalents. The transformation was deterministic: the bucket path was the same in both providers, so converting https://s3.amazonaws.com/our-bucket/path/to/file to https://storage.googleapis.com/our-bucket/path/to/file was a straightforward string replacement. We deployed this across environments one at a time (dev, staging, then production) and confirmed that file URLs in responses were resolving correctly before moving on.
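A minimal sketch of this kind of utility. The function names and the recursive walk are illustrative rather than our exact code; it assumes, as described above, that the bucket path is identical in both providers:

```typescript
const S3_PREFIX = "https://s3.amazonaws.com/";
const GCS_PREFIX = "https://storage.googleapis.com/";

// Deterministic rewrite: bucket and object path are preserved,
// so converting an S3 URL to its GCS equivalent is a prefix swap.
function toGcsUrl(url: string): string {
  return url.startsWith(S3_PREFIX)
    ? GCS_PREFIX + url.slice(S3_PREFIX.length)
    : url;
}

// Walks a JSON-serializable response body and rewrites any string
// that is an S3 URL, leaving every other value untouched.
function rewriteUrls<T>(value: T): T {
  if (typeof value === "string") return toGcsUrl(value) as unknown as T;
  if (Array.isArray(value)) return value.map((v) => rewriteUrls(v)) as unknown as T;
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value)) out[k] = rewriteUrls(v);
    return out as T;
  }
  return value;
}
```

Applied at the API boundary, `rewriteUrls(responseBody)` returns the same structure with every S3 URL pointing at GCS, which is why no database record ever needed to change.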
Stage 4: Switch writes and decommission. Once the read path was confirmed working via GCS, we updated all write paths to store new files directly in GCS. New uploads went straight to GCS; old files continued to be served through the transformation layer. After a validation period with no active traffic reaching S3, we disabled the sync job and decommissioned the bucket.
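The "no active traffic reaching S3" check can be approximated by scanning API output for residual S3 references. A hypothetical sketch (in practice, S3 access logs are the authoritative signal):

```typescript
// Counts remaining s3.amazonaws.com references in a batch of API
// response bodies. Illustrative validation helper, not production code.
function countS3Refs(bodies: unknown[]): number {
  const pattern = /https:\/\/s3\.amazonaws\.com\//g;
  let count = 0;
  for (const body of bodies) {
    count += (JSON.stringify(body).match(pattern) ?? []).length;
  }
  return count;
}
```

A sustained zero from a check like this, combined with quiet access logs, is what justified disabling the sync job and decommissioning the bucket.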
The database was never touched. The migration went to production with zero downtime.
Lessons Learned
- Check what the platform provides before building custom tooling. GCP's Storage Transfer Service handled the bulk copy and live sync natively. The initial plan to write scripts and copy files manually would have been slower and more error-prone.
- A thin transformation layer can eliminate a whole class of migration risk. The database stored absolute URLs, which looked like it required a mass update to fix. Adding translation at the API boundary bypassed that entirely. The data did not change; only the layer that interpreted it did.
- Live sync before cutover removes the most dangerous variable. Because GCS was continuously updated during the migration window, there was no race condition between finishing the copy and switching the application over. We cut over to a known-good, current dataset.
Have thoughts or questions about this story? Get in touch.