Why Do Tests Pass on Hardhat/Anvil Forks but Break on Mainnet? What Hidden Differences Are We Missing?

Tushar Dubey

@DataChainTushar
Published: Nov 21, 2025
Updated: Jun 16, 2026
Views: 528

I’ve hit this pain point so many times that I genuinely stopped trusting “all tests passing” unless I run them against a real mainnet RPC.

Here’s the pattern: everything works fine on Hardhat/Anvil forks — clean state, predictable gas, zero RPC jitter, instant block confirmations. But the moment the contract hits mainnet, random behaviours start appearing: timestamp drift breaking vesting logic, oracle updates arriving out of sync, state bloat pushing gas beyond estimates, or signatures behaving differently because the chain ID isn’t mocked properly.
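To make the timestamp-drift point concrete, here is a minimal Python sketch (not Solidity) of a linear vesting schedule that mirrors Solidity's truncating integer division. The cliff, duration, grant size, and 12-second slot are hypothetical values chosen for illustration. A fork test that sets time to an exact second (for example with Foundry's `vm.warp` or Hardhat's `evm_setNextBlockTimestamp`) can assert an exact amount, while a real claim lands in a later block and receives a slightly different number.

```python
# Hypothetical schedule parameters, for illustration only.
CLIFF = 1_700_000_000            # cliff timestamp (unix seconds)
DURATION = 365 * 24 * 3600       # one-year linear vesting
TOTAL = 1_000_000 * 10**18       # grant in wei-style base units
SLOT_SECONDS = 12                # assumed block-time drift of one slot

def vested(now: int) -> int:
    """Linear vesting after the cliff, truncating like Solidity's uint division."""
    if now < CLIFF:
        return 0
    elapsed = min(now - CLIFF, DURATION)
    return TOTAL * elapsed // DURATION

half_time = CLIFF + DURATION // 2

def within_one_slot(expected_time: int, actual_amount: int, slot: int = SLOT_SECONDS) -> bool:
    """Tolerant check: accept any amount reachable within one slot after expected_time."""
    return vested(expected_time) <= actual_amount <= vested(expected_time + slot)

# Fork-style exact assertion: passes only because time was set to the exact second.
exact_on_fork = vested(half_time) == TOTAL // 2
# Mainnet-style: the claim is mined one slot later, so the exact equality breaks.
exact_on_mainnet = vested(half_time + SLOT_SECONDS) == TOTAL // 2
```

Asserting a range bounded by the expected block-time drift keeps the test meaningful without depending on a timestamp that mainnet will not reproduce.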

My suspicion is that local forks hide too much real-chain noise. They’re great for dev speed, but they don’t reflect the messy, evolving, unpredictable nature of actual networks.

I want to hear from people who’ve debugged real production issues:
Which subtle fork-vs-mainnet differences burned you the most?
Was it mempool congestion? Different gas spikes? Reentrancy paths behaving differently? State residue? Block builder differences? RPC provider inconsistencies?

Also — how do you mitigate this?
Do you run linear tests without snapshots? Replay entire transaction batches? Mock network variables? Fork from the latest block? Use multiple RPCs?

Replies

  • Priya Gupta

    @CryptoSagePriya Nov 2, 2025

    This is one of those problems you only respect after getting burned in production.
    For us, the biggest culprit was mempool behaviour, not the contract code itself. Local forks have no mempool congestion, no MEV bots, and no delay between tx broadcast → inclusion. On mainnet, even a 3–5 second delay changed the ordering of two dependent transactions and caused a settlement mismatch that never appeared locally.
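    A minimal way to surface this kind of bug before mainnet is to run every ordering of the dependent transactions and check whether the outcome changes. The Python sketch below uses a hypothetical ledger with a rate-update transaction and a settlement transaction; the names and numbers are assumptions, not anyone's production code. Locally the two transactions are usually mined in the order you send them, but on mainnet the block builder decides the order.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import Callable, Dict, List, Tuple

@dataclass
class Ledger:
    rate: int = 100                                    # hypothetical starting rate
    settled: List[int] = field(default_factory=list)   # settlement amounts, in order

def update_rate(ledger: Ledger) -> None:
    """Hypothetical keeper transaction that moves the rate."""
    ledger.rate = 105

def settle(ledger: Ledger) -> None:
    """Hypothetical settlement of 10 units at whatever rate is current."""
    ledger.settled.append(10 * ledger.rate)

Tx = Callable[[Ledger], None]

def run(order: List[Tx]) -> List[int]:
    """Apply transactions to a fresh ledger in the given inclusion order."""
    ledger = Ledger()
    for tx in order:
        tx(ledger)
    return ledger.settled

def outcomes_by_order(txs: List[Tx]) -> Dict[Tuple[str, ...], List[int]]:
    """Map each possible inclusion order to the settlements it produces."""
    return {tuple(tx.__name__ for tx in order): run(list(order)) for order in permutations(txs)}

def is_order_sensitive(txs: List[Tx]) -> bool:
    """True if at least two inclusion orders give different settlements."""
    return len({tuple(result) for result in outcomes_by_order(txs).values()}) > 1
```

Enumerating permutations only scales to a handful of transactions, but that is usually enough for the dependent pairs that cause settlement mismatches.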

    Another huge difference is real oracle cadence. Chainlink price feeds don’t update at the neat intervals your local mock assumes. During volatile markets, updates cluster — and if your logic depends on “freshness checks,” local environments never reproduce that burst pattern.
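    One way to stress freshness logic is to replay a recorded or hand-written list of update timestamps instead of a fixed-interval mock. Chainlink feeds update when the price deviates past a threshold or when the heartbeat expires, so real cadence tends to be bursty. In this Python sketch, the 30-minute `MAX_AGE`, the update timestamps, and the query grid are all hypothetical. The neat 10-minute mock never trips the staleness check, while the bursty trace does during the quiet stretches after each burst.

```python
from bisect import bisect_right
from typing import List, Optional

MAX_AGE = 1800  # hypothetical staleness threshold in seconds (30 minutes)

def last_update_at_or_before(updates: List[int], now: int) -> Optional[int]:
    """Most recent update timestamp <= now (updates must be sorted), or None."""
    i = bisect_right(updates, now)
    return updates[i - 1] if i else None

def stale_times(updates: List[int], queries: List[int], max_age: int) -> List[int]:
    """Query times at which a `now - updatedAt <= max_age` check would revert."""
    stale = []
    for now in queries:
        last = last_update_at_or_before(updates, now)
        if last is None or now - last > max_age:
            stale.append(now)
    return stale

# Neat local mock: one update every 10 minutes for two hours.
mock_updates = list(range(0, 7200, 600))
# Hypothetical bursty trace: clustered deviation updates, then quiet until a heartbeat.
real_updates = [0, 40, 75, 130, 3600, 3620, 3700]
# Check freshness every 5 minutes.
queries = list(range(0, 7200, 300))
```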

    Finally: gas estimation lies on forks. Mainnet gas price surges, refunds, and hot storage slots change everything. A function that cost 110k gas locally suddenly cost 135k on mainnet because surrounding storage slots weren’t empty anymore.
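    The 110k vs 135k numbers above translate directly into a buffer question. This Python sketch (hypothetical helper names, integer math with ceiling rounding) pads a local estimate by a margin in basis points and computes the smallest margin that would have covered the observed mainnet cost. It does not call any client library's estimation API; it only shows the arithmetic.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling division for non-negative integers."""
    return -(-a // b)

def padded_gas_limit(estimate: int, buffer_bps: int) -> int:
    """Pad a gas estimate by buffer_bps basis points (1 bps = 0.01%), rounding up."""
    return ceil_div(estimate * (10_000 + buffer_bps), 10_000)

def min_buffer_bps(local_estimate: int, observed: int) -> int:
    """Smallest buffer in bps that would have covered the observed gas use."""
    if observed <= local_estimate:
        return 0
    return ceil_div((observed - local_estimate) * 10_000, local_estimate)
```

With these numbers a 20% buffer (2,000 bps) yields 132,000 and would still have run out of gas, while 25% (2,500 bps) yields 137,500. A margin derived from one observation is only a starting point; re-measuring against real RPC runs keeps it honest.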

    The only mitigation that worked for us:
    run a nightly test suite on real mainnet RPCs with no snapshots and sequential state growth.
    It’s painfully slow — but brutally honest.

    DeFiArchitect

    @DeFiArchitect Jun 16, 2026

    This point about not relying only on clean fork snapshots is important.

    In interviews, too, I feel “my tests pass on Hardhat/Anvil” is becoming a weak answer unless the candidate can explain what may still break on mainnet. RPC differences, oracle freshness, mempool ordering, gas estimation, storage growth, old user balances, chain ID assumptions, and signature behaviour can all change the result.

    So maybe the better interview question is not only “did you write tests?” but “can you explain why Hardhat and Anvil fork tests pass locally but fail on Ethereum mainnet, and how you would verify the issue before calling it production-ready?”

    Curious how others handle this in their own testing flow — do you test against multiple RPC providers, dirty fork state, or only one controlled mainnet fork?

  • ChainMentorNaina

    @ChainMentorNaina Nov 3, 2025

    For me the “aha moment” came from storage residue. I always assumed forking “latest block” meant I was testing real state, but in reality Hardhat drops a lot of low-level traces and storage warm/cold slot patterns differ. When I replayed our staking flows on a full archive RPC, gas spiked by 20–30% purely because the storage tree was already bloated from years of writes.
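    Part of why the same call can cost different amounts is per-slot storage pricing from EIP-2929 and EIP-2200: the first touch of a slot in a transaction is cold, and a write that takes a slot from zero to nonzero costs far more than one that changes an existing nonzero value. The Python sketch below is a simplified model of those SLOAD/SSTORE charges with refunds ignored; it is meant for reasoning about why state shape changes gas, not as an exact gas calculator. The slot names are hypothetical.

```python
from typing import Hashable, Set

COLD_SLOAD_COST = 2_100        # EIP-2929: first access to a slot in a transaction
WARM_STORAGE_READ_COST = 100   # EIP-2929: later accesses to the same slot
SSTORE_SET_GAS = 20_000        # zero -> nonzero write
SSTORE_RESET_GAS = 5_000 - COLD_SLOAD_COST  # nonzero -> different value (2,900)

def sload_cost(slot: Hashable, accessed: Set[Hashable]) -> int:
    """Charge for reading a slot; `accessed` is the transaction's warm set."""
    if slot in accessed:
        return WARM_STORAGE_READ_COST
    accessed.add(slot)
    return COLD_SLOAD_COST

def sstore_cost(slot: Hashable, original: int, current: int, new: int,
                accessed: Set[Hashable]) -> int:
    """Simplified SSTORE charge, refunds ignored.

    original: value at the start of the transaction
    current:  value right before this write
    new:      value being written
    """
    cold_surcharge = 0
    if slot not in accessed:
        accessed.add(slot)
        cold_surcharge = COLD_SLOAD_COST
    if new == current:                      # no-op write
        return cold_surcharge + WARM_STORAGE_READ_COST
    if original == current:                 # first change to this slot in the tx
        base = SSTORE_SET_GAS if original == 0 else SSTORE_RESET_GAS
        return cold_surcharge + base
    return cold_surcharge + WARM_STORAGE_READ_COST  # slot already dirty in this tx
```

Comparing a cold zero-to-nonzero write (22,100) with a cold nonzero update (5,000) shows how much the pre-existing value and access pattern of a slot, not just the code path, drive the final number.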

    Another trap: some RPCs compress traces or throttle logs. My tests passed on Alchemy but failed on Infura because of slight differences in how they returned historical calls. After that, I started running every critical test across multiple RPC providers.
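    A lightweight guard against provider-specific behaviour is to issue the same read against several providers, pinned to one block number rather than `latest`, and group the answers. In this Python sketch each provider is a zero-argument callable; in a real suite it would wrap something like an `eth_call` at a fixed block. The provider names and stub results are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Key = Tuple[str, str]

def group_by_result(providers: Dict[str, Callable[[], Any]]) -> Dict[Key, List[str]]:
    """Run the same read against each provider and group provider names by answer.

    Each callable is assumed to perform one pinned-block read (hypothetical wrapper).
    Errors are grouped by exception type so throttling or pruned history shows up
    as its own bucket instead of crashing the comparison.
    """
    groups: Dict[Key, List[str]] = {}
    for name, call in providers.items():
        try:
            key: Key = ("ok", repr(call()))
        except Exception as exc:
            key = ("error", type(exc).__name__)
        groups.setdefault(key, []).append(name)
    return groups

def providers_agree(groups: Dict[Key, List[str]]) -> bool:
    """True only if every provider returned the same successful answer."""
    return len(groups) == 1 and next(iter(groups))[0] == "ok"
```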

  • Abdil Hamid

    @ForensicBlockSmith Nov 3, 2025

    My personal trap was evm_snapshot addiction.
    Locally everything felt “clean” — reset state, run again, perfect outputs. On mainnet nothing resets, and the moment I switched to linear tests (no snapshot, no fresh fork), random state bloat started breaking assumptions. Even integer rounding behaved differently because accumulated dust values were now real.

    I now force myself to run a “dirty chain simulation” once a week — 300–400 transactions in a row without resets. That was the only way I caught subtle storage-packing problems.
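    The dust effect is easy to reproduce outside a chain. This Python sketch uses a hypothetical reward pool that splits each distribution per share with integer division, which truncates like Solidity's `/` on unsigned integers. From a fresh snapshot, one evenly divisible distribution leaves no remainder, so an exact `balance == claimable` assertion passes; after 400 sequential distributions the truncated remainders add up, and only a bounded-dust invariant still holds. The share count and amounts are assumptions.

```python
from typing import Iterable

class RewardPool:
    """Hypothetical pool that splits each reward per share with integer division."""

    def __init__(self, total_shares: int) -> None:
        self.total_shares = total_shares
        self.reward_per_share = 0
        self.balance = 0

    def distribute(self, amount: int) -> None:
        self.balance += amount
        # Truncating division, like Solidity's `/` on uint; the remainder stays as dust.
        self.reward_per_share += amount // self.total_shares

    def claimable_total(self) -> int:
        return self.reward_per_share * self.total_shares

def dust_after(amounts: Iterable[int], total_shares: int = 7) -> int:
    """Run distributions back to back (no snapshot resets) and return leftover dust."""
    pool = RewardPool(total_shares)
    for amount in amounts:
        pool.distribute(amount)
    return pool.balance - pool.claimable_total()

# Snapshot-style test: one clean, evenly divisible distribution.
snapshot_dust = dust_after([700])
# Linear "dirty chain" run: 400 sequential distributions with uneven amounts.
linear_amounts = [1_000 + i for i in range(400)]
linear_dust = dust_after(linear_amounts)
```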

  • BlockchainMentorYagiz

    @BlockchainMentor Nov 4, 2025

    One thing nobody warned me about: chain-ID mismatches.
    Locally everything signs smoothly because the chain ID is whatever your config says. On mainnet, an EIP-155 mismatch broke every signature in our relayer flow. Now I randomize chain IDs and run the suite with multiple env files to make sure nothing is hardcoded. It’s a small check but saves massive pain.
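    For reference, EIP-155 folds the chain ID into a legacy transaction signature's `v` value as `v = recovery_id + chain_id * 2 + 35`, so a signature produced for a local chain ID does not validate on mainnet (chain ID 1). EIP-712 domain separators embed `chainId` as well, which matters for relayer flows. This Python sketch only does the `v` arithmetic and a chain check; 31337 is used because it is the common Hardhat/Anvil default, and the helper names are hypothetical.

```python
def eip155_v(chain_id: int, recovery_id: int) -> int:
    """EIP-155 `v` for a legacy transaction signature."""
    if recovery_id not in (0, 1):
        raise ValueError("recovery_id must be 0 or 1")
    return recovery_id + chain_id * 2 + 35

def chain_id_from_v(v: int) -> int:
    """Recover the chain ID encoded in an EIP-155 `v`."""
    if v < 35:
        raise ValueError("v of 27/28 is pre-EIP-155 and carries no chain ID")
    return (v - 35) // 2

def signed_for_chain(v: int, expected_chain_id: int) -> bool:
    """Hypothetical guard: reject signatures made for a different chain."""
    return chain_id_from_v(v) == expected_chain_id

MAINNET, LOCAL_DEFAULT = 1, 31337
```

Looping the round-trip check over several chain IDs is a cheap way to mirror the "randomize chain IDs" habit and catch a hardcoded value.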

  • Shubhada Pande

    @ShubhadaJP Nov 21, 2025

    This is exactly why “all tests pass locally” is not enough proof in smart contract interviews.

    A stronger interview answer is not just: “I tested it on Hardhat or Anvil.” The stronger answer is: “I checked what can change between a local fork and mainnet — RPC behaviour, oracle freshness, mempool ordering, gas estimation, storage residue, chain ID, signature assumptions, block timing, and whether the test still holds when state grows without snapshots.”

    That is also what many Web3 hiring teams quietly look for when they ask debugging questions. They are not only checking whether a junior Solidity developer or blockchain QA engineer knows Hardhat, Foundry, Anvil, or fork testing. They are checking whether the candidate can explain why Hardhat and Anvil fork tests pass locally but fail on Ethereum mainnet, and how they would prove mainnet-readiness before calling the contract production-safe.

    For anyone preparing for smart contract debugging rounds, this thread connects well with the Smart Contract Interview Prep Hub here:
    Smart Contract Interview Prep: Technical, Security, Debugging & Founder Rounds Explained | ArtofBlockchain

    Also related:
    Flaky smart contract tests and blockchain QA automation across networks:
    As a Blockchain QA Engineer, How Do You Deal With Flaky Smart-Contract Tests That Fail Only on CI? | ArtofBlockchain


    If your resume or portfolio says “Solidity testing,” “Hardhat,” “Foundry,” “Anvil,” “fork testing,” or “blockchain QA automation,” this is the kind of debugging explanation that should be visible somewhere — not as a long theory note, but as one clear example of how you handled local-fork-versus-mainnet failure.