Written by immeas

Published on February 25, 2025

Web3 Operational Security: Design, Processes, Infrastructure

Enhance web3 OpSec with proven security strategies for protocols. Learn layered defenses, key handling, and secure processes to protect user and protocol funds.

With the ByBit attack still fresh in my memory, I want to help the web3 community strengthen its operational security (OpSec). This article isn’t about the ByBit hack specifically; it covers actionable steps you can take today to secure your environment and processes. If you're interested in the details of the ByBit hack and how to prevent similar incidents, Patrick Collins has two excellent videos: one breaking down the hack and another on how to verify what you sign. The latter is also available as a post on Cyfrin’s blog: How To Verify Safe Multi-Sig Wallet Signatures.

This guide is aimed at protocol developers, maintainers, and those responsible for managing millions or even billions in user and protocol funds. It’s not intended for end users—there are plenty of excellent guides for that. If you're building and maintaining a protocol that handles significant funds, you must balance convenience and security far more carefully than an average user, especially since you’re managing other people's or an organization's money.

Before auditing blockchain protocols, I spent over a decade as a developer in the card payments industry. I worked with PCI DSS requirements, participated in audits, provided evidence, and maintained compliant environments both on-prem and in the cloud. For those unfamiliar with PCI DSS, it stands for Payment Card Industry Data Security Standard. It is maintained by the PCI Security Standards Council and defines the security requirements for anyone processing Visa, MasterCard, and other card brands.

Although PCI DSS primarily focuses on securing payment environments and cardholder data, many of the processes it enforces are also great for operational security. Web3 can learn a lot from these practices.

In the context of blockchains, I highly recommend reading the SEAL 911 team's comprehensive best practices. It's extensive and covers all aspects of OpSec thoroughly. Read it, bookmark it, and revisit it regularly. It’s essentially the holy grail for web3 OpSec.

This article summarizes SEAL’s best practices and adds actionable insights, inspired by PCI DSS requirements related to documentation maintenance and scheduled reviews.

Defense in layers

The core principle of defense is depth. “Depth” can be defined in many ways. In security, we talk about a multi-layered approach. This includes the organizational structure, development practices, operational processes, off-chain environments, key handling, on-chain contracts, etc. Everything should be designed so that even if one or two layers are compromised, the system remains secure, and catastrophic loss is avoided.

Every layer must be hardened and resilient because one mistake shouldn’t spell disaster. We’re all human; we get socially engineered and click on the wrong link. A leaked private key or even hiring a North Korean hacker shouldn't put protocol or user funds at risk.

Any system can be breached: any server or piece of software can harbor a zero-day vulnerability, and any provider can suffer a data leak. You must plan and design for this eventuality. That’s what defense in layers means: don’t trust any layer implicitly.

We must build more robust, resilient platforms and enforce vigilant, trustless processes.

Design on-chain for security

A good on-chain design minimizes the attack surface and ensures that damage is contained if one part of the system is compromised.

Permissionless contracts: Make contracts as permissionless as possible. The fewer privileges a compromised account has, the less damage it can cause. While permissionlessness is a core principle of decentralized finance (DeFi), it’s also a defensive strategy. By limiting privileged actions, you minimize the risk that a single compromised key can wreak havoc on your system.
‍
Keeper/maintenance bots: These bots should operate without privileged access whenever possible. If escalated privileges are necessary, store keys securely using trusted services from major cloud providers. These keys should never be directly accessible to developers. Any non-application access should be logged and require authorization from security officers. This helps prevent accidental exposure or misuse of sensitive keys. Services like Chainlink Automation can assist with this.
‍
Time-lock privileged roles: Roles with elevated privileges should be assigned to time-locked contracts. This setup introduces a deliberate delay for sensitive actions, giving trusted guardians (whose security you’ve fully verified) the ability to cancel unauthorized changes before they are executed.
‍
Multi-sigs or DAOs for changes: No single person should be able to push through changes alone. Critical actions should require approval from multiple trusted parties, ensuring transparency and reducing the risk of internal abuse or external compromise.
‍
Guardian accounts: Guardian accounts must also be controlled by multi-sigs. This adds another layer of protection and prevents any one account from becoming a single point of failure.
‍
Pause and emergency functions: Simulate and test various failure scenarios. Make sure that emergency mechanisms are robust and allow you to quickly halt operations if serious issues arise.
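
To make the time-lock and guardian idea above concrete, here is a minimal off-chain sketch in Python. It is a toy model and not a contract implementation: the `Timelock` class, its method names, and the operation ids are all hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Timelock:
    """Toy off-chain model of a time-locked privileged role (hypothetical)."""

    delay_seconds: int
    # Maps an operation id to the earliest timestamp at which it may execute.
    queue: dict[str, int] = field(default_factory=dict)

    def schedule(self, op_id: str, now: int) -> int:
        """Queue a privileged operation and return when it becomes executable."""
        ready_at = now + self.delay_seconds
        self.queue[op_id] = ready_at
        return ready_at

    def cancel(self, op_id: str) -> None:
        """Guardian action: drop a queued operation before it executes."""
        self.queue.pop(op_id, None)

    def execute(self, op_id: str, now: int) -> bool:
        """Run the operation only if it is queued and its delay has elapsed."""
        ready_at = self.queue.get(op_id)
        if ready_at is None or now < ready_at:
            return False
        del self.queue[op_id]
        return True
```

The point of the delay is the window between `schedule` and `execute`: a guardian that notices an unexpected entry in the queue can call `cancel` before the change takes effect. In a real deployment both the scheduling role and the guardian role would themselves sit behind multi-sigs, as described above.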

Key handling and signing

SEAL 911’s recommendations and best practices go into great detail on key handling and transaction signing. The most important points to remember are:

Encrypt your keys: Use secure vaults and secret management systems to protect sensitive information.
‍
Verify calldata: As mentioned above, re-watch Patrick’s excellent guide on how to verify calldata (then re-read the blog post and bookmark both). Establish a verification procedure where signers must prove that they've verified the calldata (using a photo of the hash or similar verification data).
- Many organizations don’t have the technical expertise in-house to do this, which is a shame. If this sounds like your organization, you must have at least one person who can do this and make their signature and verification required for all transactions. Just be aware that you are creating a single point of failure! A better solution is to train every signer to verify calldata and require verification for all transactions.
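
As a small illustration of what verifying calldata can involve, the sketch below decodes the calldata of a plain ERC-20 `transfer` or `approve` call and compares it with what the signer intended to do. It is not the procedure from the guide mentioned above and it does not compute a Safe transaction hash; it only covers the inner call. The function names `decode_erc20_call` and `matches_intent` are hypothetical, and the selector table is limited to the two well-known ERC-20 selectors.

```python
# Well-known ERC-20 function selectors (first 4 bytes of the calldata).
KNOWN_SELECTORS = {
    "a9059cbb": "transfer(address,uint256)",
    "095ea7b3": "approve(address,uint256)",
}


def decode_erc20_call(calldata: str) -> dict:
    """Decode calldata for an (address, uint256) ERC-20 call."""
    data = calldata.lower().removeprefix("0x")
    # 4-byte selector plus two 32-byte words, all hex-encoded.
    if len(data) != 8 + 64 + 64:
        raise ValueError("unexpected calldata length")
    selector, addr_word, amount_word = data[:8], data[8:72], data[72:136]
    if selector not in KNOWN_SELECTORS:
        raise ValueError(f"unknown selector 0x{selector}")
    # An address occupies the low 20 bytes; the high 12 bytes must be zero.
    if addr_word[:24] != "0" * 24:
        raise ValueError("address word has non-zero padding")
    return {
        "function": KNOWN_SELECTORS[selector],
        "address": "0x" + addr_word[24:],
        "amount": int(amount_word, 16),
    }


def matches_intent(calldata: str, function: str, address: str, amount: int) -> bool:
    """Return True only if the calldata encodes exactly the intended call."""
    expected = {"function": function, "address": address.lower(), "amount": amount}
    return decode_erc20_call(calldata) == expected
```

A signer would obtain the expected function, address, and amount from an independent source (not from the same UI that produced the calldata) and refuse to sign if `matches_intent` returns False or the decoder raises.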
  ‍
Dedicated signing devices: Dedicate computers exclusively to signing transactions. This limits exposure to malware that could intercept sensitive actions.
‍
Diversify key management: Spread your keys across different setups and environments. Using different hardware wallets and operating systems reduces the chance that a single vulnerability will expose all your funds.

Off-chain security

Maintain least privilege access: Grant the minimum permissions necessary for users and applications to function.
‍
Follow cloud security best practices: Each major cloud provider offers resources, certifications, and courses focused on security. Understanding your provider’s best practices helps prevent misconfigurations that could lead to breaches.
‍
Lock down environments:
- Use Kubernetes network policies to enforce a default-deny posture.
- Implement strict firewall rules (iptables) for virtual machines to block unnecessary access.
  ‍
Secure microservice communication:
- Use the latest version of TLS to secure data in transit. Restrict connections to TLS 1.3, which only permits modern, strong cipher suites.
- Enforce strict authentication and authorization using standards like OAuth2.
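
For services written in Python, restricting a client to TLS 1.3 can be done with the standard `ssl` module. This is a minimal sketch that assumes the interpreter is linked against an OpenSSL build with TLS 1.3 support; the function name `strict_client_context` is hypothetical.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client-side context that refuses anything older than TLS 1.3."""
    # The default context already verifies certificates and hostnames.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx
```

The returned context can be passed to most HTTP or socket clients that accept an `ssl.SSLContext`. A connection to a peer that only offers older protocol versions then fails during the handshake instead of silently downgrading.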
  ‍
Regular updates:
- Apply security patches on a consistent schedule, whether monthly or quarterly.
- Follow relevant security channels to stay up to date with critical security patches and document the change process for applying them promptly.
  ‍
Screen network traffic: Use a web application firewall with a maintained rule set, for example the OWASP Core Rule Set, to protect your endpoints from known threats.

Development and change practices

Never store keys in plain text! Say it again. Write it down. Make it your mantra.
‍
Test thoughtfully: Don’t rely solely on code coverage. Focus on edge cases and failure scenarios. Incorporating elements of Test-Driven Development (TDD) can improve your design and security mindset.
‍
Mandatory code reviews: Every change should be reviewed by someone other than its author before it is merged.
‍
Use a CI/CD pipeline that guarantees that only code that is reviewed and scanned for vulnerabilities can be deployed to production.
‍
Deployment procedures: Document deployment processes and consider potential failure points. Clear contingency plans prevent disasters during deployments.
‍
Configuration review: All changes should go through a peer review to reduce the likelihood of human error.
‍
Emergency changes: Document and rehearse emergency response procedures so your team knows what to do under pressure.
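
One way to back the “never store keys in plain text” rule with the CI/CD pipeline described above is a scan step that fails the build when key-shaped strings show up in the code. The following is a minimal sketch under that assumption; `KEY_PATTERN` and `find_possible_keys` are hypothetical names, and a maintained secret scanner is likely a better fit for production use.

```python
import re

# 64 hex characters, optionally 0x-prefixed: the shape of a raw 32-byte key.
# Transaction hashes have the same shape, so expect false positives.
KEY_PATTERN = re.compile(
    r"(?<![0-9a-fA-F])(?:0x)?[0-9a-fA-F]{64}(?![0-9a-fA-F])"
)


def find_possible_keys(text: str) -> list[tuple[int, str]]:
    """Return (line number, redacted preview) for each key-shaped string."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in KEY_PATTERN.finditer(line):
            # Keep only the first six characters so the report itself
            # does not leak the value.
            findings.append((lineno, match.group(0)[:6] + "..."))
    return findings
```

A pipeline step could run this over every changed file and exit with a non-zero status when the result is non-empty, leaving a human to decide whether each hit is a real key or a harmless hash.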

Ongoing operations

Operational security is a shared responsibility, and accountability is essential.

Assign a security function: Designate a person or team as responsible for maintaining security processes and documentation.
‍
Responsibilities include:
- Regularly reviewing logs from critical systems.
- Participating in development and design processes to ensure security.
- Overseeing configuration changes.
- Vetting third-party dependencies for security vulnerabilities.
- Ensuring security-related documentation is up-to-date.

Review process

Regular reviews are crucial to maintaining security over time. This is how you ensure your security practices remain aligned with the latest standards. It's equally important to regularly evaluate and challenge your processes to identify potential weaknesses and ensure your systems remain resilient against emerging threats. Here’s a schedule inspired by PCI DSS best practices (requires registration):

Daily

Monitor logs from security-critical systems such as firewalls, intrusion detection systems (IDS), and authentication servers, as well as on-chain events and critical transactions.
Disable terminated user accounts.

Quarterly

Review key custody and custodians and confirm that keys haven’t been tampered with.
Review all user access levels to ensure that no one has unnecessary privileges.
Replace expiring certificates proactively.
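
The access-level review in the quarterly list can be partly automated by comparing what each user has been granted with what their role requires. This is a rough sketch that assumes both sets can be exported as plain mappings; the function name `excess_privileges` and the permission names in the usage example are hypothetical.

```python
def excess_privileges(
    granted: dict[str, set[str]],
    required: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Report permissions each user holds beyond what their role requires."""
    report = {}
    for user, perms in granted.items():
        # Users missing from `required` (for example, after offboarding)
        # have every granted permission flagged.
        extra = perms - required.get(user, set())
        if extra:
            report[user] = extra
    return report
```

For example, `excess_privileges({"alice": {"deploy", "read"}}, {"alice": {"read"}})` flags `deploy` for `alice`. The output is a starting point for the review, not a replacement for it: someone still has to decide whether each flagged permission should be revoked or the role definition updated.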

Biannually (every six months)

Reassess all privileged account permissions. In this context, privileged accounts refer specifically to bots or services.
Audit all security configurations across your environment, including cloud security settings and repository and CI configurations like GitHub. Ensure they are up-to-date with the latest best practices.
Review and update all security-related documentation, including change management, deployment procedures, development workflows, and onboarding/offboarding processes. Ensure these documents remain clear, comprehensive, and aligned with current security best practices to address evolving threats.

Annually

Conduct full-scale incident response exercises to test organizational readiness.
Review and update DevOps and developer training programs. Ensure that everyone with access to critical environments has the necessary knowledge and up-to-date understanding of security protocols, tools, and best practices relevant to their roles.

How you can help

The SEAL 911 best practices are a living document, and your contributions can help improve them. Join the effort and help strengthen web3 OpSec.