Troubleshooting

At blocktorch, we aim to be a reliable partner to our users throughout the software development lifecycle and to equip engineers with the right tooling and data insights for troubleshooting. The steps below outline a troubleshooting process and show where blocktorch's toolkit can help. Our mission is to help make web3 more reliable and secure for everyone involved.

1. Initial Assessment

  • Identify Symptoms: Determine if the issue is a downtime or a security breach.

    • Building the relevant monitors in blocktorch can be crucial for identifying symptoms as quickly as possible. Blocktorch ships some KPIs and visuals out of the box that can help identify issues, but you know your software better than we do, so building custom monitors is an important practice. To investigate further, click a data point in a chart to navigate directly to the related logs.

  • Scope of Impact: Assess the extent—how many services, users, or systems are affected.
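As a rough illustration of the scope assessment above, the sketch below tallies affected users and per-service failures from a list of log records. The `LogRecord` shape, the field names, and the sample data are assumptions made for this example; they are not a blocktorch export format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical shape of one log entry (not a blocktorch schema)."""
    service: str
    user: str
    failed: bool


def impact_summary(records: list[LogRecord]) -> dict:
    """Count distinct affected users and failures per service."""
    failures = [r for r in records if r.failed]
    return {
        "affected_users": len({r.user for r in failures}),
        "failures_by_service": dict(Counter(r.service for r in failures)),
    }


# Made-up sample data for illustration only.
records = [
    LogRecord("bridge", "0xA1", True),
    LogRecord("bridge", "0xB2", True),
    LogRecord("swap", "0xA1", True),
    LogRecord("swap", "0xC3", False),
]
summary = impact_summary(records)
print(summary)
```

Counting distinct users rather than failed requests keeps one retrying user from inflating the estimate.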

2. Communication

  • Notify Stakeholders: Inform the relevant team members and stakeholders about the issue.

  • External Communication: If necessary, prepare a communication plan for customers or the public.

    • Home dashboards and search queries can also be shared with external stakeholders, so your community of users can stay informed.

3. Isolation

  • Isolate Affected Systems: To prevent further damage, isolate the compromised or malfunctioning components.

  • Limit Access: Restrict access to sensitive systems until the nature and scope of the issue are understood.

    • You can disable functionality in your UI.

    • If your smart contracts support pausing functions, consider doing so.

4. Investigation

  • Review Logs: Use blocktorch's search to check application, security, and system logs for anomalies or indicators of the cause.

  • Identify Vulnerabilities: Look for any vulnerabilities or errors that might have led to the issue.

    • Your smart contract code can be accessed directly on blocktorch's contract details page.

    • Blocktorch's step debugger can help you find the exact line of code causing the vulnerability or error.

    • If your application uses oracles and you suspect the root cause lies there, you can check the out-of-the-box oracle details.
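For the log review step above, one simple way to spot a likely cause is to rank failure reasons inside the incident window. The following is a minimal sketch that assumes logs are available as dictionaries with `timestamp`, `status`, and `reason` keys; these field names, the `"reverted"` status value, and the sample entries are hypothetical and do not describe blocktorch's search output.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_failure_reasons(logs, start, window, limit=3):
    """Return the most common failure reasons within [start, start + window)."""
    end = start + window
    reasons = Counter(
        entry["reason"]
        for entry in logs
        if entry["status"] == "reverted" and start <= entry["timestamp"] < end
    )
    return reasons.most_common(limit)


# Made-up sample data for illustration only.
incident_start = datetime(2024, 1, 1, 12, 0)
logs = [
    {"timestamp": datetime(2024, 1, 1, 12, 1), "status": "reverted", "reason": "insufficient allowance"},
    {"timestamp": datetime(2024, 1, 1, 12, 2), "status": "reverted", "reason": "insufficient allowance"},
    {"timestamp": datetime(2024, 1, 1, 12, 3), "status": "success", "reason": ""},
    {"timestamp": datetime(2024, 1, 1, 12, 4), "status": "reverted", "reason": "price feed stale"},
    {"timestamp": datetime(2024, 1, 1, 13, 30), "status": "reverted", "reason": "out of gas"},
]
ranked = top_failure_reasons(logs, incident_start, timedelta(hours=1))
print(ranked)
```

A reason that dominates the ranking is a reasonable starting point for the step debugger, though a low-frequency reason can still be the real root cause.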

5. Mitigation

  • Patch and Update: Apply necessary patches or updates to software to mitigate the vulnerability or error.

    • We are aware that this can be especially hard when the root cause lies within a smart contract, unless your project uses an upgradable contract architecture.

  • Alert Your Users: If a security breach is confirmed, prompt users not to sign any malicious smart contract interactions.

6. Recovery

  • Restore Services: Gradually restore services, ensuring they are fully sanitized and secure.

7. Postmortem Analysis

  • Analyze Causes: Thoroughly document what happened, why it happened, and how it was resolved.

  • Review Processes: Evaluate and update security policies, response strategies, and monitoring techniques to prevent future incidents.

8. Ongoing Monitoring

  • Continuous Monitoring: Implement additional custom real-time monitors in blocktorch to detect future issues promptly.

  • Regular Audits: Schedule regular security audits to ensure ongoing compliance and security.
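To make the idea of a custom monitor concrete, here is a minimal sketch of a sliding-window failure-rate check. It only illustrates the kind of rule a monitor might encode; the `FailureRateMonitor` class, its parameters, and the thresholds are assumptions for this example and are not part of blocktorch.

```python
from collections import deque


class FailureRateMonitor:
    """Track the failure rate over the last `window` transactions."""

    def __init__(self, window: int, threshold: float):
        self.outcomes = deque(maxlen=window)  # True means the transaction failed
        self.threshold = threshold

    def record(self, failed: bool) -> bool:
        """Store one outcome and return True if the alert condition is met."""
        self.outcomes.append(failed)
        rate = sum(self.outcomes) / len(self.outcomes)
        # Only alert once the window is full, to avoid noise on the first samples.
        return len(self.outcomes) == self.outcomes.maxlen and rate > self.threshold


# Alert when more than half of the last 4 transactions failed.
monitor = FailureRateMonitor(window=4, threshold=0.5)
alerts = [monitor.record(f) for f in [False, True, False, True, True, True]]
print(alerts)
```

A windowed rate reacts to a sudden burst of failures while ignoring isolated ones; the window size and threshold would need tuning against your own baseline.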
