The SHA-1 algorithm (Secure Hash Algorithm 1) takes an input of any length and produces a fixed 160-bit (20-byte) hash digest — a deterministic fingerprint of the data. Published by the U.S. National Institute of Standards and Technology (NIST) in 1995 under FIPS 180-1, SHA-1 was the backbone of TLS certificates, PGP signatures, and Git's object model for over two decades. Its collision resistance was conclusively broken in 2017 when Google and CWI Amsterdam produced two distinct PDF files with an identical SHA-1 hash for roughly $110,000 in cloud compute. Understanding how the SHA-1 algorithm works internally — its padding scheme, message schedule, compression function, and four-group round structure — is foundational knowledge for any security practitioner evaluating cryptographic risk in legacy systems.
What Is SHA-1 and How the SHA-1 Algorithm Fits in Cryptography
A cryptographic hash function is a one-way mathematical function: it maps an input of arbitrary length to a fixed-size output, and it is computationally infeasible to reverse that mapping. SHA-1 produces a 160-bit output, always rendered as a 40-character hexadecimal string.
For any secure hash function to be useful, it must satisfy three properties:
- Preimage resistance: Given a hash value H, it must be computationally infeasible to find any message M where hash(M) = H.
- Second preimage resistance: Given a known message M1, it must be infeasible to find a different message M2 where hash(M1) = hash(M2).
- Collision resistance: It must be infeasible to find any two distinct messages M1 and M2 where hash(M1) = hash(M2).
SHA-1 satisfies preimage and second preimage resistance in practice, but its collision resistance is broken — which is why it can no longer be trusted for digital signatures, certificate validation, or any context where an attacker might forge two documents with the same hash.
SHA-1 is built on the Merkle-Damgård construction (named after Ralph Merkle and Ivan Damgård, who independently described it in 1989). In this design, the input message is split into fixed-size blocks and fed sequentially through a compression function. Each block's output becomes the "chaining value" fed into the next block. The final chaining value is the hash digest. SHA-1 inherits this structure from its predecessors MD4 and MD5 — which is both the source of its original strength and the root of its eventual weakness.
         Input message of any length
                      │
                      ▼
┌────────────────────────────────────────────┐
│ Padding: make length ≡ 448 mod 512         │
│ + append 64-bit original length            │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│ Parse into 512-bit blocks                  │
│ M[0], M[1], ..., M[N-1]                    │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│ Initialise H0–H4 with fixed IV constants   │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│ For each block M[i]:                       │
│   1. Expand 16 words → 80 words (W[0..79]) │
│   2. 80-round compression on a,b,c,d,e     │
│   3. Add round output to H0–H4             │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│ Digest = H0‖H1‖H2‖H3‖H4                    │
│ (160-bit / 40 hex chars)                   │
└────────────────────────────────────────────┘
Step 1: Padding the Message
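Before any block is processed, SHA-1 pads the input so its total bit length is congruent to 448 modulo 512. This leaves the final 64 bits of the last block available to encode the original message length, a technique called Merkle-Damgård strengthening. Encoding the length is what allows the collision resistance of the whole hash to be reduced to that of the compression function. It does not prevent length-extension attacks: like MD5 and SHA-256, plain SHA-1 remains vulnerable to them.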
The padding procedure:
- Append a single 1 bit to the end of the message (represented as byte 0x80 if the message is byte-aligned).
- Append 0 bits until the message length in bits ≡ 448 (mod 512).
- Append the original message length as a 64-bit big-endian unsigned integer.
Concrete example — padding the string "abc" (3 bytes = 24 bits):
Original: 61 62 63
Step 1: 61 62 63 80
Step 2: 61 62 63 80 00 00 00 00 ... 00 00 (total 56 bytes = 448 bits)
Step 3: 61 62 63 80 00 ... 00 00 00 00 00 00 00 18
└─ 64-bit big-endian length: 24 decimal = 0x18 ─┘
Final block: 512 bits exactly
If the message length modulo 512 is 448 bits or more (56 bytes or more into a 64-byte block), the 1 bit and the 64-bit length field no longer fit in the current block, so the padding spills into an additional 512-bit block.
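The padding rule can be sketched in a few lines of Python. This is a minimal illustration that assumes byte-aligned input; the function name sha1_pad is ours, not part of any library.

```python
def sha1_pad(message: bytes) -> bytes:
    """Return the message followed by SHA-1 padding (byte-aligned input only)."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                          # a 1 bit followed by seven 0 bits
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # zero-fill to 56 bytes mod 64
    padded += bit_length.to_bytes(8, "big")             # 64-bit big-endian bit length
    return padded


padded = sha1_pad(b"abc")
print(len(padded), padded[:4].hex(), padded[-8:].hex())
# 64 61626380 0000000000000018

# A 55-byte message still fits in one block; a 56-byte message needs a second one.
print(len(sha1_pad(b"a" * 55)), len(sha1_pad(b"a" * 56)))
# 64 128
```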
Step 2: Parsing Into 512-bit Blocks
The padded message is split into N blocks of exactly 512 bits each. Each block M[i] is further divided into sixteen 32-bit words: W[0] through W[15]. These are treated as big-endian unsigned integers.
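As a small sketch of this parsing step, Python's struct module can unpack one 64-byte block into sixteen big-endian 32-bit words. The helper name block_words is illustrative, and the block is the padded "abc" example built by hand.

```python
import struct


def block_words(block: bytes) -> list[int]:
    """Split one 64-byte block into sixteen big-endian 32-bit words."""
    if len(block) != 64:
        raise ValueError("SHA-1 blocks are exactly 64 bytes")
    return list(struct.unpack(">16I", block))


# The single padded block for "abc": message, 0x80, zero fill, 64-bit length.
block = b"abc" + b"\x80" + b"\x00" * 52 + (24).to_bytes(8, "big")
words = block_words(block)
print(len(words), hex(words[0]), hex(words[15]))
# 16 0x61626380 0x18
```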
Step 3: Initialising the Five Hash State Variables
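Before the first block is processed, five 32-bit state variables (the Initialisation Vector, or IV) are set to fixed constants. These values are not random: they are simple counting patterns over the hexadecimal digits, a "nothing up my sleeve" choice intended to show there is no hidden trapdoor. Read byte by byte in little-endian order, H0 is 01 23 45 67, H1 is 89 AB CD EF, and so on. (It is the round constants K, described in Step 4c, that are derived from square roots.)
| Variable | Initial Value |
|----------|---------------|
| H0 | 0x67452301 |
| H1 | 0xEFCDAB89 |
| H2 | 0x98BADCFE |
| H3 | 0x10325476 |
| H4 | 0xC3D2E1F0 |
Together these five 32-bit values form the 160-bit chaining state, which is why the digest is 160 bits long. H0 through H3 are the same constants used by MD4 and MD5, which reflects SHA-1's design lineage; H4 was added to widen the state from 128 to 160 bits.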
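The structure of these constants is easy to see when each word is written out as little-endian bytes. The short Python check below is only an illustration; the variable names are ours.

```python
import struct

IV = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)

for name, value in zip(("H0", "H1", "H2", "H3", "H4"), IV):
    # "<I" packs the word as 4 bytes, least significant byte first
    little_endian = struct.pack("<I", value).hex(" ")
    print(f"{name} = 0x{value:08X} -> bytes {little_endian}")
# H0 = 0x67452301 -> bytes 01 23 45 67
# H1 = 0xEFCDAB89 -> bytes 89 ab cd ef
# H2 = 0x98BADCFE -> bytes fe dc ba 98
# H3 = 0x10325476 -> bytes 76 54 32 10
# H4 = 0xC3D2E1F0 -> bytes f0 e1 d2 c3
```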
Step 4: The SHA-1 Algorithm Compression Function — 80 Rounds
This is the cryptographic core of the SHA-1 algorithm. For each 512-bit input block, SHA-1 runs an 80-round compression function that mixes the message words into the state variables. It operates in two sub-phases: message schedule expansion and the state update loop.
4a: Message Schedule — Expanding 16 Words to 80
The sixteen input words W[0..15] are expanded into 80 words W[0..79]. The first 16 words come directly from the block. Words 16 through 79 are computed as:
W[i] = ROTL(W[i-3] XOR W[i-8] XOR W[i-14] XOR W[i-16], 1)
Where:
- XOR is the bitwise exclusive-OR operation
- ROTL(x, n) means rotate the 32-bit word x left by n bit positions (bits shifted off the left end re-enter on the right)
- The 1 in ROTL(..., 1) was added in SHA-1 to fix a differential cryptanalysis weakness present in SHA-0's message schedule, which lacked this rotation
The expansion spreads each input word across many later schedule words, so a change in any message bit feeds into many rounds of the compression function.
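A minimal Python sketch of the expansion, applied to the padded "abc" block. The helper names rotl and expand_schedule are illustrative and not taken from any library.

```python
def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def expand_schedule(words16: list[int]) -> list[int]:
    """Expand the 16 block words into the 80-word message schedule."""
    w = list(words16)
    for i in range(16, 80):
        w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
    return w


# Words of the padded "abc" block: 0x61626380, fourteen zero words, then the length 0x18.
abc_words = [0x61626380] + [0] * 14 + [0x18]
w = expand_schedule(abc_words)
print(len(w), hex(w[16]), hex(w[17]), hex(w[18]))
# 80 0xc2c4c700 0x0 0x30
```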
4b: Working Variable Initialisation
Five working variables a, b, c, d, e are initialised to the current hash state:
a = H0, b = H1, c = H2, d = H3, e = H4
4c: 80 Rounds — Four Groups, Four Functions, Four Constants
SHA-1's 80 rounds are divided into four groups of 20. Each group uses a different logical function f and a different round constant K. The function and constant change the mixing behaviour across groups, increasing diffusion:
| Rounds | Logical Function f(b, c, d) | Constant K | Function Name |
|--------|-----------------------------|------------|---------------|
| 0–19 | (b AND c) OR ((NOT b) AND d) | 0x5A827999 | Choice (Ch) |
| 20–39 | b XOR c XOR d | 0x6ED9EBA1 | Parity |
| 40–59 | (b AND c) OR (b AND d) OR (c AND d) | 0x8F1BBCDC | Majority (Maj) |
| 60–79 | b XOR c XOR d | 0xCA62C1D6 | Parity |
- Choice (Ch): uses b to select between bits of c (if b=1) or d (if b=0).
- Majority (Maj): outputs the majority vote of corresponding bits in b, c, and d.
- Parity: simple XOR across all three, used in two groups for cheaper computation.
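The three functions and four constants can be written down directly in Python. This is an illustrative sketch (the names f, K, and MASK are ours). It also checks a known property of the round constants: each is the integer part of 2^30 times the square root of 2, 3, 5, or 10.

```python
from math import isqrt

MASK = 0xFFFFFFFF


def f(t: int, b: int, c: int, d: int) -> int:
    """Round function for round t (0..79), on 32-bit words."""
    if t < 20:
        return ((b & c) | (~b & d)) & MASK     # Choice
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)     # Majority
    return b ^ c ^ d                           # Parity (rounds 20-39 and 60-79)


K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)   # indexed by t // 20

# Choice: where b has 1 bits take c, elsewhere take d.
print(hex(f(0, 0xFFFF0000, 0x12345678, 0x9ABCDEF0)))    # 0x1234def0
# Majority: a bit is set when at least two of the three inputs have it set.
print(hex(f(40, 0xFF00FF00, 0xFFFF0000, 0x00FF00FF)))   # 0xffff0000
# isqrt(n << 60) is floor(2**30 * sqrt(n)).
print([isqrt(n << 60) == k for n, k in zip((2, 3, 5, 10), K)])
# [True, True, True, True]
```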
Each round t (0 ≤ t ≤ 79) computes:
temp = ROTL(a, 5) + f(b, c, d) + e + K[t] + W[t]
e = d
d = c
c = ROTL(b, 30)
b = a
a = temp
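- All additions are modulo 2^32 (32-bit word size, overflow discarded). Mixing modular addition with the bitwise functions is what makes the round non-linear; the rotations themselves are linear operations.
- ROTL(a, 5) rotates a left 5 bits, so that bits interact across different positions of the word from round to round
- ROTL(b, 30) rotates b left 30 bits, equivalent to a right rotation by 2 bits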
The working variables shift positions each round: e receives the old d, d the old c, and so on. Only a receives the newly computed temp. This register-shift pattern is borrowed from MD4/MD5.
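One round can be sketched as a pure function from the old five-word state to the new one. The Python below is an illustration with names of our choosing (sha1_round, rotl, f, K); it runs round 0 on the initial state using the first word of the padded "abc" block.

```python
MASK = 0xFFFFFFFF
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def f(t: int, b: int, c: int, d: int) -> int:
    if t < 20:
        return ((b & c) | (~b & d)) & MASK
    if 40 <= t < 60:
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d


def sha1_round(state: tuple, t: int, w_t: int) -> tuple:
    """Apply round t with schedule word w_t and return the new (a, b, c, d, e)."""
    a, b, c, d, e = state
    temp = (rotl(a, 5) + f(t, b, c, d) + e + K[t // 20] + w_t) & MASK
    return (temp, a, rotl(b, 30), c, d)


iv = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
after_round_0 = sha1_round(iv, 0, 0x61626380)   # W[0] of the "abc" block
print(" ".join(f"{word:08x}" for word in after_round_0))
# 0116fc33 67452301 7bf36ae2 98badcfe 10325476
```

Only the first word is newly computed; the other four are the old a, the rotated old b, and the old c and d shifted along one position.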
4d: Updating the Hash State (Feed-Forward Addition)
After all 80 rounds complete for a block, the round output is added back into the hash state (modulo 2^32):
H0 = H0 + a
H1 = H1 + b
H2 = H2 + c
H3 = H3 + d
H4 = H4 + e
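This feed-forward addition follows the Davies-Meyer pattern (it is distinct from Merkle-Damgård strengthening, which refers to the length padding). Without it, the 80 rounds would be an invertible permutation of the state for any known block, and an attacker could simply run them backwards; adding the input chaining value back in makes the compression function hard to invert. Combined with chaining, it means the state after processing block N depends on the original IV, every bit of every block 0 through N-1, and block N itself.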
Step 5: Producing the Final Digest
After all N message blocks have been processed, the five 32-bit state values are concatenated in order to form the 160-bit (40 hex character) message digest:
digest = H0 ‖ H1 ‖ H2 ‖ H3 ‖ H4
For the input "hello world":
H0=2aae6c35 H1=c94fcfb4 H2=15dbe95f H3=408b9ce9 H4=1ee846ed
→ 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
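Putting the five steps together, the whole algorithm fits in a short Python function. This is a teaching sketch that assumes byte-aligned input and makes no attempt at speed or side-channel safety; the function name sha1_hex is ours. It is checked against hashlib rather than trusted on its own.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def sha1_hex(message: bytes) -> str:
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Step 1: padding (0x80, zero fill, 64-bit big-endian bit length)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    # Step 2: process each 64-byte block
    for offset in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for i in range(16, 80):                      # message schedule
            w.append(rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1))
        a, b, c, d, e = h
        for t in range(80):                          # 80 rounds in four groups
            if t < 20:
                fn, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                fn, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                fn, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                fn, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + fn + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        # feed-forward addition into the chaining state
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{word:08x}" for word in h)


print(sha1_hex(b"hello world"))
# 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
print(sha1_hex(b"hello world") == hashlib.sha1(b"hello world").hexdigest())
# True
```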
SHA-1 Algorithm in Code
Python
import hashlib
data = b"hello world"
digest = hashlib.sha1(data).hexdigest()
print(digest)
# 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
Java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat; // requires Java 17 or later
public class SHA1Example {
    public static String sha1(String input) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        // Encode explicitly: getBytes() without a charset uses the platform default
        byte[] hash = md.digest(input.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }
    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println(sha1("hello world"));
        // 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
    }
}
Bash (OpenSSL)
echo -n "hello world" | openssl dgst -sha1
# SHA1(stdin)= 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
# Verify a file's SHA-1 integrity
openssl dgst -sha1 firmware.bin
Go
package main
import (
    "crypto/sha1"
    "encoding/hex"
    "fmt"
)
func main() {
    h := sha1.New()
    h.Write([]byte("hello world"))
    fmt.Println(hex.EncodeToString(h.Sum(nil)))
    // 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed
}
Note: all standard library implementations above operate on bytes, not strings. For non-ASCII inputs, ensure consistent encoding (UTF-8 is standard) before hashing, or the same logical string will produce different digests across languages.
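A quick Python illustration of the encoding pitfall: the same four-character string hashed as UTF-8 and as Latin-1 gives two unrelated digests, because the byte sequences differ. The sample string is arbitrary.

```python
import hashlib

text = "café"
utf8_bytes = text.encode("utf-8")       # é becomes two bytes: c3 a9
latin1_bytes = text.encode("latin-1")   # é becomes one byte: e9

print(len(utf8_bytes), len(latin1_bytes))
# 5 4
print(hashlib.sha1(utf8_bytes).hexdigest() == hashlib.sha1(latin1_bytes).hexdigest())
# False
```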
SHA-1 Security: The SHAttered Attack and Why the Algorithm Is Broken
SHA-1's security history has three distinct phases: theoretical concern, demonstrated weakness, and practical compromise.
2005: Theoretical Break
Cryptographer Xiaoyun Wang and colleagues published a collision attack requiring only 2^69 SHA-1 computations — compared to the 2^80 expected for a brute-force birthday attack. This did not immediately produce collisions but proved SHA-1 was structurally weaker than designed. NIST responded by accelerating SHA-2 adoption guidance.
2011: NIST Formal Deprecation
NIST formally deprecated SHA-1 in SP 800-131A: generating new digital signatures with SHA-1 was deprecated from 2011 and disallowed in U.S. Federal systems after 2013. SHA-256 (part of the SHA-2 family) became the mandated replacement.
2017: The SHAttered Attack — Practical Collision
On February 23, 2017, researchers from CWI Amsterdam (Centrum Wiskunde & Informatica, the Dutch national research institute for mathematics and computer science) and Google published the SHAttered attack, the first practical SHA-1 collision. They produced two distinct PDF files with identical SHA-1 hashes.
The cost and scale:
- Approximately 2^63.1 SHA-1 evaluations — roughly 100,000 times faster than the theoretical 2^80 birthday-attack baseline
- Equivalent to roughly 6,500 years of single-CPU computation for the first phase of the attack, plus about 110 years of single-GPU computation for the second phase
- An estimated $110,000 USD on Amazon Web Services — reachable by any nation-state actor or well-funded criminal group
- The cost has decreased further since 2017 as cloud compute has become cheaper
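The speed-up figures follow from simple arithmetic on the exponents. The snippet below is only a back-of-the-envelope check in Python, using the work factors quoted in this article.

```python
BIRTHDAY_BOUND = 80.0   # log2 of the generic birthday attack on a 160-bit hash

attacks = {
    "Wang et al. (2005)": 69.0,
    "SHAttered (2017)": 63.1,
}

for name, log2_work in attacks.items():
    speedup = 2 ** (BIRTHDAY_BOUND - log2_work)
    print(f"{name}: 2^{log2_work} evaluations, about {speedup:,.0f}x less work than 2^80")
# Wang et al.: 2^11 = 2,048x less work
# SHAttered:   2^16.9, a little over 120,000x less work
```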
The attack exploits a differential path through SHA-1's message schedule — specifically, the weak diffusion in early rounds where the linear word expansion (despite the ROTL-1 fix) still allows controlled differences to be introduced and then cancelled before the hash state diverges unrecoverably.
The two collision PDFs remain publicly available at shattered.io, and the team provides a free online tool to test arbitrary files for SHA-1 collision patterns.
CWE Classification
Using SHA-1 for collision-sensitive contexts is catalogued as CWE-328: Use of Weak Hash in the MITRE Common Weakness Enumeration (a standardised taxonomy of software security weaknesses). Static analysis tools such as Semgrep, CodeQL, and Checkmarx will flag MessageDigest.getInstance("SHA-1") and hashlib.sha1() calls as CWE-328 findings.
Deprecation Timeline
| Year | Event |
|------|-------|
| 1995 | SHA-1 published (FIPS 180-1) |
| 2005 | Wang et al. demonstrate 2^69 collision attack |
| 2011 | NIST formally deprecates SHA-1 for digital signatures (SP 800-131A) |
| 2013 | SHA-1 disallowed for U.S. Federal digital signatures |
| 2017 | SHAttered published; all major browsers drop SHA-1 TLS certificates |
| 2020 | Microsoft ends SHA-1 code signing for Windows Update (August 3) |
| 2026 | Fully obsolete for any security-critical application |
Where SHA-1 Still Appears in Production Systems
Despite its broken status, SHA-1 persists in several common contexts that security teams encounter during assessments:
Git repositories: Git's object model historically identifies every commit, tree, blob, and tag using SHA-1. This is why every git log entry shows a 40-character hex string. Git 2.29 added experimental SHA-256 object format support, but the vast majority of repositories — including GitHub — still use SHA-1. The practical risk is low because Git uses SHA-1 for integrity, not authentication, and a chosen-prefix collision attack against a Git repo requires an attacker who already has write access — but it is a documented theoretical risk. See Git's hash function transition document for migration guidance.
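To make the Git case concrete: a blob's object ID is the SHA-1 of a short header ("blob", a space, the content length in decimal, a NUL byte) followed by the file content. The Python sketch below reproduces that for SHA-1 repositories; the function name git_blob_id is ours.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a blob in a SHA-1 repository."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID that appears in many repositories.
print(git_blob_id(b""))
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result for any file can be compared with the output of git hash-object run on the same file.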
HMAC-SHA1: SHA-1 used inside HMAC (Hash-based Message Authentication Code — a construction that uses a secret key alongside the hash function) remains secure despite SHA-1's collision weakness. Collision attacks apply to the standalone hash function, not HMAC. OAuth 1.0 signatures (HMAC-SHA1) are technically still safe, though migration to HMAC-SHA256 is recommended for new systems.
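As a sketch of what this looks like in code, Python's hmac module takes the hash as a parameter, so moving from HMAC-SHA1 to HMAC-SHA256 is a one-argument change. The key and message below are made-up placeholders, and verify_tag is a hypothetical helper name.

```python
import hashlib
import hmac

key = b"example-shared-secret"              # placeholder; use a random key in practice
message = b"POST&/orders&amount=100"        # placeholder message

tag_sha1 = hmac.new(key, message, hashlib.sha1).hexdigest()      # legacy
tag_sha256 = hmac.new(key, message, hashlib.sha256).hexdigest()  # preferred for new code


def verify_tag(key: bytes, message: bytes, tag_hex: str, digestmod=hashlib.sha256) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, digestmod).hexdigest()
    return hmac.compare_digest(expected, tag_hex)


print(len(tag_sha1), len(tag_sha256))
# 40 64
print(verify_tag(key, message, tag_sha256), verify_tag(key, message + b"!", tag_sha256))
# True False
```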
Legacy TLS cipher suites: Some embedded devices, industrial control systems, and older VPN appliances negotiate TLS_RSA_WITH_AES_128_CBC_SHA — a cipher suite that uses SHA-1 for the MAC. This does not expose the protocol to SHAttered-style attacks (SHA-1 is not used for certificate authentication in this context), but it is a finding in TLS audits and should be addressed. Related: organisations assessing supply-chain integrity of embedded software often encounter SHA-1 in firmware signing — see our guide on detecting sleeper packages in Ruby and Go supply chains for patterns that apply equally to hash verification workflows.
Password storage (historical): Unsalted SHA-1 was widely used as a password "hash" in early web applications. It is entirely unsuitable: SHA-1 is fast (by design), enabling billions of guesses per second on consumer GPUs. Password breaches at early-2000s platforms routinely yield SHA-1 hashed credentials that are cracked within hours. Authentication bypass vulnerabilities often trace back to misuse of weak hash functions in credential comparison.
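For contrast, here is a hedged sketch of salted, deliberately slow password hashing using scrypt from Python's standard library (available when Python is built against a recent OpenSSL). The cost parameters n=2**14, r=8, p=1 are illustrative assumptions, not a tuned recommendation, and the function names are ours.

```python
import hashlib
import hmac
import os


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage."""
    salt = os.urandom(16)   # unique random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return salt, key


def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, n=2**14, r=8, p=1, dklen=32)
    return hmac.compare_digest(candidate, stored_key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))
# True
print(verify_password("wrong guess", salt, key))
# False
```

Unlike a single SHA-1 call, each guess here costs the attacker several megabytes of memory and a noticeable amount of CPU time.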
SHA-1 vs. SHA-256 vs. SHA-3-256
When choosing a replacement, understand what each algorithm offers:
| Property | SHA-1 | SHA-256 (SHA-2) | SHA-3-256 |
|----------|-------|-----------------|-----------|
| Output size | 160 bits | 256 bits | 256 bits |
| Block size | 512 bits | 512 bits | 1,088 bits |
| Rounds | 80 | 64 | 24 |
| Construction | Merkle-Damgård | Merkle-Damgård | Keccak sponge |
| Collision security | < 63 bits (broken) | 128 bits | 128 bits |
| Preimage security | ~160 bits | 256 bits | 256 bits |
| Standard | FIPS 180-1 (1995) | FIPS 180-4 (2012) | FIPS 202 (2015) |
| Status | Deprecated | Secure | Secure |
| Performance | Fastest | Fast | Moderate |
SHA-256 is the recommended default replacement. It uses the same Merkle-Damgård construction as SHA-1 but with a larger 256-bit state, a more complex message schedule (σ0, σ1 functions using right rotations and right shifts instead of a single left rotation), and 64 rounds with 64 distinct round constants derived from the cube roots of the first 64 primes.
SHA-3 (standardised 2015) uses the completely different Keccak sponge construction — a mathematical sponge that absorbs input bits into a large state and then squeezes out the digest. It was selected precisely as a hedge: if a Merkle-Damgård-specific weakness is ever found that affects both SHA-1 and SHA-256, SHA-3 remains unaffected.
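The output and block sizes in the table can be read straight from Python's hashlib, which exposes them as digest_size and block_size (both in bytes). A short check:

```python
import hashlib

for name in ("sha1", "sha256", "sha3_256"):
    h = hashlib.new(name, b"hello world")
    print(f"{name:9} digest={h.digest_size * 8:4} bits  block={h.block_size * 8:5} bits")
# sha1      digest= 160 bits  block=  512 bits
# sha256    digest= 256 bits  block=  512 bits
# sha3_256  digest= 256 bits  block= 1088 bits
```

For SHA-3 the "block size" reported here is the sponge rate, the number of input bits absorbed per permutation call.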
Migration: Replacing SHA-1 in Practice
Detecting SHA-1 Usage in Code
# Search source files for SHA-1 references
grep -rni "sha-1\|sha1\|SHA_1\|\"SHA1\"" /path/to/project \
--include="*.py" --include="*.java" --include="*.js" --include="*.go"
# Check the signature algorithm on a live TLS certificate
openssl s_client -connect example.com:443 2>/dev/null \
| openssl x509 -noout -text \
| grep "Signature Algorithm"
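# Check all certs in a Java keystore for SHA-1
# (keytool reports the algorithm as e.g. "SHA1withRSA", not OpenSSL's "sha1WithRSAEncryption")
keytool -list -v -keystore /etc/ssl/certs/java/cacerts 2>/dev/null \
| grep -i -B5 "Signature algorithm name: SHA1with"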
Replacement Reference
| Use case | Replace SHA-1 with |
|----------|--------------------|
| Digital signatures | SHA-256 or SHA-384 |
| TLS/SSL certificates | SHA-256 (minimum) |
| File integrity / checksums | SHA-256 |
| HMAC (new code) | HMAC-SHA256 (SHA-1 is technically safe in HMAC but prefer SHA-256) |
| Password storage | bcrypt, scrypt, or Argon2id; never raw SHA-* for passwords |
| Git object format | SHA-256 (Git 2.29+ --object-format=sha256) |
| Code signing | SHA-256 or SHA-384 |
Code Migration Examples
Python:
import hashlib
# Before (broken for collision-sensitive contexts)
digest = hashlib.sha1(data).hexdigest()
# After
digest = hashlib.sha256(data).hexdigest()
Java:
// Before
MessageDigest md = MessageDigest.getInstance("SHA-1");
// After
MessageDigest md = MessageDigest.getInstance("SHA-256");
OpenSSL — enforce strong cipher suites in server config:
# /etc/ssl/openssl.cnf — enforce SHA-256 minimum at TLS level
[system_default_sect]
MinProtocol = TLSv1.2
CipherString = DEFAULT@SECLEVEL=2
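SECLEVEL=2 requires roughly 112 bits of security: it rejects RSA, DSA, and DH keys shorter than 2048 bits and certificates signed with SHA-1, along with other legacy algorithms such as RC4. It does not by itself remove cipher suites that only use HMAC-SHA1 as the record MAC; excluding those requires a stricter explicit cipher string.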
Conclusion
The SHA-1 algorithm is a landmark in cryptographic history — a clean, elegant design that served as the backbone of internet security for two decades before its collision resistance was conclusively broken by the SHAttered attack in 2017. Understanding how its Merkle-Damgård construction, message schedule word expansion, and 80-round compression function operate is essential context for anyone auditing legacy systems, evaluating cryptographic risk, or studying applied cryptography. For any new system design, SHA-256 is the drop-in replacement; SHA-3-256 provides a structurally independent alternative where defence-in-depth is warranted. If SHA-1 still appears in your digital signatures, certificate chains, or integrity checks, migration is overdue — the compute cost of collision attacks has only fallen since 2017.

