Skipfish - Web Application Security Scanner

Skipfish is an active web application security reconnaissance tool. It prepares an interactive sitemap for the targeted site by carrying out a recursive crawl and dictionary-based probes. The resulting map is then annotated with the output from a number of active (but hopefully non-disruptive) security checks. The final report generated by the tool is meant to serve as a foundation for professional web application security assessments.


  • High performance: 500+ requests per second against responsive Internet targets, 2000+ requests per second on LAN / MAN networks, and 7000+ requests per second against local instances have been observed, with a very modest CPU, network, and memory footprint. This can be attributed to:
    • Multiplexing single-thread, fully asynchronous network I/O and data processing model that eliminates memory management, scheduling, and IPC inefficiencies present in some multi-threaded clients.
    • Advanced HTTP/1.1 features such as range requests, content compression, and keep-alive connections, as well as forced response size limiting, to keep network-level overhead in check.
    • Smart response caching and advanced server behavior heuristics are used to minimize unnecessary traffic.
    • Performance-oriented, pure C implementation, including a custom HTTP stack.
  • Ease of use: Skipfish is highly adaptive and reliable. The scanner features:
    • Heuristic recognition of obscure path- and query-based parameter handling schemes.
    • Graceful handling of multi-framework sites where certain paths obey completely different semantics, or are subject to different filtering rules.
    • Automatic wordlist construction based on site content analysis.
    • Probabilistic scanning features to allow periodic, time-bound assessments of arbitrarily complex sites.
  • Well-designed security checks: the tool is meant to provide accurate and meaningful results:
    • Handcrafted dictionaries offer excellent coverage and permit thorough $keyword.$extension testing in a reasonable timeframe.
    • Three-step differential probes are preferred to signature checks for detecting vulnerabilities.
    • Ratproxy-style logic is used to spot subtle security problems: cross-site request forgery, cross-site script inclusion, mixed content, MIME- and charset mismatches, incorrect caching directives, etc.
    • Bundled security checks are designed to handle tricky scenarios: stored XSS (path, parameters, headers), blind SQL or XML injection, or blind shell injection.
    • Snort-style content signatures that highlight server errors, information leaks, or potentially dangerous web applications.
    • Report post-processing drastically reduces the noise caused by any remaining false positives or server gimmicks by identifying repetitive patterns.

That said, Skipfish is not a silver bullet and may be unsuitable for certain purposes. For example, it does not satisfy most of the requirements outlined in the WASC Web Application Security Scanner Evaluation Criteria (some of them on purpose, some out of necessity), and unlike most other projects of this type, it does not come with an extensive database of known vulnerabilities for banner-type checks.

List of the security checks offered by Skipfish:
  • High-risk flaws (potentially leading to system compromise):
    • Server-side SQL / PHP injection (including blind vectors, numerical parameters).
    • Explicit SQL-like syntax in GET or POST parameters.
    • Server-side shell command injection (including blind vectors).
    • Server-side XML / XPath injection (including blind vectors).
    • Format string vulnerabilities.
    • Integer overflow vulnerabilities.
    • Locations accepting HTTP PUT.
  • Medium-risk flaws (potentially leading to data compromise):
    • Stored and reflected XSS vectors in document body (minimal JS XSS support present).
    • Stored and reflected XSS vectors via HTTP redirects.
    • Stored and reflected XSS vectors via HTTP header splitting.
    • Directory traversal / file inclusion (including constrained vectors).
    • Assorted file POIs (server-side sources, configs, etc).
    • Attacker-supplied script and CSS inclusion vectors (stored and reflected).
    • External untrusted script and CSS inclusion vectors.
    • Mixed content problems on script and CSS resources (optional).
    • Password forms submitting from or to non-SSL pages (optional).
    • Incorrect or missing MIME types on renderables.
    • Generic MIME types on renderables.
    • Incorrect or missing charsets on renderables.
    • Conflicting MIME / charset info on renderables.
    • Bad caching directives on cookie setting responses.
  • Low-risk issues (limited impact or low specificity):
    • Directory listing bypass vectors.
    • Redirection to attacker-supplied URLs (stored and reflected).
    • Attacker-supplied embedded content (stored and reflected).
    • External untrusted embedded content.
    • Mixed content on non-scriptable subresources (optional).
    • HTTPS -> HTTP submission of HTML forms (optional).
    • HTTP credentials in URLs.
    • Expired or not-yet-valid SSL certificates.
    • HTML forms with no XSRF protection.
    • Self-signed SSL certificates.
    • SSL certificate host name mismatches.
    • Bad caching directives on less sensitive content.
  • Internal warnings:
    • Failed resource fetch attempts.
    • Exceeded crawl limits.
    • Failed 404 behavior checks.
    • IPS filtering detected.
    • Unexpected response variations.
    • Seemingly misclassified crawl nodes.
  • Non-specific informational entries:
    • General SSL certificate information.
    • Significantly changing HTTP cookies.
    • Changing Server, Via, or X-... headers.
    • New 404 signatures.
    • Resources that cannot be accessed.
    • Resources requiring HTTP authentication.
    • Broken links.
    • Server errors.
    • All external links not classified otherwise (optional).
    • All external e-mails (optional).
    • All external URL redirectors (optional).
    • Links to unknown protocols.
    • Form fields that could not be autocompleted.
    • Password entry forms (for external brute-force).
    • File upload forms.
    • Other HTML forms (not classified otherwise).
    • Numerical file names (for external brute-force).
    • User-supplied links otherwise rendered on a page.
    • Incorrect or missing MIME type on less significant content.
    • Generic MIME type on less significant content.
    • Incorrect or missing charset on less significant content.
    • Conflicting MIME / charset information on less significant content.
    • OGNL-like parameter passing conventions.
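
Several of the checks above cover blind vectors, and the feature list notes that differential probes are preferred to signature checks. The sketch below illustrates only the general idea of comparing three responses. The probe roles, the `classify` function, and the 0.9 similarity threshold are assumptions made for illustration; Skipfish's actual probes and comparison logic are not described in this text.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # hypothetical cut-off, chosen for illustration only


def similar(a: str, b: str) -> bool:
    """Fuzzy comparison so that small page differences do not count."""
    return SequenceMatcher(None, a, b).ratio() >= SIMILARITY_THRESHOLD


def classify(baseline: str, benign: str, breaking: str) -> str:
    """Compare three response bodies.

    baseline: response to the unmodified request
    benign:   response to a probe expected to be harmless
    breaking: response to a probe expected to break server-side syntax
    """
    if not similar(baseline, benign):
        # The page changes even for harmless input, so nothing can be concluded.
        return "unstable"
    if not similar(benign, breaking):
        # Only the syntax-breaking probe changed the page.
        return "suspicious"
    return "no difference"


print(classify("Welcome back, user 42",
               "Welcome back, user 42",
               "Internal error: unexpected token"))
```

The point of the design is that no error-message signature is needed: the decision rests on how the responses differ from each other, which is what makes this style of check usable against blind vectors.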


Usage: skipfish [ options ... ] -W wordlist -o output_dir start_url [ start_url2 ... ]
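
As a minimal sketch of the synopsis above, the helper below assembles an argument list containing only the two required options and the start URLs. The function name `skipfish_argv`, the wordlist name `site.wl`, the output directory `out`, and the target URL are all hypothetical placeholders.

```python
import shlex


def skipfish_argv(wordlist, output_dir, start_urls, extra_options=()):
    """Build an argv list following the synopsis: options, -W, -o, then URLs."""
    if not start_urls:
        raise ValueError("at least one start URL is required")
    return ["skipfish", *extra_options, "-W", wordlist, "-o", output_dir, *start_urls]


# Hypothetical file names and target, for illustration only.
argv = skipfish_argv("site.wl", "out", ["http://localhost:8080/"])
print(shlex.join(argv))
```

Keeping the command as a list means it could be handed to `subprocess.run(argv)` without any shell quoting concerns; `shlex.join` is used here only to display it.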

Authentication and access options:
  -A user:pass      - use specified HTTP authentication credentials
  -F host=IP        - pretend that 'host' resolves to 'IP'
  -C name=val       - append a custom cookie to all requests
  -H name=val       - append a custom HTTP header to all requests
  -b (i|f|p)        - use headers consistent with MSIE / Firefox / iPhone
  -N                - do not accept any new cookies
  --auth-form url   - form authentication URL
  --auth-user user  - form authentication user
  --auth-pass pass  - form authentication password
  --auth-verify-url url - URL for in-session detection

Crawl scope options:
  -d max_depth     - maximum crawl tree depth (16)
  -c max_child     - maximum children to index per node (512)
  -x max_desc      - maximum descendants to index per branch (8192)
  -r r_limit       - max total number of requests to send (100000000)
  -p crawl%        - node and link crawl probability (100%)
  -q hex           - repeat probabilistic scan with given seed
  -I string        - only follow URLs matching 'string'
  -X string        - exclude URLs matching 'string'
  -K string        - do not fuzz parameters named 'string'
  -D domain        - crawl cross-site links to another domain
  -B domain        - trust, but do not crawl, another domain
  -Z               - do not descend into 5xx locations
  -O               - do not submit any forms
  -P               - do not parse HTML, etc, to find new links
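
The -p and -q options describe a probabilistic crawl that can be repeated with a given seed. The sketch below shows why a fixed seed makes such a scan repeatable. It is a conceptual illustration only: the `pick_nodes` function, the node names, and the use of Python's `random` module are assumptions, not Skipfish's actual selection logic.

```python
import random


def pick_nodes(nodes, crawl_percent, seed):
    """Keep each node with probability crawl_percent, using a seeded generator."""
    rng = random.Random(seed)
    return [n for n in nodes if rng.random() * 100 < crawl_percent]


nodes = [f"/page{i}" for i in range(20)]   # hypothetical crawl nodes
first = pick_nodes(nodes, 50, 0xC0FFEE)
again = pick_nodes(nodes, 50, 0xC0FFEE)

print(first == again)   # same seed, same subset
print(len(first), "of", len(nodes), "nodes selected")
```

Because the generator is re-created from the same seed, the same subset is chosen on every run, which is the property a repeatable partial scan relies on.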

Reporting options:
  -o dir          - write output to specified directory (required)
  -M              - log warnings about mixed content / non-SSL passwords
  -E              - log all HTTP/1.0 / HTTP/1.1 caching intent mismatches
  -U              - log all external URLs and e-mails seen
  -Q              - completely suppress duplicate nodes in reports
  -u              - be quiet, disable realtime progress stats
  -v              - enable runtime logging (to stderr)

Dictionary management options:
  -W wordlist     - use a specified read-write wordlist (required)
  -S wordlist     - load a supplemental read-only wordlist
  -L              - do not auto-learn new keywords for the site
  -Y              - do not fuzz extensions in directory brute-force
  -R age          - purge words hit more than 'age' scans ago
  -T name=val     - add new form auto-fill rule
  -G max_guess    - maximum number of keyword guesses to keep (256)

  -z sigfile      - load signatures from this file
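
The feature list mentions $keyword.$extension testing, and -Y disables extension fuzzing in directory brute-force. The sketch below shows how quickly the candidate set grows when every keyword is paired with every extension. The keywords, extensions, and the `candidate_names` helper are hypothetical; the actual wordlist format and probing order are not covered here.

```python
from itertools import product


def candidate_names(keywords, extensions):
    """Yield bare keywords first, then every keyword.extension pairing."""
    for kw in keywords:
        yield kw
    for kw, ext in product(keywords, extensions):
        yield f"{kw}.{ext}"


keywords = ["admin", "backup"]   # hypothetical dictionary entries
extensions = ["php", "bak"]      # hypothetical extensions
names = list(candidate_names(keywords, extensions))
print(names)
# ['admin', 'backup', 'admin.php', 'admin.bak', 'backup.php', 'backup.bak']
```

With K keywords and E extensions the list holds K * (1 + E) names per directory, which is why skipping the extension pairing can shorten a scan considerably.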

Performance settings:
  -g max_conn     - max simultaneous TCP connections, global (40)
  -m host_conn    - max simultaneous connections, per target IP (10)
  -f max_fail     - max number of consecutive HTTP errors (100)
  -t req_tmout    - total request response timeout (20 s)
  -w rw_tmout     - individual network I/O timeout (10 s)
  -i idle_tmout   - timeout on idle HTTP connections (10 s)
  -s s_limit      - response size limit (400000 B)
  -e              - do not keep binary responses for reporting
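
The -g and -m limits interact: one caps connections globally, the other per target IP. A rough sketch of the resulting ceiling follows, assuming the two limits simply act as independent upper bounds. The function name is hypothetical; the default values are the ones shown in parentheses above.

```python
def max_parallel_connections(target_ips, global_cap=40, per_ip_cap=10):
    """Upper bound on simultaneous connections given both caps."""
    return min(global_cap, per_ip_cap * target_ips)


for ips in (1, 2, 4, 10):
    print(ips, "target IP(s):", max_parallel_connections(ips))
```

Under this assumption a single-IP target is bounded by the per-IP limit, and the global limit only starts to matter once four or more target IPs are involved.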

Other settings:
  -l max_req      - max requests per second (0.000000)
  -k duration     - stop scanning after the given duration h:m:s
  --config file   - load the specified configuration file
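
When a request-rate cap (-l) is combined with a total request limit (-r), a lower bound on the scan time follows from simple division, and the result can be written in the h:m:s form that -k expects. The sketch below is illustrative arithmetic only: the helper names and the sample numbers are hypothetical, and the zero-padded minutes and seconds are an assumption about formatting.

```python
import math


def min_scan_seconds(total_requests, max_req_per_sec):
    """Shortest possible scan time if the rate cap is the only constraint."""
    if max_req_per_sec <= 0:
        raise ValueError("rate cap must be positive for this estimate")
    return math.ceil(total_requests / max_req_per_sec)


def as_hms(seconds):
    """Format a whole number of seconds as h:m:s."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


seconds = min_scan_seconds(100_000, 50)   # hypothetical: 100000 requests at 50 req/s
print(seconds, as_hms(seconds))           # 2000 0:33:20
```

An estimate like this helps when choosing a -k duration: a value shorter than the lower bound would stop the scan before the request budget could be used.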
