❗️Disclosure: I have worked at Pitch since December 2021. This write-up aggregates public knowledge previously shared by third parties (e.g., security firms, the infosec community) or by us. I hope it can be helpful for younger security teams that need to identify and mitigate phishing abuse for the first time.

Threat model

When you work in the startup ecosystem long enough, you learn that company cybersecurity risks are not only related to IT failures or the OWASP Top 10 but also to reputation. It doesn’t matter if your infrastructure is solid and redundant, if your engineers follow best practices, or if you have real-time monitoring and alerts in place: if stakeholders don’t trust your security standards, you are not secure in their eyes. This is why startups and companies invest money and time in security certifications like ISO 27001 or SOC 2 Type 2 (though this is a controversial topic, and I have strong opinions on it, I won’t delve into that now).

This write-up focuses on a precise threat that can impact startups and companies that allow users to create and share content (images, webpages, etc.); it is not my goal to investigate the entire PhaaS (phishing-as-a-service) phenomenon.

PhaaS: It’s hard to be a phisher

PhaaS threat model

The chart shows what a generic phishing scheme involving a PhaaS can look like. The scheme can change according to the actor’s goals, new techniques, and the PhaaS product’s features. I recommend the Group-IB report “W3LL oiled machine: Group-IB uncovers covert BEC phishing empire targeting Microsoft 365”, which explains exactly how actors use PhaaS and BEC attacks to breach business e-mail accounts (often Microsoft 365, but this detail can depend on the geographic area or the actor’s goal).

The chart shows part of the supply chain involved in an advanced phishing scheme, but not the entire chain (we will not cover stolen credit card markets, for example; that is another Pandora’s box). The purple and red items are the services offered - or probably offered - by the PhaaS actors to their customers; the green and light blue items are the actions, steps, and additional resources used by the PhaaS customers. As you can see, it’s hard to be a phisher: the chain is long, every link can be the weak one, and a single weak link can burn the entire campaign.

The PhaaS platforms usually offer:

  • FUD (Fully UnDetectable) domains and links: FUD domains are generally newly registered domains that have never been used before, so they are not yet on security firms’ blocklists when the phishing campaign begins. PhaaS actors register thousands of new domains with various domain registrars and sell them as part of their services. More details about this strategy in the chapter FUD (Fully UnDetectable) domains and links.
  • Sandbox evasion techniques: In a phishing campaign it is important to “buy time” to maximize the operation’s results, and “buying time” also means reducing the detection rate. PhaaS products implement various techniques to break automated scanners and sandboxes, from abusing popular anti-bot solutions (e.g. Cloudflare Turnstile) to running obfuscated JavaScript. More details about these evasion techniques in the chapter Sandbox evasion techniques.
  • MFA bypass solutions: In the enterprise world, multi-factor authentication (often non-phishing-resistant) is becoming a standard: SMS, TOTP, and mobile push notifications. A good PhaaS product must offer a way to bypass at least the common MFA methods like SMS or TOTP. The most common technique is the reverse proxy: a proxy between the victim’s client and the legitimate login page that intercepts the user’s inputs. Other techniques, such as notification fatigue - where attackers flood the victim with MFA prompts until they give in and accept - can also bypass multi-factor authentication, but these methods are typically uncommon in “standard credential-based” phishing attacks.

FUD (Fully UnDetectable) domains and links

One of the first evasion techniques for a phishing page is the URL itself: attackers can send QR codes, HTML files, or PDF files, but in the end, the victim will need to open the phishing page in a browser. Even if every step of a phishing campaign needs tailored evasion solutions, URL evasion techniques are among the oldest and still most used.

  1. Short URLs and open redirects: Short URLs and chains of URL redirects are two of the oldest and most popular techniques for bypassing allow lists and security scanners and fooling humans. URL shorteners are legitimate services that shorten long URLs or customize them for marketing campaigns (e.g. Twitter, SMS, QR codes); they quickly became abused by threat actors due to their popularity and their ability to “hide” the original links. Using a short URL (bit.ly), Russian hackers compromised John Podesta’s email inbox during the 2016 US election. Another technique is exploiting open redirect “vulnerabilities” in high-reputation and popular platforms, like Adobe or Google. Although open redirects are typically not considered a major security risk for users, and security teams often deprioritize such reports, they can still be exploited in phishing or malware campaigns, because services with good reputations are less likely to be blocked by security vendors.
  2. Abused third-party services: It is now commonplace to observe actors abusing popular third-party services in their phishing campaigns. Sometimes this overlaps with the previous point (they abuse open redirects), and sometimes they simply misuse the services to host phishing content or trick the victim into clicking the phishing links. If a service allows users to upload files and create and share content, it can be - and probably will be - abused. In research published in 2022, Palo Alto Networks threat researchers identified six types of SaaS platforms most prone to misuse by phishing threat actors: file sharing, form builders, website builders, note-taking platforms, design platforms, and personal branding platforms. These platforms share a common feature: a simple and quick way to create and share arbitrary content, as they are specifically designed for this (legitimate) purpose. Startups and early-stage businesses in this area are the most exposed to phishing abuse because they may not have a security team and may be less effective at finding malicious content than bigger tech players, making them a good target for threat actors - at least, that is the threat actors’ bet. In 2021, Notion, a popular collaborative note-taking platform, suffered an hours-long outage caused by phishing complaints that its team had ignored or failed to detect, showing how risky these abuses are for an early-stage business.
  3. Compromised legitimate websites: If a web component is popular enough and turns out to be vulnerable, it is a good candidate for automated exploitation campaigns that turn legitimate websites into hosting for malicious content (malware delivery, phishing, etc). According to W3Techs (World Wide Web Technology Surveys), WordPress holds 62.4% of the CMS market and powers 43.7% of all websites. This makes WordPress and its ecosystem (plugins, themes, and other customizations) a perfect target for actors who want to scale their infrastructure. It is not uncommon to find small businesses’ websites used to host phishing or other malicious content, often without their owners’ knowledge. These compromised sites are precious to the actors because the domain looks legitimate to security scanners (e.g. it is old enough).
  4. New domains and subdomains: On the other hand, while old domains can have a better reputation than new ones, new domains are precious and heavily used in modern phishing campaigns, for a simple reason: databases and blocklists are still a popular way to block phishing websites, both in in-house solutions and third-party products. New domains give PhaaS customers time to run a proper phishing campaign without being blocked too early (e.g. an anti-bot plus a new domain can easily bypass Google Safe Browsing and similar client-side checks). This kind of domain is called a Fully UnDetectable (FUD) domain, and they are often sold with a PhaaS subscription. The popular Phishing-as-a-Service platform ONNX Store (formerly Caffeine) used to sell these services through Telegram bots, making them particularly easy and user-friendly to buy. Recently, Microsoft disclosed an operation in which they seized 240 domains related to this PhaaS platform, critically impacting the group’s operations. Subdomains play an interesting role in this game, too: subdomains are a common feature for customer tenants, implemented by many startups and companies (Metabase, Slack, 1Password, Okta, etc). Security vendors cannot monitor every new business, so they tend to block abused subdomains instead of the parent domain, to avoid blocking legitimate services. This can extend the life of the domains registered by the threat actors.
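To make the URL-evasion discussion concrete, here is a minimal Python sketch of two of the heuristics above: flagging known URL-shortener domains and spotting a common open-redirect pattern (a full URL embedded in a query parameter). The shortener list and signal names are illustrative assumptions, not a production blocklist.

```python
from urllib.parse import urlparse, parse_qs, unquote

# Illustrative, incomplete sample of popular URL-shortener domains
# (an assumption for this sketch, not a production blocklist).
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co", "ow.ly"}

def redirect_signals(url: str) -> list[str]:
    """Return heuristic signals suggesting a shortener or open-redirect hop."""
    signals = []
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in SHORTENER_DOMAINS:
        signals.append("shortener-domain")
    # Open redirects often carry the next hop as a URL-encoded query value.
    for values in parse_qs(parsed.query).values():
        if any(unquote(v).lower().startswith(("http://", "https://")) for v in values):
            signals.append("url-in-query-param")
            break
    return signals
```

Flagged URLs are candidates for deeper checks (e.g. following the chain in a sandbox), not automatic blocks: shorteners and redirects have plenty of legitimate uses.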

Sandbox evasion techniques

For obvious reasons, much security software follows the rule “if it is not phishing, it is a legitimate resource”, so PhaaS kits are designed to block automated scanners, which are nothing but bots: if a scanner cannot flag a website as phishing, it will probably flag it as safe. This is why many modern phishing websites abuse legitimate captcha services (e.g., Cloudflare Turnstile) and run obfuscated JavaScript to detect browser emulation. The techniques listed below were identified in phishing websites created with the ONNX (formerly Caffeine) kit in August 2023. Even if they may already be outdated, they can help in understanding the threat model and the challenges security teams must face. I also recommend the research on evasion detection techniques implemented in AitM phishing kits, published by Push Security.

Captcha. Free captcha services are one of the most effective and quickest ways to block bots - and thus security firms’ scanners. Various vendors offer free, ready-to-go captcha solutions (Cloudflare, Google, hCaptcha), which in recent years have been misused by phishing actors to protect their phishing websites from security scanners.

Browser emulation detection. Emulating an entire sandbox machine can be time-consuming, especially for real-time checks, so many security solutions emulate a browser or - still very commonly - run command-line tools and scripts to open and evaluate URLs. PhaaS vendors know this is a cost-benefit challenge, so their goal becomes identifying and blocking the most common scanners, for two main reasons: delaying detection and making it more expensive for their victims to find effective countermeasures. Command-line tools can easily be blocked by captchas and JavaScript, but emulated browsers can pass captchas and execute JavaScript, which means PhaaS vendors must find ways to detect them. In the JavaScript embedded in phishing pages, it is not uncommon to find detection rules for common browser emulation frameworks, as in the following sample:

<a href="[REDACTED]" id="cnuDKmJliL" hidden=""></a><script>(screen.width>480&&navigator.mimeTypes.length+navigator.plugins.length===0||Array.from(navigator.plugins).some(e=>e.name.includes("Native Client"))||!0===navigator.webdriver||window.document.documentElement.getAttribute("webdriver")||window.callPhantom||window._phantom||window.chrome&&window.chrome.webstore||window.navigator.languages&&window.navigator.languages.includes("webdriver")||"function"==typeof window.webdriver&&window.webdriver.toString().includes("class WebDriver")||window.Capabilities&&window.Capabilities.chrome||window.document.documentElement.getAttribute("webdriver-eval-executed"))&&document.getElementById("cnuDKmJliL").click();</script>

The conditions checked by the script include:

  • screen.width > 480: It checks if the screen width is greater than 480 pixels, which might be an attempt to distinguish between desktop and mobile devices.
  • Browser plugin and MIME type checks: It checks if the browser has no MIME types and no plugins installed or if it includes the “Native Client” plugin.
  • navigator.webdriver: It checks if the navigator.webdriver property is true, which could indicate the presence of a WebDriver instance often used in automated testing (Selenium).
  • window.document.documentElement.getAttribute("webdriver"): It checks if an HTML element attribute named “webdriver” is set (Selenium).
  • window.callPhantom, window._phantom: It checks if certain properties related to the PhantomJS headless browser are present.
  • window.chrome.webstore: It checks if the window.chrome.webstore property is present, which might be an attempt to detect Chrome’s web store.
  • window.navigator.languages: It checks if the browser reports the list of languages supported by the user.
  • window.navigator.languages.includes("webdriver"): It checks if the list of supported languages includes “webdriver” (Selenium).
  • typeof window.webdriver === "function": It checks if the window.webdriver property is a function and if its string representation includes “class WebDriver” (Selenium).
  • window.Capabilities.chrome: It checks if a property named “chrome” exists in the “Capabilities” object.
  • window.document.documentElement.getAttribute("webdriver-eval-executed"): It checks if an HTML element attribute named “webdriver-eval-executed” is set (Selenium).

If any of the conditions in the list evaluates to true, the phishing page redirects the client to a legitimate site, evading the scanner’s detection. Even though this script is over a year old, it shows how a PhaaS kit can elude the most common scanners.

Image, text, and source code obfuscation. Even if a detection tool gets past the previous layers, it cannot easily parse the source code for suspicious keywords (e.g. sign-in, password, Microsoft, etc.). Phishing pages obfuscate text by splitting it across HTML tags (e.g. <span>, <div>, …) or by rendering it as images instead of text. JavaScript code is often encrypted or obfuscated as well.
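The tag-splitting trick can be countered by extracting and rejoining the text nodes before scanning for keywords. A minimal Python sketch (the keyword list is an illustrative assumption; this does not defeat image-based or encrypted obfuscation, which requires OCR or dynamic analysis):

```python
from html.parser import HTMLParser

# Illustrative keyword list (an assumption for this sketch).
SUSPICIOUS_KEYWORDS = ("sign-in", "password", "verify your account")

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the tags used to split keywords."""

    def __init__(self) -> None:
        super().__init__()
        self.fragments: list[str] = []

    def handle_data(self, data: str) -> None:
        self.fragments.append(data)

def hidden_keywords(html: str) -> list[str]:
    """Rejoin the page's text fragments and scan them for suspicious keywords."""
    extractor = TextExtractor()
    extractor.feed(html)
    text = "".join(extractor.fragments).lower()
    return [kw for kw in SUSPICIOUS_KEYWORDS if kw in text]
```

A naive scan of the raw source misses the split keyword (“password” is not a substring of `<span>pass</span><span>word</span>`), while the rejoined text recovers it.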

Mitigation: 📍 You are here

Okay, this write-up doesn’t offer a definitive solution to the problem, but it aims to help security engineers define a threat model and evaluate mitigations accordingly. So, where are we in the previous chart? In this threat model, we are the legitimate third-party service abused by the actor to spread the phishing URLs; we are just a step in the redirection flow that bypasses the security scanners (e.g. webmail scanners). The question is: how can the security team identify the phishing content created and shared by malicious users?

Trust data, don’t try to play a cat-and-mouse game

Even if modern web platforms are designed to collect a lot of data about users’ behaviors and activities, security teams can trust only a small subset of it. Frontend data and browser headers can easily be spoofed, so even if they can still help the security team, they cannot be the main source of trust. Considering our threat model, teams should start with the only sources they can be sure of: URLs (or other delivery vectors, e.g. QR codes or PDFs, but in this threat model we focus on URLs).

All phishing content needs to be shared and sent to the targets. Phishing actors want as little friction as possible for their visitors, so the phishing content will have at least one public sharing link or be delivered through the platform’s e-mail system (e.g. invitations). If content cannot be shared, it cannot be used for phishing. Security teams should focus on this kind of content: public content, where “public” means shared in some way (e.g. invitations or public links). In addition, in our threat model, the phishing content must redirect to a phishing site, so it must contain the phishing URL/redirection (or another delivery vector, e.g. a QR code).
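Since the phishing content must carry an outbound URL, a first filter is to extract URLs from public content and keep only those that leave the platform. A minimal sketch; `FIRST_PARTY_HOSTS` is a hypothetical allowlist you would adapt to your own platform:

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Hypothetical first-party allowlist; adapt it to your platform's hosts.
FIRST_PARTY_HOSTS = {"example-saas.com", "cdn.example-saas.com"}

def external_urls(content: str) -> list[str]:
    """Return outbound URLs in shared content that leave the platform."""
    urls = []
    for match in URL_RE.finditer(content):
        host = (urlparse(match.group()).hostname or "").lower()
        if host and host not in FIRST_PARTY_HOSTS:
            urls.append(match.group())
    return urls
```

The extracted URLs then become the input for the reputation and redirection checks described below, rather than a verdict on their own.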

Secondary data - important when aggregated - that can help the security team includes:

  • User information (e.g. activities and account creation)
  • IP address (hosting and VPN IPs can be suspicious under certain conditions)
  • Credit cards (fraud detection is highly recommended, for example Stripe Radar)
  • URL information (domain creation datetime, Cloudflare headers; redirections can be highly suspicious)

Regarding the investigation of URLs, I wrote some proof-of-concept projects to aggregate suspicious headers, redirections, and WHOIS data: espresso 🤌.

Understand your legitimate users’ behavior

The most common - and legitimate - behavior of users on a platform depends on the platform’s features and purpose. Phishing actors, however, typically do not follow common user behavior because they operate under a strict constraint: cost. They must create phishing content (e.g., URL redirection) as quickly as possible and spread it as fast as possible. This constraint can make it easier to identify phishing patterns on a platform. The security teams could investigate this data in the service’s database and correlate it: account creation datetime, content creation datetime, content sharing datetime, and user activities.

Phishing actors are likely to create throwaway accounts that share only the phishing content. As a result, the account creation timestamp and the content creation timestamp will be close - closer than those of regular users. Additionally, their overall activity will be lower compared to common users.
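The correlation described above can be sketched as a crude score over the timestamps and activity count (the window and threshold values are illustrative assumptions to calibrate on your own users’ data):

```python
from datetime import datetime, timedelta

# Illustrative thresholds; calibrate them on your real users' behavior.
FAST_SHARE_WINDOW = timedelta(minutes=30)
LOW_ACTIVITY_THRESHOLD = 3

def throwaway_score(account_created: datetime,
                    content_shared: datetime,
                    activity_count: int) -> int:
    """Crude score: the higher, the more the account looks like a throwaway."""
    score = 0
    if content_shared - account_created < FAST_SHARE_WINDOW:
        score += 1  # content shared almost immediately after signup
    if activity_count < LOW_ACTIVITY_THRESHOLD:
        score += 1  # barely any other activity on the account
    return score
```

In practice such a score would be one signal among the others listed earlier (URL data, IP, payment signals), not a standalone verdict.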

Procedure to report abuses

Last but not least: a public procedure to report abuse. Content moderation is not just a technical implementation, and a short section cannot define a good solution, which depends on the company’s structure, culture, and product. Still, a lean, public procedure to report abuse is a great starting point. A contact point (e-mail address, form, Google Forms, Tally, whatever) where people can report harmful content is important for the platform’s users, visitors, and the security team: it helps external people - yes, not only the users - send reports (e.g. security vendors), and it gives the security team visibility into malicious content that bypasses your detection system. When one threat actor finds a way to bypass the detection system, soon all threat actors will have found it. Make it easy and frictionless to contact the security team and report content.

In addition, one final recommendation: build solid implementations. This advice may seem trivial, but it is one of our best allies. Phishing campaigns are often just large “spray-and-pray” schemes, where attackers send massive volumes of emails and URLs hoping to compromise a few accounts. Solid systems enable engineers to monitor infrastructure and products better and - most importantly - increase costs for adversaries (e.g. raising bot automation expenses) while minimizing the impact on the user experience. Platform abuses often stem from a lack of good practices, like appropriate limits and constraints on application usage (e.g. missing upper limits for email delivery).

I think this is a good summary of what I have learned over the last few years about phishing threats and actors. I hope this helps!