Excel 4.0 (XL4) macros are becoming increasingly popular for attackers, as security vendors struggle to play catchup and detect them properly. This technique provides attackers a simple and reliable method to get a foothold on a target network, as it simply represents an abuse of a legitimate feature of Excel, and does not rely on any vulnerability or exploit. For many organizations, blacklisting isn’t a viable solution, and any signatures to flag these samples must be precise enough not to trigger on files that leverage this feature legitimately. As this is a 30-year-old feature that has only been discovered and exploited en mass by attackers in the last year, many security vendors do not currently have detection mechanisms in place to trigger on these samples, and building reliable signatures for this type of attack is not a small task.
The Lastline Threat Research Group has observed thousands of samples leveraging this technique, and has been monitoring and tracking trends for the last 5+ months. Intercepting these samples has provided valuable data to build statistics, identify trends, find outliers, and track campaigns. We were able to cluster samples into distinct waves, which clearly display how this technique has evolved through time to become more sophisticated and more evasive. As XL4 macros represent somewhat “uncharted territory,” malware authors and security researchers are making new discoveries daily, pushing the boundaries of this technique and identifying ways to evade detection and obfuscate their code. The techniques employed by these attackers include ways to evade automated sandbox analysis and signature-based detection, as well as hands-on analysis performed by malware analysts and reverse engineers. As previously mentioned, these techniques appear to surface in waves, with each new wave introducing new techniques, building on the previous wave or cluster. Techniques used in the first wave of samples we observed in February are still being leveraged in samples being discovered today. In this blog post, we describe each wave and cluster in detail, by breaking down every new technique discovered, and explaining why each is significant, effective, or ineffective.
Through clustering thousands of samples and performing in-depth code-level analysis of each cluster, we were able to visualize and witness the evolution of this threat. We found that roughly every 1-2 weeks, a new wave of samples emerged, each more evasive and sophisticated than the last. Each of these waves appeared to build on its predecessor, extending its functionality by introducing new techniques on top of what already was being used. The size of these clusters suggest that these samples are being generated with some sort of toolkit or document generator, as these samples resemble one another too closely to not be related. Evasion routines and obfuscation were the primary areas of evolution, as the base functionality of these samples remained the same – download and invoke a more persistent payload, such as an EXE or DLL file.
Excel 4.0, or XLM macros, is a 30-year-old feature of Microsoft Excel that has been gaining popularity among malware authors and attackers, especially over the last year (see Chart 1). This type of macro code is actively being abused and weaponized by attackers to deliver additional, more persistent malware. What makes this technique effective is that much like the more popular and up-to-date VBA macros, Excel 4.0 macros are a component of legitimate Excel functionality, thus will likely never be disabled, as they are used regularly for benign business purposes. For example, the commonly used SUM function is used in many spreadsheets to obtain the sum of a range of cells. Macros of this type are commonly referred to as “formulas.”
This technique has been effective because although it is an old feature, security vendors may not have yet devised detection techniques for this type of attack. On top of this, organizations will likely never be able to disable this feature in Excel, due to the fact that it is used regularly for legitimate purposes. For this reason, malware authors now have another reliable method for initial access, as these malicious documents are being successfully delivered via email attachments.
Security vendors are having difficulty detecting this threat, likely due to not having solutions in place to properly assess and parse the format and structure of how these macros are stored in Excel documents . These macros are very straightforward and easy to create, thus easy to modify to bypass signature-based detection. Macros are also robust, and provide various functions that can be leveraged to evade analysis, such as obfuscating the final payload, modifying the control flow, or detecting automated sandbox analysis through specific host environmental checks.
This technique will likely remain relevant, and join its successor (i.e., VBA macros) as a widely used technique to weaponize document files. This technique does not rely on a bug, it is not an exploit, but it simply abuses legitimate Excel functionality. These macros can be set to auto-execute, and run as soon as a workbook is opened if macros are enabled. As this is somewhat uncharted territory, malware authors and researchers are still exploring the depths of possibilities and capabilities of weaponizing this attack technique.
The Threat Research Group at Lastline has clustered and analyzed thousands of XL4 samples, as we have been tracking and monitoring this threat for over five months. We were able to cluster and classify these samples into clear waves or trends. The bulk of these samples appear to be created with the same toolkit or document generator, as each wave builds on the previous, adding new functionality with each iteration. This additional functionality typically consists of more sophisticated obfuscation or evasion routines. These stage-1 spreadsheet samples are in turn delivering a variety of commodity malware families, such as Danabot, ZLoader, Trickbot, Gozi, and Agent Tesla.
The timeline below (Figure 1) displays how these macros have evolved over the past four months. Each block represents a significant wave or cluster that exhibited new behavior and functionality that we had not yet observed at scale. We will elaborate on each cluster in greater detail in the following sections.
This set of samples is the first significant wave of weaponized XL4 documents we observed. These samples all contain a hidden macro sheet that holds the payload, and also an image (see Figure 2) that is used to social engineer the user into enabling the macro code. The techniques introduced in this cluster will be leveraged and built upon by almost all the waves that followed.
Unhiding the hidden macrosheet shows the payload (Figure 3), which is not obfuscated in any way.
2. Additional sandbox evasions commence, this time using the GET.WORKSPACE function  (see Figure 6). This function will provide information about the environment that the Excel spreadsheet is being viewed in. The two checks are for mouse capability (constant of 19), and audio capability (constant of 42). The malware exits if either of these two conditions are not met.
3. An additional check for the victim’s OS commences (Figure 7). This is performed through checking for the string ‘Windows’ being present in the return value of the call to GET.WORKSPACE:
An additional payload – a cell formula – is downloaded via a web query (Figures 8 and 9). These web queries are stored in the DCONN (data connection) record type .
The data from this network connection will be written to cells mapped to the name fgsb4g (shown at bottom of the hex dump in Figure 8), which the Excel name manager shows are range $Y$100:$Y$103 (Figure 10).
A small loop is then created between two cells that check whether the payload was downloaded successfully (Figure 11). These cells are checking for the string “LOS” anywhere within cell Y103. This is because at the end of the downloaded cell formula payload, the malware calls =CLOSE().
The second cluster of February added minor obfuscation through hiding the payload by using a white font on white background (shown in red in Figure 12), and by scattering the code around the worksheet. This makes following control flow and identifying important code blocks a bit more challenging for manual analysis, but should cause no significant issues for dynamic analysis, noting that the first cluster included significant attempts to evade dynamic analysis.
This cluster protects itself more than the previous clusters, as it hides the payload in a veryhidden macro sheet instead of a hidden macro sheet (note that both types of macro sheets are Excel documented features). This means that the macro sheet cannot be unhidden via the traditional method in Excel, but instead a specific flag in the binary (see Figure 14) must be modified in order to unhide and view the contents of the worksheet.
Once unhidden, this macro appears to extend on the dynamic analysis evasion routines from Cluster 1 by performing additional guest OS environmental checks (Figure 13). These two new checks also use the previously mentioned GET.WORKSPACE function, but this time they attempt to identify the height and width of the workspace. This check ensures that the display size of the workspace points (similar to pixels) are greater than the minimum value, 770×380, in this cluster.
The new techniques introduced by this cluster include building a path to three different files, one being a VBS script, which is a technique we didn’t see in previous clusters. These paths (Figure 15) will be used by the downloaded formula, the payload, using the same DCONN web query described previously.
Also, instead of FORMULA.FILL being used to move the downloaded payload around the sheet, FORMULA is used (Figure 16). This minor change is likely an evasion to signature-based detection of the previously used FORMULA function.
This cluster is the first batch of samples where we observed heavy usage of the CHAR function (Figure 17). This function translates an integer to its corresponding ASCII character. For example: CHAR(0x41) resolves to ‘A’. These characters are resolved, then concatenated one at a time to build the final payload. This is a common obfuscation technique across a variety of file formats and languages, but this is the first time we have seen the technique used in Excel 4.0 macros.
Within this same timeframe, we also observed the first large wave of samples that directly invoked the WinAPI (URLDownloadToFile) via the CALL function (Figure 18) instead of the previously mentioned DCONN download method, which is likely used to evade static detection of the prior download technique.
The added functionality of this cluster includes a check for a non-default Excel Security setting in the registry as a possible dynamic analysis evasion, and the use of the WinAPI to register a DLL to execute the second stage payload. The code in Figure 19 shows that the native utility reg.exe will be spawned (see Figure 20) to extract a specific registry key, and write it to a registry file on disk (<random_integer>.reg).
This registry file (<random_integer>.reg) is then read from bytes 215:470 (see Figure 21). The purpose of this read is to find and store the Excel Security-related registry settings of interest that the malware will check.
This appears to be a check for the File Validation setting in this instance; Figure 22 shows the hexdump of an example (carved out the 255 bytes).
The macro then checks to see if the security setting is set, as shown in Figure 23.
This cluster expects to be executed on a particular day, as it uses the current day of the month in a part of the deobfuscation routine. Hardcoded integers (Figure 24) will be subtracted from the current day of the month, and the difference will then be passed to the CHAR function. These samples were created on April 10th, and are expected to be run on that same day. We don’t see much of this “day-of” technique used outside of this cluster. This may be due to the attackers being less successful with this wave, perhaps because the victims checked their email (opening the document) in the days following its intended execution day.
Each hard-coded integer is subtracted from the day of the month plus 7 (Figure 25), then passed to char.
Figure 26 shows the part of the deobfuscation routine where the values in column A are decoded and concatenated into one string.
If run on the incorrect day of the month, the XL4 macro will not deobfuscate and execute properly (Figure 27 shows an example of a failed deobfuscation).
Emulating this behavior in Python with the proper value showed the expected deobfuscation (See Figure 28).
Hardcoding the correct value (date) in the sheet allows the code to deobfuscate properly, as shown in Figure 29:
Like the previous cluster, this cluster must also be executed on a particular day, but a new check is added to query the font size and row height (Figure 30). GET.CELL is used for both of these checks, font size of the text in cell A1 (19), as well as row height (17). This is likely to check whether the spreadsheet has been tampered with.
The result of these operations is written to cell AA181, which is used repeatedly during the deobfuscation routine. If these dimensions are not precisely as the malware authors expected, the payload will not be deobfuscated properly, and the second stage download will not occur.
The new technique introduced in this cluster employs dozens of independent macrosheets (Figure 31), whereas all previous clusters used only one or two sheets. We believe this is intended to have the sample blend in with benign workbooks, which tend to have a higher number of macro sheets than the malicious samples do. It also may be used to slow down malware analysts who will have to identify where the payload lies across the 20+ macro sheets (Figure 32).
Another interesting characteristic is that instead of the typical Auto_Open name for the automatic execution of the payload, the malware author has named it Auto_Open22 (Figure 33), which still executes as a normal Auto_Open routine.
For the first time, we observed hidden names being leveraged to hide the starting point of Excel 4.0 macro code in a large cluster. This is likely an attempt to thwart static parsing and analysis, both by analysts and automated solutions. Figure 34 shows how the Name Manager shows no names.
This is achieved through setting the fHidden bit in the Lbl record for the defined name at offset 0x0. At offset 0x15 is the Name field with constant 0x1 for Auto_Open.
This value of 0x21 has the fHidden and fBuiltin flags set (Figure 35).
After flipping the hidden bit, and adding a character (“g”) to the first byte of the Name at the end of this record, the name is visible (Figure 36).
This cluster also leverages an interesting deobfuscation routine, which is a bit different from what we have seen in previous clusters. Although we have seen control-flow obfuscation through GOTO and RUN functions, this cluster adds a step through setting a value to a cell for later use in the routine (Figure 37): set cell value, jump to next code block, repeat.
This cluster introduces a few interesting evasion techniques by detecting window activity. These macros attempt to identify if the Excel window is hidden or maximized, through three different usages of GET.WINDOW() (Figures 38 and 39). If the Excel window is not maximized, the malware will exit prior to exhibiting any interesting behavior. This appears to be an anti-analysis or sandbox evasion trick, as Excel windows may not be maximized in these environments.
An additional evasion technique introduced in this malware is identifying if the malware is being debugged/analyzed. This is achieved through checking if the macro is being run in Single Step mode (Figure 40), which is a way to debug and step through Excel 4.0 macro code. If Single Step mode is detected, the malware will exit.
Almost all previously mentioned clusters leverage the CHAR function heavily during their deobfuscation routines to build the final macro payload character by character. This cluster changes the original technique by using the MID function instead of the CHAR function. The MID function is used to extract a substring (text) from a string by providing an index number for where to start, as well as the length of the text to extract. Figure 41 shows the MID function being used to extract a substring for each string in column B, and write it to column C.
Instances of this function are being concatenated and passed to the FORMULA function (Figure 42) to deobfuscate the final payload at runtime.
This cluster also leverages the FILES function (Figure 43). This function is used to obtain a list of files within a target directory. The malware author uses the FILES function along with ISERROR to check if the download succeeded (Figure 44).
Before downloading the next stage, the malware first checks to see it can access the Internet (Figures 45 and 46), by connecting to microsoft.com, and then checking that the requested download succeeded.
If this download is successful, the macro will then repeat this process to obtain the next malware stage, and then invoke it.
Excel 4.0 macros continue to prove their value to attackers, providing a reliable method to get their code to run on a target. In many environments, Excel worksheets with macros are used too heavily for legitimate business purposes to disable or blacklist, thus analysts and security vendors will have to get used to consistently updating tooling and signatures as attacks continue to evolve. Excel 4.0 macros provide a near endless list of possibilities for malware authors and are evolving, becoming more sophisticated each day.
Below a selection of samples for each of the aforementioned clusters.
James Haughom is a Malware Reverse Engineer at Lastline.Prior to Lastline, James supported Incident Response and Intel teams in the federal space as a contractor.This support included Malware Analysis/Reverse Engineering, DFIR, and Tool Development.
Stefano Ortolani is Director of Threat Intelligence at Lastline. Prior to that he was part of the research team in Kaspersky Lab in charge of fostering operations with CERTs, governments, universities, and law enforcement agencies. Before that he earned his Ph.D. in Computer Science from the VU University Amsterdam.