Open Source Intelligence (OSINT) can be defined as a set of techniques and tools for gathering public information about a company, an individual, etc., then analysing and correlating the collected data to turn it into intelligence.
In the "Information Gathering" phase of a pentest, OSINT tools can help the pentester in several ways:
- Technical information: Discover services, hosts, domains, subdomains, git repositories, credentials, social networks, operating system version, network diagrams, etc.
- Human resources: Gather information about a specific person from social networks, government records, telephone numbers, leaked credentials, metadata...
Ultimately, this information can enable several attacks:
- Social engineering
- Password brute-force attacks
- Target infiltration
- Account takeover
- Identity theft
This sub-phase consists of a chain of processes that give the pentester the intelligence required to perform the pentest. The steps are the following:
- 1. Requirements: What information do we need from the client as a starting point?
- 2. Sources of information: Which sources can provide us with trustworthy information?
- 3. Harvesting: Retrieve data from the identified sources.
- 4. Data processing: Format and process the collected data to obtain meaningful information.
- 5. Data analysis: Join data from multiple sources to produce intelligence.
- 6. Reporting: Create the final report.
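The data processing and data analysis steps above can be illustrated with a minimal Python sketch. It assumes harvested data arrives as `(source, email, field, value)` tuples; that layout, the function names and the sample source names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def process(raw_records):
    """Data processing: normalise raw (source, email, field, value) tuples."""
    cleaned = []
    for source, email, field, value in raw_records:
        email = email.strip().lower()
        value = value.strip()
        # Drop records that cannot be attributed or carry no value.
        if email and value:
            cleaned.append((source, email, field, value))
    return cleaned


def analyse(records):
    """Data analysis: join records from all sources into one profile per email."""
    profiles = defaultdict(lambda: defaultdict(set))
    for source, email, field, value in records:
        profiles[email][field].add(value)
        profiles[email]["sources"].add(source)
    # Convert the sets to sorted lists so the result is stable and printable.
    return {
        email: {field: sorted(values) for field, values in fields.items()}
        for email, fields in profiles.items()
    }
```

Keeping the source next to every value makes it easier to confirm a finding against a second source before it goes into the report.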
This section presents some tools and sources of information the pentester should consider during the OSINT phase. However, a pentester should not rely only on these tools and should stay aware of new ones that help obtain further information and confirm the collected data.
OSINT Framework is an interactive web page that contains a vast set of links to free and open-source tools for different purposes, such as domains, email addresses, usernames, telephone numbers, public records, social network profiles...
Note: Some sites require registration or payment to obtain extra data.
Search engines are an excellent option for passive reconnaissance because they hold a vast amount of previously indexed information.
Google dorking is a technique that uses Google's advanced search operators to gather precise data efficiently on any topic from any indexed website.
The Google search engine interprets search operators and commands that can retrieve sensitive information that was knowingly or unknowingly released on the Internet, such as credentials, configuration files, documents... However, Google Search results may vary depending on the location or device used, so adjust both to suit your target.
Because Google dorking can be pretty challenging, there are tools that help you create Google queries for collecting specific data.
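As a rough illustration of how a dork is put together, the sketch below joins free-text terms with `operator:value` pairs and wraps the result in a search URL. The helper names are hypothetical. `site`, `filetype` and `intitle` are commonly used Google operators, but the set Google honours can change over time.

```python
from urllib.parse import urlencode


def build_dork(terms=(), **operators):
    """Join free-text terms and operator:value pairs into one query string."""
    # Multi-word terms are quoted so they are searched as a phrase.
    parts = [f'"{term}"' if " " in term else term for term in terms]
    for name, value in operators.items():
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def google_search_url(query):
    """URL-encode the query for the standard Google search endpoint."""
    return "https://www.google.com/search?" + urlencode({"q": query})


dork = build_dork(["confidential"], site="example.com", filetype="pdf")
print(dork)
print(google_search_url(dork))
```

Here `example.com` stands in for the target domain agreed with the client.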
Google Hacking Database
Each dork is classified into one of these categories:
- Footholds: Searches that can provide a foothold on a server.
- Files containing usernames
- Sensitive directories: Searches for directories with sensitive information.
- Web server detection: Searches for web servers with a particular technology.
- Vulnerable files
- Vulnerable servers
- Error messages: Searches for error messages of a specific type.
- Files containing juicy info: Searches for files containing important information.
- Files containing passwords: Searches for files with passwords.
- Sensitive online shopping info: Searches for sensitive information from online shopping sites.
- Network or vulnerability data: Searches for specific network information or vulnerabilities.
- Pages containing login portals: Searches for pages containing particular login portals.
- Various online devices: Searches for specific online devices.
GitHub dorks are similar to Google dorks, but they run against GitHub's search. They are very handy when searching for sensitive files, API keys, passwords, hidden URLs, employees, etc.
To get started with GitHub dorks, begin with the GitHub documentation (Search on GitHub) or posts like GitHub Recon, Developers are unknowingly posting their credentials online, and GitHub for Bug Bounty Hunters. Some valuable examples follow; for more, see the GitHub-dorks and keywords lists.
filename:id_rsa or filename:id_dsa
extension:sql mysql dump
extension:sql mysql dump password
filename:.env DB_USERNAME NOT homestead
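The dorks above can be scoped to a target organisation and turned into search URLs with a small helper. This is only a sketch: `github_code_search_url` and `example-org` are hypothetical, and GitHub's search qualifiers change over time, so the legacy `filename:` and `extension:` qualifiers may need adapting to the current code search syntax.

```python
from urllib.parse import urlencode

# Dorks taken from the examples above.
DORKS = [
    "filename:id_rsa",
    "extension:sql mysql dump password",
    "filename:.env DB_USERNAME NOT homestead",
]


def github_code_search_url(dork, org=None):
    """Build a GitHub code search URL, optionally limited to one organisation."""
    query = f"org:{org} {dork}" if org else dork
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


for dork in DORKS:
    print(github_code_search_url(dork, org="example-org"))
```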
Shodan is a search engine for devices that are directly reachable from the Internet, such as cameras, traffic lights and power plants.
Furthermore, it offers several filters for finding devices with default credentials, vulnerable services, screenshots, cloud providers, location... However, the "tag" and "vuln" filters are not available on the free plan; they require an academic membership or a small business plan.
With Shodan, the pentester can write queries that look for physical assets belonging to the target company for a red team engagement, as well as vulnerable services exposed on the Internet that could provide a foothold in the company's IT infrastructure.
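A Shodan query is a free-text term followed by `filter:value` pairs. The sketch below assembles one and refuses the two filters that, according to the text above, are not available on the free plan. The helper name and the `paid_plan` flag are hypothetical. `org`, `port` and `country` are standard Shodan filters.

```python
# Filters the text above describes as unavailable on the free plan.
RESTRICTED_FILTERS = {"tag", "vuln"}


def shodan_query(text="", paid_plan=False, **filters):
    """Build a Shodan search string from free text and filter:value pairs."""
    blocked = RESTRICTED_FILTERS & set(filters)
    if blocked and not paid_plan:
        raise ValueError(f"filters need a paid or academic plan: {sorted(blocked)}")
    parts = [text] if text else []
    for name, value in filters.items():
        value = str(value)
        # Values containing spaces must be quoted to stay in one filter.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " ".join(parts)


print(shodan_query("webcam", org="Example Corp", port=554, country="ES"))
```

The resulting string can be pasted into the Shodan web interface or, assuming the official `shodan` Python package and an API key are available, passed to its `Shodan(api_key).search(query)` method.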
When conducting OSINT research in these fast-paced digital times, analysts often need access to historical versions of websites or content that no longer exists. This is where The Wayback Machine comes into play.
For instance, the Wayback Machine can help when you need historical versions of a website that has been deleted or replaced with new content. You may need to verify that a target previously worked at a company whose current site no longer lists the target’s information. A target may also intentionally hide information on their present website, so looking at older snapshots of the site may reveal new information. Older website versions can sometimes yield relevant data like names, phone numbers, email addresses, and even metadata.
- Quick Search Method: The quickest way to see all the files archived for a particular site is by accessing the following URL.
- Advanced Search Method: By directly visiting the archive advanced search page, the attacker can perform more targeted searches and sometimes find the email address associated with a user who uploaded a file. However, this requires you to register on the platform.
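Archived captures can also be listed programmatically. The sketch below is a separate illustration, not the URL referred to above: it assumes the Wayback Machine's public CDX server API, builds a query for every capture under a domain, and turns a JSON response into snapshot links. The function names are hypothetical, and no request is sent here.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_url(domain, limit=50):
    """Build a CDX query for captures whose URL starts with the domain."""
    params = {
        "url": f"{domain}/*",  # trailing wildcard requests a prefix match
        "output": "json",
        "fl": "timestamp,original,mimetype",
        "collapse": "urlkey",  # keep one capture per distinct URL
        "limit": limit,
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_links(cdx_json_text):
    """Convert a CDX JSON response (first row is the header) into snapshot URLs."""
    rows = json.loads(cdx_json_text)
    if not rows:
        return []
    header, *data = rows
    records = [dict(zip(header, row)) for row in data]
    return [
        f"https://web.archive.org/web/{r['timestamp']}/{r['original']}"
        for r in records
    ]
```

The response body can be fetched with any HTTP client and passed to `snapshot_links`. Filtering on the `mimetype` field is one way to focus on documents that may carry metadata.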
Harvesting email addresses gives an attacker more information to conduct social engineering and password brute-forcing attacks.
theHarvester is a command-line tool that comes preinstalled in Kali Linux. Its primary purpose is gathering email accounts, subdomain names, virtual hosts, open ports and banners, and employee names from different public sources using various search engines, whose variety has grown in recent years. In recent versions, the authors added active techniques such as DNS brute force, reverse IP resolution and Top-Level Domain (TLD) expansion.
A single simple command can return many results, and those results can complement the output of other tools like Maltego.
theHarvester -d <DOMAIN> -b <DATA SOURCE/all> [-l <NUMBER OF RESULTS>] [-f <OUTPUT FILE>]
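When theHarvester is driven from a script, the command above can be assembled as an argument list so that no value is interpreted by the shell. This is a minimal sketch using only the flags shown above. The helper name is hypothetical, and `duckduckgo` is assumed to be an available data source, which depends on the installed version.

```python
import shlex


def harvester_command(domain, source="all", limit=None, output_file=None):
    """Build the theHarvester argument list from the flags shown above."""
    argv = ["theHarvester", "-d", domain, "-b", source]
    if limit is not None:
        argv += ["-l", str(limit)]
    if output_file:
        argv += ["-f", output_file]
    return argv


cmd = harvester_command(
    "example.com", source="duckduckgo", limit=200, output_file="example_report"
)
# Print the command for review; subprocess.run(cmd) would execute it.
print(shlex.join(cmd))
```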
Maltego is one of the most powerful open source intelligence tools on the market. It is characterised by its intuitive handling and by its graph-based representation, which connects pieces of information for investigative tasks.
Maltego maps the relationships between pieces of information, called Entities, that result from running transformations.
- Entities are bits of information that we have obtained from a data source (for example a physical location, a website, a company name, an email address, a person’s name or a telephone number).
- Transformations are small pieces of code that fetch related information for a given input and format the results to be returned as Entities to Maltego.
However, be careful when running transformations, because they can escalate quickly, overloading you with information and producing a gigantic graph full of useless data.
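The entity and transformation model, and the way a graph can balloon, can be pictured with a toy sketch. It does not use Maltego's API. The two transformations are hypothetical stand-ins that return made-up entities, and `max_depth` is an assumed safeguard that caps how far the graph expands.

```python
from collections import deque


# Hypothetical transformations: each maps one entity to related entities.
def domain_to_emails(entity):
    kind, value = entity
    if kind != "domain":
        return []
    return [("email", f"{name}@{value}") for name in ("alice", "bob")]


def email_to_person(entity):
    kind, value = entity
    if kind != "email":
        return []
    return [("person", value.split("@")[0].title())]


def expand(seed, transforms, max_depth):
    """Breadth-first expansion of an entity graph, capped at max_depth."""
    seen = {seed}
    edges = set()
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue  # entities at the depth limit are not expanded further
        for transform in transforms:
            for found in transform(entity):
                edges.add((entity, found))
                if found not in seen:
                    seen.add(found)
                    queue.append((found, depth + 1))
    return seen, edges
```

With real transformations each level can multiply the number of entities, which is why limiting depth, or running transformations selectively, keeps the graph readable.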
Once the pentester has obtained information about a user, such as an email address or phone number, they should check on sites like haveibeenpwned.com whether any of the user's accounts were compromised in a data breach, meaning the email and password for that site’s account have been exposed to cybercriminals.
A pentester could obtain these credentials by different means and then check whether employees reuse passwords across multiple accounts in the company's environment.
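The breach check can also be scripted. The sketch below only prepares the HTTP request and sends nothing. It assumes the Have I Been Pwned v3 `breachedaccount` endpoint, which requires a paid API key sent in the `hibp-api-key` header together with a user agent. The helper name and the default user agent string are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

HIBP_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def hibp_breach_request(account, api_key, user_agent="osint-notes-example"):
    """Prepare (but do not send) a breach lookup for one account."""
    # The account is URL-encoded so that '@' is safe inside the path.
    url = HIBP_BASE + quote(account, safe="")
    headers = {"hibp-api-key": api_key, "user-agent": user_agent}
    return Request(url, headers=headers)


req = hibp_breach_request("alice@example.com", api_key="YOUR_API_KEY")
print(req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the lookup. A 404 response is expected when the account does not appear in any known breach.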
Recon-NG is a CLI framework, installed on Kali Linux, for conducting open-source web-based reconnaissance. It can be used for finding SQL injections, obtaining IPs, performing IP lookups and port scans, gathering subdomain information, etc.
Getting the most out of it requires the API keys that some modules need. However, some of those keys are subscription-based.
To set an API key, run the following command.
keys add <NAME> <VALUE>
You will also find that no modules are installed by default, but they are easy to install with these commands.
List all modules in the marketplace:
marketplace search [<CATEGORY>]
Install an individual module:
marketplace install <RELATIVE PATH MODULE>
Install all modules in a category:
marketplace install <RELATIVE PATH CATEGORY / all>
Remove a module:
marketplace remove recon/ports-hosts/ssl_scan
Note: Some of the modules will require Python dependencies to be installed outside of Recon-NG. Modules with external dependencies will have an asterisk in the D column of Marketplace results, and those requiring an API key will have an asterisk in the K column. In both cases, Recon-NG will warn you about missing dependencies and API keys after installation.
Then, to run any module, follow these steps:
- 1. If you want a closer look at what a module does, use the marketplace info command followed by the module path.
marketplace info <RELATIVE MODULE PATH>
- 2. Once you have decided which module to use, load it with:
modules load <RELATIVE MODULE PATH>
- 3. Find any module prerequisites.
- 4. Set the options.
options set <OPTION> <VALUE>
- 5. Execute the module.
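The steps above can be chained in a plain text file of console commands. The sketch below generates such a file. It assumes Recon-NG's `run` command for executing the loaded module and its `-r` flag for loading a resource file. The module path `recon/domains-hosts/hackertarget` and the `SOURCE` option are illustrative assumptions and should be checked against the marketplace listing.

```python
def recon_ng_script(module, options, install=True):
    """Generate Recon-NG console commands for one module run."""
    lines = []
    if install:
        lines.append(f"marketplace install {module}")
    lines.append(f"modules load {module}")
    for name, value in options.items():
        lines.append(f"options set {name} {value}")
    lines.append("run")   # assumed command for executing the loaded module
    lines.append("exit")
    return "\n".join(lines) + "\n"


script = recon_ng_script(
    "recon/domains-hosts/hackertarget", {"SOURCE": "example.com"}
)
print(script)
```

Saving the output to a file such as `example.rc` and starting the framework with `recon-ng -r example.rc` should replay the commands, assuming the installed version supports resource files.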
An unsecured subdomain can lead to severe risks, so the pentester needs to check them. Below are some useful tools for obtaining subdomains that do not require direct access to the client's infrastructure.
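Different tools usually return overlapping lists of subdomains in slightly different formats. A minimal sketch for merging them is shown below. The function name and the sample inputs are hypothetical. It lower-cases names, strips trailing dots and wildcard prefixes, and keeps only names that belong to the target domain.

```python
def merge_subdomains(domain, *result_lists):
    """Normalise and deduplicate subdomain candidates from several tools."""
    domain = domain.lower().rstrip(".")
    found = set()
    for results in result_lists:
        for name in results:
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # drop wildcard prefixes such as *.dev.example.com
            # Keep the domain itself and true subdomains only.
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


merged = merge_subdomains(
    "example.com",
    ["WWW.example.com", "*.dev.example.com"],
    ["www.example.com.", "evil-example.com", "mail.example.com"],
)
print(merged)
```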