Website investigation in OSINT
Today we will talk about what information can be extracted from a web resource for later use in OSINT, which services are available for this, and what specific information each of them can dig up for us. These techniques will also be useful to ordinary users with no connection to IT, as they help to judge at a glance how trustworthy a site is before you enter your bank card details on it.
DISCLAIMER: This article is for informational purposes only and is not intended to be a guide to misconduct or educational material for covering up wrongdoing.
Website research within OSINT can serve a number of purposes:
- identification of owners or administrators
- monitoring price changes
- competitor analysis
- SEO and keyword analysis, etc.
The analysis of any website begins with obtaining WHOIS registration data.
There are many WHOIS services, so here are just a couple of the popular ones:
https://whois.domaintools.com/
https://whoer.net/en/checkwhois
WHOIS services show the domain's registration dates, its IP address, owner, registrar, and their contacts.
Of course, if an online store promises you fabulously low prices and claims “WE HAVE BEEN ON THE MARKET FOR 10 YEARS!”, but its domain was registered 2–3 days ago, you should ask yourself whether it might be a scam.
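The domain age check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the WHOIS server name (whois.verisign-grs.com, which serves .com and .net) and the "Creation Date:" field format are not taken from this article, other registries use different servers and field names, and the helper names are hypothetical.

```python
import re
import socket
from datetime import datetime, timezone
from typing import Optional

# Assumption: Verisign-style records with a "Creation Date: YYYY-MM-DD..." line.
CREATION_RE = re.compile(r"Creation Date:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)


def whois_query(domain: str, server: str = "whois.verisign-grs.com") -> str:
    """Send a raw WHOIS query over TCP port 43 and return the reply text."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def domain_age_days(whois_text: str, today: Optional[datetime] = None) -> Optional[int]:
    """Return the domain age in days, or None if no creation date is found."""
    match = CREATION_RE.search(whois_text)
    if match is None:
        return None
    created = datetime.strptime(match.group(1), "%Y-%m-%d").replace(tzinfo=timezone.utc)
    today = today or datetime.now(timezone.utc)
    return (today - created).days
```

Calling domain_age_days(whois_query("example.com")) would perform a live lookup; a result of only a few days for a shop that claims years of history is the warning sign discussed above.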
Now for the bad news. After the introduction of the GDPR, the personal data of domain name owners is no longer published openly. To search through old WHOIS records that still contain personal data, I can recommend a couple more useful resources:
https://osint.sh/whoishistory/
https://drs.whoisxmlapi.com/whois-history
More bad news. As you know, there is a company called Cloudflare whose services additionally make it possible to hide information about website owners.
Still, this anonymity can sometimes be broken. To do this, I use an open source tool called CrimeFlare: https://github.com/zidansec/CloudPeler.
It is also available through a web interface at https://crimeflare.herokuapp.com. CrimeFlare does one small but important thing: it tries to find the real IP address of a site hidden behind Cloudflare.
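Before trying to unmask a site, it can help to confirm that the IP address it resolves to really belongs to Cloudflare. The sketch below is not part of CrimeFlare; it simply checks an address against a partial, possibly outdated snapshot of the IPv4 ranges Cloudflare publishes. The range list and the function name are assumptions for illustration, so verify the current list before relying on the result.

```python
import ipaddress

# Assumption: partial snapshot of Cloudflare's published IPv4 ranges.
# The list changes over time; refresh it from Cloudflare before real use.
CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "104.16.0.0/13",
        "172.64.0.0/13",
        "173.245.48.0/20",
        "162.158.0.0/15",
        "198.41.128.0/17",
    )
]


def is_cloudflare_ip(ip: str, ranges=CLOUDFLARE_RANGES) -> bool:
    """Return True if the address falls inside one of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)
```

If the check returns True, the address you see is a Cloudflare proxy rather than the origin server, and tools of the CrimeFlare kind become relevant.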
Here are a couple more resources where leaked passwords from various online services can turn up. The site you are researching may be among those services:
As we all know, a website is more than just a domain name. The site must be physically located somewhere, and that location is its hosting.
The hosting data of a particular site can also be found in open sources:
https://www.whoishostingthis.com/
https://hostadvice.com/tools/whois/
Let’s move on to the content posted on the site and look at its analytics counters and advertising identifiers.
The following services will show you which advertising and tracking technologies (and more) are present on the site:
https://themarkup.org/blacklight
https://pagexray.fouanalytics.com/
Alternatively, you can open the source code of the web page and search it manually for advertising identifiers using the following keywords:
AdSense: Pub- or ca-pub
Analytics: UA-
Amazon: &tag=
AddThis: #pubid / pubid
Metrika: mc.yandex / ym
Rambler: top100
Mail.ru: Top.Mail.Ru
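The manual keyword search above can be automated with a few regular expressions. The sketch below covers only some of the listed identifiers, and the exact patterns (digit counts, the ym( call form) are assumptions based on how these snippets commonly look, so expect to tune them for real pages. The function name is hypothetical.

```python
import re

# Assumed patterns for a few of the identifiers listed above.
PATTERNS = {
    "adsense": re.compile(r"\b(?:ca-)?pub-\d{10,20}\b"),
    "google_analytics": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "amazon_tag": re.compile(r"[?&;]tag=([A-Za-z0-9_-]+)"),
    "yandex_metrika": re.compile(r"(?:mc\.yandex\.ru/watch/|ym\()\s*(\d+)"),
}


def find_tracker_ids(html: str) -> dict:
    """Return {service: set of identifiers} for every pattern that matches."""
    found = {}
    for name, pattern in PATTERNS.items():
        ids = set()
        for match in pattern.finditer(html):
            # Use the captured group when the pattern has one, else the whole match.
            ids.add(match.group(1) if pattern.groups else match.group(0))
        if ids:
            found[name] = ids
    return found
```

Feed it the raw page source (for example, saved from the browser's "view source") and it returns the identifiers that can then be used for the lookups described next.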
If we have found the ID of one of these counters, we can try to open its public statistics. To do this, substitute the counter ID into one of the following links:
https://metrika.yandex.ru/dashboard?id=ENTER_ID
https://top100.rambler.ru/search?query=ENTER_ID
https://top.mail.ru/visits?id=ENTER_ID
Public statistics are interesting because the first recorded visitor is usually the administrator who installed the counter on the web resource, and that may well be the owner of the site. In that case we get information about their gender, age, and city of residence. You can try this on the following Yandex Metrika counter: https://metrika.yandex.ru/dashboard?id=55694881.
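The link templates listed above can be filled in programmatically once a counter ID is known. This is a minimal sketch: the base URLs and parameter names come from the links in this article, while the dictionary keys and the function name are hypothetical.

```python
from urllib.parse import urlencode

# Base URL and query parameter for each service, taken from the links above.
STATS_URLS = {
    "yandex_metrika": ("https://metrika.yandex.ru/dashboard", "id"),
    "rambler_top100": ("https://top100.rambler.ru/search", "query"),
    "mail_ru": ("https://top.mail.ru/visits", "id"),
}


def public_stats_url(service: str, counter_id: str) -> str:
    """Build the public statistics link for a counter ID found in the page source."""
    base, param = STATS_URLS[service]
    return f"{base}?{urlencode({param: counter_id})}"
```

Whether the page actually opens depends on the site owner having left the statistics public.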
In addition, counter IDs and advertising identifiers let you find other sites on the network that use the same ID in their code. To do this, you can use the following services:
https://intelx.io/tools?tab=adsense
https://dnslytics.com/reverse-analytics
Reverse search for additional (affiliated with the checked) web resources is possible not only by advertising identifiers.
You can reverse lookup the website hosting IP address using the following resources:
https://2ip.ru/domain-list-by-ip
https://www.cy-pr.com/tools/oneip
https://hackertarget.com/reverse-ip-lookup
https://mxtoolbox.com/reverselookup.aspx
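A reverse IP lookup can also be scripted. The sketch below resolves a domain to its IP address and asks HackerTarget's plain-text API for other hostnames on that address. The endpoint URL and its newline-separated response format are assumptions not stated in this article, free usage is rate limited, and the function names are hypothetical.

```python
import socket
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: plain-text API that returns one hostname per line.
API = "https://api.hackertarget.com/reverseiplookup/"


def parse_reverse_ip(body: str) -> list:
    """Turn a newline-separated reply into a sorted list of unique hostnames."""
    hosts = {line.strip().lower() for line in body.splitlines()}
    # Skip blank lines and anything that looks like an error message.
    return sorted(h for h in hosts if h and " " not in h and "." in h)


def domains_on_same_ip(domain: str) -> list:
    """Resolve the domain, then list other hostnames reported for that IP."""
    ip = socket.gethostbyname(domain)
    with urlopen(f"{API}?{urlencode({'q': ip})}", timeout=15) as resp:
        return parse_reverse_ip(resp.read().decode("utf-8", errors="replace"))
```

On shared hosting the list can contain hundreds of unrelated sites, so treat a match as a lead to verify rather than proof of affiliation.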
You can also reverse search by matching related email addresses:
https://2ip.ru/domain-list-by-email
You can also reverse search the site’s SSL certificate:
https://www.ssllabs.com/ssltest
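The certificate itself often lists several hostnames in its Subject Alternative Name field, and these can point to related sites. The following sketch uses only the Python standard library to fetch a certificate and extract those names; it is an illustration with hypothetical function names, not a replacement for the service above.

```python
import socket
import ssl


def fetch_certificate(host: str, port: int = 443) -> dict:
    """Connect over TLS and return the validated peer certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def san_hostnames(cert: dict) -> list:
    """Return the sorted DNS names from the certificate's subjectAltName field."""
    names = {
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    }
    return sorted(names)
```

Calling san_hostnames(fetch_certificate("example.com")) performs a live connection; certificates issued by a CDN may list hostnames of unrelated customers.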
We now turn to collecting contacts. Part of the owner’s contact information is published on the site itself or provided when the domain name is registered. Let’s try to find as many contacts (email addresses) as possible using services such as:
https://2ip.ru/domain-list-by-email
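For addresses published on the site itself, a simple regular expression over the page source is often enough. The pattern below is a deliberately loose assumption, not a full email grammar: it can produce false positives (for example, image names such as logo@2x.png), and it will miss addresses that the site obfuscates. The function name is hypothetical.

```python
import re

# Loose pattern: local part, "@", domain, and a TLD of two or more letters.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text: str) -> list:
    """Return the sorted, de-duplicated, lowercased addresses found in the text."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})
```

The addresses found this way can then be fed into the reverse search by email mentioned above.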
Let’s move on to DNS analysis.
DNS (Domain Name System) is used to resolve a hostname to an IP address, to obtain mail routing information, and to locate the servers that handle particular protocols for a domain. This data will also be useful when studying a website:
https://hackertarget.com/dns-lookup/
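DNS records can also be queried from a script without extra libraries by using a DNS-over-HTTPS JSON endpoint. The sketch below assumes Google Public DNS (https://dns.google/resolve), a service not mentioned in this article, and its JSON layout with an "Answer" list; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Google Public DNS JSON API; other resolvers use other endpoints.
DOH_ENDPOINT = "https://dns.google/resolve"


def parse_answers(payload: dict) -> list:
    """Extract the record data strings from a decoded JSON reply."""
    return [answer["data"] for answer in payload.get("Answer", [])]


def lookup(name: str, rtype: str = "A") -> list:
    """Query one record type (A, MX, TXT, NS, ...) for the given name."""
    url = f"{DOH_ENDPOINT}?{urlencode({'name': name, 'type': rtype})}"
    with urlopen(url, timeout=10) as resp:
        return parse_answers(json.load(resp))
```

For example, lookup("example.com", "MX") would return the mail routing records, which often reveal the email provider a site owner uses.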
Historical DNS data can be viewed at the link:
Old copies of web pages and sites (web archives) are also useful in the investigation. You never know what they wrote on the site a couple of years ago. Links to popular web archives below:
Useful collections of additional services for studying websites can be found here:
https://hackertarget.com/ip-tools
https://abhijithb200.github.io/investigator
And for dessert, comprehensive services for investigating web resources. Let’s start with SpiderFoot, a modular service that allows 3 free checks per month.
Maltego, with its free modules and a bunch of free APIs, should not be forgotten either. It is quite a cool tool for an investigator.
Finally, I’ll cover studying a site from a marketer’s point of view. Here we are interested in the site’s CMS, which can be identified with resources such as:
https://linkonavt.ru/services/sitetechnologies
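Many CMSs announce themselves in a meta tag named "generator", so a quick first check is possible without any service. The sketch below reads that tag with the standard library HTML parser. It is only one heuristic: sites often remove the tag, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class GeneratorParser(HTMLParser):
    """Remember the content of the first <meta name="generator"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.generator = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.generator is None:
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "generator":
                self.generator = attributes.get("content")


def detect_generator(html: str) -> Optional[str]:
    """Return the generator string (often CMS name and version), or None."""
    parser = GeneratorParser()
    parser.feed(html)
    return parser.generator
```

A missing tag does not mean there is no CMS; it only means this particular hint is absent.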
We will certainly also be interested in the site’s traffic statistics, its keywords, and other advertising details. They can be obtained from the following services:
Well, that’s all for today. I hope this compilation has been helpful to you. Subscribe so you don’t miss new articles. See you again.
Join my Medium blog https://medium.com/@ibederov_en, Facebook https://www.facebook.com/ibederov.en/ or Telegram https://t.me/ibederov_en!