Like G.I. Joe says – “Knowing is half the battle!”
Every once in a while I come across a site that is known, or at least suspected, to have some nefarious code in it, whether it arrived by way of spam or some other link source. One way of inspecting these sites without worrying about any threats is to download the source code without actually viewing the page in your web browser. You can do this with something like Wget and then look at the files offline, but in doing so you hand the website your IP address, which makes you another potential target for attack. That might sound a bit paranoid, but I can assure you that half the reason they send spam links is to get a live person to click through to their sites. The goal is not just to infect visitors and take over their machines via code in the page, but also to earn money from click-through traffic and to harvest IP addresses for later attacks, such as worms that try to hit every unpatched machine they learn about. So Wget is great for downloading the source, but it still leaves the chance that you will become a target via your IP address. If you are behind a router and firewall and fully patched, their chances of getting in are much lower, but if they know of some 0-day exploit, you won’t know what hit you!
Services like proxies can hide your IP address, but they are not without drawbacks. You would have to vet any public proxy first, because a lot of them are run by untrusted third parties who themselves want to harvest your IP address and monitor what you do online: which sites you visit and log into, and the usernames and passwords you send through them. That matters most for people who use these services to reach something like, oh, I don’t know, Myspace, Facebook, Twitter, etc. while at school or work. Proxies used to be a way around filters for a lot of people, but it has become widely known that most public proxies cannot be trusted, and you will only end up doing yourself more harm than good by using them.
Paid proxies and paid VPNs are alternatives, but who really wants to pay money every month for something they may only use once in a blue moon, whether to watch something like Hulu from outside the US or to inspect sites like we are about to do below?
Then it occurred to me that there are other ways to inspect these sites and get all their source code without giving them your IP address. Have you ever heard of the W3C Validator? No? Well, it lets web designers (and anyone else) work out the bugs in the source code of their websites, fixing coding mistakes and making their pages valid HTML or XHTML markup. It can also be used as a sort of third-party downloader, or passive proxy, to get the source code of any website that doesn’t specifically block its user agent and IP address. By going to http://validator.w3.org you can plug in the address of any site you want to check for valid markup. There is also an option called Show Source. This is good because you can inspect the source code of the website and see what scripts, iframes, and any nefarious attempts to hijack a visitor’s browser are in there, without having to go to the page itself and without having to give up your IP address in the process. I find this a great tool for inspecting those spam links you may receive in an email or even on things like your Twitter account. Inspect them BEFORE deciding to follow them and you will be much better off knowing what lies ahead. This also works for shortened links such as Bitly and TinyURL, and it shows you the actual address these links lead to without your having to visit them!
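If you check links often, the validator address can be built rather than typed. Here is a minimal sketch, assuming the validator accepts a "uri" query parameter on a "/check" path and that "ss=1" turns on Show Source. Both come from my memory of the validator's documentation rather than from anything above, so verify them before relying on this. The function name is made up for illustration.

```python
from urllib.parse import urlencode

# Base address from the article; the "/check" path and the "uri"/"ss"
# parameter names are assumptions to verify against the validator's docs.
VALIDATOR_BASE = "http://validator.w3.org/check"


def validator_url(target, show_source=True):
    """Build a validator link for `target` without contacting any server."""
    params = {"uri": target}
    if show_source:
        params["ss"] = "1"  # assumed to be the Show Source switch
    return VALIDATOR_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # www.somesite.com is the article's placeholder host.
    print(validator_url("http://www.somesite.com/index.html"))
```

Building the link touches nothing on the network; the suspect site is only contacted, by the validator rather than by you, once you open the resulting address.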
Now, the validator won’t let you view the contents of external scripts referenced by something like <script src="somelinktosomefile.js">, but you will at least see how many of these external scripts are running on the page.
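Counting those external scripts by eye gets tedious on a big page. Once you have the source saved as text, something like the following can list them for you. This is only a sketch using Python's standard html.parser module; the class and function names are my own, and the sample markup is invented for the demonstration.

```python
from html.parser import HTMLParser


class ExternalResourceLister(HTMLParser):
    """Collect the src of every <script> and <iframe> tag that has one."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        src = dict(attrs).get("src")
        if not src:
            return  # inline script or frame without a src attribute
        if tag == "script":
            self.scripts.append(src)
        elif tag == "iframe":
            self.iframes.append(src)


def list_external_resources(html_text):
    """Return (script_srcs, iframe_srcs) found in a saved page source."""
    parser = ExternalResourceLister()
    parser.feed(html_text)
    parser.close()
    return parser.scripts, parser.iframes


if __name__ == "__main__":
    # Invented sample page; paste real saved source here instead.
    sample = """
    <html><head>
    <script src="/somepath/js/somefile.js"></script>
    <script>var inline = 1;</script>
    </head><body>
    <iframe src="http://www.somesite.com/frame.html" width="0" height="0"></iframe>
    </body></html>
    """
    scripts, iframes = list_external_resources(sample)
    print("scripts:", scripts)
    print("iframes:", iframes)
```

It only reads text you already have, so nothing is fetched and nothing in the page is executed.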
Getting the actual scripts themselves requires other methods of retrieval. One of these tricks can get you both the source code of the page and the scripts themselves. An old trick a lot of people use when testing their networks and web servers for connectivity issues is Telnet. It is well known that you can use Telnet to connect to any open TCP port on a web server. This opens a direct connection to the server, which can then be used to display things like web page source code right in the telnet session window. For example, from the command line in Windows, you can type:
telnet www.somesite.com 80
and it will open a direct connection to that site on port 80 (the standard port for HTTP). Now you can structure your request for information, that information being the contents of any address you feed it (including images and executables), so long as it is publicly readable over the web. You do this with the proper syntax of an HTTP request. The format looks similar to the following, and is basically what your browser sends when it reaches a website. Each of these goes on its own line, and they can be copied and pasted together:
GET /index.html HTTP/1.1
host: www.somesite.com
After you paste that into the window (it is OK if you don’t see the text you pasted), hit Enter once or twice and you will see a bunch of text scroll across your screen. That text is the server’s response headers followed by the source code of the page or file you requested. Remember those external scripts, like <script src="some-url.js">? You can plug those in as well; you just have to structure your request to do so:
GET /somepath/js/somefile.js HTTP/1.1
host: www.somesite.com
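Working out the right path and host for a script can be fiddly when the src is relative or points at a different server. The sketch below shows one way to turn a page address plus a src value into the two request lines. The function name is hypothetical, and it assumes plain HTTP on port 80, since a telnet session cannot speak HTTPS.

```python
from urllib.parse import urljoin, urlsplit


def request_lines_for(page_url, src):
    """Return (host_to_connect_to, request_text) for a script src on page_url."""
    absolute = urljoin(page_url, src)  # handles relative and absolute src values
    parts = urlsplit(absolute)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Same two lines as typed into telnet above.
    request_text = "GET {} HTTP/1.1\nHost: {}\n".format(path, parts.netloc)
    return parts.hostname, request_text


if __name__ == "__main__":
    host, text = request_lines_for(
        "http://www.somesite.com/index.html", "/somepath/js/somefile.js"
    )
    print("telnet", host, 80)
    print(text)
```

If the src points at another server entirely, the returned host changes with it, so you would open the telnet session to that server instead.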
Now, if this scrolls off the screen and you can’t see all of the file, chances are you need to increase the buffer size of your CMD window. Linux users shouldn’t have this issue, since the terminal usually lets them scroll back a pretty large number of lines and the session can also be logged, but Windows users on the simple built-in telnet in XP will want to edit their CMD window settings for a larger buffer of lines.
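Another way around the scrollback limit is to send the same request from a short script and save the reply to a file. What follows is a rough sketch using Python's standard socket module, not a polished tool. The function names and the output file name are my own, and the "Connection: close" header is an addition that is not in the request above; it asks the server to hang up when it is finished so the script knows when to stop reading. Like telnet, this connects directly from your machine, so the site still sees your IP address.

```python
import socket


def build_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",  # ask the server to hang up when it is done
        "",
        "",  # the blank line that ends the request headers
    ]
    return "\r\n".join(lines).encode("ascii")


def split_response(raw):
    """Split a raw HTTP response into (header_text, body_bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body


def fetch_raw(host, path="/", port=80, timeout=10):
    """Send the request over a plain TCP socket and return every byte received."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Example use (www.somesite.com is the article's placeholder host):
#
#     raw = fetch_raw("www.somesite.com", "/index.html")
#     headers, body = split_response(raw)
#     print(headers)
#     with open("page_source.txt", "wb") as out:
#         out.write(body)
```

The body is written out exactly as received and never rendered or executed. Some servers send it in chunked encoding, in which case you will see chunk sizes mixed in with the source; this sketch does not decode that.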
One thing I noticed about using Telnet is that sometimes the request shows up in the web logs and sometimes it doesn’t. I’m not 100% sure why that is. My guess is that a raw HTTP request with no browser user agent is not always recorded, depending on how the server’s logging is set up. That would be nice, since your IP address would not end up in the logs, but don’t count on it: the server sees your IP address on every TCP connection, and a properly configured server will log it for any file you request.
So, by using these two methods we can either (1) inspect a web page’s source code without visiting it or giving up our IP address, or (2) give them our IP address but never load the page in a browser, safely viewing the page and any scripts from within a telnet session.

