Saturday, 30 May 2009

Google Wave Security

Have Google thought about security or got caught up in a Wave of enthusiasm?

The big advantage of email has always been its platform independence; it doesn't matter if I'm using a Vista machine, a MacBook Air or my mobile phone, I can always send and receive email. Google have argued that this is a 40-year-old technology that is ripe for an update. This is true, but if Google want widespread adoption of Wave, they need two things. Firstly, it needs to be open and platform independent - they have announced that it will be open-source, so that checks out. Although they say that they want to give something back to the community, this project would fall flat if the only way to communicate was via their web app that people have to sign up to. I don't need to sign up to Google to get my email or to receive Gmail messages. I have choice. This is critical to the success of the new platform, but, as my wife says, it's always good to turn a necessity into a virtue.

The second thing that Google need is to have security built-in to the architecture from the start. If this is to be used by businesses, rather than become just another social media site, then it must support authentication, message integrity and confidentiality.
  • Login = authentication + confidentiality?

  • Can I forward message?

  • Is message encrypted at rest?

  • Can it be signed?

  • Can I change the attachment of another user?

  • Digital Leakage

  • Can I add another user to the thread?

  • Can it be read-only & non-forwardable?

Thursday, 28 May 2009

Obstacle Poker for Security Assessments

I was talking to Vic Page, a colleague of mine, today about various things and he told me about one of his PhD deliverables on 'Three Card RAG' or 'Obstacle Poker' - a very interesting concept that he has come up with. During our discussion, we came up with an embryonic idea for using his concept to aid security assessments of organisations. Before I can describe it I need to extract a few concepts from his 10-page paper in the BT Technology Journal entitled "Security risk mitigation for information systems".

In the paper he describes the Security Obstacle Mitigation Model (SOMM) for developing trustworthy information systems. He also defines some terminology - a subset of which I have reproduced here (his paper is an interesting read):

  • mitigation - a mitigation is a procedure that will counter the effect of an obstacle

  • obstacle - an obstacle is something that, should it occur, will obstruct a trust assumption and affect the CIAA security requirements - obstacles are caused by malicious or inadvertent use of the operational information system

  • RAG code - RAG codes form an intuitive 'traffic light' approach to ranking the ability for load-bearing and the vulnerability of a trust assumption (RAG is an acronym for Red, Amber and Green). R signifies stop and mitigate, A signifies proceed with caution and G signifies continue with trust

So how is this relevant to security assessments in particular, rather than information systems in general? Well, it is the way he is applying it that becomes interesting for the security assessment arena. He has moved this concept into the field of agile development, taking the Scrum practice of planning poker and making it more accessible for other fields.

Now a quick background to the security assessment field is in order. The majority of security assessment strategies are technology focused and concentrate on system evaluation and tactical issues. This can lead to point solutions and, therefore, gaps between countermeasures. Obviously we need the vulnerability assessments and information system audits, but these should follow an Information Security Risk Evaluation. Schemes like OCTAVE, although big, do follow an organisational evaluation, focused on security practices and strategic issues. The top-down approach is driven by business issues and objectives, leading to protection of what needs it and awareness of risk. Most other, bottom-up security practices start with individual components and computing infrastructure, seeking the technological 'silver bullet', which leads to protecting what can be protected rather than what needs to be protected. Of course this is an ongoing cyclical process.


So, what is obstacle poker and how do we incorporate it into an information security risk evaluation? Well, the principle is that representatives of all roles must be included in an information security risk evaluation for it to meet the needs of the organisation. Too often security assessments are left to the ICT departments and technologists, who don't necessarily know the business processes at play. Unfortunately, the people working in the organisation who aren't part of the ICT team usually have little or no knowledge of the technical landscape or what's possible/feasible. It needs everyone at the table and they need to speak a common language. Interpreters, i.e. consultants, can be brought in to help, but there still needs to be a simple way to converse.

The idea of 'Three Card RAG' is that an issue is discussed and then all the members of the team rate it with the RAG scale given above. This is done by issuing each member of the team three cards - one Red, one Amber and one Green. Each member picks a card and then everyone reveals their cards at once. If everyone agrees, then that is considered the state of play and actions are taken accordingly. If they do not agree then further discussion can be entered into and another round completed. The idea is to come to a consensus between the technical experts and the users of the system quickly and efficiently. The more obvious issues will be dealt with very quickly, while those requiring more discussion will be given a fair hearing.
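A single round of the card game can be sketched in a few lines of Python. This is only an illustration of the reveal-and-compare step described above; the function name `rag_round` and the role names in the usage note are made up for the example and do not come from Vic's paper.

```python
from typing import Optional

CARDS = ("red", "amber", "green")

def rag_round(votes: dict[str, str]) -> Optional[str]:
    """Return the agreed card if everyone revealed the same one, else None.

    `votes` maps a participant (hypothetical role names) to the card they
    revealed. None means "no consensus yet: discuss and play another round".
    """
    for person, card in votes.items():
        if card not in CARDS:
            raise ValueError(f"{person} played an unknown card: {card!r}")
    revealed = set(votes.values())
    return revealed.pop() if len(revealed) == 1 else None
```

For example, `rag_round({"ict": "red", "finance": "amber"})` returns `None`, so the team talks it through and votes again, whereas `rag_round({"ict": "red", "finance": "red"})` returns `"red"` and the issue is mitigated first.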

This principle can be used in information security risk evaluation by identifying the threat ratings and consequences of risks. We all know that a Threat + a Vulnerability = Risk to an Asset, but how can we categorise these when ICT and management departments can be approaching this from very different standpoints? Is the system that seems critical to the ICT department actually critical to the business or not?

Threat ratings have the following defined ranges: negligible, very low, low, medium, high, very high and extreme. They are quite well defined in terms of how often they are likely to occur, from unlikely, through likely to occur every 6 months or less for medium, to likely to occur multiple times each day for extreme. These threat ratings must be ICT led, but not exclusively controlled. If everyone holds a green card then it can be classified as very low or negligible. Amber would denote medium and red would denote a very high or extreme threat rating. Mixtures of cards can result in the low or high ratings being used if there is still no total agreement after a few rounds. The consequences are where the ICT department may step back a little. These are defined by the Australian Government's Defence Signals Directorate as:

  • insignificant - can be dealt with by normal operations
  • minor - could threaten the system’s efficiency or effectiveness but can be dealt with internally
  • moderate - does not threaten the system, but could cause major review and modification of operating procedures
  • major - threatens the continuation of basic functions of the system and requires senior-level management intervention
  • catastrophic - threatens the continuation of the system and causes major problems for the organisation and customers

This is where Three Card RAG really comes into its own. Getting agreement on this scale can be very difficult in organisations. However, with a simple traffic light scheme, everyone understands the system and agreement can be reached much more easily between all the stakeholders. Simply put, if it's red it's a priority, amber is secondary and green can be safely ignored until the next iteration (N.B. not ignored completely or never discussed again, just not dealt with in the current round of the cycle).

Saturday, 23 May 2009

Security Through Obscurity

I have been reminded recently, while looking at several products, that people still rely on the principle of 'security through obscurity.' This is the belief that your system/software/whatever is secure because potential hackers don't know it's there/how it works/etc. Although popular, this is a false belief. There are two aspects to this. The first is the SME who thinks that they're not a target for attack and nobody knows about their machines, so they're safe. This is forgivable, if misguided and false. See my post about logging attack attempts on a home broadband connection with no advertised services or machines.

The second set of people is far less forgivable, and those are the security vendors. History has shown that open systems and standards have a far better chance of being secure in the long run. No one person can think of every possible attack on a system and therefore they can't secure a system alone. That is why we have RFCs to arrive at open standards that work. An example of a product that failed due to this is DiskLock. This was a few years ago now, but there are modern products that follow a similar philosophy. However, it's not my intention to pick on a particular vendor or product. DiskLock, though, was a program that encrypted files with the DES algorithm. No problems there, but they stored the key with the file, relying on people not knowing this or the scheme used to hide it. Unfortunately, with reverse engineering and chosen-key/plaintext attack techniques this is possible to work out. The problem is that the secrecy won't last long and when that has been bypassed the system should remain secure. If it does, then there was no need to keep it secret in the first place.

The only other time this phrase is used is when talking about the level of security given by implementing NAT. Here the addresses of the internal machines are obscured and an attacker doesn't know how many machines are there or what the internal topology is. Of course NAT will only allow outgoing connections or connections to specific ports due to port forwarding, so that does reduce the chances of attacking some machines. However, a web server will still have ports 80 and 443 open and, if it isn't properly patched, will suffer in exactly the same way as if it wasn't behind NAT.

I'm not saying that you should tell everyone exactly how you have implemented your security, but you can't rely on secrecy to last. The important thing is to thoroughly test your security, preferably with an outside independent agency. This is particularly important if you want others to rely on your system and must include an audit of your code for software and settings for your hardware. Are customers more likely to trust an independent testing agency or a vendor trying to sell a product or system?

Clustering Technologies

I have been asked to give a very brief overview of the clustering technologies that we can utilise for high availability. We are, therefore, going to ignore high power computational clustering, as this is about more power rather than redundancy. The two main techniques that we use are a shared-resource cluster (usually some kind of disk array) and a network load balancing cluster, which does exactly what it says on the tin! We'll deal with each of these in turn here, but they can be used together to provide a complete solution.

The goals of Server Clusters are to share computing load over several systems whilst maintaining transparency for system administrators. There is also transparency for users, in that a user has no idea which node in the cluster they have connected to, or indeed that they are connected to a cluster at all as it will appear as one machine. If a component fails users will suffer degraded performance only, but do not lose connectivity to the service. If more power is required at a later date, it is easy to add more components to the cluster, thereby sharing the load across more systems.

Three principal features of clusters:
  • Availability - continued service in the event of the failure of a component
  • Scalability - new components can be added to compensate for increased load
  • Simplified management - administer the group of systems & applications as one unit

Shared-Resource Clustering


The Shared Resource Cluster is used for Line-of-business applications, e.g. database, messaging, file/print servers. Behind this technology is a physical shared resource, such as a direct attached storage device. Services then fail over from one node to another in the cluster, ensuring high availability of core, critical services. A rolling upgrade process enables the addition or upgrading of resources, which, when coupled with the high availability, ensures that line-of-business applications are online when needed. This technology removes the physical server as the single point of failure.

This type of cluster doesn't give us huge scalability like the Network Load Balancing cluster, as each node 'owns' the required storage at any one time and the others don't. Imagine that you are running a DataBase Management System (DBMS) on the cluster, then only one node can run the DBMS at any one time and all access to the DB is via that node. However, if it fails, then another node will 'seamlessly' take over. Now, in order for this to happen quickly and smoothly all nodes in the cluster will run the DBMS 'minimized', i.e. they will load it into memory, but not service any requests. Then, in the event of a failure, they can start responding very quickly after detecting the failure. The important thing to remember is that they are not all running the DBMS and servicing requests at the same time. Also, there is no replication to worry about, as the actual data is saved on the shared storage array that all nodes have access to. It is like ripping the hard disk out of one machine and putting it in the other. Of course, after a failover, we can failback the service when the node has been fixed.

The logical view of a shared resource cluster is shown here for a 2-node cluster running 4 services (actually, these are a bit made up as you wouldn't run a web server on this, you'd run that on an NLB cluster). Physically, there are two nodes (server boxes) in the cluster and we are running 2 services or virtual servers on each (the blue balls). This is not to be confused with virtualization with VMware or Hyper-V; we aren't virtualizing whole servers, just services. However, these services are exposed as separate machines to the network.

The diagram shows the client view of the cluster, which has two physical nodes (CNode1.mycompany.com and CNode2.mycompany.com), which aren't directly accessed by clients, only for IT support, configuration and cluster services. However, each 'Virtual Server' is advertised in the DNS with a name and IP address, e.g. C_VS3.mycompany.com with an IP address of 10.1.1.6. Therefore, clients wanting to connect to Exchange will connect to C_VS3 on 10.1.1.6, which is usually running on CNode2 on 10.1.1.3. How does this happen? Well, a machine can have multiple IP addresses assigned to each network card. So CNode2 has a dedicated IP address of 10.1.1.3 as well as the IP addresses of the 'Virtual Servers' 10.1.1.6 and 10.1.1.7. If this node fails, then CNode1 will take over and these IP addresses will be assigned to it. In this way, the name and IP address of a service doesn't change even in the event of the failure of the node you were connected to.

Network Load Balancing Cluster


A Network Load Balancing Cluster (NLB) is used literally for load balancing network traffic and processing load across multiple machines, e.g. Web server farm, remote desktops, VPN & streaming media. This cluster type gives high availability and good scalability. Additional nodes can easily be added to the cluster to expand it and if any single node fails the cluster automatically detects the failure and redistributes the extra load, presenting a seamless service to users. This is achieved by load balancing all incoming IP traffic. Some of the benefits include the ability to scale Web applications by quickly and incrementally adding additional servers in a rolling-upgrade whilst ensuring Web sites are always online. The important distinction between this and the shared resource cluster is that all the nodes are running the same service with the same data at the same time, e.g. web servers with the same website on their local storage or a common network location. There are several different solutions to NLB, each with advantages and disadvantages. The most common are: RRDNS, central dispatcher and Negotiated Statistical Mapping.

The simplest form of NLB is to use Round-Robin Domain Name Service (RRDNS), which, as the name suggests, simply issues IP addresses from a list in a round-robin fashion. For example, if you have four web servers (web1.mycompany.com ... web4.mycompany.com) all serving the same website, then you can enter the Alias www.mycompany.com into your DNS pointing at each of the four nodes. When a client queries your DNS for the IP address of www.mycompany.com then they will be given the first node in the list, web1.mycompany.com. The next query to the DNS will result in the IP address of web2.mycompany.com being given out, etc. This has the advantages of being cheap and easy. Also, you don't need any special equipment or nodes that are aware of clustering technologies. However, the disadvantages are that the load is not distributed fairly and there is no detection of failed nodes. Imagine if every fourth query required very heavy processing, and the rest were simple GET requests. One node would get hammered and the others would sit around spinning their wheels. Also, if one node fails, the DNS server will still send every fourth query to that node as it doesn't know.
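The blind rotation, and why a dead node keeps receiving its share of queries, can be shown with a small Python simulation. This is a simplification: a real DNS server rotates the order of the records it returns and resolvers cache the answers, so the function names and the one-answer-per-query model here are assumptions made for illustration.

```python
from itertools import cycle, islice

def rrdns_answers(nodes: list[str], queries: int) -> list[str]:
    """Hand out node names in strict rotation, with no health checking."""
    return list(islice(cycle(nodes), queries))

def queries_sent_to_failed(nodes: list[str], failed: set[str], queries: int) -> int:
    """Count how many clients were pointed at a node that is down."""
    return sum(1 for answer in rrdns_answers(nodes, queries) if answer in failed)
```

With the four web servers above and web3 down, `queries_sent_to_failed(nodes, {"web3.mycompany.com"}, 8)` gives 2: every fourth client is still sent to the failed machine.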

The second, and in a lot of ways most sophisticated, is the central dispatcher. This relies on a central device to receive all incoming requests and distribute them out among a set of nodes. This central device does not do the processing itself, it is merely a control to distribute work fairly and to healthy nodes only. The IP address of the central dispatcher is all that is advertised, but responses will come from the nodes directly. The advantages of this are that the dispatcher knows the capabilities of each node, so can distribute requests proportionately, and it knows the current workload of the nodes through querying. So, nodes won't get swamped, as the dispatcher will re-distribute the workload. It also means that you can run different services on different subsets of nodes. The disadvantages of this are cost and a single-point of failure - if your central dispatcher goes down, then the whole cluster is offline. Of course, you can have warm and hot-standby central dispatchers, but this can get very expensive.
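As a rough sketch of what the dispatcher is doing, the Python below picks the healthy node with the lowest load relative to its capability. The selection rule (active requests divided by a capacity weight) is my own assumption for the example; real dispatchers use a variety of policies, and all the names here are invented.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    capacity: int        # relative capability, e.g. 2 = twice as powerful
    active: int = 0      # requests currently being served
    healthy: bool = True

def dispatch(backends: list[Backend]) -> Backend:
    """Send the next request to the healthy node with the lowest relative load."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy nodes available")
    chosen = min(candidates, key=lambda b: b.active / b.capacity)
    chosen.active += 1
    return chosen
```

Given one node with capacity 2 and one with capacity 1, three requests end up split two to one, and a node marked unhealthy is simply never chosen. Note that the model has the same weakness as the real thing: everything goes through `dispatch`, so if that goes away, so does the cluster.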

The final method to look at here is negotiated statistical mapping. In this scenario, there is no central dispatcher or single-point of failure. The nodes all negotiate what load they will take and answer requests based on a statistical view of requests. Each node in the cluster will have two IP addresses, one dedicated address for it alone and the common cluster IP address. It is this common address that is advertised to clients requesting connection. In this way, all nodes in the cluster will receive all the packets for the cluster. The node that this request is mapped to will respond and all the others will discard the packet. Mapping can be done by individual IP addresses, subnets, etc. If one node fails, then the cluster will renegotiate the mappings and converge to a new model excluding the failed node - this happens within a couple of seconds. Similarly, when adding a new node, re-convergence is triggered and traffic will be distributed to the new node as well. The advantages of this are that it can be cheap to implement, as you have standard server hardware and there is only specific software to configure and control the cluster, and it has no single point of failure. However, there are disadvantages, namely that one node could get hammered again, as the nodes don't know what requests will actually come in as this is all based on statistical models. Also, it is more difficult to have subsets of nodes for different services.
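The key point is that every node sees every packet and each one decides on its own, using the same rule, whether it is the one that should answer. A minimal Python sketch of that idea follows. The rule used here (a CRC of the client address modulo the number of live members) is an assumption chosen for simplicity, not the algorithm of any particular product, and the function names are invented.

```python
import zlib

def owner(client_ip: str, members: list[str]) -> str:
    """Map a client address onto exactly one live cluster member.

    Every node runs this same function over the same agreed membership
    list, so they all reach the same answer without a dispatcher.
    """
    ordered = sorted(members)
    index = zlib.crc32(client_ip.encode("ascii")) % len(ordered)
    return ordered[index]

def should_answer(node: str, client_ip: str, members: list[str]) -> bool:
    """True if `node` responds to this client; every other node discards the packet."""
    return owner(client_ip, members) == node
```

Re-convergence after a failure is then just a matter of all the surviving nodes agreeing on a shorter `members` list; the same function spreads the clients over whoever is left. It also shows the weakness mentioned above: the mapping knows nothing about how heavy each client's requests are.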

Conclusions

There are two types of clusters to consider for high availability, namely the Shared Resource Cluster and the Network Load Balancing Cluster, which can be used separately or together to provide a complete clustered solution. The selection of which to deploy is based on the requirements of your service. An example is an e-commerce website, which would usually employ both technologies together in a complete solution - the web server farm at the front end will run a NLB cluster, whilst the backend database that they all access for live data will run on a Shared Resource cluster. Whose technology and implementation you choose will depend on budget, platform and feature requirements.

Thursday, 21 May 2009

How Reliable is RAID?

We all know that when we want a highly available and reliable server we install a RAID solution, but how reliable is that, actually? Well, you can work it out quite simply, as we will see below, but before you do, you have to know what sort of RAID you are talking about, as some can be less reliable than a single disk. The most common types are RAID 0, 1 and 5. We will look at the reliability of each using real disks for the calculations, but before we do, let's recap on what the most common RAID types are.

Common Types of RAID

RAID 0 is the Stripe set, which consists of 2 or more disks with data written in equal sized blocks to each of the disks. This is a fast way of reading and writing data to disk, but it gives you no redundancy at all. In fact, RAID 0 is actually less reliable than a single disk, as all the disks are in series from a reliability point of view. If you lose one disk in the array, you've lost the whole thing. RAID 0 is used purely to speed up disk access. If you have two disks your access will approach twice as fast; three disks are nearly three times as fast, etc. Actually, you don't quite achieve these performance gains due to overheads, etc.

RAID 5, on the other hand, is the stripe set with parity, which can cope with one disk failing. You need a minimum of 3 disks to deploy RAID 5, and you will lose the capacity of one, that is to say that a 3-disk RAID 5 array using 0.5TB disks will have a capacity of 1TB, whereas the RAID 0 solution would have 1.5TB. Similarly, if we had a 5-disk RAID 5 array with the same disks as before then we would have 2TB at our disposal. The reason for this is that data is written in equal sized blocks to each of the disks as with RAID 0, but we write a parity block to one of the disks in each stripe. The diagram below explains. The parity is simply the bitwise Exclusive-OR of all the other blocks. It then becomes obvious that we can regenerate any one disk if it should fail by XORing all the remaining disks together. However, if we lose more than one disk, we've lost the lot again. RAID 5 arrays are also faster than single disk solutions, as we can read from and write to several disks at once, but due to the parity data, it will be slower than RAID 0.
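The parity trick is easy to try for yourself. The short Python sketch below works on one stripe held in memory rather than on real disks, and the function names are mine, but the XOR arithmetic is exactly what is described above.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bitwise XOR of equal-sized blocks, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def raid5_capacity(disks: int, disk_size_tb: float) -> float:
    """Usable capacity: one disk's worth of space goes to parity."""
    return (disks - 1) * disk_size_tb

# One stripe across a 3-disk array: two data blocks plus their parity.
data = [b"HELLO RA", b"ID WORLD"]
parity = xor_blocks(data)

# Pretend the first disk has died: rebuild its block from the survivors.
rebuilt = xor_blocks([data[1], parity])
```

Here `rebuilt` comes back identical to `data[0]`, and `raid5_capacity(3, 0.5)` and `raid5_capacity(5, 0.5)` give the 1TB and 2TB figures quoted above.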

Finally, we have RAID 1, which is the mirror set and consists of two identical disks. All data is written to both disks, so from a reliability point of view they are in parallel and the array is accessible with only one disk working. This means that mirroring two 500GB disks will only give 500GB of storage. You can read from both at once, but as you have to write to both at the same time this is slightly slower than a single disk due to the overheads. However, we can again cope with one disk failure. The power of this technique comes into its own when we start combining RAID arrays, e.g. we can mirror two RAID 5 arrays to cope with total failure of one array and a single disk failure in the other.

Reliability of Disks

In order to calculate the reliability and availability of our RAID array we need to know how reliable our disks are. Manufacturers can quote this in one of two ways: Failure Rate or Mean Time To Failure (MTTF). These figures can sometimes seem misleading, so we'll look at how they're related and what they actually mean in terms of reliability of your disk arrays. Actually, the MTTF and Failure Rate are related via a simple calculation, as the annualised failure rate is usually quoted, which is simply the percentage of disks that will fail in a particular year given the MTTF of the drives.

If we take an actual drive as an example, the Seagate Barracuda 500GB 32MB Cache ST3500320AS, it has a stated MTTF of 750,000 hours or around 85.6 years. This doesn't mean that the drive will actually last that long, it means that on average in 750,000 disk drive hours you will get one failure. So, if we have 1000 disks in our data centre then we will on average suffer a failure every 750 hours or one disk will fail every month, on average (the 'on average' is important, as you could suffer three disk failures this month and none for the next two, for example). These figures are arrived at by the manufacturers in exactly that way. They will run a set of disks (usually under heavy load and probably at high temperature) for a set time and look at how many failed. For example, if they ran 2,000 disks for a month and had 2 failures, then they would get (2000 x 744)/2 = 744,000 hours (assuming a 31-day month as 31 x 24=744).

So how does this relate to the annualised failure rate? Quite simply, if we have one disk failure every 750,000 hours what percentage fail in one year? The first step is to work out how many years 750,000 hours is, so we have 750,000/(24 x 365.25), which is approximately 85.6 years. To get the annualised failure rate we take the inverse, i.e. 1/85.6, which gives a failure rate of nearly 1.2%. Of course, you have to remember that these figures do not take into account any batch failures, i.e. a fault in manufacturing causing a whole batch of disks to be faulty or less reliable.
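The conversions in this section are simple enough to put into a few Python helpers, which can be handy when comparing data sheets. The function names are mine; the arithmetic is the same as in the text, including the approximation that the annualised failure rate is just the inverse of the MTTF in years.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours

def mttf_years(mttf_hours: float) -> float:
    """Convert a quoted MTTF in hours into years."""
    return mttf_hours / HOURS_PER_YEAR

def annualised_failure_rate(mttf_hours: float) -> float:
    """Approximate fraction of disks failing per year (1 / MTTF in years)."""
    return HOURS_PER_YEAR / mttf_hours

def observed_mttf(disks: int, hours_run: float, failures: int) -> float:
    """MTTF as measured in a test: total disk-hours divided by failures."""
    return disks * hours_run / failures

def hours_between_failures(mttf_hours: float, disks: int) -> float:
    """Average time between failures across a whole population of disks."""
    return mttf_hours / disks
```

For the drive above, `mttf_years(750_000)` is about 85.6 and `annualised_failure_rate(750_000)` is about 0.012, while `observed_mttf(2000, 744, 2)` reproduces the 744,000 hours of the manufacturer's test example.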

Calculating the Reliability of RAID

We can now use the annualised failure rate of the disks from the previous section to calculate the reliability of a RAID array. We will look at several scenarios to see how reliable, or not, the common types of RAID are. We will start with a 3-disk RAID 0 solution. Each disk has an annualised failure rate of 1.2% or a probability of failing of 0.012, which gives us a probability of 0.988 that the drive will still be running at the end of the year. Now a RAID 0 array has all the disks in series, i.e. all the disks must be working for the array to work. If any one disk fails then we have lost the whole array. Therefore, we have the probability of reliability:

R(RAID 0) = 0.988 x 0.988 x 0.988 = approximately 0.9644

So, we can see from this that a RAID 0 array might be faster than a single disk, but it is less reliable. OK, we knew that RAID 0 gave us no redundancy, so what if we look at RAID 1 or RAID 5? Which one of these is more reliable? Let's look at a 2-disk RAID 1 array (it can only really be 2-disk remember). In this case the drives are in parallel, so it will only fail if both drives fail; it will still work if either one or the other or both drives are still working. Therefore, we have the probability of reliability:

R(RAID 1) = 1 - (0.012 x 0.012) = 1 - 0.000144 = 0.999856

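RAID 1 is clearly much more reliable than RAID 0 and a single disk solution. Now we'll look at a 3-disk RAID 5 solution. The complication here is that it isn't simply in parallel or series; the array will keep working in the event of any one disk failure or no failures. The easiest way to look at this is to add up all the probabilities of the situations where it is still running. A table makes it easy to see.

  Disk 1    Disk 2    Disk 3    Probability                          Array
  working   working   working   0.988 x 0.988 x 0.988 = 0.964430     working
  working   working   failed    0.988 x 0.988 x 0.012 = 0.011714     working
  working   failed    working   0.988 x 0.012 x 0.988 = 0.011714     working
  failed    working   working   0.012 x 0.988 x 0.988 = 0.011714     working
  working   failed    failed    0.988 x 0.012 x 0.012 = 0.000142     failed
  failed    working   failed    0.012 x 0.988 x 0.012 = 0.000142     failed
  failed    failed    working   0.012 x 0.012 x 0.988 = 0.000142     failed
  failed    failed    failed    0.012 x 0.012 x 0.012 = 0.000002     failed

You can see from the table that we are interested in the first four rows, i.e. when the array is still working. We can now simply add these up to get the overall reliability for the array as follows:

R(RAID 5) = 0.964430 + (3 x 0.011714) = approximately 0.99957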

So, we can see that this is much more reliable than a single disk solution, but less reliable than RAID 1. Indeed, the more disks you have in a RAID 5 array, the less reliable it becomes. In fact, a 5-disk RAID 5 array will have a reliability of around 0.99859. At what point does RAID 5 become less reliable than a single disk? If you work it out, it turns out that in this case a 14-disk RAID 5 array has almost the identical reliability to a single disk. Of course, you have much more storage and faster access, but no more reliability.

What if we were to combine these arrays, e.g. mirror a RAID 0 stripe set? Well, it's simply a matter of combining the reliabilities of each RAID 0 array in parallel. The reliability of a 3-disk RAID 0 array was approximately 0.964. If we now put this figure into the RAID 1 calculation above instead of the 0.988 disk reliability, this will give us a reliability of approximately 0.9987, more reliable than a 5-disk RAID 5 array. Of course, if we mirror a 3-disk RAID 5 array, then we would get a reliability of approximately 0.99999982 - very reliable.
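All of the figures in this section can be checked with a few lines of Python. The function names are mine and the disk reliability of 0.988 is the rounded figure used throughout this post; the RAID 5 formula is just "no failures, or exactly one failure" written out for any number of disks.

```python
DISK_RELIABILITY = 0.988  # 1 - annualised failure rate of 1.2%

def series(r: float, n: int) -> float:
    """All n components must work (RAID 0)."""
    return r ** n

def parallel(r: float, n: int = 2) -> float:
    """The set fails only if all n components fail (RAID 1 is n = 2)."""
    return 1 - (1 - r) ** n

def raid5(r: float, n: int) -> float:
    """Array survives with no failures or with exactly one failed disk."""
    return r ** n + n * r ** (n - 1) * (1 - r)

raid0_3 = series(DISK_RELIABILITY, 3)     # 3-disk stripe set
raid1 = parallel(DISK_RELIABILITY)        # 2-disk mirror
raid5_3 = raid5(DISK_RELIABILITY, 3)      # 3-disk RAID 5
mirrored_raid0 = parallel(raid0_3)        # mirror of two 3-disk stripe sets
mirrored_raid5 = parallel(raid5_3)        # mirror of two 3-disk RAID 5 arrays
```

Looping `raid5(DISK_RELIABILITY, n)` over increasing `n` shows the crossover point mentioned above: at 14 disks the array is still very slightly more reliable than a single disk, and at 15 it has dropped below it.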

The important thing here is to be able to interpret what manufacturers are saying and predict failures. To be able to do this you have to know how to calculate the reliability of your system. If you have 100 servers with 5-disk RAID 5 arrays in them, how many system failures will you get in a 5-year operating life cycle and how many disks will you need to replace? Disk replacement is simple: you have 500 disks in total with an MTTF of 750,000 hours, or one failure every 1500 hours, which is one failure every 62.5 days. If the life span of your data centre is 5 years, then you will get around 29 disk failures, but how many system failures? Well, the reliability of each system is 0.99859, which implies that the annualised failure rate for the system is 1 - 0.99859 = 0.00141. If this is the proportion of systems failing per year, then the MTTF will be 1/0.00141, which is approximately 709.22 years. We have 100 such systems, so we will get a failure in around 7 years 33 days. This means that we might well get 1 system failure during the 5-year life span, but we may not get any.
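The same sums for the 100-server example, as a Python sketch you can adapt to your own estate. The function names are invented for the example, and the model is the simple averaged one used in this post (no batch failures, no allowance for rebuild time).

```python
HOURS_PER_YEAR = 24 * 365.25

def expected_disk_failures(disks: int, mttf_hours: float, years: float) -> float:
    """Average number of disk replacements over the life of the data centre."""
    return disks * years * HOURS_PER_YEAR / mttf_hours

def system_mttf_years(system_reliability: float) -> float:
    """MTTF of one array, taken as the inverse of its annual failure rate."""
    return 1 / (1 - system_reliability)

def fleet_years_between_failures(system_reliability: float, systems: int) -> float:
    """Average time between array failures across the whole fleet."""
    return system_mttf_years(system_reliability) / systems

disk_failures = expected_disk_failures(disks=500, mttf_hours=750_000, years=5)
array_gap = fleet_years_between_failures(0.99859, systems=100)
```

`disk_failures` comes out at just over 29 and `array_gap` at just over 7 years, matching the figures worked through by hand above.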

Trusteer or no trust 'ere...

...that is the question. Well, I've had more of a look into Trusteer's Rapport, and it seems that my fears were justified. There are many security professionals out there who are claiming that this is 'snake oil' - marketing hype for something that isn't possible. Trusteer's Rapport gives security 'guaranteed' even if your machine is infected with malware according to their marketing department. Now any security professional worth his salt will tell you that this is rubbish and you should run a mile from claims like this. Anyway, I will try to address a few questions I raised in my last post about this.

Firstly, I was correct in my assumption that Rapport requires a list of the servers that you wish to communicate with; it contacts a secure DNS server, which has a list already in it. This is how it switches from a phishing site to the legitimate site silently in the background. I have yet to fully investigate the security of this DNS, however, as most other companies would say their DNS is secure. They do also have an automatic update process, which needs to be tested in my opinion, as this could be the target of attack.

Another alarming thing I have discovered is the following:

"To protect you against phishing attacks Rapport learns the password (and sometimes even the username) you use with protected websites" (ref).

What? Where and how does it store these? What hash function or encryption is it using? This is potentially a massive security flaw. I did try this feature out and it does ask if you want to remember the details, but in my opinion it should never do this. Now the hacker doesn't need a keylogger anyway, as they can attack the storage of the password! Talking of keyloggers, I was sure that Rapport couldn't protect against rootkits, malicious drivers and all malware keyloggers, and the proof can be found here (a video of someone logging the keystrokes when Rapport is used to protect the ING Direct login showing that Trusteer's Rapport can be bypassed or cracked). I know that some will say that this requires particular malware and this may be detected with your existing AV product. However, don't forget that Trusteer 'guarantee' security even on an infected system. They are also encouraging lax attitudes towards using AV products with their rhetoric.

With these problems in mind, I decided to install Rapport on a virtual Vista machine with no AV and start logging a few things. The install writes to the file system (obviously) and the registry. However, in use, it is writing to the file system, specifically a set of encrypted log files. Further investigation shows that they are using encrypted JavaScript files to access and write to these log files. Rapport also runs a service on your machine called RapportService and was using about 10MB RAM on my VM. This service protects the Rapport files from deletion or modification as far as I can see. On install, the boot sector is updated to run this service at startup. However, if you stop this, then you can play around with the files. (To do this you will need to boot into Safe Mode by running msconfig.exe and selecting this option in the Boot tab. If you do this, then Windows Defender may block Rapport from restarting.)

The files of interest seem to be stored on a per-user basis. There are lots of log files that are accessed each time you hit a Rapport site. The main ones seem to be rooksbas.log, koan.log, backend.log and backend-cfg.log. Of interest though are the .cfg files and the JavaScript files. You have to enter a code during install, which may be just for registration, but may also be some kind of seed for the key, because the program itself must have the encryption key for these files. In which case, it should be a matter of reverse engineering the code to find that key and then everything becomes open (but this is just a guess on my part and may not be true). They don't tell you what algorithms they use for this encryption though, which isn't always a good sign.

Running packet sniffing software on the machine whilst connecting to http://www.rbs.co.uk/, I found a few things. Firstly, RBS is using ATDMT to track users' habits and install tracking cookies - a form of malware! Rapport didn't pick up on this tracking cookie or block it. Also, Rapport doesn't appear to contact Trusteer directly; however, it does tell the server that it is running, as "Trusteer-Rapport/3.5.0903" was added to the userAgent string and the following data was also sent to http://www.rbsdigital.com/: "X-Trusteer-Rapport: ver=3.5.0903.22; ak=C056E35A634C288C2BA683A7B21DBC6274417C4CBF7FCE0CBB561651EE30EB60; av=a0; rs=0.01372". This makes me wonder how their secure DNS server comes into play. This wasn't the first time I had gone to the RBS site though, so maybe it is cached, but again, where? Presumably in those encrypted log files.
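For anyone wanting to pick apart the header captured above, a small sketch that splits it into its fields is shown here. The parsing rule (semicolon-separated name=value pairs) is simply inferred from the captured string, and the function name is made up; what the fields actually mean is not documented anywhere I have seen.

```python
def parse_rapport_header(value: str) -> dict[str, str]:
    """Split a 'name=value; name=value' header value into a dictionary."""
    fields: dict[str, str] = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, data = part.partition("=")
        fields[name.strip()] = data.strip()
    return fields


# The value as captured in the packet trace described above.
observed = (
    "ver=3.5.0903.22; "
    "ak=C056E35A634C288C2BA683A7B21DBC6274417C4CBF7FCE0CBB561651EE30EB60; "
    "av=a0; rs=0.01372"
)

fields = parse_rapport_header(observed)
for name, data in fields.items():
    print(f"{name:>3} ({len(data)} chars): {data}")
```

The ak field is 64 hexadecimal characters, which is consistent with a 256-bit value such as a key identifier or hash, but that is a guess about its purpose rather than anything confirmed.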

The problem for Trusteer is that the more successful Rapport is, the more it will become a target for attack, and the less use it will be. They are trying to do something good and it is another level of protection, but the false claims make it dangerous. I believe that this product will make users complacent and take less care of their machine and credentials. Why bother having any form of AV product if Rapport protects my details anyway? People are being educated into thinking that if they see the green box at the top of the browser then they are safe and I think they will then throw caution to the wind. Even worse is that if they use a browser other than IE, then they have no protection at all. I don't think this is drummed into the users enough on the third party sites that use this.

It goes to show the old adage that a little learning is a dangerous thing. You are teaching users only part of the story and they will get lost in marketing hype and false claims. If Trusteer were open and honest about the capabilities of Rapport and pushed for more user education, then I would recommend their product. As it is (forgetting the compatibility issues) I cannot recommend that system administrators install this on their machines and let users believe that they are safe no matter what.

Edit: I have a new post here and a series of demo videos of Rapport blocking spyware.

Edit (10/4/10): I am still getting a lot of hits on this blog post so I thought that I ought to point out that Rapport as a product has matured a lot in the last year and many of the problems with compatibility, etc., have been sorted out. Also, the marketing has changed a lot to be much more realistic. If this is used as a layer in your overall security arsenal and is combined with user education, then it will help to protect your machine, data and identity. Download a keylogger for yourself and try using it before and after installing Rapport and you might see why your Banks are pushing it. I still think that the Banks have a duty to educate their users and to standardise the process of conducting online transactions and authentication to help users and stop many of the attack vectors currently being exploited.

Tuesday, 19 May 2009

Trusteer's Rapport

NatWest have just sent through an important information letter to their customers highlighting a new security solution to help secure their online banking. They are using Trusteer's Rapport product (more info). Anything people do to combat malware, phishing, pharming, etc., is a good thing, and for this they should be commended. However, Trusteer make some bold claims and I'm wondering how true they really are. This needs a lot more investigation than reading through their sales rhetoric, but I'm going to get some of my initial thoughts down here, then see what I can find out.

The problem statement is well defined by Trusteer and centres around lack of user education. In a previous blog entry I wrote about 2 successful phishing attacks against an organisation that only needed one person to send an email containing their password to bring the whole network down (here). Users need to be educated into not handing out secret or personal information to anyone who asks, e.g. a bank will never ask for your PIN number - why would they? Again, see my previous blog entry as to what is happening with phishing, pharming and brand hijacking. One thing I find alarming is their statement:
"Recent malware in the wild have proved to be capable of bypassing the most advanced multi-factor authentication and security controls put in place... At Trusteer labs we have identified malware that bypass device identification, hardware and software tokens, client-side certificates, SMS authentication and transaction verification, and even card-readers..."
Well, I can see how some of these can be done relatively easily, but not all. I realise that a man-in-the-middle attack will defeat most, but if we can secure ourselves from the man-in-the-middle attack then we're fine in most cases. This is a big IF, of course, but SSL goes some way towards this (although it is also flawed in many implementations, but not in the way most phishers mount their attacks). The problem also comes from particular implementations being predictable, e.g. the RSA token that can be cracked if you know the Serial Number and the codes.

On to the Rapport solution. They claim that they can protect against: Man-in-the-Browser, Man-in-the-Middle, Keyloggers, Session Hijacking, Screen Capturing, Pharming, Phishing and Phishing Malware. Without going through all of these I will look at a few points. Firstly, let's look at keyloggers. This can't combat hardware keyloggers. It claims that it can combat software keyloggers by encrypting "all keystrokes from keyboard to browser." As I don't have an encrypting keyboard or driver, how does this work? Rapport is a browser plug-in, not a new driver. How does this stop me from rewriting the drivers on the machine and logging all the keystrokes? One wonders about the case for stopping this as well, because most banks only get you to enter three random characters of your passphrase anyway, and most have drop-down lists of numbers to select from for online PIN numbers (not card PIN numbers, which are never asked for). Also, what's the value of logging my one-time password? It's not valid by the time the attacker gets it. Having said that, combating keyloggers is a worthy goal and something we should implement if available.

Man-in-the-Middle attacks and Pharming attacks are both defeated, because Rapport "diverts traffic to the real website." How? If I have poisoned an external DNS server to point to my IP address, how do you know it's wrong, unless you have a store of my IP address already? Where do you store that? Can I update it? If not, what happens when the bank does change its servers? If this is done via strong authentication, how does that work? Is this like SSL, which has to have a valid certificate? A Pharming website won't have a valid certificate, so does this mean I'm safe? Only if I take heed of the warnings and assuming they can't update the local machine's store of the certification authorities (which you can do). Rapport supports automatic updates; are these secure or can I break into the update process?
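The questions above can be made concrete with a sketch of the simplest possible scheme: keep a local list of known-good addresses per site and compare it with whatever the (possibly poisoned) resolver returns. This is an assumption about how such a check could work, not a description of Rapport; the host name, pin list and function names are all hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import socket
from typing import Callable

# Hypothetical pin list of known-good addresses (documentation ranges only).
PINNED: dict[str, set[str]] = {
    "bank.example": {"192.0.2.10", "192.0.2.11"},
}


def resolve(host: str) -> set[str]:
    """Resolve via the system resolver, i.e. the thing a pharming attack poisons."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return {info[4][0] for info in infos}


def looks_pharmed(
    host: str,
    resolver: Callable[[str], set[str]] = resolve,
    pins: dict[str, set[str]] = PINNED,
) -> bool:
    """True if the resolver returned an address that is not on the pin list."""
    expected = pins.get(host)
    if expected is None:
        return False  # no pin held for this host, so nothing can be said
    return not (resolver(host) <= expected)


# Simulated resolvers, so the sketch runs without touching the network.
print(looks_pharmed("bank.example", resolver=lambda h: {"192.0.2.10"}))
print(looks_pharmed("bank.example", resolver=lambda h: {"203.0.113.5"}))
```

The sketch shows why the questions matter: the check is only as trustworthy as the pin list, the list has to be updated whenever the bank moves servers, and whoever can write to the list (or to the update channel) can redirect the user.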

"Rapport transparently terminates the connection to the proxy server and diverts traffic directly to the real website."

Again, how without a list of the servers? Also, what about technologies like Microsoft's CardSpace? I know, it's Microsoft, so can we trust it, etc. However, by utilising something like CardSpace, the user doesn't enter information via the keyboard or in normal usage; it is done in a protected mode. Users can't enter their PIN number into a card that doesn't accept that information - indeed users wouldn't be the originator of the card in this scenario, so wouldn't be able to add information to it anyway. This looks to see if you are submitting the 'card' to the same site as before, by looking at things like IP address, etc. Doesn't this help against Phishing, Pharming, keyloggers and man-in-the-middle attacks? Incidentally, Rapport is only supported on XP and Vista - the same as CardSpace.

I will try to have a better look at this product and see if I can find out what the exact technologies are behind it. It may well be that their technology is secure and does help guard against these attacks, but it seems on the surface to be a collection of current technologies rather than anything new. In the meantime, NatWest should follow many other banks and institutions and get an EV SSL certificate so that the browser bar goes green and the site is authenticated by the browser with the certification authority directly. This seems as though it should be done even if other mechanisms are in place. However, I do admit that this only truly works alongside user education, but so does any security solution, including Rapport.

Friday, 15 May 2009

APWG Report 2nd Half 2008

The Anti-Phishing Working Group produce two reports a year now on Phishing Activity Trends. I was reminded to look at the report from the second half of last year recently by problems encountered by an organisation I'm involved with(!), which has suffered two successful phishing attacks in the last 9 months. The two incidents both followed the same pattern: a phishing email was sent round purporting to be from the technical staff talking of lack of storage space on the user's account. They asked for the user's password in order to be able to reconfigure the quota. Now, the vast majority of people deleted this email, but on each occasion one person (a different one in each incident) replied with their username and password. This resulted in vast amounts of spam being sent through the users' email accounts, sufficient for the domain to be blacklisted by Message Labs in the first incident. The problem is that people still fall for this type of scam, and it only needs one person in an organisation to do it.

Anyway, back to the report. The APWG highlight a few interesting things in their report. Phishing reports and unique phishing sites detected both peaked in October, at 34,758 and 27,739 respectively, then fell sharply to December, when the figures were 23,187 and 15,709 respectively. These are way off the high of 55,643 detected in April 2007 and the lowest since August 2006, when they fell to 10,091. This is in sharp contrast to the password stealing malicious code URLs, which soared to 31,173 in December from 11,834 in November and a low of 3,113 in May last year. Although, interestingly, the number of unique keyloggers and malicious applications has dropped from 1,519 in July to 559 in December.
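To put those movements into proportion, the percentage changes can be worked out directly from the figures quoted above. This is just arithmetic on the numbers in this post, not anything additional from the APWG report; the helper name is my own.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new (negative means a fall)."""
    return (new - old) / old * 100


reports = pct_change(34758, 23187)        # phishing reports, October -> December
sites = pct_change(27739, 15709)          # unique phishing sites, October -> December
malware_urls = pct_change(11834, 31173)   # password-stealing URLs, November -> December
keyloggers = pct_change(1519, 559)        # unique keyloggers, July -> December

for label, value in [
    ("Phishing reports", reports),
    ("Unique phishing sites", sites),
    ("Password-stealing URLs", malware_urls),
    ("Unique keyloggers", keyloggers),
]:
    print(f"{label}: {value:+.1f}%")
```

So reports fell by about a third and phishing sites by over 40%, while the password-stealing URLs rose to more than two and a half times the November figure in a single month.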




This trend has been followed by a new category of 'Rogue Anti-Malware Programs.' These are fake applications that are sold purporting to be anti-malware (e.g. AV products), but which either do little or nothing beyond making money from the people who buy them, or act as harvesters of information. The rise is from a low of 2,084 in September to 9,287 in December.



The final thing to note concerns Brand Hijacking. Brand hijacking is still high, with 269 brands hijacked in November, but the balance in the sectors is changing. Back in 2007 Financial Services accounted for 93.8% of the hijacked brands, which peaked at a high of 178 in November 2007. The remainder were: Retail 2.8%, ISP 2.2% and Government 1.2%. In Q3 2008 Financial had dropped to 61% and dropped further in Q4 2008 to only 46%. This shows a drop in real terms as well as percentage, as around 124 financial brands were hijacked in Q4 2008 against around 167 in 2007. Retail has shown a drop from around 5 brands to around 3 and makes up only 1% in Q4 2008. The 'big winner', if it can be described thus, is Payment Services, which is up to 38%, with Auction sites making up 11% and 4% for other brands. This 'other' category is up from 3% in Q3 2008, which is attributed to more attacks against MySpace, Facebook, etc., by the authors.
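The 'real terms' brand counts above can be reproduced by applying each sector's share to the monthly brand totals. The sketch below assumes the Q4 2008 shares are applied to the November 2008 total of 269 and the 2007 shares to the November 2007 peak of 178; that pairing is an assumption on my part, although it does give the same rounded figures as quoted.

```python
def brands(total: int, share_percent: float) -> int:
    """Approximate number of brands for a sector, rounded to the nearest whole brand."""
    return round(total * share_percent / 100)


TOTAL_NOV_2007 = 178  # peak number of hijacked brands in 2007
TOTAL_NOV_2008 = 269  # hijacked brands in November 2008

financial_2007 = brands(TOTAL_NOV_2007, 93.8)
financial_2008 = brands(TOTAL_NOV_2008, 46)
retail_2007 = brands(TOTAL_NOV_2007, 2.8)
retail_2008 = brands(TOTAL_NOV_2008, 1)

print("Financial:", financial_2007, "->", financial_2008)
print("Retail:", retail_2007, "->", retail_2008)
```

The point of doing it this way is that a falling percentage on a rising total could hide a rise in absolute numbers; here the absolute number of financial brands falls too.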



For more information about the Anti-Phishing Working Group, to report phishing attacks or to see their reports yourself, visit http://apwg.org/

Thursday, 14 May 2009

Log Attack Attempts

While I was answering emails and writing another blog post, I was reminded that lots of people ask me about the seriousness of the threat of attack as they are sceptical. So, I decided to turn on additional logging on my router to view all incoming traffic to see what it is blocking silently. I can tell you that in 2 hours on my home connection here I have had 22 different IP addresses making 47 different attempts to connect to me. One tried to launch a Denial-of-Service (DoS) attack on my web server three times (unsuccessfully), and another attached to my web server to view the options available and didn't bother retrieving any pages - this is a classic sign of footprinting before an attack. The first was host 123-204-6-150.dynamic.seed.net.tw and the second was host164-120-static.29-87-b.business.telecomitalia.it.

What were the other 20 connection attempts? Well, I don't actually have anything else open on my network, but by logging all connection attempts I can view what people are trying. One tried to connect to port 3389 (Microsoft's RDP port) and then 3306 (MySQL), which is maybe a slightly strange combination - Microsoft's Terminal Services machine running open source MySQL? There were 4 attempts to connect to port 135 (Microsoft's DCOM Service Control Manager), 3 attempts to connect to port 445 (Microsoft-DS, i.e. SMB file sharing over TCP), 4 attempts to connect to port 3128 (registered as Active API Server, but better known as the default port of the Squid web proxy), 2 attempts at port 2967 (SSC-Agent), 3 attempts at port 8000 (iRDMI, but sometimes used as secondary web port), and so on.
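If your router can export its log as text, a tally like the one above takes only a few lines. The log format below is invented for illustration (every router differs, so the field positions are an assumption), and the source addresses are from the ranges reserved for documentation rather than real attackers.

```python
from collections import Counter

# Hypothetical log format: date time action protocol source -> interface:port
SAMPLE_LOG = """\
2009-05-14 10:02:11 DROP TCP 198.51.100.7 -> WAN:135
2009-05-14 10:05:43 DROP TCP 198.51.100.7 -> WAN:135
2009-05-14 10:17:02 DROP TCP 203.0.113.40 -> WAN:3389
2009-05-14 10:17:05 DROP TCP 203.0.113.40 -> WAN:3306
2009-05-14 10:31:58 DROP TCP 192.0.2.99 -> WAN:445
"""


def summarise(log_text: str) -> tuple[Counter, set[str]]:
    """Count dropped connection attempts per port and collect distinct sources."""
    per_port: Counter = Counter()
    sources: set[str] = set()
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[2] != "DROP":
            continue  # skip anything that is not a dropped-connection entry
        sources.add(fields[4])
        per_port[int(fields[-1].rsplit(":", 1)[1])] += 1
    return per_port, sources


per_port, sources = summarise(SAMPLE_LOG)
print(f"{sum(per_port.values())} attempts from {len(sources)} addresses")
for port, count in per_port.most_common():
    print(f"port {port}: {count}")
```

Grouping by source as well as by port is what makes patterns such as the 3389-then-3306 pair stand out as one host working through a list.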

There are well-known attacks for these ports that are supported under tools such as the Metasploit Framework. So, this is an unlisted private broadband connection that had attack attempts from 22 different addresses in 2 hours. What people have to realise is that computers are very good at doing repetitive, mind-numbing tasks. An attacker can scan a range of IP addresses very easily with tools that can be downloaded for free. If they get a reply to these sweeps then they can investigate. If your firewall isn't blocking all these ports then you will get attacks. That's not the end of the story though. If your firewall responds to these connection attempts then you are telling the attacker that there is a machine there to be attacked. How do you stop this?

When one machine tries to connect to another over the network several things can happen, but the two main ones are: firstly, the port is open and a service is running on it accepting connections, in which case the connection will be successful; secondly, the port is not open and the service that would use that port isn't running, in which case your machine will respond telling the originating host that it isn't listening on that port. Either reply tells the attacker that a machine exists at that address, so it is now worth their while running a full port scan to find out what it is listening on (and yes, before you ask, you can defeat the port scanning alerts on firewalls fairly easily in most cases). This is what you must stop by stealthing unused ports, i.e. if a connection attempt comes in on a port you aren't using, discard the packet and do not reply. Most firewalls will do this for you if you configure them to, but make sure that they are set up correctly.
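The three outcomes described above (accepted, actively refused, silently dropped) look like this from the connecting side. This is a minimal sketch using an ordinary TCP connect, for checking your own machines only; the function name and the two-second timeout are my own choices, and a timeout is only a hint of a stealthed port, since a dead host or a lossy link looks the same.

```python
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Classify a TCP port as 'open', 'closed', 'filtered' or 'error'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"        # a service accepted the connection
    except ConnectionRefusedError:
        return "closed"      # the host replied that nothing is listening
    except socket.timeout:
        return "filtered"    # no reply at all, e.g. the packet was discarded
    except OSError:
        return "error"       # unreachable network, bad address, etc.
    finally:
        sock.close()


if __name__ == "__main__":
    # Check a couple of ports on the local machine.
    for port in (135, 445):
        print(port, probe("127.0.0.1", port))
```

From the attacker's point of view, both 'open' and 'closed' confirm that something is alive at the address; only 'filtered' gives them nothing to go on, which is the whole point of stealthing.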

One final note: always log attack attempts and never set your firewall to reply to pings on the WAN port!

P.S. I've had 7 more since starting to write this post.

InfoSecurity Europe 2009

Well, a couple of weeks ago I went to InfoSecurity Europe 2009 at Earl's Court, as I do every year. If you've never been, but are at all interested in network and information security or are looking for vendors, then I highly recommend visiting. To my surprise, and for the first time I can remember, Microsoft wasn't there. Apart from that, however, all the usual players were there and, to be honest, it was all much the same as before. There was no new emerging technology or hot topic, just new developments of old technologies. A couple of years ago we had the hot topic of 'social engineering' and 'securing the user'; ok, we all knew that the users were our weak link and phrases like "we spend all our time securing the first 2000 miles and forget about the last 2 feet", and "our network would be totally secure, reliable and fully functional if we didn't let users login", have always been commonplace, but there were new mass threats, new education programmes and new tools at the disposal of net admins to deal with them.

However, this year was different; nothing really sprang out. OK, cloud computing and Software-as-a-Service (SaaS) have expanded and matured, but other than that we see the same products and services as before. What amazes me is how some vendors and speakers can get it so wrong and don't appear to understand the actual level of security offered or the operating environment in which their products will be used. I'm not going to list actual vendors here, but how can an encryption solution for mobile users that doesn't encrypt the data at rest be viable? Encrypting network traffic is commendable, but not the only safeguard required. What if they now lose their laptop, mobile or pen drive? A secure USB pen drive vendor admitted, when questioned, that files were decrypted into the C:\Temp folder while in use, then deleted after encrypting for storage on the drive again. They couldn't tell me if this was a secure delete or a simple removal of the pointer in the file allocation table as normal. Regardless of the deletion process, however, how many applications can read and write to that folder? What's to stop me from writing a very small bit of code to monitor that folder every few seconds and take copies? This might be secure enough if it is on a corporate machine, but why not just store the file on it then and not bother with the drive?

Similarly, email security is always a problematic area, with almost no solution fitting the technology properly. The big advantage of email, and the reason we all use it, is that it is independent of firmware - i.e. it doesn't matter what hardware, OS or email app we use, it still works. Unfortunately, security was never built in to email, so every email is like sending an electronic postcard. However, I would argue that a solution that only allows you to send encrypted email within your organisation is of limited value. What about all your customers and partners? There are also still solutions that store your files on their servers and send a link to the recipient. Why trust your files to them? I asked several vendors how they deal with password transfer, only to be told that they don't. "It's up to the user how they tell the recipient what the password is," was a common reply. We know users are unreliable, why leave it to them? I had to transfer some confidential files to someone via email recently (the only transport method they would accept), only to be told that they didn't want the hassle of decrypting it, so could I send them in plaintext. Having got over that hurdle, they wouldn't give me a mobile phone number so that I could transfer the passphrase via a call or SMS (as people seem to need them written down), asking me to email it to them. I know there are technologies out there to solve these problems, but they aren't without any problems.

This is turning into a bit of a rant and I'm getting off topic, so back to InfoSec. I was pleased to see that there were some voices of caution out there about the wholesale adoption of virtualisation without considering the security implications. One that springs to mind is Steve Moyle who has produced 10 points to consider implementing on his blog here. Virtualisation is a good technology for a number of reasons, but it does bring in new security threats and it must be implemented with these in mind and secured accordingly. I do also think that cloud computing and SaaS could be very important to SMEs (Small to Medium-sized Enterprises) who don't have the in-house expertise or large budget. They can still have large enterprise-level configurability and security, without the overheads. Finally, people seem to be taking information governance and user education seriously rather than just paying lip-service to them. In all, the show was encouraging, but many vendors are not quite there yet, which only goes to highlight that the majority of organisations are not taking the new threat-landscape seriously enough and countermeasures must be lacking.

Welcome to the RLR UK Blog

This blog is about network and information security issues primarily, but it does stray into other IT related fields, such as web development and anything else that we find interesting.
