What is a Database?

A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images.

In computing, databases are sometimes classified according to their organizational approach. The most prevalent approach is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses.

Formally, a “database” refers to a set of related data and the way it is structured or organized. Access to this data is usually provided by a “database management system” (DBMS), an integrated set of computer software that allows users to interact with one or more databases and provides access to all of the data contained in the database (although restrictions may exist that limit access to particular data). The DBMS provides various functions that allow entry, storage, and retrieval of large quantities of information, as well as ways to manage how that information is organized.

Because of the close relationship between them, the term “database” is often used casually to refer to both a database and the DBMS used to manipulate it.

Outside the world of professional information technology, the term database is often used to refer to any collection of related data (such as a spreadsheet or a card index). This article is concerned only with databases where the size and usage requirements necessitate use of a database management system.

Existing DBMSs provide various functions that allow management of a database and its data which can be classified into four main functional groups:

  • Data definition – Creation, modification and removal of definitions that define the organization of the data.
  • Update – Insertion, modification, and deletion of the actual data.
  • Retrieval – Providing information in a form directly usable or for further processing by other applications. The retrieved data may be made available in a form basically the same as it is stored in the database or in a new form obtained by altering or combining existing data from the database.
  • Administration – Registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control, and recovering information that has been corrupted by some event such as an unexpected system failure.

Both a database and its DBMS conform to the principles of a particular database model. “Database system” refers collectively to the database model, database management system, and database.

Physically, database servers are dedicated computers that hold the actual databases and run only the DBMS and related software. Database servers are usually multiprocessor computers, with generous memory and RAID disk arrays used for stable storage. RAID is used for recovery of data if any of the disks fail. Hardware database accelerators, connected to one or more servers via a high-speed channel, are also used in large-volume transaction processing environments. DBMSs are found at the heart of most database applications. DBMSs may be built around a custom multitasking kernel with built-in networking support, but modern DBMSs typically rely on a standard operating system to provide these functions. Since DBMSs represent an economically significant market, computer and storage vendors often take DBMS requirements into account in their own development plans.

Databases and DBMSs can be categorized according to the database model(s) that they support (such as relational or XML), the type(s) of computer they run on (from a server cluster to a mobile phone), the query language(s) used to access the database (such as SQL or XQuery), and their internal engineering, which affects performance, scalability, resilience, and security.

Easy Lesson on 1-Tier vs 2-Tier vs 3-Tier

Three-tier or multi-tier architecture is often used when describing how clients connect to database servers. But what does it all mean?

Let me try to explain this in non-technical terms (or as close to that as I can get).

 

Software 

Let’s first take a look at how a database software program (the software) works.

There are three major tiers to the software:

  • User Interface (UI). This is what you see and interact with when you work with the software. There might be buttons, icons, text boxes, radio buttons, etc. The UI passes clicks and typed information on to the Business Logic tier.
  • Business Logic (BL). The business logic is code that is executed to accomplish something. When a user clicks a button it will trigger the BL to run some code. The BL can send information back to the UI, so the user can see the result of clicking a button or typing something in a field. For instance when you enter something in a cell in Excel, the BL will recalculate other cells once you hit Enter and the UI will present the new information to you. The BL also needs to be able to store and retrieve data and that is handled in the Database tier.
  • Database (DB). The database is where the data is stored and where the BL can retrieve it again.
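
As a rough illustration of the three tiers above, the sketch below separates them into three small C# types. All names here (IDataStore, InMemoryStore, GreetingService, ConsoleUi) are hypothetical and chosen only for illustration, and the in-memory dictionary is just a stand-in for a real database.

```csharp
using System;
using System.Collections.Generic;

// Database tier (DB): stores data and hands it back on request.
public interface IDataStore
{
    void Save(string key, string value);
    string Load(string key);
}

// Hypothetical stand-in for a real database: a dictionary in memory.
public class InMemoryStore : IDataStore
{
    private readonly Dictionary<string, string> _rows = new Dictionary<string, string>();

    public void Save(string key, string value) { _rows[key] = value; }

    public string Load(string key)
    {
        string value;
        return _rows.TryGetValue(key, out value) ? value : null;
    }
}

// Business Logic tier (BL): applies rules and uses the DB tier for storage.
public class GreetingService
{
    private readonly IDataStore _store;

    public GreetingService(IDataStore store) { _store = store; }

    public string Register(string userName)
    {
        if (string.IsNullOrWhiteSpace(userName))
            return "Name is required.";
        _store.Save("lastUser", userName.Trim());
        return "Hello, " + _store.Load("lastUser") + "!";
    }
}

// User Interface tier (UI): collects input and shows what the BL returns.
public static class ConsoleUi
{
    public static void Main()
    {
        var service = new GreetingService(new InMemoryStore());
        Console.WriteLine(service.Register("  Ada "));  // Hello, Ada!
        Console.WriteLine(service.Register(""));        // Name is required.
    }
}
```

Because GreetingService depends only on the IDataStore interface, the storage tier could in principle be moved to another machine without touching the UI code, which is the idea behind the 2-tier and 3-tier layouts.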

 

1-Tier Architecture

This architecture has the UI, the BL, and the DB in one single software package. Software applications like MS Access, MS Excel, QuickBooks, and Peachtree all have one thing in common: the application handles all three tiers (UI, BL, and DB). The data is stored in a file on the local computer or a shared drive. This is the simplest and cheapest of all the architectures, but also the least secure. Since users have direct access to the files, they could move, modify, or even worse, delete the file, by accident or on purpose. There is also usually an issue when multiple users access the same file at the same time: in many cases only one user can edit the file while the others have read-only access.

Another issue is that 1-tier software packages are not very scalable: if the amount of data gets too big, the software may become very slow or stop working.

So 1-tier architecture is simple and cheap, but usually insecure, and data can easily be lost if you are not careful.


2-Tier Architecture

This architecture is also called Client-Server architecture because of the two components: The client that runs the application and the server that handles the database back-end. The client handles the UI and the BL and the server handles the DB. When the client starts, it establishes a connection to the server and communicates as needed with the server while running the client. The client computer usually can’t see the database directly and can only access the data by starting the client. This means that the data on the server is much more secure. Now users are unable to change or delete data unless they have specific user rights to do so.

The client-server solution also allows multiple users to access the database at the same time, as long as they are accessing data in different parts of the database. One other huge benefit is that the server processes the data (DB), which leaves the client to work on the presentation (UI) and business logic (BL) only. This means that the client and the server share the workload. By scaling the server to be more powerful than the client, you can usually connect many clients to the server, allowing more users to work on the system at the same time and at a much greater speed.


3-Tier Architecture

In this architecture all three tiers are separated onto different computers. The UI runs on the client (what the user is working with). The BL is running on a separate server, called the business logic tier, middle tier, or service tier. Finally the DB is running on its own database server.

In the client-server solution the client handled both the UI and the BL, which makes the client “thick”. A thick client requires heavy traffic with the server, which makes it difficult to use over slower network connections such as the Internet and wireless links (4G, LTE, or Wi-Fi).

By introducing the middle tier, the client is only handling presentation logic (UI). This means that only little communication is needed between the client and the middle tier (BL) making the client “thin” or “thinner”. An example of a thin client is an Internet browser that allows you to see and provide information fast and almost with no delay.

As more users access the system, a three-tier solution is more scalable than the other solutions because you can add as many middle tiers (each running on its own server) as needed to ensure good performance (N-tier or multi-tier).

Security is also the best in the three-tier architecture because the middle tier protects the database tier.

There is one major drawback to the N-tier architecture and that is that the additional tiers increase the complexity and cost of the installation.


            1-Tier                  2-Tier              Multi-Tier
Benefits    Very simple             Good security       Exceptional security
            Inexpensive             More scalable       Fastest execution
            No server needed        Faster execution    “Thin” client
                                                        Very scalable

Issues      Poor security           More costly         Very costly
            Multi-user issues       More complex        Very complex
                                    “Thick” client

Users       Usually 1 (or a few)    2-100               50-2000 (+)

 

 

Interview Question: Difference between 1-tier/2-tier & 3-tier architecture?

“Tier” can be defined as “one of two or more rows, levels, or ranks arranged one above another”.

1-Tier Architecture is the simplest: a single tier for a single user, the equivalent of running an application on a personal computer. All the components required to run the application are located within it. User interface, business logic, and data storage are all located on the same machine. Such applications are the easiest to design, but the least scalable. Because they are not part of a network, they are useless for designing web applications.

2-Tier Architectures supply a basic network between a client and a server. For example, the basic web model is a 2-Tier Architecture. A web browser sends a request to a web server, which then processes the request and returns the desired response, in this case, web pages. This approach improves scalability and separates the user interface from the data layers. However, it does not separate the application layers so that they can be used independently. This makes such applications difficult to update and not specialized: because the layers aren’t separated, the entire application must be updated.

3-Tier Architecture is most commonly used to build web applications. In this model, the browser acts as the client, middleware or an application server contains the business logic, and database servers handle data functions. This approach separates business logic from display and data. The three layers are commonly known as the Presentation Layer (PL/UI), the Business Logic Layer (BLL), and the Data Access Layer (DAL).


 

Learn more about these architectures at Easy Lesson on 1-Tire vs 2-Tire vs 3-Tire – Winged Post

CSharp Interview Question: What is Action in CSharp

One of my friends called me after his interview for a developer role at an esteemed organization. One of the questions the interviewer asked him was:

What is Action?

After talking to him, I thought, why not blog about it? Here I try to use as few words as possible, along with code samples that can be discussed with an interviewer as an answer to this question.

Action is a type of delegate:

  1. It returns no value.
  2. It can take from 0 to 16 input parameters.

For example, an Action<int, int> can encapsulate a method that takes two integer input parameters and returns void.

So if you have a method named Display that takes two integers and returns void, you can encapsulate it in an Action named MyDelegate.
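
The code screenshots from the original post are not reproduced here, so the following is only a sketch of what the steps above might look like. Display and MyDelegate are the names used in the text; the method body and the sample values 4 and 5 are assumptions.

```csharp
using System;

public static class ActionDemo
{
    // A method that takes two integers and returns void,
    // so it matches the signature of Action<int, int>.
    public static void Display(int a, int b)
    {
        Console.WriteLine(a + b);
    }

    public static void Main()
    {
        // Declare an Action for methods with two int parameters
        // and point it at Display.
        Action<int, int> MyDelegate = Display;

        // Invoking the delegate calls Display(4, 5), which prints 9.
        MyDelegate(4, 5);
    }
}
```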

An Action with one input parameter is defined in the System namespace as:

public delegate void Action<in T>(T obj);

Here T is the type of the input parameter and obj is the value passed for that parameter.

Action with Anonymous method

You can also use Action with an anonymous method, by assigning the anonymous method to the Action.
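
A minimal sketch of such an assignment, since the original screenshot is not available. The variable name add and the arguments 4 and 5 are assumptions, chosen so that the result matches the output described in the text.

```csharp
using System;

public static class AnonymousActionDemo
{
    public static void Main()
    {
        // The anonymous method is written inline with the delegate keyword.
        Action<int, int> add = delegate (int a, int b)
        {
            Console.WriteLine(a + b);
        };

        add(4, 5); // prints 9
    }
}
```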

The above code prints 9 as output.

Action with Lambda Expression

Like any other delegate, Action can also be used with a lambda expression.
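
Again as a sketch only (the original screenshot is missing, and the names and values are assumptions), the same behaviour written as a lambda might look like this:

```csharp
using System;

public static class LambdaActionDemo
{
    public static void Main()
    {
        // Same behaviour as an anonymous method, written as a lambda expression.
        Action<int, int> add = (a, b) => Console.WriteLine(a + b);

        add(4, 5); // prints 9
    }
}
```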

The above code also prints 9 as output.

Passing Action as input parameter

You can also pass an Action as a parameter to a function. Let us say you have a class named Student, and two functions called Display and Show that display the Name and RollNumber of a Student.

Now suppose you have a function to which you need to pass either Display or Show, or in other words any function with the same signature as Display or Show. In that case you pass a delegate as the input parameter to the function.

You can then call the CallingAction method from Main.
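
The four screenshots for this example are not available, so the program below is a reconstruction under assumptions: the exact members of Student, the signatures of Display and Show (here static methods taking a Student), and the sample data are all guesses. Only the names Student, Display, Show, CallingAction, Name, and RollNumber come from the text.

```csharp
using System;

public class Student
{
    public string Name { get; set; }
    public int RollNumber { get; set; }
}

public static class PassingActionDemo
{
    // Two methods with the same signature: a Student in, nothing out.
    public static void Display(Student s)
    {
        Console.WriteLine("Name: " + s.Name);
    }

    public static void Show(Student s)
    {
        Console.WriteLine("Roll number: " + s.RollNumber);
    }

    // Accepts any method that matches Action<Student>.
    public static void CallingAction(Action<Student> action, Student s)
    {
        // Delegate.Method describes the target method; print its name.
        Console.WriteLine("Calling " + action.Method.Name);
        action(s);
    }

    public static void Main()
    {
        var student = new Student { Name = "Asha", RollNumber = 7 };
        CallingAction(Display, student);
        CallingAction(Show, student);
    }
}
```

With this sample data the program prints "Calling Display", "Name: Asha", "Calling Show", and "Roll number: 7", one per line.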

Here we create an instance of the Student class and pass first the Display function and then the Show function as the input parameter to CallingAction. Inside CallingAction, we print the name of the function that was passed in, so running the program prints the names of both functions.

I hope you are now able to explain what Action is in simple words. I hope this post is useful. Thanks for reading.

 

Types of Constructor in .Net

A constructor is a special method (with the same name as the class) that is used to initialize the members of a class whenever an object of that class is created. Through a constructor you can initialize the members with any value (as allowed by the member's data type) or with default values.

If a class does not define a constructor, the C# compiler provides an implicit parameterless constructor, which is called the default constructor.

A class can have any number of constructors, provided they have different signatures, that is, they differ in the number, type, or order of their parameters.

Some of the basic properties of constructors are:

  1. Constructors do not return a value.
  2. Constructors can be overloaded.
  3. If a class defines both static and non-static constructors, the static constructor runs first, before the first instance is created.

Using the code

A. Public Constructor:

These are the most common, widely used, and simplest constructors in object-oriented programming. The constructor is declared public and can be called whenever the class is instantiated from anywhere in your program. A public parameterless constructor is also known as the default constructor of a class.

A class that does not declare any constructor gets an implicit public constructor, which is generated by the compiler. A public constructor should not be declared in an abstract class, because public constructors exist to create instances of a type, and you cannot create instances of an abstract type.


using System;

class MyClass
{
    // default (parameterless) public constructor
    public MyClass()
    {
        Console.WriteLine("This is default constructor");
    }

    // public constructor with a parameter
    public MyClass(int a)
    {
        Console.WriteLine("Value is " + a);
    }
}

class MyProgram
{
    // execution starts here, in the Main method
    static void Main()
    {
        // create the class objects
        MyClass myclass1 = new MyClass();  // calls the default constructor
        MyClass myclass2 = new MyClass(5); // calls the constructor with a parameter
        Console.ReadLine();
    }
}

B. Protected Constructor:

A constructor is declared protected when the base class should be initialized only as part of creating one of its derived types.

In other words, a protected constructor is typically used with an abstract class, which cannot be instantiated directly (because of its abstract nature). The base class is initialized only when a derived class is instantiated and calls the base class constructor.

 

public class Base
{
    protected Base() { }
    protected static void staticFoo() { }
    protected void instanceFoo() { }
}

public class Derived : Base
{
    Derived()
        : base() // initialize the base class through its protected constructor
    { }

    // execution starts here, in the Main method
    static void Main()
    {
        // Base b = new Base();  // Compile error: cannot access the protected constructor
        // b.instanceFoo();      // Compile error: cannot call a protected instance method through a Base reference
        Base.staticFoo();        // Can call the protected static method from a derived class
        Derived d = new Derived();
        d.instanceFoo();         // Can call an inherited protected instance method on a Derived
        Derived.staticFoo();     // Can access the static method of the base class
    }
}

C. Private Constructor :

A private constructor is at the heart of the Singleton pattern and is used whenever we need to prevent code outside the class from creating instances with the ‘new’ keyword. The private modifier is usually stated explicitly to make it clear that the class cannot be instantiated from outside.

A private constructor is a special instance constructor. It is commonly used in classes that contain static members only. If a class has one or more private constructors and no public constructors, then other classes (except nested classes) are not allowed to create instances of this class.

 

using System;

public class MyClass
{
    private MyClass() { } // Private constructor
    public static int iCount;
    public static int IncrementCount()
    {
        return ++iCount;
    }
}

class MyTestClass
{
    // execution starts here, in the Main method
    static void Main()
    {
        // MyClass myclass = new MyClass(); // Compile error: cannot create an instance because the constructor is private
        MyClass.iCount = 100;      // static variable is directly accessible
        MyClass.IncrementCount();  // direct call to the static method
        Console.WriteLine("New value: {0}", MyClass.iCount); // prints "New value: 101"
    }
}

The private constructor plays a special role in the singleton design pattern. A singleton class is similar to a static class, and both serve a similar purpose (to provide only one “instance”), but there are major differences:

  1. A singleton class can implement interfaces or derive from useful base classes.

  2. A singleton object can be passed as a parameter, which is not possible with a static class.

  3. Singletons can be handled polymorphically without forcing their users to assume that there is only one instance.

 

class Program
{
    static void Main(string[] args)
    {
        // get the class object without creating it through the 'new' keyword
        MyClass instance = MyClass.Instance;
    }
}

/// <summary>
/// Sealed class, implemented using the Singleton pattern.
/// </summary>
public sealed class MyClass
{
    // The internal static variable that holds the single instance
    static readonly MyClass _instance = new MyClass();

    /// <summary>
    /// This is a private constructor, meaning no outsiders have access.
    /// </summary>
    private MyClass()
    {
        // initialize the internal variables
    }

    /// <summary>
    /// The single instance of the MyClass class.
    /// Returns the same _instance object every time it is accessed.
    /// </summary>
    public static MyClass Instance
    {
        get { return _instance; }
    }
}

D. Static Constructor :

A static constructor is used to initialize any static data, or to perform a particular action that needs to be performed only once.

It is called automatically before the first instance is created or any static members are referenced. A static constructor is special: it does not take access modifiers or parameters, and it cannot access any non-static data member of the class.

A static constructor cannot be called directly, and the user has no control over when it is executed in the program.

Static constructors are also useful when creating wrapper classes for unmanaged code.

 

using System;

class Program
{
    static void Main(string[] args)
    {
        MyClass.foo(); // call the static member directly; the static constructor runs first
        MyClass myclass = new MyClass();
        myclass.foo(10);
        Console.ReadLine();
    }
}

public class MyClass
{
    static int a;
    int b;

    // static constructor
    static MyClass()
    {
        a = 10; // initialize the static variable
        // b = 20; // Compile error: a non-static member cannot be accessed from a static constructor
        Console.WriteLine("I am in static constructor");
    }

    // non-static constructor
    public MyClass()
    {
        a = 10; // a non-static constructor can initialize both static and non-static members
        b = 20;
        Console.WriteLine("I am in non-static constructor");
    }

    public static void foo()
    {
        Console.WriteLine("I am in static foo");
    }

    public void foo(int a)
    {
        Console.WriteLine("I am in non-static foo");
    }
}

Something More

Constructor Overloading : A constructor with zero arguments is known as the default constructor. A constructor can take zero or more arguments as parameters. A class can have multiple constructors with different signatures. Such constructors are known as “overloaded constructors”. The overloaded constructors of a class must differ in their number of arguments, type of arguments, and/or order of arguments. This gives the user the ability to initialize the object in multiple ways.

The class in the program shown below contains three constructors. The first is the default constructor, followed by two constructors that take arguments. Note that the constructors differ in their signatures.

using System;
public class MyClass
{
 public MyClass()
 {
 //Default constructor
 }
 public MyClass(int sampleValue)
 {
 // This is the constructor with one parameter.
 }
 public MyClass(int firstValue, int secondValue)
 {
 // This is the constructor with two parameters.
 }
}

Constructor Chaining : 

Constructor chaining refers to the ability of a class to call one constructor from another constructor. To call one constructor from another, we write : base(parameters) or : this(parameters) just before the body of the constructor, depending on whether we want to call a constructor in the base class or in the current class.

using System;

public class MyClass
{
    public MyClass() : this(10)
    {
        // This is the default constructor.
        // It calls the one-parameter constructor of this class, passing 10.
    }

    public MyClass(int firstValue)
    {
        // This is the constructor with one parameter.
    }
}
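
The example above shows only the : this(...) form. As a complementary sketch, the following hypothetical Shape and Circle classes (not from the original article) combine : this(...) and : base(...) and print a line from each constructor so that the order of execution is visible.

```csharp
using System;

public class Shape
{
    public string Name { get; private set; }

    public Shape(string name)
    {
        Name = name;
        Console.WriteLine("Shape constructor: " + name);
    }
}

public class Circle : Shape
{
    public double Radius { get; private set; }

    // Chains to the one-parameter constructor of this class.
    public Circle() : this(1.0)
    {
        Console.WriteLine("Circle() constructor");
    }

    // Chains to the base class constructor, which runs before this body.
    public Circle(double radius) : base("circle")
    {
        Radius = radius;
        Console.WriteLine("Circle(double) constructor");
    }
}

public static class ChainingDemo
{
    public static void Main()
    {
        // Constructor bodies run in this order:
        // Shape constructor, Circle(double) constructor, Circle() constructor.
        var c = new Circle();
        Console.WriteLine(c.Name + " has radius " + c.Radius);
    }
}
```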

Conclusion

The best practice is to always explicitly specify the constructor, even if it is a public default constructor. Proper design of constructors goes a long way in solving the challenges faced in class designs.

 

Something You Are : Fingerprint


 

Since people forget things and lose things, one might contemplate basing an authentication scheme for humans on something that a person is. After all, we recognize people we interact with not because of some password protocol but because of how they look or how they sound — “something they are”. Authentication based on “something you are” will employ behavioral and physiological characteristics of the principal. These characteristics must be easily measured accurately and preferably are things that are difficult to spoof. For example, we might use

  • Retinal scan
  • Fingerprint reader
  • Handprint reader
  • Voice print
  • Keystroke timing
  • Signature

To implement such a biometric authentication scheme some representation for the characteristic of interest is stored. Subsequently, when authenticating that person, the characteristic is measured and compared with what has been stored. An exact match is not expected, nor should it be because of error rates associated with biometric sensors. (For example, fingerprint readers today normally exhibit error rates upwards of 5%.)

Methods to subvert a fingerprint reader give some indication of the difficulties of deploying unsupervised biometric sensors as the sole means of authenticating humans. Attacks include:

    • Steal a finger. Difficult to do without the owner of the finger noticing. Good supervision of the biometric sensor defends against this attack. 
    • Steal a fingerprint. Lifting a fingerprint is not that hard (at least, according to those TV crime-drama shows). Again, though, good human supervision of the biometric sensor defends against this attack because a guard will notice if somebody is not inserting a naked finger into the reader. 
    • Replace the biometric sensor. At first glance, this type of attack might seem even more difficult to execute than the two above. Social engineering might be easier for the attacker to employ here, though. It suffices that the guard believes that the sensor should be changed (maybe because the old one is “broken”).

 

There are several well known problems with biometric-based authentication schemes:

  • Reliability of the method. Similarity of physical features (faces, hands, or fingerprints) and inaccuracy of measurement may together conspire to create an unacceptably high false acceptance rate (FAR). 
  • Cost and availability. Currently, some readers cost $40-50 and more. Are end users willing to pay that much for an authentication method that does not work as well as passwords? 
  • Unwillingness or inability to interact with biometric input devices. Some people are uncomfortable putting a body part into a machine; some are uncomfortable having lasers shined in their eyes for a retinal scan; and some don’t have fingers or eyes to be measured.
  • Compromise the biometric database or system. It might be possible to circumvent the system’s biometric sensor and provide an “input” from another source. The sensor is, after all, connected to a system and hijacking that channel might be possible. Knowledge of the stored representation for a characteristic would then allow an attacker to inject the correct characteristic and impersonate anyone. 
  • Revocation. What does it mean to revoke a fingerprint?

The literature on biometric authentication uses the following vocabulary to characterize what a scheme does and how well it works:

  • FAR (false acceptance rate): This is the probability that the system will fail to reject an impostor (aka FMR: false match rate).
  • FRR (false reject rate): This is the probability that the system will reject a bona fide principal (aka FNMR: false non-match rate).
  • One-to-one matching: Compare a live template with a specific stored template in the system. This corresponds to authentication.
  • One-to-many matching: Compare a live template with all stored templates in the system. This corresponds to identification.
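
To make FAR and FRR concrete, the sketch below estimates both from lists of similarity scores at a chosen acceptance threshold. The scores are made-up numbers and the class and method names are hypothetical. It assumes a matcher that outputs a score between 0 and 1, where higher means a closer match and a score at or above the threshold is accepted.

```csharp
using System;
using System.Linq;

public static class BiometricRates
{
    // Fraction of impostor attempts whose score reaches the threshold
    // (wrongly accepted).
    public static double Far(double[] impostorScores, double threshold)
    {
        return impostorScores.Count(s => s >= threshold) / (double)impostorScores.Length;
    }

    // Fraction of genuine attempts whose score falls below the threshold
    // (wrongly rejected).
    public static double Frr(double[] genuineScores, double threshold)
    {
        return genuineScores.Count(s => s < threshold) / (double)genuineScores.Length;
    }

    public static void Main()
    {
        // Made-up similarity scores; higher means a closer match.
        double[] genuine = { 0.91, 0.85, 0.62, 0.95, 0.78 };
        double[] impostor = { 0.20, 0.35, 0.71, 0.15, 0.40 };

        foreach (double t in new[] { 0.5, 0.7, 0.9 })
        {
            Console.WriteLine("threshold {0}: FAR = {1}, FRR = {2}",
                t, Far(impostor, t), Frr(genuine, t));
        }
    }
}
```

Raising the threshold lowers FAR but raises FRR, and vice versa. That trade-off is why an exact match is neither expected nor required.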

Something You Have : Card System


Instead of basing authentication on something a principal knows and can forget, maybe we should base it on something the principal has. Various token/card technologies support authentication along these lines. For all, 2-factor authentication becomes important — an authentication process that involves 2 independent means of authenticating the principal. So, we might require that a principal not only possess a device but also know some secret password (often known as a PIN, or personal identification number). Without 2-factor authentication, stealing the device would allow an attacker to impersonate the owner of the device; with 2-factor authentication, the attacker would still have another authentication burden to overcome.

Here are examples of technologies for authentication based on something a principal might possess:

  • A magnetic strip card (e.g., Cornell ID, credit card). One serious problem with these cards is that they are fairly easy to duplicate. It only costs about $50 to buy a writer, and it’s easy to get your hands on cards to copy them. To get around these problems, banks implement 2-factor authentication by requiring knowledge of a 4 to 7 character PIN whenever the card is used.
    Short PINs are problematic. First, they admit guessing attacks. Banks defend against this by limiting the number of guesses before they will confiscate the card. Second, there is the matter of how to check whether a PIN that has been entered is the correct one. Storing the PIN on the card’s magnetic stripe is not a good idea because a thief who steals the card can easily determine the associated PIN (and then subvert the 2-factor authentication protocol). Storing an encrypted copy of the PIN on the card’s magnetic stripe does not exhibit this vulnerability, though.
  • Proximity card or RFID. These cards transmit stored information to a monitor via RF. There is currently a debate in this country as to the merits of using RF proximity cards (RFID tags) for identification of people and products. Walmart speaks about putting RFID tags on every product they shelve, and both the German and U.S. governments are including them in passports. With RFID tags on Walmart products, for example, somebody with a suitable receiver could tell what you have purchased (even though your purchase is hidden in a bag), and this is seen by some as a privacy violation. With RFID tags in passports, somebody with a suitable receiver could remotely identify on the street citizens of a given country and single them out for “special treatment” (likely unpleasant).
    There are two types of RF proximity cards: passive and active. The former is not powered, and uses the RF energy from the requester to reply with whatever information is being stored by the card. The latter is powered and broadcasts information, allowing anyone who is in range and has a receiver to query the card. You could imagine that if RF tags are put into passports, then some people might start carrying them in special Faraday-cage passport holders, because now an interloper can learn about someone without the victim’s knowledge (or permission).
  • Challenge/Response cards and Cryptographic Calculators. These are also called smart cards and perform some sort of cryptographic calculation. Sometimes the card will have memory, and sometimes it will have an associated PIN. A smart card transforms the authentication problem for humans, because we are no longer constrained by stringent computational and storage limitations. Unfortunately, today’s smart cards are vulnerable to power-analysis attacks. Furthermore, one must exercise care in using a cryptographic calculator: if it is used to generate digital signatures, for example, then somehow the device owner must be made aware of what documents are being signed.
    One prevalent form of smart card is the RSA SecurID. It continuously displays encrypted time, and each RSA SecurID encrypts with a different key. Whoever has an RSA SecurID card responds to server challenges by typing the encrypted time (so, in effect, it is secret); a server, knowing what key is associated with each user’s card, can then authenticate a user. (The server must be somewhat generous with respect to what times it will accept. Accept too many and replay attacks become possible; accept too few and message delivery delays and execution times prevent people from authenticating themselves.)
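
The idea of a card that displays a keyed function of the current time, with a server that tolerates a small amount of clock drift, can be sketched as follows. This is not the actual SecurID algorithm. It is a generic illustration that uses HMAC-SHA256 in place of "encrypting the time". The 60-second step, the 6-digit code, and the acceptance window of one step on either side are all assumptions made for the example.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TimeToken
{
    const int StepSeconds = 60; // assumed: a new code every 60 seconds

    // Code for one time step: keyed hash of the step counter, shortened to 6 digits.
    public static string CodeFor(byte[] key, long step)
    {
        using (var hmac = new HMACSHA256(key))
        {
            byte[] mac = hmac.ComputeHash(BitConverter.GetBytes(step));
            uint value = BitConverter.ToUInt32(mac, 0) % 1000000;
            return value.ToString("D6");
        }
    }

    // Number of whole time steps since the Unix epoch.
    public static long StepAt(DateTimeOffset time)
    {
        return time.ToUnixTimeSeconds() / StepSeconds;
    }

    // Server side: accept the code for the current step or one step either side.
    // (A production check would also use a constant-time comparison.)
    public static bool Verify(byte[] key, string code, DateTimeOffset now)
    {
        long current = StepAt(now);
        for (long step = current - 1; step <= current + 1; step++)
        {
            if (CodeFor(key, step) == code) return true;
        }
        return false;
    }

    public static void Main()
    {
        byte[] key = Encoding.UTF8.GetBytes("per-card secret (demo only)");
        DateTimeOffset now = DateTimeOffset.UtcNow;

        string code = CodeFor(key, StepAt(now));
        Console.WriteLine(Verify(key, code, now));                 // True
        Console.WriteLine(Verify(key, code, now.AddSeconds(50)));  // True: at most one step later
        Console.WriteLine(Verify(key, code, now.AddMinutes(10)));  // ten steps later, outside the window
    }
}
```

Widening the loop in Verify makes the server more forgiving of clock drift and delays, but also lengthens the time during which an observed code can be replayed, which is the trade-off described above.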

Something You Know : Password


The idea here is that you know a secret — often called a password — that nobody else does. Thus, knowledge of a secret distinguishes you from all other individuals. And the authentication system simply needs to check to see if the person claiming to be you knows the secret.

Unfortunately, use of secrets is not a panacea. If the secret is entered at some sort of keyboard, an eavesdropper (“shoulder surfing”) might see the secret being typed. For authenticating machines, we used challenge/response protocols to avoid sending a secret (key) over the wire where it could be intercepted by a wiretapper. But we can’t force humans to engage in a challenge/response protocol on their own, because people cannot be expected to do cryptographic calculations.

Furthermore, people will tend to choose passwords that are easy to remember, which usually means that the password is easy to guess. Or they choose passwords that are difficult to guess but are also difficult to remember (so the passwords must be written down and then are easy for an attacker to find).

Even if a password is not trivial to guess, it might succumb to an offline search of the password space. An offline search needs some way to check a guess without using the system itself, and some methods used today for storing passwords do provide such a way. (See below.)

Finally, changing a password requires human intervention. Thus, compromised passwords could remain valid for longer than is desirable. And there must be some mechanism for resetting the password (because passwords will get forgotten and compromised). This mechanism could itself be vulnerable to social-engineering attacks, which rely on convincing a human with the authority to change or access information that it is necessary to do so.

With all these concerns about passwords, you might wonder what is required for a password to be considered a good one. There are three dimensions, and they interact so that strengthening one can be used to offset a weakness in another.

  • Length. This is the easiest dimension for people to strengthen. Longer passwords are better. A good way to get a long password that is seemingly random yet easy to remember is to think of a passphrase (like the first words of a song) and then generate the password from the first letters of the passphrase. 
  • Character set. The more characters that can be used in a password, the greater the number of possible combinations of characters, so the larger the password space. Searching a larger password space requires more work from an attacker.
  • Randomness. Choose a password from a language (English, say) and an attacker can leverage regularities in this language to reduce the work needed in searching the password space (because certain passwords are now “impossible”). For instance, given the phonotactic and orthographic constraints of English, an attacker searching for an English word need not try passwords containing sequences like krz (although this would be a perfectly reasonable sequence to try if the password were known to be in Polish). Mathematically, it turns out that English has about 1.3 bits of information per character. Thus it takes about 49 characters (64 / 1.3) to get 64 bits of “secret”, which comes out to about 10 words (at 5 characters on average per word).
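
To make the interaction between length, character set, and randomness concrete, here is a small sketch of the arithmetic. The function names are illustrative, and the first function assumes every character is chosen uniformly at random, which human-chosen passwords are not.

```python
import math


def random_password_bits(length: int, charset_size: int) -> float:
    """Bits of entropy in a password whose characters are chosen uniformly at random."""
    return length * math.log2(charset_size)


def chars_needed(target_bits: float, bits_per_char: float) -> float:
    """Characters needed to reach target_bits at a given information rate."""
    return target_bits / bits_per_char


if __name__ == "__main__":
    print(random_password_bits(8, 26))   # 8 random lowercase letters
    print(random_password_bits(8, 62))   # 8 random letters and digits
    print(chars_needed(64, 1.3))         # English text at about 1.3 bits per character
```

A shorter password drawn from a larger or more random space can carry as many bits as a longer one drawn from natural language.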

When passwords are used for authenticating a user, the system must have a way to check whether the password entered is valid. Simply storing a file with the list of usernames and associated passwords, however, is a bad idea because if the confidentiality of this file were ever compromised all would be lost. (Similarly, backup copies of this file would have to be afforded the same level of protection, since people rarely ever change their passwords.) Better not to store actual passwords on-line. So instead we might compute a cryptographic hash of the password, and store that. Now, the user enters a password; the system computes a hash of that password; and the system then compares that hash with what has been stored in the password file.
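
The hash-and-compare flow just described might look like the following minimal sketch. SHA-256 is used only because it is in the standard library; the in-memory password_file dictionary and the function names are assumptions, and a real system would also add salt and a deliberately slow hash.

```python
import hashlib
import hmac

# username -> hex digest; the password itself is never stored
password_file: dict[str, str] = {}


def h(password: str) -> str:
    """Cryptographic hash of the password (SHA-256 as a stand-in)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    password_file[username] = h(password)


def login(username: str, password: str) -> bool:
    """Hash what the user typed and compare it with the stored hash."""
    stored = password_file.get(username)
    if stored is None:
        return False
    return hmac.compare_digest(stored, h(password))


if __name__ == "__main__":
    register("alice", "correct horse")
    print(login("alice", "correct horse"))
    print(login("alice", "wrong guess"))
```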

Even when password hashes instead of actual passwords are what is being stored, the integrity of this file of hashes must still be protected. Otherwise an attacker could insert a different hash (for a password the attacker knows) and log into the system using that new password.

The problem with having a password file that is not confidential — even if cryptographic hashes are what is being stored — is the possibility of offline dictionary attacks. Here, the attacker computes the hash of every word in some dictionary and then compares each hash with the stored password hashes. If any match, the attacker has learned a password. An alternative to confidentiality for defending against offline dictionary attacks is use of salt. Salt is a random number that is associated with a user and is added to that user’s password when the hash is computed. With high probability, a given pair of users will not have the same salt value. And the system stores both h(password + salt) and the salt for each account.

Salt does not make it more difficult for an attacker to guess the password for a given account, since the salt for each account is stored in the clear. What salt does, however, is make it harder for the attacker to perpetrate an offline dictionary attack against all users. When salt is used, all the words in the dictionary would have to be rehashed for every user. What formerly could be seen as a “wholesale” attack has been transformed into a “retail” one.
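
The “wholesale” versus “retail” distinction can be seen by counting hash computations. The sketch below is illustrative only: SHA-256 stands in for the system’s hash, the 16-byte salt length is an assumption, and crack_unsalted and crack_salted are hypothetical names for the attacker’s two strategies.

```python
import hashlib
import secrets


def h(password: str, salt: bytes = b"") -> str:
    """h(password + salt), with SHA-256 standing in for the system's hash."""
    return hashlib.sha256(password.encode("utf-8") + salt).hexdigest()


def make_entry(password: str) -> tuple[bytes, str]:
    """What the system stores for an account: the salt and h(password + salt)."""
    salt = secrets.token_bytes(16)
    return salt, h(password, salt)


def crack_unsalted(stored: dict[str, str], dictionary: list[str]) -> tuple[dict[str, str], int]:
    """Wholesale: hash each word once, then look every user up in the table."""
    table = {h(word): word for word in dictionary}
    found = {user: table[hv] for user, hv in stored.items() if hv in table}
    return found, len(dictionary)


def crack_salted(entries: dict[str, tuple[bytes, str]], dictionary: list[str]) -> tuple[dict[str, str], int]:
    """Retail: every word must be rehashed with each user's salt."""
    found: dict[str, str] = {}
    hashes_computed = 0
    for user, (salt, stored_hash) in entries.items():
        for word in dictionary:
            hashes_computed += 1
            if h(word, salt) == stored_hash:
                found[user] = word
    return found, hashes_computed


if __name__ == "__main__":
    words = ["apple", "banana", "cherry"]
    passwords = {"alice": "cherry", "bob": "cherry", "carol": "zebra!"}
    print(crack_unsalted({u: h(p) for u, p in passwords.items()}, words))
    print(crack_salted({u: make_entry(p) for u, p in passwords.items()}, words))
```

Both attacks recover the same weak passwords; only the attacker’s cost changes, growing with the number of accounts once salt is used.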

Salt is used in most UNIX implementations. The salt in early versions of UNIX was 12 bits, and it was formed from the system time and the process identifier when an account was created. Unfortunately, 12 bits is hopelessly small nowadays. Even an old PC can perform 13,000 crypt operations per second, which means such a PC can hash a 20,000-word dictionary with every possible value of a 12-bit salt (20,000 × 4,096, or about 82 million hashes) in under two hours.
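
The cost of precomputing a dictionary against every salt value is simple arithmetic, sketched here with the figures quoted above (the function name is illustrative):

```python
def precompute_hours(words: int, salt_bits: int, crypts_per_second: int) -> float:
    """Hours needed to hash every dictionary word under every possible salt value."""
    total_hashes = words * 2 ** salt_bits
    return total_hashes / crypts_per_second / 3600


if __name__ == "__main__":
    print(2 ** 12)                                # number of 12-bit salt values
    print(precompute_hours(20_000, 12, 13_000))   # the old-PC figures from the text
    print(precompute_hours(20_000, 48, 13_000))   # same machine against a 48-bit salt
```

With 12 bits the whole table is within reach of an afternoon on an old machine, whereas each added salt bit doubles the attacker’s precomputation.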

Secret Salt

Another defense against offline dictionary attacks is to use secret salt (invented by Manber and independently by Abadi and Needham). In this scheme, we select a small set of possible “secret salt” values from a large space. The password file then stores for each user: userid, h(password, public salt, secret salt), public salt. Note that the value of the secret salt used in computing the hash is not saved anyplace. When secret salt is being employed, a user login involves having the system guess the value of secret salt that was used in computing the stored, hashed password; the guess involves checking through the possible secret salt values. The effect is to make computing a hashed password very expensive for attackers.
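
A login under this scheme can be sketched as a loop over the candidate secret-salt values. This sketch simplifies the scheme: the candidates are just the integers 0 to 4095, SHA-256 stands in for the hash, and the names make_entry and verify are assumptions.

```python
import hashlib
import hmac
import secrets

SECRET_SALT_VALUES = 4096  # assumption: size of the set of possible secret salts


def h(password: str, public_salt: bytes, secret_salt: int) -> bytes:
    """h(password, public salt, secret salt), with SHA-256 as a stand-in."""
    data = public_salt + secret_salt.to_bytes(2, "big") + password.encode("utf-8")
    return hashlib.sha256(data).digest()


def make_entry(password: str) -> tuple[bytes, bytes]:
    """Return (public salt, hash). The secret salt is used once and then discarded."""
    public_salt = secrets.token_bytes(16)
    secret_salt = secrets.randbelow(SECRET_SALT_VALUES)
    return public_salt, h(password, public_salt, secret_salt)


def verify(password: str, public_salt: bytes, stored: bytes) -> bool:
    """Guess the secret salt by trying every candidate value."""
    for candidate in range(SECRET_SALT_VALUES):
        if hmac.compare_digest(h(password, public_salt, candidate), stored):
            return True
    return False


if __name__ == "__main__":
    salt, stored = make_entry("hunter2")
    print(verify("hunter2", salt, stored))
    print(verify("hunter3", salt, stored))
```

A legitimate login pays for up to SECRET_SALT_VALUES hashes once, while a dictionary attacker pays that multiplier for every word tried.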

Examples of Password Systems

We now outline several widely-used password systems.

  • Unix. Unix stores a hashed salted password and salt. For the hash, it iterates DES 25 times with an input of “0” and with the password as the key; it then adds the 12-bit salt. As discussed above, this is not strong enough for today’s machines. Some versions of Unix employ a shadow password file, so that it is harder for an attacker to retrieve the hashed passwords. The hashes are then kept out of the world-readable /etc/passwd file and stored in a file that only root can read, such as /etc/shadow (or /etc/master.passwd on BSD systems).
  • FreeBSD. FreeBSD stores a hashed password (where the hash is based on MD5). There is no limit to the length of the password, and 48 bits of salt are used. 
  • OpenBSD. OpenBSD does a hash based on Blowfish encryption, and then stores the hashed password along with 128 bits of salt. The system guarantees that no two accounts will have the same salt value.
  • Windows NT/2000/XP. NT stores 2 password hashes: one called the LanMan hash and another called the NT hash. The LanMan hash is used for backwards compatibility with Windows 95/98, and it is a very weak scheme. It works as follows: the password is converted to uppercase, padded or truncated to 14 characters, and split into two 7-character halves. Each half is used as a DES key to encrypt a fixed constant, and the two results are concatenated to form the hash.

    To see the weakness, consider how much work an attacker would have to do to break this scheme. The numbers and uppercase letters together make up 36 characters. Each half of a 14-character password then has 36^7 possible values, which comes out as 78,364,164,096. The actual work factor then is 2 × 36^7 (whereas the theoretical work factor for 14 characters is 36^14 = 36^7 × 36^7).

    Note that if upper and lower case were both allowed, then there would be (2 × 26) + 10 = 62 possible characters and thus 62^7 = 3,521,614,606,208 possible values for each half, which is about 45 times greater than the LanMan value.

    The NT hash is somewhat better. In the NT operating system, there was still a 14-character limit, although this limit was removed in Windows 2000 and XP. The password is hashed with MD4 (whose compression function performs 48 steps) to get a 128-bit hash. This hash is stored in the system, but no salt is used at all.
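
The LanMan work-factor figures can be checked with a few lines of arithmetic. The sketch follows the text’s simplification of counting only digits and letters, and the names are illustrative.

```python
def half_space(charset_size: int, half_length: int = 7) -> int:
    """Number of possible values for one 7-character half of the password."""
    return charset_size ** half_length


lanman_half = half_space(26 + 10)      # uppercase letters and digits only
split_work = 2 * lanman_half           # the two halves are searched separately
unsplit_work = (26 + 10) ** 14         # a 14-character password searched as a whole
mixed_half = half_space(2 * 26 + 10)   # if both cases were allowed

if __name__ == "__main__":
    print(f"{lanman_half:,}")
    print(f"{split_work:,} versus {unsplit_work:,}")
    print(f"{mixed_half:,}")
    print(mixed_half / lanman_half)
```

Splitting the password turns an exponent of 14 into two independent searches with an exponent of 7, which is the heart of the weakness.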

Defense Against Password Theft: A Trusted Path

Given schemes that make passwords hard to guess, an attacker might be tempted to try theft. The attack is: install some sort of program to produce a window that resembles a login prompt or otherwise invites the user to reveal a password. Users will then type their passwords into this program, where the password is saved for later use by the attacker.

How can you defend against such attacks? What we would like is some way for a user to determine the pedigree of any window purporting to be a login prompt. If each point in the pedigree is trusted, then the login prompt window must be trusted and it is safe to enter a password. This idea is called a trusted path.

To implement a trusted path, the keyboard driver recognizes a certain key sequence (Ctrl-Alt-Del in Windows) and then always transfers control to some trusted software that displays a password prompt window and reads what is typed into it. Users are educated to type passwords only into windows that appear after typing that special key sequence.

Notice, however, that this scheme requires that a trusted keyboard driver is executing. So, that means the system must be running an operating system that is trusted to prevent keyboard driver substitutions. One might expect that rebooting the machine would be a way to ensure that a trusted operating system is executing (presuming you trust whatever operating system is installed), but what if the OS image on the disk had been altered by an attacker? So, one must be certain that the operating system software stored on the disk has not been modified, too. But even that’s not enough. What about the boot loader, which might have been altered to read a boot block from a non-standard location on the disk? And so it goes. Even if you start each session by booting from your own fresh OS CD, a ROM or even the hardware might have been hacked by an attacker. Physical security of the hardware then must also have been maintained. In the end, though, to the extent that you can trust all layers from the hardware to the keyboard driver, the resulting trusted path provides a way to defend against attacks implemented by programs that attempt to steal passwords by spoofing.

Methods for authentication

Methods for authenticating people differ significantly from those for authenticating machines and programs, and this is because of the major differences in the capabilities of people versus computers. Computers are great at doing large calculations quickly and correctly, and they have large memories into which they can store and later retrieve Gigabytes of information. Humans don’t. So we need to use different methods to authenticate people. In particular, the cryptographic protocols we’ve already discussed are not well suited if the principal being authenticated is a person (with all the associated limitations).

All approaches to human authentication rely on at least one of the following:

  • Something you know (e.g. a password). This is the most common kind of authentication used for humans. We use passwords every day to access our systems. Unfortunately, something that you know can become something you just forgot. And if you write it down, then other people might find it.
  • Something you have (e.g. a smart card). This form of human authentication removes the problem of forgetting something you know, but some object now must be with you any time you want to be authenticated. And such an object might be stolen and then becomes something the attacker has.
  • Something you are (e.g. a fingerprint). This approach bases authentication on something intrinsic to the principal being authenticated. It’s much harder to lose a fingerprint than a wallet. Unfortunately, biometric sensors are fairly expensive and (at present) not very accurate.

Local Variable Type Inference

Local variables can be given an inferred “type” of var instead of an explicit type. The var keyword instructs the compiler to infer the type of the variable from the expression on the right side of the initialization statement. The inferred type may be a built-in type, an anonymous type, a user-defined type, or a type defined in the .NET Framework class library.

With this feature, the type of the local variable being declared is inferred from the expression used to initialize the variable. This is achieved using the var keyword (familiar to those who work with scripting languages, although here it behaves quite differently, because the variable is still statically typed). It allows us to write the following code:

 

var num = 50; 
var str = "simple string";
var obj = new myType();
var numbers = new int[] {1,2,3};
var dic = new Dictionary<int,myType>();

The compiler would generate the same IL as if we compiled:

int num = 50; 
string str = "simple string";
myType obj = new myType();
int[] numbers = new int[] {1,2,3};
Dictionary<int,myType> dic = new Dictionary<int,myType>();

Note that no untyped variable reference or late binding is involved. Instead, the compiler infers and declares the type of the variable from the right-hand side of the assignment. As a result, the var keyword produces a strongly typed variable reference.