10 February 2013

A little about virtual functions

   Virtual functions are the most common way to support one of the three core concepts of object-oriented programming, namely polymorphism. So what is polymorphism? Wikipedia says the following: polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. That is correct, but I'd rather put it another way: polymorphism is a way to treat different objects the same way without worrying about their exact types.
    So how do virtual functions support polymorphism? To answer this, we should know what a virtual function is and how it differs from an ordinary function. Only non-static member functions of a class can be virtual; global functions and static member functions cannot, because they are not called on an object. Suppose we have a class with two member functions:

class OrdinaryClass
{
public:
       void func1();
       void func2();
};

To declare a function virtual, we add the keyword virtual to its declaration. Let's do that for function 'func2'.

class PolymorphicClass
{
public:
       void func1();
       virtual void func2();
};

So what is the difference? At first sight, nothing special. But let's look for differences between 'OrdinaryClass' and 'PolymorphicClass'. If we create an object of each class and build the program exactly as written, the linker will typically report an unresolved symbol for 'PolymorphicClass', even if we never call 'func2', whereas 'OrdinaryClass' builds fine as long as 'func2' is not called. If we add an empty definition for 'func2' in 'PolymorphicClass' (to make the sample build) and print the result of operator sizeof for both classes, the sizes will be different. The C++ standard does not define the size of 'PolymorphicClass', but in most implementations it will be the size of one pointer (4 bytes on a 32-bit system, 8 bytes on a 64-bit one), while the size of 'OrdinaryClass' will typically be 1. So far we have discovered two differences; I'll explain the reasons for them a little later.
    Let's define 'func2' in both classes to see if there is any difference in behaviour:

#include <iostream>

class OrdinaryClass
{
public:
       void func1();
       void func2()
       {
              std::cout << "OrdinaryClass :: func2 " << std::endl;
       }
};

class PolymorphicClass
{
public:
       void func1();
       virtual void func2()
       {
              std::cout << "PolymorphicClass :: func2 " << std::endl;
       }
};

int main()
{
       OrdinaryClass obj1;
       PolymorphicClass obj2;

       obj1.func2(); // OrdinaryClass :: func2
       obj2.func2(); // PolymorphicClass :: func2
}


If you run this piece of code you will see that both objects behave the same way. The difference shows up when we inherit from a polymorphic class. Suppose we have a polymorphic class 'Base' with a single virtual function and a derived class 'Derived' which redefines that function:

#include <iostream>

class Base
{
public:
       virtual void Do()
       {
              std::cout << "Base::Do" << std::endl;
       }
};

class Derived : public Base
{
public:
       virtual void Do()
       {
              std::cout << "Derived::Do" << std::endl;
       }
};

int main()
{
       Base b1;
       Derived d1;

       b1.Do(); // Base::Do
       d1.Do(); // Derived::Do

       Base *pb = new Base();
       pb->Do(); // Base::Do
       Derived *pd = new Derived();
       pd->Do(); // Derived::Do

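       Base *pb2 = new Derived();
       pb2->Do(); // Derived::Do

       delete pb;
       delete pd;
       // pb2 is deliberately not deleted: deleting a Derived through a Base*
       // is undefined behavior unless Base has a virtual destructor.
}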

The output of each call of 'Do' is given in the comment on the corresponding line. As we can see, all the calls but the last one behave as one would expect anyway. The last one executed the function defined in 'Derived', though we called the function through a pointer to 'Base'. If 'Do' were not declared virtual, the last call would have printed 'Base::Do'.
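
That last remark can be checked with a small sketch of the same hierarchy without the virtual keyword. The names 'PlainBase', 'PlainDerived' and 'CallThroughBase' are made up for this illustration, and the functions return strings instead of printing so that the result is easy to inspect:

```cpp
#include <cassert>
#include <string>

// Same layout as Base/Derived, but Do() is NOT virtual.
class PlainBase
{
public:
       std::string Do() const { return "PlainBase::Do"; }
};

class PlainDerived : public PlainBase
{
public:
       // Hides PlainBase::Do instead of overriding it.
       std::string Do() const { return "PlainDerived::Do"; }
};

// The call is resolved at compile time from the static type of the
// pointer (PlainBase), whatever object it actually points to.
std::string CallThroughBase(const PlainBase *p)
{
       assert(p != nullptr);
       return p->Do();
}
```

Passing the address of a 'PlainDerived' object to 'CallThroughBase' yields "PlainBase::Do", while calling 'Do' directly on the object yields "PlainDerived::Do".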
    So this is how virtual member functions differ from ordinary member functions: when a virtual function is called through a pointer to an object, the version executed is chosen by the actual (dynamic) type of the object, not by the type of the pointer. The same holds for references, which are typically implemented as addresses, just like pointers. Redefining a virtual function in a derived class is called overriding. The C++11 standard introduced the override specifier to state explicitly that a function is meant to override an existing virtual function; the compiler reports an error if no such function exists in a base class. A derived class is not obliged to override the virtual functions declared in its direct or indirect base classes. In that case, calling the function through a base class pointer that points to an object of the most derived class executes the override from the nearest class up the hierarchy that provides one.
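
A minimal sketch of both points, the override specifier and a derived class that does not override. The three-level hierarchy 'Top', 'Middle', 'Bottom' and the helper 'CallDo' are hypothetical names chosen for this example:

```cpp
#include <cassert>
#include <string>

class Top
{
public:
       virtual std::string Do() const { return "Top::Do"; }
};

class Middle : public Top
{
public:
       // 'override' asks the compiler to verify that a matching
       // virtual function exists in a base class.
       std::string Do() const override { return "Middle::Do"; }
};

class Bottom : public Middle
{
       // No Do() here: Bottom simply inherits Middle's override.
};

// Dispatches on the dynamic type of the object p points to.
std::string CallDo(const Top *p)
{
       assert(p != nullptr);
       return p->Do();
}
```

Called with the address of a 'Bottom' object, 'CallDo' returns "Middle::Do", because 'Middle' is the nearest class in the chain that overrides 'Do'.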

Which implementation of the function to call is determined at run time, not at compile time, because in general the compiler cannot know what type of object the pointer points to. For example:

int num = GetObjectTypeNumber();

Base *ptr = 0;

if (num == 1)
{
       ptr = new Base();
}
else
{
       ptr = new Derived();
}

ptr->Do();

Thus, the function call has to be redirected somehow at run time. So how can the program know which function definition to execute? The C++ standard does not put any requirements on how virtual functions are implemented, but most compilers do it this way: for each polymorphic class (one that declares or inherits a virtual function) a special data structure called the virtual function table (vtbl) is created, which stores, for every virtual function of the class, a pointer to the function that should be executed. This table is created per class, so even if we have hundreds of objects of the same polymorphic class, there is only one virtual function table. Each instance of a polymorphic class contains a pointer, known as the virtual pointer (vptr), which points to the vtbl of its class. I will explain how the virtual function table is created and what it contains in another article. For now, just keep in mind that each call of a virtual function through a pointer or a reference is turned into an indirect call: the vptr leads to the virtual function table, and the function found there is called.
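
To make the mechanism more tangible, here is a hand-written imitation of it using plain structs and function pointers. This is only a sketch of the idea, not what any particular compiler actually generates; all the names in it ('VTable', 'Object', 'CallDo' and so on) are invented for the illustration:

```cpp
#include <cassert>
#include <string>

struct Object; // forward declaration

// One table per "class": a slot for each virtual function.
struct VTable
{
       std::string (*Do)(const Object *self);
};

// Every object carries a pointer to the table of its class (the "vptr").
struct Object
{
       const VTable *vptr;
};

std::string BaseDo(const Object *) { return "Base::Do"; }
std::string DerivedDo(const Object *) { return "Derived::Do"; }

// Exactly one table exists per class, shared by all of its objects.
const VTable baseTable = { &BaseDo };
const VTable derivedTable = { &DerivedDo };

Object MakeBase() { return Object{ &baseTable }; }
Object MakeDerived() { return Object{ &derivedTable }; }

// Roughly what 'ptr->Do()' turns into: follow vptr, pick the slot, call it.
std::string CallDo(const Object *obj)
{
       assert(obj != nullptr && obj->vptr != nullptr);
       return obj->vptr->Do(obj);
}
```

'CallDo' has no idea which kind of object it receives, yet it ends up in the right function, and any two objects made by 'MakeDerived' share the same table pointer.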
    Now we know the reasons for both differences between polymorphic and non-polymorphic classes that we discovered at the beginning:
  • the instances differ in size because an instance of a polymorphic class contains an additional pointer (vptr),
  • a program using a polymorphic class does not build without the virtual function definition, because the address of that function is needed to initialize the virtual function table.
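
The size difference is easy to check. The sketch below gives every function an empty body so that it builds on its own; 'PrintSizes' is a helper name made up for this example, and the numbers it prints depend on the compiler and platform (on a typical 64-bit compiler one would expect 1 and 8):

```cpp
#include <iostream>

class OrdinaryClass
{
public:
       void func1() {}
       void func2() {}
};

class PolymorphicClass
{
public:
       void func1() {}
       virtual void func2() {}
};

// Prints both sizes; the exact values are implementation-defined.
void PrintSizes()
{
       std::cout << "OrdinaryClass: " << sizeof(OrdinaryClass) << std::endl;
       std::cout << "PolymorphicClass: " << sizeof(PolymorphicClass) << std::endl;
}
```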
And finally, we started with polymorphism but have not given a single example. I prefer the best-known one, the one with shapes. Suppose we have a base class 'Shape' and we derive 'Ellipse', 'Rectangle' and 'Triangle' from it. 'Shape' has a virtual function 'Draw()', which is overridden in the three subclasses. Now we can keep a list of pointers to Shape objects and call Draw() on each of them without worrying about the exact type of the objects, and the correct Draw() will be called for each one:

#include <algorithm>
#include <list>

class Shape
{
public:
       virtual void Draw()
       {
              // do something generic for all shapes
       }
private:
       // generic shape data
};

class Ellipse : public Shape
{
public:
       void Draw() override
       {
              // Draw ellipse using data
       }
private:
       // Ellipse-specific data
};

class Rectangle : public Shape
{
public:
       void Draw() override
       {
              // Draw rectangle using data
       }
private:
       // Rectangle-specific data
};

class Triangle : public Shape
{
public:
       void Draw() override
       {
              // Draw triangle using data
       }
private:
       // Triangle-specific data
};

int main()
{
       std::list<Shape*> listOfShapes;

       listOfShapes.push_back(new Ellipse());
       listOfShapes.push_back(new Rectangle());
       listOfShapes.push_back(new Triangle());

       std::for_each(listOfShapes.begin(), listOfShapes.end(),
              [] (Shape *s) { s->Draw(); });
}


If we add implementations instead of the comments and run the code, we will see that the correct version of 'Draw()' is executed for each shape. This is the idea of polymorphism: we seem to be doing the same thing, but actually different things happen; seen from the other side, we treat different types of objects the same way.
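
Filling in the comments could look roughly like this. Returning a string from 'Draw()' instead of really drawing is a simplification made for this sketch, and 'DrawAll' is a helper name invented here. The caller is assumed to create the shapes as ordinary local objects and pass their addresses, so nothing needs to be deleted:

```cpp
#include <cassert>
#include <string>
#include <vector>

class Shape
{
public:
       virtual std::string Draw() const { return "generic shape"; }
};

class Ellipse : public Shape
{
public:
       std::string Draw() const override { return "ellipse"; }
};

class Rectangle : public Shape
{
public:
       std::string Draw() const override { return "rectangle"; }
};

class Triangle : public Shape
{
public:
       std::string Draw() const override { return "triangle"; }
};

// Calls Draw() on every shape without knowing its exact type and
// collects the results, one per line.
std::string DrawAll(const std::vector<const Shape *> &shapes)
{
       std::string out;
       for (const Shape *s : shapes)
       {
              assert(s != nullptr);
              out += s->Draw();
              out += '\n';
       }
       return out;
}
```

Given the addresses of an 'Ellipse', a 'Rectangle' and a 'Triangle' in that order, 'DrawAll' returns the three names one per line, even though it only ever sees 'Shape' pointers.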

This was just a simple introduction to virtual functions in C++. In my next article, which I am going to post very soon, I will cover more advanced features like pure virtual functions, abstract classes and interfaces, and virtual destructors; answer questions such as why virtual functions work only through pointers and references and whether a virtual function can be called in a constructor or destructor; and go through some compiler-specific implementation details. Thanks for your time and interest.


