11 February 2013

A little more about virtual functions

I wanted to share a piece of knowledge about virtual functions in C++, but then I thought it would be useful to start with a short introduction to virtual functions and only then come back to the main topic. You can find that introduction in my previous post. So I assume that we already know what virtual functions are, what they are meant for, and, roughly, how they may be implemented.

The C++ standard specifies how virtual functions must behave, but not how they should be implemented. Nevertheless, this feature is usually implemented using virtual function tables (vtbl). Simply put, a vtbl is a table of function pointers. The compiler generates a vtbl for each polymorphic class, i.e. a class that declares a virtual function or inherits one. The vtbl contains entries for all virtual functions available in the class; I say available, because virtual functions declared higher up in the inheritance hierarchy also have their place in the vtbl. Let's look at an example:

class Base
{
public:
       virtual void func1();
       virtual void func2();
       virtual void func3();
       virtual void func4();
};

class Child : public Base
{
public:
       void func1() override;
       void func2() override;
       virtual void func5();
       virtual void func6();
};

class GrandChild : public Child
{
public:
       void func1() override;
       void func3() override;
       void func5() override;
       virtual void func7();
};

I omitted the function bodies and everything else in the classes for brevity. So we have three classes, Base, Child and GrandChild, and 7 virtual functions in total. The compiler will generate a virtual function table for each of these classes. Here is what the virtual function table of each class will contain:

vtbl for ‘Base’
Index  Function
0      Base::func1
1      Base::func2
2      Base::func3
3      Base::func4

vtbl for ‘Child’
Index  Function
0      Child::func1
1      Child::func2
2      Base::func3
3      Base::func4
4      Child::func5
5      Child::func6

vtbl for ‘GrandChild’
Index  Function
0      GrandChild::func1
1      Child::func2
2      GrandChild::func3
3      Base::func4
4      GrandChild::func5
5      Child::func6
6      GrandChild::func7

As we can see, each vtbl contains the correct function address for each virtual function of the corresponding class. Here correct means the address of the version from the most derived class in that class's inheritance chain that defines or overrides the function. When we instantiate a class, the instance contains a pointer to the virtual function table of that class (the vptr). When we call a virtual function directly on an object (not through a pointer or a reference), the object's type is always exactly its class, so the function for that very class is called; that's why virtual dispatch plays no role when we work with objects directly. Virtual functions come into play when we access an object of a derived class through a pointer to a base class. The call is then resolved at run time: the function is taken from the virtual function table of the class of the underlying object (not of the pointer type). So let's look at several example calls:

Base *pb1 = new Child();
Base *pb2 = new GrandChild();
Child *pc = new GrandChild();

pb1->func1(); // calls Child::func1
pb2->func1(); // calls GrandChild::func1

pb1->func3(); // calls Base::func3
pb2->func3(); // calls GrandChild::func3

pc->func2(); // calls Child::func2
pc->func3(); // calls GrandChild::func3
pc->func4(); // calls Base::func4
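To try the calls above yourself, here is a minimal runnable sketch of the same hierarchy. It makes a few assumptions that are not in the original: every function returns its qualified name as a std::string instead of void so the result can be checked, Base gets a virtual destructor because the objects are deleted through base-class pointers, and the helper call_func3 is hypothetical. It also shows that a call through a reference is dispatched the same way.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Same hierarchy as above; each body returns its qualified name
// so that the function actually chosen can be checked.
class Base
{
public:
    virtual ~Base() = default;
    virtual std::string func1() { return "Base::func1"; }
    virtual std::string func2() { return "Base::func2"; }
    virtual std::string func3() { return "Base::func3"; }
    virtual std::string func4() { return "Base::func4"; }
};

class Child : public Base
{
public:
    std::string func1() override { return "Child::func1"; }
    std::string func2() override { return "Child::func2"; }
    virtual std::string func5() { return "Child::func5"; }
    virtual std::string func6() { return "Child::func6"; }
};

class GrandChild : public Child
{
public:
    std::string func1() override { return "GrandChild::func1"; }
    std::string func3() override { return "GrandChild::func3"; }
    std::string func5() override { return "GrandChild::func5"; }
    virtual std::string func7() { return "GrandChild::func7"; }
};

// Hypothetical helper: a call through a reference is dispatched
// on the dynamic type of the referenced object, just like a pointer.
std::string call_func3(Base &ref)
{
    return ref.func3();
}
```

For example, with `std::unique_ptr<Base> pb1 = std::make_unique<Child>();`, the call `pb1->func3()` returns "Base::func3", matching the vtbl for ‘Child’, while `call_func3` on a GrandChild object returns "GrandChild::func3".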

Thus, which function is called depends on the type of the object the pointer points to, which in general cannot be known at compile time; that is why a virtual function call is resolved at run time. The same applies to references, because they also refer to objects indirectly rather than being objects themselves. Since I test my examples with Visual Studio, I will give implementation details for the Visual C++ compiler. It stores the vptr as the first member of the object. So, for example, the first call from the example above is converted into something like this:

reinterpret_cast<void (*)()>(*(std::uintptr_t *)*((std::uintptr_t *)pb1 + 0))();

If we run this code with Visual C++, it does the same as the first call in the list above (pb1->func1()), as long as func1 does not touch any members, because no this pointer is passed. Let's break it down into smaller pieces to see what really happens there. Here is an expanded version of the virtual function call:

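// std::uintptr_t (from <cstdint>) is as wide as a pointer on both 32-bit and 64-bit targets
typedef void (*fptr)();

// the vptr is stored at the very start of the object,
// so the address of the object is also the address of the vptr
std::uintptr_t *vptr = (std::uintptr_t *)pb1;

// vptr points to the correct vtbl, so the address of the vtbl is
std::uintptr_t *vtbl = (std::uintptr_t *)*vptr;

// func1 is the first function in the vtbl, so its index is 0;
// expressed in bytes, each entry takes the size of a pointer
// (4 bytes on a 32-bit platform, 8 bytes on a 64-bit one)
fptr func1 = reinterpret_cast<fptr>(*(vtbl + 0));

// call the function; no this pointer is passed,
// so this only behaves like pb1->func1() if func1 uses no members
func1();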

As I already mentioned, the standard does not specify these implementation details, so this kind of code is not only superfluous but also non-portable. Still, I hope it gives a better understanding of what happens at run time. So a virtual function call (through a pointer or a reference) is not an ordinary call: it has a small performance overhead, which can add up when there are many calls.

We now know that a virtual function call through a pointer is translated into something else using the vtbl and the vptr. But someone has to create them and initialize them correctly. For the virtual function table, this someone is the compiler. Remember, I wrote in the last article that a non-pure virtual function must have a body, even if we do not create any objects of that class. That is because the implementation needs the function's address to fill in the virtual function table; without a body there is nothing to point to, and the build typically fails with a linker error. As for the vptr, the compiler generates some extra code in the constructor of the class, usually placed before the code we write ourselves. Even if we do not define a constructor, the compiler generates a default one, and the vptr initialization is there. So each time we create an object of a polymorphic class, its vptr is correctly initialized and points to the vtbl of that class.
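To make the roles of the vtbl and the vptr concrete, here is a sketch that emulates by hand what the compiler does for us. It is only an illustration under simplifying assumptions: the names BaseVtbl, construct_base, construct_derived and call_func1 are hypothetical, the this pointer is passed explicitly, and real compilers lay things out differently. Each class gets one statically initialized table, and each "constructor" sets the vptr; the derived one first builds the base part and then switches the vptr to its own table.

```cpp
#include <cassert>
#include <string>

struct Base;

// The "vtbl": one table of function pointers per class.
struct BaseVtbl
{
    std::string (*func1)(Base *self);
    std::string (*func2)(Base *self);
};

struct Base
{
    const BaseVtbl *vptr;  // the hidden member the compiler would add
    int id;
};

// "Member functions" receive the object explicitly, as the hidden this.
std::string base_func1(Base *self) { return "Base::func1 on " + std::to_string(self->id); }
std::string base_func2(Base *) { return "Base::func2"; }
std::string derived_func1(Base *self) { return "Derived::func1 on " + std::to_string(self->id); }

// One table per class; Derived reuses base_func2 because it does not override func2.
const BaseVtbl base_vtbl{base_func1, base_func2};
const BaseVtbl derived_vtbl{derived_func1, base_func2};

// "Constructors": setting the vptr is part of their job.
void construct_base(Base *obj, int id)
{
    obj->vptr = &base_vtbl;
    obj->id = id;
}

void construct_derived(Base *obj, int id)
{
    construct_base(obj, id);    // base part first: vptr -> base_vtbl
    obj->vptr = &derived_vtbl;  // then switch to the derived table
}

// What a virtual call p->func1() boils down to.
std::string call_func1(Base *p) { return p->vptr->func1(p); }
std::string call_func2(Base *p) { return p->vptr->func2(p); }
```

Calling `call_func1` on an object set up with `construct_derived` picks derived_func1, while `call_func2` falls back to the entry copied from the base table.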

Let's see what happens when a class inherits from several polymorphic classes. Again, this is implementation-dependent, but generally the picture is the following. The class gets one virtual function table per polymorphic immediate base class, and each object of the class contains as many vptrs as it has polymorphic immediate base classes. So that each function works on the right sub-object, the implementation also records how to get from a base sub-object to the most derived object (depending on the compiler, through offsets stored with the vtbl or through small adjustor thunks). Apart from that, the story is the same as for a single inheritance hierarchy, so I will not go further. Just to note: the Visual C++ compiler puts the new virtual functions of the derived class into the virtual function table of the first base, so the first vptr points to a table holding the function pointers for all virtual functions of the first base class plus those for the virtual functions introduced by the derived class itself.
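Here is a small sketch of the multiple inheritance case, with hypothetical classes Printable, Comparable and Widget. Only the portable, observable part is shown: the two base sub-objects live at different addresses, so converting a Widget pointer to a base pointer may change its value, and yet a call through a Comparable reference still reaches Widget::key and reads the right member. How the this pointer is adjusted internally is left to the compiler.

```cpp
#include <cassert>
#include <string>

class Printable
{
public:
    virtual ~Printable() = default;
    virtual std::string print() const = 0;
};

class Comparable
{
public:
    virtual ~Comparable() = default;
    virtual int key() const = 0;
};

// Two polymorphic bases: in common implementations a Widget object
// holds two vptrs, one at the start of each base sub-object.
class Widget : public Printable, public Comparable
{
public:
    explicit Widget(int value) : value_(value) {}
    std::string print() const override { return "Widget " + std::to_string(value_); }
    int key() const override { return value_; }  // reads a Widget member

private:
    int value_;
};

// True if the two base sub-objects start at different addresses,
// i.e. at least one base-pointer conversion adjusted the address.
bool bases_have_distinct_addresses(const Widget &w)
{
    const void *as_printable = static_cast<const Printable *>(&w);
    const void *as_comparable = static_cast<const Comparable *>(&w);
    return as_printable != as_comparable;
}
```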

What else should we know about virtual functions in C++? A virtual function can be pure. To declare a virtual function as pure, we add = 0 (the pure specifier) at the end of its declaration:

virtual void func() = 0;

A class containing a pure virtual function is called an abstract class. An abstract class cannot be instantiated; if we try, we get a compile error. This is natural: an abstract class describes a concept that is not complete on its own, so there is nothing concrete to create. A class containing only pure virtual functions is known as an interface. Note, though, that in C++ the word interface also often refers to the part of a class intended for use by other components (clients), i.e. its public member functions and any other functions that take an object of the class as an argument. A derived class that overrides all the pure virtual functions it inherits is a concrete type and can be instantiated.
    A pure virtual function is usually used when the function body cannot be defined, not because the implementation is complex or long, but because the class itself is not responsible for that function: it does not know what to do in it, it only knows that something must be done. For example, we can have a class 'Appliance' with a pure virtual function 'MakeSound'. We do not know what sound an appliance makes or how it makes it. But we know that an audio player plays music, while a refrigerator or a kettle makes some noise, and so on. So we override 'MakeSound' in each concrete appliance class.
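The 'Appliance' example could be sketched like this. MakeSound returns a description instead of producing a real sound, and the AudioPlayer and Kettle classes, their strings and the all_sounds helper are hypothetical. The static_assert lines check at compile time that Appliance is abstract while a class that overrides MakeSound is concrete.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Appliance only knows that every appliance makes some sound.
class Appliance
{
public:
    virtual ~Appliance() = default;
    virtual std::string MakeSound() const = 0;  // pure virtual
};

class AudioPlayer : public Appliance
{
public:
    std::string MakeSound() const override { return "plays music"; }
};

class Kettle : public Appliance
{
public:
    std::string MakeSound() const override { return "whistles"; }
};

// `Appliance a;` would not compile; Kettle can be instantiated.
static_assert(std::is_abstract<Appliance>::value, "Appliance is abstract");
static_assert(!std::is_abstract<Kettle>::value, "Kettle is concrete");

// Client code only talks to the Appliance interface.
std::vector<std::string> all_sounds(const std::vector<std::unique_ptr<Appliance>> &items)
{
    std::vector<std::string> sounds;
    for (const auto &item : items)
        sounds.push_back(item->MakeSound());
    return sounds;
}
```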
    Most often a pure virtual function has no body, though C++ does allow one. Since we cannot create an instance of an abstract class, that body is never reached by an ordinary virtual call, but a derived class can call it explicitly with a qualified name, for example:


class Base
{
public:
       virtual void func() = 0;
};

void Base::func()
{
       // Do something very general, or output some message
}
    
class Child : public Base
{
public:
       void func() override
       {
              if (hasItsOwnBehaviour())
              {
                     // Child knows how to behave,
                     // so implement func here for Child
              }
              else
              {
                     Base::func();
              }
       }
};
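Here is a runnable variant of the example above, with assumptions of my own: func returns a std::string so the result can be checked, and a constructor flag stands in for the undefined hasItsOwnBehaviour(). The body of a pure virtual function has to be defined outside the class, and the qualified call Base::func() bypasses virtual dispatch.

```cpp
#include <cassert>
#include <string>

class Base
{
public:
    virtual ~Base() = default;
    virtual std::string func() = 0;  // pure, but still defined below
};

// A pure virtual function may still have an out-of-class definition.
std::string Base::func()
{
    return "general behaviour";
}

class Child : public Base
{
public:
    explicit Child(bool ownBehaviour) : ownBehaviour_(ownBehaviour) {}

    std::string func() override
    {
        if (ownBehaviour_)
            return "Child behaviour";
        return Base::func();  // qualified call: no virtual dispatch
    }

private:
    bool ownBehaviour_;  // hypothetical stand-in for hasItsOwnBehaviour()
};
```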

Phew, just a little more and I'll finish. We have not yet looked at special member functions, in particular the constructor and the destructor. A constructor cannot be virtual, because creating an object always requires its exact type, i.e. the static and dynamic types are the same, so polymorphic behavior makes no sense here. Also, we cannot take a pointer to a constructor. If we want polymorphic behavior when creating objects, we can use techniques from design patterns such as factory method and prototype.
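The prototype technique is often called the "virtual constructor" idiom: a virtual clone member function lets each class create a copy of itself, so code that only holds a base reference can still produce an object of the right dynamic type. A minimal sketch, with hypothetical Shape, Circle and Square classes and a hypothetical duplicate helper:

```cpp
#include <cassert>
#include <memory>
#include <string>

class Shape
{
public:
    virtual ~Shape() = default;
    virtual std::string name() const = 0;
    // "Virtual constructor": each class knows how to copy itself.
    virtual std::unique_ptr<Shape> clone() const = 0;
};

class Circle : public Shape
{
public:
    explicit Circle(double r) : radius(r) {}
    std::string name() const override { return "Circle"; }
    std::unique_ptr<Shape> clone() const override { return std::make_unique<Circle>(*this); }
    double radius;
};

class Square : public Shape
{
public:
    std::string name() const override { return "Square"; }
    std::unique_ptr<Shape> clone() const override { return std::make_unique<Square>(*this); }
};

// Copies a shape without knowing its dynamic type.
std::unique_ptr<Shape> duplicate(const Shape &original)
{
    return original.clone();
}
```

The constructor itself stays non-virtual; the dispatch happens in clone, and the exact type is known inside each override.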
    Unlike a constructor, a destructor can be virtual, and for polymorphic classes it usually must be. Consider this example:

Base *pb = new Derived();
delete pb;

If the destructor of Base is not virtual, the behavior of this delete is undefined according to the standard; in practice, typically only the destructor of Base runs before the memory is freed. This means that the Derived object is not fully destroyed. For example, if the constructor of Derived allocates a chunk of memory, it won't be freed, because ~Derived never runs. To solve this problem, we just need to declare the destructor of Base as virtual. Then the correct destructor (~Derived) is executed first, followed by ~Base. So if you are designing a polymorphic type, always declare its destructor as virtual.
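The fix can be observed with a small sketch. The global destroyed log and the buffer member are hypothetical helpers for this demo. Because ~Base is virtual, deleting through a Base pointer runs ~Derived first, which releases the buffer, and then ~Base.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which destructors ran (demo helper).
std::vector<std::string> destroyed;

class Base
{
public:
    virtual ~Base() { destroyed.push_back("~Base"); }
};

class Derived : public Base
{
public:
    Derived() : buffer(new int[100]) {}
    Derived(const Derived &) = delete;             // owns raw memory,
    Derived &operator=(const Derived &) = delete;  // so no copying

    ~Derived() override
    {
        delete[] buffer;  // would leak if only ~Base ran
        destroyed.push_back("~Derived");
    }

private:
    int *buffer;
};
```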
    A destructor can also be declared pure virtual. In that case the class becomes abstract and cannot be instantiated. However, the class must still define a body for the destructor, because an object of a derived class calls this destructor when it is destroyed; the body is therefore mandatory for a pure virtual destructor. A pure virtual destructor is useful when the class has no other pure virtual functions, but we still want to make it abstract and emphasize that it is.
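A minimal sketch of a pure virtual destructor, with hypothetical class names and a hypothetical use_derived helper: Base has no other pure virtual functions, yet std::is_abstract reports it as abstract, while Derived, whose implicitly declared destructor overrides ~Base, is concrete. Without the out-of-class definition of Base::~Base, destroying a Derived object would typically end in a linker error.

```cpp
#include <cassert>
#include <type_traits>

// No other pure virtual functions, but the class is still abstract.
class Base
{
public:
    virtual ~Base() = 0;
    int value() const { return 42; }
};

// The body is mandatory: ~Derived calls ~Base implicitly.
Base::~Base() {}

class Derived : public Base
{
    // Nothing to override: the implicit ~Derived overrides ~Base.
};

static_assert(std::is_abstract<Base>::value, "Base is abstract");
static_assert(!std::is_abstract<Derived>::value, "Derived is concrete");

int use_derived()
{
    Derived d;         // fine: Derived is concrete
    return d.value();  // ~Derived, then Base::~Base, run at scope exit
}
```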
    If we call a virtual function inside a constructor, the version defined in that class (or, if it is not defined there, in the nearest base class that defines it) is called. This is logical, because during the constructor the object may not be fully constructed yet: if we are creating an object of a more derived class, other constructors are still waiting to run before the object is complete. So, first, at that point the vptr points to the vtbl of the class whose constructor is running, not to the vtbl of the most derived class; and second, the members of the more derived class that its override would use may not have been initialized yet.
    The same applies to destructors: if we call a virtual function in a destructor, the version of the class whose destructor is running is executed. Again, if it worked like an ordinary virtual call, it could access members of the more derived part of the object, which have already been destroyed, because the destructors of more derived classes run first. Here is a small example demonstrating virtual function calls inside a constructor and a destructor:

class Base
{
public:
       Base() : data(2)
       {
              f();
       }
      
       virtual ~Base()
       {
              f();
       }

       virtual void f()
       {
              data = 7;
       }

protected:
       int data;
};

class Derived : public Base
{
public:

       Derived() : extra_data(10)
       {
       }

       void f() override
       {
              data = 17;
              extra_data = 20;
       }

private:
       double extra_data;
};

int main()
{
       {
              Derived d; // data = 7, extra_data = 10.0
       }
       return 0;
}

When 'f' is called from the Base constructor, the Derived part of the object does not exist yet; that's why 'Base::f' is executed and 7 is assigned to 'data'. If you put a breakpoint in the destructor of Base, run the program and step into the 'f' call, you will again see that 'Base::f' is called, this time because the Derived part of the object has already been destroyed.
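If you prefer not to use a debugger, a variant of the example that records which 'f' runs makes the same point. This is a sketch with hypothetical names (the calls log and run_demo), and 'f' returns its name instead of modifying 'data'. The constructor and the destructor of Base both see 'Base::f', while an ordinary call through a reference to the fully constructed object reaches 'Derived::f'.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> calls;  // records which f() ran, and where

class Base
{
public:
    Base() { calls.push_back("ctor: " + f()); }
    virtual ~Base() { calls.push_back("dtor: " + f()); }
    virtual std::string f() const { return "Base::f"; }
};

class Derived : public Base
{
public:
    std::string f() const override { return "Derived::f"; }
};

void run_demo()
{
    calls.clear();
    {
        Derived d;       // Base() runs first and sees Base::f
        Base &ref = d;   // fully constructed now: normal dispatch
        calls.push_back("via ref: " + ref.f());
    }                    // ~Derived, then ~Base, which sees Base::f
}
```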

I think I have covered some interesting points. There is still a lot to say about virtual functions, their uses and nuances... anyway, I have come to a logical end, and I want to stop here. Thanks for your time; I will be very glad if you have learned something new from this article. Until the next post.




