Data Hiding in C

Stephen Friederichs, April 20, 2013 (17 comments)

Strictly speaking, C is not an object-oriented language. Although it provides some features that fit into the object-oriented paradigm it has never had the full object-oriented focus that its successor C++ offers. C++ introduced some very useful concepts and abilities that I miss when I’m developing in ANSI C. One such concept is protected member variables and functions.

When you declare a class in C++ you can also declare member variables and functions as part of that class. Often, these members contain and manipulate internal information about the state of the class and aren’t meant to be accessed or modified by code external to the class. C++ allows you to declare these member variables in a way that code outside the class has limited access to them. There are various levels: the variable may be completely hidden, it may be read-only or it could be fully accessible but only to other specific code (a “friend” class) that needs to access it. Hiding internal implementation details in this way is considered good practice for several reasons: it produces cleaner, better organized code, it improves security and stability by preventing code (malicious or otherwise) from corrupting the internal state of an object and it separates a class’s interface (which should ideally change as little as possible) from its implementation (which may change due to bugs, security issues, performance improvements, etc.). C doesn’t allow the same level of control as C++ but there is still a way to hide implementation details in C.

I’ve uploaded an example of a Last-In-First-Out (LIFO) style stack to the code snippets section. In the real world there would be a header file advertising the public functions and variables for the stack and a source file that contains the implementation. For the sake of the example I had to put everything in the same ‘file’. The comments show where the header and source parts begin and end. However, to demonstrate the fact that the internal implementation can change while the public interface remains the same there are two separate source files in the code separated by preprocessor defines. One implementation uses a stack that grows up, the other grows down. The header file is the same for both implementations. The point to take home is that if you relied on accessing the internal structure of the stack object you might eventually be disappointed if the internal implementation changes.

There are two features C offers to help hide data. The first feature is that you can declare a pointer to a struct that hasn’t yet been defined. This is used in the example to declare a typedef for the stack type that is a pointer to a stack object struct without declaring the stack object struct. The struct type is declared inside the source file. Since the header file doesn’t contain the struct definition other programmers don’t know the internal structure of the type and can’t manipulate the struct directly. Instead they have to rely on the public functions declared in the header file.
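As a rough illustration of the forward-declaration idea, here is a minimal sketch rather than the uploaded snippet itself. All names in it (stack_handle, stack_create, stack_push, stack_pop, stack_destroy, caller_demo) are hypothetical, and the three "files" are squeezed into one listing with comments marking the boundaries.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- stack.h: everything a caller is allowed to see ---- */
struct stack_obj;                        /* incomplete type, no members listed */
typedef struct stack_obj *stack_handle;  /* legal: pointer to incomplete type  */

stack_handle stack_create(size_t capacity);
int  stack_push(stack_handle s, int value);  /* 0 on success, -1 if full  */
int  stack_pop(stack_handle s, int *out);    /* 0 on success, -1 if empty */
void stack_destroy(stack_handle s);

/* ---- stack.c: the only place the layout is known ---- */
struct stack_obj {
    int   *items;
    size_t capacity;
    size_t count;   /* grows up here; a grows-down version keeps the same header */
};

stack_handle stack_create(size_t capacity)
{
    stack_handle s;
    if (capacity == 0) return NULL;
    s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->items = malloc(capacity * sizeof *s->items);
    if (s->items == NULL) { free(s); return NULL; }
    s->capacity = capacity;
    s->count = 0;
    return s;
}

int stack_push(stack_handle s, int value)
{
    if (s == NULL || s->count == s->capacity) return -1;
    s->items[s->count++] = value;
    return 0;
}

int stack_pop(stack_handle s, int *out)
{
    if (s == NULL || out == NULL || s->count == 0) return -1;
    *out = s->items[--s->count];
    return 0;
}

void stack_destroy(stack_handle s)
{
    if (s != NULL) { free(s->items); free(s); }
}

/* ---- caller.c: sees only the header part ---- */
int caller_demo(void)
{
    int top = 0;
    stack_handle s = stack_create(2);
    assert(s != NULL);
    stack_push(s, 10);
    stack_push(s, 20);
    stack_pop(s, &top);   /* the last value pushed comes back first */
    /* s->count = 0;  would not compile in a real caller.c: incomplete type */
    stack_destroy(s);
    return top;
}
```

Because everything sits in one listing here, the struct definition is technically visible to caller_demo; with a real header/source split it would not be, and the commented-out line would be rejected by the compiler.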

The other feature used in this example is the use of the typing system in C to ensure that stack objects can’t be modified directly. The stack type is defined as a pointer to a const stack struct object. This ensures that a programmer can’t accidentally modify the stack object without angering the compiler. Structs in C are just different variables stored together in memory. The pointer to the struct is simply the address of the first variable in the struct. A programmer could accidentally modify the contents of the struct by assigning a value to the de-referenced stack object. By defining the stack type as pointing to a const stack struct, the compiler will complain if a programmer tries to directly modify the stack object.

This isn’t without drawbacks. It forces all of the functions that need to write to the object to recast the object as a non-const pointer. By the same token, any programmer who desires to modify the internals of the object can simply cast the pointer to the same type. As a safeguard against unintentional modifications (malicious or otherwise) the stack struct contains a canary variable. This entry is set to a predetermined value when the object is initialized and every function that accesses the object checks it to ensure that it’s set to the correct value. When the object is destroyed the canary is cleared to ensure that if there are any pointers to the object remaining, any calls made with the pointer will fail.
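The const handle and the canary can be sketched together as follows. This is an illustration under assumptions, not the uploaded code: the names, the magic value STACK_CANARY and the small static pool (used so that a destroyed object's memory stays valid to inspect, and so that no malloc is needed) are all hypothetical choices.

```c
#include <assert.h>
#include <stddef.h>

/* ---- stack.h ---- */
struct stack_obj;
typedef const struct stack_obj *stack_handle;   /* pointer to CONST struct */

stack_handle stack_create(void);
int stack_push(stack_handle s, int value);      /* 0 on success, -1 on error */
int stack_destroy(stack_handle s);              /* 0 on success, -1 on error */

/* ---- stack.c ---- */
#define STACK_CANARY 0x5AFEC0DEuL   /* arbitrary magic value (assumption) */
#define STACK_DEPTH  8
#define POOL_SIZE    2

struct stack_obj {
    unsigned long canary;
    int items[STACK_DEPTH];
    int count;
};

static struct stack_obj pool[POOL_SIZE];   /* static storage: starts zeroed */

static int stack_valid(stack_handle s)
{
    return s != NULL && s->canary == STACK_CANARY;
}

stack_handle stack_create(void)
{
    size_t i;
    for (i = 0; i < POOL_SIZE; i++) {
        if (pool[i].canary != STACK_CANARY) {   /* slot not in use */
            pool[i].count = 0;
            pool[i].canary = STACK_CANARY;      /* set on initialization */
            return &pool[i];
        }
    }
    return NULL;                                /* pool exhausted */
}

int stack_push(stack_handle s, int value)
{
    struct stack_obj *obj;
    if (!stack_valid(s)) return -1;             /* checked on every access */
    /* s->count++;  would be rejected here: s points to const */
    obj = (struct stack_obj *)s;                /* writers must cast const away */
    if (obj->count == STACK_DEPTH) return -1;
    obj->items[obj->count++] = value;
    return 0;
}

int stack_destroy(stack_handle s)
{
    if (!stack_valid(s)) return -1;
    ((struct stack_obj *)s)->canary = 0;        /* stale handles now fail */
    return 0;
}

/* ---- caller.c ---- */
int stale_handle_demo(void)
{
    stack_handle s = stack_create();
    assert(s != NULL);
    stack_destroy(s);
    return stack_push(s, 1);   /* rejected: the canary was cleared */
}
```

One limitation of this particular sketch: once a pool slot is handed out again by stack_create, an old handle to that slot passes the canary check again, so the canary catches use-after-destroy only until the slot is reused.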

One final note - C offers the capability to hide functions as well as data. Any function declared static can be used only within the file that defines it. The toolchain enforces this rule. In the provided example, the _stack_valid function can only be used within the stack.c file. If you attempt to use it outside of that file the build will fail (typically with an undefined-reference error at link time). The _stack_valid function is a private function for the stack class.
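A minimal sketch of a file-private helper, assuming a trimmed-down hypothetical struct with only a canary field. The helper is named stack_valid here, without the leading underscore, because identifiers that begin with an underscore are reserved at file scope; stack_is_usable and private_helper_demo are likewise made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* ---- stack.c ---- */
#define STACK_CANARY 0x5AFEC0DEuL   /* arbitrary magic value (assumption) */

struct stack_obj { unsigned long canary; };   /* trimmed to one field */

/* static: internal linkage, so the name does not exist outside this file */
static int stack_valid(const struct stack_obj *s)
{
    return s != NULL && s->canary == STACK_CANARY;
}

/* no static: external linkage, so this is what a header would declare */
int stack_is_usable(const struct stack_obj *s)
{
    return stack_valid(s);
}

/* Still inside stack.c, so the private helper is reachable indirectly. */
int private_helper_demo(void)
{
    struct stack_obj live = { STACK_CANARY };
    struct stack_obj dead = { 0 };
    assert(stack_is_usable(&live));
    return stack_is_usable(&dead);   /* canary mismatch is reported as 0 */
}

/* ---- main.c (a different file) ----
 * int stack_valid(const struct stack_obj *s);   this declaration compiles,
 * stack_valid(s);                               but the link step fails:
 *                                               no external definition exists
 */
```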

Obviously this approach doesn’t have the versatility of C++. You can’t define members of objects as accessible by friend classes or even declare object members as public. However, this approach lets you hide the internal workings of classes to ensure that anyone else who uses your code utilizes only the publicly-available interface and doesn’t fool around with the internal implementation of your objects. This allows you to enjoy the benefits of data hiding without needing to use C++.

Comment by SFriederichs, April 21, 2013
You may not always have the choice to use C++. For instance, not all microcontrollers have a C++ compiler. Alternately, the codebase you're working on may be C and can't be changed for whatever reason. This isn't borne out of a fanatical desire to avoid C++ whenever I can - this is a useful technique for C that one day you may have to utilize.
Comment by SFriederichs, April 23, 2013
@Clifford: I guess I'm averse (averse, not 'foaming at the mouth opposed') to using C++ for the same reason I don't automatically use dynamic memory allocation in my embedded software: I prefer to keep my solutions simple because more complexity allows for more opportunities for error. For one, so many embedded applications are so simple and one-off that there's no real need for most of the features in C++ - C (or even assembly) will do just fine. Also, I have much more experience with C and am more familiar with the various quirks of the C compilers for the microcontrollers I use. I've found that there's so much wiggle room in the C standard that you can't expect any two C compilers to produce the same machine code. Being familiar and proficient with your tools saves time and frustration. I like many features of C++ and use it when the situation calls for it but 90% of the time the added benefits of using C++ don't (for me in particular) outweigh the proficiency and familiarity I have with C in the environments I program in.

I realize that you argue that I only need to use the features of C++ that I need and I agree. The thing is, I only need the features of C++ that are C most of the time. So most of the time I just use C.
Comment by wingnut, April 22, 2013
There are no C++ toolchains for the 8/16bit PIC and AVR microcontrollers that I have used. The const struct and canary flag used in this approach will help prevent unintentional coding bugs, not malicious code, but that was the stated intent. Always good to know what can be done with the tools you have.
Comment by Clifford, October 25, 2014
Let's be clear, however: my argument was primarily against the implication that you might use this technique to *avoid C++*. Using this technique when C++ is not an option I entirely support.

It is true that Microchip's compiler does not support C++, even on PIC24 where their C30 compiler appears to in fact be a port of GCC. But Microchip appear to have an allergy to C++ - even their MIPS based PIC32 compiler lacks C++ support. Low end PICs barely support an ISO compliant C compiler, so you can see how C++ might be a problem. Despite the arguable suitability, there do exist C++ compilers for low-end PICs.

AVR on the other hand has an instruction set explicitly designed to support C and C++. IAR's AVR compiler supports the EC++ subset.

Comment by Clifford, April 21, 2013
All good practice perhaps, but you say...

- "This allows you to enjoy the benefits of data hiding without needing to use C++."

... when it is actually far simpler and cleaner to simply use C++.

If C++ is not your "favourite language" how is C any better? C++ has all the tools of C and then some. The critical thing is that:

a) you don't have to use all the tools, and
b) If you don't use C++ specific features, the code is no more or less efficient than the same code compiled as C; i.e. you only pay for what you use.

If there are features of C++ that you dislike, do not wish to use, consider to carry unacceptable overhead, or are just not confident using, then simply do not use them - any subset of C++ you choose has got to be better than C, even if that subset is just the C subset.

My point is that when you say "This allows you to enjoy the benefits of data hiding without needing to use C++." you rather imply that that is a good thing, while in fact you have simply created a largely unnecessary chore implementing a feature better supported by a different language. If you have a C compiler you very often also have a C++ compiler in the same toolchain, so lack of a suitable C++ compiler is seldom an issue. The techniques described in this article are perhaps helpful when that is not true, but should perhaps be used when C++ is not an option rather than simply in order to avoid C++.
Comment by hroptatyr, May 9, 2013
Data hiding in C++ by means of class declarations is hardly the same. With the illustrated approach you also stay *binary* compatible. Changing private (or otherwise declared) members in a class will nearly always result in an incompatible ABI.
Comment by pirgo, May 27, 2013
1. C++ is not compatible with C, it's compatible only with C89. Big difference.
2. C is small and concise, C++ has so many features that you can say it is over-engineered and bloated. Small is beautiful.
3. D and Go are better alternatives than a language that parasitizes C.
Comment by Clifford, October 25, 2014
@hroptatyr : No, it is not the same. My point is that it is a language supported mechanism that is more powerful and syntactically simpler than the hoops you have to jump through in C to achieve data hiding.
Comment by Clifford, October 25, 2014
@pirgo : It is not clear what you mean by "compatible". It is true that C++ is not an exact superset of C - that is true even of C89 - there have always been semantic differences between the two. I am not sure how that is an argument against utilising C++ for the purposes of achieving encapsulation.

C and C++ are "compatible" with respect to interoperability - regardless of standard version; you can link C object code to C++ code, and can provide a C linkage interface to C++ code. That is to say you can use both as true C and C++ object code. In most cases well formed ISO C90 code is also valid C++ code. C99 differs in a number of ways most of which are also compatible with C++ (some of which were derived from C++ in fact). The one major difference in C99 is VLAs, but in very restricted embedded systems you would be unwise to use these in any case.

w.r.t. your second point; you either disagree with my point, or never read it. C++ carries no overhead over equivalent C code.

Your third point may well be true, but these languages are somewhat niche and hardly ubiquitous. I cannot see them as viable choices in embedded systems.

Your points boil down to:

* "C++ is not C" - true, but so what?

* "C++ results in bloat" - not true - it supports features that can be expensive; in particular templates - so avoid those features. In the context of this article; my point was that data hiding is free in C++ and has no more overhead than the proposed C solution while being far simpler.

* "There are better languages" - perhaps, but none of them viable for use in embedded systems or with sufficient cross-target tool support in terms of compilers, debuggers, libraries etc. to make them practical as productivity tools. Unless you are perhaps deploying Linux or Windows on an embedded target, your de facto language choice will remain C, C++ and Assembler, with perhaps Ada and Forth in some cases.

However none of your points argue against the point made here that you can hide data in C, but it is simpler to do it in C++. It is probably also simpler in D and Go, but hardly relevant in this context. Surely as engineers we are just trying to get work done efficiently? Let me know how well your next embedded Go project on a bare-metal MegaAVR goes (no pun intended)!
Comment by Clifford, April 21, 2013
addendum to point b) [...] you only pay for what you use, and much of it (including data hiding) is in fact free - having no run-time overhead.
Comment by Cryptoman, April 21, 2013
I also use C predominantly and prefer not to use C++ unless I really have to. However, C++ is not only about data hiding. Its philosophy is based on "object oriented" design. Encapsulation, which is mentioned here, is one of the strengths of object oriented design. Other advantages of object oriented designs are polymorphism (i.e. one interface, multiple methods) and inheritance (i.e. an object acquiring the properties of another one).

One can implement all features of C++ using C but the problem with that approach is the amount of code one has to write to achieve that. C++ offers all those benefits in a tidy and easily accessible way.

For someone who is used to C (like myself), it is quite difficult and daunting to get used to object oriented design. However, I think it is beneficial to give C++ a try at least to appreciate its benefits.

Comment by Clifford, April 23, 2013
@Wingnut: Both GNU and IAR on AVR support C++. Microchip seem strangely averse to C++ even on PIC24 and PIC32 where their compilers are GNU based but for some reason they have disabled C++ compilation. Third party compilers support C++ on PIC24 and PIC32, lower end PIC architectures do have C++ compilers but I am not sure how effective they are - experimental perhaps? The PIC instruction set is not particularly suited to either C or C++ compilation unfortunately.

@SFriederichs: I agree that a C++ compiler is not always available, but the phrase "[...]without needing to use C++" suggests a decision not to use C++ rather than lack of availability. Had it said "[...] when C++ is not an option", I would not have raised the issue.

@Cryptoman: C++ supports both OO and procedural programming paradigms; you do not need to use all of C++ or even OOP to benefit from many of its features, and there is a "half-way house" of "Object Based Programming" (OBP) - although the term is admittedly ill-defined.

Coding in C after C++ is like having someone steal half your tools.

Comment by jms_nh, April 24, 2013
"Microchip seem strangely averse to C++ even on PIC24 and PIC32 where their compilers are GNU based but for some reason they have disabled C++ compilation."

??? You're behind the times and/or have a typo. Microchip released C++ compilers for 32-bit PICs (PIC32) in October 2012. (http://www.microchip.com/pagehandler/en-us/press-release/microchips-free-mplab-xc32-com.html) The PIC24 and dsPIC33 (not PIC32) lines, which have gcc-based compilers, do not.

"The PIC instruction set is not particularly suited to either C or C++ compilation unfortunately." -- Alas, I think you may have answered your own question. I don't know the exact reasons why we don't have a C++ compiler for 16-bit PICs (yes, I work at Microchip); I would have to guess that it has more to do with debugging facilities rather than the core code generation functions of a compiler. The development tool chain is not just a compiler and linker, it's also an IDE and a debugger, and just because you compile a line of C/C++ code into assembly instructions doesn't mean the compiler is done with its job.

"Coding in C after C++ is like having someone steal half your tools."

Well put -- although I think it's more like having to run a 3-legged race against people who can sprint. I miss C++, primarily for its namespace encapsulation.
Comment by Clifford, October 25, 2014
@jms_nh: But how long after the launch of the PIC32 did Microchip finally realise that users might want C++? It is perhaps Microchip that are "behind the times".
Comment by florin, April 24, 2013
It would have been nice to reference existing designs that use this pattern and not assume all the credit for this. GLIB and GTK+ have been employing this technique (without claiming credit) for more than 10 years now: https://developer.gnome.org/glib/2.34/glib-data-types.html .
Comment by stephaneb, April 24, 2013
For me, the article doesn't come through as if the author is "assuming all the credit" for this technique, although I agree that some references to other examples could have been nice.
Comment by jilles, April 27, 2013
This approach can also be useful in C++ because it avoids the need to recompile everything and the various ways to bypass access control such as templates. Combining the hidden struct with OO features results in a two-part object (the class's only field being a pointer to the private data); this is known as the pimpl pattern and is used frequently in C++ libraries that care about binary compatibility such as Qt.
