Next: , Previous: Lexer, Up: Top



Hash Nodes

When cpplib encounters an “identifier”, it generates a hash code for it and stores it in the hash table. By “identifier” we mean tokens with type CPP_NAME; this includes identifiers in the usual C sense, as well as keywords, directive names, macro names and so on. For example, all of pragma, int, foo and __GNUC__ are identifiers and hashed when lexed.

Each node in the hash table contain various information about the identifier it represents. For example, its length and type. At any one time, each identifier falls into exactly one of three categories:

The same identifiers share the same hash node. Since each identifier token, after lexing, contains a pointer to its hash node, this is used to provide rapid lookup of various information. For example, when parsing a #define statement, CPP flags each argument's identifier hash node with the index of that argument. This makes duplicated argument checking an O(1) operation for each argument. Similarly, for each identifier in the macro's expansion, lookup to see if it is an argument, and which argument it is, is also an O(1) operation. Further, each directive name, such as endif, has an associated directive enum stored in its hash node, so that directive lookup is also O(1).