Class responsible for parsing the provided text into tokens and tracking information about the current token.
#include <tokenizer.hpp>
Public Member Functions | |
tokenizer (std::istream &in) | |
~tokenizer () | |
const token & | next_token () |
Reads characters from in_ and returns the next token, with its type and value. | |
const token & | current_token () const |
const std::string & | textdomain () const |
const std::string & | get_file () const |
int | get_start_line () const |
Private Types | |
enum | character_type { TOK_NONE = 0 , TOK_SPACE = 1 , TOK_NUMERIC = 2 , TOK_ALPHA = 3 } |
The different character types recognized while parsing. TOK_NONE is also the default for anything beyond standard ASCII. | |
Private Member Functions | |
tokenizer () | |
void | next_char () |
Increments the line number if the current character is a newline, then sets current_ to the next character that is not \r. | |
void | next_char_skip_cr () |
Sets current_ to the next character, skipping the \r in Windows-style \r\n line endings. The test_cvs_2018_1999023_2.cfg file also uses \r\n line endings for some reason; otherwise the check isn't needed on non-Windows platforms, since \r characters are removed from cfg files on upload. | |
int | peek_char () |
Returns the next character without advancing the current position in the istream. | |
character_type | char_type (unsigned c) const |
bool | is_space (int c) const |
bool | is_num (int c) const |
bool | is_alnum (int c) const |
void | skip_comment () |
Handles skipping over comments (inline and on a separate line), as well as the special processing needed for #textdomain and #line. | |
bool | skip_command (char const *cmd) |
Returns true if the next characters are the ones from cmd followed by a space. | |
Private Attributes | |
int | current_ |
int | lineno_ |
int | startlineno_ |
std::string | textdomain_ |
std::string | file_ |
token | token_ |
buffered_istream | in_ |
std::array< character_type, END_STANDARD_ASCII > | char_types_ |
Class responsible for parsing the provided text into tokens and tracking information about the current token.
When built with the DEBUG_TOKENIZER compiler define, it can also track the previous token. It does not otherwise keep track of the processing history.
Definition at line 98 of file tokenizer.hpp.
enum tokenizer::character_type |
private |
The different character types recognized while parsing. TOK_NONE is also the default for anything beyond standard ASCII.
Enumerator | |
---|---|
TOK_NONE | |
TOK_SPACE | |
TOK_NUMERIC | |
TOK_ALPHA |
Definition at line 178 of file tokenizer.hpp.
tokenizer::tokenizer | ( | std::istream & | in | ) |
Definition at line 20 of file tokenizer.cpp.
References c, char_types_, END_STANDARD_ASCII, in_, next_char_skip_cr(), buffered_istream::stream(), t, TOK_ALPHA, TOK_NONE, TOK_NUMERIC, and TOK_SPACE.
tokenizer::~tokenizer | ( | ) |
Definition at line 45 of file tokenizer.cpp.
References in_, and buffered_istream::stream().
tokenizer::tokenizer | ( | ) |
private |
character_type tokenizer::char_type | ( | unsigned | c | ) | const |
inline, private |
Definition at line 186 of file tokenizer.hpp.
References c, char_types_, END_STANDARD_ASCII, and TOK_NONE.
Referenced by is_alnum(), is_num(), and is_space().
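The table-driven lookup behind char_type() and the is_*() helpers can be sketched as follows. This is a minimal standalone illustration, not the code from tokenizer.cpp: the value 128 for END_STANDARD_ASCII, the exact classification rules in make_char_types(), and the `>` comparison in is_alnum() are all assumptions, and the functions take the table as a parameter only so the sketch does not need the class.

```cpp
#include <array>
#include <cstddef>

// Assumption: standard ASCII ends at 128. The real END_STANDARD_ASCII
// constant is defined elsewhere and its value is not shown on this page.
constexpr std::size_t END_STANDARD_ASCII = 128;

enum character_type { TOK_NONE = 0, TOK_SPACE = 1, TOK_NUMERIC = 2, TOK_ALPHA = 3 };

using char_table = std::array<character_type, END_STANDARD_ASCII>;

// Hypothetical classification rules, chosen for illustration only.
char_table make_char_types()
{
    char_table table{};  // value-initialized: every entry starts as TOK_NONE
    for (std::size_t c = 0; c < END_STANDARD_ASCII; ++c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_') {
            table[c] = TOK_ALPHA;
        } else if (c >= '0' && c <= '9') {
            table[c] = TOK_NUMERIC;
        } else if (c == ' ' || c == '\t') {
            table[c] = TOK_SPACE;
        }
    }
    return table;
}

// Anything outside the table (including EOF converted to unsigned) is TOK_NONE.
character_type char_type(const char_table& table, unsigned c)
{
    return c < END_STANDARD_ASCII ? table[c] : TOK_NONE;
}

bool is_space(const char_table& t, int c)
{
    return char_type(t, static_cast<unsigned>(c)) == TOK_SPACE;
}

bool is_num(const char_table& t, int c)
{
    return char_type(t, static_cast<unsigned>(c)) == TOK_NUMERIC;
}

// Assumed to rely on the enum ordering: TOK_NUMERIC and TOK_ALPHA both
// compare greater than TOK_SPACE.
bool is_alnum(const char_table& t, int c)
{
    return char_type(t, static_cast<unsigned>(c)) > TOK_SPACE;
}
```

Building the table once keeps each per-character check to a bounds test and an array read, and the bounds test is what makes TOK_NONE the default for anything beyond standard ASCII.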
const token & tokenizer::current_token | ( | ) | const |
inline |
Definition at line 109 of file tokenizer.hpp.
References token_.
const std::string & tokenizer::get_file | ( | ) | const |
inline |
Definition at line 126 of file tokenizer.hpp.
References file_.
int tokenizer::get_start_line | ( | ) | const |
inline |
Definition at line 131 of file tokenizer.hpp.
References startlineno_.
bool tokenizer::is_alnum | ( | int | c | ) | const |
inline, private |
Definition at line 201 of file tokenizer.hpp.
References c, char_type(), and TOK_SPACE.
Referenced by next_token().
bool tokenizer::is_num | ( | int | c | ) | const |
inline, private |
Definition at line 196 of file tokenizer.hpp.
References c, char_type(), and TOK_NUMERIC.
Referenced by skip_comment().
bool tokenizer::is_space | ( | int | c | ) | const |
inline, private |
Definition at line 191 of file tokenizer.hpp.
References c, char_type(), and TOK_SPACE.
Referenced by next_token(), skip_command(), and skip_comment().
void tokenizer::next_char | ( | ) |
inline, private |
Increments the line number if the current character is a newline, then sets current_ to the next character that is not \r.
Definition at line 146 of file tokenizer.hpp.
References current_, lineno_, token::NEWLINE, and next_char_skip_cr().
Referenced by next_token().
void tokenizer::next_char_skip_cr | ( | ) |
inline, private |
Sets current_ to the next character, skipping the \r in Windows-style \r\n line endings. The test_cvs_2018_1999023_2.cfg file also uses \r\n line endings for some reason; otherwise the check isn't needed on non-Windows platforms, since \r characters are removed from cfg files on upload.
Definition at line 158 of file tokenizer.hpp.
References current_, buffered_istream::get(), and in_.
Referenced by next_char(), next_token(), skip_command(), skip_comment(), and tokenizer().
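The interplay between next_char() and next_char_skip_cr() can be sketched with a small standalone reader. This is an illustration under assumptions: char_reader and read_all() are hypothetical names, a plain std::istream stands in for buffered_istream, and the line counter is assumed to start at 1.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the tokenizer's character-reading state.
class char_reader
{
public:
    // Primes current_ with the first character, as the constructor's
    // reference to next_char_skip_cr() suggests.
    explicit char_reader(std::istream& in) : in_(in) { next_char_skip_cr(); }

    int current() const { return current_; }
    int lineno() const { return lineno_; }

    // Counts the newline being left behind, then advances.
    void next_char()
    {
        if (current_ == '\n') {
            ++lineno_;
        }
        next_char_skip_cr();
    }

    // Advances one character; if it is '\r', reads once more so that
    // "\r\n" is seen as a plain '\n'.
    void next_char_skip_cr()
    {
        current_ = in_.get();
        if (current_ == '\r') {
            current_ = in_.get();
        }
    }

private:
    std::istream& in_;
    int current_ = 0;
    int lineno_ = 1;  // assumption: lines are numbered from 1
};

// Collects every character the reader yields until end of input.
std::string read_all(char_reader& r)
{
    std::string out;
    while (r.current() != std::char_traits<char>::eof()) {
        out += static_cast<char>(r.current());
        r.next_char();
    }
    return out;
}
```

Because the line number is bumped when leaving a '\n' rather than when reaching it, the newline character itself is still attributed to the line it terminates.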
const token & tokenizer::next_token | ( | ) |
Reads characters from in_ and returns the next token, with its type and value.
Definition at line 51 of file tokenizer.cpp.
References token::CLOSE_BRACKET, token::COMMA, current_, token::DOLLAR, token::DOUBLE_QUOTE, token::END, token::EQUALS, INLINED_PREPROCESS_DIRECTIVE_CHAR, is_alnum(), is_space(), token::LEFT_ANGLE_BRACKET, lineno_, token::MISC, token::NEWLINE, next_char(), next_char_skip_cr(), token::OPEN_BRACKET, peek_char(), token::PLUS, token::POUND, token::QSTRING, token::RIGHT_ANGLE_BRACKET, skip_comment(), token::SLASH, startlineno_, token::STRING, token_, token::type, token::UNDERSCORE, token::UNTERMINATED_QSTRING, and token::value.
int tokenizer::peek_char | ( | ) |
inline, private |
Returns the next character without advancing the current position in the istream.
Definition at line 169 of file tokenizer.hpp.
References in_, and buffered_istream::peek().
Referenced by next_token().
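As a rough illustration of non-consuming lookahead, the sketch below uses std::istream::peek() directly. The helper next_is() is hypothetical, and std::istream stands in for buffered_istream, whose peek() is assumed here to behave the same way.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reports whether the next unread character is
// `expected`, leaving the read position untouched either way.
bool next_is(std::istream& in, char expected)
{
    return in.peek() == std::char_traits<char>::to_int_type(expected);
}
```

A caller can therefore test for a two-character sequence and only consume the second character once the match is confirmed.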
bool tokenizer::skip_command | ( | char const * | cmd | ) |
private |
Returns true if the next characters are the ones from cmd followed by a space.
Skips all the matching characters. Currently only used by #textdomain (specified in the WML) and #line (added by the preprocessor).
Definition at line 202 of file tokenizer.cpp.
References current_, is_space(), and next_char_skip_cr().
Referenced by skip_comment().
void tokenizer::skip_comment | ( | ) |
private |
Handles skipping over comments (inline and on a separate line), as well as the special processing needed for #textdomain and #line.
Definition at line 222 of file tokenizer.cpp.
References current_, dst, file_, is_num(), is_space(), lineno_, token::NEWLINE, next_char_skip_cr(), skip_command(), and textdomain_.
Referenced by next_token().
const std::string & tokenizer::textdomain | ( | ) | const |
inline |
Definition at line 121 of file tokenizer.hpp.
References textdomain_.
std::array< character_type, END_STANDARD_ASCII > tokenizer::char_types_ |
private |
Definition at line 224 of file tokenizer.hpp.
Referenced by char_type(), and tokenizer().
int tokenizer::current_ |
private |
Definition at line 138 of file tokenizer.hpp.
Referenced by next_char(), next_char_skip_cr(), next_token(), skip_command(), and skip_comment().
std::string tokenizer::file_ |
private |
Definition at line 218 of file tokenizer.hpp.
Referenced by get_file(), and skip_comment().
buffered_istream tokenizer::in_ |
private |
Definition at line 223 of file tokenizer.hpp.
Referenced by next_char_skip_cr(), peek_char(), tokenizer(), and ~tokenizer().
int tokenizer::lineno_ |
private |
Definition at line 139 of file tokenizer.hpp.
Referenced by next_char(), next_token(), and skip_comment().
int tokenizer::startlineno_ |
private |
Definition at line 140 of file tokenizer.hpp.
Referenced by get_start_line(), and next_token().
std::string tokenizer::textdomain_ |
private |
Definition at line 217 of file tokenizer.hpp.
Referenced by skip_comment(), and textdomain().
token tokenizer::token_ |
private |
Definition at line 219 of file tokenizer.hpp.
Referenced by current_token(), and next_token().