Why We Use Web Components

https://voltomedia.com/fresh/wp-content/uploads/Why-We-Use-Web-Components.png December 28, 2019 nullprogram.com/blog/2019/12/28/ This article was discussed on Hacker News and on reddit. Despite the goal of JSON being a subset of JavaScript — which it failed to achieve (update: this was recently fixed) — parsing JSON is quite unlike parsing a programming language. For invalid inputs, the specific cause of error is often counter-intuitive. Normally this doesn’t matter, but I recently ran into a case where it does. Consider this invalid input to a JSON parser: To a human this might be interpreted as an array containing a number. Either the leading zero is ignored, or it indicates octal, as it does in many languages, including JavaScript. In either case the number in the array would be 1. However, JSON does not support leading zeros, neither ignoring them nor supporting octal notation. Here’s the railroad diagram for numbers from the JSON specficaiton: Or in regular expression form: -?(0|[1-9][0-9]*)(.[0-9]+)?([eE][+-]?[0-9]+)? If a token starts with 0 then it can only be followed by ., e, or E. It cannot be followed by a digit. So, the natural human response to mentally parsing [01] is: This input is invalid because it contains a number with a leading zero, and leading zeros are not accepted. But this is not actually why parsing fails! A simple model for the parser is as consuming tokens from a lexer. The lexer’s job is to read individual code points (characters) from the input and group them into tokens. The possible tokens are string, number, left brace, right brace, left bracket, right bracket, comma, true, false, and null. The lexer skips over insignificant whitespace, and it doesn’t care about structure, like matching braces and brackets. That’s the parser’s job. In some instances the lexer can fail to parse a token. For example, if while looking for a new token the lexer reads the character %, then the input must be invalid. No token starts with this character. So in some cases invalid input will be detected by the lexer. The parser consumes tokens from the lexer and, using some state, ensures the sequence of tokens is valid. For example, arrays must be a well formed sequence of left bracket, value, comma, value, comma, etc., right bracket. One way to reject input with trailing garbage, is for the lexer to also produce an EOF (end of file/input) token when there are no more tokens, and the parser could specifically check for that token before accepting the input as valid. Getting back to the input [01], a JSON parser receives a left bracket token, then updates its bookkeeping to track that it’s parsing an array. When looking for the next token, the lexer sees the character 0 followed by 1. According to the railroad diagram, this is a number token (starts with 0), but 1 cannot be part of this token, so it produces a number token with the contents “0”. Everything is still fine. Next the lexer sees 1 followed by ]. Since ] cannot be part of a number, it produces another number token with the contents “1”. The parser receives this token but, since it’s parsing an array, it expects either a comma token or a right bracket. Since this is neither, the parser fails with an error about an unexpected number. The parser will not complain about leading zeros because JSON has no concept of leading zeros. Human intuition is right, but for the wrong reasons. Try this for yourself in your favorite JSON parser. Or even just pop up the JavaScript console in your browser and try it out: Firefox reports: SyntaxError: JSON.parse:…

Like to keep reading?

This article first appeared on nullprogram.com. If you'd like to keep reading, follow the white rabbit.

View Full Article

Leave a Reply