HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Are HTML and CSS regular languages?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
cssarelanguagesregularandhtml

Problem

I have a question whether or not CSS and HTML are regular languages.

I believe CSS is a regular language, since it should be possible to create a regular expression to match the structure of CSS.

However, I believe that HTML is not a regular language since you have nested attributes that could be defined recursively.

Solution

Providing a regular expression or DFA for a language and proceeding to demonstrate that it is correct for the language in question generally constitutes a pretty convincing argument for the regularity of the language. To prove a language is non-regular, you have a few options: the pumping lemma for regular languages is a classic standby, but the Myhill-Nerode theorem is pretty nifty, too (especially since you can use it to prove regularity as well as non-regularity).

Your first job should be to define what constitute valid strings in the languages $HTML$ and $CSS$. Note that most browsers will take anything you call $HTML$ and display something without error; in that sense, anything is valid $HTML$, and the language $\Sigma^*$ is trivially regular. However, I think it's safe to assume that you have something a bit more strict in mind: $HTML$ consists only of those strings that conform to some standard, which probably calls for matched tags. In this case, you should be able to prove that $HTML$ isn't regular, since you can consider the following kinds of strings (whitespace is added only for clarity and could be removed in the real string):


   
      
         
            
               
                  ...
               
            
         
      
   


This language, ` $($ $)^n($ $)^n$, should be enough to get to a proof of non-regularity, provided you accept the above as valid $HTML$, and would say that a missing ` tag would make the resulting string invalid $HTML$, even though browsers would try to render it anyway.

For $CSS$, first figure out what strings you think are valid, then try to come up with a regular expression or DFA for the set of all $CSS$ strings, or figure out how to define a subset of $CSS$ that requires non-local checks and unbounded memory (counting, matching, nesting, etc.) If you can define such a subset (like we did for $HTML$ above), then you're good to go.

Code Snippets

<html>
   <body>
      <table>
         <tr>
            <td>
               <table>
                  ...
               </table>
            </td>
         </tr>
      </table>
   </body>
</html>

Context

StackExchange Computer Science Q#12867, answer score: 11

Revisions (0)

No revisions yet.