I have just started working for a software company, and my first task is to write a program that converts HTML documents into plain text. Besides the text I am interested in, HTML documents also contain formatting tags. Every formatting tag in my documents begins with a start label of the form <...> (where ... stands for the tag name and its possible attributes) and ends with an end label of the form </...> (where ... is the tag name). Between the start and end labels there can be simple text or text formatted with other tags. Plain text that is not enclosed in any formatting tag can also appear in my documents. The tags are correctly nested (a tag is never closed while a tag opened inside it is still open), and no label can appear inside another label.
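Because no label can appear inside another label, a label is simply everything from a `<` to the next `>`, and since the tags are correctly nested they never have to be matched against each other just to be removed. A minimal sketch in Python, assuming that plain text never contains a literal `<` and that no `>` appears inside a label's attributes (the statement implies this but does not spell it out): a single flag is enough to skip labels. The function name `strip_tags` is hypothetical.

```python
def strip_tags(html: str) -> str:
    """Return the text of `html` with every <...> and </...> label removed.

    Assumes no '<' appears in plain text and no '>' appears inside a label,
    so one boolean flag is enough; correct nesting means no stack of open
    tags is needed just to drop the labels.
    """
    out = []
    inside_label = False
    for ch in html:
        if inside_label:
            if ch == ">":          # end of the current label
                inside_label = False
        elif ch == "<":            # start of a start or end label
            inside_label = True
        else:
            out.append(ch)         # plain text, including spaces and newlines
    return "".join(out)
```

For example, `strip_tags("<html><head><title>vasile</title></head>")` yields `"vasile"`.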
Task
Since I couldn't manage it myself, write a program that converts an HTML document into plain text.
Input Data
Input file convert.in contains the HTML document.
Output Data
Output file convert.out will contain the plain text of the HTML document, with the same line structure and the same spacing as in the input.
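One possible way to produce convert.out while keeping the line structure and spacing is to delete the labels and copy every other character, newlines included, unchanged. The sketch below does this with a regular expression rather than a hand-written scanner. It assumes UTF-8 input and, as above, that no `>` appears inside a label. The names `html_to_text` and `convert` are illustrative, not part of the statement.

```python
import re

# A label is "<", then any characters other than ">", then ">".
# Assumes (as the statement implies) that labels never nest and that
# no ">" appears inside a label's attributes.
LABEL = re.compile(r"<[^>]*>")


def html_to_text(html: str) -> str:
    """Delete every start/end label; all other characters are kept as-is."""
    return LABEL.sub("", html)


def convert(in_path: str = "convert.in", out_path: str = "convert.out") -> None:
    """Read the HTML document from in_path and write its plain text to out_path."""
    # newline="" turns off newline translation, so line breaks are copied verbatim.
    # UTF-8 is an assumption; the statement does not name an encoding.
    with open(in_path, encoding="utf-8", newline="") as fin:
        html = fin.read()
    with open(out_path, "w", encoding="utf-8", newline="") as fout:
        fout.write(html_to_text(html))
```

Calling `convert()` with its default arguments reads convert.in and writes convert.out. The regular expression is only safe because of the statement's guarantees; general HTML (comments, scripts, `>` inside quoted attribute values) would call for a real parser such as Python's `html.parser.HTMLParser`.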
Constraints and Clarifications
Example
convert.in | convert.out
<html><head><title>vasile</title></head> | vasile
Time limit: 0.1 seconds/test
prof. Emanuela Cerchez
Computer Science High School "Grigore Moisil" Iasi
Contact: emanuela.cerchez@gmail.com