Showing content from https://www.tutorialspoint.com/javaexamples/extract_content_from_xml.htm below:

How to extract content from an XML document using Java

Problem Description

How to extract content from an XML document using Java.

Solution

The following program uses Apache Tika to extract the text content and metadata of an XML document in Java.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

public class ExtractContentFromXMLDoc {
   public static void main(String[] args) throws IOException, SAXException, TikaException {

      // Handler that collects the text content of the document
      BodyContentHandler handler = new BodyContentHandler();
      Metadata metadata = new Metadata();
      ParseContext pcontext = new ParseContext();

      // Tika's HTML parser reads the markup; the metadata reports text/html
      HtmlParser htmlparser = new HtmlParser();

      // try-with-resources closes the stream even if parsing fails
      try (FileInputStream inputstream = new FileInputStream(new File(
            "C:/tika/xmlExample.xml"))) {
         htmlparser.parse(inputstream, handler, metadata, pcontext);
      }
      System.out.println("Contents of the document:" + handler.toString());
      System.out.println("Metadata of the document:");
      String[] metadataNames = metadata.names();

      for(String name : metadataNames) {
         System.out.println(name + ": " + metadata.get(name));
      }
   }
}
Output
Contents of the document: 
   Tanmay Patil 
   TutorialsPoint 
   (011) 123-4567   

Metadata of the document: 
Content-Encoding: windows-1252 
Content-Type: text/html; charset = windows-1252 
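
The same kind of text extraction can be tried without Tika on the classpath. The sketch below is not the Tika program above: it swaps in the SAX parser that ships with the JDK (javax.xml.parsers) and collects the character data of each element. The class name XmlTextSketch, the method extractText and the inline XML string are assumptions made for illustration. The XML is only a guess at a document that would produce the contents shown above, since the original input file is not reproduced here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; uses only the JDK's built-in SAX parser, not Tika.
class XmlTextSketch {

   // Returns the non-blank text found in the XML, one entry per line.
   public static String extractText(String xml) throws Exception {
      StringBuilder text = new StringBuilder();
      SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

      parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
         // Buffers character data, which SAX may deliver in several chunks
         private final StringBuilder current = new StringBuilder();

         @Override
         public void characters(char[] ch, int start, int length) {
            current.append(ch, start, length);
         }

         @Override
         public void startElement(String uri, String localName, String qName,
               Attributes attributes) {
            flush();
         }

         @Override
         public void endElement(String uri, String localName, String qName) {
            flush();
         }

         // Moves the buffered text to the result if it is not just whitespace
         private void flush() {
            String value = current.toString().trim();
            if (!value.isEmpty()) {
               text.append(value).append('\n');
            }
            current.setLength(0);
         }
      });
      return text.toString();
   }

   public static void main(String[] args) throws Exception {
      // Assumed sample input, not taken from the original xmlExample.xml
      String xml = "<contact-info>"
            + "<name>Tanmay Patil</name>"
            + "<company>TutorialsPoint</company>"
            + "<phone>(011) 123-4567</phone>"
            + "</contact-info>";
      System.out.println("Contents of the document:");
      System.out.print(extractText(xml));
   }
}
```

Unlike the Tika version, this sketch produces no metadata such as Content-Type. It also requires well-formed XML and throws a SAXException on malformed input.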
