Here is a broad list of changes to Indian Kanoon that I am planning for the next release (hopefully out by the end of July 2014):
1. Remove duplicate judgments. Many high courts, such as Uttarakhand, Jharkhand, Orissa and Bombay, publish the same judgment at different URLs. It appears that every case number gets its own URL even when several cases are combined in a single judgment.
2. Improve the judgment layout. This involves removing the page numbers and case numbers sprinkled through the text, and improving the layout and paragraph detection algorithm.
3. Improve the PDF output. Currently IK uses htmldoc to generate PDF output, but it limits the fonts we can use. So the plan is to move to Pisa.
4. Add judgments from Consumer Courts at the national and state levels. District consumer forums in Maharashtra, Gujarat and Karnataka use local languages rendered with custom embedded fonts in the PDFs. As the text is not in Unicode, it would take a lot more effort to reverse engineer the fonts before any text processing is possible. So district consumer courts will not be added.
5. Improve titles for the Bombay, Kolkata, Andhra and Kerala high courts. Since these courts do not provide meta information, the petitioner and respondent are extracted from the free text. Currently there are a lot of errors in these extractions and the plan is to improve them.
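To make the first item concrete, here is a minimal sketch of one way duplicate judgments could be detected: hash the normalized text of each judgment and group the URLs that share a hash. This is not the code IK actually runs. The function names `fingerprint` and `group_duplicates` and the example URLs are made up for illustration, and exact-match hashing will only catch copies whose text is identical after whitespace and case are normalized.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Return a SHA-256 hex digest of the text with whitespace and case normalized."""
    # Collapse runs of whitespace and lowercase, so trivial formatting
    # differences between two copies do not change the hash.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(docs):
    """Group URLs whose judgment text is identical after normalization.

    docs is a dict mapping URL -> judgment text (a hypothetical input shape).
    Returns a list of sorted URL lists, one per set of duplicates.
    """
    groups = defaultdict(list)
    for url, text in docs.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


if __name__ == "__main__":
    # Hypothetical URLs: the first two carry the same judgment text.
    docs = {
        "http://example.org/case/1": "The appeal is  dismissed.\n",
        "http://example.org/case/2": "the appeal is dismissed.",
        "http://example.org/case/3": "The appeal is allowed.",
    }
    for urls in group_duplicates(docs):
        print(urls)
```

Copies that differ by a header, a page number or a case number would need the layout cleanup from item 2 first, or a fuzzier comparison than an exact hash.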