Here is a broad list of changes to Indian Kanoon that I am planning for the next release (hopefully out by the end of July 2014):
1. Remove duplicate judgments. Many high courts, such as Uttarakhand, Jharkhand, Orissa and Bombay, publish the same judgment at different URLs. Perhaps every case number gets its own URL even when several cases are decided together in a single judgment.
2. Improve the judgment layout. This involves removing the page numbers and case numbers sprinkled throughout the text, and improving the layout and paragraph-detection algorithm.
3. Improve the PDF output. IK currently uses htmldoc to generate PDFs, but htmldoc limits the fonts we can use, so the plan is to move to Pisa.
4. Add judgments from the consumer courts at the national and state levels. District consumer forums in Maharashtra, Gujarat and Karnataka publish in local languages using custom fonts embedded in the PDFs. Since the text is not in Unicode, any text processing would first require a lot more effort to reverse-engineer those fonts, so district consumer forums will not be added.
5. Improve titles for the Bombay, Kolkata, Andhra and Kerala High Courts. Since these courts do not provide metadata, the petitioner and respondent names are extracted from the free text. These extractions currently contain many errors, and the plan is to improve them.
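For item 1, one simple way to find copies of the same judgment published at different URLs is to fingerprint the normalised text and group URLs by fingerprint. The sketch below is only an illustration under stated assumptions: it uses exact hashing after collapsing whitespace and case, the function names are hypothetical, and keeping the lexicographically smallest URL is an arbitrary choice. If the copies differ in their headers (for example, each listing a different case number), exact hashing will miss them and a near-duplicate technique such as shingling would be needed instead.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of the judgment text with whitespace and case differences removed."""
    # Collapse runs of whitespace and lowercase, so trivial layout
    # differences between two copies do not change the hash.
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def group_duplicates(docs: dict[str, str]) -> dict[str, list[str]]:
    """Map each fingerprint to the URLs whose text produced it."""
    groups: dict[str, list[str]] = {}
    for url, text in docs.items():
        groups.setdefault(fingerprint(text), []).append(url)
    return groups


def canonical_urls(docs: dict[str, str]) -> list[str]:
    """Keep one URL per distinct judgment (hypothetically, the smallest URL)."""
    return sorted(min(urls) for urls in group_duplicates(docs).values())
```

With `docs = {"https://example.org/a": "Judgment  text", "https://example.org/b": "judgment text\n", "https://example.org/c": "Other"}`, the first two entries collapse into one group, so `canonical_urls(docs)` keeps only `/a` and `/c`.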
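For item 2, a line-based cleanup pass could drop lines that contain only a page number or a case number, then rejoin the wrapped lines into paragraphs. This is a minimal sketch, assuming text extracted line by line from the court PDFs. The regular expressions and the paragraph rules are hypothetical heuristics that would need tuning for each court. A blank line ends a paragraph only when the previous line ends with sentence punctuation, so a page break in mid-sentence does not split the paragraph, and a numbered line such as "2. ..." always starts a new one.

```python
import re

# Hypothetical patterns; each court's PDFs would need its own tuning.
# A line holding only a page number, e.g. "12", "Page 3", "4 of 20" or "5/20".
PAGE_NUMBER = re.compile(r"^\s*(?:page\s*)?\d+(?:\s*(?:of|/)\s*\d+)?\s*$", re.IGNORECASE)
# A line holding only a case number, e.g. "W.P. No. 1234 of 2013".
CASE_NUMBER = re.compile(r"^\s*[A-Z][A-Za-z.()\s]*?No\.?\s*\d+\s*(?:of|/)\s*\d{4}\s*$")
# A numbered paragraph opening, e.g. "3. The petitioner ...".
PARA_START = re.compile(r"^\s*\d+\.\s")


def clean_judgment(raw: str) -> list[str]:
    """Drop page/case-number lines and rebuild paragraphs from wrapped lines."""
    paragraphs: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Join the collected lines into one paragraph and start afresh.
        if current:
            paragraphs.append(" ".join(current))
            current.clear()

    for line in raw.splitlines():
        stripped = line.strip()
        if PAGE_NUMBER.match(stripped) or CASE_NUMBER.match(stripped):
            continue  # layout noise sprinkled between the text
        if not stripped:
            # End the paragraph only if the last line finished a sentence.
            if current and current[-1].endswith((".", ":", "?", "!")):
                flush()
            continue
        if PARA_START.match(stripped):
            flush()  # a numbered paragraph always starts a new block
        current.append(stripped)
    flush()
    return paragraphs
```

For example, an input where "2. The court" is followed by a page break ("Page 3") and then "held that the / petition fails." comes back as the single paragraph "2. The court held that the petition fails.".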
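For item 5, a title could be extracted from the cause title near the top of a judgment when it follows the common "party ... Petitioner / Versus / party ... Respondent" layout. The sketch below assumes exactly that layout, with each party name on a single line. The patterns, the 40-line search window and the function name are hypothetical. Party names that wrap over several lines would be truncated to their last line, which is one of the error cases a real extractor would have to handle.

```python
import re
from typing import Optional

# Hypothetical role suffix such as " ... Petitioner" or " ... Respondents".
ROLE_SUFFIX = re.compile(
    r"\s*\.{2,}\s*(?:petitioners?|appellants?|applicants?|respondents?"
    r"|opposite\s+part(?:y|ies))\s*$",
    re.IGNORECASE,
)
# A line that contains only the separator between the two parties.
SEPARATOR = re.compile(r"^\s*(?:versus|vs\.?|v/s\.?|v\.)\s*$", re.IGNORECASE)


def extract_title(text: str, window: int = 40) -> Optional[str]:
    """Return "Petitioner vs Respondent" from the cause title, if found."""
    # Only look at the first few non-empty lines, where the cause title sits.
    lines = [l.strip() for l in text.splitlines()[:window] if l.strip()]
    for i, line in enumerate(lines):
        if SEPARATOR.match(line) and 0 < i < len(lines) - 1:
            petitioner = ROLE_SUFFIX.sub("", lines[i - 1])
            respondent = ROLE_SUFFIX.sub("", lines[i + 1])
            if petitioner and respondent:
                return f"{petitioner} vs {respondent}"
    return None
```

Returning `None` rather than guessing lets the caller fall back to a generic title when the layout does not match.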
1 comment:
Please create some sort of system that blocks spammers' IP addresses on the forums page. There could be some very good discussions, and serious change could be possible, if people just talked things through objectively.
PS: I am sick of reading about how this baba or that baba is going to paste his magical jadibooty on my forehead to bring back the love of my life.