patternjavaMinor
Matrix and Complex Numbers Parser
Viewed 0 times
parsernumbersandmatrixcomplex
Problem
I have been working on a simple calculator project in Java since November 2014 as hobby. Now I want to add complex number support and Matrices. I also wants to add support for symbolic computation in future. For this purpose I wrote a Literal class which is meant for to hold all types like Matrix class objects and Complex class objects in one class and Variables in future. Currently I am dealing with expressions like this
So far I have written tokenizer that tokenize this type of expressions in to tokens of type functions, expression brackets, functions brackets, matrices, constants and numbers.
To verify syntax and parse Complex Numbers and Matrices from token I have written LiteralParser class which parse and return Literal Object in case of success and if fails throws NumberFormatException. I could use regular expression but I think that will be slow.
I have also tried Apache ComplexFormat and RealMatrixFormat. Apache ComplexFormat does a weird thing that it treats "2i" as "2+0i"
and Apache RealMatrixFormat does not support expressions on indices and it is only for real numbers.
Now I want that my code should be less nested by utilizing some functions and some performance enhancements. if is there any already existing library that does this job then let me know.
LiteralParser.java
```
package com.kmstudios.evaluator;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
public class LiteralParser
{
private final MatrixFormat format;
private final Method evaluator;
public LiteralParser(MatrixFormat format, Method m)
{
this.format = format;
this.evaluator = m;
}
public void parse(String literal) throws NumberFormatException, InvocationTargetException, IllegalAccessException
{
if (literal == null || literal.isEmpty()) throw new NumberFormatException("cannot parse empty literal");
//If format is null then
sin(2+3i*[[2+3,4][9,average(3,5,9)]]*9.8*PI)So far I have written tokenizer that tokenize this type of expressions in to tokens of type functions, expression brackets, functions brackets, matrices, constants and numbers.
To verify syntax and parse Complex Numbers and Matrices from token I have written LiteralParser class which parse and return Literal Object in case of success and if fails throws NumberFormatException. I could use regular expression but I think that will be slow.
I have also tried Apache ComplexFormat and RealMatrixFormat. Apache ComplexFormat does a weird thing that it treats "2i" as "2+0i"
and Apache RealMatrixFormat does not support expressions on indices and it is only for real numbers.
Now I want that my code should be less nested by utilizing some functions and some performance enhancements. if is there any already existing library that does this job then let me know.
LiteralParser.java
```
package com.kmstudios.evaluator;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
public class LiteralParser
{
private final MatrixFormat format;
private final Method evaluator;
public LiteralParser(MatrixFormat format, Method m)
{
this.format = format;
this.evaluator = m;
}
public void parse(String literal) throws NumberFormatException, InvocationTargetException, IllegalAccessException
{
if (literal == null || literal.isEmpty()) throw new NumberFormatException("cannot parse empty literal");
//If format is null then
Solution
From the tenor of your question, I'm going to just give some high-level suggestions of approaches you might use to make your code simpler and more manageable, rather than an in-depth line-by-line review.
Performance
You expressed a few concerns about performance and mentioned design decisions you'd taken for performance reasons.
I wouldn't worry about parser performance too much: partly because premature optimization is sadness; partly because in this kind of application, parser input is generally small and parsing is relatively infrequent; and partly because optimizers are generally quite good at optimizing well-structured code.
Unless you are using your calculator in some surreally absurd way, parser performance is not something you should be worrying about. So I would suggest that you not make your life more complicated for the sake of performance.
Parsing balanced expressions
You have a lot of special-case code and state flags to handle being in particular parts of the expression (like
All of this stuff would be easier to manage with a less ad-hoc approach to parsing. A recursive descent parser would probably be the easiest approach to get started with, but generally thinking about the grammar of your input language and looking at "standard" approaches to parsing should lead you in a helpful direction.
I'm sorry if the above seems a little vague---it's quite a large and broad point. But I would suggest you read up on parsing.
Parsing numbers
Numeric literals (in every context I can think of, and certainly as you have implemented them) form a regular language. You will probably find that the easiest way to parse them is either to use (simple!) regular expressions, or an explicit state machine.
There are a number of standard implementation techniques for implementing finite-state machines; the state pattern is one obvious choice, although for a language this simple I would consider a simple conditional/switch statement implementation.
Input representation
Your parser currently expects to be given a string which exactly contains a literal. This means that you effectively have to parse everything twice: once to determine the boundaries of the literal, and once to determine the meaning of the literal. This has a minor performance impact, but more critically, means that you have to smear multiple versions of your parsing code all over the place.
I would expect that you would find things more straightforward if you allowed parsing functions to greedily consume as many characters as they felt appropriate from a "stream" of input characters (either an actual stream, or some arrangement with string+offset, with each parsing function updating the offset). It should not be too difficult to arrange your language such that greedily consuming characters is always correct (although you might need a few characters of lookahead).
Evaluation
You have a little throwaway comment about an evaluator that takes a String and produces a String. I suspect it would work better if the evaluator took either a string or a parsed representation, and returned some richer object as a result (e.g., a
Performance
You expressed a few concerns about performance and mentioned design decisions you'd taken for performance reasons.
I wouldn't worry about parser performance too much: partly because premature optimization is sadness; partly because in this kind of application, parser input is generally small and parsing is relatively infrequent; and partly because optimizers are generally quite good at optimizing well-structured code.
Unless you are using your calculator in some surreally absurd way, parser performance is not something you should be worrying about. So I would suggest that you not make your life more complicated for the sake of performance.
Parsing balanced expressions
You have a lot of special-case code and state flags to handle being in particular parts of the expression (like
isOpen, various checks that prev is a particular kind of character, etc).All of this stuff would be easier to manage with a less ad-hoc approach to parsing. A recursive descent parser would probably be the easiest approach to get started with, but generally thinking about the grammar of your input language and looking at "standard" approaches to parsing should lead you in a helpful direction.
I'm sorry if the above seems a little vague---it's quite a large and broad point. But I would suggest you read up on parsing.
Parsing numbers
Numeric literals (in every context I can think of, and certainly as you have implemented them) form a regular language. You will probably find that the easiest way to parse them is either to use (simple!) regular expressions, or an explicit state machine.
There are a number of standard implementation techniques for implementing finite-state machines; the state pattern is one obvious choice, although for a language this simple I would consider a simple conditional/switch statement implementation.
Input representation
Your parser currently expects to be given a string which exactly contains a literal. This means that you effectively have to parse everything twice: once to determine the boundaries of the literal, and once to determine the meaning of the literal. This has a minor performance impact, but more critically, means that you have to smear multiple versions of your parsing code all over the place.
I would expect that you would find things more straightforward if you allowed parsing functions to greedily consume as many characters as they felt appropriate from a "stream" of input characters (either an actual stream, or some arrangement with string+offset, with each parsing function updating the offset). It should not be too difficult to arrange your language such that greedily consuming characters is always correct (although you might need a few characters of lookahead).
Evaluation
You have a little throwaway comment about an evaluator that takes a String and produces a String. I suspect it would work better if the evaluator took either a string or a parsed representation, and returned some richer object as a result (e.g., a
Literal).
Character identification
You have a lot of code like "c == '{' || c == '[' || c == '('" etc. I expect that you would find it easier to work with sets of characters, either represented using Java Sets, or, for convenience (with small sets) strings; then you could just ask openBrackets.contains('c').
On a similar note, a function like closingBracketFor(char openingBracket) that returned ) for (, } for {`, etc. would simplify various things.Context
StackExchange Code Review Q#119485, answer score: 5
Revisions (0)
No revisions yet.