HiveBrain v1.2.0
Get Started
← Back to all entries
snippetjavaMinor

Read text file filled with numbers and tranfer to an array

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
filetranferreadwitharraytextnumbersfilledand

Problem

This is my ProcessDataFileParallel.Java file. I'm taking numbers from a .txt file and putting it into an array in Java. While it works, I would like to improve the code and possibly change some algorithms to better alternatives.

```
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

public class ProcessDataFileParallel {

public static void main(String[] args) throws IOException {
int[] array = new int[100000];
int index = 0;
Scanner inputFile = null;
Scanner scan = new Scanner(System.in);
boolean running = false;

String emptyString;
FileReader file = new FileReader("dataset529.txt");
BufferedReader br = new BufferedReader(file);
emptyString = br.readLine();

try {
while ((emptyString = br.readLine()) != null) {
array[index++] = Integer.parseInt(emptyString);

}
} finally {
if (br.readLine() == null)
br.close();
}

inputFile = new Scanner(file);

// Read File from Dataset529.txt
if(inputFile != null){
System.out.println("Number of integers in: " + file);

try{
while (inputFile.hasNext() && inputFile.hasNextInt()){
array[index] = inputFile.nextInt();
index++;
inputFile.next();
}

}finally{
inputFile.close();
}

// Print dataset529.txt to Array in Java with For Loop below.
for (int ai = 0; ai " + " " + array[ai]);
}

System.out.println("How many Threads?"); // Scanner - Ask user how many threads

int n = scan.nextInt(); // Create variable 'n' to handle whatever integer the user specifies. nextInt() is used for the scanner to expect and Int.

Thread[] myThreads = new Thread[n]; // Variable n placed here to determine how many Threads user wanted
Worker[] workers = new Worker[n]; // Variable N placed here - the nu

Solution

Bug

If the array length is not a multiplier of the number of threads, you'll miss the remainder! Quick example:

public static void printRange(int length, int n) {
    int range = length / n;
    for (int index = 0; index < n; index ++){
        int start = index * range;
        int end = start + range;
        System.out.printf("[%s, %s)%n", start, end);
    }
}

// Output for printRange(10, 3):
[0, 3)
[3, 6)
[6, 9)

// Output for printRange(10, 6):
[0, 1)
[1, 2)
[2, 3)
[3, 4)
[4, 5)
[5, 6)


As you can see, you'll miss the 10th element for printRange(10, 3), and up to half the array for printRange(10, 6).

edit: A simplistic approach is to spin a new Thread to handle the remaining elements, but as you can see from the second example, you will end up a relatively high number of five elements for it, whereas the first five threads are only processing (and returning) one element. An alternative approach is to add an extra element per Thread so that you have more threads doing slightly more work, than one single thread doing most of the work.

Multi-threading

The preferred way of doing multi-threading since Java 5 is to use an ExecutorService to help you manage the lifecycle of threads. You should read up more on Oracle's tutorial to understand how they are used.

In addition, since you want each thread to compute and return a result for you, you should be looking at the Callable interface. Implementations override the call() method to return an appropriate result.

try-with-resources

If you are on Java 7 and above, you should use try-with-resources for efficient handling of the underlying I/O resource:

public static void main(String[] args) {
    try (Scanner scanner = new Scanner(System.in)) {
        // ...
    }
}


Hard-coding array length

int[] array = new int[100000];


What happens if your file has more than this number of elements? Actually, you also seem to be processing your file twice, once using the BufferedReader approach, and the second using a Scanner. Is this expected?

edit:

Getting the max value

Your current implementation forgets to actually compare the max of each partition with each other to arrive at the desired result.


Is this meant to be more like an academic exercise on multi-threading? - myself

The reason why I'm asking is that while it looks like you're looking more at a map-reduce approach to your problem, it may not even be necessary in the first place. Processing 100,000 integers (based on your int[] declaration) or less in a single thread is relatively fast enough on any modern hardware capable of running the JVM. If you are however trying to compare billions of numbers and/or given a (strangely low) memory constraint, then that's when multi-threading will help. However, your question will be quite different at that stage.

Getting the max value (part 2)

From Java 8 onwards, there are two approaches of running tasks asynchronously:

  • ExecutorService to create Futures, and then getting the results from each of them.



  • Chaining CompletableFutures (Java 8 onwards).



The code below demonstrates both:

public class AsyncMax {

    private static final int MAX = 10;

    public static void main(String[] args) {
        System.out.println("Using ExecutorService:");
        ExecutorService service = Executors.newWorkStealingPool();
        List> esResults = IntStream.range(0, MAX).parallel()
                .mapToObj(i -> service.submit(() -> produce(i).get()))
                .collect(Collectors.toList());
        esResults.stream()
                .map(f -> get(f))
                .max(Comparator.naturalOrder())
                .ifPresent(System.out::println);
        service.shutdown();
        System.out.println("Using CompletableFuture:");
        CompletableFuture> cfResults = IntStream.range(0, MAX).parallel()
                .mapToObj(i -> produce(i))
                .map(CompletableFuture::supplyAsync)
                .collect(allOf());
        cfResults.thenApply(results -> results.stream().max(Comparator.naturalOrder()))
                .join()
                .ifPresent(System.out::println);
    }
}


For both approaches, IntStream.range(0, MAX).parallel() is used to parallelize the production of MAX integers, via the produce(int) method (shown below). The intermediary esResults and cfResults variables are created solely to highlight where the asynchronous tasks will 'end', to be followed by how the results are collected. One can certainly daisy-chain the method calls completely.

  • ExecutorService:



  • Start an ExecutorService implementation, the Java 8 newWorkStealingPool() is an example here.



  • submit() a Callable that returns an Integer upon completion.



  • Collect the results to a List>, i.e. a List of Futures that return Integers.



  • Stream the List by get()-ting (method shown below) each resulting Integer.



  • From the Stream, call `max(Comparator.naturalOrde

Code Snippets

public static void printRange(int length, int n) {
    int range = length / n;
    for (int index = 0; index < n; index ++){
        int start = index * range;
        int end = start + range;
        System.out.printf("[%s, %s)%n", start, end);
    }
}

// Output for printRange(10, 3):
[0, 3)
[3, 6)
[6, 9)

// Output for printRange(10, 6):
[0, 1)
[1, 2)
[2, 3)
[3, 4)
[4, 5)
[5, 6)
public static void main(String[] args) {
    try (Scanner scanner = new Scanner(System.in)) {
        // ...
    }
}
int[] array = new int[100000];
public class AsyncMax {

    private static final int MAX = 10;

    public static void main(String[] args) {
        System.out.println("Using ExecutorService:");
        ExecutorService service = Executors.newWorkStealingPool();
        List<Future<Integer>> esResults = IntStream.range(0, MAX).parallel()
                .mapToObj(i -> service.submit(() -> produce(i).get()))
                .collect(Collectors.toList());
        esResults.stream()
                .map(f -> get(f))
                .max(Comparator.naturalOrder())
                .ifPresent(System.out::println);
        service.shutdown();
        System.out.println("Using CompletableFuture:");
        CompletableFuture<List<Integer>> cfResults = IntStream.range(0, MAX).parallel()
                .mapToObj(i -> produce(i))
                .map(CompletableFuture::supplyAsync)
                .collect(allOf());
        cfResults.thenApply(results -> results.stream().max(Comparator.naturalOrder()))
                .join()
                .ifPresent(System.out::println);
    }
}
private static Supplier<Integer> produce(int i) {
    return () -> {
        try {
            Thread.sleep((int) ((1 + Math.random()) * (MAX - i) * 500));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        System.out.printf("Returning [%s] from Thread[%s]%n",
                i, Thread.currentThread().getName());
        return i;
    };
}

Context

StackExchange Code Review Q#109205, answer score: 4

Revisions (0)

No revisions yet.