Submit new task to executor after worker finishes


I am working on a web crawler that visits a page and extracts its links, looking for a specific domain. If it does not find the domain, it visits the extracted links and repeats until it hits a page limit or finds the target page. I am struggling to come up with sound logic that lets the bot keep queueing tasks after it extracts links, because the initial tasks are completed quickly and the newly extracted links never get submitted. How could I go about making the crawler wait until it has no more links before shutting down the executor? I have included a basic overview of my multithreading implementation. I set the max threads to 3 and submit example.com 10 times (seed domains).



SpawnThread visits the site, extracts the links, and returns them as a string. My issue is that I need to take those results and put them into the queue, but the queue has already finished by that time. Any suggestions?



Update: So to clarify, my issue is that when I submit a seed and get the results, I cannot get the crawler to continue searching the returned seeds unless I block, wait for the results, and then add them in manually.



Update 2: To clarify a bit more, I am trying to prevent blocking on future.get so I can schedule the returned results as new tasks as they come in.




int maxThreads = 3;
ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(maxThreads);
List<Future<String>> resultList = new ArrayList<>();

// Submit the seed domain 10 times and keep each Future.
for (int i = 0; i < 10; i++) {
    SpawnThread task = new SpawnThread("example.com");
    Future<String> result = executor.submit(task);
    resultList.add(result);
}

// Collect the results. Note: future.get() blocks until that task finishes.
for (Future<String> future : resultList) {
    try {
        String resultfinished = future.get();
        System.out.println(resultfinished);
    } catch (InterruptedException | ExecutionException e) {
        e.printStackTrace();
    }
}
executor.shutdown();



I think what I need is a non-blocking queue for the results that can feed back into the list supplying new domains to crawl, but I cannot seem to get it to work.


BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024); // note: never actually used below
Executor executor = Executors.newFixedThreadPool(4);
CompletionService<List<String>> completionService =
        new ExecutorCompletionService<>(executor);
List<String> pagesToVisit = new ArrayList<>();
Set<String> pagesVisited = new HashSet<>();

String seedPage = "https://example.com/";
String currentURL = null;

boolean done = false;
while (!done) {

    int listsize = pagesToVisit.size();
    if (pagesToVisit.isEmpty()) {
        currentURL = seedPage;
        pagesVisited.add(seedPage);
        listsize = pagesToVisit.size() + 1;
    } else {
        currentURL = nextUrl();
    }

    for (int k = 0; k < listsize; k++) {
        completionService.submit(new Spider(currentURL, "IP", "PORT"));
    }

    int received = 0;
    boolean errors = false;
    while (received < listsize && !errors) {
        Thread.sleep(1000);
        Future<List<String>> resultFuture = completionService.take(); // blocks if none available
        try {
            List<String> result = resultFuture.get();
            pagesToVisit.addAll(result);
            received++;
        } catch (Exception e) {
            // log
            e.printStackTrace();
            errors = true;
        }
    }
}
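For reference, one pattern that avoids waiting for a whole batch is to count in-flight tasks and drain the CompletionService one result at a time, submitting newly discovered links immediately. This is a minimal sketch under stated assumptions, not the code above: `fetchLinks` is a hypothetical stand-in for the Spider task, and the visited set doubles as the page-count limit.

```java
import java.util.*;
import java.util.concurrent.*;

public class CrawlLoop {
    // Hypothetical stand-in for the Spider task: fetch a page and return
    // the links found on it. A real implementation would do the HTTP work.
    static List<String> fetchLinks(String url) {
        return Collections.emptyList();
    }

    static Set<String> crawl(String seed, int maxPages) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CompletionService<List<String>> cs = new ExecutorCompletionService<>(executor);
        Set<String> visited = new HashSet<>();
        int pending = 0; // number of tasks currently in flight

        visited.add(seed);
        cs.submit(() -> fetchLinks(seed));
        pending++;

        // Drain one result at a time; resubmit new links as they appear.
        // The loop only ends when no task is left in flight, i.e. the
        // frontier is truly empty or the page limit was reached.
        while (pending > 0) {
            Future<List<String>> done = cs.take(); // blocks for ONE result, not the whole batch
            pending--;
            try {
                for (String link : done.get()) {
                    if (visited.size() >= maxPages) break;
                    if (visited.add(link)) {           // skip pages already queued or visited
                        cs.submit(() -> fetchLinks(link));
                        pending++;
                    }
                }
            } catch (ExecutionException e) {
                e.printStackTrace(); // one failed page should not stop the crawl
            }
        }
        executor.shutdown();
        return visited;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(crawl("https://example.com/", 100).size());
    }
}
```

Because the loop condition is the in-flight count rather than a fixed batch size, results can be added as tasks the moment they arrive, which is exactly the non-blocking behaviour Update 2 asks about.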




1 Answer



I'm not sure I got your question right, but you can use the awaitTermination() method:



public boolean awaitTermination(long timeout, TimeUnit unit)
        throws InterruptedException



Blocks until all tasks have completed execution after a shutdown
request, or the timeout occurs, or the current thread is interrupted,
whichever happens first.



Parameters:
timeout - the maximum time to wait
unit - the time unit of the timeout argument



Returns: true if this executor terminated and false if the timeout
elapsed before termination



Throws: InterruptedException - if interrupted while waiting



For example


try {
    executor.awaitTermination(5, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    // handle interruption
}



The shutdown() method does not wait for threads to complete:



Initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted. Invocation has no additional effect if already shut down.
This method does not wait for previously submitted tasks to complete execution.
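Putting the two together, the usual idiom is to call shutdown() first and then awaitTermination() to wait for in-flight tasks. This is a sketch of that idiom (the helper name shutdownAndAwait and the 5-second timeout are my own choices, not from the question):

```java
import java.util.concurrent.*;

public class ShutdownDemo {
    // Standard shutdown sequence: stop accepting new tasks, wait for
    // running ones to finish, then force-cancel any stragglers.
    static boolean shutdownAndAwait(ExecutorService executor) {
        executor.shutdown(); // no new tasks accepted from here on
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow(); // interrupt tasks still running
                return executor.awaitTermination(5, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("task ran"));
        System.out.println(shutdownAndAwait(executor));
    }
}
```

Note that this only helps once all tasks have been submitted; it does not by itself solve the resubmission problem in the question, since nothing can be submitted after shutdown().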





So with this, would I be able to submit a seed, then submit the links returned, and do that x amount of times? Wouldn't blocking prevent new tasks from being submitted?
– Kabone
Jun 30 at 12:50





After calling shutdown you are not able to submit new tasks; calling awaitTermination() once is enough, as it acts on the whole thread pool at once.
– Athl1n3
Jun 30 at 16:07






Not sure that is what I am looking for. I want to submit 1 task and wait for the result, which is a List<String> of all the extracted links. From there I need to resubmit each string from the list as a task to find more links, which would again be returned as a List<String>. This process would continue until it hit a MaxPage count or ran out of new links to extract. I have it visiting the seed page to grab the List<String>, but it does not see the new items in the list it needs to crawl over, and therefore stops.
– Kabone
2 days ago





Then I think you should look somewhere other than ExecutorService; I'd try using the Thread class with its join method.
– Athl1n3
yesterday






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
