Hard to Find Papers - PDF Requests

Translating from another language is a separate puzzle, but Google Translate probably has an API you can hit, which would allow that to be done programmatically.

Here’s the Edery paper converted to raw txt format using Google’s Tesseract OCR:

EDERY 1972.txt (46.8 KB)

The txt file could be reformatted back into a PDF for easier reading, and presumably a non-English paper could be translated sentence by sentence after parsing the text from the PDF.
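The sentence-by-sentence idea above can be sketched roughly as follows. This is only a sketch under assumptions: the `translate` argument is a hypothetical placeholder for whatever translation function you end up using, the 4000-character chunk size is an arbitrary guess rather than any service's documented limit, and the sentence splitter is deliberately naive (it will trip over abbreviations like "et al.").

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    # OCR output has hard line breaks inside paragraphs; flatten them first.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    return re.split(r"(?<=[.!?])\s+", flat)


def chunk_sentences(sentences: List[str], max_chars: int = 4000) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept as its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_text(
    text: str,
    translate: Callable[[str], str],  # hypothetical: plug in a real translator
    max_chars: int = 4000,            # assumed limit, not a documented one
) -> str:
    """Translate text chunk by chunk without cutting sentences in half."""
    chunks = chunk_sentences(split_sentences(text), max_chars)
    return " ".join(translate(chunk) for chunk in chunks)
```

To try it without any translation service, pass a stand-in such as `str.upper` for `translate`; swapping in a real translator later only changes that one argument.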

The script below is pulled directly from the net; I just pasted in the name of the PDF and set up my Debian-based OS to be able to run it.

# Import libraries
from PIL import Image
import pytesseract
from pdf2image import convert_from_path
  
# Path of the pdf
PDF_file = "EDERY 1972.pdf"
  
'''
Part #1 : Converting PDF to images
'''
  
# Store all the pages of the PDF in a variable
# (500 is the DPI used to render each page as an image)
pages = convert_from_path(PDF_file, dpi=500)
  
# Counter to store images of each page of PDF to image
image_counter = 1
  
# Iterate through all the pages stored above
for page in pages:
  
    # Declaring filename for each page of PDF as JPG
    # For each page, filename will be:
    # PDF page 1 -> page_1.jpg
    # PDF page 2 -> page_2.jpg
    # PDF page 3 -> page_3.jpg
    # ....
    # PDF page n -> page_n.jpg
    filename = "page_"+str(image_counter)+".jpg"
      
    # Save the image of the page in system
    page.save(filename, 'JPEG')
  
    # Increment the counter to update filename
    image_counter = image_counter + 1
  
'''
Part #2 - Recognizing text from the images using OCR
'''

# Variable to get count of total number of pages
filelimit = image_counter - 1

# Creating a text file to write the output
outfile = "out_text.txt"

# Open the file once in write mode so that the contents of all
# images end up in the same file, and re-running the script
# does not append a second copy. "with" closes the file for us.
with open(outfile, "w", encoding="utf-8") as f:

    # Iterate from 1 to total number of pages
    for i in range(1, filelimit + 1):

        # Set filename to recognize text from
        # Again, these files will be:
        # page_1.jpg
        # page_2.jpg
        # ....
        # page_n.jpg
        filename = "page_" + str(i) + ".jpg"

        # Recognize the text as string in image using pytesseract
        text = pytesseract.image_to_string(Image.open(filename))

        # The recognized text is stored in variable text
        # Any string processing may be applied on text
        # Here, basic formatting has been done:
        # In many PDFs, at line ending, if a word can't
        # be written fully, a 'hyphen' is added.
        # The rest of the word is written in the next line
        # Eg: This is a sample text this word here GeeksF-
        # orGeeks is half on first line, remaining on next.
        # To remove this, we replace every '-\n' to ''.
        text = text.replace('-\n', '')

        # Finally, write the processed text to the file.
        f.write(text)
4 Likes

I think it’s a website where you can get people in different countries to do something at your request for some money.

4 Likes

Nice!

Yup. You can get all kinds of services from all over

https://www.fiverr.com/search/gigs?query=translate%20german%20to%20english

Probably worth doing something like that in certain instances

4 Likes

https://py-googletrans.readthedocs.io/en/latest/

There’s even already a Python library that can leverage Google’s Translate API, which, from the sounds of it, will gladly take in whole documents.

3 Likes

http://dx.doi.org/10.1038/s41586-021-04296-3

Is anybody subscribed to Nature?

DeepL.com is the best translation engine I know of. It translates whole PDFs and Word documents while keeping the formatting. A trial Pro account does it all, and they have an app. It specializes in European languages.

2 Likes

Looking to get a pdf of this paper, anyone have access?

https://doi.org/10.1111/jvp.12896

Here you go!

s41586-021-04296-3.pdf (7.9 MB)

2 Likes

Right on!

1 Like

I am entirely shocked they are still in business.

1 Like

Energy use in cannabis. Anyone got access to this publication?

Sci-hub (https://sci-hub.se/DOI-HERE) typically has access to most papers. Sometimes very recent papers may not be accessible (like those published the same month as the sci-hub search). If it doesn’t have login credentials for specific journals, there are other ways to get the papers.

I read this paper in 2012 and have a copy in my drive. Unfortunately, it’s not super impressive.

1 Like

anyone have this?

1 Like

@pdxcanna

Novel Δ8-Tetrahydrocannabinol Vaporizers Contain Unlabeled Adulterants, Unintended Byproducts of Chemical Synthesis, and Heavy Metals
https://www.researchgate.net/profile/Jiries-Meehan-Atrash/publication/356941066_Novel_D8-Tetrahydrocannabinol_Vaporizers_Contain_Unlabeled_Adulterants_Unintended_Byproducts_of_Chemical_Synthesis_and_Heavy_Metals/links/61b3a23d1d88475981df2178/Novel-D8-Tetrahydrocannabinol-Vaporizers-Contain-Unlabeled-Adulterants-Unintended-Byproducts-of-Chemical-Synthesis-and-Heavy-Metals.pdf

Here is the full-text supporting information:

Analytical methodology, structural identifications, relevant spectra, and full ICP-MS data

2 Likes

The study @pdxcanna just asked for isn’t available on Sci-hub. Either it’s too new, or Sci-hub doesn’t have the login credentials to access the journal.

So, in that case, a good trick is to search for the paper in Google Scholar, then click “All x versions” to see every link in Google Scholar’s database. In many cases, one of those links will be, or will bring you to, the full text.

Another thing to try is searching on ResearchGate and europepmc.org directly. Europe PMC is an archive of life sciences journal literature; sometimes, they have the full text available.
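If you want to script the Europe PMC lookup rather than search by hand, something like the sketch below might work. It only builds the search URL from a DOI or DOI link; it does not fetch anything. The endpoint and the `query`, `format`, and `resultType` parameters are what I understand the Europe PMC REST search service to accept, so treat them as assumptions and check the Europe PMC documentation before relying on them. The helper names are made up for this example.

```python
import re
from urllib.parse import urlencode

# Assumed Europe PMC REST search endpoint; verify against their docs.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

# Loose DOI matcher: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def extract_doi(reference: str) -> str:
    """Pull a bare DOI out of a DOI string or a doi.org / dx.doi.org link."""
    match = DOI_PATTERN.search(reference.strip())
    if match is None:
        raise ValueError(f"No DOI found in {reference!r}")
    # Drop punctuation that often trails a DOI pasted from running text.
    return match.group(0).rstrip(".,;)")


def europe_pmc_search_url(reference: str) -> str:
    """Build a Europe PMC search URL for the DOI found in reference."""
    params = {
        "query": f"DOI:{extract_doi(reference)}",
        "format": "json",
        "resultType": "lite",
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # DOI link taken from earlier in this thread.
    print(europe_pmc_search_url("https://doi.org/10.1111/jvp.12896"))
```

Opening the printed URL in a browser should show whether Europe PMC has a record for that DOI; whether a free full text is attached still has to be checked in the result itself.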

Here is how I used Google Scholar to find that study for @pdxcanna, since it isn’t on Sci-hub right now:

1. Search on Google Scholar for the title or DOI. Then click on the “All x versions” link:

2. Open each link. In this case, I found two versions of the full text (in yellow):

3. One version is the final edition published by the journal (and is on ResearchGate), and the other is the author’s manuscript (from europepmc.org).

The ResearchGate link is in the post above.

Here’s the Europe PMC link (and here’s a direct link to the author’s manuscript version):

7 Likes

Also, in the case of the paper @pdxcanna asked for, the full-text link was on the PubMed page he shared. PubMed Central (PMC) by the US NIH and other journal literature archive services (like Europe PMC, PLOS ONE, etc.) will often provide links to the free full text of the paper:

2 Likes

Fuck. Appreciate it, jumped the gun thinking it was more current.

Try sci-hub.st to find papers. If they don’t have it, you can request papers by providing them with a link.
I usually upload my papers on ResearchGate. I hate what these publishing companies do. They don’t pay anything to the researchers who do all the hard work. And then they sell that paper, making money off someone else’s work. I even get emails asking me to review papers for free. And I’m like, what are you guys doing then?

1 Like

Hey, check this out for chemistry books
